| Author |
Topic  |
|
AskSQLTeam
Ask SQLTeam Question
USA
0 Posts |
|
|
Anonymous
Starting Member
0 Posts |
|
|
Anonymous
Starting Member
0 Posts |
|
|
Anonymous
Starting Member
0 Posts |
|
|
graz
Chief SQLTeam Crack Dealer
USA
4128 Posts |
|
|
nr
SQLTeam MVY
United Kingdom
12543 Posts |
|
|
nr
SQLTeam MVY
United Kingdom
12543 Posts |
Posted - 04/09/2001 : 06:28:08
|
| Some more ways of deleting duplicates. These are some ways of deleting dups. Some are better than others depending on: size of table. whether system can be brought down to carry out the action Space available Number of duplicates How much time is available - both to carry out the action and to write the script to do it.
I create any temp table by a select into but this should be replaced by a create statement. I have also tried to place transactions in the correct place but have omitted error processing. To have less impact on a live system a pause could be put in the loops
(This comes without any guarantee but should be close)
drop table #a go create table #a (i int, j int, k int)
insert #a select 1,1,1 insert #a select 1,1,1 insert #a select 1,1,1 insert #a select 1,1,1 insert #a select 2,1,1 insert #a select 2,1,1 insert #a select 2,2,1 insert #a select 2,2,2 insert #a select 2,2,3 insert #a select 2,2,4 insert #a select 3,3,3 insert #a select 3,3,3 insert #a select 3,3,3
1. simple table recreate select * into #b from #a where 1 = 0 insert #b select distinct * from #a begin tran delete #a insert #a select * from #b commit tran drop table #b
2. delete and replace duplicates select * into #b from #a where 1 = 0 insert #b select i,j,k from #a group by i,j,k having count(*) > 1 begin tran delete #a from #b where #a.i = #b.i and #a.j = #b.j and #a.k = #b.k insert #a select * from #b commit tran drop table #b
3. delete duplicates one by one leaving single row set rowcount 1 select 1 while @@rowcount > 0 delete #a where 1 < (select count(*) from #a a2 where #a.i = a2.i and #a.j = a2.j and #a.k = a2.k) set rowcount 0
4. delete all duplicates for one row value at a time select *, cnt = 0 into #b from #a where 1 = 0 declare @rowcount int select 1 while @@rowcount <> 0 begin insert #b (i,j,k,cnt) select top 1 i,j,k, count(*) - 1 from #a group by i,j,k having count(*) > 1 select @rowcount = cnt from #b set rowcount @rowcount delete |
 |
|
|
nr
SQLTeam MVY
United Kingdom
12543 Posts |
Posted - 04/09/2001 : 06:29:24
|
| Continued 4. delete all duplicates for one row value at a time select *, cnt = 0 into #b from #a where 1 = 0 declare @rowcount int select 1 while @@rowcount <> 0 begin insert #b (i,j,k,cnt) select top 1 i,j,k, count(*) - 1 from #a group by i,j,k having count(*) > 1 select @rowcount = cnt from #b set rowcount @rowcount delete #a from #b where #a.i = #b.i and #a.j = #b.j and #a.k = #b.k set rowcount 0 delete #b end drop table #b
5. similar to above but all rows to delete gathered at beginning select *, cnt = 0, identity(int,1,1) as id into #b from #a where 1 = 0 insert #b (i,j,k,cnt) select i,j,k, count(*) - 1 from #a group by i,j,k having count(*) > 1 declare @id int, @rowcount int while exists (select * from #b) begin select @id = min(id) from #b select @rowcount = cnt from #b set rowcount @rowcount delete #a from #b where #a.i = #b.i and #a.j = #b.j and #a.k = #b.k and #b.id = @id set rowcount 0 delete #b where id = @id end drop table #b
|
 |
|
|
brianegge
Starting Member
USA
1 Posts |
Posted - 06/01/2001 : 10:35:56
|
Here's how I solve this problem. If the table does not have an identity column, add one, so all the rows are numbered. Here's an example script:
Create table ##Test (a int not null, b int not null, c int not null, id int not null identity) on [Primary] GO INSERT INTO ##Test (A,B,C) VALUES (1,1,1) INSERT INTO ##Test (A,B,C) VALUES (1,1,1) INSERT INTO ##Test (A,B,C) VALUES (1,1,1)
INSERT INTO ##Test (A,B,C) VALUES (1,2,3) INSERT INTO ##Test (A,B,C) VALUES (1,2,3) INSERT INTO ##Test (A,B,C) VALUES (1,2,3)
INSERT INTO ##Test (A,B,C) VALUES (4,5,6) GO Select * from ##Test GO Delete from ##Test where id < (Select Max(id) from ##Test t where ##Test.a = t.a and ##Test.b = t.b and ##Test.c = t.c) GO Select * from ##Test GO
---- I think this is the fastest and simpliest method.
|
 |
|
|
aiken
Aged Yak Warrior
USA
525 Posts |
Posted - 02/08/2002 : 13:14:48
|
And then there's the dynamic SQL approach which uses no temp tables or identity keys. Probably not great for thousands or millions of duplicates, but very nice for trimming those pesky hundred duplicates from a million row table.
Here's the query as it would be for the demo DB in the article:
select 'delete from dup_authors where au_id= (select top 1 au_id from dup_authors where au_lname=' + au_lname + ' and au_fname=' + au_fname + ' and city=' + city + ' and state=' + state + ')' from dup_authors group by by au_lname, au_fname, city, state having count(*) > 1
...each time it runs it will delete one of the duplicates; you have to run it a couple of times if one row is duplicated multiple times.
Cheers -b
|
 |
|
|
MakeYourDaddyProud
Posting Yak Master
United Kingdom
184 Posts |
Posted - 05/30/2002 : 11:12:15
|
-- delete dups i have done this b4
-- 1) assuming table has a unique identifier preferably keyed. -- to delete all duplicate names keeping the one with highest ID
create table #Namez (id int primary key, name char(10) not null) go
insert into #Namez values (1, "Mom") insert into #Namez values (2, "Mom") insert into #Namez values (3, "Sister") insert into #Namez values (4, "Sister") insert into #Namez values (5, "Sister") insert into #Namez values (6, "Sister") insert into #Namez values (7, "Sister") insert into #Namez values (8, "Sister") insert into #Namez values (9, "Dad") go
select * from #Namez go
DELETE #Namez FROM #Namez JOIN (select [name], max([id]) AS MaxID from #Namez group by [Name]) AS G ON G.[Name] = #Namez.[Name] WHERE #Namez.[ID] < G.[MaxID] AND #Namez.[Name] = G.[Name]
select * from #Namez go
The correlated subquery agreggate establishes the link id in which to reference grouped deletes on name with a lesser id
HTH
Daniel Small CEO www.danielsmall.com
|
 |
|
|
karbon
Starting Member
9 Posts |
Posted - 12/09/2003 : 10:34:34
|
Why do you use long way????
Easy way ------------------ Example table ------------------ Create table EG ( ID int, Value1 int, Value2 int )
declare @ID int declare @Count integer declare CursorDuplicates Cursor for SELECT ID FROM EG open CursorDuplicates fetch next from CursorDuplicates into @ID while @@fetch_status=0 begin select @Count = count(ID) from EG where ID = @ID if @Count > 1 begin DELETE EG WHERE CURRENT OF CursorDuplicates end fetch next from CursorDuplicates into @ID end close CursorDuplicates deallocate CursorDuplicates |
 |
|
|
robvolk
Most Valuable Yak
USA
15559 Posts |
Posted - 12/09/2003 : 23:57:16
|
You are kidding us, right?
Or do you prefer to make 3 left turns instead of one right turn? |
 |
|
|
byrmol
Shed Building SQL Farmer
Australia
1591 Posts |
Posted - 12/10/2003 : 00:07:30
|
WHERE CURRENT OF
I vote this the "Worst Operator ever implemented"
DavidM
"SQL-3 is an abomination.." |
 |
|
|
Deep Blue
Starting Member
1 Posts |
Posted - 06/28/2004 : 13:09:42
|
Hi!
The query helped me a bit. I've just rewritten it like this and works better in my case (my table having dups already has an autonumeric Primary Key)
delete dim_zonas_venta where zona_venta_key not in ( select max(zona_venta_key) as zona_venta_key from dim_zonas_venta group by cliente_key, mercado_grupo_key, cod_zona_venta, zona_venta )
|
 |
|
|
Deep Blue
Starting Member
1 Posts |
Posted - 06/28/2004 : 13:13:39
|
Hi!
The query helped me a bit. I've just rewritten it like this and works better in my case (my table having dups already has an autonumeric Primary Key)
delete dim_zonas_venta where zona_venta_key not in ( select max(zona_venta_key) as zona_venta_key from dim_zonas_venta group by cliente_key, mercado_grupo_key, cod_zona_venta, zona_venta )
|
 |
|
|
darrendorlando
Starting Member
1 Posts |
Posted - 05/09/2005 : 18:18:43
|
http://com-hpdevelopersolutions-2s.wwwa.com/members/remoteLoginCheck.cfm?id=MTMzMDAsYmxhY2tmaW4sL21lbWJlcnMvZXF1aXBtZW50L2luZGV4LmNmbQ_
This works for me and is way more efficient! Give'r a try, but remember to back up first and test the results to make sure you what you are expecting happens!
DELETE FROM [Table with Duplicates] WHERE [Primary Key Field] IN ( SELECT a.[Primary Key Field] FROM [Table with Duplicates] a, [Table with Duplicates] b WHERE a.[Primary Key Field]!= b.[Primary Key Field] -- i.e. Userkey AND a.[Value to check]= b.[Value to Check] -- i.e. Lastname AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname AND a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey )
dd ddorlando@gmail.com
|
 |
|
|
rosieq13
Starting Member
1 Posts |
Posted - 06/14/2005 : 16:37:29
|
Your instructions on Deleting Duplicate Records are just great. I followed them and was able to accomplish the deletion. Usually I don't rate articles, but I cannot help saying Thanks to this one. |
 |
|
|
speedadkt
Starting Member
4 Posts |
Posted - 11/18/2005 : 16:18:21
|
This is by far the most straight forward, efficient method of the bunch. Works for any number of dupes too which is great.
quote: Originally posted by darrendorlando
http://com-hpdevelopersolutions-2s.wwwa.com/members/remoteLoginCheck.cfm?id=MTMzMDAsYmxhY2tmaW4sL21lbWJlcnMvZXF1aXBtZW50L2luZGV4LmNmbQ_
This works for me and is way more efficient! Give'r a try, but remember to back up first and test the results to make sure you what you are expecting happens!
DELETE FROM [Table with Duplicates] WHERE [Primary Key Field] IN ( SELECT a.[Primary Key Field] FROM [Table with Duplicates] a, [Table with Duplicates] b WHERE a.[Primary Key Field]!= b.[Primary Key Field] -- i.e. Userkey AND a.[Value to check]= b.[Value to Check] -- i.e. Lastname AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname AND a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey )
dd ddorlando@gmail.com
|
 |
|
|
Merkwurdigliebe
Starting Member
1 Posts |
Posted - 07/11/2006 : 17:47:17
|
And this is the only example I have found in several hours of looking that works with non-trivial tables and data. How often do you need to clean up a two-column table with an identity column and no PK, anyway?
Thanks for the clear, logical, and concise code!
I was able to use this to clean up a mess in a table with a three-way compound key comprised of GUID's (uniquidentifiers) that needed a dozen casts in other examples to even think about running. (and still bombed)
quote: Originally posted by darrendorlando
http://com-hpdevelopersolutions-2s.wwwa.com/members/remoteLoginCheck.cfm?id=MTMzMDAsYmxhY2tmaW4sL21lbWJlcnMvZXF1aXBtZW50L2luZGV4LmNmbQ_
This works for me and is way more efficient! Give'r a try, but remember to back up first and test the results to make sure you what you are expecting happens!
DELETE FROM [Table with Duplicates] WHERE [Primary Key Field] IN ( SELECT a.[Primary Key Field] FROM [Table with Duplicates] a, [Table with Duplicates] b WHERE a.[Primary Key Field]!= b.[Primary Key Field] -- i.e. Userkey AND a.[Value to check]= b.[Value to Check] -- i.e. Lastname AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname AND a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey )
dd ddorlando@gmail.com
|
 |
|
|
Kristen
Test
United Kingdom
22191 Posts |
|
Topic  |
|
|
|