SQL Server Forums
Profile | Register | Active Topics | Members | Search | Forum FAQ
 
Register Now and get your question answered!
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 Site Related Forums
 Article Discussion
 Article: Deleting Duplicate Records
 New Topic  Reply to Topic
 Printer Friendly
Next Page
Author Previous Topic Topic Next Topic
Page: of 3

AskSQLTeam
Ask SQLTeam Question

USA
0 Posts

Posted - 03/25/2001 :  13:44:22  Show Profile  Visit AskSQLTeam's Homepage  Reply with Quote
Seema writes "There is a Table with no key constraints. It has duplicate records. The duplicate records have to be deleted (eg there are 3 similar records, only 2 have to be deleted). I need a single SQL query for this." This is a pretty common question so I thought I'd provide some options.

Article Link.

Anonymous
Starting Member

0 Posts

Posted - 03/26/2001 :  10:32:55  Show Profile  Reply with Quote
Quicker way to delete dupes

A less elegant way to clean up duplicate rows would be to use the T-SQL SET ROWCOUNT command. If you have 3 duplicate rows and want to delete 2 of them, SET ROWCOUNT 2 combined with a proper WHERE clause will clean up the 2 additional rows. Then a SET ROWCOUNT 0 (Zero) will turn the option off.

As always, it is a great idea to backup you data before you start deleting rows.

-- Donald

Go to Top of Page

Anonymous
Starting Member

0 Posts

Posted - 03/31/2001 :  00:47:55  Show Profile  Reply with Quote
Another solution to deleting two records from a table having three duplicate records with no constra

Delete top 2 from table where [Condition]

Go to Top of Page

Anonymous
Starting Member

0 Posts

Posted - 04/07/2001 :  22:48:18  Show Profile  Reply with Quote
Delete duplicates

This is a question I usually ask interviewees.
There are several ways to do this - the best way depends on the size, number of dups, how much the database is acessed...

The solution given is only really the best for a small off-line table imho, but I would expect every candidate to at least come up with this as a starting point.

Go to Top of Page

graz
Chief SQLTeam Crack Dealer

USA
4137 Posts

Posted - 04/08/2001 :  00:29:22  Show Profile  Visit graz's Homepage  Reply with Quote
Is it fixed?

This shouldn't allow anonymous comments anymore. Of course I already tried this once . . .

Go to Top of Page

nr
SQLTeam MVY

United Kingdom
12543 Posts

Posted - 04/08/2001 :  10:51:50  Show Profile  Visit nr's Homepage  Reply with Quote
Delete dups

I'll probably get round to posting some other methods here later when I have time.
Some are a bit complicated and need testing because I usually get them wrong first time.

Go to Top of Page

nr
SQLTeam MVY

United Kingdom
12543 Posts

Posted - 04/09/2001 :  06:28:08  Show Profile  Visit nr's Homepage  Reply with Quote
Some more ways of deleting duplicates.

These are some ways of deleting dups.
Some are better than others depending on:
size of table.
whether system can be brought down to carry out the action
Space available
Number of duplicates
How much time is available - both to carry out the action and to write the script to do it.

I create any temp table by a select into but this should be replaced by a create statement.
I have also tried to place transactions in the correct place but have omitted error processing.
To have less impact on a live system a pause could be put in the loops

(This comes without any guarantee but should be close)

drop table #a
go
create table #a (i int, j int, k int)

insert #a select 1,1,1
insert #a select 1,1,1
insert #a select 1,1,1
insert #a select 1,1,1
insert #a select 2,1,1
insert #a select 2,1,1
insert #a select 2,2,1
insert #a select 2,2,2
insert #a select 2,2,3
insert #a select 2,2,4
insert #a select 3,3,3
insert #a select 3,3,3
insert #a select 3,3,3

1. simple table recreate
select * into #b from #a where 1 = 0
insert #b select distinct * from #a
begin tran
delete #a
insert #a select * from #b
commit tran
drop table #b

2. delete and replace duplicates
select * into #b from #a where 1 = 0
insert #b select i,j,k from #a group by i,j,k having count(*) > 1
begin tran
delete #a from #b where #a.i = #b.i and #a.j = #b.j and #a.k = #b.k
insert #a select * from #b
commit tran
drop table #b

3. delete duplicates one by one leaving single row
set rowcount 1
select 1
while @@rowcount > 0
delete #a
where 1 < (select count(*) from #a a2 where #a.i = a2.i and #a.j = a2.j and #a.k = a2.k)
set rowcount 0

4. delete all duplicates for one row value at a time
select *, cnt = 0 into #b from #a where 1 = 0
declare @rowcount int
select 1
while @@rowcount <> 0
begin
insert #b (i,j,k,cnt) select top 1 i,j,k, count(*) - 1 from #a group by i,j,k having count(*) > 1
select @rowcount = cnt from #b
set rowcount @rowcount
delete

Go to Top of Page

nr
SQLTeam MVY

United Kingdom
12543 Posts

Posted - 04/09/2001 :  06:29:24  Show Profile  Visit nr's Homepage  Reply with Quote
Continued


4. delete all duplicates for one row value at a time
select *, cnt = 0 into #b from #a where 1 = 0
declare @rowcount int
select 1
while @@rowcount <> 0
begin
insert #b (i,j,k,cnt) select top 1 i,j,k, count(*) - 1 from #a group by i,j,k having count(*) > 1
select @rowcount = cnt from #b
set rowcount @rowcount
delete #a
from #b
where #a.i = #b.i and #a.j = #b.j and #a.k = #b.k
set rowcount 0
delete #b
end
drop table #b

5. similar to above but all rows to delete gathered at beginning
select *, cnt = 0, identity(int,1,1) as id into #b from #a where 1 = 0
insert #b (i,j,k,cnt) select i,j,k, count(*) - 1 from #a group by i,j,k having count(*) > 1
declare @id int, @rowcount int
while exists (select * from #b)
begin
select @id = min(id) from #b
select @rowcount = cnt from #b
set rowcount @rowcount
delete #a
from #b
where #a.i = #b.i and #a.j = #b.j and #a.k = #b.k
and #b.id = @id
set rowcount 0
delete #b where id = @id
end
drop table #b

Go to Top of Page

brianegge
Starting Member

USA
1 Posts

Posted - 06/01/2001 :  10:35:56  Show Profile  Reply with Quote
Here's how I solve this problem. If the table does not have an identity column, add one, so all the rows are numbered. Here's an example script:

Create table ##Test (a int not null, b int not null, c int not null, id int not null identity) on [Primary]
GO
INSERT INTO ##Test (A,B,C) VALUES (1,1,1)
INSERT INTO ##Test (A,B,C) VALUES (1,1,1)
INSERT INTO ##Test (A,B,C) VALUES (1,1,1)

INSERT INTO ##Test (A,B,C) VALUES (1,2,3)
INSERT INTO ##Test (A,B,C) VALUES (1,2,3)
INSERT INTO ##Test (A,B,C) VALUES (1,2,3)

INSERT INTO ##Test (A,B,C) VALUES (4,5,6)
GO
Select * from ##Test
GO
Delete from ##Test where id <
(Select Max(id) from ##Test t where ##Test.a = t.a and
##Test.b = t.b and
##Test.c = t.c)
GO
Select * from ##Test
GO

----
I think this is the fastest and simpliest method.

Go to Top of Page

aiken
Aged Yak Warrior

USA
525 Posts

Posted - 02/08/2002 :  13:14:48  Show Profile  Send aiken an ICQ Message  Reply with Quote
And then there's the dynamic SQL approach which uses no temp tables or identity keys. Probably not great for thousands or millions of duplicates, but very nice for trimming those pesky hundred duplicates from a million row table.

Here's the query as it would be for the demo DB in the article:

select 'delete from dup_authors where au_id=
(select top 1 au_id from dup_authors
where au_lname=' + au_lname + ' and au_fname='
+ au_fname + ' and city=' + city + ' and state='
+ state + ')'
from dup_authors
group by by au_lname, au_fname, city, state
having count(*) > 1

...each time it runs it will delete one of the duplicates; you have to run it a couple of times if one row is duplicated multiple times.

Cheers
-b



Go to Top of Page

MakeYourDaddyProud
Posting Yak Master

United Kingdom
184 Posts

Posted - 05/30/2002 :  11:12:15  Show Profile  Reply with Quote
-- delete dups i have done this b4

-- 1) assuming table has a unique identifier preferably keyed.
-- to delete all duplicate names keeping the one with highest ID

create table #Namez
(id int primary key,
name char(10) not null)
go

insert into #Namez values (1, "Mom")
insert into #Namez values (2, "Mom")
insert into #Namez values (3, "Sister")
insert into #Namez values (4, "Sister")
insert into #Namez values (5, "Sister")
insert into #Namez values (6, "Sister")
insert into #Namez values (7, "Sister")
insert into #Namez values (8, "Sister")
insert into #Namez values (9, "Dad")
go

select * from #Namez
go

DELETE #Namez
FROM #Namez
JOIN
(select [name], max([id]) AS MaxID
from #Namez
group by [Name]) AS G
ON G.[Name] = #Namez.[Name]
WHERE #Namez.[ID] < G.[MaxID]
AND #Namez.[Name] = G.[Name]

select * from #Namez
go

The correlated subquery agreggate establishes the link id in which to reference grouped deletes on name with a lesser id

HTH

Daniel Small CEO
www.danielsmall.com


Go to Top of Page

karbon
Starting Member

9 Posts

Posted - 12/09/2003 :  10:34:34  Show Profile  Reply with Quote

Why do you use long way????

Easy way
------------------
Example table
------------------
Create table EG
(
ID int,
Value1 int,
Value2 int
)

declare @ID int
declare @Count integer
declare CursorDuplicates Cursor for
SELECT ID FROM EG
open CursorDuplicates
fetch next from CursorDuplicates into @ID
while @@fetch_status=0
begin
select @Count = count(ID) from EG where ID = @ID
if @Count > 1
begin
DELETE EG WHERE CURRENT OF CursorDuplicates
end
fetch next from CursorDuplicates into @ID
end
close CursorDuplicates
deallocate CursorDuplicates
Go to Top of Page

robvolk
Most Valuable Yak

USA
15636 Posts

Posted - 12/09/2003 :  23:57:16  Show Profile  Visit robvolk's Homepage  Reply with Quote
You are kidding us, right?

Or do you prefer to make 3 left turns instead of one right turn?
Go to Top of Page

byrmol
Shed Building SQL Farmer

Australia
1591 Posts

Posted - 12/10/2003 :  00:07:30  Show Profile  Reply with Quote

WHERE CURRENT OF


I vote this the "Worst Operator ever implemented"

DavidM

"SQL-3 is an abomination.."
Go to Top of Page

Deep Blue
Starting Member

1 Posts

Posted - 06/28/2004 :  13:09:42  Show Profile  Send Deep Blue an AOL message  Send Deep Blue an ICQ Message  Send Deep Blue a Yahoo! Message  Reply with Quote
Hi!

The query helped me a bit. I've just rewritten it like this and works better in my case (my table having dups already has an autonumeric Primary Key)

delete dim_zonas_venta
where zona_venta_key not in
(
select max(zona_venta_key) as zona_venta_key
from dim_zonas_venta
group by cliente_key, mercado_grupo_key, cod_zona_venta, zona_venta
)
Go to Top of Page

Deep Blue
Starting Member

1 Posts

Posted - 06/28/2004 :  13:13:39  Show Profile  Send Deep Blue an AOL message  Send Deep Blue an ICQ Message  Send Deep Blue a Yahoo! Message  Reply with Quote
Hi!

The query helped me a bit. I've just rewritten it like this and works better in my case (my table having dups already has an autonumeric Primary Key)

delete dim_zonas_venta
where zona_venta_key not in
(
select max(zona_venta_key) as zona_venta_key
from dim_zonas_venta
group by cliente_key, mercado_grupo_key, cod_zona_venta, zona_venta
)
Go to Top of Page

darrendorlando
Starting Member

1 Posts

Posted - 05/09/2005 :  18:18:43  Show Profile  Reply with Quote
http://com-hpdevelopersolutions-2s.wwwa.com/members/remoteLoginCheck.cfm?id=MTMzMDAsYmxhY2tmaW4sL21lbWJlcnMvZXF1aXBtZW50L2luZGV4LmNmbQ_


This works for me and is way more efficient! Give'r a try, but remember to back up first and test the results to make sure you what you are expecting happens!


DELETE FROM [Table with Duplicates]
WHERE [Primary Key Field] IN
(
SELECT a.[Primary Key Field]
FROM [Table with Duplicates] a,
[Table with Duplicates] b

WHERE a.[Primary Key Field]!= b.[Primary Key Field] -- i.e. Userkey
AND a.[Value to check]= b.[Value to Check] -- i.e. Lastname
AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname
AND a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey
)


dd
ddorlando@gmail.com
Go to Top of Page

rosieq13
Starting Member

1 Posts

Posted - 06/14/2005 :  16:37:29  Show Profile  Reply with Quote
Your instructions on Deleting Duplicate Records are just great. I followed them and was able to accomplish the deletion. Usually I don't rate articles, but I cannot help saying Thanks to this one.
Go to Top of Page

speedadkt
Starting Member

4 Posts

Posted - 11/18/2005 :  16:18:21  Show Profile  Reply with Quote
This is by far the most straight forward, efficient method of the bunch. Works for any number of dupes too which is great.

quote:
Originally posted by darrendorlando

http://com-hpdevelopersolutions-2s.wwwa.com/members/remoteLoginCheck.cfm?id=MTMzMDAsYmxhY2tmaW4sL21lbWJlcnMvZXF1aXBtZW50L2luZGV4LmNmbQ_


This works for me and is way more efficient! Give'r a try, but remember to back up first and test the results to make sure you what you are expecting happens!


DELETE FROM [Table with Duplicates]
WHERE [Primary Key Field] IN
(
SELECT a.[Primary Key Field]
FROM [Table with Duplicates] a,
[Table with Duplicates] b

WHERE a.[Primary Key Field]!= b.[Primary Key Field] -- i.e. Userkey
AND a.[Value to check]= b.[Value to Check] -- i.e. Lastname
AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname
AND a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey
)


dd
ddorlando@gmail.com


Go to Top of Page

Merkwurdigliebe
Starting Member

1 Posts

Posted - 07/11/2006 :  17:47:17  Show Profile  Reply with Quote
And this is the only example I have found in several hours of looking that works with non-trivial tables and data. How often do you need to clean up a two-column table with an identity column and no PK, anyway?

Thanks for the clear, logical, and concise code!

I was able to use this to clean up a mess in a table with a three-way compound key comprised of GUID's (uniquidentifiers) that needed a dozen casts in other examples to even think about running. (and still bombed)


quote:
Originally posted by darrendorlando

http://com-hpdevelopersolutions-2s.wwwa.com/members/remoteLoginCheck.cfm?id=MTMzMDAsYmxhY2tmaW4sL21lbWJlcnMvZXF1aXBtZW50L2luZGV4LmNmbQ_


This works for me and is way more efficient! Give'r a try, but remember to back up first and test the results to make sure you what you are expecting happens!


DELETE FROM [Table with Duplicates]
WHERE [Primary Key Field] IN
(
SELECT a.[Primary Key Field]
FROM [Table with Duplicates] a,
[Table with Duplicates] b

WHERE a.[Primary Key Field]!= b.[Primary Key Field] -- i.e. Userkey
AND a.[Value to check]= b.[Value to Check] -- i.e. Lastname
AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname
AND a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey
)


dd
ddorlando@gmail.com


Go to Top of Page

Kristen
Test

United Kingdom
22403 Posts

Posted - 02/05/2007 :  04:05:27  Show Profile  Reply with Quote
Related information:

http://support.microsoft.com/kb/139444 (as posted in Positioned Updates/Deleted without PK)

Kristen
Go to Top of Page
Page: of 3 Previous Topic Topic Next Topic  
Next Page
 New Topic  Reply to Topic
 Printer Friendly
Jump To:
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC Go To Top Of Page
This page was generated in 0.31 seconds. Powered By: Snitz Forums 2000