Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Duplicate Delete Timeout

Author  Topic 

kittles3069
Starting Member

17 Posts

Posted - 2008-05-02 : 13:04:32
Hi All,

I am trying to delete the duplicates from a table, but keep 1 of the records.

Here is my query which works on smaller tables <2000 records, but the query either timeouts or just runs and runs on larger tables >50,000 and >150,000.


DELETE
FROM Table1
WHERE Table1.ID IN

/* LIST 1 - all rows that have duplicates*/
(SELECT F.ID
FROM Table1 AS F
WHERE EXISTS ( SELECT computerName, programName, version, publisher, Count(ID)
FROM Table1
WHERE Table1.computerName = F.computerName
AND Table1.programName = F.programName
AND Table1.version = F.version
AND Table1.publisher = F.publisher
GROUP BY Table1.computerName, Table1.programName, Table1.version, Table1.publisher
HAVING Count(Table1.ID) > 1))
AND Table1.ID NOT IN

/* LIST 2 - one row from each set of duplicate*/
(SELECT Min(ID)
FROM Table1 AS F
WHERE EXISTS ( SELECT computerName, programName, version, publisher, Count(ID)
FROM Table1
WHERE Table1.computerName = F.computerName
AND Table1.programName = F.programName
AND Table1.version = F.version
AND Table1.publisher = F.publisher
GROUP BY Table1.computerName, Table1.programName, Table1.version, Table1.publisher
HAVING Count(Table1.ID) > 1)
GROUP BY computerName, programName, version, publisher);


TABLE STRUCTURE::
ID computerName programName version publisher installDate
1 COMP1 Microsoft .NET Framework 1.1 1.1.4322 Microsoft 20050216
2 COMP1 Windows XP Hotfix - KB891781 20050110.17 Microsoft NULL
3 COMP1 Windows Media Player 10 NULL NULL NULL
4 COMP1 Microsoft .NET Framework 1.1 1.1.4322 Microsoft 02/16/2005

The duplicate records i am trying to delete are not truly duplicates, each record has a unique ID and install dates could vary. I am trying to identify duplicate records as identical (computerName + programName + version + publisher) See records 1 and 4 in table structure above... i dont care which of the duplicates is kept, as long as 1 is kept.

What I am looking for is either a completely new query which maybe optimized to run on a larger scale table or maybe just a performance tweak to my exisitng query??

Any help would be greatly appreciated!

Thanks.

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-05-02 : 13:21:24
Try with this:-
DELETE t
FROM(SELECT ROW_NUMBER() OVER (PARTITION BY computerName,programName,version,publisher ORDER BY ID) AS RowNo,
*
FROM Table1)t
WHERE t.RowNo >1
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-05-02 : 13:22:55
Also learn about what all you can do with ROW_NUMBER() function in SQL 2005 from this excellent article by Madhi

http://sqlblogcasts.com/blogs/madhivanan/archive/2007/08/27/multipurpose-row-number-function.aspx
Go to Top of Page

ganeshkumar08
Posting Yak Master

187 Posts

Posted - 2008-05-03 : 07:08:46
Hello,

Try the below query to delete duplicates,
Just copy and run in SQL2005, then u can easily understand...

declare @t1 Table(ID int identity(1,1), LNAME varchar(50))
insert into @t1
select 'Ganesh' union all
select 'Kumar' union all
select 'Kumar' union all
select 'Ganesh' union all
select 'Ganesh' union all
select 'Kumar' union all
select 'Ganesh' union all
select 'Arun'
-- Before Duplicate Deletion
select * from @t1

delete from @t1 where ID NOT IN
(Select Max(ID)
from @t1 Group by(LNAME) Having( Count(LNAME) >= 1))
-- After Duplicate Deletion
select * from @t1

Ganesh
Go to Top of Page
   

- Advertisement -