Duplicate Delete Timeout - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

Duplicate Delete Timeout

Author

Topic

kittles3069
Starting Member

17 Posts

Posted - 2008-05-02 : 13:04:32

Hi All,

I am trying to delete the duplicates from a table, but keep 1 of the records.

Here is my query which works on smaller tables <2000 records, but the query either timeouts or just runs and runs on larger tables >50,000 and >150,000.


DELETE 
FROM Table1
WHERE Table1.ID IN
                          
/* LIST 1 - all rows that have duplicates*/ 
(SELECT     F.ID
 FROM       Table1 AS F
 WHERE      EXISTS ( SELECT     computerName, programName, version, publisher, Count(ID)
		     FROM       Table1
		     WHERE      Table1.computerName = F.computerName 
				AND Table1.programName = F.programName 
				AND Table1.version = F.version 
				AND Table1.publisher = F.publisher
		     GROUP BY Table1.computerName, Table1.programName, Table1.version, Table1.publisher
		     HAVING      Count(Table1.ID) > 1))
 AND Table1.ID NOT IN

/* LIST 2 - one row from each set of duplicate*/ 
(SELECT     Min(ID)
 FROM       Table1 AS F
 WHERE      EXISTS ( SELECT     computerName, programName, version, publisher, Count(ID)
		     FROM       Table1
		     WHERE      Table1.computerName = F.computerName 
				AND Table1.programName = F.programName 
				AND Table1.version = F.version 
				AND Table1.publisher = F.publisher
		     GROUP BY Table1.computerName, Table1.programName, Table1.version, Table1.publisher
		     HAVING      Count(Table1.ID) > 1)
 GROUP BY computerName, programName, version, publisher);

TABLE STRUCTURE::
ID computerName programName version publisher installDate
1 COMP1 Microsoft .NET Framework 1.1 1.1.4322 Microsoft 20050216
2 COMP1 Windows XP Hotfix - KB891781 20050110.17 Microsoft NULL
3 COMP1 Windows Media Player 10 NULL NULL NULL
4 COMP1 Microsoft .NET Framework 1.1 1.1.4322 Microsoft 02/16/2005

The duplicate records i am trying to delete are not truly duplicates, each record has a unique ID and install dates could vary. I am trying to identify duplicate records as identical (computerName + programName + version + publisher) See records 1 and 4 in table structure above... i dont care which of the duplicates is kept, as long as 1 is kept.

What I am looking for is either a completely new query which maybe optimized to run on a larger scale table or maybe just a performance tweak to my exisitng query??

Any help would be greatly appreciated!

Thanks.

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-05-02 : 13:21:24

Try with this:-

DELETE t
FROM(SELECT ROW_NUMBER() OVER (PARTITION BY computerName,programName,version,publisher ORDER BY ID) AS RowNo,
*
FROM Table1)t
WHERE t.RowNo >1

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-05-02 : 13:22:55

Also learn about what all you can do with ROW_NUMBER() function in SQL 2005 from this excellent article by Madhi

http://sqlblogcasts.com/blogs/madhivanan/archive/2007/08/27/multipurpose-row-number-function.aspx

ganeshkumar08
Posting Yak Master

187 Posts

Posted - 2008-05-03 : 07:08:46

Hello,

Try the below query to delete duplicates,
Just copy and run in SQL2005, then u can easily understand...

declare @t1 Table(ID int identity(1,1), LNAME varchar(50))
insert into @t1
select 'Ganesh' union all
select 'Kumar' union all
select 'Kumar' union all
select 'Ganesh' union all
select 'Ganesh' union all
select 'Kumar' union all
select 'Ganesh' union all
select 'Arun'
-- Before Duplicate Deletion
select * from @t1

delete from @t1 where ID NOT IN
(Select Max(ID)
from @t1 Group by(LNAME) Having( Count(LNAME) >= 1))
-- After Duplicate Deletion
select * from @t1

Ganesh

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources