Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Delete Duplicates using analytics

Author  Topic 

hrishy
Starting Member

47 Posts

Posted - 2008-07-15 : 14:30:23
Hi

I have a table with records someting like this


Account Modified Created
124 01/07/2008 01/07/2008
124 02/07/2008 02/07/2008
125 01/07/2008 01/07/2008
125 02/07/2008 02/07/2008
125 03/07/2008 01/07/2008


i want to delete the duplicate Account so i am left with only those account numbers that were modified last.

So i am left with only these is it possible to do with a single delete statement


Account Modified Created
124 02/07/2008 02/07/2008
125 03/07/2008 01/07/2008

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2008-07-15 : 14:34:59
I think this'll do it, test first though with a SELECT instead of the DELETE just in case!

DELETE t1
FROM YourTable t1
LEFT OUTER JOIN (
SELECT Account, MAX(Modified) AS Modified
FROM YourTable
GROUP BY Account) t2
ON t1.Account = t2.Account AND t1.Modified = t2.Modified
WHERE t2.Account IS NULL AND t2.Modified IS NULL

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-07-15 : 14:39:12
also this:-

DELETE t
FROM
(SELECT ROW_NUMBER() OVER(PARTITION BY Account ORDER BY Modified DESC) AS Seq,*
FROM YourTable)t
WHERE t.Seq>1



please run with select see if its giving you only records to be removed.
Go to Top of Page

hrishy
Starting Member

47 Posts

Posted - 2008-07-15 : 14:39:41
Hi

Thanks a million
Amazing i had been scratching my head over this for almost 3 hours.
Just a question
whats the signoficance of this condition
WHERE t2.Account IS NULL AND t2.Modified IS NULL

regards
Hrishy
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2008-07-15 : 14:41:43
It's due to the LEFT OUTER JOIN. So we are only deleting those records that do not match the join condition. In that situation with an outer join, the right (or left in a right join) table will have null values in the result set.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

hrishy
Starting Member

47 Posts

Posted - 2008-07-15 : 14:42:18
Hi Visakh16

Thanks a million
Your solution is simple neat and clean.

However whats the significance of Seq,*

would'nt just seq be enough

regards
Hrishy
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-07-15 : 14:42:34
quote:
Originally posted by hrishy

Hi

Thanks a million
Amazing i had been scratching my head over this for almost 3 hours.
Just a question
whats the signoficance of this condition
WHERE t2.Account IS NULL AND t2.Modified IS NULL

regards
Hrishy


its to exclude those records that are involved in subquery i.e those which have max modified date for each Account.
Go to Top of Page

hrishy
Starting Member

47 Posts

Posted - 2008-07-15 : 14:43:31
Hi Tara

Thanks again for the clear and consise explanation.

regards
Hrishy
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-07-15 : 14:46:51
quote:
Originally posted by hrishy

Hi Visakh16

Thanks a million
Your solution is simple neat and clean.

However whats the significance of Seq,*

would'nt just seq be enough

regards
Hrishy


Seq adds the sequence number to identify the correct records.You can exclude the *. i just copy pasted it from select.
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2008-07-15 : 14:49:05
Just to be clear with terminology, both of our solutions use derived tables and not subqueries.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-07-15 : 14:51:25
quote:
Originally posted by tkizer

Just to be clear with terminology, both of our solutions use derived tables and not subqueries.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog



yeah..it is . I stated it wrong. Thanks for pointing it out.
Go to Top of Page

hrishy
Starting Member

47 Posts

Posted - 2008-07-15 : 14:54:11
quote:
Originally posted by tkizer

Just to be clear with terminology, both of our solutions use derived tables and not subqueries.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog




Hmm are subqueries bad in Sqlsever 2005 ?

regards
Hrishy
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2008-07-15 : 14:57:26
It depends, but derived tables are typically preferred.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-07-15 : 14:58:14
Mostly bad due to the cost.
They are bad in SQL Server 2000 too.



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page
   

- Advertisement -