Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Delete Duplicates

Author  Topic 

rn5a
Starting Member

25 Posts

Posted - 2008-10-25 : 06:15:50
CREATE TABLE [dbo].[MyTable](
[Name] [varchar](50) NULL,
[Age] [smallint] NULL
) ON [PRIMARY]
------------------------------------
INSERT INTO MyTable VALUES ('Aaron', 34)
INSERT INTO MyTable VALUES ('Aaron', 34)
INSERT INTO MyTable VALUES ('Betty', 26)
INSERT INTO MyTable VALUES ('Clark', 29)
INSERT INTO MyTable VALUES ('Clark', 29)
INSERT INTO MyTable VALUES ('Diana', 33)
INSERT INTO MyTable VALUES ('Diana', 33)
INSERT INTO MyTable VALUES ('Elena', 41)
------------------------------------

To delete duplicates from the above table, this is the query:

------------------------------------
SET ROWCOUNT 1
DELETE MyTable
FROM MyTable a
WHERE (SELECT COUNT(*) FROM MyTable b WHERE b.name = a.name AND b.age = a.age) > 1
WHILE @@ROWCOUNT > 0
DELETE MyTable
FROM MyTable a
WHERE (SELECT COUNT(*) FROM MyTable b WHERE b.name = a.name AND b.age = a.age) > 1
SET ROWCOUNT 0
------------------------------------

Can someone please explain me step-by-step how does the above query delete duplicates?

Thanks,

Ron

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-10-25 : 07:13:58
Beter method is this

DELETE t
FROM
(SELECT ROW_NUMBER() OVER (PARTITION BY Name,Age ORDER BY Name) AS Seq,*
FROM MyTable)t
WHERE t.Seq>1


what it does is organise the data intable into groupsof Name Age and number each with help of row_number() functions. So the duplicate records will get number 1,2,3... I delete those which have number >1. this will remove all records that existed as duplicates within each group and retain only first one (Seq=1)
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-10-25 : 07:19:05
also learn other uses of row_number() function

http://sqlblogcasts.com/blogs/madhivanan/archive/2007/08/27/multipurpose-row-number-function.aspx
Go to Top of Page

gsrinivas.
Yak Posting Veteran

56 Posts

Posted - 2008-10-25 : 08:22:38
hai,
very thanks for deleting duplicates in single query....
GSrinivas.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-10-25 : 09:25:11
quote:
Originally posted by gsrinivas.

hai,
very thanks for deleting duplicates in single query....
GSrinivas.


you're welcome
Go to Top of Page

rn5a
Starting Member

25 Posts

Posted - 2008-10-27 : 01:43:02
quote:
Originally posted by visakh16

Beter method is this

DELETE t
FROM
(SELECT ROW_NUMBER() OVER (PARTITION BY Name,Age ORDER BY Name) AS Seq,*
FROM MyTable)t
WHERE t.Seq>1


what it does is organise the data intable into groupsof Name Age and number each with help of row_number() functions. So the duplicate records will get number 1,2,3... I delete those which have number >1. this will remove all records that existed as duplicates within each group and retain only first one (Seq=1)

Thanks, mate, for introducing me to the ROW_NUMBER function.

Can it be said that the PARTITION BY clause groups the identical rows based on the columns supplied to the PARTITION BY clause?

Thanks,

Ron
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-10-27 : 02:13:08
yup. it partitions the table data horizontally into groups based on fields specified on partition by.
Go to Top of Page

rn5a
Starting Member

25 Posts

Posted - 2008-10-31 : 05:11:51
quote:
Originally posted by visakh16

yup. it partitions the table data horizontally into groups based on fields specified on partition by.

Then what is the difference between the GROUP BY & PARTITION BY clauses?

Thanks,

Ron
Go to Top of Page

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2008-10-31 : 05:19:37
quote:
Originally posted by rn5a

quote:
Originally posted by visakh16

yup. it partitions the table data horizontally into groups based on fields specified on partition by.

Then what is the difference between the GROUP BY & PARTITION BY clauses?

Thanks,

Ron

PARTITION BY is used to reset the ranking numbers


Madhivanan

Failing to plan is Planning to fail
Go to Top of Page
   

- Advertisement -