Finding Duplicate Records - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

Finding Duplicate Records

Author

Topic

hismightiness
Posting Yak Master

164 Posts

Posted - 2008-08-20 : 13:19:37

I am using the advice given in this article to find and remove duplicate data:

[url]http://www.sql-server-performance.com/articles/dba/delete_duplicates_p1.aspx[/url]

However, I am having problems with getting this example to work with my own data. I know this works, as I ran through the example on that link myself.

Here is the data structure I have:

CREATE TABLE [dbo].[city] (
[city_id] [int] IDENTITY(1,1) NOT NULL,
[city_name] [nvarchar](75) NOT NULL,
[state_id] [int] NOT NULL,
[zipcode_id] [int] NOT NULL
);

The city_id and the zipcode_id columns are unique. I am in the middle of removing the zipcode_id column, but I need to first get rid of duplicate cities in this table.

The part in that previous example that is giving me problems is the GROUP BY query that grabs the duplicate records. It is not fetching any data for me.

Here is my unsuccessful query:

SELECT c.* FROM [dbo].[city] c
GROUP BY c.[city_id], c.[city_name], c.[state_id], c.[zipcode_id]
HAVING COUNT(*) > 1;

I am returning no records. Just to troubleshoot it, I also dropped the zipcode_id column to see if it made a difference, but it didn't.

How can I get that GROUP BY query to work right? What am I doing wrong?

- - - -
- Will -
- - - -
http://www.strohlsitedesign.com
http://blog.strohlsitedesign.com/
http://skins.strohlsitedesign.com/

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2008-08-20 : 13:21:21

SELECT city_name
FROM [dbo].[city]
GROUP BY [city_name]
HAVING COUNT(*) > 1

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2008-08-20 : 13:23:19

To return the other columns, use a derived table:


SELECT c.*
FROM [dbo].[city] c
INNER JOIN
(
	SELECT city_name
	FROM [dbo].[city] 
	GROUP BY [city_name]
	HAVING COUNT(*) > 1
) dt
ON c.city_name = dt.city_name

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

hismightiness
Posting Yak Master

164 Posts

Posted - 2008-08-20 : 13:39:22

I think your second example might do what I just found and tested from another article:

WITH City_Cte AS(
SELECT ROW_NUMBER() OVER (PARTITION BY [city_name], [state_id] ORDER BY [city_name], [state_id] DESC) AS MyRowNumber,
[city_id],[city_name],[state_id],[zipcode_id]
FROM [dbo].[city]
)
SELECT * FROM City_Cte WHERE MyRowNumber > 1;

I will also try yours. This is a one-time query that needs to run to update the DB schema. Which method would you suggest to use?

- - - -
- Will -
- - - -
http://www.strohlsitedesign.com
http://blog.strohlsitedesign.com/
http://skins.strohlsitedesign.com/

hismightiness
Posting Yak Master

164 Posts

Posted - 2008-08-20 : 13:46:13

Actually, your example selected all records, not the duplicate ones.

- - - -
- Will -
- - - -
http://www.strohlsitedesign.com
http://blog.strohlsitedesign.com/
http://skins.strohlsitedesign.com/

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-08-20 : 13:46:35

quote:
Originally posted by hismightiness

I think your second example might do what I just found and tested from another article:

WITH City_Cte AS(
SELECT ROW_NUMBER() OVER (PARTITION BY [city_name], [state_id] ORDER BY [city_name], [state_id] DESC) AS MyRowNumber,
[city_id],[city_name],[state_id],[zipcode_id]
FROM [dbo].[city]
)
SELECT * FROM City_Cte WHERE MyRowNumber > 1;

I will also try yours. This is a one-time query that needs to run to update the DB schema. Which method would you suggest to use?

- - - -
- Will -
- - - -
http://www.strohlsitedesign.com
http://blog.strohlsitedesign.com/
http://skins.strohlsitedesign.com/

actually you dont need a CTE. you can directly use a derived table.

SELECT * FROM 
(SELECT ROW_NUMBER() OVER (PARTITION BY [city_name], [state_id] ORDER BY [city_name], [state_id] DESC) AS MyRowNumber, 
	[city_id],[city_name],[state_id],[zipcode_id] 
	FROM [dbo].[city]
)t
WHERE MyRowNumber > 1

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2008-08-20 : 13:49:52

quote:
Originally posted by hismightiness

Actually, your example selected all records, not the duplicate ones.

- - - -
- Will -
- - - -
http://www.strohlsitedesign.com
http://blog.strohlsitedesign.com/
http://skins.strohlsitedesign.com/

for that you need to modify like this

SELECT c.*
FROM [dbo].[city] c
LEFT JOIN
(
	SELECT city_name,MIN(city_id) AS MinID
	FROM [dbo].[city] 
	GROUP BY [city_name]
) dt
ON c.city_name = dt.city_name
AND c.city_id=dt.MinID
WHERE dt.city_name IS NULL

hismightiness
Posting Yak Master

164 Posts

Posted - 2008-08-20 : 13:50:24

Whammmo! Thanks, visakh16. That is exactly what I needed. I really appreciate it!

- - - -
- Will -
- - - -
http://www.strohlsitedesign.com
http://blog.strohlsitedesign.com/
http://skins.strohlsitedesign.com/

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources