| Author |
Topic |
|
hismightiness
Posting Yak Master
164 Posts |
Posted - 2008-08-20 : 13:19:37
|
| I am using the advice given in this article to find and remove duplicate data:[url]http://www.sql-server-performance.com/articles/dba/delete_duplicates_p1.aspx[/url]However, I am having problems with getting this example to work with my own data. I know this works, as I ran through the example on that link myself.Here is the data structure I have:CREATE TABLE [dbo].[city] ( [city_id] [int] IDENTITY(1,1) NOT NULL, [city_name] [nvarchar](75) NOT NULL, [state_id] [int] NOT NULL, [zipcode_id] [int] NOT NULL);The city_id and the zipcode_id columns are unique. I am in the middle of removing the zipcode_id column, but I need to first get rid of duplicate cities in this table. The part in that previous example that is giving me problems is the GROUP BY query that grabs the duplicate records. It is not fetching any data for me. Here is my unsuccessful query:SELECT c.* FROM [dbo].[city] c GROUP BY c.[city_id], c.[city_name], c.[state_id], c.[zipcode_id] HAVING COUNT(*) > 1;I am returning no records. Just to troubleshoot it, I also dropped the zipcode_id column to see if it made a difference, but it didn't.How can I get that GROUP BY query to work right? What am I doing wrong?- - - -- Will -- - - -http://www.strohlsitedesign.comhttp://blog.strohlsitedesign.com/http://skins.strohlsitedesign.com/ |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2008-08-20 : 13:23:19
|
To return the other columns, use a derived table:SELECT c.*FROM [dbo].[city] cINNER JOIN( SELECT city_name FROM [dbo].[city] GROUP BY [city_name] HAVING COUNT(*) > 1) dtON c.city_name = dt.city_name Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/Subscribe to my blog |
 |
|
|
hismightiness
Posting Yak Master
164 Posts |
Posted - 2008-08-20 : 13:39:22
|
| I think your second example might do what I just found and tested from another article:WITH City_Cte AS( SELECT ROW_NUMBER() OVER (PARTITION BY [city_name], [state_id] ORDER BY [city_name], [state_id] DESC) AS MyRowNumber, [city_id],[city_name],[state_id],[zipcode_id] FROM [dbo].[city])SELECT * FROM City_Cte WHERE MyRowNumber > 1;I will also try yours. This is a one-time query that needs to run to update the DB schema. Which method would you suggest to use?- - - -- Will -- - - -http://www.strohlsitedesign.comhttp://blog.strohlsitedesign.com/http://skins.strohlsitedesign.com/ |
 |
|
|
hismightiness
Posting Yak Master
164 Posts |
Posted - 2008-08-20 : 13:46:13
|
| Actually, your example selected all records, not the duplicate ones.- - - -- Will -- - - -http://www.strohlsitedesign.comhttp://blog.strohlsitedesign.com/http://skins.strohlsitedesign.com/ |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2008-08-20 : 13:46:35
|
quote: Originally posted by hismightiness I think your second example might do what I just found and tested from another article:WITH City_Cte AS( SELECT ROW_NUMBER() OVER (PARTITION BY [city_name], [state_id] ORDER BY [city_name], [state_id] DESC) AS MyRowNumber, [city_id],[city_name],[state_id],[zipcode_id] FROM [dbo].[city])SELECT * FROM City_Cte WHERE MyRowNumber > 1;I will also try yours. This is a one-time query that needs to run to update the DB schema. Which method would you suggest to use?- - - -- Will -- - - -http://www.strohlsitedesign.comhttp://blog.strohlsitedesign.com/http://skins.strohlsitedesign.com/
actually you dont need a CTE. you can directly use a derived table.SELECT * FROM (SELECT ROW_NUMBER() OVER (PARTITION BY [city_name], [state_id] ORDER BY [city_name], [state_id] DESC) AS MyRowNumber, [city_id],[city_name],[state_id],[zipcode_id] FROM [dbo].[city])tWHERE MyRowNumber > 1 |
 |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2008-08-20 : 13:49:52
|
quote: Originally posted by hismightiness Actually, your example selected all records, not the duplicate ones.- - - -- Will -- - - -http://www.strohlsitedesign.comhttp://blog.strohlsitedesign.com/http://skins.strohlsitedesign.com/
for that you need to modify like thisSELECT c.*FROM [dbo].[city] cLEFT JOIN( SELECT city_name,MIN(city_id) AS MinID FROM [dbo].[city] GROUP BY [city_name]) dtON c.city_name = dt.city_nameAND c.city_id=dt.MinIDWHERE dt.city_name IS NULL |
 |
|
|
hismightiness
Posting Yak Master
164 Posts |
Posted - 2008-08-20 : 13:50:24
|
| Whammmo! Thanks, visakh16. That is exactly what I needed. I really appreciate it!- - - -- Will -- - - -http://www.strohlsitedesign.comhttp://blog.strohlsitedesign.com/http://skins.strohlsitedesign.com/ |
 |
|
|
|