Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 [solved] Making a duplicate address checker better

Author  Topic 

ravl13
Starting Member

38 Posts

Posted - 2011-09-21 : 15:26:13
Greetings,

I have the following duplicate address checker:

SELECT firstname, lastname, address1, City, state 
FROM ccncontacts
GROUP BY state, city, address1, firstname, lastname
HAVING COUNT(*) > 1


what I'd like to do is modify this query to return rows that have matching first 5 characters of the address1 column as well. The reason I want to do this is so that I can pick out 2+ rows that are nearly identical, but for instance one has an address1 of "21 Yancy St." and another row has "21 Yancy Street"

Conceptually, what I'd like to do is this:

SELECT firstname, lastname, LEFT(address1, 5) as adrs1, City, state 
FROM ccncontacts
GROUP BY state, city, adrs1, firstname, lastname
HAVING COUNT(*) > 1


but when I try that, the adrs1 in the GROUP BY clause causes an error ("invalid column name"). Any ideas on how I can improve the original query to have this functionality?

**NOTE: I realize that this new "LEFT 5" query could technically create false positives (ex. 2 guys both named "Ben Grimm", and living at addresses "22 Yan Road" & "22 Yancy Street" respectively). I'm OK with that. I'd manually check each result returned, and verify it is indeed a duplicate before deleting one of the rows manually. I'm running this query once to help me tidy up a database**

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-09-21 : 15:28:57
SELECT firstname, lastname, LEFT(address1, 5) as adrs1, City, state
FROM ccncontacts
GROUP BY state, city, LEFT(address1, 5), firstname, lastname
HAVING COUNT(*) > 1

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

ravl13
Starting Member

38 Posts

Posted - 2011-09-21 : 15:31:04
Haha, wow I feel silly, since the answer was so simple. I didn't realize you could put functions in the GROUP BY clause. Thanks!
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-09-21 : 15:35:04
You're welcome, glad to help.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2011-09-22 : 16:18:56
Instead of comparing just the first five characters, try this fuzzy-match function.
http://dl.dropbox.com/u/2507186/Functions/CompareText.sql
Pass in both address strings as parameters, and filter all the comparisons that return a result greater than 90.

________________________________________________
If it is not practically useful, then it is practically useless.
________________________________________________
Go to Top of Page
   

- Advertisement -