SQL Server Forums
Profile | Register | Active Topics | Members | Search | Forum FAQ
 
Register Now and get your question answered!
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 partial join
 New Topic  Reply to Topic
 Printer Friendly
Author Previous Topic Topic Next Topic  

mxfrail
Yak Posting Veteran

84 Posts

Posted - 12/09/2010 :  16:16:26  Show Profile  Reply with Quote
I am trying to join 2 tables on what I believe is a partial join.

Basically I want to join on address but I cannot get the join I want because the addresses on the files are from different sources. So some say 1400 eighth street and others say 1400 8th street etc or 1400 8th st etc etc etc.

Is this possible?

Kristen
Test

United Kingdom
22403 Posts

Posted - 12/09/2010 :  16:23:26  Show Profile  Reply with Quote
Difficult, IME, although there may be products out there that will do a good job of de-duping addresses for you.

When I have had to do this in the past I have taken a temporary copy and "sanitised" this data - so (given your example) I would make successive UPDATES to change " 8th " to " eighth " and " st" to " street" (using some care where things were only valid to change if at the end of the column, rather than embedded in the middle - hence the spaces I have included in the strings - but more checks would be needed in reality for changes specific to end-of-line)

I would then match records and give them a score - so if the address matches perfectly match=1

If address 1, 3, 4, 5 is the same, but address 2 different, but the phone number also matches then match=2

You could then compare a sample of match=2 pairs and see if you think they are the same. If you are happy then you can declare that match=2 is "good enough"

Some match=N you will decide that a human will have to make the final decision (and maybe you will provide a side-by-side comparison page with MATCH / NO-MATCH buttons).

Edited by - Kristen on 12/09/2010 16:26:23
Go to Top of Page

DBA in the making
Aged Yak Warrior

638 Posts

Posted - 12/21/2010 :  08:45:14  Show Profile  Reply with Quote
quote:
Originally posted by Kristen

Difficult, IME, although there may be products out there that will do a good job of de-duping addresses for you.

When I have had to do this in the past I have taken a temporary copy and "sanitised" this data - so (given your example) I would make successive UPDATES to change " 8th " to " eighth " and " st" to " street" (using some care where things were only valid to change if at the end of the column, rather than embedded in the middle - hence the spaces I have included in the strings - but more checks would be needed in reality for changes specific to end-of-line)

I would then match records and give them a score - so if the address matches perfectly match=1

If address 1, 3, 4, 5 is the same, but address 2 different, but the phone number also matches then match=2

You could then compare a sample of match=2 pairs and see if you think they are the same. If you are happy then you can declare that match=2 is "good enough"

Some match=N you will decide that a human will have to make the final decision (and maybe you will provide a side-by-side comparison page with MATCH / NO-MATCH buttons).


I've just finished a project that did this. For me, it was Titles (Mr, Mrs, etc), label names as salutations in a CRM system. What I did was, do them in small batches, parse each batch out to an excel sheet, with the pre/post update values, and give that sheet to someone to manually check. They edited the post update values, which I then used in the update. Luckily, I wasn't working on that many records, and most of them didn't need any manual updating.

------------------------------------------------------------------------------------
Any and all code contained within this post comes with a 100% money back guarantee.
Go to Top of Page
  Previous Topic Topic Next Topic  
 New Topic  Reply to Topic
 Printer Friendly
Jump To:
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC Go To Top Of Page
This page was generated in 0.24 seconds. Powered By: Snitz Forums 2000