Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 partial join

Author  Topic 

mxfrail
Yak Posting Veteran

84 Posts

Posted - 2010-12-09 : 16:16:26
I am trying to join 2 tables on what I believe is a partial join.

Basically I want to join on address but I cannot get the join I want because the addresses on the files are from different sources. So some say 1400 eighth street and others say 1400 8th street etc or 1400 8th st etc etc etc.

Is this possible?

Kristen
Test

22859 Posts

Posted - 2010-12-09 : 16:23:26
Difficult, IME, although there may be products out there that will do a good job of de-duping addresses for you.

When I have had to do this in the past I have taken a temporary copy and "sanitised" this data - so (given your example) I would make successive UPDATES to change " 8th " to " eighth " and " st" to " street" (using some care where things were only valid to change if at the end of the column, rather than embedded in the middle - hence the spaces I have included in the strings - but more checks would be needed in reality for changes specific to end-of-line)

I would then match records and give them a score - so if the address matches perfectly match=1

If address 1, 3, 4, 5 is the same, but address 2 different, but the phone number also matches then match=2

You could then compare a sample of match=2 pairs and see if you think they are the same. If you are happy then you can declare that match=2 is "good enough"

Some match=N you will decide that a human will have to make the final decision (and maybe you will provide a side-by-side comparison page with MATCH / NO-MATCH buttons).
Go to Top of Page

DBA in the making
Aged Yak Warrior

638 Posts

Posted - 2010-12-21 : 08:45:14
quote:
Originally posted by Kristen

Difficult, IME, although there may be products out there that will do a good job of de-duping addresses for you.

When I have had to do this in the past I have taken a temporary copy and "sanitised" this data - so (given your example) I would make successive UPDATES to change " 8th " to " eighth " and " st" to " street" (using some care where things were only valid to change if at the end of the column, rather than embedded in the middle - hence the spaces I have included in the strings - but more checks would be needed in reality for changes specific to end-of-line)

I would then match records and give them a score - so if the address matches perfectly match=1

If address 1, 3, 4, 5 is the same, but address 2 different, but the phone number also matches then match=2

You could then compare a sample of match=2 pairs and see if you think they are the same. If you are happy then you can declare that match=2 is "good enough"

Some match=N you will decide that a human will have to make the final decision (and maybe you will provide a side-by-side comparison page with MATCH / NO-MATCH buttons).


I've just finished a project that did this. For me, it was Titles (Mr, Mrs, etc), label names as salutations in a CRM system. What I did was, do them in small batches, parse each batch out to an excel sheet, with the pre/post update values, and give that sheet to someone to manually check. They edited the post update values, which I then used in the update. Luckily, I wasn't working on that many records, and most of them didn't need any manual updating.

------------------------------------------------------------------------------------
Any and all code contained within this post comes with a 100% money back guarantee.
Go to Top of Page
   

- Advertisement -