Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Extact Match

Author  Topic 

idsck
Starting Member

16 Posts

Posted - 2007-05-04 : 22:11:08
Scenario: 2 sets of data
Set 1 (200 records): firstname, lastname, address, city, phone
Set 2 (100 records): firstname, lastname, address, city, phone

I run a query using phone as criteria/condition (where xx.phone=yy.phone). I got 75 matches based on the phone numbers

However, if I deleted the phone numbers from set 2 data, used below query, the return result is only 45 matches. My question is what technique that I can use to optimize the result just based on criteria like first, last, and address.

select xx.firstname, xx.lastname, yy.Firstname, yy.Lastname, xx.address, yy.address, xx.city, yy.city, yy.phone
from set2 xx, set1 yy where
substring(soundex(xx.Firstname),0,3)=substring(soundex(yy.Firstname),0,3)
and substring(soundex(xx.lastname),0,4)=substring(soundex(yy.Lastname),0,4)
and substring(xx.address,1,7)= substring(yy.address,1,7)
and xx.city=yy.city;


Thank you

id....

rmiao
Master Smack Fu Yak Hacker

7266 Posts

Posted - 2007-05-04 : 22:26:54
How can you get same result with different search condition?
Go to Top of Page

idsck
Starting Member

16 Posts

Posted - 2007-05-04 : 23:15:22
of course not.
What I want or wish to find out is how to optimize my query and get more similar match.
Go to Top of Page

rmiao
Master Smack Fu Yak Hacker

7266 Posts

Posted - 2007-05-05 : 12:33:14
Use 'where xx.phone=yy.phone' only in your select statement, then check values in other columns to see if can find some logical there.
Go to Top of Page

idsck
Starting Member

16 Posts

Posted - 2007-05-05 : 14:10:37
rmiao: thanks. I think it's good idea to do that and identify the pattern of other columns. I have one last question on this. How can I use something like vowel algorithm or _similarity to compare these 2 sets of data? Thanks
Go to Top of Page

rmiao
Master Smack Fu Yak Hacker

7266 Posts

Posted - 2007-05-05 : 18:07:55
Try tablediff if you use sql2k5.
Go to Top of Page

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2007-05-05 : 18:45:35
to compare how close two strings are in "edit space" you can use the Levenshtein edit distance algorithm:

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=51540





www.elsasoft.org
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2007-05-06 : 10:01:24
This algorithm was developed specifically to do what you are asking. I've been using if for more than 10 years.
http://sqlblindman.googlepages.com/fuzzysearchalgorithm
SOUNDEX is not really appropriate for this. As a matter of fact, I'm not sure how many things SOUNDEX is useful for, except as a linguistic curiosity.

e4 d5 xd5 Nf6
Go to Top of Page

idsck
Starting Member

16 Posts

Posted - 2007-05-06 : 13:54:21
Hi Blindman: wonder if you could briefly walk me through how to use your fuzzy algorithm and what are the steps I need to do before testing it. I am new and it so advance for me. Thank you.
id
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2007-05-07 : 00:45:29
There are some tricks you can do to opimize this for large recordsets, but for only a few hundred records you can just do this:
select	Set1.Firstname,
Set1.Lastname,
Set2.Firstname,
Set2.Lastname
from Set1
inner join Set2 on
dbo.CompareText(
dbo.MatchText(Set1.FirstName + Set1.LastName),
dbo.MatchText(Set2.FirstName + Set2.LastName)
) > 80 --Increase number for tighter search


e4 d5 xd5 Nf6
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-05-07 : 02:53:48
How do your phone numbers look like?
Are there hyphens? Spaces? Other characters than numbers?


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

idsck
Starting Member

16 Posts

Posted - 2007-05-07 : 10:22:13
Set1: 999-999-9999
Set2: 9999999999

so it's better to convert them into same format?

thanks
id
Go to Top of Page

idsck
Starting Member

16 Posts

Posted - 2007-05-07 : 10:24:15
Thanks Blindman!!!

quote:
Originally posted by blindman

There are some tricks you can do to opimize this for large recordsets, but for only a few hundred records you can just do this:
select	Set1.Firstname,
Set1.Lastname,
Set2.Firstname,
Set2.Lastname
from Set1
inner join Set2 on
dbo.CompareText(
dbo.MatchText(Set1.FirstName + Set1.LastName),
dbo.MatchText(Set2.FirstName + Set2.LastName)
) > 80 --Increase number for tighter search


e4 d5 xd5 Nf6

Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-05-07 : 14:32:50
select Set1.Firstname,
Set1.Lastname,
Set2.Firstname,
Set2.Lastname
from Set1
inner join Set2 on replace(set2.phonenumber, '-', '') = replace(set1.phonenumber, '-', '')


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2007-05-07 : 15:16:00
Not what he is looking for now...
quote:
Originally posted by idsck
My question is what technique that I can use to optimize the result just based on criteria like first, last, and address.



e4 d5 xd5 Nf6
Go to Top of Page

idsck
Starting Member

16 Posts

Posted - 2007-05-08 : 22:17:41
quote:
Originally posted by blindman

Not what he is looking for now...
quote:
Originally posted by idsck
My question is what technique that I can use to optimize the result just based on criteria like first, last, and address.



e4 d5 xd5 Nf6





Thanks ALL!! it works well......

Go to Top of Page

idsck
Starting Member

16 Posts

Posted - 2007-05-12 : 04:20:00
[quote]Originally posted by idsck

Thanks Blindman!!!

[quote]Originally posted by blindman

There are some tricks you can do to opimize this for large recordsets, but for only a few hundred records you can just do this:
select	Set1.Firstname,
Set1.Lastname,
Set2.Firstname,
Set2.Lastname
from Set1
inner join Set2 on
dbo.CompareText(
dbo.MatchText(Set1.FirstName + Set1.LastName),
dbo.MatchText(Set2.FirstName + Set2.LastName)
) > 80 --Increase number for tighter search


how can i put this result into a new table? I tried to use something like this but it gave me syntax error. I also tried to use insert statement, but nothing work. Please advises and thank you so much!


Create Table exMatch As
(select Set1.Firstname,
Set1.Lastname,
Set2.Firstname,
Set2.Lastname
from Set1
inner join Set2 on
dbo.CompareText(
dbo.MatchText(Set1.FirstName + Set1.LastName),
dbo.MatchText(Set2.FirstName + Set2.LastName)
) > 80) --Increase number for tighten

Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2007-05-12 : 20:02:30
You need to use SELECT INTO rather than CREATE TABLE:

select Set1.Firstname,
Set1.Lastname,
Set2.Firstname,
Set2.Lastname
into exMatch
from Set1
inner join Set2 on
dbo.CompareText(
dbo.MatchText(Set1.FirstName + Set1.LastName),
dbo.MatchText(Set2.FirstName + Set2.LastName)
) > 80)


e4 d5 xd5 Nf6
Go to Top of Page

idsck
Starting Member

16 Posts

Posted - 2007-05-12 : 22:02:08
thanks!!!
Go to Top of Page

marat
Yak Posting Veteran

85 Posts

Posted - 2008-04-08 : 01:20:23
This is quite a complicated issue.
I suggest use scoring to set up a matching threashold and to filter matching records.
You have to create scoring rules:
1. exact match on first name, last name, < other criterion>, <score>
2. exact match on first name initial, surname, <other criterion>, <score>
....all othere matching criterias
It all depends on business requirements; how complicated your matching engine should be.
Good luck.

Go to Top of Page

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2008-04-08 : 02:38:11
quote:
Originally posted by marat


Good luck.



they will need a great amount of luck to stumble on this post after it's been dead a year...




elsasoft.org
Go to Top of Page
  Previous Page&nsp;  Next Page

- Advertisement -