Extact Match - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

New to SQL Server Programming

Extact Match

Author

Topic

idsck
Starting Member

16 Posts

Posted - 2007-05-04 : 22:11:08

Scenario: 2 sets of data
Set 1 (200 records): firstname, lastname, address, city, phone
Set 2 (100 records): firstname, lastname, address, city, phone

I run a query using phone as criteria/condition (where xx.phone=yy.phone). I got 75 matches based on the phone numbers

However, if I deleted the phone numbers from set 2 data, used below query, the return result is only 45 matches. My question is what technique that I can use to optimize the result just based on criteria like first, last, and address.

select xx.firstname, xx.lastname, yy.Firstname, yy.Lastname, xx.address, yy.address, xx.city, yy.city, yy.phone
from set2 xx, set1 yy where
substring(soundex(xx.Firstname),0,3)=substring(soundex(yy.Firstname),0,3)
and substring(soundex(xx.lastname),0,4)=substring(soundex(yy.Lastname),0,4)
and substring(xx.address,1,7)= substring(yy.address,1,7)
and xx.city=yy.city;

Thank you

id....

rmiao
Master Smack Fu Yak Hacker

7266 Posts

Posted - 2007-05-04 : 22:26:54

How can you get same result with different search condition?

idsck
Starting Member

16 Posts

Posted - 2007-05-04 : 23:15:22

of course not.
What I want or wish to find out is how to optimize my query and get more similar match.

rmiao
Master Smack Fu Yak Hacker

7266 Posts

Posted - 2007-05-05 : 12:33:14

Use 'where xx.phone=yy.phone' only in your select statement, then check values in other columns to see if can find some logical there.

idsck
Starting Member

16 Posts

Posted - 2007-05-05 : 14:10:37

rmiao: thanks. I think it's good idea to do that and identify the pattern of other columns. I have one last question on this. How can I use something like vowel algorithm or _similarity to compare these 2 sets of data? Thanks

rmiao
Master Smack Fu Yak Hacker

7266 Posts

Posted - 2007-05-05 : 18:07:55

Try tablediff if you use sql2k5.

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2007-05-05 : 18:45:35

to compare how close two strings are in "edit space" you can use the Levenshtein edit distance algorithm:

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=51540

www.elsasoft.org

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2007-05-06 : 10:01:24

This algorithm was developed specifically to do what you are asking. I've been using if for more than 10 years.
http://sqlblindman.googlepages.com/fuzzysearchalgorithm
SOUNDEX is not really appropriate for this. As a matter of fact, I'm not sure how many things SOUNDEX is useful for, except as a linguistic curiosity.

e4 d5 xd5 Nf6

idsck
Starting Member

16 Posts

Posted - 2007-05-06 : 13:54:21

Hi Blindman: wonder if you could briefly walk me through how to use your fuzzy algorithm and what are the steps I need to do before testing it. I am new and it so advance for me. Thank you.
id

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2007-05-07 : 00:45:29

There are some tricks you can do to opimize this for large recordsets, but for only a few hundred records you can just do this:

select	Set1.Firstname,
	Set1.Lastname,
	Set2.Firstname,
	Set2.Lastname
from	Set1
	inner join Set2	on
		dbo.CompareText(
			dbo.MatchText(Set1.FirstName + Set1.LastName),
			dbo.MatchText(Set2.FirstName + Set2.LastName)
				) > 80 --Increase number for tighter search

e4 d5 xd5 Nf6

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-05-07 : 02:53:48

How do your phone numbers look like?
Are there hyphens? Spaces? Other characters than numbers?

Peter Larsson
Helsingborg, Sweden

idsck
Starting Member

16 Posts

Posted - 2007-05-07 : 10:22:13

Set1: 999-999-9999
Set2: 9999999999

so it's better to convert them into same format?

thanks
id

idsck
Starting Member

16 Posts

Posted - 2007-05-07 : 10:24:15

Thanks Blindman!!!

quote:
Originally posted by blindman

There are some tricks you can do to opimize this for large recordsets, but for only a few hundred records you can just do this:
select Set1.Firstname,
Set1.Lastname,
Set2.Firstname,
Set2.Lastname
from Set1
inner join Set2 on
dbo.CompareText(
dbo.MatchText(Set1.FirstName + Set1.LastName),
dbo.MatchText(Set2.FirstName + Set2.LastName)
) > 80 --Increase number for tighter search

e4 d5 xd5 Nf6

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-05-07 : 14:32:50

select Set1.Firstname,
Set1.Lastname,
Set2.Firstname,
Set2.Lastname
from Set1
inner join Set2 on replace(set2.phonenumber, '-', '') = replace(set1.phonenumber, '-', '')

Peter Larsson
Helsingborg, Sweden

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2007-05-07 : 15:16:00

Not what he is looking for now...

quote:
Originally posted by idsck
My question is what technique that I can use to optimize the result just based on criteria like first, last, and address.

e4 d5 xd5 Nf6

idsck
Starting Member

16 Posts

Posted - 2007-05-08 : 22:17:41

quote:
Originally posted by blindman

Not what he is looking for now...
quote:
Originally posted by idsck
My question is what technique that I can use to optimize the result just based on criteria like first, last, and address.

e4 d5 xd5 Nf6

Thanks ALL!! it works well......

idsck
Starting Member

16 Posts

Posted - 2007-05-12 : 04:20:00

[quote]Originally posted by idsck

Thanks Blindman!!!

[quote]Originally posted by blindman

There are some tricks you can do to opimize this for large recordsets, but for only a few hundred records you can just do this:

select	Set1.Firstname,
	Set1.Lastname,
	Set2.Firstname,
	Set2.Lastname
from	Set1
	inner join Set2	on
		dbo.CompareText(
			dbo.MatchText(Set1.FirstName + Set1.LastName),
			dbo.MatchText(Set2.FirstName + Set2.LastName)
				) > 80 --Increase number for tighter search

how can i put this result into a new table? I tried to use something like this but it gave me syntax error. I also tried to use insert statement, but nothing work. Please advises and thank you so much!

Create Table exMatch As
(select Set1.Firstname,
Set1.Lastname,
Set2.Firstname,
Set2.Lastname
from Set1
inner join Set2 on
dbo.CompareText(
dbo.MatchText(Set1.FirstName + Set1.LastName),
dbo.MatchText(Set2.FirstName + Set2.LastName)
) > 80) --Increase number for tighten

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2007-05-12 : 20:02:30

You need to use SELECT INTO rather than CREATE TABLE:


select	Set1.Firstname,
	Set1.Lastname,
	Set2.Firstname,
	Set2.Lastname
into	exMatch
from	Set1
	inner join Set2 on
	dbo.CompareText(
		dbo.MatchText(Set1.FirstName + Set1.LastName),
		dbo.MatchText(Set2.FirstName + Set2.LastName)
		) > 80)

e4 d5 xd5 Nf6

idsck
Starting Member

16 Posts

Posted - 2007-05-12 : 22:02:08

thanks!!!

marat
Yak Posting Veteran

85 Posts

Posted - 2008-04-08 : 01:20:23

This is quite a complicated issue.
I suggest use scoring to set up a matching threashold and to filter matching records.
You have to create scoring rules:
1. exact match on first name, last name, < other criterion>, <score>
2. exact match on first name initial, surname, <other criterion>, <score>
....all othere matching criterias
It all depends on business requirements; how complicated your matching engine should be.
Good luck.

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2008-04-08 : 02:38:11

quote:
Originally posted by marat

Good luck.

they will need a great amount of luck to stumble on this post after it's been dead a year...

elsasoft.org

Previous Page&nsp; Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources