SQL Server Forums
Profile | Register | Active Topics | Members | Search | Forum FAQ
 
Register Now and get your question answered!
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Extact Match
 New Topic  Reply to Topic
 Printer Friendly
Next Page
Author Previous Topic Topic Next Topic
Page: of 2

idsck
Starting Member

16 Posts

Posted - 05/04/2007 :  22:11:08  Show Profile  Reply with Quote
Scenario: 2 sets of data
Set 1 (200 records): firstname, lastname, address, city, phone
Set 2 (100 records): firstname, lastname, address, city, phone

I run a query using phone as criteria/condition (where xx.phone=yy.phone). I got 75 matches based on the phone numbers

However, if I deleted the phone numbers from set 2 data, used below query, the return result is only 45 matches. My question is what technique that I can use to optimize the result just based on criteria like first, last, and address.

select xx.firstname, xx.lastname, yy.Firstname, yy.Lastname, xx.address, yy.address, xx.city, yy.city, yy.phone
from set2 xx, set1 yy where
substring(soundex(xx.Firstname),0,3)=substring(soundex(yy.Firstname),0,3)
and substring(soundex(xx.lastname),0,4)=substring(soundex(yy.Lastname),0,4)
and substring(xx.address,1,7)= substring(yy.address,1,7)
and xx.city=yy.city;


Thank you

id....

Edited by - idsck on 05/04/2007 22:14:32

rmiao
Flowing Fount of Yak Knowledge

USA
7266 Posts

Posted - 05/04/2007 :  22:26:54  Show Profile  Reply with Quote
How can you get same result with different search condition?
Go to Top of Page

idsck
Starting Member

16 Posts

Posted - 05/04/2007 :  23:15:22  Show Profile  Reply with Quote
of course not.
What I want or wish to find out is how to optimize my query and get more similar match.
Go to Top of Page

rmiao
Flowing Fount of Yak Knowledge

USA
7266 Posts

Posted - 05/05/2007 :  12:33:14  Show Profile  Reply with Quote
Use 'where xx.phone=yy.phone' only in your select statement, then check values in other columns to see if can find some logical there.
Go to Top of Page

idsck
Starting Member

16 Posts

Posted - 05/05/2007 :  14:10:37  Show Profile  Reply with Quote
rmiao: thanks. I think it's good idea to do that and identify the pattern of other columns. I have one last question on this. How can I use something like vowel algorithm or _similarity to compare these 2 sets of data? Thanks
Go to Top of Page

rmiao
Flowing Fount of Yak Knowledge

USA
7266 Posts

Posted - 05/05/2007 :  18:07:55  Show Profile  Reply with Quote
Try tablediff if you use sql2k5.
Go to Top of Page

jezemine
Flowing Fount of Yak Knowledge

USA
2886 Posts

Posted - 05/05/2007 :  18:45:35  Show Profile  Visit jezemine's Homepage  Reply with Quote
to compare how close two strings are in "edit space" you can use the Levenshtein edit distance algorithm:

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=51540





www.elsasoft.org
Go to Top of Page

blindman
Flowing Fount of Yak Knowledge

USA
2365 Posts

Posted - 05/06/2007 :  10:01:24  Show Profile  Reply with Quote
This algorithm was developed specifically to do what you are asking. I've been using if for more than 10 years.
http://sqlblindman.googlepages.com/fuzzysearchalgorithm
SOUNDEX is not really appropriate for this. As a matter of fact, I'm not sure how many things SOUNDEX is useful for, except as a linguistic curiosity.

e4 d5 xd5 Nf6
Go to Top of Page

idsck
Starting Member

16 Posts

Posted - 05/06/2007 :  13:54:21  Show Profile  Reply with Quote
Hi Blindman: wonder if you could briefly walk me through how to use your fuzzy algorithm and what are the steps I need to do before testing it. I am new and it so advance for me. Thank you.
id
Go to Top of Page

blindman
Flowing Fount of Yak Knowledge

USA
2365 Posts

Posted - 05/07/2007 :  00:45:29  Show Profile  Reply with Quote
There are some tricks you can do to opimize this for large recordsets, but for only a few hundred records you can just do this:
select	Set1.Firstname,
	Set1.Lastname,
	Set2.Firstname,
	Set2.Lastname
from	Set1
	inner join Set2	on
		dbo.CompareText(
			dbo.MatchText(Set1.FirstName + Set1.LastName),
			dbo.MatchText(Set2.FirstName + Set2.LastName)
				) > 80 --Increase number for tighter search


e4 d5 xd5 Nf6

Edited by - blindman on 05/07/2007 00:46:30
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

Sweden
30217 Posts

Posted - 05/07/2007 :  02:53:48  Show Profile  Visit SwePeso's Homepage  Reply with Quote
How do your phone numbers look like?
Are there hyphens? Spaces? Other characters than numbers?


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

idsck
Starting Member

16 Posts

Posted - 05/07/2007 :  10:22:13  Show Profile  Reply with Quote
Set1: 999-999-9999
Set2: 9999999999

so it's better to convert them into same format?

thanks
id
Go to Top of Page

idsck
Starting Member

16 Posts

Posted - 05/07/2007 :  10:24:15  Show Profile  Reply with Quote
Thanks Blindman!!!

quote:
Originally posted by blindman

There are some tricks you can do to opimize this for large recordsets, but for only a few hundred records you can just do this:
select	Set1.Firstname,
	Set1.Lastname,
	Set2.Firstname,
	Set2.Lastname
from	Set1
	inner join Set2	on
		dbo.CompareText(
			dbo.MatchText(Set1.FirstName + Set1.LastName),
			dbo.MatchText(Set2.FirstName + Set2.LastName)
				) > 80 --Increase number for tighter search


e4 d5 xd5 Nf6

Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

Sweden
30217 Posts

Posted - 05/07/2007 :  14:32:50  Show Profile  Visit SwePeso's Homepage  Reply with Quote
select Set1.Firstname,
Set1.Lastname,
Set2.Firstname,
Set2.Lastname
from Set1
inner join Set2 on replace(set2.phonenumber, '-', '') = replace(set1.phonenumber, '-', '')


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

blindman
Flowing Fount of Yak Knowledge

USA
2365 Posts

Posted - 05/07/2007 :  15:16:00  Show Profile  Reply with Quote
Not what he is looking for now...
quote:
Originally posted by idsck
My question is what technique that I can use to optimize the result just based on criteria like first, last, and address.



e4 d5 xd5 Nf6
Go to Top of Page

idsck
Starting Member

16 Posts

Posted - 05/08/2007 :  22:17:41  Show Profile  Reply with Quote
quote:
Originally posted by blindman

Not what he is looking for now...
quote:
Originally posted by idsck
My question is what technique that I can use to optimize the result just based on criteria like first, last, and address.



e4 d5 xd5 Nf6





Thanks ALL!! it works well......

Go to Top of Page

idsck
Starting Member

16 Posts

Posted - 05/12/2007 :  04:20:00  Show Profile  Reply with Quote
[quote]Originally posted by idsck

Thanks Blindman!!!

[quote]Originally posted by blindman

There are some tricks you can do to opimize this for large recordsets, but for only a few hundred records you can just do this:
select	Set1.Firstname,
	Set1.Lastname,
	Set2.Firstname,
	Set2.Lastname
from	Set1
	inner join Set2	on
		dbo.CompareText(
			dbo.MatchText(Set1.FirstName + Set1.LastName),
			dbo.MatchText(Set2.FirstName + Set2.LastName)
				) > 80 --Increase number for tighter search


how can i put this result into a new table? I tried to use something like this but it gave me syntax error. I also tried to use insert statement, but nothing work. Please advises and thank you so much!


Create Table exMatch As
(select Set1.Firstname,
Set1.Lastname,
Set2.Firstname,
Set2.Lastname
from Set1
inner join Set2 on
dbo.CompareText(
dbo.MatchText(Set1.FirstName + Set1.LastName),
dbo.MatchText(Set2.FirstName + Set2.LastName)
) > 80) --Increase number for tighten

Go to Top of Page

blindman
Flowing Fount of Yak Knowledge

USA
2365 Posts

Posted - 05/12/2007 :  20:02:30  Show Profile  Reply with Quote
You need to use SELECT INTO rather than CREATE TABLE:

select	Set1.Firstname,
	Set1.Lastname,
	Set2.Firstname,
	Set2.Lastname
into	exMatch
from	Set1
	inner join Set2 on
	dbo.CompareText(
		dbo.MatchText(Set1.FirstName + Set1.LastName),
		dbo.MatchText(Set2.FirstName + Set2.LastName)
		) > 80)


e4 d5 xd5 Nf6
Go to Top of Page

idsck
Starting Member

16 Posts

Posted - 05/12/2007 :  22:02:08  Show Profile  Reply with Quote
thanks!!!
Go to Top of Page

marat
Yak Posting Veteran

Australia
85 Posts

Posted - 04/08/2008 :  01:20:23  Show Profile  Reply with Quote
This is quite a complicated issue.
I suggest use scoring to set up a matching threashold and to filter matching records.
You have to create scoring rules:
1. exact match on first name, last name, < other criterion>, <score>
2. exact match on first name initial, surname, <other criterion>, <score>
....all othere matching criterias
It all depends on business requirements; how complicated your matching engine should be.
Good luck.

Go to Top of Page

jezemine
Flowing Fount of Yak Knowledge

USA
2886 Posts

Posted - 04/08/2008 :  02:38:11  Show Profile  Visit jezemine's Homepage  Reply with Quote
quote:
Originally posted by marat


Good luck.



they will need a great amount of luck to stumble on this post after it's been dead a year...




elsasoft.org
Go to Top of Page
Page: of 2 Previous Topic Topic Next Topic  
Next Page
 New Topic  Reply to Topic
 Printer Friendly
Jump To:
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC Go To Top Of Page
This page was generated in 0.23 seconds. Powered By: Snitz Forums 2000