Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 Old Forums
 CLOSED - General SQL Server
 Proper Name Pattern Matching

Author  Topic 

AskSQLTeam
Ask SQLTeam Question

0 Posts

Posted - 2004-03-09 : 08:33:20
Brian writes "I need to know if there is anything in SQL Server that would allow me to find all forms of a proper name, say like 'Joseph'. Where it would find 'Joe' and 'Joey'. I have tried using the full text search and although I can use it on normal nouns and it works fine, I can't seem to get it to match on proper nouns (names). Any help would be greatly appreciated."

X002548
Not Just a Number

15586 Posts

Posted - 2004-03-09 : 11:11:16
You need to create your own fuzzy logic routine...and that will slow things up pretty much..

You could also create an alias table that stores both values....



Brett

8-)
Go to Top of Page

ehorn
Master Smack Fu Yak Hacker

1632 Posts

Posted - 2004-03-09 : 11:18:50
Have a look at "Pattern Matching in Search Conditions" in BOL. Maybe this will help in your routine.
Go to Top of Page

raymondpeacock
Constraint Violating Yak Guru

367 Posts

Posted - 2004-03-09 : 11:27:30
Thats strictly speaking not going to be too much help. Pattern matching won't return 'Joe' from 'Joseph' unless you severely limit it to the first two characters.

I'm with Brett in thinking an alias table might be the best solution, but where would you get a sizeable rowset of all names and their possible derivatives as a starting point!


Raymond
Go to Top of Page

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2004-03-10 : 08:47:10
"but where would you get a sizeable rowset of all names and their possible derivatives as a starting point!"


You build one...slowly...

Best done through a QA process...ie build a list of (all/most) possible matches from your input data set...couple of queries should be able to drum up most unique 1st names (in reality there are very few in active use in the English language..<200...problems will come about with non-native(English) names...ie Francois, Bertrand, Nazim, etc...but most of these wouldn't have short-cuts anyway(???)), set the matches to 'pending'....get somebody to review each of the pairs and set pending -> approved....and then over time people can add/remove combinations at will.

It looks daunting...but in reality is a bit smaller than at 1st perception.
Go to Top of Page

Qualm
Starting Member

7 Posts

Posted - 2004-03-10 : 16:22:26
This is a common problem in my industry (real estate title research), and there are actually companies which provide middleware to help with this kind of thing. Try http://www.name-searching.com/Working/Default.htm as an example.

- Qualm
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2004-03-10 : 16:44:40
Hundred bucks they have an alias table in there somewhere, and some type of mathmatical algorythym that keeps a score...



Brett

8-)
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2004-03-10 : 17:28:06
SQL server does have an implementation of the SOUNDEX algorithm, which turns words into what they sound like for matching purposes. look up SOUNDEX in books on-line. I used it in college a little but I don't recall how good or useful it is ...


- Jeff
Go to Top of Page

mohdowais
Sheikh of Yak Knowledge

1456 Posts

Posted - 2004-03-11 : 00:30:26
The way SOUNDEX works is quite simple, it strips the vowels from the words, removes continguous duplicate characters and then evaluates the characters left over. It can also compare similar sounding characters (such as "j" and "g" or "c" and "s") at the same position in both strings.

Another function of interest is DIFFERENCE, which internally uses SOUNDEX to compute how much the two strings resemble each other. DIFFERENCE returns a score between 0 and 4, with 4 being the best match. For near-matches you can use SOUNDEX directly, but for fuzzier searches, DIFFERENCE comes in handy.

The results, though, are not very useful and SOUNDEX suffers from a few limitations, the biggest being the inability to process beyond the first non-alphabet character in the string. So if you have a space or a number in your string, only the portion to the left of the character is evaluated.

OS
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2004-03-11 : 09:05:52
quote:
Originally posted by mohdowais

The way SOUNDEX works is quite simple, it strips the vowels from the words, removes continguous duplicate characters and then evaluates the characters left over. It can also compare similar sounding characters (such as "j" and "g" or "c" and "s") at the same position in both strings.

Another function of interest is DIFFERENCE, which internally uses SOUNDEX to compute how much the two strings resemble each other. DIFFERENCE returns a score between 0 and 4, with 4 being the best match. For near-matches you can use SOUNDEX directly, but for fuzzier searches, DIFFERENCE comes in handy.

The results, though, are not very useful and SOUNDEX suffers from a few limitations, the biggest being the inability to process beyond the first non-alphabet character in the string. So if you have a space or a number in your string, only the portion to the left of the character is evaluated.

OS



That's impressive....how did you learn that?

Still like the alias table thing....



Brett

8-)
Go to Top of Page

Qualm
Starting Member

7 Posts

Posted - 2004-03-11 : 09:35:43
I'm certain that alias tables are used in the commerical products, and they are definitely the best way to go as far as searching for name variations.

Example: Liz and Beth can't be SOUNDEX'd, DIFFERENCE'd, or pattern matched, yet they are both common diminutives of Elizabeth.

- Qualm
Go to Top of Page

mohdowais
Sheikh of Yak Knowledge

1456 Posts

Posted - 2004-03-12 : 07:36:30
>>That's impressive....how did you learn that?

Books Online is your best friend...

Visit the MS Newsgroups once in a while. The newsgroups are frequented by Microsoft staff and they often let you in on some undocumented stuff. The MSDN chats section is another nice place to get such information. The MS people will tell you all sorts of stuff in the chats, if you push the right buttons

OS
Go to Top of Page

Merkin
Funky Drop Bear Fearing SQL Dude!

4970 Posts

Posted - 2004-03-12 : 11:09:14
If you have a copy of the Guru's Guide to TSQL, Ken takes the built in Soundex function and deconstructs it and improves it. Not that I think this will help the original poster, I think an alias table is going to be needed as well. But it's a nice academic interlude


Damian
Go to Top of Page

get2vil
Starting Member

1 Post

Posted - 2004-04-11 : 06:39:45
Try searching baby names alternatives websites. An example is below (for letter A)

http://baby-names.adoption.com/browse/a.html
Go to Top of Page
   

- Advertisement -