| Author |
Topic |
|
AskSQLTeam
Ask SQLTeam Question
0 Posts |
Posted - 2004-03-09 : 08:33:20
|
| Brian writes "I need to know if there is anything in SQL Server that would allow me to find all forms of a proper name, say like 'Joseph'. Where it would find 'Joe' and 'Joey'. I have tried using the full text search and although I can use it on normal nouns and it works fine, I can't seem to get it to match on proper nouns (names). Any help would be greatly appreciated." |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2004-03-09 : 11:11:16
|
| You need to create your own fuzzy logic routine...and that will slow things up pretty much..You could also create an alias table that stores both values....Brett8-) |
 |
|
|
ehorn
Master Smack Fu Yak Hacker
1632 Posts |
Posted - 2004-03-09 : 11:18:50
|
| Have a look at "Pattern Matching in Search Conditions" in BOL. Maybe this will help in your routine. |
 |
|
|
raymondpeacock
Constraint Violating Yak Guru
367 Posts |
Posted - 2004-03-09 : 11:27:30
|
| Thats strictly speaking not going to be too much help. Pattern matching won't return 'Joe' from 'Joseph' unless you severely limit it to the first two characters. I'm with Brett in thinking an alias table might be the best solution, but where would you get a sizeable rowset of all names and their possible derivatives as a starting point!Raymond |
 |
|
|
AndrewMurphy
Master Smack Fu Yak Hacker
2916 Posts |
Posted - 2004-03-10 : 08:47:10
|
| "but where would you get a sizeable rowset of all names and their possible derivatives as a starting point!"You build one...slowly...Best done through a QA process...ie build a list of (all/most) possible matches from your input data set...couple of queries should be able to drum up most unique 1st names (in reality there are very few in active use in the English language..<200...problems will come about with non-native(English) names...ie Francois, Bertrand, Nazim, etc...but most of these wouldn't have short-cuts anyway(???)), set the matches to 'pending'....get somebody to review each of the pairs and set pending -> approved....and then over time people can add/remove combinations at will.It looks daunting...but in reality is a bit smaller than at 1st perception. |
 |
|
|
Qualm
Starting Member
7 Posts |
Posted - 2004-03-10 : 16:22:26
|
| This is a common problem in my industry (real estate title research), and there are actually companies which provide middleware to help with this kind of thing. Try http://www.name-searching.com/Working/Default.htm as an example.- Qualm |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2004-03-10 : 16:44:40
|
| Hundred bucks they have an alias table in there somewhere, and some type of mathmatical algorythym that keeps a score...Brett8-) |
 |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2004-03-10 : 17:28:06
|
| SQL server does have an implementation of the SOUNDEX algorithm, which turns words into what they sound like for matching purposes. look up SOUNDEX in books on-line. I used it in college a little but I don't recall how good or useful it is ...- Jeff |
 |
|
|
mohdowais
Sheikh of Yak Knowledge
1456 Posts |
Posted - 2004-03-11 : 00:30:26
|
| The way SOUNDEX works is quite simple, it strips the vowels from the words, removes continguous duplicate characters and then evaluates the characters left over. It can also compare similar sounding characters (such as "j" and "g" or "c" and "s") at the same position in both strings. Another function of interest is DIFFERENCE, which internally uses SOUNDEX to compute how much the two strings resemble each other. DIFFERENCE returns a score between 0 and 4, with 4 being the best match. For near-matches you can use SOUNDEX directly, but for fuzzier searches, DIFFERENCE comes in handy. The results, though, are not very useful and SOUNDEX suffers from a few limitations, the biggest being the inability to process beyond the first non-alphabet character in the string. So if you have a space or a number in your string, only the portion to the left of the character is evaluated.OS |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2004-03-11 : 09:05:52
|
quote: Originally posted by mohdowais The way SOUNDEX works is quite simple, it strips the vowels from the words, removes continguous duplicate characters and then evaluates the characters left over. It can also compare similar sounding characters (such as "j" and "g" or "c" and "s") at the same position in both strings. Another function of interest is DIFFERENCE, which internally uses SOUNDEX to compute how much the two strings resemble each other. DIFFERENCE returns a score between 0 and 4, with 4 being the best match. For near-matches you can use SOUNDEX directly, but for fuzzier searches, DIFFERENCE comes in handy. The results, though, are not very useful and SOUNDEX suffers from a few limitations, the biggest being the inability to process beyond the first non-alphabet character in the string. So if you have a space or a number in your string, only the portion to the left of the character is evaluated.OS
That's impressive....how did you learn that?Still like the alias table thing....Brett8-) |
 |
|
|
Qualm
Starting Member
7 Posts |
Posted - 2004-03-11 : 09:35:43
|
| I'm certain that alias tables are used in the commerical products, and they are definitely the best way to go as far as searching for name variations. Example: Liz and Beth can't be SOUNDEX'd, DIFFERENCE'd, or pattern matched, yet they are both common diminutives of Elizabeth.- Qualm |
 |
|
|
mohdowais
Sheikh of Yak Knowledge
1456 Posts |
Posted - 2004-03-12 : 07:36:30
|
>>That's impressive....how did you learn that?Books Online is your best friend... Visit the MS Newsgroups once in a while. The newsgroups are frequented by Microsoft staff and they often let you in on some undocumented stuff. The MSDN chats section is another nice place to get such information. The MS people will tell you all sorts of stuff in the chats, if you push the right buttons OS |
 |
|
|
Merkin
Funky Drop Bear Fearing SQL Dude!
4970 Posts |
Posted - 2004-03-12 : 11:09:14
|
If you have a copy of the Guru's Guide to TSQL, Ken takes the built in Soundex function and deconstructs it and improves it. Not that I think this will help the original poster, I think an alias table is going to be needed as well. But it's a nice academic interlude Damian |
 |
|
|
get2vil
Starting Member
1 Post |
Posted - 2004-04-11 : 06:39:45
|
| Try searching baby names alternatives websites. An example is below (for letter A)http://baby-names.adoption.com/browse/a.html |
 |
|
|
|