Posted - 01/30/2013 : 17:09:31
Hi guys,
I have 2 simple tables. Both have one nvarchar(100) column and a PK ID column with an auto-increment seed. What I want to do is return all of the first table, plus rows from the second table only where the value is very similar (i.e. a spelling mistake, or the addition/absence of specific key words). I would also like to be able to use a thesaurus if possible. I've gone at this problem two ways, but am after any advice or input please:
1) SSIS Fuzzy Lookup
It's working quite well, but it's not great for small words, and I can't seem to find a way to use a thesaurus file or to include stop words or noise words.
2) Freetext, Contains, and Formsof
Great functionality, but how would it work for my example above? Is it possible to join both tables together in this way, and only return high-scoring matches from the second (lookup) table?
As an example I might have:
MyCompany Danmark in Table1
MyCompany Denmark in Table2
Danmark and Denmark in my thesaurus file
and therefore an exact match. Also, if LTD is included in either table for that row, it should still return as an exact match because LTD is in some sort of stop/noise list. Any ideas on how to implement something like this? And am I on the right track?
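
To make the behaviour I'm after concrete, here is a rough sketch in Python (not T-SQL or SSIS, just the logic). The thesaurus entry, the stop-word list, the 0.85 threshold and the function names are all made up for this example, and the scoring uses Python's difflib rather than anything Fuzzy Lookup or Full-Text Search does internally:

```python
import re
from difflib import SequenceMatcher

# Hypothetical thesaurus and stop-word list, made up for this example.
THESAURUS = {"danmark": "denmark"}
STOP_WORDS = {"ltd"}

def normalize(name):
    """Lower-case, drop stop words, and map synonyms to one canonical form."""
    tokens = re.findall(r"\w+", name.lower())
    kept = [THESAURUS.get(t, t) for t in tokens if t not in STOP_WORDS]
    return " ".join(kept)

def similarity(a, b):
    """Return a 0..1 similarity score between the two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def left_join_fuzzy(table1, table2, threshold=0.85):
    """Return every row of table1 paired with its best table2 match,
    or None when no candidate scores at or above the threshold."""
    results = []
    for left in table1:
        best, best_score = None, 0.0
        for right in table2:
            score = similarity(left, right)
            if score > best_score:
                best, best_score = right, score
        if best_score < threshold:
            best = None  # keep the table1 row, but with no match
        results.append((left, best))
    return results
```

With this, "MyCompany Danmark" against "MyCompany Denmark LTD" scores 1.0 (both normalize to the same string), while an unrelated name in the first table still comes back, just with no match, like a LEFT JOIN would.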