Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Fuzzy Match help (Levenshtein?!)

Author  Topic 

peitech
Starting Member

10 Posts

Posted - 2009-03-30 : 08:21:47
I would like to standardise the job titles we have on our database according to a predetermined lookup table.

So, I have created a table (currently just as a test) with about 100 job titles.

I now have a table which lists all the job titles, I want to somehow match these job titles to those in the lookup table, providing me with some kind of result and some level of confidence.

For example...

Lookup Table:

Job Title
* Partner
* General Partner
* CFO

Main Table:
Job Title
* Partner and COO
* Founding Partner
* Operating Partner
* ex-Partner-New York

So, I would want to 'match' up a title such as 'Operating Partner' with the possible choices in the lookup table - give results which gave the possible matches and the 'degree of match'

Perhaps something along the lines of:

Input Title Matched Title Score
Founding Partner Partner 0.8
Founding Partner General Partner 0.6

Sadly, we only have workgroup edition of SQL so cannot use the fuzzy capabilities of SSIS :/

Can anyone advise me?

sakets_2000
Master Smack Fu Yak Hacker

1472 Posts

Posted - 2009-03-30 : 08:32:33
Have you had a look at DIFFERENCE function ?
Go to Top of Page

peitech
Starting Member

10 Posts

Posted - 2009-03-30 : 09:49:45
A little yes, and charindex / patindex etc, however, what I am struggling to get my head around is how one would go through all the records in the lookup table for the one row in the main table?

e.g.
SELECT
c.id,
c.position,
charindex('Partner', c.position, 1) as Match
FROM tblcontacts c

Is fine, and tells me the records with titles like 'Managing Partner, Asia' etc, but I'm not sure how I can replace 'partner' with all the job titles from the lookup table in essence.
Go to Top of Page

peitech
Starting Member

10 Posts

Posted - 2009-03-30 : 10:19:05
Even on an example I'm not sure on that difference function...

select DIFFERENCE('Senior Partner', 'Partner')
select DIFFERENCE('Senior Partner', 'Analyst')

BOTH yield one, despite the fact that one is actually pretty similar and one is totally different!

Totally off the wall but
select DIFFERENCE('Bacon', 'Senior Partner')

Scores 2!!
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2009-03-30 : 10:33:14
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=51540

___________________________________________________________________________
Causing trouble since 1980
Blog: http://weblogs.sqlteam.com/mladenp
Speed up SSMS development: www.ssmstoolspack.com <- version 1.5 out!
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2009-03-30 : 12:17:36
This will work for you:
http://sqlblindman.pastebin.com/f4fc1ccb5

________________________________________________
If it is not practically useful, then it is practically useless.
________________________________________________
Go to Top of Page

peitech
Starting Member

10 Posts

Posted - 2009-03-30 : 12:30:44
Blindman, many thanks, this is progress :) :

select dbo.CompareText('Partner', 'Partner') = 100
select dbo.CompareText('Partner, Asia', 'Partner') = 70
select dbo.CompareText('Bacon', 'Partner') = 0

However, the one thing I still can't get my head around is how to replace partner with something along the lines of SELECT job_title FROM tbljob_titles and to then get rows of hits for the 'Partner, Asia' record with varying hit rates from 100 down to 0, and I can then specify that the threshold has to be 70 or something?

Do you see what I'm trying to achive?
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-03-30 : 14:04:05
You can use the new CROSS APPLY to invoke the CompareText function.



E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2009-03-31 : 10:58:53
[code]SELECT job_title,
dbo.CompareText(job_title, 'Partner, Asia') as MatchValue
FROM tbljob_titles
where dbo.CompareText(job_title, 'Partner, Asia') >=70
order by dbo.CompareText(job_title, 'Partner, Asia') desc[/code]
You could also throw a TOP in there to limit the results to the best matches.
Note that this function is a true fuzzy-search, and may be overkill if all you need is the LIKE function, which would seem to be sufficient for the example searches you gave against the string 'Partner'.
________________________________________________
If it is not practically useful, then it is practically useless.
________________________________________________
Go to Top of Page

peitech
Starting Member

10 Posts

Posted - 2009-03-31 : 11:26:43
Not sure why some sites specify that you have to have developer or enterprise version of SQL 2005 to use Fuzzy Lookup in SSIS - I read elsewhere this was not the case so I tried it on a really simple example and it seemed to work well.

Still a total newbie to SSIS, so it's quite different to bog standard TSQL!

The above will come in useful for something else I have in mind though :D - so thanks for that blindman!
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2009-03-31 : 12:10:07
I would be very hesitant to use the fuzzy lookup logic in SSIS, or use it for anything other that transferring data for that matter.
There is a tendency to put way too much business logic into ETL packages, and that leads to administrative and debugging nightmares. Witness all the shops that have yet to upgrade from 2000 to 2005 because they have so much code invested in their DTS packages and can't migrate them.
Use SSIS for moving records from data sources into staging tables, and then use sprocs to process the data in the staging tables. You'll end up with a much more robust, much easier to maintain system.

________________________________________________
If it is not practically useful, then it is practically useless.
________________________________________________
Go to Top of Page

peitech
Starting Member

10 Posts

Posted - 2009-03-31 : 12:21:04
Tbh, I am pretty unimpressed with SSIS at present - it's crashed on me quite a few times - and googling the crappy errors it gives tells me they are problems others are having too. Give me straight text typing any day lol.

The results of the fuzzy match were useful in some cases, and in some just terrible - but I guess that's what the confidence thing is all about.

This is isn't going to be used going forward, or integrated into our current system - it's being used just as a clean up assistant before we formalise the way job titles are dealt with. Basically we want to standardise the job titles as best as possible so they look half decent! - There's sure gonna be a lot of manual work involved :/
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2009-03-31 : 14:34:06
Fuzzy logic will reduce the manual effort involved, but not eliminate it.
In my experience with that function, anything value below 80 is questionable and needs to be reviewed.

________________________________________________
If it is not practically useful, then it is practically useless.
________________________________________________
Go to Top of Page
   

- Advertisement -