| Author |
Topic |
|
peitech
Starting Member
10 Posts |
Posted - 2009-03-30 : 08:21:47
|
| I would like to standardise the job titles we have on our database according to a predetermined lookup table. So, I have created a table (currently just as a test) with about 100 job titles. I now have a table which lists all the job titles, I want to somehow match these job titles to those in the lookup table, providing me with some kind of result and some level of confidence. For example... Lookup Table:Job Title* Partner* General Partner* CFOMain Table:Job Title* Partner and COO* Founding Partner * Operating Partner * ex-Partner-New YorkSo, I would want to 'match' up a title such as 'Operating Partner' with the possible choices in the lookup table - give results which gave the possible matches and the 'degree of match'Perhaps something along the lines of:Input Title Matched Title ScoreFounding Partner Partner 0.8Founding Partner General Partner 0.6Sadly, we only have workgroup edition of SQL so cannot use the fuzzy capabilities of SSIS :/ Can anyone advise me? |
|
|
sakets_2000
Master Smack Fu Yak Hacker
1472 Posts |
Posted - 2009-03-30 : 08:32:33
|
| Have you had a look at DIFFERENCE function ? |
 |
|
|
peitech
Starting Member
10 Posts |
Posted - 2009-03-30 : 09:49:45
|
| A little yes, and charindex / patindex etc, however, what I am struggling to get my head around is how one would go through all the records in the lookup table for the one row in the main table?e.g. SELECT c.id,c.position, charindex('Partner', c.position, 1) as Match FROM tblcontacts cIs fine, and tells me the records with titles like 'Managing Partner, Asia' etc, but I'm not sure how I can replace 'partner' with all the job titles from the lookup table in essence. |
 |
|
|
peitech
Starting Member
10 Posts |
Posted - 2009-03-30 : 10:19:05
|
| Even on an example I'm not sure on that difference function... select DIFFERENCE('Senior Partner', 'Partner')select DIFFERENCE('Senior Partner', 'Analyst')BOTH yield one, despite the fact that one is actually pretty similar and one is totally different!Totally off the wall but select DIFFERENCE('Bacon', 'Senior Partner')Scores 2!! |
 |
|
|
spirit1
Cybernetic Yak Master
11752 Posts |
Posted - 2009-03-30 : 10:33:14
|
| http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=51540___________________________________________________________________________Causing trouble since 1980Blog: http://weblogs.sqlteam.com/mladenpSpeed up SSMS development: www.ssmstoolspack.com <- version 1.5 out! |
 |
|
|
blindman
Master Smack Fu Yak Hacker
2365 Posts |
Posted - 2009-03-30 : 12:17:36
|
| This will work for you:http://sqlblindman.pastebin.com/f4fc1ccb5________________________________________________If it is not practically useful, then it is practically useless.________________________________________________ |
 |
|
|
peitech
Starting Member
10 Posts |
Posted - 2009-03-30 : 12:30:44
|
| Blindman, many thanks, this is progress :) :select dbo.CompareText('Partner', 'Partner') = 100select dbo.CompareText('Partner, Asia', 'Partner') = 70select dbo.CompareText('Bacon', 'Partner') = 0However, the one thing I still can't get my head around is how to replace partner with something along the lines of SELECT job_title FROM tbljob_titles and to then get rows of hits for the 'Partner, Asia' record with varying hit rates from 100 down to 0, and I can then specify that the threshold has to be 70 or something?Do you see what I'm trying to achive? |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2009-03-30 : 14:04:05
|
You can use the new CROSS APPLY to invoke the CompareText function. E 12°55'05.63"N 56°04'39.26" |
 |
|
|
blindman
Master Smack Fu Yak Hacker
2365 Posts |
Posted - 2009-03-31 : 10:58:53
|
| [code]SELECT job_title, dbo.CompareText(job_title, 'Partner, Asia') as MatchValueFROM tbljob_titleswhere dbo.CompareText(job_title, 'Partner, Asia') >=70order by dbo.CompareText(job_title, 'Partner, Asia') desc[/code]You could also throw a TOP in there to limit the results to the best matches.Note that this function is a true fuzzy-search, and may be overkill if all you need is the LIKE function, which would seem to be sufficient for the example searches you gave against the string 'Partner'.________________________________________________If it is not practically useful, then it is practically useless.________________________________________________ |
 |
|
|
peitech
Starting Member
10 Posts |
Posted - 2009-03-31 : 11:26:43
|
| Not sure why some sites specify that you have to have developer or enterprise version of SQL 2005 to use Fuzzy Lookup in SSIS - I read elsewhere this was not the case so I tried it on a really simple example and it seemed to work well. Still a total newbie to SSIS, so it's quite different to bog standard TSQL! The above will come in useful for something else I have in mind though :D - so thanks for that blindman! |
 |
|
|
blindman
Master Smack Fu Yak Hacker
2365 Posts |
Posted - 2009-03-31 : 12:10:07
|
| I would be very hesitant to use the fuzzy lookup logic in SSIS, or use it for anything other that transferring data for that matter.There is a tendency to put way too much business logic into ETL packages, and that leads to administrative and debugging nightmares. Witness all the shops that have yet to upgrade from 2000 to 2005 because they have so much code invested in their DTS packages and can't migrate them.Use SSIS for moving records from data sources into staging tables, and then use sprocs to process the data in the staging tables. You'll end up with a much more robust, much easier to maintain system.________________________________________________If it is not practically useful, then it is practically useless.________________________________________________ |
 |
|
|
peitech
Starting Member
10 Posts |
Posted - 2009-03-31 : 12:21:04
|
| Tbh, I am pretty unimpressed with SSIS at present - it's crashed on me quite a few times - and googling the crappy errors it gives tells me they are problems others are having too. Give me straight text typing any day lol. The results of the fuzzy match were useful in some cases, and in some just terrible - but I guess that's what the confidence thing is all about. This is isn't going to be used going forward, or integrated into our current system - it's being used just as a clean up assistant before we formalise the way job titles are dealt with. Basically we want to standardise the job titles as best as possible so they look half decent! - There's sure gonna be a lot of manual work involved :/ |
 |
|
|
blindman
Master Smack Fu Yak Hacker
2365 Posts |
Posted - 2009-03-31 : 14:34:06
|
| Fuzzy logic will reduce the manual effort involved, but not eliminate it.In my experience with that function, anything value below 80 is questionable and needs to be reviewed.________________________________________________If it is not practically useful, then it is practically useless.________________________________________________ |
 |
|
|
|