Author |
Topic |
perels
Starting Member
23 Posts |
Posted - 2007-01-13 : 19:25:29
|
Hi My problem is that I have a list of 5000+ rows in a database, and there are rows that are almost duplicate values.The layout is ProductID and ProductName. In the ProductName it could be something likeFruity LoopsFruiti LoopsFruity Loops.Fruity loopfruity loop . I have created something that will show exact duplicates, but this is not what I want:SELECT A.ProductID,A.ProductName FROM Products A JOIN (SELECT ProductID, ProductName FROM Products GROUP BY ProductName,ProductID HAVING COUNT(*) > 1) B on A.ProductName = B.ProductName Order BY A.ProductName I thought of using SOUNDEX, but I failed miserably trying to accomplish this...Hope somebody can/will give it try :-) |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
|
Alan Schofield
Starting Member
23 Posts |
Posted - 2007-01-16 : 11:29:05
|
You can use this simple function I created to do the same thing. It simply strips all the characters out that are non alpha-numeric. You can then use this in a join to test against.=========================================================CREATE FUNCTION fnStripToAlphaNumeric( @inputString nvarchar(4000)) RETURNS varchar(4000)AS BEGIN DECLARE @Counter as int DECLARE @strReturnVal varchar(4000) DECLARE @Len as int DECLARE @ASCII as int SET @Counter=0 SET @Len=LEN(@inputString) SET @strReturnVal = '' WHILE @Counter<=@Len BEGIN SET @Counter = @Counter +1 SET @ascii= ASCII(SUBSTRING(@inputString,@counter,1)) IF(@ascii BETWEEN 65 AND 90) OR (@ascii BETWEEN 97 AND 122) OR (@ascii BETWEEN 48 AND 57) BEGIN SET @strReturnVal = @strReturnVal + (SUBSTRING(@inputString,@counter,1)) END END RETURN @strReturnValEND=========================================================So in practice you can use the followingselect a.* from Products ajoin Products bon dbo.fnStripToAlphaNumeric(a.ProductName) = dbo.fnStripToAlphaNumeric(b.ProductName) where a.ProductID <> b.ProductID |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-01-16 : 11:50:31
|
How would that make "Fruit Loop" and "Fruity Loops" the same?Peter LarssonHelsingborg, Sweden |
 |
|
Alan Schofield
Starting Member
23 Posts |
Posted - 2007-01-16 : 12:00:40
|
quote: Originally posted by Peso How would that make "Fruit Loop" and "Fruity Loops" the same?Peter LarssonHelsingborg, Sweden
Well spotted ! :))I guess I shold have read this more closely... :(BTW: Larsson from Helsingborg ??? You've had all the Man United jokes by now then? ;-) |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-01-16 : 12:12:46
|
Like Larsson was dissed and the United fans screamed "He's too old!"?I went to Old Trafford to see Beckham just before his move to Spain and the english soccer fans are insane!Peter LarssonHelsingborg, Sweden |
 |
|
perels
Starting Member
23 Posts |
Posted - 2007-01-16 : 14:43:16
|
Thanks to you Peso - I haven't heard of Levenshtein before, but I will look into his algorithm :-)Regarding to your message Alan Schofield, thanks for the effort, but I need something that will look on all characters and match them :-) |
 |
|
perels
Starting Member
23 Posts |
Posted - 2007-01-16 : 16:45:58
|
I am kind of stuck - I have grasped the levenshtein algo and has also read about Double Metaphone, Hammer distance and so on... the problem is now to construct the SQL - could you help me with that? |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-01-16 : 17:27:04
|
Start with this. It will show you how alike all the combinations of the data are.--prepare sample datadeclare @t table (productname varchar(100))insert @tselect 'Fruity Loops' union allselect 'Fruiti Loops' union allselect 'Fruity Loops.' union allselect 'Fruity loop' union allselect 'fruity loop'-- show the variancesselect t1.productname, t2.productname, dbo.edit_distance(t1.productname, t2.productname) from @t as t1cross join @t as t2where t1.productname < t2.productnameorder by t1.productname, t2.productname I cannot do more since I have no idea of your business rules how to deal with this information.Peter LarssonHelsingborg, Sweden |
 |
|
|