Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
Jaeden
Starting Member
2 Posts |
Posted - 2008-01-17 : 19:52:39
|
| Greetings, SQL gurus.I have a table (creatively named Testing at the moment) which I am querying via PHP. The table contains several columns, but there is one more important than the rest, an nvarchar description ("description"). I'm loading rows up into a Javascript text fader to show random samples of the rows. However, many of the "description" values are VERY close together - a few characters difference over 300 or 400 characters. It looks silly to fade one out and replace it with another, almost-the-same description, because unless the page visitor is looking very carefully, he will likely think it just showed the same description twice.I'm trying to write a query which filters out the "similar" ones. Let me post a simplified example:ID Desc---------------------------1 This is a neat widget!2 This thingamabob is great!3 I love the sexy doodad!4 This is a cool widget!If I were to run the desired query and give it, say, a rowcount of 2, I would want it to return rows 2 and 3, since 1 and 4 have other, more similar rows present. I've done about 2 hours of google fu, and determined that the DIFFERENCE() and/or SOUNDEX() functions in MSSQL are my salvation. However, that seems only to compare two strings I provide. I'd thought to use the Group By or Order By (in conjunction with a rowcount) to select only the X number of "most unique" strings (say, the ones having a minimum DIFFERENCE() of 3 from ALL other rows in the column). I've been beating my head against a blank query window for hours, and every join, group, order, case I write gets me no closer - I'm hesitant to post any queries because none of them were close enough to go, "Hey, I'm onto something here!". Because it's a web-based app, I'd like to avoid creating a temp table if I can (for speed) but will do it if it's the only way.Please advise, and thank you in advance for your assistance. |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2008-01-18 : 02:09:26
|
Have you tried to use the CROSS APPLY together with a split function? E 12°55'05.25"N 56°04'39.16" |
 |
|
|
Jaeden
Starting Member
2 Posts |
Posted - 2008-01-18 : 09:27:01
|
quote: Originally posted by Peso Have you tried to use the CROSS APPLY together with a split function?
Actually, I've never heard of such a thing. Could you give me an example of how I might use it? |
 |
|
|
harsh_athalye
Master Smack Fu Yak Hacker
5581 Posts |
|
|
blindman
Master Smack Fu Yak Hacker
2365 Posts |
Posted - 2008-01-18 : 10:44:19
|
Jaeden, this fuzzy-search algorithm measures the similarity between different strings:http://sqlblindman.googlepages.com/fuzzysearchalgorithmExample:select *into #Samplesfrom (select 1 as ID, 'This is a neat widget!' as SampleStringunion select 2, 'This thingamabob is great!'union select 3, 'I love the sexy doodad!'union select 4, 'This is a cool widget!') Samplesselect A.SampleString, B.SampleString, dbo.CompareText(A.SampleString, B.SampleString) as Similarityfrom #Samples A, #Samples Bwhere A.ID <> B.IDorder by dbo.CompareText(A.SampleString, B.SampleString) ascdrop table #Samples--Output:SampleString SampleString Similarity-------------------------- -------------------------- -----------I love the sexy doodad! This is a neat widget! 4This is a neat widget! I love the sexy doodad! 4This is a cool widget! I love the sexy doodad! 9I love the sexy doodad! This is a cool widget! 9This thingamabob is great! I love the sexy doodad! 10I love the sexy doodad! This thingamabob is great! 10This is a cool widget! This thingamabob is great! 39This thingamabob is great! This is a cool widget! 39This thingamabob is great! This is a neat widget! 47This is a neat widget! This thingamabob is great! 47This is a cool widget! This is a neat widget! 76This is a neat widget! This is a cool widget! 76 e4 d5 xd5 Nf6 |
 |
|
|
|
|
|
|
|