Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
flashvenom
Starting Member
1 Post |
Posted - 2010-01-28 : 16:18:25
|
| Hi all, I'm not entirely new to sql, but I'm by far knowledgable outside of fundamental select statements. What I'm trying to do is enhance our site's search functionality by adding a relevance column to a search. For instance I search for "Dali" ... stories with "Salvador Dali" as the name need to rank higher than "Nelson Dalibhunga Mandela". One more criteria - "featured" stories with the same name need to rank higher than non-featured. So literal hits that are featured would have a rank of two, literal hits non-featured would be 1, and everything else would be 0. You can probably see my gap in knowledge with my script below. I've written three case statements to check for a literal hit ('% Dali %', '% Dali', and 'Dali %') making sure the name isn't in the middle of a string and then another case to check to see if it's featured. My problem is simple - adding the computed columns together as a new computed column. Leads me to believe I'm taking the wrong approach. I've tried using a local variable, but I can't figure out how to set a variable within a CASE statement...SELECT pageid, personname, featured, literalhit = CASE WHEN personnameLIKE '% ' + @keyword THEN 1 WHEN personname LIKE 'dali %' THEN 1 WHEN personname LIKE '% dali %' THEN 1 ELSE 0 END ,featured = CASE patindex('%_ul%',d.hero) WHEN 0 then 1 ELSE 0 END ,SUM(literalhit+featured) as relevanceFROM mainTbl WHERE (personname like '%' + @keyword + '%')ORDER BY rank, personnameAny input would be greatly appreciated! Thank you! |
|
|
RobertKaucher
Posting Yak Master
169 Posts |
Posted - 2010-01-28 : 21:22:51
|
| If I understand this right the issue is the way SQL Server processes a query. You cannot reference an alias in a select statement.Imagine if you had a table with two INT columns and you selected them using AS INT1 and AS INT2 and then tried to perform a sum of the alias.Like this:--Bad querySELECT Number1 AS INT1, Number2 AS INT2, INT1+INT2 AS TotalFROM Tbl_Ints This won't work because within the SELECT statement SQL Server is looking in the table for the selected columns but it cannot find any columns called INT1 and INT2 to add together. In your query it cannot find any columns called literalhit nor featured from which it can select a value to compute. Also, SUM is being used incorrectly. I would suggest another CASE statement here that combines the criteria from the previous select statements using AND. I'm not sure if there is a better way to do it... |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2010-01-29 : 03:19:33
|
Just add the "personname contains keyword" to the "Is featured?" to get RANK.SELECT pageid, personname, featured,literalhitrelevance = CASEWHEN personname LIKE '% ' + @keyword THEN 1WHEN personname LIKE @keyword + ' %' THEN 1WHEN personname LIKE '% ' + @keyword + ' %' THEN 1ELSE 0END+ CASE patindex('%_ul%',d.hero)WHEN 0 then 1ELSE 0END,featured = CASE patindex('%_ul%',d.hero)WHEN 0 then 1ELSE 0END,SUM(literalhit+featured) as relevanceFROM mainTblWHERE (personname like '%' + @keyword + '%')ORDER BY rank relevance DESC, personname We do similar, but provide higher ranking for the work surrounded by spaces. Also need to consider if there is an exact match[code]SELECT pageid, personname, featured,relevance = CASEWHEN ' ' + personname + ' ' LIKE '% ' + @keyworddali + ' %' THEN 10 -- Whole word, or exact matchWHEN ' ' + personname LIKE '% ' + @keyword + '%' THEN 5 -- Word (or field) Starts-WithWHEN personname + ' ' LIKE '%' + @keyword + ' %' THEN 1 -- Word (or fields) Ends-WithELSE 0END* CASE patindex('%_ul%',d.hero)WHEN 0 then 2ELSE 1ENDFROM mainTblWHERE (personname like '%' + @keyword + '%')ORDER BY relevance DESC, personnamethis provides high weighting for whole word, medium for starts-with and low for ends-with. That weighting is, say, doubled if the person is featured.What if the person is searching for multiple words? We split out "foo bar" and, in effect, SUM the results for both words. So something containing both Foo and Bar will score higher than a record with only one of them, but rows with on Foo OR Bar will still be included, just lower down the list.We also check for pseudo-plurals - So a search phrase of "Foo Bar" would also include "Foos" and "Bars", and a search for "Foos Bars" would also include "Foo" and "Bar" (Ditto for "Y" and "IES" etc.). Bit crude, but a few extra hits, with low weightings, seemed better than possible missing something altogether!If "Featured" is commonly searched it should be in a separate field (e.g. a BIT column with 1 or 0), not needing a search for "_ul" within the fieldNote that (I think) your search will include Featured who do NOT match the keyword - so you will always get Featured people (that may be your intention).Multiplying the Relevance by Featured gives more weight to featured people but ONLY if they also match the Keyword |
 |
|
|
RobertKaucher
Posting Yak Master
169 Posts |
Posted - 2010-01-29 : 13:54:23
|
| Very nice! |
 |
|
|
|
|
|
|
|