Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 non-simple search using like

Author  Topic 

flashvenom
Starting Member

1 Post

Posted - 2010-01-28 : 16:18:25
Hi all, I'm not entirely new to sql, but I'm by far knowledgable outside of fundamental select statements. What I'm trying to do is enhance our site's search functionality by adding a relevance column to a search. For instance I search for "Dali" ... stories with "Salvador Dali" as the name need to rank higher than "Nelson Dalibhunga Mandela". One more criteria - "featured" stories with the same name need to rank higher than non-featured. So literal hits that are featured would have a rank of two, literal hits non-featured would be 1, and everything else would be 0. You can probably see my gap in knowledge with my script below. I've written three case statements to check for a literal hit ('% Dali %', '% Dali', and 'Dali %') making sure the name isn't in the middle of a string and then another case to check to see if it's featured. My problem is simple - adding the computed columns together as a new computed column. Leads me to believe I'm taking the wrong approach. I've tried using a local variable, but I can't figure out how to set a variable within a CASE statement...

SELECT pageid, personname, featured,
literalhit = CASE
WHEN personnameLIKE '% ' + @keyword THEN 1
WHEN personname LIKE 'dali %' THEN 1
WHEN personname LIKE '% dali %' THEN 1
ELSE 0
END
,featured = CASE patindex('%_ul%',d.hero)
WHEN 0 then 1
ELSE 0
END
,SUM(literalhit+featured) as relevance
FROM mainTbl
WHERE (personname like '%' + @keyword + '%')
ORDER BY rank, personname


Any input would be greatly appreciated! Thank you!

RobertKaucher
Posting Yak Master

169 Posts

Posted - 2010-01-28 : 21:22:51
If I understand this right the issue is the way SQL Server processes a query. You cannot reference an alias in a select statement.

Imagine if you had a table with two INT columns and you selected them using AS INT1 and AS INT2 and then tried to perform a sum of the alias.

Like this:

--Bad query
SELECT Number1 AS INT1, Number2 AS INT2, INT1+INT2 AS Total
FROM Tbl_Ints

This won't work because within the SELECT statement SQL Server is looking in the table for the selected columns but it cannot find any columns called INT1 and INT2 to add together. In your query it cannot find any columns called literalhit nor featured from which it can select a value to compute. Also, SUM is being used incorrectly. I would suggest another CASE statement here that combines the criteria from the previous select statements using AND. I'm not sure if there is a better way to do it...


Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-01-29 : 03:19:33
Just add the "personname contains keyword" to the "Is featured?" to get RANK.

SELECT pageid, personname, featured,
literalhit
relevance = CASE
WHEN personname LIKE '% ' + @keyword THEN 1
WHEN personname LIKE @keyword + ' %' THEN 1
WHEN personname LIKE '% ' + @keyword + ' %' THEN 1
ELSE 0
END
+ CASE patindex('%_ul%',d.hero)
WHEN 0 then 1
ELSE 0
END

,featured = CASE patindex('%_ul%',d.hero)
WHEN 0 then 1
ELSE 0
END

,SUM(literalhit+featured) as relevance
FROM mainTbl
WHERE (personname like '%' + @keyword + '%')
ORDER BY rank relevance DESC, personname

We do similar, but provide higher ranking for the work surrounded by spaces. Also need to consider if there is an exact match

[code]
SELECT pageid, personname, featured,
relevance = CASE
WHEN ' ' + personname + ' ' LIKE '% ' + @keyworddali + ' %' THEN 10 -- Whole word, or exact match
WHEN ' ' + personname LIKE '% ' + @keyword + '%' THEN 5 -- Word (or field) Starts-With
WHEN personname + ' ' LIKE '%' + @keyword + ' %' THEN 1 -- Word (or fields) Ends-With
ELSE 0
END
* CASE patindex('%_ul%',d.hero)
WHEN 0 then 2
ELSE 1
END
FROM mainTbl
WHERE (personname like '%' + @keyword + '%')
ORDER BY relevance DESC, personname

this provides high weighting for whole word, medium for starts-with and low for ends-with. That weighting is, say, doubled if the person is featured.

What if the person is searching for multiple words? We split out "foo bar" and, in effect, SUM the results for both words. So something containing both Foo and Bar will score higher than a record with only one of them, but rows with on Foo OR Bar will still be included, just lower down the list.

We also check for pseudo-plurals - So a search phrase of "Foo Bar" would also include "Foos" and "Bars", and a search for "Foos Bars" would also include "Foo" and "Bar" (Ditto for "Y" and "IES" etc.). Bit crude, but a few extra hits, with low weightings, seemed better than possible missing something altogether!

If "Featured" is commonly searched it should be in a separate field (e.g. a BIT column with 1 or 0), not needing a search for "_ul" within the field

Note that (I think) your search will include Featured who do NOT match the keyword - so you will always get Featured people (that may be your intention).

Multiplying the Relevance by Featured gives more weight to featured people but ONLY if they also match the Keyword
Go to Top of Page

RobertKaucher
Posting Yak Master

169 Posts

Posted - 2010-01-29 : 13:54:23
Very nice!
Go to Top of Page
   

- Advertisement -