SQLTeam.com Logo

Return to Keyword search using a tally or sequence table

Keyword search using a tally or sequence table

Written by Damian Maclennen on 08 October 2001

Ryan writes "I am looking for a way of creating a stored procedure that will carry out a keyword search without building up the SQL dynamically in the stored procedure. What I'm after is a method of doing this that doesn't require me to build up a huge string."

"When I have to match a list of IDs I would typically parse the comma delimted list of IDs and insert them in to a temporary table and then use an IN clause as follows:

SELECT @strEmployeeID = @strEmployeeID + ','

CREATE TABLE #employees (EmployeeID int not null)

IF CHARINDEX(',', @strEmployeeID) > 0 BEGIN
     WHILE CHARINDEX(',',@strEmployeeID) > 0 BEGIN
	INSERT #employees VALUES(SUBSTRING(@strEmployeeID, 1, CHARINDEX(',', @strEmployeeID) - 1))
	SELECT @strEmployeeID = STUFF(@strEmployeeID, 1, CHARINDEX(',', @strEmployeeID),'')
	END
END

SELECT * FROM Job WHERE EmployeeID IN (SELECT EmployeeID FROM #employees)

Can anyone offer me a similar solution where I parse the keywords in to a temporary table of some sort or is my only option to build a monster string ?"

Sure Ryan. There are a few ways to do this. One way is used on SQL Team, Graz wrote a great article about it here. A few months ago, I was building a similar site to this one and had an idea of a different way to go about it.

First some background. Rob Volk wrote an article at the beginning of this year on Parsing CSV Values Into Multiple Rows using a "tally" or "sequence" table method. If you are not familiar with this article and/or technique I strongly suggest you go read that now. If it is new to you, read it a few times, it may take a bit to get your head around it. Essentially, it splits out a comma delimited list and returns a set.

The technique I am going to demonstrate takes this approach.

Firstly, we need our tally table. I am going to call my table "Sequence".


Create Table Sequence(
       Seq int
)

--Generate some data for it
SET NOCOUNT ON
Declare @i int
Set @i = 1
WHILE @i <= 8000
     BEGIN
         Insert Into Sequence Values (@i)
         Set @i = @i + 1
     END
SET NOCOUNT OFF   

Now take our list of keywords (in this case, they are space delimited) and convert them into a Set. Paste this code into Query Analyzer and have a look.


Declare @Keywords varchar(2000)
Select @Keywords = 'my space delimited string'

	Select
		Substring(' ' + @keywords + ' ',seq,
		CharIndex(' ' , ' ' + @keywords + ' ' , seq) - seq)
	FROM SEQUENCE
	
	WHERE 
	 seq <= len(' ' + @keywords + ' ') and
	 Substring(' ' + @keywords + ' ', seq - 1, 1) = ' ' and
	 CharIndex(' ' , ' ' + @keywords + ' ' , seq) - seq > 0 

Pretty cool huh ?

Next up, we JOIN the article table to this set based on the body text. Where a string in the article body matches one of the words in this set, the join condition will match and the table will join. Where the word appears more than once, the table will join multiple times. If you group these and and count the number of joins ... voila! ... instant ranked search results.

Here is some code...

--Assuming we have an Articles table
Create Table Articles(
	ArticleID int NOT NULL,
	ArticleTitle VarChar(200) NOT NULL,
	ArticleDescription VarChar(500) NOT NULL,
	ArticleBody Text
)

Declare @Keywords varchar(2000)
Select @Keywords = 'my space delimited string'


	Select ArticleID, ArticleTitle, ArticleDescription, count(AA_ID) hits
	FROM SEQUENCE
	INNER JOIN Articles ON Articles.ArticleBody like '%' +  
		Substring(' ' + @keywords + ' ',seq,
		CharIndex(' ' , ' ' + @keywords + ' ' , seq) - seq)
		+ '%'
	
	WHERE 
	 seq <= len(' ' + @keywords + ' ') and
	 Substring(' ' + @keywords + ' ', seq - 1, 1) = ' ' and
	 CharIndex(' ' , ' ' + @keywords + ' ' , seq) - seq > 0 
	
	Group by ArticleID, ArticleTitle, ArticleDescription
	
	ORDER BY Hits DESC	

So there you have it, a simple way to use the "tally" table to produce ranked keyword searches.

The "tally" table technique (say that 6 times fast) is a very powerful tool and a good one to have in your arsenal whenever you need to split delimited strings of words or numbers.

Good luck...and happy coding.