Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Count Multiple Occurrences of a String

Author  Topic 

heystu
Starting Member

3 Posts

Posted - 2007-06-28 : 10:21:28
I am using SQL 2005 and have a table with a varbinary column where documents of various types (.doc .pdf .htm) are stored. I need to search through all documents to find matches to a text string. The challenge is that I need to capture the number of occurrences of the string in each document. For example, I need to know which documents contain the string "Apple Sauce" and the number of occurrences of that string within each of the matching documents.

Is there an existing function that will do the trick? Do I have to use Full Text Search (FTS)? Does FTS have the ability to return a 'hit count' for each document?
Thanks,
-Stuart

rudesyle
Posting Yak Master

110 Posts

Posted - 2007-06-28 : 10:28:46
Sure sounds like you're destined for a full-text index. In regards to a hit count, not sure. You can now utilize the CONTAINS function though, which should give you what you want.
Go to Top of Page

heystu
Starting Member

3 Posts

Posted - 2007-06-28 : 10:48:03
I thought of using CONTAINS, but it only seems to search for the first occurrence of the search term. I'm thinking about creating a cursor that would combine CONTAINS and SUBSTRING to search for the term. Each time the cursor finds a the matching string, the SUBSTRING starting point would update and the search would repeat from the point where the occurrence of the term ended.

That would be complicated, especially since it would need to be a nested cursor. The interor cursor would do the search, the outer cursor would iterate through all the records in the table. It's do-able,I'm just wondering whether there is a better way.
-Stuart
Go to Top of Page

KenW
Constraint Violating Yak Guru

391 Posts

Posted - 2007-06-28 : 13:08:07
How do you plan on being able to access the contents of a binary document (e.g., Word .doc or Adobe .pdf) using SQL at all?

Opening either of those file formats using Notepad or a hex editor should make it pretty clear that SQL isn't going to work...
Go to Top of Page

heystu
Starting Member

3 Posts

Posted - 2007-06-28 : 13:51:48
Ken makes a good point. I got ahead of myself, if we use varbinary to store the documents, FTS appears to be the only option (without going outside SQL Server). I have not used FTS, but BOL says it can be used to index varbinary data.

-Stuart
Go to Top of Page
   

- Advertisement -