Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
heystu
Starting Member
3 Posts |
Posted - 2007-06-28 : 10:21:28
|
| I am using SQL 2005 and have a table with a varbinary column where documents of various types (.doc .pdf .htm) are stored. I need to search through all documents to find matches to a text string. The challenge is that I need to capture the number of occurrences of the string in each document. For example, I need to know which documents contain the string "Apple Sauce" and the number of occurrences of that string within each of the matching documents.Is there an existing function that will do the trick? Do I have to use Full Text Search (FTS)? Does FTS have the ability to return a 'hit count' for each document?Thanks,-Stuart |
|
|
rudesyle
Posting Yak Master
110 Posts |
Posted - 2007-06-28 : 10:28:46
|
| Sure sounds like you're destined for a full-text index. In regards to a hit count, not sure. You can now utilize the CONTAINS function though, which should give you what you want. |
 |
|
|
heystu
Starting Member
3 Posts |
Posted - 2007-06-28 : 10:48:03
|
| I thought of using CONTAINS, but it only seems to search for the first occurrence of the search term. I'm thinking about creating a cursor that would combine CONTAINS and SUBSTRING to search for the term. Each time the cursor finds a the matching string, the SUBSTRING starting point would update and the search would repeat from the point where the occurrence of the term ended.That would be complicated, especially since it would need to be a nested cursor. The interor cursor would do the search, the outer cursor would iterate through all the records in the table. It's do-able,I'm just wondering whether there is a better way.-Stuart |
 |
|
|
KenW
Constraint Violating Yak Guru
391 Posts |
Posted - 2007-06-28 : 13:08:07
|
| How do you plan on being able to access the contents of a binary document (e.g., Word .doc or Adobe .pdf) using SQL at all?Opening either of those file formats using Notepad or a hex editor should make it pretty clear that SQL isn't going to work... |
 |
|
|
heystu
Starting Member
3 Posts |
Posted - 2007-06-28 : 13:51:48
|
| Ken makes a good point. I got ahead of myself, if we use varbinary to store the documents, FTS appears to be the only option (without going outside SQL Server). I have not used FTS, but BOL says it can be used to index varbinary data. -Stuart |
 |
|
|
|
|
|
|
|