Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
rwolfcastle
Starting Member
8 Posts |
Posted - 2004-12-13 : 01:29:55
|
Hi,I've read with much interest the following post (http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=42667) about searching and replacing text in NTEXT fields and would like to know if there's a way around not being able to use PATINDEX with IMAGE fields.The internet is riddled, it seems, with claims that PATINDEX works with IMAGE fields (such as: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsqlpro03/html/sp03g8.asp) but Query Analyzer just keeps giving me the same, old, error:The PATINDEX function operates on char, nchar, varchar, nvarchar, text, and ntext data types only.Is there an alternative way for searching and replacing textual data stored in an IMAGE field? My database basically stores documents of any type, PNG, EXE, DOC, TXT, HTML, etc, with a corresponding content-type ID and I'm querying for documents of type text/html only, so the data is actually text-based, except that it's stored in an IMAGE field for what I previously thought were reasons of simplicity. Now I'm beginning to regret normalising my database to this extent  |
|
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2004-12-13 : 04:22:53
|
| As you have realised the article is wrong.There's not a lot you can do with images except replace the whole image.For your database you should distinguish between text blobs and binary blobs and hold them separately.==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
rwolfcastle
Starting Member
8 Posts |
Posted - 2004-12-13 : 07:58:54
|
| Thanks. The system I'm designing hasn't gone live, yet, so I'll make the necessary changes to the stored procedures. One more question, if I may - what is the best way to deal with an "either/or" situation as far as normalisation goes? Should I create an extra field in my DocumentBlob table (which is in a one-to-one relationship with the master DocumentVersion table), with one field called BinaryData and the other TextData, and allow nulls, or would it be best to have both BinaryData and TextData tables, each as foreign keys for the master DocumentVersion table, and with identical field names to store the file data?I've never been too sure about this (which might explain why I'm currently stuffing everything into one field, in one table) as I have always assumed that having two tables and having to check - in code - for a document type then access the appropriate table through code in a second call is poor design (not that I've ever been properly trained, but it does seem very naughty, or at the very least, inelegant).Thanks heaps. |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2004-12-13 : 08:10:30
|
| I have a question for you: why not consider storing the files as FILES, and NOT in the database? You avoid all of the problems with document types and the limitations of PATINDEX, and potential security/virus issues by posting EXE and other files in a database. You can use Indexing Service to full-text index the documents. This index can queried using linked servers, or through the application directly:http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/ixqlang_92xx.asp |
 |
|
|
rwolfcastle
Starting Member
8 Posts |
Posted - 2004-12-13 : 17:11:33
|
| Because the system I'm designing, for all intents and purposes, is a document management system and I need to maintain control over the versions of each file. I preferred the idea of controlling access to them via stored procedures than stuffing potentially thousands and thousands of files in a folder somewhere, where I would not be able to limit access to only the document creator and approver, for example. Having them in a database means that I can enforce all manner of intricate access-control-mechanisms without resorting to manipulating NTFS ACLs (which is an absolute nightmare).Is there a way, perhaps, to get the contents of an IMAGE field and put it into a TEXT variable in T-SQL? This might make things a little easier (for the time being), but my attempts at using CAST and CONVERT to convert from IMAGE to TEXT have failed. The only reason why I need to use PATINDEX is to search and replace for values in our procedures (which are HTML files) as a once-off during our reorganisation. For example, I need to replace the word "Division" with "Department" |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2004-12-13 : 17:40:55
|
| Because the image column is just a binary stream you'd have no way of knowing if text is ASCII or Unicode, or some other type of encoding. Knowing the document type won't necessarily help here. You can hope that it converts to text properly, but any embedded Unicode won't come over. Converting to ntext would fix that but naturally won't convert ASCII correctly. And some formats (PDF in particular) could encode the data in a compressed format that can't be converted at all.If you really intend to have a document management system set up this way they you will HAVE TO have full-text indexing. PatIndex will not come close to supporting what you'll need. You can use specific document filters or write your own:http://msdn.microsoft.com/library/default.asp?url=/library/en-us/createdb/cm_fullad_55mb.asphttp://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/ixufilt_912d.aspThis lets you put the document in an image column and allow the full-text service to index it correctly. Otherwise it's just a binary hash that won't do you much good.As far as security goes, you can easily put the files into a folder and block direct access to all users except for your application. That way you do not have to manipulate any ACLs except for the most basic things. Your application would then be the only thing that could read or write the file, using the database to drive the permissions.I'll concede that if you were to use full-text indexing, then it would be easier to keep the documents in the database. But without that, I think you'll end up fighting with SQL Server to try and do something it was not intended to do (that's more or less the reason this is causing you so much hassle) Even then, I'd suggest you research full-text indexing in Books Online as much as you can, you'll discover that there is a lot you can do with it via Indexing Service and files. |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2004-12-13 : 20:44:40
|
| We use a Document management system called InVu which stores files, securely, on a file server, with the path/filename recorded in a SQL database.Not sure how they do it, but might be worth a look.Kristen |
 |
|
|
rwolfcastle
Starting Member
8 Posts |
Posted - 2004-12-13 : 22:19:30
|
| I intend to explore the full-text indexing eventually, but right now, I'm told they don't want that and would prefer to use the meta data search fields I've created (but I can see the day when this will be insufficient, which also one of the reasons why I'm stuffing everything into a database, not just to avoid ACLs or having to read files on the file system in the database and double-handle it, so to speak). I'm now thinking that the ASP page would access each text-based file in the IMAGE field, apply the changes in VBScript, then submit the new file stream to the existing save_file stored procedure (which dumps it back into the IMAGE field as a new record with a new version number). This isn't the most efficient way of handling it but since I can't seem to get IMAGE data into a TEXT variable in T-SQL, it'll do until I have more time.I noticed in those links posted above that the file extensions for full-text indexing should be stored in a field in the same table, but I've actually got a separate table called ContentType with the MIME type and file extension associated with that record. I hope the indexing can handle JOINs to get the file type.Thanks for the help, guys-------------------------My eyes... the goggles do nothing! |
 |
|
|
|
|
|
|
|