Author | Topic
jfish
Starting Member
5 Posts |
Posted - 2014-05-22 : 09:24:12
The server is SQL 2000. I have a table with 10 rows and a varbinary column. I wish to concatenate all the binary values into a single binary value and then write that to another table within the database. The application splits a binary file (a Word or PDF document) into multiple segments (Column2 below is the segment number). Example as follows:

TableA

Column1  Column2  Column3
aaa      001      <some binary value>
aaa      002      <some binary value>
aaa      003      <some binary value>
aaa      004      <some binary value>
aaa      005      <some binary value>

Desired results in TableB

Column1  Column2
aaa      <concatenated value of the above binary columns>

What's the best approach for this?
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2014-05-22 : 17:18:51
We don't have any SQL 2000 servers around anymore, but the good thing is it looks like simple concatenation works. For example, if you "+" two binary(5) values you get the two values together as binary(10):

select 0x0011223344 + 0x5555778899

Returns this:

0x00112233445555778899

How big will the final value be for all 10 rows? If I recall, with 2000 you'd have to go to IMAGE if the binary value was greater than 5K bytes.

EDIT: sorry, 8K bytes was the limit for binary and varbinary.

Be One with the Optimizer
TG
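(A minimal sketch of that size limit, runnable on SQL 2005 or later since varbinary(max) does not exist in SQL 2000; the variable names are illustrative, not from the thread. The point is that "+" between two plain varbinary values is capped at 8000 bytes, while accumulating into a varbinary(max) variable is not.)

    -- two 8000-byte test values; REPLICATE accepts binary input
    declare @a varbinary(8000), @b varbinary(8000), @bin varbinary(max)
    set @a = cast(replicate(0x11, 8000) as varbinary(8000))
    set @b = cast(replicate(0x22, 8000) as varbinary(8000))

    -- accumulate into a varbinary(max) variable so the result is not capped
    set @bin = 0x
    set @bin = @bin + @a + @b

    select datalength(@a + @b) as plain_concat_bytes,  -- 8000: truncated at the varbinary limit
           datalength(@bin)    as max_concat_bytes     -- 16000: full length preserved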
 |
jfish
Starting Member
5 Posts |
Posted - 2014-05-23 : 06:41:38
The binary column length in TableA is 7983. As for the length of the concatenated columns, it would vary, as the contents of the binary column are PDF and Word docs. Would it be more efficient to migrate the database to, say, SQL 2005 and use the varbinary(max) data type?
 |
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2014-05-23 : 08:42:30
It looks like the multi-row design with varbinary(7983) was there to get around the 8000-byte limit.

I would suggest that you consider something totally different: storing the .pdf and .doc content as native files on the file system. Use your SQL table to maintain facts about the files but not store the content. Facts would include at least the folder path and filename.

It sounds like this has been around for a while - why do you want to change to a single row per document now? 2005 is now 2-4 versions old (depending on how you count). If you're going to upgrade, why choose 2005? And your question about efficiency completely depends on what you're doing with the data. Reconstituting a document from one row versus from 10 rows won't make that much difference.

Even with 2000 you could change to the IMAGE data type if your objective is all in one row. That is a binary type that, like TEXT, has no size limit other than the size of your physical storage. It is less flexible to manipulate, but I can't imagine that you're doing any manipulation on portions of .pdf or .doc file content.

Be One with the Optimizer
TG
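(A rough sketch of the "facts only" table described above; the table and column names are illustrative, not from the original application.)

    create table DocumentFiles
    (
        doc_id       varchar(50)  not null primary key,  -- identifier the application already uses
        folder_path  varchar(260) not null,              -- UNC path to the folder holding the file
        file_name    varchar(260) not null,              -- the .pdf / .doc file name
        uploaded_on  datetime     not null default getdate()
    )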
 |
jfish
Starting Member
5 Posts |
Posted - 2014-05-23 : 11:31:28
This is a legacy application with a back-end SQL 2000 database - when a document was uploaded via the application it got written as a binary value within the database. A shame; as you say, it would have been far better to only store the location of the document in terms of UNC path and filename. I have to somehow extract all the documents out to another storage device, therefore a redesign of the table is not possible.

I have managed to make some progress. I have written a SQL script to concatenate the binary columns and write the result to another database table (housed on SQL 2005), and a VBScript to write out the binary file. This works, but the file is corrupted, so I suspect my concatenation isn't correct.

This is the script I have written. I have taken a copy of the database and migrated it over to SQL 2005.

I copied the rows to a tmp table using this command:

select * INTO #tmp2 from doc_attach where doc_id = 'reference number of document'

This script is to concatenate the columns:

DECLARE
    @RowCount INT,
    @bin VARBINARY(MAX) = 0x;

SET @RowCount = (SELECT COUNT(doc_id) FROM #tmp2)

DECLARE @I INT
SET @I = 1

WHILE (@I <= @RowCount)
BEGIN
    SELECT @bin += data FROM #tmp2;
    SET @I = @I + 1
END

INSERT INTO Attachments (xBinary) VALUES (@bin)
 |
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2014-05-23 : 13:18:33
>> this works but the file is corrupted

I don't see anything in your code to enforce the sequence order of the data rows. And actually, within your WHILE loop you aren't limiting the data selection to row @I, so every iteration is getting all the rows.

You need to confirm that DATALENGTH(@bin) ends up equal to:

select sum(datalength(data)) from #tmp2

and you need to confirm that the individual data values get concatenated in the correct sequence.

If you need help, you need to tell us how the sequence of rows for a given doc_id is determined.

Be One with the Optimizer
TG
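(A minimal sketch of that length check, assuming it is run in the same batch as the concatenation script so that @bin and #tmp2 are still in scope; the column name data is taken from the script above.)

    declare @expected int, @actual int

    select @expected = sum(datalength(data)) from #tmp2   -- total bytes across all segments
    set    @actual   = datalength(@bin)                   -- bytes in the concatenated value

    select @expected as expected_bytes,
           @actual   as actual_bytes,
           case when @expected = @actual then 'lengths match'
                else 'lengths differ - segments lost or repeated' end as length_check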
 |
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2014-05-23 : 13:32:31
Here is a slightly simpler method (no looping) and no temp table:

create table #doc_attach (doc_id int, seq int, content varbinary(7000))

insert #doc_attach
select 1, 1, 0x0001020304 union all
select 1, 2, 0x05060708 union all
select 1, 3, 0x091011121314 union all
select 2, 1, 0x0001020304

declare @bin varbinary(max)
set @bin = 0x

select @bin += content from #doc_attach where doc_id = 1 order by seq

select  datalength(@bin) [@binLength]
       ,sum(datalength(content)) [sumContentLength]
       ,@bin [fullContent]
from #doc_attach
where doc_id = 1

@binLength           sumContentLength fullContent
-------------------- ---------------- --------------------------------
15                   15               0x000102030405060708091011121314

Be One with the Optimizer
TG
 |
jfish
Starting Member
5 Posts |
Posted - 2014-05-23 : 15:37:45
quote:
Originally posted by TG

>> this works but the file is corrupted

I don't see anything in your code to enforce the sequence order of the data rows. And actually, within your WHILE loop you aren't limiting the data selection to row @I, so every iteration is getting all the rows.

You need to confirm that DATALENGTH(@bin) ends up equal to:

select sum(datalength(data)) from #tmp2

and you need to confirm that the individual data values get concatenated in the correct sequence.

If you need help, you need to tell us how the sequence of rows for a given doc_id is determined.

Be One with the Optimizer
TG
There is a column in the original table with a sequence number, so I assumed the script would go through the sequence in order - unless that is not the case. The table has a unique identifier in the doc_id column, as follows:

Doc_id  SequenceNumber  BinaryData
--------------------------------------
aaa     001             binary value
aaa     002             binary value
aaa     003             binary value
etc.

I then use the select statement with the doc_id to copy all the segments of the row from the source table to the tmp table, then concatenate that table.

I also took the output PDF file, opened it in Notepad++ (to see the raw data) and compared it to another PDF file, and the contents of the header of the file seem to be correct.

I will take your suggestion and look into it.
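(Putting the ORDER BY point together with the sequence column described above, a sketch of a single-statement version against the real table might look like the following. It assumes the source table is doc_attach with columns doc_id, SequenceNumber and data, and the target is Attachments (xBinary), as in the earlier script; adjust the names if they differ.)

    declare @bin varbinary(max)
    set @bin = 0x

    -- the ORDER BY on the sequence column is what the WHILE-loop version was missing;
    -- this ordered-assignment pattern is the same one used in the sample script above,
    -- so verify the result with the DATALENGTH check afterwards
    select @bin = @bin + data
    from doc_attach
    where doc_id = 'reference number of document'
    order by SequenceNumber

    insert into Attachments (xBinary) values (@bin)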
 |
jfish
Starting Member
5 Posts |
Posted - 2014-06-02 : 11:43:11
I have got further with this. I have concatenated the fields and used SUM(DATALENGTH()) to confirm that the column sizes add up to the size of the final column that holds the concatenated binary value - which they do.

The problem is the file is still corrupt. For example, the PDF file size is normally 334K but the output PDF file is 335K - so an extra 1K is written. So I am not sure whether the issue is with the concatenation or with the WSH VBScript that outputs the file.
 |
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2014-06-02 : 12:12:25
quote:
There is a column in the original table with a sequence number, so I assumed the script would go through the sequence in order
Not guaranteed unless you explicitly ORDER BY the sequence number, as I did in the sample script I posted above (the ORDER BY seq line).

I'm not sure about the extra 1K. Do you have an original, uncorrupted version of a .pdf that is also stored in the database? If so, I would do a binary compare of the original against the concatenated output.

Be One with the Optimizer
TG
 |