Author | Topic
jfish
Starting Member
5 Posts |
Posted - 2014-05-22 : 09:24:12
The server is SQL 2000. I have a table with 10 rows and a varbinary column. I wish to concatenate all the binary values into a single binary value and then write that to another table within the database. The application splits a binary file (a Word or PDF document) into multiple segments (Column2 below is the segment number). Example as follows:

TableA

Column1  Column2  Column3
aaa      001      <some binary value>
aaa      002      <some binary value>
aaa      003      <some binary value>
aaa      004      <some binary value>
aaa      005      <some binary value>

Desired results in TableB

Column1  Column2
aaa      <concatenated value of the above binary columns>

What's the best approach for this?
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2014-05-22 : 17:18:51
We don't have any SQL 2000 servers around anymore, but the good thing is it looks like simple concatenation works. For example, if you "+" two binary(5) values you get the two values together as binary(10):

select 0x0011223344 + 0x5555778899

Returns this:

0x00112233445555778899

How big will the final value be for all 10 rows? If I recall, with 2000 you'd have to go to IMAGE if the binary value was greater than 5K bytes.

EDIT: sorry, 8K bytes was the limit for binary and varbinary.

Be One with the Optimizer
TG
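(A minimal sketch of that size limit, runnable on SQL 2005 or later since varbinary(max) does not exist in SQL 2000; the variable names are illustrative, not from the thread. The point is that "+" between two plain varbinary values is capped at 8000 bytes, while accumulating into a varbinary(max) variable is not.)

    -- two 8000-byte test values; REPLICATE accepts binary input
    declare @a varbinary(8000), @b varbinary(8000), @bin varbinary(max)
    set @a = cast(replicate(0x11, 8000) as varbinary(8000))
    set @b = cast(replicate(0x22, 8000) as varbinary(8000))

    -- accumulate into a varbinary(max) variable so the result is not capped
    set @bin = 0x
    set @bin = @bin + @a + @b

    select datalength(@a + @b) as plain_concat_bytes,  -- 8000: truncated at the varbinary limit
           datalength(@bin)    as max_concat_bytes     -- 16000: full length preserved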
 |
jfish
Starting Member
5 Posts |
Posted - 2014-05-23 : 06:41:38
The binary column length in TableA is 7983. As for the length of the concatenated columns, it would vary, as the contents of the binary column are PDF and Word docs. Would it be more efficient to migrate the database to, say, SQL 2005 and use the varbinary(max) data type?
 |
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2014-05-23 : 08:42:30
It looks like the multi-row design with varbinary(7983) was there to get around the 8000-byte limit.

I would suggest that you consider something totally different: storing the .pdf and .doc content as native files on the file system. Use your SQL table to maintain facts about the files but not store the content. Facts would include at least the folder path and filename.

It sounds like this has been around for a while - why do you want to change to a single row per document now? 2005 is now 2-4 versions old (depending on how you count). If you're going to upgrade, why choose 2005? And your question about efficiency completely depends on what you're doing with the data. Reconstituting a document from one row versus from 10 rows won't make that much difference.

Even with 2000 you could change to the IMAGE data type if your objective is all in one row. That is a binary type that, like TEXT, has no size limit other than the size of your physical storage. It is less flexible to manipulate, but I can't imagine that you're doing any manipulation on portions of .pdf or .doc file content.

Be One with the Optimizer
TG
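(A rough sketch of the "facts only" table described above; the table and column names are illustrative, not from the original application.)

    create table DocumentFiles
    (
        doc_id       varchar(50)  not null primary key,  -- identifier the application already uses
        folder_path  varchar(260) not null,              -- UNC path to the folder holding the file
        file_name    varchar(260) not null,              -- the .pdf / .doc file name
        uploaded_on  datetime     not null default getdate()
    )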
 |
jfish
Starting Member
5 Posts |
Posted - 2014-05-23 : 11:31:28
This is a legacy application with a back-end SQL 2000 database - when a document was uploaded via the application it got written as a binary value within the database. A shame; as you say, it would have been far better to only store the location of the document in terms of UNC path and filename. I have to somehow extract all the documents out to another storage device, therefore a redesign of the table is not possible.

I have managed to make some progress. I have written a SQL script to concatenate the binary columns and write the result to another database table (housed on SQL 2005), and a VBScript to write out the binary file. This works, but the file is corrupted, so I suspect my concatenation isn't correct.

This is the script I have written. I have taken a copy of the database and migrated it over to SQL 2005.

I copied the rows to a tmp table using this command:

select * INTO #tmp2 from doc_attach where doc_id = 'reference number of document'

This script is to concatenate the columns:

DECLARE
    @RowCount INT,
    @bin VARBINARY(MAX) = 0x;

SET @RowCount = (SELECT COUNT(doc_id) FROM #tmp2)

DECLARE @I INT
SET @I = 1

WHILE (@I <= @RowCount)
BEGIN
    SELECT @bin += data FROM #tmp2;
    SET @I = @I + 1
END

INSERT INTO Attachments (xBinary) VALUES (@bin)
 |
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2014-05-23 : 13:18:33
>> this works but the file is corrupted

I don't see anything in your code to enforce the sequence order of the data rows. And actually, within your WHILE loop you aren't limiting the data selection to row @I, so every iteration is getting all the rows.

You need to confirm that DATALENGTH(@bin) ends up equal to:

select sum(datalength(data)) from #tmp2

and you need to confirm that the individual data values get concatenated in the correct sequence.

If you need help, you need to tell us how the sequence of rows for a given doc_id is determined.

Be One with the Optimizer
TG
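(A minimal sketch of that length check, assuming it is run in the same batch as the concatenation script so that @bin and #tmp2 are still in scope; the column name data is taken from the script above.)

    declare @expected int, @actual int

    select @expected = sum(datalength(data)) from #tmp2   -- total bytes across all segments
    set    @actual   = datalength(@bin)                   -- bytes in the concatenated value

    select @expected as expected_bytes,
           @actual   as actual_bytes,
           case when @expected = @actual then 'lengths match'
                else 'lengths differ - segments lost or repeated' end as length_check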
 |
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2014-05-23 : 13:32:31
Here is a slightly simpler method (no looping) and no temp table:

create table #doc_attach (doc_id int, seq int, content varbinary(7000))

insert #doc_attach
select 1, 1, 0x0001020304 union all
select 1, 2, 0x05060708 union all
select 1, 3, 0x091011121314 union all
select 2, 1, 0x0001020304

declare @bin varbinary(max)
set @bin = 0x

select @bin += content from #doc_attach where doc_id = 1 order by seq

select  datalength(@bin) [@binLength]
       ,sum(datalength(content)) [sumContentLength]
       ,@bin [fullContent]
from #doc_attach
where doc_id = 1

@binLength           sumContentLength fullContent
-------------------- ---------------- --------------------------------
15                   15               0x000102030405060708091011121314

Be One with the Optimizer
TG
 |
jfish
Starting Member
5 Posts |
Posted - 2014-05-23 : 15:37:45
quote:
Originally posted by TG

>> this works but the file is corrupted

I don't see anything in your code to enforce the sequence order of the data rows. And actually, within your WHILE loop you aren't limiting the data selection to row @I, so every iteration is getting all the rows.

You need to confirm that DATALENGTH(@bin) ends up equal to:

select sum(datalength(data)) from #tmp2

and you need to confirm that the individual data values get concatenated in the correct sequence.

If you need help, you need to tell us how the sequence of rows for a given doc_id is determined.

Be One with the Optimizer
TG
There is a column in the original table with a sequence number, so I assumed the script would go through the sequence in order - unless that is not the case. The table has a unique identifier in the doc_id column, as follows:

Doc_id  SequenceNumber  BinaryData
--------------------------------------
aaa     001             binary value
aaa     002             binary value
aaa     003             binary value
etc.

I then use the select statement with the doc_id to copy all the segments of the row from the source table to the tmp table, then concatenate that table.

I also took the output PDF file, opened it in Notepad++ (to see the raw data) and compared it to another PDF file, and the contents of the header of the file seem to be correct.

I will take your suggestion and look into it.
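(Putting the ORDER BY point together with the sequence column described above, a sketch of a single-statement version against the real table might look like the following. It assumes the source table is doc_attach with columns doc_id, SequenceNumber and data, and the target is Attachments (xBinary), as in the earlier script; adjust the names if they differ.)

    declare @bin varbinary(max)
    set @bin = 0x

    -- the ORDER BY on the sequence column is what the WHILE-loop version was missing;
    -- this ordered-assignment pattern is the same one used in the sample script above,
    -- so verify the result with the DATALENGTH check afterwards
    select @bin = @bin + data
    from doc_attach
    where doc_id = 'reference number of document'
    order by SequenceNumber

    insert into Attachments (xBinary) values (@bin)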
 |
jfish
Starting Member
5 Posts |
Posted - 2014-06-02 : 11:43:11
I have got further with this. I have concatenated the fields and used SUM(DATALENGTH()) to confirm that the column sizes add up to the size of the final column that holds the concatenated binary value - which they do.

The problem is the file is still corrupt. For example, the PDF file size is normally 334K but the output PDF file is 335K - so an extra 1K is written. So I am not sure whether the issue is with the concatenation or with the WSH VBScript that outputs the file.
 |
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2014-06-02 : 12:12:25
quote:
There is a column in the original table with a sequence number, so I assumed the script would go through the sequence in order
Not guaranteed unless you explicitly ORDER BY the sequence number, as I did in the sample script I posted above (the ORDER BY seq line).

I'm not sure about the extra 1K. Do you have an original, uncorrupted version of a .pdf that is also stored in the database? If so, I would do a binary compare of the original against the concatenated output.

Be One with the Optimizer
TG
 |