Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 Old Forums
 CLOSED - General SQL Server
 BINARY_CHECKSUM problems

Author  Topic 

timmy
Master Smack Fu Yak Hacker

1242 Posts

Posted - 2006-08-11 : 00:17:53
Hi all,

I'm creating a script that uses BINARY_CHECKSUM on all columns in a table's primary key, but I'm getting duplicates:

SELECT key1, key2, key3, key4, key5,
BINARY_CHECKSUM(key1, key2, key3, key4, key5) as CheckSumVal
FROM myTable
ORDER BY BINARY_CHECKSUM(key1, key2, key3, key4, key5) ASC

The table in question has 7 columns in it's primary key and is a mixture of Varchars and Smallints. The table has 2530 records, but when I do a unique query on the results of the above, I only get 2518 unique 'CheckSumVal' values.
I'm assuming that BINARY_CHECKSUM is not reliable enough, but thought I might ask....

Cheers,

Tim

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-08-11 : 00:24:13
You said the table has 7 columns in the primary key, but you only included 5 of them in the BINARY_CHECKSUM. That could lead to duplicate checksums.

There is also a small, but real chance that 2 different keys can have the same checksum (about 1 chance in 4 billion). That is just the way checksum works.



CODO ERGO SUM
Go to Top of Page

timmy
Master Smack Fu Yak Hacker

1242 Posts

Posted - 2006-08-11 : 01:00:30
Michael,
The code was just to illustrate the point.

I'm creating triggers that use the BINARY_CHECKSUM value as a means of iterating through the records in the deleted virtual table. But the following experiment is worrying:
SELECT COUNT(*) FROM myTable  -- returns 2494 records
SELECT DISTINCT BINARY_CHECKSUM(field1, field2, field3,......) FROM myTable -- returns 2493 records

and if I do the same thing with just the columns in the primary key:
SELECT field1, field2, field3, field4, field5, field6, field7 FROM myTable  -- returns 2494 records
SELECT DISTINCT BINARY_CHECKSUM(field1, field2, field3, field4, field5, field6, field7) FROM myTable -- returns 2482 records

Go to Top of Page

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-08-11 : 07:01:42
Can you post an example of several of the primary keys that are returning the same BINARY_CHECKSUM?

CODO ERGO SUM
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2006-08-11 : 08:58:47
[code]
SELECT a, BINARY_CHECKSUM(*)
FROM (
SELECT CAST(1 AS bigint) AS a
UNION ALL SELECT CAST(4294967296 AS bigint)
) AS A
[/code]
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2006-08-11 : 09:29:43
quote:
I'm creating triggers that use the BINARY_CHECKSUM value as a means of iterating through the records in the deleted virtual table
What's the rationale for doing that? Can you post the code of your trigger and explain what you're trying to do?

BINARY_CHECKSUM is not intended to uniquely identify a combination of values, it's meant to be a fast way to check whether a value or values have changed.
Go to Top of Page

timmy
Master Smack Fu Yak Hacker

1242 Posts

Posted - 2006-08-13 : 18:03:32
Thanks all for your input. I'll try and explain the situation a bit better.

I'm building a series of triggers. They will be automatically generated by a script during db upgrade, but that's not the issue. Inside the trigger I have a WHILE loop that iterates through all values in the 'inserted' and 'deleted' virtual tables. The WHILE loop is there instead of using cursors/temp tables/table variables. The basic code goes something like this:
	WHILE EXISTS(SELECT NULL FROM inserted WHERE BINARY_CHECKSUM(keyfields....) > @keys) BEGIN
SELECT TOP 1 @keys = BINARY_CHECKSUM(keyfields....)
FROM inserted
WHERE BINARY_CHECKSUM(keyfields....) > @keys
ORDER BY BINARY_CHECKSUM(keyfields....) ASC

PRINT 'Adding records for keys = ' + CONVERT(VARCHAR, @keys)

-- Execute store proc to insert audit parent record

-- INSERT statements to insert primary key values

-- INSERT statements to insert values

END


So I'm basically using BINARY_CHECKSUM to iterate through all records in my virtual tables. Ideally I'd use a single key for this, but some of my tables are using primary keys of up to 8 columns. I suppose I could just concatenate all the values into a delimited string and use that for comparison but it would get nasty with 8 key columns of varying data types.
Or I could just go back to using table variables for this, but I'm going for efficiency over anything else at the moment. The fact that these triggers are auto-generated alleviates some of the pressure to make nicely maintainable code (i.e. it doesn't matter how nasty the code looks, it won't get that much attention anyway).

Any suggestions, as always, would be appreciated.

Cheers,

Tim
Go to Top of Page

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2006-08-13 : 18:10:15
why not just throw in an identity column instead? keep the PK on your 7 columns, put a UK on the identity and you are done.



-ec
Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2006-08-13 : 18:10:20
Insert the PK columns into a temp table with an identity and use the identity for the loop.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2006-08-13 : 18:11:16
quote:
Originally posted by nr

Insert the PK columns into a temp table with an identity and use the identity for the loop.





even better idea.



-ec
Go to Top of Page

timmy
Master Smack Fu Yak Hacker

1242 Posts

Posted - 2006-08-13 : 19:37:44
I wanted to keep away from temp tables and the like if possible (mainly from a performance perspective).

Or wouldn't it make much difference? The triggers will mainly be fired when one row is updated/inserted/deleted because that's the way the app is designed. I don't really want to get into temp tables etc for the odd data change that is multi-row.
The existing triggers on these tables used temp tables, system tables (sysobjects etc), and dynamic sql. Part of the reason for the re-design is to get rid of these sorts of dependencies. But if temp tables don't represent a performance hit I might consider them.
Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2006-08-13 : 19:44:36
Do you need to process it row by row?


==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

timmy
Master Smack Fu Yak Hacker

1242 Posts

Posted - 2006-08-13 : 19:55:51
Unfortunately yes. We have one audit table (or 3 to be precise...), so we basically need to write a row for each column that's changed.
Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2006-08-13 : 21:20:06
Have you seen
http://www.nigelrivett.net/SQLTriggers/AuditTrailTrigger.html

It might do what you want.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

timmy
Master Smack Fu Yak Hacker

1242 Posts

Posted - 2006-08-13 : 21:33:30
I did consider this option, but I didn't really like the concatenated key bit. And the I/S view references and dynamic sql would be a bit of a performance hit that I really don't need.
Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2006-08-13 : 22:07:29
If you have generic audit trail tables then you have to concatenate the PK columns unless you have columns available to the max number in your database.
The dynamic sql is only because this is a single trigger for all tables. If you are generating the triggers for each table then it doesn't need dynamic sql.

This generates a trigger from the table definition - you would want a combination of both probably.
http://www.nigelrivett.net/SQLTriggers/GenerateTriggerForAudit.html

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2006-08-14 : 00:19:47
quote:
Originally posted by Michael Valentine Jones

There is also a small, but real chance that 2 different keys can have the same checksum (about 1 chance in 4 billion).

1 in 4 billion? That depends completely upon the nature of the data. Transpositions will lead to identical checksum values, and transpositions would seem to be a real possibility where a composite key is being used.
Go to Top of Page

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-08-14 : 07:48:07
quote:
Originally posted by blindman

quote:
Originally posted by Michael Valentine Jones

There is also a small, but real chance that 2 different keys can have the same checksum (about 1 chance in 4 billion).

1 in 4 billion? That depends completely upon the nature of the data. Transpositions will lead to identical checksum values, and transpositions would seem to be a real possibility where a composite key is being used.


Do you have any example of that?

I tried some simple examples ans didn't see it.

select a=binary_checksum( 'a','b')
select b=binary_checksum( 'b','a')
select a1=binary_checksum( 1,2)
select a2=binary_checksum( 2,1)

Results:

a
-----------
1650

(1 row(s) affected)

b
-----------
1601

(1 row(s) affected)

a1
-----------
18

(1 row(s) affected)

a2
-----------
33

(1 row(s) affected)


CODO ERGO SUM
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2006-08-14 : 10:54:49
I stand corrected.
I have run into some cases where binary_checksum has returned duplicate values, though. When I get back to my office I'll see if I can dig them up.
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2006-08-14 : 11:01:42
Here's an example of two simple strings returning the same binary_checksum value:
select binary_checksum('bQ')
select binary_checksum('aa')

I just think 1 in 4 billion is too general a statement to make.
Go to Top of Page

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2006-08-14 : 11:03:07
quote:
Originally posted by blindman

I stand corrected.
I have run into some cases where binary_checksum has returned duplicate values, though. When I get back to my office I'll see if I can dig them up.



This is the example that I have seen showing binary_checksum returning dupe values.

SELECT BINARY_CHECKSUM('ABA'), BINARY_CHECKSUM('ACQ') 


----------- -----------
17761 17761

(1 row(s) affected)



-ec
Go to Top of Page
    Next Page

- Advertisement -