SQL Server Forums
Profile | Register | Active Topics | Members | Search | Forum FAQ
 
Register Now and get your question answered!
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 Old Forums
 CLOSED - General SQL Server
 BINARY_CHECKSUM problems
 Forum Locked
 Printer Friendly
Next Page
Author Previous Topic Topic Next Topic
Page: of 2

timmy
Flowing Fount of Yak Knowledge

Australia
1242 Posts

Posted - 08/11/2006 :  00:17:53  Show Profile  Visit timmy's Homepage
Hi all,

I'm creating a script that uses BINARY_CHECKSUM on all columns in a table's primary key, but I'm getting duplicates:

SELECT key1, key2, key3, key4, key5, 
       BINARY_CHECKSUM(key1, key2, key3, key4, key5) as CheckSumVal
FROM   myTable
ORDER BY BINARY_CHECKSUM(key1, key2, key3, key4, key5) ASC

The table in question has 7 columns in it's primary key and is a mixture of Varchars and Smallints. The table has 2530 records, but when I do a unique query on the results of the above, I only get 2518 unique 'CheckSumVal' values.
I'm assuming that BINARY_CHECKSUM is not reliable enough, but thought I might ask....

Cheers,

Tim

Edited by - timmy on 08/11/2006 00:18:15

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

USA
7020 Posts

Posted - 08/11/2006 :  00:24:13  Show Profile
You said the table has 7 columns in the primary key, but you only included 5 of them in the BINARY_CHECKSUM. That could lead to duplicate checksums.

There is also a small, but real chance that 2 different keys can have the same checksum (about 1 chance in 4 billion). That is just the way checksum works.



CODO ERGO SUM
Go to Top of Page

timmy
Flowing Fount of Yak Knowledge

Australia
1242 Posts

Posted - 08/11/2006 :  01:00:30  Show Profile  Visit timmy's Homepage
Michael,
The code was just to illustrate the point.

I'm creating triggers that use the BINARY_CHECKSUM value as a means of iterating through the records in the deleted virtual table. But the following experiment is worrying:
SELECT COUNT(*) FROM myTable  -- returns 2494 records
SELECT DISTINCT BINARY_CHECKSUM(field1, field2, field3,......) FROM myTable  -- returns 2493 records

and if I do the same thing with just the columns in the primary key:
SELECT field1, field2, field3, field4, field5, field6, field7 FROM myTable  -- returns 2494 records
SELECT DISTINCT BINARY_CHECKSUM(field1, field2, field3, field4, field5, field6, field7) FROM myTable  -- returns 2482 records

Go to Top of Page

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

USA
7020 Posts

Posted - 08/11/2006 :  07:01:42  Show Profile
Can you post an example of several of the primary keys that are returning the same BINARY_CHECKSUM?

CODO ERGO SUM
Go to Top of Page

Arnold Fribble
Yak-finder General

United Kingdom
1961 Posts

Posted - 08/11/2006 :  08:58:47  Show Profile

SELECT a, BINARY_CHECKSUM(*)
FROM (
    SELECT CAST(1 AS bigint) AS a
    UNION ALL SELECT CAST(4294967296 AS bigint)
  ) AS A

Go to Top of Page

robvolk
Most Valuable Yak

USA
15658 Posts

Posted - 08/11/2006 :  09:29:43  Show Profile  Visit robvolk's Homepage
quote:
I'm creating triggers that use the BINARY_CHECKSUM value as a means of iterating through the records in the deleted virtual table
What's the rationale for doing that? Can you post the code of your trigger and explain what you're trying to do?

BINARY_CHECKSUM is not intended to uniquely identify a combination of values, it's meant to be a fast way to check whether a value or values have changed.
Go to Top of Page

timmy
Flowing Fount of Yak Knowledge

Australia
1242 Posts

Posted - 08/13/2006 :  18:03:32  Show Profile  Visit timmy's Homepage
Thanks all for your input. I'll try and explain the situation a bit better.

I'm building a series of triggers. They will be automatically generated by a script during db upgrade, but that's not the issue. Inside the trigger I have a WHILE loop that iterates through all values in the 'inserted' and 'deleted' virtual tables. The WHILE loop is there instead of using cursors/temp tables/table variables. The basic code goes something like this:
	WHILE EXISTS(SELECT NULL FROM inserted WHERE BINARY_CHECKSUM(keyfields....) > @keys) BEGIN
		SELECT 	TOP 1 @keys = BINARY_CHECKSUM(keyfields....) 
 		FROM 	inserted 
		WHERE 	BINARY_CHECKSUM(keyfields....) > @keys
		ORDER BY BINARY_CHECKSUM(keyfields....) ASC

		PRINT 'Adding records for keys = ' + CONVERT(VARCHAR, @keys)

		-- Execute store proc to insert audit parent record

		-- INSERT statements to insert primary key values
		
		-- INSERT statements to insert values

	END


So I'm basically using BINARY_CHECKSUM to iterate through all records in my virtual tables. Ideally I'd use a single key for this, but some of my tables are using primary keys of up to 8 columns. I suppose I could just concatenate all the values into a delimited string and use that for comparison but it would get nasty with 8 key columns of varying data types.
Or I could just go back to using table variables for this, but I'm going for efficiency over anything else at the moment. The fact that these triggers are auto-generated alleviates some of the pressure to make nicely maintainable code (i.e. it doesn't matter how nasty the code looks, it won't get that much attention anyway).

Any suggestions, as always, would be appreciated.

Cheers,

Tim
Go to Top of Page

eyechart
Flowing Fount of Yak Knowledge

USA
3575 Posts

Posted - 08/13/2006 :  18:10:15  Show Profile
why not just throw in an identity column instead? keep the PK on your 7 columns, put a UK on the identity and you are done.



-ec
Go to Top of Page

nr
SQLTeam MVY

United Kingdom
12543 Posts

Posted - 08/13/2006 :  18:10:20  Show Profile  Visit nr's Homepage
Insert the PK columns into a temp table with an identity and use the identity for the loop.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

eyechart
Flowing Fount of Yak Knowledge

USA
3575 Posts

Posted - 08/13/2006 :  18:11:16  Show Profile
quote:
Originally posted by nr

Insert the PK columns into a temp table with an identity and use the identity for the loop.





even better idea.



-ec
Go to Top of Page

timmy
Flowing Fount of Yak Knowledge

Australia
1242 Posts

Posted - 08/13/2006 :  19:37:44  Show Profile  Visit timmy's Homepage
I wanted to keep away from temp tables and the like if possible (mainly from a performance perspective).

Or wouldn't it make much difference? The triggers will mainly be fired when one row is updated/inserted/deleted because that's the way the app is designed. I don't really want to get into temp tables etc for the odd data change that is multi-row.
The existing triggers on these tables used temp tables, system tables (sysobjects etc), and dynamic sql. Part of the reason for the re-design is to get rid of these sorts of dependencies. But if temp tables don't represent a performance hit I might consider them.
Go to Top of Page

nr
SQLTeam MVY

United Kingdom
12543 Posts

Posted - 08/13/2006 :  19:44:36  Show Profile  Visit nr's Homepage
Do you need to process it row by row?


==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

timmy
Flowing Fount of Yak Knowledge

Australia
1242 Posts

Posted - 08/13/2006 :  19:55:51  Show Profile  Visit timmy's Homepage
Unfortunately yes. We have one audit table (or 3 to be precise...), so we basically need to write a row for each column that's changed.
Go to Top of Page

nr
SQLTeam MVY

United Kingdom
12543 Posts

Posted - 08/13/2006 :  21:20:06  Show Profile  Visit nr's Homepage
Have you seen
http://www.nigelrivett.net/SQLTriggers/AuditTrailTrigger.html

It might do what you want.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

timmy
Flowing Fount of Yak Knowledge

Australia
1242 Posts

Posted - 08/13/2006 :  21:33:30  Show Profile  Visit timmy's Homepage
I did consider this option, but I didn't really like the concatenated key bit. And the I/S view references and dynamic sql would be a bit of a performance hit that I really don't need.
Go to Top of Page

nr
SQLTeam MVY

United Kingdom
12543 Posts

Posted - 08/13/2006 :  22:07:29  Show Profile  Visit nr's Homepage
If you have generic audit trail tables then you have to concatenate the PK columns unless you have columns available to the max number in your database.
The dynamic sql is only because this is a single trigger for all tables. If you are generating the triggers for each table then it doesn't need dynamic sql.

This generates a trigger from the table definition - you would want a combination of both probably.
http://www.nigelrivett.net/SQLTriggers/GenerateTriggerForAudit.html

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

Edited by - nr on 08/13/2006 22:10:30
Go to Top of Page

blindman
Flowing Fount of Yak Knowledge

USA
2365 Posts

Posted - 08/14/2006 :  00:19:47  Show Profile
quote:
Originally posted by Michael Valentine Jones

There is also a small, but real chance that 2 different keys can have the same checksum (about 1 chance in 4 billion).

1 in 4 billion? That depends completely upon the nature of the data. Transpositions will lead to identical checksum values, and transpositions would seem to be a real possibility where a composite key is being used.
Go to Top of Page

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

USA
7020 Posts

Posted - 08/14/2006 :  07:48:07  Show Profile
quote:
Originally posted by blindman

quote:
Originally posted by Michael Valentine Jones

There is also a small, but real chance that 2 different keys can have the same checksum (about 1 chance in 4 billion).

1 in 4 billion? That depends completely upon the nature of the data. Transpositions will lead to identical checksum values, and transpositions would seem to be a real possibility where a composite key is being used.


Do you have any example of that?

I tried some simple examples ans didn't see it.

select a=binary_checksum( 'a','b')
select b=binary_checksum( 'b','a')
select a1=binary_checksum( 1,2)
select a2=binary_checksum( 2,1)

Results:

a           
----------- 
1650

(1 row(s) affected)

b           
----------- 
1601

(1 row(s) affected)

a1          
----------- 
18

(1 row(s) affected)

a2          
----------- 
33

(1 row(s) affected)


CODO ERGO SUM
Go to Top of Page

blindman
Flowing Fount of Yak Knowledge

USA
2365 Posts

Posted - 08/14/2006 :  10:54:49  Show Profile
I stand corrected.
I have run into some cases where binary_checksum has returned duplicate values, though. When I get back to my office I'll see if I can dig them up.
Go to Top of Page

blindman
Flowing Fount of Yak Knowledge

USA
2365 Posts

Posted - 08/14/2006 :  11:01:42  Show Profile
Here's an example of two simple strings returning the same binary_checksum value:
select binary_checksum('bQ')
select binary_checksum('aa')

I just think 1 in 4 billion is too general a statement to make.
Go to Top of Page

eyechart
Flowing Fount of Yak Knowledge

USA
3575 Posts

Posted - 08/14/2006 :  11:03:07  Show Profile
quote:
Originally posted by blindman

I stand corrected.
I have run into some cases where binary_checksum has returned duplicate values, though. When I get back to my office I'll see if I can dig them up.



This is the example that I have seen showing binary_checksum returning dupe values.

SELECT BINARY_CHECKSUM('ABA'), BINARY_CHECKSUM('ACQ') 

                        
----------- ----------- 
17761       17761

(1 row(s) affected)



-ec
Go to Top of Page
Page: of 2 Previous Topic Topic Next Topic  
Next Page
 Forum Locked
 Printer Friendly
Jump To:
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC Go To Top Of Page
This page was generated in 0.14 seconds. Powered By: Snitz Forums 2000