
 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 binary_checksum returning same value

Author  Topic 

zion99
Posting Yak Master

141 Posts

Posted - 2008-10-27 : 12:11:06
Hi,
I have the following structure:

declare @hello table
( empid int
,account varchar(128)
,chksum as binary_checksum(empid, account)
)

This is returning the same checksum value for different rows (14 out of 48,000). I wasn't aware that a checksum could return duplicate values.

Earlier, I had made the checksum the primary key in many tables; now I'm wondering if I should remove that, since it may fail in the production environment.
Is there any way to catch exceptions like a primary key violation in SQL Server, so that only the insert of the duplicate row fails while the rest of the rows are inserted?
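As a quick check, collisions in a computed checksum column like the one above can be counted with a grouping query (the names come from the table variable in this post):

```sql
-- Count checksum values shared by more than one row.
SELECT chksum, COUNT(*) AS rows_sharing_value
FROM @hello
GROUP BY chksum
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC;
```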

dinakar
Master Smack Fu Yak Hacker

2507 Posts

Posted - 2008-10-27 : 12:24:52
quote:
Originally posted by zion99

.... I wasn't aware that checksum too can return duplicate values.
...



It can.

Dinakar Nethi
************************
Life is short. Enjoy it.
************************
http://weblogs.sqlteam.com/dinakar/

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-10-27 : 12:35:43
See
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70832
and
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70328


E 12°55'05.63"
N 56°04'39.26"

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-10-27 : 12:38:06
The BINARY_CHECKSUM value is calculated this way:

Take the first byte of the string. Shift the running value four bits to the left.
Take the second byte of the string. XOR that byte with the shifted value. Shift four bits to the left.
Take the third byte of the string. XOR that byte with the shifted value. Shift four bits to the left.

If the shift spills over 32 bits, the value is truncated and the spilled 4 bits are XORed back into the current value.

Repeat until no characters are left in the string.
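A rough T-SQL sketch of the rotate-and-XOR scheme described above (it follows the description in this post only; it is not guaranteed to reproduce SQL Server's actual BINARY_CHECKSUM output):

```sql
DECLARE @s varchar(128);
DECLARE @sum bigint;
DECLARE @i int;

SET @s = 'abc';
SET @sum = 0;
SET @i = 1;

WHILE @i <= LEN(@s)
BEGIN
    -- Shift left 4 bits, fold the 4 spilled bits back in with XOR,
    -- and keep the result within 32 bits (268435456 = 2^28, 4294967296 = 2^32).
    SET @sum = ((@sum * 16) ^ (@sum / 268435456)) % 4294967296;
    -- XOR in the next character.
    SET @sum = @sum ^ ASCII(SUBSTRING(@s, @i, 1));
    SET @i = @i + 1;
END

SELECT @sum AS sketch_checksum;
```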




Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2008-10-27 : 12:43:04
quote:
Originally posted by zion99

Earlier, i had put checksum as the primary key in many tables, now i m wondering if i should remove that. maybe it may fail in production environment.
was wondering is there any way to catch exceptions like primary key violation in sql server ?
so that insert of the duplicate row alone doesnt happen, while rest of the rows are inserted.

You should not use a checksum as a primary key. What checksums are good for is telling whether a particular row has changed: instead of comparing every column in the row, you compare just the checksum for that particular PK. And assuming you index the PK and checksum columns, that lookup/comparison is pretty fast.
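A minimal sketch of that change-detection pattern; the table and column names here are illustrative, not from the thread:

```sql
CREATE TABLE dbo.Employee
(
    EmployeeID     int         NOT NULL PRIMARY KEY,  -- the real, unique key
    EmployeeStatus varchar(20) NOT NULL,
    -- Computed checksum over the row's contents, used only for change detection.
    RowChecksum    AS BINARY_CHECKSUM(EmployeeID, EmployeeStatus)
);

-- Index the checksum so comparisons against a staging table are cheap.
CREATE INDEX IX_Employee_RowChecksum ON dbo.Employee (RowChecksum);
```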

zion99
Posting Yak Master

141 Posts

Posted - 2008-10-27 : 12:52:45
quote:
Originally posted by Peso

See
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70832
and
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70328





Thanks, Peso. To be honest, I had already read your comments and the function you created before posting; I wanted to know your thoughts, hence the post.

Is it possible to have a try/catch mechanism for INSERT statements?

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-10-27 : 12:55:10
You can recreate the primary key with IGNORE_DUP_KEY set to ON.




zion99
Posting Yak Master

141 Posts

Posted - 2008-10-27 : 13:01:08
Thanks a ton; that will allow me to keep my existing tables with the checksum as the primary key.

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-10-27 : 13:01:48
[code]CREATE TABLE #Temp
(
    i INT,
    CONSTRAINT [PK_Temp]
        PRIMARY KEY CLUSTERED (i)
        WITH (IGNORE_DUP_KEY = ON)
)

INSERT #Temp
SELECT 1 UNION ALL SELECT 2

INSERT #Temp
SELECT 2 UNION ALL SELECT 3

SELECT *
FROM #Temp

DROP TABLE #Temp[/code]



Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2008-10-27 : 13:39:25
quote:
Originally posted by zion99

thanks a ton , that will allow me to keep my existing tables with checksum as the primary key ...

Out of curiosity, why would you do that? That seems like a very, very bad idea to me. Why wouldn't you make a primary key that is actually unique?

cr8nk
Yak Posting Veteran

77 Posts

Posted - 2008-10-27 : 15:43:01
Yes, good question. Not only should primary key values be unique, but you also want them never to change.

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2008-10-27 : 16:19:47
You cannot create a primary key constraint that allows duplicate values.



CODO ERGO SUM

zion99
Posting Yak Master

141 Posts

Posted - 2008-10-29 : 13:28:17
quote:
Originally posted by Lamprey

quote:
Originally posted by zion99

thanks a ton , that will allow me to keep my existing tables with checksum as the primary key ...

Out of curiosity, why would you do that? That seems like a very, very bad idea to me. Why wouldn't you make a primary key that is actually unique?



Please consider this scenario.
The source is SQL 2000.
The table structure is COL1, which is EMPID, and COL2, which is EMPSTATUS.

sample data:
COL1 COL2
1004 PERMANENT
1004 RESIGNED
1005 TEMPORARY
1005 RESIGNED
1006 PERMANENT

As you can see, even if I use DISTINCT, I will still get 5 rows.
What I need is only:
1004 PERMANENT
1005 TEMPORARY
1006 PERMANENT

So after retrieving the data from SQL 2000, I use the ROW_NUMBER function and delete all except the first required row per employee.
Since COL1 contains duplicate values before that deletion, I use a non-clustered index on it.
But by using IGNORE_DUP_KEY = ON, I can create a CLUSTERED index, which should definitely be superior to the non-clustered one.
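The ROW_NUMBER de-duplication step described above might look like this (a sketch; the staging table name and the ORDER BY used to pick the surviving row are assumptions):

```sql
;WITH Ranked AS
(
    SELECT COL1, COL2,
           ROW_NUMBER() OVER (PARTITION BY COL1 ORDER BY COL2) AS RowNumber
    FROM   Staging
)
DELETE FROM Ranked
WHERE  RowNumber <> 1;  -- keep one row per COL1
```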



Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2008-10-29 : 15:03:35
Thanks for getting back on this. Where does the checksum PK come into play in this scenario?

It's probably not worth discussing if you are satisfied with your solution, but I suspect that one of us is missing something about the problem or the solution. I still do not see how a checksum helps anything.


zion99
Posting Yak Master

141 Posts

Posted - 2008-10-30 : 10:39:14
quote:
Originally posted by Lamprey

Thanks for getting back on this. Where does the checksum PK come into play in this scenario?

It's probably not worth discussing if you are satisfied with your solution, but I suspect that one of us is missing something about the problem or the solution. I still do not see how a checksum helps anything.



I forgot to mention that I use a staging table as well. The staging table is truncated every time and refreshed with the source data; then I compare the checksum in the staging table with the one in the main table.

The logic is that if the checksum has a clustered index on it, processing (inserts, updates, and deletes on the main table) will be faster.

That is my reasoning; please correct me if you think it is wrong.

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2008-10-30 : 12:23:26
[code]Source
COL1 COL2
1004 PERMANENT
1004 RESIGNED
1005 TEMPORARY
1005 RESIGNED
1006 PERMANENT

Temp
EmployeeID EmployeeStatus EmpChecksum
1004 PERMANENT -1238952635
1005 TEMPORARY -1287304914
1006 PERMANENT -1238952603

Destination
EmployeeID EmployeeStatus EmpChecksum
1004 RESIGNED -1769345935
1005 TEMPORARY -1287304914[/code]I've listed 3 tables above: Source, Temp and Destination. Hopefully, this is close to your scenario.

Normally, one would populate the Temp table and then INNER JOIN to the Destination table on the primary key, which in this case should be the EmployeeID, then update any rows where the checksum is not equal. After that, one would LEFT OUTER JOIN to the Destination and insert all the new rows. Actually, with only 2 columns, using a checksum is probably not as efficient as just comparing the actual data, but I can see using the checksum as part of a pattern.
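Sketched with the illustrative table names from the sample data above (assuming EmpChecksum is an ordinary column in both tables):

```sql
-- Update rows whose checksum differs, joined on the real key.
UPDATE d
SET    d.EmployeeStatus = t.EmployeeStatus,
       d.EmpChecksum    = t.EmpChecksum
FROM   Destination d
INNER JOIN Temp t
    ON t.EmployeeID = d.EmployeeID
WHERE  d.EmpChecksum <> t.EmpChecksum;

-- Insert rows that are new to the destination.
INSERT Destination (EmployeeID, EmployeeStatus, EmpChecksum)
SELECT t.EmployeeID, t.EmployeeStatus, t.EmpChecksum
FROM   Temp t
LEFT OUTER JOIN Destination d
    ON d.EmployeeID = t.EmployeeID
WHERE  d.EmployeeID IS NULL;
```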

I suspect that I still do not fully understand what you are doing. But if you have the checksum as your primary key, it becomes worthless as data changes. For example, look at EmployeeID 1004: in the Temp table the status is PERMANENT, while in the Destination table it is RESIGNED, so the checksums differ. How do you look up the status for an employee in this case? If you use your PK (the checksum), it doesn't exist/match, so you have to look at the EmployeeID, which is not the PK, effectively rendering the PK useless. Does that make sense?

If this is not how your data is set up, please help us understand better, because I cannot even imagine a scenario where a checksum as a PK makes sense. Who knows, maybe one of us will learn something. :)

zion99
Posting Yak Master

141 Posts

Posted - 2008-10-30 : 13:50:01
[code]
Source
COL1 COL2
1004 PERMANENT
1004 RESIGNED
1005 TEMPORARY
1005 RESIGNED
1006 PERMANENT

Temp
EmployeeID EmployeeStatus EmpChecksum
1004 PERMANENT -1238952635
1004 RESIGNED -1234534435
1005 TEMPORARY -1287304914
1005 RESIGNED -1287304914
1006 PERMANENT -1238952603

Destination
EmployeeID EmployeeStatus EmpChecksum
1004 RESIGNED -1238952635
1005 TEMPORARY -1287304914
1006 PERMANENT -1238952603

[/code]

Now I hope it makes sense: the comparison is done using the checksum. In the real scenario there are around 6 columns in this table.

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2008-10-30 : 13:54:07
Excellent. Now, can you describe your scenario? How are you comparing the Temp table to the Destination, and how do you determine whether a record should be updated or inserted?

zion99
Posting Yak Master

141 Posts

Posted - 2008-10-30 : 14:14:27
[code]
DELETE FROM TEMP WHERE RowNumber <> 1
-- this ensures there is only one instance of each COL1

UPDATE i
SET    COL1 = stg.COL1,
       COL2 = stg.COL2
FROM   [TEMP] stg
INNER JOIN DEST i
    ON stg.COL1 = i.COL1
WHERE  i.[checksum] <> stg.[checksum]

INSERT INTO DEST
    (COL1, COL2)
SELECT COL1, COL2
FROM   [TEMP] stg
WHERE  NOT EXISTS
       (
           SELECT 1 FROM DEST d
           WHERE d.[CheckSum] = stg.[CheckSum]
       )
[/code]
Note: in both tables, TEMP and DEST, the [checksum] column is computed with a function.

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2008-10-30 : 14:55:47
Your Update looks fine, although I wouldn't bother updating COL1, as that's the PK; not really an issue, though.

The Insert looks all wrong to me. I think you want to do something more like:
[code]INSERT Dest
SELECT
    Temp.Col1,
    Temp.Col2
FROM
    Temp
LEFT OUTER JOIN
    Dest
    ON Temp.Col1 = Dest.Col1
WHERE
    Dest.Col1 IS NULL[/code]
Otherwise you will be/are inserting duplicate EmployeeIDs into the destination. Does that make sense, or are you doing something different?

zion99
Posting Yak Master

141 Posts

Posted - 2008-10-31 : 07:18:33
One thing I can be sure of is that the DEST table doesn't have any duplicates of COL1; it has a unique non-clustered index over it, and that index has never failed.
If possible, please explain why my insert logic is incorrect. Or is it just more performance-intensive?
