| Author |
Topic |
|
zion99
Posting Yak Master
141 Posts |
Posted - 2008-10-27 : 12:11:06
|
hi, i have a following structure:declare @hello table( empid int,account varchar(128),chksum as binary_checksum(empid, account)) This is returning same checksum value for different rows (14 out of 48000). I wasn't aware that checksum too can return duplicate values.Earlier, i had put checksum as the primary key in many tables, now i m wondering if i should remove that. maybe it may fail in production environment.was wondering is there any way to catch exceptions like primary key violation in sql server ?so that insert of the duplicate row alone doesnt happen, while rest of the rows are inserted. |
|
|
dinakar
Master Smack Fu Yak Hacker
2507 Posts |
Posted - 2008-10-27 : 12:24:52
|
quote: Originally posted by zion99 .... I wasn't aware that checksum too can return duplicate values....
It can.Dinakar Nethi************************Life is short. Enjoy it.************************http://weblogs.sqlteam.com/dinakar/ |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2008-10-27 : 12:38:06
|
The BINARY_CHECKSUM value is calculated this way.Take first value of string. Shift four bits to leftTake second value of string. XOR that value with shifted valued. Shift four bits to left.Take third value of string. XOR that value with shifted valued. Shift four bits to left.If the shift value spills over 32 bit, truncate value and spilled 4 bits is XORed with current value.Repeat until no more characters left in string. E 12°55'05.63"N 56°04'39.26" |
 |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2008-10-27 : 12:43:04
|
quote: Originally posted by zion99 Earlier, i had put checksum as the primary key in many tables, now i m wondering if i should remove that. maybe it may fail in production environment.was wondering is there any way to catch exceptions like primary key violation in sql server ?so that insert of the duplicate row alone doesnt happen, while rest of the rows are inserted.
You should not use a checksum as a priamry key. What checksums are good for is to tell if a particular row has changed. That way you do not need to compare every column in the row, just the checksum for that particular PK. And assuming you index the PK and checksum columns, that lookup/comparison is pretty fast. |
 |
|
|
zion99
Posting Yak Master
141 Posts |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2008-10-27 : 12:55:10
|
You can recreate the primary key with "ignore duplicate key" set. E 12°55'05.63"N 56°04'39.26" |
 |
|
|
zion99
Posting Yak Master
141 Posts |
Posted - 2008-10-27 : 13:01:08
|
thanks a ton , that will allow me to keep my existing tables with checksum as the primary key ... |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2008-10-27 : 13:01:48
|
[code]CREATE TABLE #Temp ( i INT, CONSTRAINT [PK_Temp] PRIMARY KEY CLUSTERED (i) WITH (IGNORE_DUP_KEY = ON) )insert #TempSELECT 1 UNION ALL SELECT 2insert #TempSELECT 2 UNION ALL SELECT 3SELECT *FROM #TempDROP TABLE #Temp[/code] E 12°55'05.63"N 56°04'39.26" |
 |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2008-10-27 : 13:39:25
|
quote: Originally posted by zion99 thanks a ton , that will allow me to keep my existing tables with checksum as the primary key ...
Out of curiosity, why would you do that? That seems like a very, very bad idea to me. Why wouldn't you make a priamry key that is actualy unique? |
 |
|
|
cr8nk
Yak Posting Veteran
77 Posts |
Posted - 2008-10-27 : 15:43:01
|
| Yes, good question. Not only should Primary Key values be unique, but you want them to not change their values as well. |
 |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2008-10-27 : 16:19:47
|
| You cannot create a primary key constraint that allows duplicate values.CODO ERGO SUM |
 |
|
|
zion99
Posting Yak Master
141 Posts |
Posted - 2008-10-29 : 13:28:17
|
quote: Originally posted by Lamprey
quote: Originally posted by zion99 thanks a ton , that will allow me to keep my existing tables with checksum as the primary key ...
Out of curiosity, why would you do that? That seems like a very, very bad idea to me. Why wouldn't you make a priamry key that is actualy unique?
Plz. consider this scenario.source is sql 2000table structure is COL1 which is EMPIDCOL2 is EMPSTATUSsample data:COL1 COL21004 PERMANENT1004 RESGNED1005 TEMPORARY1005 RESGNED1006 PERMANENTas you can see, even if i use DISTINCT, i still be getting 5 rows.What i need is only1004 PERMANENT1005 TEMPORARY1006 PERMANENTso after retrieving the data from sql 2000, i use row_number function and delete all except the first req. rows.since COL1 contains duplicate values before deletion, i use a non-clustered indexBut by using INDEX_DUP_KEY ON, i can create a CLUSTERED index which should be definitely superior to the non clustered one. |
 |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2008-10-29 : 15:03:35
|
| Thanks for getting back on this. Where does the PK of the checksum come into play in this scenario?It's probably not worth bothering to discuss if you are satisfied with your solution. But, I suspect that either you or I is missing something about the problem or solution. I still do not see how a checksum helps anything. |
 |
|
|
zion99
Posting Yak Master
141 Posts |
Posted - 2008-10-30 : 10:39:14
|
quote: Originally posted by Lamprey Thanks for getting back on this. Where does the PK of the checksum come into play in this scenario?It's probably not worth bothering to discuss if you are satisfied with your solution. But, I suspect that either you or I is missing something about the problem or solution. I still do not see how a checksum helps anything.
i forgot to mention that i use a staging table as well. Staging table is truncated everytime and refreshed with source data then i use checksum in the staging table to compare with that in the main table.The logic is if checksum has a clustered index on it, processing (inserts, updates,delets on main table) will be faster..this is my logic, plz. correct me if you think my logic is incorrect. |
 |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2008-10-30 : 12:23:26
|
| [code]SourceCOL1 COL21004 PERMANENT1004 RESIGNED1005 TEMPORARY1005 RESIGNED1006 PERMANENTTempEmployeeID EmployeeStatus EmpChecksum1004 PERMANENT -12389526351005 TEMPORARY -12873049141006 PERMANENT -1238952603DestinationEmployeeID EmployeeStatus EmpChecksum1004 RESIGNED -17693459351005 TEMPORARY -1287304914[/code]I've listed 3 tables above: Source, Temp and Destination. Hopefully, this is close to your scenario.Normally, one would populate the Temp table and then INNER JOIN to the Destination table on the primary key which, in this case, should be the EmployeeID. Then update any rows where then CheckSum is not equal. After that one would LEFT OUTER JOIN to the Destination and insert all the new rows. Actually, with only 2 columns using a Checksum is probably not as efficient as just using the actual data. However, I can see using the checksum as part of a pattern.I suspect that I still do not fully understand what you are doing. But, if you have the CheckSum as your Primary Key, then it becomes worthless as data changes. For example, if you look at EmployeeID 1004. In the Temp table the Status is PERMANENT and in the Destination table it is RESIGNED and thus, the CheckSum is different. How do you look up the Status for an employee in this case? If you use your PK (CheckSum), then it doesn’t exist/match so you have to look at the EmployeeID which is not the PK. Effectively, rendering the PK useless. Does that make sense?If this is not how you have your data setup, please help us/me understand better because I cannot even imagine a scenario where a checksum as a PK makes sense. Who knows maybe one of us will learn something. :) |
 |
|
|
zion99
Posting Yak Master
141 Posts |
Posted - 2008-10-30 : 13:50:01
|
| [code]SourceCOL1 COL21004 PERMANENT1004 RESIGNED1005 TEMPORARY1005 RESIGNED1006 PERMANENTTempEmployeeID EmployeeStatus EmpChecksum1004 PERMANENT -12389526351004 RESIGNED -12345344351005 TEMPORARY -12873049141005 RESIGNED -12873049141006 PERMANENT -1238952603DestinationEmployeeID EmployeeStatus EmpChecksum1004 RESIGNED -12389526351005 TEMPORARY -12873049141006 PERMANENT -1238952603[/code]now i hope it makes sense, doing the comparison using the checksum. in real scenario there are around 6 columns in this table |
 |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2008-10-30 : 13:54:07
|
| Excellent. Now, can you describe your scenario? How you comparing the Temp table to the Destination and how do you determine if a record should be updated or inserted? |
 |
|
|
zion99
Posting Yak Master
141 Posts |
Posted - 2008-10-30 : 14:14:27
|
| [code]DELETE from TEMP where RowNumber <> 1 -- this ensures there is only one instance of each COL1Update DEST set COL1=stg.COL1,COL2=stg.COL2FROM [TEMP] stg INNER JOINDEST i ON stg.COL1 = i.COL1 where i.[checksum]<> stg.[checksum]INSERT INTO DEST (COL1, COL2) SELECT COL1, COL2 FROM [TEMP] stg Where not exists( SELECT 1 FROM DEST d WHERE d.[CheckSum] = stg.[CheckSum] ) [/code]Note: in both the tables temp and dest, CHECKSUM is a function. |
 |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2008-10-30 : 14:55:47
|
You Update looks fine. ALthough, I wouldn't bother updating COL1 as that's the PK, but not really an issue.The Insert looks all wrong to me. I think you want to do something more like:INSERT DestSELECT Temp.Col1, Temp.Col2FROM TempLEFT OUTER JOIN Dest ON Temp.Col1 = Dest.Col1WHERE Dest.Col1 IS NULL Otherwise you will be/are inserting duplicate EmployeeIDs into the destination. Does that make sense or are you doing someting different? |
 |
|
|
zion99
Posting Yak Master
141 Posts |
Posted - 2008-10-31 : 07:18:33
|
one thing i can be sure of is that DEST table doesn't have any duplicates of COL1 , it has a unique nonclustered index over it and it has never failed if possible, please explain why my insert logic is incorrect? or is it more performance intensive? |
 |
|
|
Next Page
|