Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Ignore Duplicate Primary Keys

Author  Topic 

drewLeapLab
Starting Member

1 Post

Posted - 2008-03-27 : 12:27:23
I have one table that stores log messages generated by a web service. I have a second table where I want to store just the distinct messages from the first table. This second table has two columns one for the message and the second for the checksum of the message. The checksum column is the primary key for the table.

My query for populating the second table looks like:
INSERT INTO TransactionMessages ( message, messageHash )
SELECT DISTINCT message, CHECKSUM( message )
FROM Log
WHERE logDate BETWEEN '2008-03-26 00:00:00' AND '2008-03-26 23:59:59'
AND NOT EXISTS (
SELECT * FROM TransactionMessages
WHERE messageHash = CHECKSUM( Log.message ) )

I run this query once per day to insert the new messages from the day before. It fails when a day has two messages that have the same checksum. In this case I would like to ignore the second message and let the query proceed. I tried creating an instead of insert trigger that only inserted unique primary keys. The trigger looks like:

IF( NOT EXISTS(
SELECT TM.messageHash
FROM TransactionMessages TM, inserted I
WHERE TM.messageHash = I.messageHash
) )
BEGIN
INSERT INTO TransactionMessages ( messageHash, message )
SELECT messageHash, message FROM inserted
END

That didn't work. I think the issue is that all the rows get committed to the table at the end of the whole query. That means the trigger cannot match the duplicate primary key because the initial row has not been inserted yet.

Does anyone know the best way to do this?

Thanks for your help,
Drew

bfoster
Starting Member

30 Posts

Posted - 2008-03-27 : 14:41:45
This wouldn't necessarily get the first message so it may not meet your requirements, but I think it will only give you one messasge per distinct checksum.

INSERT INTO TransactionMessages ( message, messageHash )
SELECT MIN( message ), CHECKSUM( message )
FROM Log
WHERE logDate BETWEEN '2008-03-26 00:00:00' AND '2008-03-26 23:59:59'
GROUP BY message
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2008-03-27 : 15:28:48
Why would you apply a checksum to a single column? That doesn't really make any sense to me.

But, if you are intent to sticking with a checksum, here is a kind of a way you could solve the problem (I don't have time to make up real sample data so I just put the checksum in the table):
DECLARE @Yak TABLE (Msg VARCHAR(50), CkSum INT)

INSERT @Yak
SELECT 'Foo', 1
UNION ALL SELECT 'Foo bar', 1
UNION ALL SELECT 'I am yak!', 2
UNION ALL SELECT 'Foo bar', 1
UNION ALL SELECT 'I am yak!', 2

-- 2005
SELECT
Msg,
CkSum
FROM
(
SELECT
Msg,
CkSum,
ROW_NUMBER() OVER (PARTITION BY CkSum ORDER BY Msg) AS RowNum
FROM
@Yak
) AS Temp
WHERE
RowNum = 1

-- 2000
SELECT
MIN(Msg),
CkSum
FROM
@Yak
GROUP BY
CkSum
Hopefully, the sample shows well enough what you need to do. :)

EDIT: I hit the button too fast.. Another option (although I wouldn't recommend it) is to set the IGNORE_DUP_KEY option on your PK to ON.
Go to Top of Page
   

- Advertisement -