Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Finding sort of duplicates

Author  Topic 

Peter Dutch
Posting Yak Master

127 Posts

Posted - 2003-10-23 : 09:15:36
[code]
CREATE TABLE ClockTimes
(
Id INT IDENTITY(1,1),
Badge int,
ClockDate datetime,
InOut CHAR(1)
)

SET DATEFORMAT ymd

INSERT ClockTimes (Badge, Clockdate, InOut) VALUES (1010, '2003-10-23 11:00:00','I')
INSERT ClockTimes (Badge, Clockdate, InOut) VALUES (1010, '2003-10-23 11:01:00','I')
INSERT ClockTimes (Badge, Clockdate, InOut) VALUES (1500, '2003-10-23 11:00:00','O')
INSERT ClockTimes (Badge, Clockdate, InOut) VALUES (1610, '2003-10-23 11:15:00','O')
INSERT ClockTimes (Badge, Clockdate, InOut) VALUES (1712, '2003-10-23 13:15:00','I')
INSERT ClockTimes (Badge, Clockdate, InOut) VALUES (1712, '2003-10-23 13:15:00','O')
INSERT ClockTimes (Badge, Clockdate, InOut) VALUES (1813, '2003-10-23 11:00:00','O')
INSERT ClockTimes (Badge, Clockdate, InOut) VALUES (1912, '2003-10-23 11:15:00','O')
INSERT ClockTimes (Badge, Clockdate, InOut) VALUES (1010, '2003-10-23 15:15:00','O')
[/code]


The above table is a simplified version of a table we have which registers work hours of our employees. It sometimes happens employees clock accidentally twice. I need to remove these 'duplicate records'.

A duplicate record is defined as the same badgenumber clocks within a timeperiod of two minutes. In or out is not relevant. In the above sample data the first two records with bagde 1010 are duplicate (the last 1010 record isn't), and the records with badgenumber 1712 are duplicate.

When I find a 'duplicate record' I need to remove the first record.

I've been playing with this for a while now, and can't find a solution. Deleting duplicate records is easy, but that's not exactly what I need here.

Anyone have some advice?

Thanks in advance


mohdowais
Sheikh of Yak Knowledge

1456 Posts

Posted - 2003-10-23 : 10:29:20
Ok, I spent quite some time on this one, and the following solution seems to work *but* only with one duplicate. More than one duplicate and you will end up deleting the most recent records, but if you have only one duplicate it'll delete the oldest one.


delete z
from clocktimes z inner join
(
select id, badge, clockdate, inout,
(select count(b.clockdate) from clocktimes b where
a.badge = b.badge and
a.id <> b.id and
a.inout = b.inout and
datediff(mi, a.clockdate, b.clockdate) between 0 and 2) as dups
from clocktimes a
) subquery
on z.id = subquery.id
where subquery.dups = 1


If you run just the subquery, it will return the number of duplicates (within 2 minutes of that record), and the delete uses this subquery to delete all those records that have a duplicates. But you will see the problem with more than one duplicate - the first record will accurately show 2 duplicates, however, the other two records will show one duplicate each as well.

Somebody else here could improvise upon this.

Owais


Make it idiot proof and someone will make a better idiot
Go to Top of Page

Peter Dutch
Posting Yak Master

127 Posts

Posted - 2003-10-24 : 03:11:55
Woohoo!

This seems to works great, and multiple duplicates is not really a problem for me.

Thank you very much!
Go to Top of Page

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2003-10-24 : 10:43:36
I presume however....you should be implementing 'specific rules' to cater for identifying the true duplicate...

i.e...if persona clocks in @ 8.10am and @ 8.12am....are you legally obliged to always ensure you pay him/her from the 8.10am entry and not randomly from the 8.12am time?

Likewise with the clock-out time....


You've got a technological problem....but beware of the business/legal implications of any solution.
Go to Top of Page
   

- Advertisement -