Finding sort of duplicates - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

Transact-SQL (2000)

Finding sort of duplicates

Author

Topic

Peter Dutch
Posting Yak Master

127 Posts

Posted - 2003-10-23 : 09:15:36

[code]
CREATE TABLE ClockTimes
(
Id INT IDENTITY(1,1),
Badge int,
ClockDate datetime,
InOut CHAR(1)
)

SET DATEFORMAT ymd

INSERT ClockTimes (Badge, Clockdate, InOut) VALUES (1010, '2003-10-23 11:00:00','I')
INSERT ClockTimes (Badge, Clockdate, InOut) VALUES (1010, '2003-10-23 11:01:00','I')
INSERT ClockTimes (Badge, Clockdate, InOut) VALUES (1500, '2003-10-23 11:00:00','O')
INSERT ClockTimes (Badge, Clockdate, InOut) VALUES (1610, '2003-10-23 11:15:00','O')
INSERT ClockTimes (Badge, Clockdate, InOut) VALUES (1712, '2003-10-23 13:15:00','I')
INSERT ClockTimes (Badge, Clockdate, InOut) VALUES (1712, '2003-10-23 13:15:00','O')
INSERT ClockTimes (Badge, Clockdate, InOut) VALUES (1813, '2003-10-23 11:00:00','O')
INSERT ClockTimes (Badge, Clockdate, InOut) VALUES (1912, '2003-10-23 11:15:00','O')
INSERT ClockTimes (Badge, Clockdate, InOut) VALUES (1010, '2003-10-23 15:15:00','O')
[/code]

The above table is a simplified version of a table we have which registers work hours of our employees. It sometimes happens employees clock accidentally twice. I need to remove these 'duplicate records'.

A duplicate record is defined as the same badgenumber clocks within a timeperiod of two minutes. In or out is not relevant. In the above sample data the first two records with bagde 1010 are duplicate (the last 1010 record isn't), and the records with badgenumber 1712 are duplicate.

When I find a 'duplicate record' I need to remove the first record.

I've been playing with this for a while now, and can't find a solution. Deleting duplicate records is easy, but that's not exactly what I need here.

Anyone have some advice?

Thanks in advance

mohdowais
Sheikh of Yak Knowledge

1456 Posts

Posted - 2003-10-23 : 10:29:20

Ok, I spent quite some time on this one, and the following solution seems to work *but* only with one duplicate. More than one duplicate and you will end up deleting the most recent records, but if you have only one duplicate it'll delete the oldest one.


delete z
from clocktimes z inner join
(
	select id, badge, clockdate, inout,
		(select count(b.clockdate) from clocktimes b where
		a.badge = b.badge and
		a.id <> b.id and
		a.inout = b.inout and
		datediff(mi, a.clockdate, b.clockdate) between 0 and 2) as dups
	from clocktimes a
) subquery
on z.id = subquery.id
where subquery.dups = 1

If you run just the subquery, it will return the number of duplicates (within 2 minutes of that record), and the delete uses this subquery to delete all those records that have a duplicates. But you will see the problem with more than one duplicate - the first record will accurately show 2 duplicates, however, the other two records will show one duplicate each as well.

Somebody else here could improvise upon this.

Owais

Make it idiot proof and someone will make a better idiot

Peter Dutch
Posting Yak Master

127 Posts

Posted - 2003-10-24 : 03:11:55

Woohoo!

This seems to works great, and multiple duplicates is not really a problem for me.

Thank you very much!

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2003-10-24 : 10:43:36

I presume however....you should be implementing 'specific rules' to cater for identifying the true duplicate...

i.e...if persona clocks in @ 8.10am and @ 8.12am....are you legally obliged to always ensure you pay him/her from the 8.10am entry and not randomly from the 8.12am time?

Likewise with the clock-out time....

You've got a technological problem....but beware of the business/legal implications of any solution.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources