pintosack
Starting Member
4 Posts
Posted - 2007-04-18 : 15:00:36
I have some code that works great for removing duplicates on my table. The only problem is that my table is getting rather large and this query takes a few hours. I was wondering how I could limit this query to only look at the first 50 rows in the table, without scanning over every single record. I tried playing around with ROWCOUNT, but always got syntax errors. Here is my code:

DELETE
FROM dbo.CARDRDR
WHERE dbo.CARDRDR.ID IN
    (SELECT TOP 50 F.ID
     FROM dbo.CARDRDR AS F
     WHERE EXISTS
         (SELECT SYSTEM, OBJECT, CARDNUM, L_NAME, F_NAME, STATUS,
                 TIME_H, TIME_M, TIME_S, DATE_Y, DATE_M, DATE_D, COUNT(ID)
          FROM dbo.CARDRDR
          WHERE dbo.CARDRDR.SYSTEM = F.SYSTEM
            AND dbo.CARDRDR.OBJECT = F.OBJECT
            AND dbo.CARDRDR.CARDNUM = F.CARDNUM
            AND dbo.CARDRDR.L_NAME = F.L_NAME
            AND dbo.CARDRDR.F_NAME = F.F_NAME
            AND dbo.CARDRDR.STATUS = F.STATUS
            AND dbo.CARDRDR.TIME_H = F.TIME_H
            AND dbo.CARDRDR.TIME_M = F.TIME_M
            AND dbo.CARDRDR.TIME_S = F.TIME_S
            AND dbo.CARDRDR.DATE_Y = F.DATE_Y
            AND dbo.CARDRDR.DATE_M = F.DATE_M
            AND dbo.CARDRDR.DATE_D = F.DATE_D
          GROUP BY dbo.CARDRDR.SYSTEM, dbo.CARDRDR.OBJECT, dbo.CARDRDR.CARDNUM,
                   dbo.CARDRDR.L_NAME, dbo.CARDRDR.F_NAME, dbo.CARDRDR.STATUS,
                   dbo.CARDRDR.TIME_H, dbo.CARDRDR.TIME_M, dbo.CARDRDR.TIME_S,
                   dbo.CARDRDR.DATE_Y, dbo.CARDRDR.DATE_M, dbo.CARDRDR.DATE_D
          HAVING COUNT(dbo.CARDRDR.ID) > 1))
  AND dbo.CARDRDR.ID NOT IN
    (SELECT MIN(ID)
     FROM dbo.CARDRDR AS F
     WHERE EXISTS
         (SELECT SYSTEM, OBJECT, CARDNUM, L_NAME, F_NAME, STATUS,
                 TIME_H, TIME_M, TIME_S, DATE_Y, DATE_M, DATE_D, COUNT(ID)
          FROM dbo.CARDRDR
          WHERE dbo.CARDRDR.SYSTEM = F.SYSTEM
            AND dbo.CARDRDR.OBJECT = F.OBJECT
            AND dbo.CARDRDR.CARDNUM = F.CARDNUM
            AND dbo.CARDRDR.L_NAME = F.L_NAME
            AND dbo.CARDRDR.F_NAME = F.F_NAME
            AND dbo.CARDRDR.STATUS = F.STATUS
            AND dbo.CARDRDR.TIME_H = F.TIME_H
            AND dbo.CARDRDR.TIME_M = F.TIME_M
            AND dbo.CARDRDR.TIME_S = F.TIME_S
            AND dbo.CARDRDR.DATE_Y = F.DATE_Y
            AND dbo.CARDRDR.DATE_M = F.DATE_M
            AND dbo.CARDRDR.DATE_D = F.DATE_D
          GROUP BY dbo.CARDRDR.SYSTEM, dbo.CARDRDR.OBJECT, dbo.CARDRDR.CARDNUM,
                   dbo.CARDRDR.L_NAME, dbo.CARDRDR.F_NAME, dbo.CARDRDR.STATUS,
                   dbo.CARDRDR.TIME_H, dbo.CARDRDR.TIME_M, dbo.CARDRDR.TIME_S,
                   dbo.CARDRDR.DATE_Y, dbo.CARDRDR.DATE_M, dbo.CARDRDR.DATE_D
          HAVING COUNT(dbo.CARDRDR.ID) > 1)
     GROUP BY SYSTEM, OBJECT, CARDNUM, L_NAME, F_NAME, STATUS,
              TIME_H, TIME_M, TIME_S, DATE_Y, DATE_M, DATE_D);

Thanks
Joe
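For context on the ROWCOUNT attempts: on SQL Server 2000, ROWCOUNT is not written inside the DELETE itself but set beforehand with a separate SET ROWCOUNT statement, which caps how many rows the following statements may affect. A minimal sketch of that approach, assuming the dbo.CARDRDR table above and the simpler keep-the-MIN(ID)-per-group condition that the query boils down to:

SET ROWCOUNT 50;   -- cap the next DELETE at 50 rows

DELETE FROM dbo.CARDRDR
WHERE ID NOT IN
    (SELECT MIN(ID)
     FROM dbo.CARDRDR
     GROUP BY SYSTEM, OBJECT, CARDNUM, L_NAME, F_NAME, STATUS,
              TIME_H, TIME_M, TIME_S, DATE_Y, DATE_M, DATE_D);

SET ROWCOUNT 0;    -- restore the default (no row limit)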
jsmith8858
Dr. Cross Join
7423 Posts
Posted - 2007-04-18 : 15:35:31
How are all these duplicates getting into your data? Why not disallow them by enforcing a unique constraint on those columns?

That is also an extremely inefficient way to remove duplicates from a table ... you are SELECTing from and joining to your table 4 different times when you only need to do it once! If your goal is to keep only the MIN(ID) per group of unique columns, then it is *much* quicker and shorter to simply do this:

DELETE FROM yourtable
WHERE ID NOT IN
    (SELECT MIN(ID)
     FROM yourtable
     GROUP BY col1, col2, col3, col4, ...);

But, again, manually removing duplicate entries should NOT be something that you are doing over and over ... let the database itself PREVENT duplicates from getting there in the first place via good database design and the features built into SQL Server, such as unique constraints and primary keys.

- Jeff
http://weblogs.sqlteam.com/JeffS
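As an illustration of the unique-constraint suggestion, a sketch assuming the table has already been de-duplicated and uses the column names from the first post (the constraint name is made up):

-- Reject any new row that repeats an existing combination of these columns.
-- This ALTER will fail if duplicates are still present in the table.
ALTER TABLE dbo.CARDRDR
ADD CONSTRAINT UQ_CARDRDR_Reading UNIQUE
    (SYSTEM, OBJECT, CARDNUM, L_NAME, F_NAME, STATUS,
     TIME_H, TIME_M, TIME_S, DATE_Y, DATE_M, DATE_D);

Once the constraint is in place, inserting a duplicate raises a constraint-violation error instead of silently adding another copy.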
pintosack
Starting Member
4 Posts
Posted - 2007-04-18 : 15:50:55
Thanks Jeff. The duplicates are getting in there from an old card access system. I'm taking the data from 3 separate database tables and putting it into one SQL table. I will try creating a constraint on the table like you suggested, to prevent the dupes from getting in there in the first place. Also, the code you posted worked great as a temporary solution.

Joe
Kristen
Test
22859 Posts
Posted - 2007-04-19 : 04:01:29
Your problem is probably the time taken to "find" the top 50 each time.

I think a better bet is to create a temporary table containing an IDENTITY column and the PKs of ALL the records to be deleted (you will need a unique key for each row to achieve that), and then delete in batches, JOINing to that table. So something like:

-- loop start
DELETE X
FROM MyTable AS X
JOIN #TEMP AS T ON T.MyPK = X.MyPK
WHERE T.MyIdentity BETWEEN @intStart AND @intEnd

SELECT @intStart = @intEnd + 1, @intEnd = @intEnd + @intIncrement
-- loop end

Kristen
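Spelled out more fully, that pattern might look like the sketch below (the temp table, variable names, and batch size are illustrative, and it assumes the dbo.CARDRDR table from the question):

-- Collect the IDs of the duplicate rows once, numbered by an IDENTITY column.
CREATE TABLE #ToDelete
(
    RowNo INT IDENTITY(1, 1) PRIMARY KEY,
    ID    INT NOT NULL
);

INSERT INTO #ToDelete (ID)
SELECT ID
FROM dbo.CARDRDR
WHERE ID NOT IN
    (SELECT MIN(ID)
     FROM dbo.CARDRDR
     GROUP BY SYSTEM, OBJECT, CARDNUM, L_NAME, F_NAME, STATUS,
              TIME_H, TIME_M, TIME_S, DATE_Y, DATE_M, DATE_D);

DECLARE @intStart INT, @intEnd INT, @intIncrement INT;
SELECT @intIncrement = 10000, @intStart = 1, @intEnd = 10000;

-- Delete in batches, joining to the pre-computed list instead of
-- re-running the duplicate search on every iteration.
WHILE EXISTS (SELECT 1 FROM #ToDelete WHERE RowNo >= @intStart)
BEGIN
    DELETE X
    FROM dbo.CARDRDR AS X
    JOIN #ToDelete AS T ON T.ID = X.ID
    WHERE T.RowNo BETWEEN @intStart AND @intEnd;

    SELECT @intStart = @intEnd + 1, @intEnd = @intEnd + @intIncrement;
END

DROP TABLE #ToDelete;

Each batch touches only a small slice of the table, which keeps individual transactions (and the locks they hold) short.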
harsh_athalye
Master Smack Fu Yak Hacker
5581 Posts
Posted - 2007-04-19 : 04:07:46
If you are using SQL 2005, the TOP clause is supported for the DELETE statement as well.

Harsh Athalye
India.
"The IMPOSSIBLE is often UNTRIED"
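On 2005 and later that looks like the sketch below (again assuming dbo.CARDRDR and the simplified duplicate condition from earlier in the thread); note that TOP in a DELETE takes parentheses:

-- Remove at most 50 duplicate rows per execution (SQL Server 2005+).
DELETE TOP (50)
FROM dbo.CARDRDR
WHERE ID NOT IN
    (SELECT MIN(ID)
     FROM dbo.CARDRDR
     GROUP BY SYSTEM, OBJECT, CARDNUM, L_NAME, F_NAME, STATUS,
              TIME_H, TIME_M, TIME_S, DATE_Y, DATE_M, DATE_D);

Repeat (or loop) until the statement affects zero rows.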
Kristen
Test
22859 Posts
Posted - 2007-04-19 : 04:31:38
I still think a complex query on a large table, with a large number of deletes to make, is going to choke on the "WHERE" clause each iteration. But it's just a gut feeling ... although we have changed some crap-performing DELETEs over the years, using temp tables as described above, with good success.

Kristen
Kristen
Test
22859 Posts
Posted - 2007-04-19 : 04:32:19
P.S. I should add that if the DELETE needs to be a single transaction, then this ain't going to help!
jsmith8858
Dr. Cross Join
7423 Posts
Posted - 2007-04-19 : 08:56:18
Kristen -- his primary problem was also that, regardless of anything else, his DELETE statement was at least 4 times as inefficient as it needed to be, the way it was written. The first step is to rewrite the statement (as I provided), and then optimize from there only if necessary. In other words, never "optimize" poorly written code -- FIRST rewrite it, and THEN optimize it (if necessary -- it usually isn't if it is well written).

- Jeff
http://weblogs.sqlteam.com/JeffS
Kristen
Test
22859 Posts
Posted - 2007-04-20 : 02:30:46
Thanks Jeff, I didn't look at it too closely. I suppose my interpretation of "problem is that my table is getting rather large" might have been subjective.

Kristen