
 limit deletion


pintosack
Starting Member

4 Posts

Posted - 2007-04-18 : 15:00:36
I have some code that works great for removing duplicates on my table. The only problem is that my table is getting rather large and this query takes a few hours. I was wondering how I could limit this query to only look at the first 50 rows in the table, without scanning over every single record. I tried playing around with Rowcount, but always got syntax errors. Here is my code:
DELETE
FROM dbo.CARDRDR
WHERE dbo.CARDRDR.ID IN

(SELECT TOP 50 F.ID
FROM dbo.CARDRDR AS F
WHERE Exists (SELECT SYSTEM, OBJECT, CARDNUM, L_NAME, F_NAME, STATUS, TIME_H, TIME_M, TIME_S, DATE_Y, DATE_M, DATE_D, Count(ID)
FROM dbo.CARDRDR
WHERE dbo.CARDRDR.SYSTEM = F.SYSTEM
AND dbo.CARDRDR.OBJECT = F.OBJECT
AND dbo.CARDRDR.CARDNUM = F.CARDNUM
AND dbo.CARDRDR.L_NAME = F.L_NAME
AND dbo.CARDRDR.F_NAME = F.F_NAME
AND dbo.CARDRDR.STATUS = F.STATUS
AND dbo.CARDRDR.TIME_H = F.TIME_H
AND dbo.CARDRDR.TIME_M = F.TIME_M
AND dbo.CARDRDR.TIME_S = F.TIME_S
AND dbo.CARDRDR.DATE_Y = F.DATE_Y
AND dbo.CARDRDR.DATE_M = F.DATE_M
AND dbo.CARDRDR.DATE_D = F.DATE_D
GROUP BY dbo.CARDRDR.SYSTEM, dbo.CARDRDR.OBJECT, dbo.CARDRDR.CARDNUM, dbo.CARDRDR.L_NAME, dbo.CARDRDR.F_NAME, dbo.CARDRDR.STATUS, dbo.CARDRDR.TIME_H, dbo.CARDRDR.TIME_M, dbo.CARDRDR.TIME_S, dbo.CARDRDR.DATE_Y, dbo.CARDRDR.DATE_M, dbo.CARDRDR.DATE_D
HAVING Count(dbo.CARDRDR.ID) > 1))
AND dbo.CARDRDR.ID NOT IN

(SELECT Min(ID)
FROM dbo.CARDRDR AS F
WHERE Exists (SELECT SYSTEM, OBJECT, CARDNUM, L_NAME, F_NAME, STATUS, TIME_H, TIME_M, TIME_S, DATE_Y, DATE_M, DATE_D, Count(ID)
FROM dbo.CARDRDR
WHERE dbo.CARDRDR.SYSTEM = F.SYSTEM
AND dbo.CARDRDR.OBJECT = F.OBJECT
AND dbo.CARDRDR.CARDNUM = F.CARDNUM
AND dbo.CARDRDR.L_NAME = F.L_NAME
AND dbo.CARDRDR.F_NAME = F.F_NAME
AND dbo.CARDRDR.STATUS = F.STATUS
AND dbo.CARDRDR.TIME_H = F.TIME_H
AND dbo.CARDRDR.TIME_M = F.TIME_M
AND dbo.CARDRDR.TIME_S = F.TIME_S
AND dbo.CARDRDR.DATE_Y = F.DATE_Y
AND dbo.CARDRDR.DATE_M = F.DATE_M
AND dbo.CARDRDR.DATE_D = F.DATE_D
GROUP BY dbo.CARDRDR.SYSTEM, dbo.CARDRDR.OBJECT, dbo.CARDRDR.CARDNUM, dbo.CARDRDR.L_NAME, dbo.CARDRDR.F_NAME, dbo.CARDRDR.STATUS, dbo.CARDRDR.TIME_H, dbo.CARDRDR.TIME_M, dbo.CARDRDR.TIME_S, dbo.CARDRDR.DATE_Y, dbo.CARDRDR.DATE_M, dbo.CARDRDR.DATE_D
HAVING Count(dbo.CARDRDR.ID) > 1)
GROUP BY SYSTEM, OBJECT, CARDNUM, L_NAME, F_NAME, STATUS, TIME_H, TIME_M, TIME_S, DATE_Y, DATE_M, DATE_D);

Thanks
Joe

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2007-04-18 : 15:35:31
How are all these duplicates getting into your data? Why not disallow these duplicates by enforcing a unique constraint on these columns?

That is also an extremely inefficient way to remove duplicates from a table ... You are SELECTing from and joining to your table 4 different times when you only need to do it once! If your goal is to only keep the Min(ID) per group of unique columns, then it is *much* quicker and shorter to simply do this:

delete from yourtable where ID not in (Select min(ID) from yourtable group by col1,col2,col3,col4 .. )

But, again, manually removing duplicate entries should NOT be something that you are doing over and over ... let the database itself PREVENT duplicates from getting there in the first place via a good database design and using the features built into SQL Server such as unique constraints and primary keys.
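
For example, once the table has been deduplicated, a unique constraint along these lines would stop the duplicates coming back (just a sketch -- the constraint name is made up, and it assumes the twelve columns from the first post are what define a duplicate):

ALTER TABLE dbo.CARDRDR
ADD CONSTRAINT UQ_CARDRDR_NoDupes UNIQUE   -- hypothetical constraint name
    (SYSTEM, OBJECT, CARDNUM, L_NAME, F_NAME, STATUS,
     TIME_H, TIME_M, TIME_S, DATE_Y, DATE_M, DATE_D)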

- Jeff
http://weblogs.sqlteam.com/JeffS

pintosack
Starting Member

4 Posts

Posted - 2007-04-18 : 15:50:55
Thanks Jeff,
The duplicates are getting in there from an old card access system. I'm taking the data from 3 separate database tables and putting them into 1 sql table. I will try creating a constraint on the table like you suggested to prevent the dupes from getting in there in the first place. Also, the code you posted worked great as a temp solution.

Joe

Kristen
Test

22859 Posts

Posted - 2007-04-19 : 04:01:29
Your problem is probably the time taken to "find" the top 50 each time.

I think a better bet is to create a Temporary Table containing an Identity column and the PKs of ALL the records to be deleted (you will need a unique key for each row to achieve that), and then delete in batches by JOINing to that table.
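
For example, the temp table could be filled in one pass (a sketch that assumes ID is the unique key and that the twelve columns from the first post define a duplicate):

SELECT IDENTITY(int, 1, 1) AS MyIdentity,   -- numbers the staged rows for batching
       D.ID AS MyPK
INTO #TEMP
FROM (SELECT ID,
             ROW_NUMBER() OVER (PARTITION BY SYSTEM, OBJECT, CARDNUM, L_NAME, F_NAME, STATUS,
                                             TIME_H, TIME_M, TIME_S, DATE_Y, DATE_M, DATE_D
                                ORDER BY ID) AS RowNo
      FROM dbo.CARDRDR) AS D
WHERE D.RowNo > 1   -- everything after the first row in each group is a duplicate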

So something like:

-- #TEMP holds an identity (MyIdentity) and the PK (MyPK) of every row to delete
DECLARE @intStart int, @intEnd int, @intIncrement int
SELECT @intStart = 1, @intIncrement = 10000, @intEnd = @intIncrement   -- pick a batch size

WHILE EXISTS (SELECT * FROM #TEMP WHERE MyIdentity >= @intStart)
BEGIN
    -- delete the next batch of rows by joining on the staged PKs
    DELETE X
    FROM MyTable AS X
    JOIN #TEMP AS T
        ON T.MyPK = X.MyPK
    WHERE T.MyIdentity BETWEEN @intStart AND @intEnd

    -- move the window on to the next batch
    SELECT @intStart = @intEnd + 1,
           @intEnd = @intEnd + @intIncrement
END

Kristen

harsh_athalye
Master Smack Fu Yak Hacker

5581 Posts

Posted - 2007-04-19 : 04:07:46
If you are using SQL 2005, the TOP clause is supported for the DELETE statement as well.
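
For example (only a sketch, reusing the duplicate criteria from the first post), something like this could be run repeatedly until it affects zero rows:

DELETE TOP (50)
FROM dbo.CARDRDR
WHERE ID NOT IN (SELECT MIN(ID)
                 FROM dbo.CARDRDR
                 GROUP BY SYSTEM, OBJECT, CARDNUM, L_NAME, F_NAME, STATUS,
                          TIME_H, TIME_M, TIME_S, DATE_Y, DATE_M, DATE_D)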

Harsh Athalye
India.
"The IMPOSSIBLE is often UNTRIED"

Kristen
Test

22859 Posts

Posted - 2007-04-19 : 04:31:38
I still think a complex query on a large table, with a large number of deletes to make, is going to choke on the "WHERE" clause each iteration.

But it's just a gut feeling ... although we have changed some crap-performing DELETEs over the years, using temp tables as described above, with good success.

Kristen

Kristen
Test

22859 Posts

Posted - 2007-04-19 : 04:32:19
P.S. I should add that if the DELETE needs to be a single transaction then this ain't going to help!

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2007-04-19 : 08:56:18
Kristen -- his primary problem was also that, regardless of anything else, his DELETE statement as written was at least 4 times as inefficient as it needed to be. The first step is to re-write the statement (as I provided), and from there he would optimize only if necessary.

In other words, never "optimize" poorly written code -- FIRST rewrite it, and THEN optimize it (if necessary -- it usually isn't if it is well written).

- Jeff
http://weblogs.sqlteam.com/JeffS

Kristen
Test

22859 Posts

Posted - 2007-04-20 : 02:30:46
Thanks Jeff, I didn't look at it too closely. I suppose my interpretation of "problem is that my table is getting rather large" might have been subjective.

Kristen
   
