pintosack
Starting Member
4 Posts
Posted - 2007-04-18 : 15:00:36
I have some code that works great for removing duplicates on my table. The only problem is that my table is getting rather large and this query takes a few hours. I was wondering how I could limit this query to only look at the first 50 rows in the table, without scanning over every single record. I tried playing around with ROWCOUNT, but always got syntax errors. Here is my code:

DELETE
FROM dbo.CARDRDR
WHERE dbo.CARDRDR.ID IN
    (SELECT TOP 50 F.ID
     FROM dbo.CARDRDR AS F
     WHERE EXISTS
         (SELECT SYSTEM, OBJECT, CARDNUM, L_NAME, F_NAME, STATUS,
                 TIME_H, TIME_M, TIME_S, DATE_Y, DATE_M, DATE_D, COUNT(ID)
          FROM dbo.CARDRDR
          WHERE dbo.CARDRDR.SYSTEM = F.SYSTEM
            AND dbo.CARDRDR.OBJECT = F.OBJECT
            AND dbo.CARDRDR.CARDNUM = F.CARDNUM
            AND dbo.CARDRDR.L_NAME = F.L_NAME
            AND dbo.CARDRDR.F_NAME = F.F_NAME
            AND dbo.CARDRDR.STATUS = F.STATUS
            AND dbo.CARDRDR.TIME_H = F.TIME_H
            AND dbo.CARDRDR.TIME_M = F.TIME_M
            AND dbo.CARDRDR.TIME_S = F.TIME_S
            AND dbo.CARDRDR.DATE_Y = F.DATE_Y
            AND dbo.CARDRDR.DATE_M = F.DATE_M
            AND dbo.CARDRDR.DATE_D = F.DATE_D
          GROUP BY dbo.CARDRDR.SYSTEM, dbo.CARDRDR.OBJECT, dbo.CARDRDR.CARDNUM,
                   dbo.CARDRDR.L_NAME, dbo.CARDRDR.F_NAME, dbo.CARDRDR.STATUS,
                   dbo.CARDRDR.TIME_H, dbo.CARDRDR.TIME_M, dbo.CARDRDR.TIME_S,
                   dbo.CARDRDR.DATE_Y, dbo.CARDRDR.DATE_M, dbo.CARDRDR.DATE_D
          HAVING COUNT(dbo.CARDRDR.ID) > 1))
  AND dbo.CARDRDR.ID NOT IN
    (SELECT MIN(ID)
     FROM dbo.CARDRDR AS F
     WHERE EXISTS
         (SELECT SYSTEM, OBJECT, CARDNUM, L_NAME, F_NAME, STATUS,
                 TIME_H, TIME_M, TIME_S, DATE_Y, DATE_M, DATE_D, COUNT(ID)
          FROM dbo.CARDRDR
          WHERE dbo.CARDRDR.SYSTEM = F.SYSTEM
            AND dbo.CARDRDR.OBJECT = F.OBJECT
            AND dbo.CARDRDR.CARDNUM = F.CARDNUM
            AND dbo.CARDRDR.L_NAME = F.L_NAME
            AND dbo.CARDRDR.F_NAME = F.F_NAME
            AND dbo.CARDRDR.STATUS = F.STATUS
            AND dbo.CARDRDR.TIME_H = F.TIME_H
            AND dbo.CARDRDR.TIME_M = F.TIME_M
            AND dbo.CARDRDR.TIME_S = F.TIME_S
            AND dbo.CARDRDR.DATE_Y = F.DATE_Y
            AND dbo.CARDRDR.DATE_M = F.DATE_M
            AND dbo.CARDRDR.DATE_D = F.DATE_D
          GROUP BY dbo.CARDRDR.SYSTEM, dbo.CARDRDR.OBJECT, dbo.CARDRDR.CARDNUM,
                   dbo.CARDRDR.L_NAME, dbo.CARDRDR.F_NAME, dbo.CARDRDR.STATUS,
                   dbo.CARDRDR.TIME_H, dbo.CARDRDR.TIME_M, dbo.CARDRDR.TIME_S,
                   dbo.CARDRDR.DATE_Y, dbo.CARDRDR.DATE_M, dbo.CARDRDR.DATE_D
          HAVING COUNT(dbo.CARDRDR.ID) > 1)
     GROUP BY SYSTEM, OBJECT, CARDNUM, L_NAME, F_NAME, STATUS,
              TIME_H, TIME_M, TIME_S, DATE_Y, DATE_M, DATE_D);

Thanks
Joe
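For context on the ROWCOUNT attempts: on SQL Server 2000, ROWCOUNT is not written inside the DELETE itself but set beforehand with a separate SET ROWCOUNT statement, which caps how many rows the following statements may affect. A minimal sketch of that approach, assuming the dbo.CARDRDR table above and the simpler keep-the-MIN(ID)-per-group condition that the query boils down to:

SET ROWCOUNT 50;   -- cap the next DELETE at 50 rows

DELETE FROM dbo.CARDRDR
WHERE ID NOT IN
    (SELECT MIN(ID)
     FROM dbo.CARDRDR
     GROUP BY SYSTEM, OBJECT, CARDNUM, L_NAME, F_NAME, STATUS,
              TIME_H, TIME_M, TIME_S, DATE_Y, DATE_M, DATE_D);

SET ROWCOUNT 0;    -- restore the default (no row limit)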
jsmith8858
Dr. Cross Join
7423 Posts
Posted - 2007-04-18 : 15:35:31
How are all these duplicates getting into your data? Why not disallow them by enforcing a unique constraint on those columns?

That is also an extremely inefficient way to remove duplicates from a table ... you are SELECTing from and joining to your table 4 different times when you only need to do it once! If your goal is to keep only the MIN(ID) per group of unique columns, then it is *much* quicker and shorter to simply do this:

DELETE FROM yourtable
WHERE ID NOT IN
    (SELECT MIN(ID)
     FROM yourtable
     GROUP BY col1, col2, col3, col4, ...);

But, again, manually removing duplicate entries should NOT be something that you are doing over and over ... let the database itself PREVENT duplicates from getting there in the first place via good database design and the features built into SQL Server, such as unique constraints and primary keys.

- Jeff
http://weblogs.sqlteam.com/JeffS
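As an illustration of the unique-constraint suggestion, a sketch assuming the table has already been de-duplicated and uses the column names from the first post (the constraint name is made up):

-- Reject any new row that repeats an existing combination of these columns.
-- This ALTER will fail if duplicates are still present in the table.
ALTER TABLE dbo.CARDRDR
ADD CONSTRAINT UQ_CARDRDR_Reading UNIQUE
    (SYSTEM, OBJECT, CARDNUM, L_NAME, F_NAME, STATUS,
     TIME_H, TIME_M, TIME_S, DATE_Y, DATE_M, DATE_D);

Once the constraint is in place, inserting a duplicate raises a constraint-violation error instead of silently adding another copy.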
pintosack
Starting Member
4 Posts
Posted - 2007-04-18 : 15:50:55
Thanks Jeff. The duplicates are getting in there from an old card access system. I'm taking the data from 3 separate database tables and putting it into one SQL table. I will try creating a constraint on the table like you suggested, to prevent the dupes from getting in there in the first place. Also, the code you posted worked great as a temporary solution.

Joe
Kristen
Test
22859 Posts
Posted - 2007-04-19 : 04:01:29
Your problem is probably the time taken to "find" the top 50 each time.

I think a better bet is to create a temporary table containing an IDENTITY column and the PKs of ALL the records to be deleted (you will need a unique key for each row to achieve that), and then delete in batches, JOINing to that table. So something like:

-- loop start
DELETE X
FROM MyTable AS X
JOIN #TEMP AS T ON T.MyPK = X.MyPK
WHERE T.MyIdentity BETWEEN @intStart AND @intEnd

SELECT @intStart = @intEnd + 1, @intEnd = @intEnd + @intIncrement
-- loop end

Kristen
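Spelled out more fully, that pattern might look like the sketch below (the temp table, variable names, and batch size are illustrative, and it assumes the dbo.CARDRDR table from the question):

-- Collect the IDs of the duplicate rows once, numbered by an IDENTITY column.
CREATE TABLE #ToDelete
(
    RowNo INT IDENTITY(1, 1) PRIMARY KEY,
    ID    INT NOT NULL
);

INSERT INTO #ToDelete (ID)
SELECT ID
FROM dbo.CARDRDR
WHERE ID NOT IN
    (SELECT MIN(ID)
     FROM dbo.CARDRDR
     GROUP BY SYSTEM, OBJECT, CARDNUM, L_NAME, F_NAME, STATUS,
              TIME_H, TIME_M, TIME_S, DATE_Y, DATE_M, DATE_D);

DECLARE @intStart INT, @intEnd INT, @intIncrement INT;
SELECT @intIncrement = 10000, @intStart = 1, @intEnd = 10000;

-- Delete in batches, joining to the pre-computed list instead of
-- re-running the duplicate search on every iteration.
WHILE EXISTS (SELECT 1 FROM #ToDelete WHERE RowNo >= @intStart)
BEGIN
    DELETE X
    FROM dbo.CARDRDR AS X
    JOIN #ToDelete AS T ON T.ID = X.ID
    WHERE T.RowNo BETWEEN @intStart AND @intEnd;

    SELECT @intStart = @intEnd + 1, @intEnd = @intEnd + @intIncrement;
END

DROP TABLE #ToDelete;

Each batch touches only a small slice of the table, which keeps individual transactions (and the locks they hold) short.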
harsh_athalye
Master Smack Fu Yak Hacker
5581 Posts
Posted - 2007-04-19 : 04:07:46
If you are using SQL 2005, the TOP clause is supported for the DELETE statement as well.

Harsh Athalye
India.
"The IMPOSSIBLE is often UNTRIED"
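On 2005 and later that looks like the sketch below (again assuming dbo.CARDRDR and the simplified duplicate condition from earlier in the thread); note that TOP in a DELETE takes parentheses:

-- Remove at most 50 duplicate rows per execution (SQL Server 2005+).
DELETE TOP (50)
FROM dbo.CARDRDR
WHERE ID NOT IN
    (SELECT MIN(ID)
     FROM dbo.CARDRDR
     GROUP BY SYSTEM, OBJECT, CARDNUM, L_NAME, F_NAME, STATUS,
              TIME_H, TIME_M, TIME_S, DATE_Y, DATE_M, DATE_D);

Repeat (or loop) until the statement affects zero rows.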
Kristen
Test
22859 Posts
Posted - 2007-04-19 : 04:31:38
I still think a complex query on a large table, with a large number of deletes to make, is going to choke on the "WHERE" clause each iteration. But it's just a gut feeling ... although we have changed some crap-performing DELETEs over the years, using temp tables as described above, with good success.

Kristen
Kristen
Test
22859 Posts
Posted - 2007-04-19 : 04:32:19
P.S. I should add that if the DELETE needs to be a single transaction, then this ain't going to help!
jsmith8858
Dr. Cross Join
7423 Posts
Posted - 2007-04-19 : 08:56:18
Kristen -- his primary problem was also that, regardless of anything else, his DELETE statement was at least 4 times as inefficient as it needed to be, the way it was written. The first step is to rewrite the statement (as I provided), and then optimize from there only if necessary. In other words, never "optimize" poorly written code -- FIRST rewrite it, and THEN optimize it (if necessary -- it usually isn't if it is well written).

- Jeff
http://weblogs.sqlteam.com/JeffS
Kristen
Test
22859 Posts
Posted - 2007-04-20 : 02:30:46
Thanks Jeff, I didn't look at it too closely. I suppose my interpretation of "problem is that my table is getting rather large" might have been subjective.

Kristen