mike123
Master Smack Fu Yak Hacker
1462 Posts
Posted - 2006-09-29 : 18:12:10
Hi,

I have a table with about 15 million records that is used for an internal mail system. A lot of these rows are just taking up space and I want to delete them; however, every time I run a query to try, the system comes to a halt. By halt, I mean I let the query run at 4am for 30 minutes and it didn't complete. This is a live server and all other queries get timed out, so it's definitely not something I can simply try again. It is also filling up the transaction log.

Is there any way I can delete these records without halting the system? I don't really care if it takes two days; it's not urgent, I just don't have the storage space anymore :)

Thanks very much for any suggestions

mike123

Kristen
Test
22859 Posts
Posted - 2006-09-29 : 19:06:32
We do something like:

DECLARE @intRowCount int,
        @intBatchSize int,
        @intRowTotal int,
        @dtStart datetime,
        @intElapsedS int

SELECT @intRowCount = 1,       -- Force first loop iteration
       @intBatchSize = 100000, -- Records to delete in each batch
       @intRowTotal = 0

WHILE @intRowCount >= 1
BEGIN
    SET ROWCOUNT @intBatchSize -- Restrict ALL subsequent operations to batch size

    BEGIN TRANSACTION

    SELECT @dtStart = GetDate()

    DELETE FROM MyTable
    WHERE ...

    SELECT @intRowCount = @@ROWCOUNT, -- Rows actually deleted
           @intRowTotal = @intRowTotal + @intRowCount -- Total rows

    COMMIT

    SET ROWCOUNT 0 -- Remove the restriction

    SELECT @intElapsedS = DATEDIFF(Second, @dtStart, GetDate())

    IF @intElapsedS > 120
    BEGIN
        SELECT @intBatchSize = @intBatchSize / 2 -- Reduce batch size
    END
    ELSE
    IF @intElapsedS < 30
    BEGIN
        SELECT @intBatchSize = @intBatchSize + (@intBatchSize / 4) -- Increase batch size
    END

    -- ... Optionally backup TLog if @intRowTotal > 1,000,000 and reset @intRowTotal to 0 ...

    WAITFOR DELAY '000:00:05' -- Allow other queries to run
END

It adjusts the batch size when the machine is busy, can take a TLog backup every 1,000,000 deletions or so, which stops the LDF file getting huge, and pauses after each loop to allow other tasks to run.

You could set the batch size sufficiently small that it can run all the time, or schedule it for 4AM each day and limit it to looping, say, 100 times.
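That optional TLog backup would be something along these lines - a minimal sketch only, assuming the FULL recovery model and a made-up database name and backup path:

IF @intRowTotal > 1000000
BEGIN
    BACKUP LOG MyDatabase TO DISK = 'D:\Backup\MyDatabase_log.trn' -- Hypothetical name and path
    SELECT @intRowTotal = 0 -- Reset the running total
END

Backing the log up regularly lets SQL Server reuse the inactive portion, so the LDF doesn't have to grow to hold every deletion.

Kristen
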
EugeneZ
Starting Member
26 Posts
Posted - 2006-09-29 : 19:41:04
Try copying the "good" records into sometemptable, then:

TRUNCATE TABLE yourmainTable
---truncate is fast not logged operation unlike DELETE

and then:

INSERT INTO yourmainTable
SELECT * FROM sometemptable
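Spelled out against the table from the original post - a sketch only, with a made-up worktable name, and assuming nothing references tblmessage through a foreign key (TRUNCATE TABLE fails if anything does):

SELECT *
INTO tmpMessageKeep -- Hypothetical worktable
FROM tblmessage
WHERE NOT (deletedbysender = 1 AND deletedbyrecipient = 1)

TRUNCATE TABLE tblmessage

INSERT INTO tblmessage
SELECT * FROM tmpMessageKeep

DROP TABLE tmpMessageKeep

If tblmessage has an identity column, the reload would also need SET IDENTITY_INSERT tblmessage ON and an explicit column list.
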
byrmol
Shed Building SQL Farmer
1591 Posts
Posted - 2006-09-29 : 23:24:01
>>---truncate is fast not logged operation unlike DELETE

Not quite true.... truncate is minimally logged.

DavidM
Production is just another testing cycle

Merkin
Funky Drop Bear Fearing SQL Dude!
4970 Posts
Posted - 2006-09-29 : 23:36:20
Another thing to check is whether you have any foreign keys referencing the table you are deleting from. When you delete a row, SQL Server checks that you are not orphaning a row somewhere else, and on a big table that can bring everything to a crawl.
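If you're not sure what references the table, the standard sp_fkeys system procedure lists the foreign keys pointing at it:

EXEC sp_fkeys @pktable_name = 'tblmessage'

Damian
"A foolish consistency is the hobgoblin of little minds." - Emerson
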
mike123
Master Smack Fu Yak Hacker
1462 Posts
Posted - 2006-10-02 : 17:12:59
Hi Kristen,

I tried running your statement, but I still had to stop it after a couple of minutes. I even lowered the batch size to 2, as seen in the line below.

@intBatchSize = 2, -- Records to delete in each batch

This does sound like a great approach, I'm just not sure why it's running like this. Am I integrating my query incorrectly? You can see my DELETE statement below.

Also, you say it adjusts when the system is too busy. I am testing this on a development server that doesn't have any other load. Would this be the problem? It looks to me like my DELETE query isn't structured the way it's supposed to be?

Hey Damian, I do have some constraints but nothing too ridiculous, I don't think. Plus I don't think removing them is a good idea?

Thanks everyone for the tips :)

Mike123

DECLARE @intRowCount int,
        @intBatchSize int,
        @intRowTotal int,
        @dtStart datetime,
        @intElapsedS int

SELECT @intRowCount = 1, -- Force first loop iteration
       @intBatchSize = 2, -- Records to delete in each batch
       @intRowTotal = 0

WHILE @intRowCount >= 1
BEGIN
    SET ROWCOUNT @intBatchSize -- Restrict ALL subsequent operations to batch size

    BEGIN TRANSACTION

    SELECT @dtStart = GetDate()

    DELETE tblmessage
    WHERE deletedbysender = 1
      AND deletedbyrecipient = 1

    SELECT @intRowCount = @@ROWCOUNT, -- Rows actually deleted
           @intRowTotal = @intRowTotal + @intRowCount -- Total rows

    COMMIT

    SET ROWCOUNT 0 -- Remove the restriction

    SELECT @intElapsedS = DATEDIFF(Second, @dtStart, GetDate())

    IF @intElapsedS > 120
    BEGIN
        SELECT @intBatchSize = @intBatchSize / 2 -- Reduce batch size
    END
    ELSE
    IF @intElapsedS < 30
    BEGIN
        SELECT @intBatchSize = @intBatchSize + (@intBatchSize / 4) -- Increase batch size
    END

    -- ... Optionally backup TLog if @intRowTotal > 1,000,000 and reset @intRowTotal to 0 ...

    WAITFOR DELAY '000:00:05' -- Allow other queries to run
END

Merkin
Funky Drop Bear Fearing SQL Dude!
4970 Posts
Posted - 2006-10-02 : 18:46:08
There isn't a threshold of ridiculousness for constraints. You can always write your script to drop them, do your deletes, then recreate them.

Basically the delete is checking every row in the other table to make sure it isn't breaking integrity, and if you don't have an index on the referencing column in that other table, it's doing a table scan.

Give dropping and recreating the constraint a try.
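The drop/recreate would look roughly like this - hypothetical child-table, column, and constraint names, since the real ones aren't in the thread:

ALTER TABLE tblChild
    DROP CONSTRAINT FK_tblChild_tblmessage

-- ... run the batched DELETE here ...

ALTER TABLE tblChild
    ADD CONSTRAINT FK_tblChild_tblmessage
    FOREIGN KEY (messageID) REFERENCES tblmessage (messageID)

Bear in mind the ADD CONSTRAINT re-validates every existing row; adding it WITH NOCHECK skips that check but leaves the constraint untrusted.

Damian
"A foolish consistency is the hobgoblin of little minds." - Emerson
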
eyechart
Master Smack Fu Yak Hacker
3575 Posts
Posted - 2006-10-02 : 19:10:53
As Merkin pointed out, if you have foreign keys they are going to get checked for each delete. Make sure that each FK column is indexed, so that the lookups are faster; by default, FK constraints are not indexed.
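For example, if a hypothetical tblChild references tblmessage via a messageID column:

CREATE INDEX IX_tblChild_messageID ON tblChild (messageID)

With that in place, the FK check for each deleted row becomes an index seek rather than a scan of the child table.

-ec
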
mike123
Master Smack Fu Yak Hacker
1462 Posts
Posted - 2006-10-02 : 20:23:59
Damn, this is a lot more complex than I thought. I've dropped all FK constraints, and it's still slow =[

Any help appreciated. I'd still like to try the original suggestion, as I think that's good in theory if we can get it to work the way I'd like.

Thanks very much once again!!

mike123

Here's some table info in case it's any use:

tblMessage
rows: 8204151
reserved: 4498560 KB
data: 889536 KB
index size: 3605568 KB
unused: 3456 KB

Kristen
Test
22859 Posts
Posted - 2006-10-03 : 01:46:42
"damn, this is alot more complex than I thought"Hey, have one of my T-shirts, I've got loads!I find that my first loop takes ages. (I'm talking BIG [for us] tables that we are doing this on, you are in the same boat). I reckon it drags all the data into memory, or takes ages optimising the query, or somesuch.You ought to output the elapsed time for each iteration so you can see how its jumping around. Kill the automatic adjust of batch size for now. What I'm after is you set the batch size to sometime reasonable, call it 1,000 rows, and see how the individual loop elapsed time jumps around - your "quiet" DEV system will be ideal for that. If it hits, say, 10 seconds regular as clockwork great, but our systems jump around - probably fragmentation in the tables and the way the wind is blowing etc. The other thought that occurs to me is that:delete tblmessage where deletedbysender =1 and deletedbyrecipient =1 may be doing a full table scan, making a list of everything to delete, and then only doing the first @intBatchSize deletions.You may need to help it by also limiting the range of records that are checked, using the Clustered Index on your table (or the PK if you don't have a clustered index)How long doesSELECT TOP 1000 MyPKColumnINTO #TEMPFROM tblmessage WHERE deletedbysender =1 and deletedbyrecipient =1ORDER BY MyPKColumn take to run?and having done that how long does:DELETE DFROM #TEMP AS T JOIN tblmessage AS D ON D.MyPKColumn = T.MyPKColumnWHERE deletedbysender =1 and deletedbyrecipient =1 -- Belt and braces! take?If that ups the speed then you need to change the delete mechanics to first grab a list of N,000 rows into a temporary table, and then use that as the basis for a delete.An index on deletedbysender, deletedbyrecipient, MyPKColumnwould help here, but it may be a bit of an overkill.If most of the records do NOT have both these set AND you have the enterprise version of SQL Server you could set up an Indexed View on:CREATE MyViewASSELECT MyPKColumnFROM tblmessageWHERE deletedbysender =1 and deletedbyrecipient =1and index that. That way the index would only be holding the appropriate rows, rather than also all the inappropriate ones, and then your delete will run way faster (assuming the query planner uses the index [it will ONLY use an the index from an Indexed View in the Enterprise Version])Another rambling thought:If there is no way around the killer "find all the ones flagged for deletion" then get that out of the way first:CREATE TABLE #TEMP( T_ID int identity NOT NULL, T_MyPKColumn datatype(999) NOT NULL, PRIMARY KEY ( T_ID ))INSERT INTO #TEMP(T_MyPKColumn)SELECT MyPKColumnFROM tblmessageWHERE deletedbysender =1 and deletedbyrecipient =1ORDER BY MyPKColumn -- I think this helps the Clustered Index Insert on PK, but not sure and thenDECLARE @intStartID intSET @intStartID = 1DELETE DFROM #TEMP AS T JOIN tblmessage AS D ON D.MyPKColumn = T.MyPKColumnWHERE T_ID BETWEEN @intStartID AND @intStartID + @intBatchSize AND deletedbysender =1 and deletedbyrecipient =1 -- Belt and braces!SELECT @intStartID = @intStartID + @intBatchSize -- Increment start point Loop round until no more to delete (you can use @@ROWCOUNT=0 or MAX(T_ID) for that purpose)Adjust the @intBatchSize on each iteration if you like/need.Kristen |
 |
|
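Folding those last two pieces together, the outer loop would run along these lines - a sketch only, still using the hypothetical MyPKColumn, and assuming #TEMP has been filled as above:

DECLARE @intStartID int, @intBatchSize int, @intMaxID int

SELECT @intStartID = 1,
       @intBatchSize = 1000,
       @intMaxID = MAX(T_ID)
FROM #TEMP

WHILE @intStartID <= @intMaxID
BEGIN
    DELETE D
    FROM #TEMP AS T
         JOIN tblmessage AS D
             ON D.MyPKColumn = T.MyPKColumn
    WHERE T_ID BETWEEN @intStartID AND @intStartID + @intBatchSize - 1
      AND deletedbysender = 1
      AND deletedbyrecipient = 1 -- Belt and braces!

    SELECT @intStartID = @intStartID + @intBatchSize -- Move to the next slice

    WAITFOR DELAY '000:00:05' -- Allow other queries to run
END

Kristen
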
mike123
Master Smack Fu Yak Hacker
1462 Posts
Posted - 2006-10-17 : 00:47:06
Hi Kristen,

Sorry for the delay in getting back to you. I just ended up getting it working with your suggestions. I don't have Enterprise edition, so I ended up with the solution below. It's a bit of a hassle, but it works! :)

Thanks!
Mike123

DECLARE @intRowCount int,
        @intBatchSize int,
        @intRowTotal int,
        @dtStart datetime,
        @intElapsedS int,
        @intTotalMax int

SELECT @intRowCount = 1, -- Force first loop iteration
       @intBatchSize = 2, -- Records to delete in each batch
       @intRowTotal = 0,
       @intTotalMax = 0

WHILE @intRowCount >= 1 AND @intTotalMax <= 100 -- Stop when nothing is left, or after ~100 loops
BEGIN
    SET ROWCOUNT @intBatchSize -- Restrict ALL subsequent operations to batch size

    BEGIN TRANSACTION

    SELECT @dtStart = GetDate()

    SELECT TOP 250 instantMessageID
    INTO #TEMP_IM
    FROM tblinstantmessage
    WHERE deletedbysender = 1
      AND deletedbyrecipient = 1
    --ORDER BY instantMessageID

    DELETE D
    FROM #TEMP_IM AS T
         JOIN tblinstantmessage AS D
             ON D.instantMessageID = T.instantMessageID
    -- WHERE deletedbysender = 1 AND deletedbyrecipient = 1 -- Belt and braces!

    SELECT @intRowCount = @@ROWCOUNT, -- Rows actually deleted (captured before DROP TABLE resets @@ROWCOUNT)
           @intRowTotal = @intRowTotal + @intRowCount -- Total rows

    DROP TABLE #TEMP_IM

    COMMIT

    SET ROWCOUNT 0 -- Remove the restriction

    SELECT @intElapsedS = DATEDIFF(Second, @dtStart, GetDate())

    IF @intElapsedS > 120
    BEGIN
        SELECT @intBatchSize = @intBatchSize / 2 -- Reduce batch size
    END
    ELSE
    IF @intElapsedS < 30
    BEGIN
        SELECT @intBatchSize = @intBatchSize + (@intBatchSize / 4) -- Increase batch size
    END

    SELECT @intTotalMax = @intTotalMax + 1 -- Count loop iterations

    -- ... Optionally backup TLog if @intRowTotal > 1,000,000 and reset @intRowTotal to 0 ...

    WAITFOR DELAY '000:00:10' -- Allow other queries to run
END

Kristen
Test
22859 Posts
Posted - 2006-10-17 : 13:42:10
Note that, given where it is positioned:

SET ROWCOUNT @intBatchSize

will also restrict the size of:

SELECT TOP 250 instantMessageID

so you may want to move the SET ROWCOUNT down so that it only influences the statement it needs to control. Your TOP 250 will also prevent deletion of more than that number of rows per loop - if it is performing quickly enough.
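Roughly like this, keeping the names from the post above - the idea being that only the DELETE runs under the ROWCOUNT limit:

SELECT TOP 250 instantMessageID
INTO #TEMP_IM
FROM tblinstantmessage
WHERE deletedbysender = 1
  AND deletedbyrecipient = 1

SET ROWCOUNT @intBatchSize -- Restrict only the DELETE

DELETE D
FROM #TEMP_IM AS T
     JOIN tblinstantmessage AS D
         ON D.instantMessageID = T.instantMessageID

SET ROWCOUNT 0 -- Remove the restriction

Kristen
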
mike123
Master Smack Fu Yak Hacker
1462 Posts
Posted - 2006-10-18 : 00:27:55
Good eye, thanks! :)