 deleting lots of rows, halting the system


mike123
Master Smack Fu Yak Hacker

1462 Posts

Posted - 2006-09-29 : 18:12:10
Hi,

I have a table with about 15 million records that is used for an internal mail system. A lot of these rows are just taking up space and I want to delete them, but every time I run a query to do it the system comes to a halt.

By halt, I mean I let the query run at 4 AM for 30 minutes and it didn't complete. This is a live server and all other queries get timed out, so it's definitely not something I can just try again. It is also filling up the transaction log.

Is there any way I can delete these records without halting the system? I don't really care if it takes two days; it's not urgent, I just don't have the storage space any more :)


Thanks very much for any suggestions

mike123

Kristen
Test

22859 Posts

Posted - 2006-09-29 : 19:06:32
We do something like:

DECLARE @intRowCount int,
    @intBatchSize int,
    @intRowTotal int,
    @dtStart datetime,
    @intElapsedS int

SELECT @intRowCount = 1,     -- Force first loop iteration
    @intBatchSize = 100000,  -- Records to delete in each batch
    @intRowTotal = 0

WHILE @intRowCount >= 1
BEGIN
    SET ROWCOUNT @intBatchSize -- Restrict ALL subsequent operations to batch size
    BEGIN TRANSACTION
    SELECT @dtStart = GetDate()

    DELETE FROM MyTable WHERE ...

    SELECT @intRowCount = @@ROWCOUNT,               -- Rows actually deleted
        @intRowTotal = @intRowTotal + @intRowCount  -- Total rows
    COMMIT
    SET ROWCOUNT 0 -- Remove the restriction
    SELECT @intElapsedS = DATEDIFF(Second, @dtStart, GetDate())
    IF @intElapsedS > 120
    BEGIN
        SELECT @intBatchSize = @intBatchSize / 2 -- Reduce batch size
    END
    ELSE
    IF @intElapsedS < 30
    BEGIN
        SELECT @intBatchSize = @intBatchSize + (@intBatchSize / 4) -- Increase batch size
    END

    -- ... Optionally backup TLog if @intRowTotal > 1,000,000 and reset @intRowTotal to 0 ...
    WAITFOR DELAY '000:00:05' -- Allow other queries to run
END

It adjusts the batch size when the machine is busy, can take a TLog backup every 1,000,000 deletions or so (which stops the LDF file getting huge), and pauses after each loop to allow other tasks to run.

You could set the batch size sufficiently small that it can run all the time, or schedule it for 4 AM each night and limit it to looping, say, 100 times.
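For the optional TLog step, something like this just before the WAITFOR would do (a minimal sketch; the database name and backup path are placeholders, not from this thread):

IF @intRowTotal > 1000000
BEGIN
    BACKUP LOG MyDatabase TO DISK = 'D:\MSSQL\BACKUP\MyDatabase_log.bak'
    SELECT @intRowTotal = 0 -- Reset the counter for the next million
END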

Kristen

EugeneZ
Starting Member

26 Posts

Posted - 2006-09-29 : 19:41:04
Try copying the "good" records into some temp table, then:

TRUNCATE TABLE yourmainTable
-- TRUNCATE is a fast, non-logged operation, unlike DELETE

and then:

INSERT INTO yourmainTable
SELECT * FROM sometemptable
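Spelled out against the table in this thread (using the filter mike posts further down), it would look something like this - a sketch that assumes tblmessage has no identity column and no foreign keys referencing it; both need extra handling:

SELECT *
INTO #keep
FROM tblmessage
WHERE NOT (deletedbysender = 1 AND deletedbyrecipient = 1)

TRUNCATE TABLE tblmessage -- fails if any foreign key references tblmessage

INSERT INTO tblmessage
SELECT * FROM #keep -- an identity column needs SET IDENTITY_INSERT ON and an explicit column list

DROP TABLE #keep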

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2006-09-29 : 23:24:01
>> TRUNCATE is a fast, non-logged operation, unlike DELETE

Not quite true... TRUNCATE is minimally logged.

DavidM

Production is just another testing cycle

Merkin
Funky Drop Bear Fearing SQL Dude!

4970 Posts

Posted - 2006-09-29 : 23:36:20
Another thing to check is whether you have any foreign keys referencing the table you are deleting from. When you delete a row, SQL Server checks that you are not orphaning a row somewhere else, and in a big table this can bring everything to a crawl.


Damian
"A foolish consistency is the hobgoblin of little minds." - Emerson

mike123
Master Smack Fu Yak Hacker

1462 Posts

Posted - 2006-10-02 : 17:12:59

Hi Kristen,

I tried running your statement, but I still had to stop it after a couple of minutes. I even lowered the batch size to 2, as seen in the line below.

@intBatchSize = 2, -- Records to delete in each batch

This does sound like a great approach, I'm just not sure why it's running like this. Am I integrating my query incorrectly? You can see my DELETE statement below.

Also, you say it adjusts when the system is too busy. I am testing this on a development server that doesn't have any other load. Would this be the problem?

It looks to me like my DELETE query isn't structured the way it's supposed to be?

Hey Damian, I do have some constraints, but nothing too ridiculous I don't think. Plus I don't think removing them is a good idea?

Thanks everyone for the tips :)

Mike123



DECLARE @intRowCount int,
    @intBatchSize int,
    @intRowTotal int,
    @dtStart datetime,
    @intElapsedS int

SELECT @intRowCount = 1, -- Force first loop iteration
    @intBatchSize = 2,   -- Records to delete in each batch
    @intRowTotal = 0

WHILE @intRowCount >= 1
BEGIN
    SET ROWCOUNT @intBatchSize -- Restrict ALL subsequent operations to batch size
    BEGIN TRANSACTION
    SELECT @dtStart = GetDate()

    DELETE tblmessage WHERE deletedbysender = 1 AND deletedbyrecipient = 1

    SELECT @intRowCount = @@ROWCOUNT,               -- Rows actually deleted
        @intRowTotal = @intRowTotal + @intRowCount  -- Total rows
    COMMIT
    SET ROWCOUNT 0 -- Remove the restriction
    SELECT @intElapsedS = DATEDIFF(Second, @dtStart, GetDate())
    IF @intElapsedS > 120
    BEGIN
        SELECT @intBatchSize = @intBatchSize / 2 -- Reduce batch size
    END
    ELSE
    IF @intElapsedS < 30
    BEGIN
        SELECT @intBatchSize = @intBatchSize + (@intBatchSize / 4) -- Increase batch size
    END

    -- ... Optionally backup TLog if @intRowTotal > 1,000,000 and reset @intRowTotal to 0 ...
    WAITFOR DELAY '000:00:05' -- Allow other queries to run
END

Merkin
Funky Drop Bear Fearing SQL Dude!

4970 Posts

Posted - 2006-10-02 : 18:46:08
There isn't a "ridiculousness" threshold for constraints.
You can always write your script to drop them, do your deletes, then recreate them. Basically SQL Server is checking every row in the other table to make sure the delete isn't breaking integrity, and if you don't have an index on that column in the other table, it's doing a table scan.

Give dropping and recreating the constraint a try; for example:
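(a sketch with illustrative names - the child table tblmessagerecipient, its FK column messageID, and the constraint name are all assumptions, not from this thread)

ALTER TABLE tblmessagerecipient DROP CONSTRAINT FK_tblmessagerecipient_tblmessage

-- ... run the batched DELETE here ...

ALTER TABLE tblmessagerecipient ADD CONSTRAINT FK_tblmessagerecipient_tblmessage
    FOREIGN KEY (messageID) REFERENCES tblmessage (messageID)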


Damian
"A foolish consistency is the hobgoblin of little minds." - Emerson

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2006-10-02 : 19:10:53
As Merkin pointed out, if you have foreign keys they are going to get checked for each delete. Make sure that each FK column is indexed so those lookups are fast; by default, FK constraints are not indexed.
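For example (the child table and column names here are illustrative, not from this thread):

CREATE INDEX IX_tblmessagerecipient_messageID ON tblmessagerecipient (messageID)

That turns each FK check during the DELETE into an index seek instead of a table scan.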



-ec

mike123
Master Smack Fu Yak Hacker

1462 Posts

Posted - 2006-10-02 : 20:23:59
Damn, this is a lot more complex than I thought.

I've dropped all FK constraints, and it's still slow =[.

Any help appreciated. I'd still like to try the original suggestion, as I think that's good in theory if we can get it to work the way I'd like.


Thanks very much once again!!
mike123


Here's some table info in case it's of any use.

tblMessage
rows:8204151
reserved: 4498560 KB
data: 889536 KB
index size: 3605568 KB
unused: 3456 KB

Kristen
Test

22859 Posts

Posted - 2006-10-03 : 01:46:42
"damn, this is alot more complex than I thought"

Hey, have one of my T-shirts, I've got loads!

I find that my first loop takes ages. (I'm talking about BIG [for us] tables that we are doing this on; you are in the same boat.) I reckon it drags all the data into memory, or takes ages optimising the query, or some such.

You ought to output the elapsed time for each iteration so you can see how it's jumping around. Kill the automatic adjustment of batch size for now. What I'm after is that you set the batch size to something reasonable, call it 1,000 rows, and see how the individual loop elapsed time jumps around - your "quiet" DEV system will be ideal for that. If it hits, say, 10 seconds regular as clockwork, great; but our systems jump around - probably fragmentation in the tables and the way the wind is blowing, etc.
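For example, a line like this after the DATEDIFF would do it (a sketch):

PRINT 'Deleted ' + CONVERT(varchar(10), @intRowCount) + ' rows in ' + CONVERT(varchar(10), @intElapsedS) + ' seconds'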

The other thought that occurs to me is that:

DELETE tblmessage WHERE deletedbysender = 1 AND deletedbyrecipient = 1

may be doing a full table scan, making a list of everything to delete, and then only doing the first @intBatchSize deletions.

You may need to help it by also limiting the range of records that are checked, using the clustered index on your table (or the PK if you don't have a clustered index).

How long does

SELECT TOP 1000 MyPKColumn
INTO #TEMP
FROM tblmessage
WHERE deletedbysender = 1 AND deletedbyrecipient = 1
ORDER BY MyPKColumn

take to run?

and having done that how long does:

DELETE D
FROM #TEMP AS T
     JOIN tblmessage AS D
          ON D.MyPKColumn = T.MyPKColumn
WHERE D.deletedbysender = 1 AND D.deletedbyrecipient = 1 -- Belt and braces!

take?

If that ups the speed then you need to change the delete mechanics to first grab a list of N,000 rows into a temporary table, and then use that as the basis for a delete.

An index on

deletedbysender, deletedbyrecipient, MyPKColumn

would help here, but it may be a bit of an overkill.

If most of the records do NOT have both these flags set, AND you have the Enterprise version of SQL Server, you could set up an Indexed View:

CREATE VIEW dbo.MyView
WITH SCHEMABINDING -- required before the view can be indexed
AS
SELECT MyPKColumn
FROM dbo.tblmessage
WHERE deletedbysender = 1 AND deletedbyrecipient = 1

and index that. That way the index would only be holding the appropriate rows, rather than all the inappropriate ones too, and then your delete will run way faster (assuming the query planner uses the index [it will ONLY use the index from an Indexed View in the Enterprise version]).
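The index itself would be something like this (a sketch; it assumes MyPKColumn is unique and that the required SET options, e.g. ANSI_NULLS and QUOTED_IDENTIFIER, were ON when the view was created):

CREATE UNIQUE CLUSTERED INDEX IX_MyView ON dbo.MyView (MyPKColumn)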

Another rambling thought:

If there is no way around the killer "find all the ones flagged for deletion" then get that out of the way first:

CREATE TABLE #TEMP
(
    T_ID int identity NOT NULL,
    T_MyPKColumn datatype(999) NOT NULL,
    PRIMARY KEY
    (
        T_ID
    )
)

INSERT INTO #TEMP (T_MyPKColumn)
SELECT MyPKColumn
FROM tblmessage
WHERE deletedbysender = 1 AND deletedbyrecipient = 1
ORDER BY MyPKColumn -- I think this helps the clustered index insert on the PK, but not sure

and then

DECLARE @intStartID int
SET @intStartID = 1

DELETE D
FROM #TEMP AS T
     JOIN tblmessage AS D
          ON D.MyPKColumn = T.T_MyPKColumn -- the temp table column is T_MyPKColumn
WHERE T.T_ID BETWEEN @intStartID AND @intStartID + @intBatchSize - 1 -- "- 1" so batches don't overlap
  AND D.deletedbysender = 1 AND D.deletedbyrecipient = 1 -- Belt and braces!

SELECT @intStartID = @intStartID + @intBatchSize -- Increment start point

Loop round until no more to delete (you can use @@ROWCOUNT=0 or MAX(T_ID) for that purpose)

Adjust the @intBatchSize on each iteration if you like/need.
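Putting that together, the surrounding loop might look something like this (a sketch; @intStartID and @intBatchSize are set up as above):

DECLARE @intMaxID int
SELECT @intMaxID = MAX(T_ID) FROM #TEMP

WHILE @intStartID <= @intMaxID
BEGIN
    DELETE D
    FROM #TEMP AS T
         JOIN tblmessage AS D
              ON D.MyPKColumn = T.T_MyPKColumn
    WHERE T.T_ID BETWEEN @intStartID AND @intStartID + @intBatchSize - 1
      AND D.deletedbysender = 1 AND D.deletedbyrecipient = 1

    SELECT @intStartID = @intStartID + @intBatchSize
    WAITFOR DELAY '000:00:05' -- Allow other queries to run
END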

Kristen

mike123
Master Smack Fu Yak Hacker

1462 Posts

Posted - 2006-10-17 : 00:47:06
Hi Kristen,
Sorry for the delay getting back to you. I ended up getting it working with your suggestions. I don't have Enterprise Edition, so I ended up with the solution below.

It's a bit of a hassle, but it works! :)

Thanks!
Mike123

DECLARE @intRowCount int,
    @intBatchSize int,
    @intRowTotal int,
    @dtStart datetime,
    @intElapsedS int,
    @intTotalMax int

SELECT @intRowCount = 1, -- Force first loop iteration
    @intBatchSize = 2,   -- Records to delete in each batch
    @intRowTotal = 0,
    @intTotalMax = 0

WHILE @intRowCount >= 1 AND @intTotalMax <= 100 -- AND (not OR) so it stops at ~100 loops, or sooner if nothing is left
BEGIN
    SET ROWCOUNT @intBatchSize -- Restrict ALL subsequent operations to batch size
    BEGIN TRANSACTION
    SELECT @dtStart = GetDate()

    SELECT TOP 250 instantMessageID
    INTO #TEMP_IM
    FROM tblinstantmessage
    WHERE deletedbysender = 1 AND deletedbyrecipient = 1
    --ORDER BY instantMessageID

    DELETE D
    FROM #TEMP_IM AS T
         JOIN tblinstantmessage AS D
              ON D.instantMessageID = T.instantMessageID
    -- WHERE D.deletedbysender = 1 AND D.deletedbyrecipient = 1 -- Belt and braces!

    SELECT @intRowCount = @@ROWCOUNT, -- Rows actually deleted (capture BEFORE the DROP TABLE resets @@ROWCOUNT)
        @intRowTotal = @intRowTotal + @intRowCount -- Total rows

    DROP TABLE #TEMP_IM

    COMMIT
    SET ROWCOUNT 0 -- Remove the restriction
    SELECT @intElapsedS = DATEDIFF(Second, @dtStart, GetDate())
    IF @intElapsedS > 120
    BEGIN
        SELECT @intBatchSize = @intBatchSize / 2 -- Reduce batch size
    END
    ELSE
    IF @intElapsedS < 30
    BEGIN
        SELECT @intBatchSize = @intBatchSize + (@intBatchSize / 4) -- Increase batch size
    END

    SELECT @intTotalMax = @intTotalMax + 1 -- Loop counter

    -- ... Optionally backup TLog if @intRowTotal > 1,000,000 and reset @intRowTotal to 0 ...
    WAITFOR DELAY '000:00:10' -- Allow other queries to run
END

Kristen
Test

22859 Posts

Posted - 2006-10-17 : 13:42:10
Note that, given where it is positioned:

SET ROWCOUNT @intBatchSize

will also restrict the size of:

SELECT TOP 250 instantMessageID

so you may want to move the SET ROWCOUNT down so that it only influences the statement it needs to control. Your TOP 250 will also prevent deletion of more than that number of rows - if it is performing quickly enough.
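Rearranged, that fragment would look something like this (a sketch):

SELECT TOP 250 instantMessageID -- no ROWCOUNT cap on building the candidate list
INTO #TEMP_IM
FROM tblinstantmessage
WHERE deletedbysender = 1 AND deletedbyrecipient = 1

SET ROWCOUNT @intBatchSize -- cap only the DELETE
DELETE D
FROM #TEMP_IM AS T
     JOIN tblinstantmessage AS D
          ON D.instantMessageID = T.instantMessageID
SET ROWCOUNT 0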

Kristen

mike123
Master Smack Fu Yak Hacker

1462 Posts

Posted - 2006-10-18 : 00:27:55
Good eye, thanks! :)