Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Is this the best delete? both speed and log size

Author  Topic 

waterduck
Aged Yak Warrior

982 Posts

Posted - 2011-06-08 : 03:31:28
Hi,

I need to perform huge delete from a table that consist of more than 30m rows. My query as following:

declare @rows int = 1
while(@rows > 0)
begin
delete top (100000) a
from Bigtable a
left join zdate b
on a.record_date = b.record_date
where b.date > '20091231' and b.date < '20100201'
set @rows = @@rowcount
end

Everyday, SSIS will insert both 1m+ rows of new data and delete 1m+ rows of old data into this table. for insert, i just perform normal insert statement(due to easy to maintain)...so nothing can do much to inserting....so now i just target delete statement to be more fast.


thx alot ^^

robvolk
Most Valuable Yak

15732 Posts

Posted - 2011-06-08 : 07:46:59
Why are you using a left join? This is equivalent:

while exists(select * from Bigtable where [date] > '20091231' and [date] < '20100201')
begin
delete top (100000) from Bigtable
where [date] > '20091231' and [date] < '20100201'
end
quote:
Everyday, SSIS will insert both 1m+ rows of new data and delete 1m+ rows of old data into this table. for insert, i just perform normal insert statement(due to easy to maintain)...so nothing can do much to inserting....so now i just target delete statement to be more fast.
I don't understand this part. Your SSIS package is inserting data and then deleting it afterward? Is it inserting data you don't want and that's why you're deleting it?
Go to Top of Page

nigelrivett
Master Smack Fu Yak Hacker

3385 Posts

Posted - 2011-06-08 : 07:55:45
The while exists statement won't be good there
I would go for

declare @rows int = 100000
while @@rowcount = 100000
begin
delete top (100000) a
from Bigtable a
where b.date > '20091231' and b.date < '20100201'
set @rows = @@rowcount
end

Do you have an index on date?
Another option would be to insert into a partitioned table and swap out a partition(s).
Maybe partition by month if there is nothing else you can do - like flag rows on insert (but I guess you wouldn't be inserting them if you could do that).



==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

waterduck
Aged Yak Warrior

982 Posts

Posted - 2011-06-08 : 08:08:11
Sorry for causing misunderstanding, SSIS will delete away date that are history and insert the new date data....

the reason i doing left join is because i saw significant improvement of processing it....it run alot faster then i straight away delete it using where condition.

index on date? nope....that will cause even longer for my insert statement.
Go to Top of Page

nigelrivett
Master Smack Fu Yak Hacker

3385 Posts

Posted - 2011-06-08 : 08:25:57
>> that will cause even longer for my insert statement

But should speed up the delete. You need to decide what you want then maybe you will get some help.
I would find out the diference adding the left join makes to find out whether you can do without it. There are a lot of ways of running the query you have.
I take it you don't delete any of the new rows just added so the obvious thing is to insert into an empty table, delete the rows you need from a partitioned table then swap the new table in. Then you could probably index by date - maybe partition by it then your delete should be a lot faster.

==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2011-06-08 : 10:00:10
quote:
Originally posted by nigelrivett

The while exists statement won't be good there
I would go for

declare @rows int = 100000
while @@rowcount = 100000
begin
delete top (100000) a
from Bigtable a
where b.date > '20091231' and b.date < '20100201'
set @rows = @@rowcount
end
Unfortunately that doesn't work. The initial run of the WHILE loop never occurs, as the DECLARE statement resets @@ROWCOUNT to 1. The same thing happens with the SET statement after the DELETE.

Anyway, I've tested both methods and WHILE EXISTS works as expected. I do agree that partitioning/switching is the best way to remove the unwanted data, if it's available to you.
Go to Top of Page

nigelrivett
Master Smack Fu Yak Hacker

3385 Posts

Posted - 2011-06-08 : 10:24:59
sorry I meant

declare @rows int = 100000
while @rows = 100000
begin
delete top (100000) a
from Bigtable a
where b.date > '20091231' and b.date < '20100201'
set @rows = @@rowcount
end

the problem with the while exists is that it will scan the table unnecessarily for every loop and as you don't have an index then it's resource intensive.
You could use @rowcount > 0 i the above and leave out the set statement but that would give one extra table scan (usually) - a lot better than the scan per loop though.

Odd that you started off looking for something more efficient.

==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

waterduck
Aged Yak Warrior

982 Posts

Posted - 2011-06-09 : 22:09:08
guess the only way to improve delete speed will be create index ya...nth much can edit the query anymore...
Go to Top of Page

jeffw8713
Aged Yak Warrior

819 Posts

Posted - 2011-06-09 : 22:39:26
Your LEFT JOIN is equivelant to an INNER JOIN because you have used b.date in the where clause. You might get better performance changing to an INNER JOIN - but, it really depends on the execution plan.

I would ask what the difference is between record_date and date in the zdate table. Are they not the same value or data type?
Go to Top of Page
   

- Advertisement -