waterduck
Aged Yak Warrior
982 Posts |
Posted - 2011-06-08 : 03:31:28
|
| Hi, I need to perform a huge delete from a table that consists of more than 30m rows. My query is as follows:

declare @rows int = 1
while (@rows > 0)
begin
    delete top (100000) a
    from Bigtable a
    left join zdate b on a.record_date = b.record_date
    where b.date > '20091231' and b.date < '20100201'
    set @rows = @@rowcount
end

Every day, SSIS will insert 1m+ rows of new data into this table and delete 1m+ rows of old data from it. For the insert, I just perform a normal insert statement (easy to maintain), so nothing much can be done about the inserting. For now I am just targeting the delete statement to make it faster. Thanks a lot ^^ |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2011-06-08 : 07:46:59
|
Why are you using a left join? This is equivalent:

while exists (select * from Bigtable where [date] > '20091231' and [date] < '20100201')
begin
    delete top (100000) from Bigtable where [date] > '20091231' and [date] < '20100201'
end

quote: Everyday, SSIS will insert both 1m+ rows of new data and delete 1m+ rows of old data into this table. for insert, i just perform normal insert statement(due to easy to maintain)...so nothing can do much to inserting....so now i just target delete statement to be more fast.
I don't understand this part. Your SSIS package is inserting data and then deleting it afterward? Is it inserting data you don't want and that's why you're deleting it? |
 |
|
|
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts |
Posted - 2011-06-08 : 07:55:45
|
| The while exists statement won't be good there. I would go for:

declare @rows int = 100000
while @@rowcount = 100000
begin
    delete top (100000) a
    from Bigtable a
    where b.date > '20091231' and b.date < '20100201'
    set @rows = @@rowcount
end

Do you have an index on date?

Another option would be to insert into a partitioned table and swap out a partition(s). Maybe partition by month if there is nothing else you can do - like flag rows on insert (but I guess you wouldn't be inserting them if you could do that).

==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy. |
 |
|
|
waterduck
Aged Yak Warrior
982 Posts |
Posted - 2011-06-08 : 08:08:11
|
| Sorry for causing a misunderstanding: SSIS deletes the data that is history and inserts the new data. The reason I am doing a left join is that I saw a significant improvement in processing; it runs a lot faster than deleting straight away using a where condition. An index on date? Nope - that would make my insert statement take even longer. |
 |
|
|
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts |
Posted - 2011-06-08 : 08:25:57
|
| >> that will cause even longer for my insert statement

But it should speed up the delete. You need to decide what you want, then maybe you will get some help. I would find out the difference the left join makes, to see whether you can do without it. There are a lot of ways of running the query you have.

I take it you don't delete any of the new rows just added, so the obvious thing is to insert into an empty table, delete the rows you need from a partitioned table, then swap the new table in. Then you could probably index by date - maybe partition by it - and your delete should be a lot faster.

==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy. |
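[Editor's note: the partition-swap idea above can be sketched roughly as follows. This is a minimal sketch only - the partition function, scheme, and staging table names are hypothetical, and a real setup needs the staging table to match the partitioned table's columns, indexes, and filegroup exactly.]

-- Hypothetical setup: Bigtable partitioned by month on record_date
CREATE PARTITION FUNCTION pf_monthly (datetime)
    AS RANGE RIGHT FOR VALUES ('20100101', '20100201', '20100301');
CREATE PARTITION SCHEME ps_monthly
    AS PARTITION pf_monthly ALL TO ([PRIMARY]);

-- Switching a partition out is a metadata-only operation, so
-- removing a whole month of rows is near-instant compared with
-- deleting them row by row:
ALTER TABLE Bigtable SWITCH PARTITION 2 TO Bigtable_staging;

-- The old rows now live in Bigtable_staging and can be
-- truncated cheaply:
TRUNCATE TABLE Bigtable_staging;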
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2011-06-08 : 10:00:10
|
quote: Originally posted by nigelrivett

The while exists statement won't be good there. I would go for:

declare @rows int = 100000
while @@rowcount = 100000
begin
    delete top (100000) a
    from Bigtable a
    where b.date > '20091231' and b.date < '20100201'
    set @rows = @@rowcount
end
Unfortunately that doesn't work. The initial run of the WHILE loop never occurs, as the DECLARE statement resets @@ROWCOUNT to 1. The same thing happens with the SET statement after the DELETE.

Anyway, I've tested both methods and WHILE EXISTS works as expected. I do agree that partitioning/switching is the best way to remove the unwanted data, if it's available to you. |
 |
|
|
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts |
Posted - 2011-06-08 : 10:24:59
|
| Sorry, I meant:

declare @rows int = 100000
while @rows = 100000
begin
    delete top (100000) a
    from Bigtable a
    where b.date > '20091231' and b.date < '20100201'
    set @rows = @@rowcount
end

The problem with the while exists is that it will scan the table unnecessarily on every loop, and as you don't have an index, that's resource intensive. You could use @@rowcount > 0 in the above and leave out the set statement, but that would give one extra table scan (usually) - still a lot better than the scan per loop though. Odd that you started off looking for something more efficient.

==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy. |
 |
|
|
waterduck
Aged Yak Warrior
982 Posts |
Posted - 2011-06-09 : 22:09:08
|
| Guess the only way to improve the delete speed will be to create an index, ya... not much can be changed in the query anymore... |
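[Editor's note: if an index is the route taken, a minimal sketch based on the columns in the query above would be something like the following; the index name is hypothetical.]

-- A nonclustered index on the filter/join column lets the delete
-- seek to the target rows instead of scanning all 30m+ rows.
-- Trade-off: every daily insert must also maintain this index,
-- which is the cost the poster is worried about.
CREATE NONCLUSTERED INDEX IX_Bigtable_record_date
    ON Bigtable (record_date);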
 |
|
|
jeffw8713
Aged Yak Warrior
819 Posts |
Posted - 2011-06-09 : 22:39:26
|
| Your LEFT JOIN is equivalent to an INNER JOIN because you have used b.date in the WHERE clause. You might get better performance changing it to an INNER JOIN - but it really depends on the execution plan.

I would ask what the difference is between record_date and date in the zdate table. Are they not the same value or data type? |
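[Editor's note: to illustrate the point above - a WHERE filter on the right table's column discards exactly the NULL-extended rows that make a LEFT JOIN an outer join. A sketch using the table and column names from this thread:]

-- This LEFT JOIN behaves as an INNER JOIN: rows of Bigtable with
-- no match in zdate produce b.date = NULL, and NULL fails both
-- comparisons in the WHERE clause, so the unmatched rows vanish:
select a.*
from Bigtable a
left join zdate b on a.record_date = b.record_date
where b.date > '20091231' and b.date < '20100201';

-- To keep true outer-join semantics, the filter on b.date would
-- have to move into the ON clause instead:
select a.*
from Bigtable a
left join zdate b on a.record_date = b.record_date
    and b.date > '20091231' and b.date < '20100201';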
 |
|
|
|