Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
Dev Null
Starting Member
39 Posts |
Posted - 2008-12-04 : 17:40:08
|
Hi folks,I've got a problem where I want to process say, 10,000 rows at a time, and then delete the ones I've processed.Basically, I've got a table full of data, and that data gets copied to another table with a trigger on it for the processing (done that way for historical reasons I won't go into.) The processing is all per-row (i.e. I can copy the rows across one-at-a-time and get the same result as copying them all at once - the data in different rows does not combine.) If I _do_ process them one at a time it takes much longer than doing it all at once - presumably the overhead for the various inserts, transactions, etc. - but if I process them all at once then on larger datasets it can take a very long time to run, I get timeouts, no user feedback, etc. So I'd like to copy the first 10,000 rows across in a block, commit the changes, delete them from the original table, and then move on to the next block.I've investigated using TOP (and ordering on the clustered primary key, to be certain that I delete the same rows that I've processed) but the only way I could work it out I have to do a self-join to find the matching rows for the delete (simplified example):insert into table2 select top 1 * from table1 order by pk_1, pk_2, pk_3delete from table1 from ( select top 1 * from table1 order by pk_1, pk_2, pk_3 ) as del where table1.pk_1 = del.pk_1 and table1.pk_2 = del.pk_2 and table1.pk_3 = del.pk_3 Basically, I've got a table full of data, and that data gets copied to another table with a trigger on it for the processing (done that way for historical reasons I won't go into.) The processing is all per-row (i.e. I can copy the rows across one-at-a-time and get the same result as copying them all at once - the data in different rows does not combine.) If I _do_ process them one at a time it takes much longer than doing it all at once - presumably the overhead for the various inserts, transactions, etc. - but if I process them all at once then on larger datasets it can take a very long time to run, I get timeouts, no user feedback, etc. So I'd like to copy the first 10,000 rows across in a block, commit the changes, delete them from the original table, and then move on to the next block.I've investigated using TOP (and ordering on the clustered primary key, to be certain that I delete the same rows that I've processed) but the only way I could work it out I have to do a self-join to find the matching rows for the delete (simplified example):insert into table2 select top 1 * from table1 order by pk_1, pk_2, pk_3delete from table1 from ( select top 1 * from table1 order by pk_1, pk_2, pk_3 ) as del where table1.pk_1 = del.pk_1 and table1.pk_2 = del.pk_2 and table1.pk_3 = del.pk_3 Which seems to work, but is terribly inefficient (takes almost a minute to delete a single row from the table this way, even though just selecting the top row is pretty much instantaneous.) So I wondered if you guys would know of a better way to do the delete (or a whole different better approach to the basic problem.)Any advice appreciated, - rob. |
|
|
Dev Null
Starting Member
39 Posts |
Posted - 2008-12-04 : 17:44:28
|
| Ack. Pardon the stutter. I was editing in another window as I tried a few things out, and apparently I fumbled the cut-paste... |
 |
|
|
darkdusky
Aged Yak Warrior
591 Posts |
Posted - 2008-12-05 : 10:17:45
|
| Doing the processing row by row is not the way you should be doing it. You need to think in "set based" terms. Can you elaborate on what the processing is and we will see if this can be done to the set rather than looping through each row.http://www.simple-talk.com/sql/t-sql-programming/rbar--row-by-agonizing-row/ |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
|
|
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2008-12-05 : 11:47:30
|
Hi Peso,Keep seeing these little things you post and think "**** me, why didn't I know about this?"Anyway,If you do aDELETE aOUTPUT deleted.bINTO #fooFROM #bar a does the INTO part do an INSERT as a set or as a loop? (I'm right about SELECT INTO doing things 1 at a time aren't I?)I await enlightenment.Charlie===============================================================Msg 3903, Level 16, State 1, Line 1736The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2008-12-05 : 11:57:15
|
It's a set. E 12°55'05.63"N 56°04'39.26" |
 |
|
|
Dev Null
Starting Member
39 Posts |
Posted - 2008-12-05 : 17:06:18
|
| Peso: Thanks! Thats exactly what I needed, and I'd never heard of it before. Unfortunately I also have to be compatable with SQL Server 2000... which means I posted this query in the wrong forum; sorry about that. But I learned something new and useful, so I'm not complaining.Darkdusky: Sorry, I wasn't being very clear. I don't do the processing row-by-row, its just that the processing comes up with the same answer for a given set of rows, regardless of how I break those rows up. Where I'm grouping the data by, say, time, the processing has already been set up to correctly recalculate the values if I later get a new row that refers to a time already calculated. Is there a term for that? Grouping independent? So storing a sum or a maximum over time is safe, because you can always add to the sum or check the max of a new addition, but storing the average is not because you can't recalculate the new average from the old and a new value. Basically, I was just trying to reassure you folks that changing from doing my processing all at once to doing it in sets of 10,000 or so wouldn't change my answers and invalidate my processing.Thanks for the offer of help though! |
 |
|
|
|
|
|
|
|