 Blowing up my transaction log


jholovacs
Posting Yak Master

163 Posts

Posted - 2009-06-02 : 11:55:02
I have a table with around a billion rows in it, and I have been tasked to "clean up" records older than a certain point (about half of the records qualify).

To keep my transaction logging relatively low and provide maximum uptime (this is a production machine), I planned to:

1) create a table with the same schema, compound clustered index, and primary key identity column
2) insert the records I want to keep into the new table (with IDENTITY_INSERT ON, ordered by the columns of the clustered index to reduce IO thrashing)
3) rename the original table to table_old (keeping it around for a little bit)
4) rename the new table to the original table name and rename the constraints accordingly (a rough T-SQL sketch of the whole plan follows below)
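
The table dbo.big_table, its columns, and the constraint names below are made up, since I can't post the real schema:

-- Made-up schema: stand-ins for the real billion-row table
CREATE TABLE dbo.big_table_new
(
    id       INT IDENTITY(1,1) NOT NULL,
    event_dt DATETIME NOT NULL,
    payload  VARCHAR(100) NULL,
    CONSTRAINT PK_big_table_new PRIMARY KEY NONCLUSTERED (id)
);

-- Same compound clustered index as the original
CREATE CLUSTERED INDEX CIX_big_table_new ON dbo.big_table_new (event_dt, id);

-- Copy the keepers, preserving identity values, in clustered-index order
SET IDENTITY_INSERT dbo.big_table_new ON;

INSERT INTO dbo.big_table_new (id, event_dt, payload)
SELECT id, event_dt, payload
FROM dbo.big_table
WHERE event_dt >= '20080101'          -- the "certain point"
ORDER BY event_dt, id;

SET IDENTITY_INSERT dbo.big_table_new OFF;

-- Swap the names (metadata-only, brief schema locks)
EXEC sp_rename 'dbo.big_table', 'big_table_old';
EXEC sp_rename 'dbo.big_table_new', 'big_table';
EXEC sp_rename 'dbo.PK_big_table_new', 'PK_big_table', 'OBJECT';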

So I started off. Things went swimmingly as I inserted rows into my new table (about 500M rows). When the new table was finished populating, I waited for my query to return with "X row(s) affected" so I could move on.

That never happened. Activity Monitor showed my insert statement still running, chewing up processor and IO, although it was clearly no longer adding records to the new table. Then I checked my transaction log and saw it was continuing to grow at an alarming rate. Eventually I had to kill the SPID and force a rollback so that I did not jeopardize my other databases by running out of transaction log space. It was a frustrating night.
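
For reference, killing the session and checking on the rollback looked like this (57 stands in for the real SPID):

KILL 57;                   -- SPID taken from Activity Monitor / sp_who2
KILL 57 WITH STATUSONLY;   -- reports estimated rollback completion and time remaining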

My question is: what the heck could it have been doing? It was literally a simple INSERT...SELECT statement, ordered by the clustered index columns. Once the records were in the new table, why would the log continue to grow out of control like that, and what can I do to prevent it from happening in the future?

I'd also like to note that I have no control over the schema; I'm a DBA, not a designer, and I'm working with what I have. (Somebody responding to my posts always asks "why do you have XXX, that's not a good idea," and I'd just like to nip that in the bud.)



SELECT TOP 1 w.[name]
FROM dbo.women w
INNER JOIN dbo.inlaws i ON i.inlaw_id = w.parent_id
WHERE i.net_worth > 10000000
  AND i.status IN ('dead', 'dying')
  AND w.husband_id IS NULL
ORDER BY w.hotness_factor DESC

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2009-06-02 : 12:18:22
What's the current recovery model used for the db?
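
You can check it with this (MyDb is a placeholder database name):

SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'MyDb';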

sodeep
Master Smack Fu Yak Hacker

7174 Posts

Posted - 2009-06-02 : 12:30:08
Insert in batches. Even if you use the simple recovery model, the transaction log will grow huge if you insert billions of rows in one transaction.
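
Something like this sketch, with placeholder names (dbo.big_table, dbo.big_table_new, MyDb) and an arbitrary batch size; it walks up the identity column as a watermark:

DECLARE @last_id INT, @rows INT;
SET @last_id = 0;
SET @rows = 1;

WHILE @rows > 0
BEGIN
    SET IDENTITY_INSERT dbo.big_table_new ON;

    INSERT INTO dbo.big_table_new (id, event_dt, payload)
    SELECT TOP (500000) id, event_dt, payload
    FROM dbo.big_table
    WHERE id > @last_id
      AND event_dt >= '20080101'      -- retention cutoff
    ORDER BY id;                      -- monotonic identity keeps the watermark cheap

    SET @rows = @@ROWCOUNT;
    SET IDENTITY_INSERT dbo.big_table_new OFF;

    SELECT @last_id = MAX(id) FROM dbo.big_table_new;

    -- In full recovery, back up the log between batches so the space can be reused
    BACKUP LOG MyDb TO DISK = 'D:\backup\MyDb_log.trn';
END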

jholovacs
Posting Yak Master

163 Posts

Posted - 2009-06-02 : 13:27:54
The recovery model is full.

I understand that I will likely have to insert in batches; I'm pretty much resigned to that at this point.

My problem right now is that I cannot understand what the server is trying to do that is exploding the transaction log once the DML manipulation appears to be complete. There should be no physical reordering of the clustered index (which I don't think is logged anyway), and I'm pretty sure populating the PK identity column is a synchronous action with the insert. 300GB+ of transaction log for 60GB of simple insert seems quite excessive to me.

I'm wondering if the primary key, the clustered index, or the ORDER BY clause in the INSERT...SELECT statement is forcing the server to do something... extra... after the fact. If it is something like that, I may be able to remove that factor and still insert in one transaction, which I would definitely prefer: there are some "rules" I would need to apply if I did this in separate chunks, and they promise to be difficult to set up and even more difficult to verify.
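
One way I could watch what is actually landing in the log while the statement runs (fn_dblog is an undocumented function, so this is a diagnostic sketch only):

-- Per-database log size and percent used
DBCC SQLPERF(LOGSPACE);

-- Count active log records by operation/context (undocumented function)
SELECT Operation, Context, COUNT(*) AS log_records
FROM ::fn_dblog(NULL, NULL)
GROUP BY Operation, Context
ORDER BY COUNT(*) DESC;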

What about locking? These records are history and statistical data, so no updates are performed, just lots of inserts and selects. Do you think it would make a difference if I ran my INSERT...SELECT statement with a (NOLOCK) hint? Is it possible that the extra transaction log activity is there to avoid dirty reads (when there should already be none)?
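
For illustration, here is where the hint would sit (same made-up names as above):

-- NOLOCK is a locking hint on the read side; it changes what locks the
-- SELECT takes, not what the INSERT writes to the transaction log.
-- (IDENTITY_INSERT handling omitted; same as the earlier sketch.)
INSERT INTO dbo.big_table_new (id, event_dt, payload)
SELECT id, event_dt, payload
FROM dbo.big_table WITH (NOLOCK)
WHERE event_dt >= '20080101'
ORDER BY event_dt, id;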


