duanecwilson
Constraint Violating Yak Guru
273 Posts
Posted - 2010-11-18 : 10:43:40
I have some simple inserts that are an attempt to copy data from one table to another, but the tables are huge, sometimes over 100 million rows with 10 or so columns on average. They run as high as 10GB to 20GB. There is enough room in the database to do this, but not in the logs. I have been told I need to find a way to use a counter to limit the number of rows per transaction, or to set a batch size. How would I do this? These are standard SQL Server tables on the same server in the same database. Here is a sample query I have been using:

INSERT INTO dbo.Add_Remove_Programs_DATA
    ([MachineID]
    ,[InstanceKey]
    ,[RevisionID]
    ,[AgentID]
    ,[TimeKey]
    ,[ProdID00]
    ,[DisplayName00]
    ,[InstallDate00]
    ,[Publisher00]
    ,[Version00])
SELECT [MachineID]
    ,[InstanceKey]
    ,[RevisionID]
    ,[AgentID]
    ,[TimeKey]
    ,[ProdID00]
    ,[DisplayName00]
    ,[InstallDate00]
    ,[Publisher00]
    ,[Version00]
FROM [SAM].[staging].[Add_Remove_Programs_DATA]

Also, we are set to SIMPLE recovery. I have researched minimal logging, but I fear I have missed something. I am sure this operation is done all the time by big companies. I have been told to sift through the data to create filters and make 10 or 20 queries out of one. Do I really have to resort to that? I have also done some SELECT INTO with minimal logging, but I don't want to have to drop the table, SELECT INTO a new one, and rebuild the indexes every time. Thank you for any input.

Duane

tkizer
Almighty SQL Goddess
38200 Posts
Posted - 2010-11-18 : 12:57:53
And for the batch size, I use a variable, as my code is wrapped in stored procedures and that variable is an input parameter. By using an input parameter, I can easily and quickly change the batch size without having to check a change into source control. The callers of these stored procedures are SQL Agent jobs. We use this batch approach for our nightly purges.

Doing it in batches will slow your process down, which is actually a good thing in most cases. By slowing it down, you are reducing the impact on the server, thus allowing other queries to run while your process is running. And by doing it in batches, you are able to control the transaction log size.

Also, there is no way to not log this. The SIMPLE recovery model just means that the work gets cleared from the log after the transaction completes. But while the transaction is running, it is stored in the log. This is necessary for data consistency, in case of a rollback.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/
Subscribe to my blog
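To make the parameter idea concrete, a SQL Agent job step calling such a procedure might look something like this (the procedure name and values below are placeholders, not Tara's actual code):

    -- job step: purge in 5,000-row batches, keeping 90 days of data
    EXEC dbo.Purge_BigTable @RetainDays = 90, @Batch = 5000;

Changing the batch size later is then just an edit to the job step, with no change to the procedure itself.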
 |
tkizer
Almighty SQL Goddess
38200 Posts
Posted - 2010-11-18 : 13:06:05
Here's an example of what I've got deployed to production for one of our purge processes:

CREATE proc [dbo].[Table1_Purge] (@RetainDays int = 200, @Batch int = 1000)
AS

SET NOCOUNT ON

DECLARE @rowcnt int, @retaindate datetime, @maxRetries int, @retries tinyint

SELECT @rowcnt = 1, @retries = 0, @maxRetries = 10
SELECT @retaindate = DATEADD(dd, @RetainDays * -1, GETUTCDATE())

WHILE @rowcnt > 0 AND @retries <= @maxRetries
BEGIN
    BEGIN TRY
        BEGIN TRAN

        -- delete one batch and remember how many rows it touched
        DELETE TOP (@Batch)
        FROM Table1
        WHERE DateColumn < @retaindate

        SET @rowcnt = @@ROWCOUNT

        COMMIT TRAN
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRAN

        IF ERROR_NUMBER() = 1205 AND @retries < @maxRetries -- 1205 is deadlock error
        BEGIN
            SET @retries = @retries + 1
            WAITFOR DELAY '00:00:10' -- back off, then retry the batch
        END
        ELSE -- some other error or done retrying
        BEGIN
            RETURN
        END
    END CATCH
END

RETURN

Note: I edited it down to remove all proprietary information, but hopefully it still compiles and shows you a "controlled delete". The same concept can be used for a "controlled insert", but you'll need to add a little bit more logic to keep track of where you are at.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/
Subscribe to my blog
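For the "keep track of where you are" part of a controlled insert, one low-bookkeeping sketch (an illustration, not Tara's code) is to drain the staging table in batches with DELETE ... OUTPUT INTO, so the staging table itself records your position: whatever is still in it has not been copied yet. Note that this consumes the staging rows as they are copied, and OUTPUT ... INTO won't accept a destination that has enabled triggers or participates in a foreign key:

    DECLARE @Batch int = 50000  -- tune to what your log can absorb per transaction

    WHILE 1 = 1
    BEGIN
        -- each pass moves up to @Batch rows in its own transaction,
        -- so the log can truncate between batches under SIMPLE recovery
        DELETE TOP (@Batch)
        FROM [SAM].[staging].[Add_Remove_Programs_DATA]
        OUTPUT deleted.MachineID, deleted.InstanceKey, deleted.RevisionID,
               deleted.AgentID, deleted.TimeKey, deleted.ProdID00,
               deleted.DisplayName00, deleted.InstallDate00,
               deleted.Publisher00, deleted.Version00
        INTO dbo.Add_Remove_Programs_DATA
               (MachineID, InstanceKey, RevisionID, AgentID, TimeKey,
                ProdID00, DisplayName00, InstallDate00, Publisher00, Version00)

        IF @@ROWCOUNT = 0 BREAK  -- staging table is empty: done

        WAITFOR DELAY '00:00:02' -- breathing room for other queries
    END

If the staging rows need to be kept, the alternative is a WHERE clause on an ascending unique key plus a variable remembering the last key copied.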
 |
duanecwilson
Constraint Violating Yak Guru
273 Posts
Posted - 2010-11-18 : 18:04:23
Thank you - I didn't realize TRY/CATCH was part of T-SQL. I'll be making use of this. I'm still not sure how to keep track of where I am in a 200 million row table, but this is a good start. It gives me some things to think about.

Duane
 |
robvolk
Most Valuable Yak
15732 Posts
Posted - 2010-11-18 : 18:15:48
quote: I am sure this operation is done all the time by big companies
For the row counts you're describing, they'll usually use table partitioning. It's an Enterprise Edition feature, and therefore a bit more expensive. If you're not already on Enterprise and can't upgrade, you may be able to get reasonable performance from partitioned views, without the hassle of inserting rows and the log growth. There are other hassles with either approach, but they may be worth investigating.
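For reference, a local partitioned view is just a set of member tables with mutually exclusive CHECK constraints on the partitioning column, glued together by a UNION ALL view. A bare-bones sketch (table names, column types, and date ranges below are made up for illustration):

    -- member tables covering disjoint TimeKey ranges, enforced by CHECK constraints
    CREATE TABLE dbo.ARP_DATA_2009
    (
        MachineID     int           NOT NULL,
        TimeKey       datetime      NOT NULL
            CHECK (TimeKey >= '20090101' AND TimeKey < '20100101'),
        DisplayName00 nvarchar(255) NULL
        -- ...remaining columns...
    )

    CREATE TABLE dbo.ARP_DATA_2010
    (
        MachineID     int           NOT NULL,
        TimeKey       datetime      NOT NULL
            CHECK (TimeKey >= '20100101' AND TimeKey < '20110101'),
        DisplayName00 nvarchar(255) NULL
        -- ...remaining columns...
    )
    GO

    -- the view presents the members as one table; when a query filters on TimeKey,
    -- the optimizer uses the CHECK constraints to touch only the members it needs
    CREATE VIEW dbo.Add_Remove_Programs_ALL
    AS
    SELECT MachineID, TimeKey, DisplayName00 FROM dbo.ARP_DATA_2009
    UNION ALL
    SELECT MachineID, TimeKey, DisplayName00 FROM dbo.ARP_DATA_2010

Loading can then target the individual member tables, which keeps each insert smaller; making the view itself updatable has extra requirements (the partitioning column must be part of the primary key, among others).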
 |
Kristen
Test
22859 Posts
Posted - 2010-11-19 : 06:07:07
To add to what Tara said: we put a

    WAITFOR DELAY '00:00:02'

in the loop to give other processes time to run. (Tara has this in the deadlock CATCH, which is a nice idea.) For loops like this that have a dire impact on other users, we also dynamically adjust the batch size variable according to how long the loop took: more than N seconds and we halve the batch size (if not below some minimum size - you don't want it auto-adjusting down to one row per loop!), less than M seconds and we increase the batch size by 10%.

I agree with Brett that BCP [native format] out and back in is easiest, has the smallest impact on logging, and may actually be the fastest route. If you generate the BCP file in clustered index order [of the receiving table] (which is probably your PK), then you can provide a hint on import that the file is pre-sorted, which will speed things up.

An alternative is to use SSIS, which comes equipped to do such transfers in batches.

Not sure what I would do for a 100 million row transfer. We usually put the IDs / PK of the source rows into a temp table and then "page" that for the batches we transfer, because repeating:

    SELECT ...
    FROM MyTable
    WHERE reasonably complex criteria

is relatively slow to perform, as is adding:

    AND NOT EXISTS (SELECT * FROM TargetTable WHERE PK-matches)

which (I think) will get slower and slower as more and more batches are run and more rows exist on the target.
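A rough sketch of the "put the PKs in a temp table and page it" approach, assuming MachineID plus InstanceKey uniquely identify a row (swap in your real key, and add the dynamic batch-size adjustment if you want it):

    DECLARE @Batch int = 50000, @from bigint = 1, @maxRn bigint

    -- number the source keys once, in clustered-index order
    SELECT ROW_NUMBER() OVER (ORDER BY MachineID, InstanceKey) AS rn,
           MachineID, InstanceKey            -- assumed unique pair; use your real PK
    INTO #keys
    FROM [SAM].[staging].[Add_Remove_Programs_DATA]

    CREATE UNIQUE CLUSTERED INDEX IX_keys ON #keys (rn)
    SELECT @maxRn = MAX(rn) FROM #keys

    -- copy one numbered slice per loop
    WHILE @from <= @maxRn
    BEGIN
        INSERT INTO dbo.Add_Remove_Programs_DATA
            (MachineID, InstanceKey, RevisionID, AgentID, TimeKey,
             ProdID00, DisplayName00, InstallDate00, Publisher00, Version00)
        SELECT s.MachineID, s.InstanceKey, s.RevisionID, s.AgentID, s.TimeKey,
               s.ProdID00, s.DisplayName00, s.InstallDate00, s.Publisher00, s.Version00
        FROM #keys AS k
        JOIN [SAM].[staging].[Add_Remove_Programs_DATA] AS s
          ON s.MachineID = k.MachineID
         AND s.InstanceKey = k.InstanceKey
        WHERE k.rn >= @from AND k.rn < @from + @Batch

        SET @from = @from + @Batch

        WAITFOR DELAY '00:00:02'  -- give other work a chance to run
    END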
 |
jeffw8713
Aged Yak Warrior
819 Posts
Posted - 2010-11-19 : 13:57:31
Another option would be to use SSIS. You can define the commit batch size on the OLE DB destination, which will batch the load for you.

Jeff
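For anyone who would rather stay in T-SQL than build a package, BULK INSERT offers the same commit-batch idea via its BATCHSIZE option, assuming the staging data has first been exported to a native-format file with bcp (the file path and sort key below are hypothetical):

    BULK INSERT dbo.Add_Remove_Programs_DATA
    FROM 'D:\export\add_remove_programs.dat'   -- hypothetical path to the bcp native export
    WITH (
        DATAFILETYPE = 'native',
        BATCHSIZE    = 50000,                  -- commit every 50,000 rows
        ORDER (MachineID, InstanceKey),        -- file pre-sorted in clustered index order (assumed key)
        TABLOCK
    )

Each batch commits as its own transaction, so under SIMPLE recovery the log can truncate between batches.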
 |
duanecwilson
Constraint Violating Yak Guru
273 Posts
Posted - 2010-11-19 : 17:37:02
Thank you all for these answers. I am going to try these out. I have been distracted for a while, but should be able to get back to this soon.

Duane
 |