SQL Server Forums

 Batch Insert

sccrsurfer
Starting Member

USA
43 Posts

Posted - 04/16/2013 :  17:55:40
I have a query that is supposed to do batch inserts of the TOP 5000 rows until @RowCount = 0. However, the loop just keeps going and going and going and...

I'm trying to take data from a staging table and move only certain fields to a destination table, and each record needs to be inserted just once. Individual fields may have duplicate values, but each entire row must be unique (imagine the whole row concatenated as one key).

Basically I want to go down the source table and batch insert certain fields into a destination table until @RowCount = 0.

The query is this:

DECLARE @RowCount int;
SET @RowCount = 1;

WHILE @RowCount > 0
BEGIN
    INSERT INTO [DB].[dbo].[DB_DestName] (Field1, Field2, Field3)
    SELECT TOP (5000) Field1, Field2, Field3
    FROM [DB].[dbo].[DB_SourceName];

    SET @RowCount = @@ROWCOUNT;
END;

The query only stops executing because it hits the database size limit! The source table has 7 million records and the destination table already has 15.2 million, so the query must be re-inserting the same data over and over.

Any insight would help greatly. Thanks!

James K
Flowing Fount of Yak Knowledge

3744 Posts

Posted - 04/16/2013 :  18:11:34
The query simply selects the top 5000 rows from the source table every time it goes through the loop. So you have to do something to make sure that the rows that have been visited do not get picked up again.

An efficient way to do this might be to use SSIS or Import/Export wizard, rather than trying to construct your own loop. You can control the batch size when you do that, and it is very efficient at moving data from one table to another. To use Import/Export wizard, right-click on the database name in SSMS object explorer and select Tasks -> Import Data and follow through the wizard's instructions.
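If you do stick with your own loop, one option is to remove each batch from the staging table as you copy it, so the next TOP (5000) only ever sees unvisited rows. A sketch, reusing the table and column names from the original post (this assumes the destination table has no triggers or foreign keys, which the OUTPUT ... INTO clause does not allow, and that it is acceptable to empty the staging table):

```sql
DECLARE @RowCount int = 1;

WHILE @RowCount > 0
BEGIN
    -- Move one batch: OUTPUT routes the deleted rows straight into the
    -- destination, so they cannot be picked up by the next iteration.
    DELETE TOP (5000) FROM [DB].[dbo].[DB_SourceName]
    OUTPUT DELETED.Field1, DELETED.Field2, DELETED.Field3
    INTO [DB].[dbo].[DB_DestName] (Field1, Field2, Field3);

    SET @RowCount = @@ROWCOUNT;
END;
```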

Lamprey
Flowing Fount of Yak Knowledge

4614 Posts

Posted - 04/16/2013 :  18:16:30
What you have there will just keep inserting the TOP 5000 (effectively random) rows from DB_SourceName. You need some way to limit the rows, and there are several ways to do that. One is to use an ID column, assuming the table has one, or to use the ROW_NUMBER() function to generate a number. Another is to use an OUTER JOIN or NOT EXISTS against the destination so you won't insert duplicate rows, etc.

Here are some links on how to do that kind of thing:
http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/7d7c91a6-0484-42a5-b12f-fcbc6aed0c71/
http://stackoverflow.com/questions/4729697/break-up-a-sql-server-2008-query-into-batches
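Combining the batch loop with the NOT EXISTS idea might look like this — a sketch, assuming (per the original post) that Field1, Field2 and Field3 together identify a row; if the staging table itself contains fully duplicated rows, you would also need SELECT DISTINCT:

```sql
DECLARE @RowCount int = 1;

WHILE @RowCount > 0
BEGIN
    INSERT INTO [DB].[dbo].[DB_DestName] (Field1, Field2, Field3)
    SELECT TOP (5000) s.Field1, s.Field2, s.Field3
    FROM [DB].[dbo].[DB_SourceName] AS s
    -- Skip any row that already exists in the destination.
    WHERE NOT EXISTS (SELECT 1
                      FROM [DB].[dbo].[DB_DestName] AS d
                      WHERE d.Field1 = s.Field1
                        AND d.Field2 = s.Field2
                        AND d.Field3 = s.Field3);

    SET @RowCount = @@ROWCOUNT;
END;
```

Note this makes each pass more expensive as the destination grows unless the destination is indexed on (Field1, Field2, Field3).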

visakh16
Very Important crosS Applying yaK Herder

India
52325 Posts

Posted - 04/17/2013 :  00:37:04
What you could do is add an identity column to the staging table and then loop based on that. I still don't get the reason for doing 5000 inserts per batch — is it because of the large data volume? At the end of each load you can truncate the staging table to prepare it for the next one, so the identity values reset as well.
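A sketch of that identity-keyed loop, assuming the new identity column on the staging table is called Id (a placeholder name):

```sql
DECLARE @LastId int = 0, @NextId int, @RowCount int = 1;

WHILE @RowCount > 0
BEGIN
    -- Find the upper Id boundary of the next batch of 5000.
    SELECT @NextId = MAX(Id)
    FROM (SELECT TOP (5000) Id
          FROM [DB].[dbo].[DB_SourceName]
          WHERE Id > @LastId
          ORDER BY Id) AS batch;

    -- Copy only the rows inside that Id range.
    INSERT INTO [DB].[dbo].[DB_DestName] (Field1, Field2, Field3)
    SELECT Field1, Field2, Field3
    FROM [DB].[dbo].[DB_SourceName]
    WHERE Id > @LastId AND Id <= @NextId;

    SET @RowCount = @@ROWCOUNT;
    SET @LastId = @NextId;
END;
```

When the staging table is exhausted, @NextId comes back NULL, the INSERT moves zero rows, and the loop exits.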

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs

sccrsurfer
Starting Member

USA
43 Posts

Posted - 04/17/2013 :  11:52:49
I tried the SSIS route, and it was easy to set up, but unfortunately it errored out after the first 1,071 rows and I can't figure out why. I even have a data conversion step between the source and destination (nvarchar to int), and that's where the process stops.

So now I'm running the process with the query above in batches of 5000, but using a
LEFT JOIN ... ON source.XYZ = dest.XYZ WHERE dest.XYZ IS NULL;

This query is running super slow. I'm at the 45-minute mark and it's not even halfway through! Any suggestions? I have to do it in batches, by the way, because the DBAs got a little PO'd :D

Lamprey
Flowing Fount of Yak Knowledge

4614 Posts

Posted - 04/17/2013 :  13:03:40
If the only conversion you are doing is a VARCHAR to INT, then I'd suggest you check your source table for any VARCHAR values that don't/won't convert to INT.
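A quick way to hunt for the offending rows — a sketch, where Field1 stands in for whichever nvarchar column is being converted; TRY_CONVERT requires SQL Server 2012 or later:

```sql
-- Rows whose value will not convert cleanly to INT.
SELECT Field1, Field2, Field3
FROM [DB].[dbo].[DB_SourceName]
WHERE TRY_CONVERT(int, Field1) IS NULL
  AND Field1 IS NOT NULL;
```

On older versions you could approximate the same check with ISNUMERIC, though it accepts some values (like currency symbols) that still fail a cast to INT.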

MIK_2008
Flowing Fount of Yak Knowledge

Pakistan
1054 Posts

Posted - 04/17/2013 :  13:11:03
quote:
Originally posted by sccrsurfer

INSERT INTO [DB].[dbo].[DB_DestName] (Field1, Field2, Field3)
SELECT TOP (5000) Field1, Field2, Field3
FROM [DB].[dbo].[DB_SourceName]

Is any of Field1, Field2, or Field3 unique on its own? Or a combination of two? Or do all three together make a record unique? If none of those, is there any unique key/field in the DB_SourceName table? If the source table does have one, use that unique key for looping.


Cheers
MIK

Edited by - MIK_2008 on 04/17/2013 13:23:38
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC