Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
hacktothefuture
Starting Member
10 Posts |
Posted - 2007-11-07 : 15:53:30
|
| Hello AllI was hoping I could get some advice from everyone concerning this performance issue Im having. I have the task of importing hundreds of thousands of records into our MSSQL2005 DB. They are done in batches of about 20,000. They are flat files and the data is pretty nicely formated and consistent. Here is the process:- I import the flat files using the Enterprise Manager GUI into a holder table.- I then have the following process running (Im using pseudocode here to make it simpler to understand) from within an SP:- Fill a cursor with the newly imported records from the holder table- Go through the cursor and process the records one by one. This process involves inserting the data from the holder table in 4 other tables, two of them having one to one relationship with the holder table and 2 having a one to many relationship with about 15 records inserted in each of those tables per holder table record.- If the record cannot be inserted because a duplicate exists I keep track of those in a table variable and display those records along with some staistics at the end of the procedure.The process works flawlessly, but it takes a long time. It seems to take longer each time I run it. The current process is at 3 hours and 30 minutes for 25,000 records.My initial attempt was the same basic concept except instead of using a cursor, I used a while loop on the holder table and its identity field to know which record I was operating on. I think it might have actually been faster now that Im looking at the length of the current process but still in the neighborhood of about 100-150 records inserted per minute.Some additional info:- Im using a read only, forward only cursor- Im calling an SP that saves the data inside the SP that contains the cursor or the while loopSo my question, how can I speed this up or is this normal? Is there anything in my code that can be rearranged to provide for quicker processing?Thanks in advance for your help!!! |
|
|
evilDBA
Posting Yak Master
155 Posts |
Posted - 2007-11-07 : 16:00:28
|
| Looks like you have to 'open' a stored procedure. SO, if your SP in inserting data into several tables, you need to insert all 20'000 rows from your holder table. Could you provide a text of your SP? |
 |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2007-11-07 : 16:01:02
|
| Switching from a cursor to a WHILE loop will not improve things. The performance issue in regards to cursors has to do with processing things one row at a time. Same goes for WHILE loops. If you can provide sample data for your staging table, show us via more sample data what needs to happen to move the data out of the staging tables to the real tables, DDL for all tables involved (minus any columns that don't illustrate your problem), then we can see if a better solution is available for you.Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/ |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-11-07 : 16:02:26
|
Why not rewrite the whole shebang as a set-based solution? E 12°55'05.25"N 56°04'39.16" |
 |
|
|
hacktothefuture
Starting Member
10 Posts |
Posted - 2007-11-07 : 16:16:26
|
| Thanks very much for your help! Holder Table------------[FirstName][LastName][Company][Address][City][State][Zip][Telephone][Fax][Email][Region]The main SP called:SET NOCOUNT ON -- declare cursor and fill DECLARE SubsToImport CURSOR LOCAL FORWARD_ONLY FOR SELECT FirstName, LastName, Company, Address, City, State, Zip, Telephone, Fax, Email, Region FROM HolderTable WHERE Region = @NewRegion FOR READ ONLY -- create table to hold subscribers not inserted or updated for followup DECLARE @DuplicateSubs table(FirstName nvarchar(50), LastName nvarchar(50), Email nvarchar(150), DTCreated smalldatetime) -- declare vars for logging/counting and set DECLARE @n Int DECLARE @InsertCnt Int DECLARE @DuplicateCnt Int SET @n = 1 SET @InsertCnt = 0 SET @DuplicateCnt = 0 -- declare vars for holding current variables FROM DB DECLARE @FirstName nvarchar(50) DECLARE @LastName nvarchar(50) DECLARE @Company nvarchar(100) DECLARE @Address nvarchar(50) DECLARE @City nvarchar(50) DECLARE @State nvarchar(2) DECLARE @Zip nvarchar(10) DECLARE @Telephone nvarchar(20) DECLARE @Fax nvarchar(20) DECLARE @Email nvarchar(150) DECLARE @Region int DECLARE @RC int -- declare and set vars that are static DECLARE @MarketingLead int DECLARE @MemberType int DECLARE @EmailOptIn bit SET @MarketingLead = 0 SET @MemberType = 190 SET @EmailOptIn = 1 -- open the cursor OPEN SubsToImport -- move to first row FETCH NEXT FROM SubsToImport INTO @FirstName, @LastName, @Company, @Address, @City, @State, @Zip, @Telephone, @Fax, @Email, @Region -- check for cursor status and see if we have no rows IF (@@FETCH_STATUS = -1) BEGIN PRINT 'NO UPDATEABLE ROWS FOUND' CLOSE SubsToImport DEALLOCATE SubsToImport RETURN END -- only preform while we have rows in the cursor WHILE (@@FETCH_STATUS = 0) BEGIN -- execute sp EXECUTE usp_SaveSubscriberBulk @Email ,@Region ,@MarketingLead ,@FirstName ,@LastName ,@Address ,@City ,@State ,@Zip ,@Company ,@Telephone ,@Fax ,@MemberType ,@EmailOptIn ,@RC OUTPUT -- check to see if we have a success insert, if not record user If @RC >= 0 BEGIN --success Set @InsertCnt = @InsertCnt + 1 END Else BEGIN -- we have a duplicate! Record the user and increment the duplicate count Set @DuplicateCnt = @DuplicateCnt + 1 INSERT INTO @DuplicateSubs VALUES (@FirstName, @LastName, @Email, GETDATE()) END --increment the counting var Set @n = @n + 1 -- get next record in the cursor FETCH NEXT FROM SubsToImport INTO @FirstName, @LastName, @Company, @Address, @City, @State, @Zip, @Telephone, @Fax, @Email, @Region END -- clean up CLOSE SubsToImport DEALLOCATE SubsToImport -- display results to user PRINT 'Attempted To Insert:' PRINT @n - 1 PRINT 'Inserted:' PRINT @InsertCnt PRINT 'Duplicates:' PRINT @DuplicateCnt -- display duplcate records select * from @DuplicateSubsSP that preforms actual save (summarized for brevity, it checks for duplicate first and if found returns -1)CREATE PROCEDURE [dbo].[usp_SaveSubscriberBulk] -- Add the parameters for the stored procedure here @Email nvarchar(150), @Region Int, @MarketingLead Int, @FirstName nvarchar(50), @LastName nvarchar(50), @Address nvarchar(50), @City nvarchar(50), @State nvarchar(2), @Zip nvarchar(10), @Company nvarchar(100) = '', @Telephone nvarchar(20) = '', @Fax nvarchar(20) = '', @MemberType Int, @EmailOptIn bit = 1, @InsertedId Int OUTPUT ASBEGIN DECLARE @NewId Int SET NOCOUNT ON; -- Insert into the subscriber table INSERT INTO Subscribers ( [Email] ,[Password] ,[MemberType] ,[FirstName] ,[LastName] ,[Region] ,[DTCreated] ,[LastUpdate] ,[GenKey] ,[EmailOptIn], Verified ) VALUES ( @Email, '', @MemberType, @FirstName, @LastName, @Region, GETDATE(), GETDATE(), @GenKey, @EmailOptIn, 0 ) Set @NewId = @@IDENTITY -- Now insert in the extras table INSERT INTO SubscribersExtra ( [SubscriberId] ,[MarketingLead] ,[DTCreated] ,[Address] ,[City] ,[State],[Zip],Company,Telephone ,Fax,[LastUpdate] ) VALUES ( @NewId, @MarketingLead, GETDATE(), @Address, @City, @State, @Zip, @Company, @Telephone, @fax, GETDATE() ) -- insert a bunch of other records for this user in 2 other tables EXEC usp_SaveNewSubscriberMarketing @NewId SET @InsertedId = @NewId ENDENDThanks very much once again!!! |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2007-11-08 : 00:35:37
|
I think you basically want to do:-- Work out which are duplicates. Do NOT insert themINSERT INTO DuplicateTable(MyPK)SELECT MyPKFROM HolderTable AS HWHERE Region = @NewRegion AND EXISTS (SELECT * FROM Subscribers AS S WHERE S.SomeColumn = H.SomeColumn)-- Insert Subscribers, excluding duplicatesINSERT INTO Subscribers( [Email] ,[Password] ,[MemberType] ,[FirstName] ,[LastName] ,[Region] ,[DTCreated] ,[LastUpdate] ,[GenKey] ,[EmailOptIn], Verified)SELECT Email, '', [MemberType] = 190, FirstName, LastName, Region, GETDATE(), GETDATE(), @GenKey, [EmailOptIn] = 1, [Verified] = 0FROM HolderTable AS HWHERE Region = @NewRegion AND NO EXISTS (SELECT * FROM DuplicateTable AS D WHERE D.MyPK = H.MyPK)-- Update holding table with the ID allocated by previous insert (requires extra column [NewID] in HolderTableUPDATE HSET [NewID] = IDFROM HolderTable AS H JOIN Subscribers AS S ON S.MyPK = H.MyPK Repeat for other tables that need to be inserted into"MyPK" represents the field(s) in HolderTable that uniquely identify the data.Kristen |
 |
|
|
hacktothefuture
Starting Member
10 Posts |
Posted - 2007-11-08 : 09:40:41
|
| Thanks very much Kristen! Do you think I'll get significant performance increase by re-writing the procedure this way? |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-11-08 : 09:56:28
|
Oh yes! E 12°55'05.25"N 56°04'39.16" |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2007-11-08 : 13:39:21
|
| For a job of the order of 20,000 rows I would guess 100-fold improvement in performance, maybe more. Actually possibly a 1,000 fold improvement - i.e. Hours-become-secondsKristen |
 |
|
|
hacktothefuture
Starting Member
10 Posts |
Posted - 2007-11-09 : 15:16:24
|
| That worked great!!! Thanks so much for your help! |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
|
|
jackv
Master Smack Fu Yak Hacker
2179 Posts |
Posted - 2007-11-10 : 06:09:23
|
| Depending on the architecture of the system , you could add the TABLOCK to to your INSERT statementsJack Vamvas--------------------Search IT jobs from multiple sources- http://www.ITjobfeed.com |
 |
|
|
|
|
|
|
|