Serious Performance Issue: With or Without Cursor

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

Serious Performance Issue: With or Without Cursor

Author

Topic

hacktothefuture
Starting Member

10 Posts

Posted - 2007-11-07 : 15:53:30

Hello All

I was hoping I could get some advice from everyone concerning this performance issue Im having. I have the task of importing hundreds of thousands of records into our MSSQL2005 DB. They are done in batches of about 20,000. They are flat files and the data is pretty nicely formated and consistent. Here is the process:

- I import the flat files using the Enterprise Manager GUI into a holder table.

- I then have the following process running (Im using pseudocode here to make it simpler to understand) from within an SP:

- Fill a cursor with the newly imported records from the holder table

- Go through the cursor and process the records one by one. This process involves inserting the data from the holder table in 4 other tables, two of them having one to one relationship with the holder table and 2 having a one to many relationship with about 15 records inserted in each of those tables per holder table record.

- If the record cannot be inserted because a duplicate exists I keep track of those in a table variable and display those records along with some staistics at the end of the procedure.

The process works flawlessly, but it takes a long time. It seems to take longer each time I run it. The current process is at 3 hours and 30 minutes for 25,000 records.

My initial attempt was the same basic concept except instead of using a cursor, I used a while loop on the holder table and its identity field to know which record I was operating on.

I think it might have actually been faster now that Im looking at the length of the current process but still in the neighborhood of about 100-150 records inserted per minute.

Some additional info:
- Im using a read only, forward only cursor
- Im calling an SP that saves the data inside the SP that contains the cursor or the while loop

So my question, how can I speed this up or is this normal? Is there anything in my code that can be rearranged to provide for quicker processing?

Thanks in advance for your help!!!

evilDBA
Posting Yak Master

155 Posts

Posted - 2007-11-07 : 16:00:28

Looks like you have to 'open' a stored procedure. SO, if your SP in inserting data into several tables, you need to insert all 20'000 rows from your holder table. Could you provide a text of your SP?

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2007-11-07 : 16:01:02

Switching from a cursor to a WHILE loop will not improve things. The performance issue in regards to cursors has to do with processing things one row at a time. Same goes for WHILE loops.

If you can provide sample data for your staging table, show us via more sample data what needs to happen to move the data out of the staging tables to the real tables, DDL for all tables involved (minus any columns that don't illustrate your problem), then we can see if a better solution is available for you.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-11-07 : 16:02:26

Why not rewrite the whole shebang as a set-based solution?

E 12°55'05.25"
N 56°04'39.16"

hacktothefuture
Starting Member

10 Posts

Posted - 2007-11-07 : 16:16:26

Thanks very much for your help!

Holder Table
------------
[FirstName]
[LastName]
[Company]
[Address]
[City]
[State]
[Zip]
[Telephone]
[Fax]
[Email]
[Region]

The main SP called:

SET NOCOUNT ON

-- declare cursor and fill
DECLARE SubsToImport CURSOR LOCAL FORWARD_ONLY FOR
SELECT FirstName, LastName, Company, Address, City, State, Zip,
Telephone, Fax, Email, Region FROM HolderTable WHERE Region = @NewRegion
FOR READ ONLY

-- create table to hold subscribers not inserted or updated for followup
DECLARE @DuplicateSubs table(FirstName nvarchar(50), LastName nvarchar(50), Email nvarchar(150), DTCreated smalldatetime)

-- declare vars for logging/counting and set
DECLARE @n Int
DECLARE @InsertCnt Int
DECLARE @DuplicateCnt Int
SET @n = 1
SET @InsertCnt = 0
SET @DuplicateCnt = 0

-- declare vars for holding current variables FROM DB
DECLARE @FirstName nvarchar(50)
DECLARE @LastName nvarchar(50)
DECLARE @Company nvarchar(100)
DECLARE @Address nvarchar(50)
DECLARE @City nvarchar(50)
DECLARE @State nvarchar(2)
DECLARE @Zip nvarchar(10)
DECLARE @Telephone nvarchar(20)
DECLARE @Fax nvarchar(20)
DECLARE @Email nvarchar(150)
DECLARE @Region int
DECLARE @RC int
-- declare and set vars that are static
DECLARE @MarketingLead int
DECLARE @MemberType int
DECLARE @EmailOptIn bit
SET @MarketingLead = 0
SET @MemberType = 190
SET @EmailOptIn = 1
-- open the cursor
OPEN SubsToImport
-- move to first row
FETCH NEXT FROM SubsToImport INTO
@FirstName, @LastName, @Company, @Address, @City, @State, @Zip,
@Telephone, @Fax, @Email, @Region
-- check for cursor status and see if we have no rows
IF (@@FETCH_STATUS = -1)
BEGIN
PRINT 'NO UPDATEABLE ROWS FOUND'
CLOSE SubsToImport
DEALLOCATE SubsToImport
RETURN
END
-- only preform while we have rows in the cursor
WHILE (@@FETCH_STATUS = 0)
BEGIN
-- execute sp
EXECUTE usp_SaveSubscriberBulk
@Email
,@Region
,@MarketingLead
,@FirstName
,@LastName
,@Address
,@City
,@State
,@Zip
,@Company
,@Telephone
,@Fax
,@MemberType
,@EmailOptIn
,@RC OUTPUT

-- check to see if we have a success insert, if not record user
If @RC >= 0
BEGIN
--success
Set @InsertCnt = @InsertCnt + 1
END
Else
BEGIN
-- we have a duplicate! Record the user and increment the duplicate count
Set @DuplicateCnt = @DuplicateCnt + 1
INSERT INTO @DuplicateSubs VALUES
(@FirstName, @LastName, @Email, GETDATE())
END

--increment the counting var
Set @n = @n + 1

-- get next record in the cursor
FETCH NEXT FROM SubsToImport INTO
@FirstName, @LastName, @Company, @Address, @City, @State, @Zip,
@Telephone, @Fax, @Email, @Region
END

-- clean up
CLOSE SubsToImport
DEALLOCATE SubsToImport
-- display results to user
PRINT 'Attempted To Insert:'
PRINT @n - 1
PRINT 'Inserted:'
PRINT @InsertCnt
PRINT 'Duplicates:'
PRINT @DuplicateCnt
-- display duplcate records
select * from @DuplicateSubs

SP that preforms actual save (summarized for brevity, it checks for duplicate first and if found returns -1)

CREATE PROCEDURE [dbo].[usp_SaveSubscriberBulk]
-- Add the parameters for the stored procedure here
@Email nvarchar(150),
@Region Int,
@MarketingLead Int,
@FirstName nvarchar(50),
@LastName nvarchar(50),
@Address nvarchar(50),
@City nvarchar(50),
@State nvarchar(2),
@Zip nvarchar(10),
@Company nvarchar(100) = '',
@Telephone nvarchar(20) = '',
@Fax nvarchar(20) = '',
@MemberType Int,
@EmailOptIn bit = 1,
@InsertedId Int OUTPUT

AS
BEGIN

DECLARE @NewId Int

SET NOCOUNT ON;
-- Insert into the subscriber table
INSERT INTO Subscribers
(
[Email] ,[Password] ,[MemberType] ,[FirstName] ,[LastName]
,[Region] ,[DTCreated] ,[LastUpdate] ,[GenKey]
,[EmailOptIn], Verified
)
VALUES
(
@Email, '', @MemberType, @FirstName,
@LastName, @Region, GETDATE(), GETDATE(),
@GenKey, @EmailOptIn, 0
)
Set @NewId = @@IDENTITY
-- Now insert in the extras table
INSERT INTO SubscribersExtra
(
[SubscriberId] ,[MarketingLead] ,[DTCreated] ,[Address]
,[City] ,[State],[Zip],Company,Telephone
,Fax,[LastUpdate]
)
VALUES
(
@NewId, @MarketingLead, GETDATE(), @Address,
@City, @State, @Zip, @Company, @Telephone,
@fax, GETDATE()
)

-- insert a bunch of other records for this user in 2 other tables
EXEC usp_SaveNewSubscriberMarketing @NewId

SET @InsertedId = @NewId
END
END

Thanks very much once again!!!

Kristen
Test

22859 Posts

Posted - 2007-11-08 : 00:35:37

I think you basically want to do:


-- Work out which are duplicates.  Do NOT insert them
INSERT INTO DuplicateTable(MyPK)
SELECT	MyPK
FROM	HolderTable AS H
WHERE	Region = @NewRegion
	AND EXISTS (SELECT * FROM Subscribers AS S WHERE S.SomeColumn = H.SomeColumn)

-- Insert Subscribers, excluding duplicates
INSERT INTO Subscribers
(
	[Email] ,[Password] ,[MemberType] ,[FirstName] ,[LastName]
	,[Region] ,[DTCreated] ,[LastUpdate] ,[GenKey]
	,[EmailOptIn], Verified
)
SELECT	Email, '', [MemberType] = 190, FirstName, LastName, 
	Region, GETDATE(), GETDATE(), @GenKey, 
	[EmailOptIn] = 1, [Verified] = 0
FROM	HolderTable AS H
WHERE	Region = @NewRegion
	AND NO EXISTS (SELECT * FROM DuplicateTable AS D WHERE D.MyPK = H.MyPK)

-- Update holding table with the ID allocated by previous insert (requires extra column [NewID] in HolderTable
UPDATE	H
SET	[NewID] = ID
FROM	HolderTable AS H
	JOIN Subscribers AS S
		ON S.MyPK = H.MyPK

Repeat for other tables that need to be inserted into

"MyPK" represents the field(s) in HolderTable that uniquely identify the data.

Kristen

hacktothefuture
Starting Member

10 Posts

Posted - 2007-11-08 : 09:40:41

Thanks very much Kristen! Do you think I'll get significant performance increase by re-writing the procedure this way?

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-11-08 : 09:56:28

Oh yes!

E 12°55'05.25"
N 56°04'39.16"

Kristen
Test

22859 Posts

Posted - 2007-11-08 : 13:39:21

For a job of the order of 20,000 rows I would guess 100-fold improvement in performance, maybe more. Actually possibly a 1,000 fold improvement - i.e. Hours-become-seconds

Kristen

hacktothefuture
Starting Member

10 Posts

Posted - 2007-11-09 : 15:16:24

That worked great!!! Thanks so much for your help!

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-11-09 : 15:21:07

Or you can try out the new OUTPUT operator for SQL Server 2005, to do both the UPDATE and INSERT in one step.
See blog here http://weblogs.sqlteam.com/peterl/archive/2007/10/03/New-OUTPUT-operator.aspx

E 12°55'05.25"
N 56°04'39.16"

jackv
Master Smack Fu Yak Hacker

2179 Posts

Posted - 2007-11-10 : 06:09:23

Depending on the architecture of the system , you could add the TABLOCK to to your INSERT statements

Jack Vamvas
--------------------
Search IT jobs from multiple sources- http://www.ITjobfeed.com

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources