Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 help with if exists() - very bad performance

Author  Topic 

workpeoplle
Starting Member

10 Posts

Posted - 2009-03-26 : 12:27:28
Hi All,

I am trying to check if a record exists and update the same using the classic approach of if EXISTS(). The problem is, my source table - ACCOUNTS_METADATA contains several duplicate records and this needs to be filtered too and the records inserted into the final table - accounts_metadata_final.

If the row exists in the final table, the record needs to be updated with the value of PAGENUMBER, else a new record needs to be inserted.

As you can see from the code below and the commented lines:
1. using cursors to iterate thru the records
2. iterating using the min(id) approach.

The moment i start to run this, the CP usage jumps to 100% and no one can run any more queries on the SQL SERVER. it just stalls. Any suggestions?


/********************************************************************/

ALTER PROCEDURE [dbo].[sp_create_accounts_metadata_final]
-- Add the parameters for the stored procedure here
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;

DECLARE @id as int,
@accountnumber as varchar,
@newlocation as varchar

-- using min() to iterate through the table one row at a time
--SELECT @id = min(id) from ACCOUNTS_METADATA
DECLARE accounts_cursor CURSOR READ_ONLY FAST_FORWARD
FOR SELECT top 1000 ID from ACCOUNTS_METADATA
OPEN accounts_cursor

FETCH NEXT FROM accounts_cursor into @id

--WHILE @id IS NOT NULL
WHILE @@FETCH_STATUS = 0
BEGIN

SELECT @accountnumber = ACCOUNTNUMBER,
@newlocation = NEWLOCATION
FROM ACCOUNTS_METADATA
WHERE ID = @id

/*
UPDATE ACCOUNTS_METADATA_FINAL
SET PAGENUMBER = '0'--(SELECT PAGENUMBER FROM ACCOUNTS_METADATA WHERE NEWLOCATION = @newlocation)
WHERE NEWLOCATION = @newlocation
IF @@ROWCOUNT = 0
INSERT INTO ACCOUNTS_METADATA_FINAL
SELECT ACCOUNTNUMBER, CABINETNUMBER, CABINETNAME, TYPEOFDOC,
PAGENUMBER, NEWLOCATION, IMAGETYPE, DOCID, VOLID, LOCATION
FROM ACCOUNTS_METADATA WHERE NEWLOCATION = @newlocation
AND ID =@id
*/

IF EXISTS(select * from ACCOUNTS_METADATA_FINAL where NEWLOCATION = @newlocation)
BEGIN
UPDATE ACCOUNTS_METADATA_FINAL
SET PAGENUMBER = (SELECT PAGENUMBER FROM ACCOUNTS_METADATA WHERE NEWLOCATION = @newlocation)
WHERE NEWLOCATION = @newlocation
END
ELSE
BEGIN
INSERT INTO ACCOUNTS_METADATA_FINAL
SELECT ACCOUNTNUMBER, CABINETNUMBER, CABINETNAME, TYPEOFDOC,
PAGENUMBER, NEWLOCATION, IMAGETYPE, DOCID, VOLID, LOCATION
FROM ACCOUNTS_METADATA WHERE NEWLOCATION = @newlocation
AND ID =@id

END




-- select @id
FETCH NEXT FROM accounts_cursor into @id

-- SELECT @id = min (id) from ACCOUNTS_METADATA where ID > @id

END

CLOSE accounts_cursor
DEALLOCATE accounts_cursor
END

/*******************************************************************?

sakets_2000
Master Smack Fu Yak Hacker

1472 Posts

Posted - 2009-03-26 : 12:57:51
have you tried a set based approach ??
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2009-03-26 : 13:09:58
For example:
UPDATE AMF SET PageNumber=AM.PageNumber
FROM ACCOUNTS_METADATA_FINAL AMF
INNER JOIN ACCOUNTS_METADATA AM ON AMF.NewLocation=AM.NewLocation

INSERT INTO ACCOUNTS_METADATA_FINAL
SELECT ACCOUNTNUMBER, CABINETNUMBER, CABINETNAME, TYPEOFDOC,
PAGENUMBER, NEWLOCATION, IMAGETYPE, DOCID, VOLID, LOCATION
FROM ACCOUNTS_METADATA AM
WHERE NOT EXISTS(SELECT * FROM ACCOUNTS_METADATA_FINAL WHERE NEWLOCATION=AM.NEWLOCATION)
This does what your current procedure does, the only difference is it will work on all rows, not just the top 1000.

This may be a bug, but the ID from ACCOUNTS_METADATA is irrelevant to your INSERT and UPDATE statements, so the cursor is especially needless.
Go to Top of Page

workpeoplle
Starting Member

10 Posts

Posted - 2009-03-26 : 15:45:11
Thanks for your response!

I'm sorry... obviously, I have done a very poor job of explaining the requirements. Here is what I have to do.

AMF is an empty table. AM has around 5.5+ mil records (one of the smaller tables. The larger ones are a little over 25 mil). Now, i have to insert every unique record as identified by NEWLOCATION into AMF and do a bit more, which i will get to shortly.

AM has the same structure as AMF. Here's some sample data

AM
ColA ColB ColD
A NULL \\a.tif
NULL B \\a.tif

C NULL \\b.tif
NULL D \\b.tif

AMF should look like:
ColA ColB ColD
A B \\a.tif
C D \\b.tif

Won't i need to iterate thought the AMF for every ColD value and insert distinct values into ColD? The code that I wrote in my first post isn't necessary, ofcourse.

And you're right. The ID column is unnecessary. Thanks a bunch for your help!
Go to Top of Page

workpeoplle
Starting Member

10 Posts

Posted - 2009-03-26 : 15:47:09
I mean,

Won't i need to iterate thought AM for every ColD value and insert distinct records into ColD into AMF? The code that I wrote in my first post isn't necessary now, ofcourse
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-03-26 : 16:00:21
SELECT MAX(ColA), MAX(ColB), ColD
FROM AM
GROUP BY ColD



E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

workpeoplle
Starting Member

10 Posts

Posted - 2009-03-26 : 21:42:09
Thanks for your response!

Won't a group by cause too much of overhead for 25 million records?
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2009-03-26 : 21:50:20
Much less than a cursor, that's for sure. If the columns are indexed then it should go very quickly.

Don't want to bust your bubble, but 25 million rows is not a lot these days. Lots of people add that many rows to their databases every day.
Go to Top of Page

workpeoplle
Starting Member

10 Posts

Posted - 2009-03-26 : 22:03:43
I agree that it's much less than a cursor. I'm relatively new at this and don't know much about performance of queries hence so many questions. Please bear with me!

Now, my AM & AMF tables have 29 columns having data like ColA and ColB which I need to essentially roll up into a single record. Can you please tell me a step by step approach to accomplishing this using a group by?

I am running a simple groupp by query alone - INSERT INTO AMF (SELECT MAX(ColA), MAX(ColB)..., ColC From AM GROUP BY ColC) - on 5.5 million records and its been running for 1.5 hours now. Is this normal? How should I go about creating indexes and on which columns? Can you please give me an example?
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2009-03-26 : 22:08:31
For starters:

CREATE INDEX ix_ColA on AMF(ColA)
CREATE INDEX ix_ColB on AMF(ColB)
CREATE INDEX ix_ColC on AMF(ColC)


Also, does this table have a clustered index? You can find out by running sp_helpindex 'AMF'. And if you can, can you post the full DDL (CREATE TABLE) definition for these tables?
Go to Top of Page

workpeoplle
Starting Member

10 Posts

Posted - 2009-03-26 : 22:19:05
Thanks much for your replies! I am on a deadline too so this really helps.

Nope, no clustered indexes. The tables were created with a straightforward create table query:
The columns INVNUMBER to TID are in the fashion ColA and ColB are and need to be rolled up into one record. CABINETNUMBER onwards the data is the same in every row for which one of the other columns are not null.

USE [FDIC_Extract]
GO
/****** Object: Table [dbo].[FINANCIAL_METADATA] Script Date: 03/26/2009 21:13:37 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[AMF](
[INVNUMBER] [varchar](50) NULL,
[IN_TOTAL] [varchar](50) NULL,
[DATE_OR_GROUP] [varchar](50) NULL,
[D_NUMBER] [varchar](50) NULL,
[T_NUMBER] [varchar](50) NULL,
[SBOXNUM] [varchar](50) NULL,
[VENDORNUMBER_GL] [varchar](50) NULL,
[VENDOR_NUMBER] [varchar](50) NULL,
[YEAR] [varchar](10) NULL,
[MONTH] [varchar](10) NULL,
[QUARTER] [varchar](50) NULL,
[DATE] [datetime] NULL,
[DATE_INDEXED] [datetime] NULL,
[DATE_CLOSED] [datetime] NULL,
[DATE_PAID] [datetime] NULL,
[AMT] [varchar](50) NULL,
[F_NUMBER] [varchar](50) NULL,
[PNUM] [varchar](50) NULL,
[B_LOCATION_NUMBER] [varchar](50) NULL,
[TID] [varchar](50) NULL,
[CABINETNUMBER] [int] NULL,
[CABINETNAME] [varchar](80) NULL,
[TYPEOFDOC] [varchar](113) NULL,
[PAGENUMBER] [varchar](50) NULL,
[LOCATIONOFDOCUMENT] [varchar](500) NULL,
[IMAGETYPE] [varchar](255) NULL,
[DOCID] [int] NULL,
[VOLID] [int] NULL,
[ORIGINALLOCATION] [int] NULL
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2009-03-26 : 22:39:26
If you're going to have to pull a max() for most of those columns, don't bother with the indexes then, it will still have to scan the table. Are you grouping by CABINETNUMBER only, or all columns from CABINETNUMBER and after?
Go to Top of Page

workpeoplle
Starting Member

10 Posts

Posted - 2009-03-26 : 23:03:37
All the columns after Cabinetnumber too. I'll have to, won't I, since it's not in an aggregate function? Or should I do a max on the rest of the columns as well and group by only CABINETNUMBER?
Go to Top of Page

workpeoplle
Starting Member

10 Posts

Posted - 2009-03-26 : 23:05:53
What's more expensive? A group by many columns or a max() function?
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2009-03-26 : 23:10:46
If all the other columns are the same for each unique Cabinetnumber value, then yes, you can max() them too.

Regarding the columns from INVNUMBER to TID, do these have lots of distinct values, and you need the actual MAX for each? Or are they mostly null, and you just need any value that's in there?

edit: regarding expensive, either operation will probably work out about the same for this set of data, if each column has statistics on them. It won't matter if you're pulling a max on all the other columns though.
Go to Top of Page

workpeoplle
Starting Member

10 Posts

Posted - 2009-03-26 : 23:19:18
Yes all the other columns are the same for each unique CABINETNUMBER value.

The columns from INVNUMBER to TID don't have a lot of distinct values. Say there are 29 rows from INVNUMBER to TID right? So, every row has one of these columns as NOT NULL, while the rest of the columns are NULL for each unique value of CABINETNUMBER. So every uniques CABINETVALUE will have at the most 29 rows. So, yes, I need any values that's in there.
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2009-03-26 : 23:30:57
I had a few ideas that might speed up the process, but it doesn't sound like they'd be more efficient than just doing a max() on everything and grouping by Cabinetnumber.
Go to Top of Page

workpeoplle
Starting Member

10 Posts

Posted - 2009-03-26 : 23:35:04
oh ok, thanks!
Go to Top of Page
   

- Advertisement -