| Author |
Topic |
|
workpeoplle
Starting Member
10 Posts |
Posted - 2009-03-26 : 12:27:28
|
| Hi All,

I am trying to check whether a record exists and, if so, update it, using the classic IF EXISTS() approach. The problem is that my source table, ACCOUNTS_METADATA, contains several duplicate records; these need to be filtered out and the records inserted into the final table, ACCOUNTS_METADATA_FINAL. If the row already exists in the final table, the record needs to be updated with the value of PAGENUMBER; otherwise a new record needs to be inserted.

As you can see from the code below and the commented lines, I tried:
1. using a cursor to iterate through the records
2. iterating using the min(id) approach

The moment I start to run this, CPU usage jumps to 100% and no one can run any more queries on the SQL Server; it just stalls. Any suggestions?

/********************************************************************/
ALTER PROCEDURE [dbo].[sp_create_accounts_metadata_final]
    -- Add the parameters for the stored procedure here
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    -- note: a bare "varchar" declaration defaults to length 1, so these
    -- need explicit lengths to avoid silently truncating the values
    DECLARE @id int, @accountnumber varchar(50), @newlocation varchar(500)

    -- using min() to iterate through the table one row at a time
    --SELECT @id = min(id) FROM ACCOUNTS_METADATA

    DECLARE accounts_cursor CURSOR READ_ONLY FAST_FORWARD FOR
        SELECT TOP 1000 ID FROM ACCOUNTS_METADATA

    OPEN accounts_cursor
    FETCH NEXT FROM accounts_cursor INTO @id

    --WHILE @id IS NOT NULL
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SELECT @accountnumber = ACCOUNTNUMBER, @newlocation = NEWLOCATION
        FROM ACCOUNTS_METADATA
        WHERE ID = @id

/*
        UPDATE ACCOUNTS_METADATA_FINAL
        SET PAGENUMBER = '0' --(SELECT PAGENUMBER FROM ACCOUNTS_METADATA WHERE NEWLOCATION = @newlocation)
        WHERE NEWLOCATION = @newlocation

        IF @@ROWCOUNT = 0
            INSERT INTO ACCOUNTS_METADATA_FINAL
            SELECT ACCOUNTNUMBER, CABINETNUMBER, CABINETNAME, TYPEOFDOC,
                   PAGENUMBER, NEWLOCATION, IMAGETYPE, DOCID, VOLID, LOCATION
            FROM ACCOUNTS_METADATA
            WHERE NEWLOCATION = @newlocation AND ID = @id
*/
        IF EXISTS (SELECT * FROM ACCOUNTS_METADATA_FINAL WHERE NEWLOCATION = @newlocation)
        BEGIN
            UPDATE ACCOUNTS_METADATA_FINAL
            SET PAGENUMBER = (SELECT PAGENUMBER FROM ACCOUNTS_METADATA WHERE NEWLOCATION = @newlocation)
            WHERE NEWLOCATION = @newlocation
        END
        ELSE
        BEGIN
            INSERT INTO ACCOUNTS_METADATA_FINAL
            SELECT ACCOUNTNUMBER, CABINETNUMBER, CABINETNAME, TYPEOFDOC,
                   PAGENUMBER, NEWLOCATION, IMAGETYPE, DOCID, VOLID, LOCATION
            FROM ACCOUNTS_METADATA
            WHERE NEWLOCATION = @newlocation AND ID = @id
        END

        -- select @id
        FETCH NEXT FROM accounts_cursor INTO @id
        -- SELECT @id = min(id) FROM ACCOUNTS_METADATA WHERE ID > @id
    END

    CLOSE accounts_cursor
    DEALLOCATE accounts_cursor
END
/*******************************************************************/ |
|
|
sakets_2000
Master Smack Fu Yak Hacker
1472 Posts |
Posted - 2009-03-26 : 12:57:51
|
| Have you tried a set-based approach? |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2009-03-26 : 13:09:58
|
For example:

UPDATE AMF
SET PageNumber = AM.PageNumber
FROM ACCOUNTS_METADATA_FINAL AMF
INNER JOIN ACCOUNTS_METADATA AM ON AMF.NewLocation = AM.NewLocation

INSERT INTO ACCOUNTS_METADATA_FINAL
SELECT ACCOUNTNUMBER, CABINETNUMBER, CABINETNAME, TYPEOFDOC,
       PAGENUMBER, NEWLOCATION, IMAGETYPE, DOCID, VOLID, LOCATION
FROM ACCOUNTS_METADATA AM
WHERE NOT EXISTS (SELECT * FROM ACCOUNTS_METADATA_FINAL WHERE NEWLOCATION = AM.NEWLOCATION)

This does what your current procedure does; the only difference is that it works on all rows, not just the top 1000. This may be a bug, but the ID from ACCOUNTS_METADATA is irrelevant to your INSERT and UPDATE statements, so the cursor is especially needless. |
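Since you're on SQL Server 2008 (going by the timestamps), a single MERGE can do the update-or-insert in one pass. A sketch only, assuming NEWLOCATION is the matching key as above; note that MERGE raises an error if more than one source row matches a target row, so the duplicates in ACCOUNTS_METADATA have to be rolled up first (here with a GROUP BY):

```sql
-- Sketch: requires SQL Server 2008+, and assumes one source row per
-- NEWLOCATION after the GROUP BY (MERGE errors out on duplicate matches).
MERGE ACCOUNTS_METADATA_FINAL AS AMF
USING (
    SELECT NEWLOCATION,
           MAX(ACCOUNTNUMBER) AS ACCOUNTNUMBER, MAX(CABINETNUMBER) AS CABINETNUMBER,
           MAX(CABINETNAME)   AS CABINETNAME,   MAX(TYPEOFDOC)     AS TYPEOFDOC,
           MAX(PAGENUMBER)    AS PAGENUMBER,    MAX(IMAGETYPE)     AS IMAGETYPE,
           MAX(DOCID)         AS DOCID,         MAX(VOLID)         AS VOLID,
           MAX(LOCATION)      AS LOCATION
    FROM ACCOUNTS_METADATA
    GROUP BY NEWLOCATION
) AS AM
ON AMF.NEWLOCATION = AM.NEWLOCATION
WHEN MATCHED THEN
    UPDATE SET PAGENUMBER = AM.PAGENUMBER
WHEN NOT MATCHED THEN
    INSERT (ACCOUNTNUMBER, CABINETNUMBER, CABINETNAME, TYPEOFDOC, PAGENUMBER,
            NEWLOCATION, IMAGETYPE, DOCID, VOLID, LOCATION)
    VALUES (AM.ACCOUNTNUMBER, AM.CABINETNUMBER, AM.CABINETNAME, AM.TYPEOFDOC,
            AM.PAGENUMBER, AM.NEWLOCATION, AM.IMAGETYPE, AM.DOCID, AM.VOLID, AM.LOCATION);
```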
 |
|
|
workpeoplle
Starting Member
10 Posts |
Posted - 2009-03-26 : 15:45:11
|
| Thanks for your response! I'm sorry; obviously, I have done a very poor job of explaining the requirements. Here is what I have to do.

AMF is an empty table. AM has around 5.5+ million records (one of the smaller tables; the larger ones are a little over 25 million). Now, I have to insert every unique record, as identified by NEWLOCATION, into AMF, and do a bit more, which I will get to shortly. AM has the same structure as AMF. Here's some sample data:

AM
ColA  ColB  ColD
A     NULL  \\a.tif
NULL  B     \\a.tif
C     NULL  \\b.tif
NULL  D     \\b.tif

AMF should look like:
ColA  ColB  ColD
A     B     \\a.tif
C     D     \\b.tif

Won't I need to iterate through the AMF for every ColD value and insert distinct values into ColD? The code that I wrote in my first post isn't necessary, of course. And you're right, the ID column is unnecessary. Thanks a bunch for your help! |
 |
|
|
workpeoplle
Starting Member
10 Posts |
Posted - 2009-03-26 : 15:47:09
|
| I mean: won't I need to iterate through AM for every ColD value and insert the distinct record for each ColD into AMF? The code that I wrote in my first post isn't necessary now, of course. |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2009-03-26 : 16:00:21
|
SELECT MAX(ColA), MAX(ColB), ColD
FROM AM
GROUP BY ColD

E 12°55'05.63" N 56°04'39.26" |
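This works because MAX() ignores NULLs, so within each ColD group the single non-NULL value in each column survives. Against your sample data, the full load would look something like this (a sketch; substitute your real column names):

```sql
INSERT INTO AMF (ColA, ColB, ColD)
SELECT MAX(ColA), MAX(ColB), ColD
FROM AM
GROUP BY ColD
-- \\a.tif rolls up to (A, B, \\a.tif); \\b.tif rolls up to (C, D, \\b.tif)
```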
 |
|
|
workpeoplle
Starting Member
10 Posts |
Posted - 2009-03-26 : 21:42:09
|
| Thanks for your response! Won't a GROUP BY cause too much overhead for 25 million records? |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2009-03-26 : 21:50:20
|
| Much less than a cursor, that's for sure. If the columns are indexed, then it should go very quickly. Don't want to burst your bubble, but 25 million rows is not a lot these days; lots of people add that many rows to their databases every day. |
 |
|
|
workpeoplle
Starting Member
10 Posts |
Posted - 2009-03-26 : 22:03:43
|
| I agree that it's much less than a cursor. I'm relatively new at this and don't know much about query performance, hence so many questions; please bear with me!

Now, my AM and AMF tables have 29 columns with data like ColA and ColB, which I need to essentially roll up into a single record. Can you please tell me a step-by-step approach to accomplishing this using a GROUP BY?

I am running a simple GROUP BY query alone -

INSERT INTO AMF
SELECT MAX(ColA), MAX(ColB), ..., ColC
FROM AM
GROUP BY ColC

- on 5.5 million records, and it's been running for 1.5 hours now. Is this normal? How should I go about creating indexes, and on which columns? Can you please give me an example? |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2009-03-26 : 22:08:31
|
| For starters:

CREATE INDEX ix_ColA ON AMF(ColA)
CREATE INDEX ix_ColB ON AMF(ColB)
CREATE INDEX ix_ColC ON AMF(ColC)

Also, does this table have a clustered index? You can find out by running sp_helpindex 'AMF'. And if you can, can you post the full DDL (CREATE TABLE) definition for these tables? |
 |
|
|
workpeoplle
Starting Member
10 Posts |
Posted - 2009-03-26 : 22:19:05
|
| Thanks much for your replies! I am on a deadline too, so this really helps.

Nope, no clustered indexes. The tables were created with a straightforward CREATE TABLE query. The columns INVNUMBER to TID are in the fashion ColA and ColB are, and need to be rolled up into one record. From CABINETNUMBER onwards, the data is the same in every row for which one of the other columns is not null.

USE [FDIC_Extract]
GO
/****** Object:  Table [dbo].[FINANCIAL_METADATA]    Script Date: 03/26/2009 21:13:37 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[AMF](
    [INVNUMBER] [varchar](50) NULL,
    [IN_TOTAL] [varchar](50) NULL,
    [DATE_OR_GROUP] [varchar](50) NULL,
    [D_NUMBER] [varchar](50) NULL,
    [T_NUMBER] [varchar](50) NULL,
    [SBOXNUM] [varchar](50) NULL,
    [VENDORNUMBER_GL] [varchar](50) NULL,
    [VENDOR_NUMBER] [varchar](50) NULL,
    [YEAR] [varchar](10) NULL,
    [MONTH] [varchar](10) NULL,
    [QUARTER] [varchar](50) NULL,
    [DATE] [datetime] NULL,
    [DATE_INDEXED] [datetime] NULL,
    [DATE_CLOSED] [datetime] NULL,
    [DATE_PAID] [datetime] NULL,
    [AMT] [varchar](50) NULL,
    [F_NUMBER] [varchar](50) NULL,
    [PNUM] [varchar](50) NULL,
    [B_LOCATION_NUMBER] [varchar](50) NULL,
    [TID] [varchar](50) NULL,
    [CABINETNUMBER] [int] NULL,
    [CABINETNAME] [varchar](80) NULL,
    [TYPEOFDOC] [varchar](113) NULL,
    [PAGENUMBER] [varchar](50) NULL,
    [LOCATIONOFDOCUMENT] [varchar](500) NULL,
    [IMAGETYPE] [varchar](255) NULL,
    [DOCID] [int] NULL,
    [VOLID] [int] NULL,
    [ORIGINALLOCATION] [int] NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2009-03-26 : 22:39:26
|
| If you're going to have to pull a max() for most of those columns, don't bother with the indexes; it will still have to scan the table. Are you grouping by CABINETNUMBER only, or by all columns from CABINETNUMBER onwards? |
 |
|
|
workpeoplle
Starting Member
10 Posts |
Posted - 2009-03-26 : 23:03:37
|
| All the columns after CABINETNUMBER too. I'll have to, won't I, since they're not in an aggregate function? Or should I do a MAX on the rest of the columns as well and group by only CABINETNUMBER? |
 |
|
|
workpeoplle
Starting Member
10 Posts |
Posted - 2009-03-26 : 23:05:53
|
| What's more expensive: a GROUP BY on many columns, or a MAX() function? |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2009-03-26 : 23:10:46
|
| If all the other columns are the same for each unique Cabinetnumber value, then yes, you can max() them too.Regarding the columns from INVNUMBER to TID, do these have lots of distinct values, and you need the actual MAX for each? Or are they mostly null, and you just need any value that's in there?edit: regarding expensive, either operation will probably work out about the same for this set of data, if each column has statistics on them. It won't matter if you're pulling a max on all the other columns though. |
 |
|
|
workpeoplle
Starting Member
10 Posts |
Posted - 2009-03-26 : 23:19:18
|
| Yes, all the other columns are the same for each unique CABINETNUMBER value. The columns from INVNUMBER to TID don't have a lot of distinct values. Say there are 29 columns from INVNUMBER to TID, right? Every row has exactly one of these columns NOT NULL, while the rest of them are NULL, for each unique value of CABINETNUMBER. So every unique CABINETNUMBER value will have at most 29 rows. So, yes, I just need any value that's in there. |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2009-03-26 : 23:30:57
|
| I had a few ideas that might speed up the process, but it doesn't sound like they'd be more efficient than just doing a max() on everything and grouping by Cabinetnumber. |
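Against the DDL you posted, that would look something like this (a sketch only; I'm assuming AM has the same 29 columns as AMF and that CABINETNUMBER is the grouping key, per the discussion above):

```sql
-- MAX() ignores NULLs, so each group collapses to the one non-NULL
-- value per column; column order follows the posted CREATE TABLE.
INSERT INTO AMF
SELECT MAX(INVNUMBER), MAX(IN_TOTAL), MAX(DATE_OR_GROUP), MAX(D_NUMBER),
       MAX(T_NUMBER), MAX(SBOXNUM), MAX(VENDORNUMBER_GL), MAX(VENDOR_NUMBER),
       MAX([YEAR]), MAX([MONTH]), MAX([QUARTER]), MAX([DATE]),
       MAX(DATE_INDEXED), MAX(DATE_CLOSED), MAX(DATE_PAID), MAX(AMT),
       MAX(F_NUMBER), MAX(PNUM), MAX(B_LOCATION_NUMBER), MAX(TID),
       CABINETNUMBER, MAX(CABINETNAME), MAX(TYPEOFDOC), MAX(PAGENUMBER),
       MAX(LOCATIONOFDOCUMENT), MAX(IMAGETYPE), MAX(DOCID), MAX(VOLID),
       MAX(ORIGINALLOCATION)
FROM AM
GROUP BY CABINETNUMBER
```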
 |
|
|
workpeoplle
Starting Member
10 Posts |
Posted - 2009-03-26 : 23:35:04
|
| oh ok, thanks! |
 |
|
|
|