Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
Will H
Yak Posting Veteran
56 Posts |
Posted - 2008-02-26 : 21:24:58
|
| Hi - I'm in a situation with a very large table, and trying to run an update that, any way I've approached it so far, seems to be taking unnacceptably long to run. Table has about 20 million rows, looks something like this:ID - int, identityType - varchar(50)PurchaseNumber - varchar(50)SalesAmount - MoneyID Type PurchaseNumber SalesAmount1 A 3834AA38384 20.32$2 B 3834AA38384 11837.32$3 C 3834AA38384 666.32$4 C 887DF88U01H 23423.32$5 A 887DF88U01H 12.32$6 B OI83999FH28 4747.1$7 D 38438495985 9384.68 E 02939DDJJWI 22.22$9 F 07939SDFDF2 33.33$The goal of the update is to make the [Type] uniform across [PurchaseNumbers], according to the max sales amount. For each PurchaseNumber a, set the type = the type of the row that has the MAX salesAmount. If there is only one entry for PurchaseNumber, leave the type alone. Expected update after completion would look like this:ID Type PurchaseNumber SalesAmount1 B 3834AA38384 20.32$2 B 3834AA38384 11837.32$3 B 3834AA38384 666.32$4 C 887DF88U01H 23423.32$5 C 887DF88U01H 12.32$6 B OI83999FH28 4747.1$7 D 38438495985 9384.68 E 02939DDJJWI 22.22$9 F 07939SDFDF2 33.33$I got this out of a warehouse, and it definitely isn't normalized well. Was considering breaking down into a better model, but I'm not yet sure if that would make the update easier. I've been approaching this with sub-queries (finding all the PurchaseNumbers with more then one entry, then the max sales purchase of that purchase Number, then the type of that purchase number and sales amount to update all of that purchase number) but this not only ends up a little messy, but also very slow. The only other detail that may be important is that out of the 20 million total rows, about 19.5 million purchaseNumbers are unique. So, really, there are only about 500k rows I actually have to update.I've thought of a few ways to make this work, but none of them seem fast and wanted to see if anyone had a pointer. Thanks! |
|
|
dataguru1971
Master Smack Fu Yak Hacker
1464 Posts |
Posted - 2008-02-26 : 22:40:31
|
While only 500K rows need to be updated, 20 million need to be looked at in order to change the ID as per your specs..Make an index on ID and PurchaseNumber on the 20 million row table.Create a distinct list of PurchaseNumbers and the "appropriate" Type but filter only for those that need to be updated (having count>1) so you have a small list. create an index on that table on PurchaseNumberRun your update doing an inner join on those purchase numbers and change the ID to the value from your smaller table.something like this (without the create index statements..but just the general idea)It should take that long...(I have had to updates on well over 50 million rows that do something similar...it works pretty fast)Create Table UpdateSource (PurchaseNumber char(10) not null,[NewType] char(1) Not null, [OLDType] char(1) not null)Insert into TypeSourceSelect PurchaseNumberFROM [TableName]Group by PurchaseNumberHaving count(*) > 1Update TypeSourceSET [NewType] = {your requirements here}, [OldType] = a.[Type]From TypeSource b inner join (Select PurchaseNumber,[Type] FROM [TableName] Where [Type] = {your requirements here?}) a on b.PurchaseNumber = a.PurchaseNumberUpdate [TableName]Set [Type] = b.TypeFROM TypeSource b inner join [TableName] a on b.PurchaseNumber = a.PurchaseNumber Poor planning on your part does not constitute an emergency on my part. |
 |
|
|
ayamas
Aged Yak Warrior
552 Posts |
Posted - 2008-02-27 : 02:07:30
|
| Hi.Try this one.Create table #Test(ID int identity,[Type]varchar(50),PurchaseNumber varchar(50),SalesAmount Money)Insert into #Testselect 'A' ,'3834AA38384', 20.32 union all select'B' ,'3834AA38384', 11837.32 union all select 'C', '3834AA38384', 666.32 union all select'C', '887DF88U01H', 23423.32 union all select'A' ,'887DF88U01H' ,12.32 union all select'B', 'OI83999FH28', 4747.1 union all select'D', '38438495985', 9384.6 union all select'E','02939DDJJWI', 22.22 union all select 'F' ,'07939SDFDF2', 33.33;With CTE(ID,[Type],PurchaseNumber,SalesAmount,rowid)as(select * from ( select ID,[Type],PurchaseNumber,SalesAmount ,row_number() over(partition by PurchaseNumber order by PurchaseNumber,SalesAmount desc)as rowid from #Test ) t)update CTE set CTE.[Type]=C1.[Type] from CTE C1 inner join CTE on CTE.PurchaseNumber=C1.PurchaseNumber and C1.rowid=1select * from #Testdrop table #Test |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2008-02-27 : 02:25:46
|
As long as Purchasenumber is not reused after a single streak, try thisDECLARE @Sample TABLE (ID INT, Type VARCHAR(50), PurchaseNumber VARCHAR(50), SalesAmount MONEY)INSERT @SampleSELECT 1, 'A', '3834AA38384', 20.32 UNION ALLSELECT 2, 'B', '3834AA38384', 11837.32 UNION ALLSELECT 3, 'C', '3834AA38384', 666.32 UNION ALLSELECT 4, 'C', '887DF88U01H', 23423.32 UNION ALLSELECT 5, 'A', '887DF88U01H', 12.32 UNION ALLSELECT 6, 'B', 'OI83999FH28', 4747.1 UNION ALLSELECT 7, 'D', '38438495985', 9384.6 UNION ALLSELECT 8, 'E', '02939DDJJWI', 22.22 UNION ALLSELECT 9, 'F', '07939SDFDF2', 33.33SELECT * FROM @SampleUPDATE eSET e.Type = y.TypeFROM @Sample AS eINNER JOIN ( SELECT s.PurchaseNumber, s.Type FROM @Sample AS s INNER JOIN ( SELECT PurchaseNumber, MAX(SalesAmount) AS theAmount FROM @Sample GROUP BY PurchaseNumber HAVING COUNT(*) > 1 ) AS x ON x.PurchaseNumber = s.PurchaseNumber WHERE s.SalesAmount = x.theAmount ) AS y ON y.PurchaseNumber = e.PurchaseNumberSELECT * FROM @Sample E 12°55'05.25"N 56°04'39.16" |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
|
|
Will H
Yak Posting Veteran
56 Posts |
Posted - 2008-02-27 : 20:13:19
|
| Thanks all - This works like a charm now. DataGuru, I think the addition of the indexes were the key to making this work. The Update process takes about 20 seconds, which is "fast enough" for me on a process that should only run nightly. |
 |
|
|
|
|
|
|
|