Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Can't make this update work - advice??

Author  Topic 

Will H
Yak Posting Veteran

56 Posts

Posted - 2008-02-26 : 21:24:58
Hi - I'm in a situation with a very large table, and trying to run an update that, any way I've approached it so far, seems to be taking unnacceptably long to run. Table has about 20 million rows, looks something like this:

ID - int, identity
Type - varchar(50)
PurchaseNumber - varchar(50)
SalesAmount - Money

ID Type PurchaseNumber SalesAmount
1 A 3834AA38384 20.32$
2 B 3834AA38384 11837.32$
3 C 3834AA38384 666.32$
4 C 887DF88U01H 23423.32$
5 A 887DF88U01H 12.32$
6 B OI83999FH28 4747.1$
7 D 38438495985 9384.6
8 E 02939DDJJWI 22.22$
9 F 07939SDFDF2 33.33$

The goal of the update is to make the [Type] uniform across [PurchaseNumbers], according to the max sales amount. For each PurchaseNumber a, set the type = the type of the row that has the MAX salesAmount. If there is only one entry for PurchaseNumber, leave the type alone. Expected update after completion would look like this:

ID Type PurchaseNumber SalesAmount
1 B 3834AA38384 20.32$
2 B 3834AA38384 11837.32$
3 B 3834AA38384 666.32$
4 C 887DF88U01H 23423.32$
5 C 887DF88U01H 12.32$
6 B OI83999FH28 4747.1$
7 D 38438495985 9384.6
8 E 02939DDJJWI 22.22$
9 F 07939SDFDF2 33.33$

I got this out of a warehouse, and it definitely isn't normalized well. Was considering breaking down into a better model, but I'm not yet sure if that would make the update easier.

I've been approaching this with sub-queries (finding all the PurchaseNumbers with more then one entry, then the max sales purchase of that purchase Number, then the type of that purchase number and sales amount to update all of that purchase number) but this not only ends up a little messy, but also very slow.

The only other detail that may be important is that out of the 20 million total rows, about 19.5 million purchaseNumbers are unique. So, really, there are only about 500k rows I actually have to update.

I've thought of a few ways to make this work, but none of them seem fast and wanted to see if anyone had a pointer. Thanks!


dataguru1971
Master Smack Fu Yak Hacker

1464 Posts

Posted - 2008-02-26 : 22:40:31
While only 500K rows need to be updated, 20 million need to be looked at in order to change the ID as per your specs..

Make an index on ID and PurchaseNumber on the 20 million row table.

Create a distinct list of PurchaseNumbers and the "appropriate" Type but filter only for those that need to be updated (having count>1) so you have a small list.
create an index on that table on PurchaseNumber
Run your update doing an inner join on those purchase numbers and change the ID to the value from your smaller table.
something like this (without the create index statements..but just the general idea)

It should take that long...(I have had to updates on well over 50 million rows that do something similar...it works pretty fast)


Create Table UpdateSource (PurchaseNumber char(10) not null,[NewType] char(1) Not null, [OLDType] char(1) not null)

Insert into TypeSource
Select PurchaseNumber
FROM [TableName]
Group by PurchaseNumber
Having count(*) > 1

Update TypeSource
SET [NewType] = {your requirements here},
[OldType] = a.[Type]
From TypeSource b inner join (Select PurchaseNumber,[Type]
FROM [TableName]
Where [Type] = {your requirements here?}) a
on b.PurchaseNumber = a.PurchaseNumber

Update [TableName]
Set [Type] = b.Type
FROM TypeSource b inner join [TableName] a
on b.PurchaseNumber = a.PurchaseNumber






Poor planning on your part does not constitute an emergency on my part.

Go to Top of Page

ayamas
Aged Yak Warrior

552 Posts

Posted - 2008-02-27 : 02:07:30
Hi.Try this one.

Create table #Test(ID int identity,
[Type]varchar(50),
PurchaseNumber varchar(50),
SalesAmount Money)
Insert into #Test
select 'A' ,'3834AA38384', 20.32 union all
select'B' ,'3834AA38384', 11837.32 union all
select 'C', '3834AA38384', 666.32 union all
select'C', '887DF88U01H', 23423.32 union all
select'A' ,'887DF88U01H' ,12.32 union all
select'B', 'OI83999FH28', 4747.1 union all
select'D', '38438495985', 9384.6 union all
select'E','02939DDJJWI', 22.22 union all
select 'F' ,'07939SDFDF2', 33.33

;With CTE(ID,[Type],PurchaseNumber,SalesAmount,rowid)as
(

select * from
(
select ID,[Type],PurchaseNumber,SalesAmount ,row_number() over(partition by PurchaseNumber
order by PurchaseNumber,SalesAmount desc)as rowid from #Test
) t

)

update CTE set CTE.[Type]=C1.[Type] from CTE C1 inner join CTE
on CTE.PurchaseNumber=C1.PurchaseNumber and C1.rowid=1

select * from #Test

drop table #Test
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-02-27 : 02:25:46
As long as Purchasenumber is not reused after a single streak, try this
DECLARE @Sample TABLE (ID INT, Type VARCHAR(50), PurchaseNumber VARCHAR(50), SalesAmount MONEY)

INSERT @Sample
SELECT 1, 'A', '3834AA38384', 20.32 UNION ALL
SELECT 2, 'B', '3834AA38384', 11837.32 UNION ALL
SELECT 3, 'C', '3834AA38384', 666.32 UNION ALL
SELECT 4, 'C', '887DF88U01H', 23423.32 UNION ALL
SELECT 5, 'A', '887DF88U01H', 12.32 UNION ALL
SELECT 6, 'B', 'OI83999FH28', 4747.1 UNION ALL
SELECT 7, 'D', '38438495985', 9384.6 UNION ALL
SELECT 8, 'E', '02939DDJJWI', 22.22 UNION ALL
SELECT 9, 'F', '07939SDFDF2', 33.33

SELECT * FROM @Sample

UPDATE e
SET e.Type = y.Type
FROM @Sample AS e
INNER JOIN (
SELECT s.PurchaseNumber,
s.Type
FROM @Sample AS s
INNER JOIN (
SELECT PurchaseNumber,
MAX(SalesAmount) AS theAmount
FROM @Sample
GROUP BY PurchaseNumber
HAVING COUNT(*) > 1
) AS x ON x.PurchaseNumber = s.PurchaseNumber
WHERE s.SalesAmount = x.theAmount
) AS y ON y.PurchaseNumber = e.PurchaseNumber

SELECT * FROM @Sample



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-02-27 : 02:26:49
Also see this article
http://www.sqlteam.com/article/detecting-runs-or-streaks-in-your-data



E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

Will H
Yak Posting Veteran

56 Posts

Posted - 2008-02-27 : 20:13:19
Thanks all - This works like a charm now. DataGuru, I think the addition of the indexes were the key to making this work. The Update process takes about 20 seconds, which is "fast enough" for me on a process that should only run nightly.
Go to Top of Page
   

- Advertisement -