albertkohl
Aged Yak Warrior
740 Posts |
Posted - 2009-02-23 : 13:31:19
|
hello all, i was wondering if there is a better way to update approx 222 million rows in a table that's 313gb large. currently i'm using:

update t1
set areacode    = substring(phone, 1, 3),
    citystate   = rtrim(t1.city) + ' ' + state,
    countystate = rtrim(t1.county_name) + ' ' + state,
    ltv = case when HMVAL > '0' AND HMVAL <> '0000'
               then cast(mortgage_amount as float) / cast(HMVAL as float)
          end
from mytable t1

i know the citystate and countystate columns are redundant, but we use them for a webapp i wrote a while ago. using better code for that is on the list of things to do, but it's a very long list :(

is there a better way for me to update all these rows? i'm looking for speed; i'm not really worried about locks or up-time. the database model is simple. Thanks!

(also, it might be worth mentioning that a) this is an olap table we update only quarterly, and b) i have a uid column that is an incremented identity, so updating in batches was one idea.)
|
|
sodeep
Master Smack Fu Yak Hacker
7174 Posts |
Posted - 2009-02-23 : 13:43:26
|
Yes, do it in batches so you don't fill up the log.
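One caveat worth spelling out: batching only keeps the log small if the log space can actually be reclaimed between batches. Under the full recovery model that means backing up the log; under simple recovery a checkpoint is enough. A minimal sketch (the database name and backup path are hypothetical):

```sql
-- run between batches to keep the transaction log from growing:

-- full recovery model: back up the log (hypothetical path)
BACKUP LOG mydb TO DISK = 'E:\logbackups\mydb_log.trn';

-- simple recovery model: a checkpoint lets committed log space be reused
CHECKPOINT;
```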
 |
|
|
albertkohl
Aged Yak Warrior
740 Posts |
Posted - 2009-02-23 : 13:46:49
|
can you give me an idea of how to write a loop or something that'll do the job? i imported the flat text file in 500,000-record increments, and that appeared to work well while growing the logs a minimal amount.
 |
|
|
mfemenel
Professor Frink
1421 Posts |
Posted - 2009-02-23 : 14:00:32
|
Something like this would run the statement in million-row chunks, up to the max of the identity column in your table.

declare @max int
declare @counter int

select @max = MAX(uid) from tablename
set @counter = 1

while @counter <= @max
begin
    update tablename
    set field = xyz
    where uid between @counter and @counter + 999999

    set @counter = @counter + 1000000
    select @counter
end

Mike
"oh, that monkey is going to pay"
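A common alternative, in case the uid values are sparse or you'd rather not track range boundaries yourself, is to update a fixed-size chunk with TOP and loop until @@ROWCOUNT reports nothing left. This is a sketch, not the poster's code; it assumes some predicate that identifies not-yet-updated rows (here the hypothetical condition `areacode IS NULL`):

```sql
-- repeat fixed-size batches until no rows qualify
WHILE 1 = 1
BEGIN
    UPDATE TOP (1000000) t1
    SET areacode = SUBSTRING(phone, 1, 3)
    FROM mytable t1
    WHERE areacode IS NULL;   -- hypothetical "not yet updated" predicate

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to update
END
```

The trade-off: no dependence on a contiguous identity, but each batch must be able to find its rows cheaply, so the WHERE predicate should be indexed or the scans will get progressively more expensive.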
 |
|
|
albertkohl
Aged Yak Warrior
740 Posts |
Posted - 2009-02-23 : 14:09:22
|
SWEET! thanks. now, last question: i know it's a HUGE SQL no-no, but is there any way to run this update w/o locking the table and creating blocks? until the update is done and i finish indexing, i pull orders off the table manually. none of the fields i use will be affected by this update, so i'm not worried about dirty reads. Thanks!
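Since dirty reads are explicitly acceptable here, one standard option is to have the SELECT side read past the update's locks with the READ UNCOMMITTED isolation level or the equivalent NOLOCK table hint. A sketch with hypothetical column names and lookup value, not the poster's actual queries:

```sql
-- readers opt into dirty reads so the big UPDATE doesn't block them
SELECT uid, phone, city, state      -- hypothetical column list
FROM mytable WITH (NOLOCK)
WHERE uid = 12345;                  -- hypothetical lookup

-- equivalently, for every read in the session:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
```

The usual caveat applies: NOLOCK reads can see uncommitted, duplicated, or missing rows while the update is in flight, which is exactly the trade the poster says is fine for these columns.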
 |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
|
albertkohl
Aged Yak Warrior
740 Posts |
Posted - 2009-02-23 : 14:35:55
|
that's what i was talking about; sorry i wasn't clear. but i would be running select statements against that table while it's updating. i wouldn't really be worried about the columns being updated, though, because like i said my web-app uses those, and i haven't cut over from our old to our new update yet. so do you think if i do 1000 rows at a time i would cut down on locking either way?
 |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
|
mfemenel
Professor Frink
1421 Posts |
Posted - 2009-02-23 : 15:12:16
|
In my earlier post I didn't mention timing. If you don't have a maintenance window to run this in, then at least choose an off-peak time (late night, early morning, or a weekend). It will run faster, your users will notice it less, and generally everyone will be happier (except maybe you, working off hours... but that's the job).

Mike
"oh, that monkey is going to pay"
 |
|
|
albertkohl
Aged Yak Warrior
740 Posts |
Posted - 2009-02-23 : 15:22:50
|
thanks guys!!!

so, one more quick question: will larger batches cause the entire job to finish sooner, or does the difference between large and small batches really only matter for locking?

Thanks!
 |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
|
albertkohl
Aged Yak Warrior
740 Posts |
Posted - 2009-02-23 : 15:32:45
|
got ya. okay, another one :D (i know, i'm full of them). should i index the table by the UID and the columns i'm updating? right now it's a raw table, no indexing or partitioning... so it seems like finding the UID blocks to update is taking longer.
 |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
|
mfemenel
Professor Frink
1421 Posts |
Posted - 2009-02-23 : 15:54:43
|
IMHO, indexing the UID makes sense if it's part of your where or join clause. I would not create a clustered index, though, or you'll slow yourself down quite a bit, because you'd end up updating both the table and the index. I don't see any benefit to indexing the columns you're updating.

Mike
"oh, that monkey is going to pay"
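A minimal sketch of that advice, with a hypothetical index name: a nonclustered index on uid lets each batch's `WHERE uid BETWEEN ...` do an index seek instead of scanning the whole heap, and it can be dropped once the one-off update is finished:

```sql
-- hypothetical name; nonclustered so the 313gb table itself isn't rebuilt
CREATE NONCLUSTERED INDEX IX_mytable_uid ON mytable (uid);

-- ... run the batched update ...

-- optional: drop it afterward if it was only needed for this job
DROP INDEX IX_mytable_uid ON mytable;
```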
 |
|
|
albertkohl
Aged Yak Warrior
740 Posts |
Posted - 2009-02-23 : 16:48:16
|
sounds good. thank you all again; i'm in the process of running it now. Much appreciated!
 |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
|
albertkohl
Aged Yak Warrior
740 Posts |
Posted - 2009-02-23 : 17:25:50
|
well, i have it running on one server now (since before i started this post), and it's been running for 5hrs; it should be done soon. once it is, i have the same exact table in a test DB on the same server, and i'm gonna try it this way and see if i can shave some time off. hopefully i can. switching from the import/export wizard to bulk insert cut my import time from 8hrs down to 2, so i'm sure this will have a huge impact as well. not to mention i'm not always gonna have a few terabytes free on the server for log expansion during huge updates like this, so cutting down the logging will help a lot in the future. and this has a number of other applications for me too...

Thanks again. i'll give you guys some numbers when i'm done, if you're curious.
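For reference, the bulk-insert approach mentioned above usually looks something like this. The file path, terminators, and batch size here are hypothetical placeholders, not the poster's actual values; TABLOCK is what allows a minimally logged load into a heap under the simple or bulk-logged recovery models:

```sql
-- hypothetical path and format; adjust terminators to match the flat file
BULK INSERT mytable
FROM 'D:\imports\quarterly_data.txt'
WITH (
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR   = '\n',
    BATCHSIZE       = 500000,  -- commit in chunks, as the poster did
    TABLOCK                    -- needed for minimal logging on a heap
);
```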
 |
|
|
|