Author |
Topic |
SHardy
Starting Member
35 Posts |
Posted - 2007-05-11 : 07:16:12
|
Hi all,As part of an "overnight" datawarehouse build, I have an update query which is taking quite some time to run, and I was hoping to get some suggestions as to how I could improve upon it.There are two tables in the query, both of which are also created during the build & neither of which have any indexes.Table 1 - "nav_glentry" - Is a large table reflective of the general ledger. It currently has approx 4.5 million records. "entry_no" is a unique field within this table. It has a populated "dim_cost_centre" field and an empty "dim_old_ini" which needs to be updated.Table 2 - "nav_cost_centre_lookup" - This is a small table (approx 1,000 records) that maps "dim_cost_centre" to "dim_old_ini". These are the only two fields in the table and are both (currently) unique. "dim_cost_centre" will always be unique.So I am updating "dim_old_ini" in Table 1 from Table 2. SQL is as follows:update ngset ng.dim_old_ini = nccl.dim_old_inifrom nav_glentry ng, nav_cost_centre_lookup ncclwhere ng.dim_cost_centre = nccl.dim_cost_centre This is currently taking nearly 3 hours to run.I have tried adding indexes to both tables, but the index tuning wizrd suggests that this would have quite the opposite effect. As such, I have not yet been willing to actually try test running it.I would welcome any suggestions as to how I may be able to cut down on the run time for this action. Would indexes be the way forward? Or will they only add unnecessary processing?Thanks in advance,Simon |
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2007-05-11 : 07:31:43
|
If nav_cost_centre_lookup is that small then it will probably be in memory so indexes won't help.What dataatype is dim_old_ini?Make sure when you populate nav_glentry that the value isn't null, if it's fixed length then set the value to that length otherwise to the length it usually is. In that way you won't get pageb splitting when the update takes place and it can be a lot faster.How long to read through nav_glentry - try a max on a non-indexed column?How long does an update take for a single value - try using literals for the most common value in the table.You don't say how big nav_glentry is - could be faster to create another table rather than update.==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
Kristen
Test
22859 Posts |
Posted - 2007-05-11 : 07:55:00
|
3 hours to update 4.5 million rows? That's scary! Maybe they are all NULL and being extended, and thus each record has to be moved to a space large enough for it?If so could you DEFAULT ng.dim_old_ini to some benign value when you populate the table initially, so that the row doesn't grow when it is Updated (assuming, say, a fixed length datatype like INT)Could any of the rows already have the correct value? If so should be able to reduce the amount of work with:update ngset ng.dim_old_ini = nccl.dim_old_inifrom nav_glentry AS ng JOIN nav_cost_centre_lookup AS nccl ON ng.dim_cost_centre = nccl.dim_cost_centre[red]WHERE ng.dim_old_ini <> nccl.dim_old_ini OR (ng.dim_old_ini IS NULL AND nccl.dim_old_ini IS NOT NULL) OR (ng.dim_old_ini IS NOT NULL AND nccl.dim_old_ini IS NULL) you can slim that down a bit if either of the columns do not permit NULLsIf there is an index on dim_old_ini drop that before you start, and recreate it afterwards.Kristen |
 |
|
harsh_athalye
Master Smack Fu Yak Hacker
5581 Posts |
Posted - 2007-05-11 : 08:05:11
|
"Maybe they are all NULL and being extended, and thus each record has to be moved to a space large enough for it?"Can you explain that a bit more, Kristen?Harsh AthalyeIndia."The IMPOSSIBLE is often UNTRIED" |
 |
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2007-05-11 : 08:06:17
|
quote: Originally posted by harsh_athalye "Maybe they are all NULL and being extended, and thus each record has to be moved to a space large enough for it?"Can you explain that a bit more, Kristen?Harsh AthalyeIndia."The IMPOSSIBLE is often UNTRIED"
Have a look at my post.==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2007-05-11 : 08:11:48
|
Actually for that time it is probably doing a nested loop - i.e. executing each row in turn (have a look at the query plan).You could put an index on the large table (preferably clustered but that's probably not possible) and try updating for each value in turn from the smaller table in a loop - look at the query plan to see what it is doing.If you have a clustered index (hopefully on a single small column like an ID) then first of all run a query to get the ID's and new values into a temp table then do the update using that joining on the ID.==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
Kristen
Test
22859 Posts |
Posted - 2007-05-11 : 08:20:20
|
"Have a look at my post.[i/]"Doh! Sorry Nigel, I missed reading that first time around."[i]Can you explain that a bit more, Kristen?"If you have a record with a variable length column (such as VARCHAR) which is NULL / empty / contains short data, and you put a bigger piece of data in it, then the record won't fit back where it was. SQL Server may then have to move it somewhere else, and that means adjusting the index too (ALL of them if there is no clustered index!)Kristen |
 |
|
Kristen
Test
22859 Posts |
Posted - 2007-05-11 : 08:22:03
|
@SHardy:Make sure you have a Clustered Index on nav_glentryKristen |
 |
|
harsh_athalye
Master Smack Fu Yak Hacker
5581 Posts |
Posted - 2007-05-11 : 08:39:14
|
quote: Originally posted by Kristen "Have a look at my post.[i/]"Doh! Sorry Nigel, I missed reading that first time around."[i]Can you explain that a bit more, Kristen?"If you have a record with a variable length column (such as VARCHAR) which is NULL / empty / contains short data, and you put a bigger piece of data in it, then the record won't fit back where it was. SQL Server may then have to move it somewhere else, and that means adjusting the index too (ALL of them if there is no clustered index!)Kristen
Perfect ! Got it.Thanks Kristen.Harsh AthalyeIndia."The IMPOSSIBLE is often UNTRIED" |
 |
|
SHardy
Starting Member
35 Posts |
Posted - 2007-05-11 : 08:54:17
|
Thanks for the comments. I have taken them onboard and reviewed what is actually going on.Both the "dim_old_ini" & "dim_cost_centre" fields are varchar20 in both tables.When "nav_glentry" is initially created, both fields are created as "space(20)".The "dim_cost_centre" field in "nav_glentry" is then updated (along with some other fields).It is then that the above update happens.As prevously stated, there were no indexes on either table. I have manually added the following indexes for testing:nav_glentry: 1) entry_no - unique index, non clustered 2) dim_cost_centre - non-unique index, non-clusterednav_cost_centre_lookup 1) dim_cost_centre - unique index, non-clusteredI decided to ignore the index tuning wizard & to try running the above statement again. It only took 10 minutes. I did not first reset the field values within nav_glentry, but surely that wouldn't make any difference if they were initially space(20) anyway?I might have to add in the index creation & see how it goes in the actual run. Will update here on Monday with results.I will still check on here if there are any other suggestions/ comments.Many thanks,Simon |
 |
|
Kristen
Test
22859 Posts |
Posted - 2007-05-11 : 09:13:16
|
"It only took 10 minutes"That sounds better!"I did not first reset the field values within nav_glentry"Well, change them all now to 'X' and see if that takes 10 minutes too?I reckon that if you have ANY indexes on nav_glentry you should make one of them Clustered (preferably whatever constitutes the PK or something else "Unique"). Otherwise each Update will have to update ALL the indexes for that table, even though the actual index entry won't have changed.(In the absence of a clustered index EVERY Index stores the "disk address" of the record, whereas with a Clustered Index all the secondary indexes store the Key from the Clustered Index, and thus do not need to be updated when the record "moves")In fact I think both the indexes that you refer to as (1) should be CLUSTERED (in the absence of a Clustered PK that is ...)KristenKristen |
 |
|
SHardy
Starting Member
35 Posts |
Posted - 2007-05-11 : 11:33:58
|
Hello again!I have been doing a bit more testing. And this time I did first reset the field to be updated.Even though I had got it down to 10 minutes, I saw a big improvement on this by dropping the index on the dim_cost_centre field in the nav_glentry table.However, by making the remaining unique index clustered, it was slightly slower than when non-clustered. This was on top of the time it takes to set the clustered index.I will now leave it as follows:nav_glentry:1) entry_no - unique index, non clusterednav_cost_centre_lookup:1) dim_cost_centre - unique index, non-clusteredThis job will run each day over the weekend, so I will check on Monday how it goes and report back.Thanks |
 |
|
Kristen
Test
22859 Posts |
Posted - 2007-05-11 : 12:50:12
|
"making the remaining unique index clustered, it was slightly slower than when non-clustered"I'm surprised. Did you have the dim_cost_centre in place with this test?My expectation would be that a clustered index would mean that you would not have to drop (and recreate) the index on dim_cost_centre - or any other secondary index for that matter, with the exception of any index on dim_old_ini [which I would definitely drop and recreate if you need one])Kristen |
 |
|
|