| Author |
Topic |
|
MikeArnold
Starting Member
11 Posts |
Posted - 2003-06-24 : 03:04:38
|
| I have a situation where I am updating 2 columns of a very large (~15,000,000 rows) table with separate UPDATE statements. These are1.UPDATE dbo.[lasAG_DWFactTable_0000000011] WITH (TABLOCK) SET [H_001] = ISNULL((SELECT TOP 1 A.AttributeId FROM (lasAG_DWAttribute A WITH(NOLOCK) LEFT JOIN (lasAG_DWAttributeRange B WITH (NOLOCK) LEFT JOIN lasAG_DWDriver C WITH (NOLOCK) ON B.DriverId = C.DriverId) ON A.AttributeId = B.AttributeId) WHERE A.HierarchyId = 1 AND ((CAST(dbo.[lasAG_DWFactTable_0000000011].[AccountCode] AS varbinary) BETWEEN B.RangeFrom AND B.RangeTo) OR ( CAST (dbo.[lasAG_DWFactTable_0000000011].[AccountCode] AS nvarchar) LIKE B.Mask COLLATE DATABASE_DEFAULT)) AND CAST(dbo.[lasAG_DWFactTable_0000000011].[Database] AS varbinary) = C.DriverValue), 22)2.UPDATE dbo.[lasAG_DWFactTable_0000000011] WITH (TABLOCK) SET [H_003] = ISNULL((SELECT TOP 1 A.AttributeId FROM (lasAG_DWAttribute A WITH(NOLOCK) LEFT JOIN (lasAG_DWAttributeRange B WITH (NOLOCK) LEFT JOIN lasAG_DWDriver C WITH (NOLOCK) ON B.DriverId = C.DriverId) ON A.AttributeId = B.AttributeId) WHERE A.HierarchyId = 3 AND ((CAST(dbo.[lasAG_DWFactTable_0000000011].[AccountCode] AS varbinary) BETWEEN B.RangeFrom AND B.RangeTo) OR ( CAST (dbo.[lasAG_DWFactTable_0000000011].[AccountCode] AS nvarchar) LIKE B.Mask COLLATE DATABASE_DEFAULT)) AND CAST(dbo.[lasAG_DWFactTable_0000000011].[Database] AS varbinary) = C.DriverValue), 40)In Case 1 the result of the 3 table join in the embedded SELECT is about 150 rows, in Case 2 about 80 rows.However, when I run the statements, the execution times are: Case 1: 2 - 5 minutesCase 2: 40 - 60 minutesThe only difference (that I can see) is that after examining the rows returned (by turning the UPDATE into a SELECT statement) is that in Case 2 the ISNULL logic is being invoked much more often - ie there are relatively few rows that satisfy the correlation between the FactTable and the other tables.Anyone got any ideas on how I can speed this up, and why is Case 2 so much slower??? |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2003-06-24 : 03:24:19
|
Mike,Those Access type queries do my head in.. I formatted that first one slightly differently..UPDATE dbo.[lasAG_DWFactTable_0000000011] WITH (TABLOCK) SET [H_001] = ISNULL((SELECT TOP 1 A.AttributeId FROM lasAG_DWAttribute A WITH(NOLOCK) LEFT JOIN lasAG_DWAttributeRange B WITH (NOLOCK) ON A.AttributeId = B.AttributeIdLEFT JOIN lasAG_DWDriver C WITH (NOLOCK) ON B.DriverId = C.DriverIdWHERE A.HierarchyId = 1 AND ((CAST(dbo.[lasAG_DWFactTable_0000000011].[AccountCode] AS varbinary) BETWEEN B.RangeFrom AND B.RangeTo) OR ( CAST (dbo.[lasAG_DWFactTable_0000000011].[AccountCode] AS nvarchar) LIKE B.Mask COLLATE DATABASE_DEFAULT)) AND CAST(dbo.[lasAG_DWFactTable_0000000011].[Database] AS varbinary) = C.DriverValue), 22) Highlight this statement along with your original 2 and Press CTRL+L in QA.. This will give you a Estimated execution plan.Tell us if there is any difference...DavidM"SQL-3 is an abomination.." |
 |
|
|
MikeArnold
Starting Member
11 Posts |
Posted - 2003-06-24 : 04:32:39
|
| Thanks - I checked the execution plans as you suggested, and there is no difference. I noticed how you changed the join, but I have tried that. As there are so few rows which satisfy the join conditions it's a bit hard to see that this would make any difference.I just tried rolling the two update statements into one, and it took 1h 1m to finish. I'm in the process of getting a server trace to see if they use resources differently.Thanks againMike |
 |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2003-06-24 : 04:46:00
|
I think you can write this a different way...UPDATE dbo.[lasAG_DWFactTable_0000000011] WITH (TABLOCK) SET [H_001] = ISNULL(X.AttributeId,22)FROM(SELECT TOP 1 A.AttributeId, B.RangeFrom, B.RangeTo, B.Mask, C.DriverValueFROM lasAG_DWAttribute A WITH(NOLOCK) LEFT JOIN lasAG_DWAttributeRange B WITH (NOLOCK) ON A.AttributeId = B.AttributeIdLEFT JOIN lasAG_DWDriver C WITH (NOLOCK) ON B.DriverId = C.DriverIdWHERE A.HierarchyId = 1) AS XWHERE ((CAST([AccountCode] AS varbinary) BETWEEN X.RangeFrom AND X.RangeTo) OR ( CAST([AccountCode] AS nvarchar) LIKE X.Mask COLLATE DATABASE_DEFAULT)) AND CAST([Database] AS varbinary) = X.DriverValue Is that the same result and how does it fair?DavidM"SQL-3 is an abomination.." |
 |
|
|
Wanderer
Master Smack Fu Yak Hacker
1168 Posts |
Posted - 2003-06-24 : 05:43:25
|
| Is there some reason I can't figure that you are using TABLOCK and not TABLOCKX ? I assume that you are running at a time when no-one wants access to that table, since you deny access for your entire execution time. *#* *#* *#* *#* Chaos, Disorder and Panic ... my work is done here!Edited by - wanderer on 06/24/2003 05:44:36 |
 |
|
|
MikeArnold
Starting Member
11 Posts |
Posted - 2003-06-24 : 20:04:00
|
| Hi David - I've tried the UPDATE ... FROM syntax and it is much much slower (again for reasons I don't understand) than the correlated subselect UPDATE. |
 |
|
|
MikeArnold
Starting Member
11 Posts |
Posted - 2003-06-24 : 20:16:10
|
| Hi Wanderer .. The only reason I want a TABLOCK is to save server resources - remember the table being updated is ~15,000,000 rows, and in some sites could be much much bigger again. With TABLOCK there is only one lock; without it page locking kicks in, and I've had a look at the syslocks table and there are a bloody lot of locks. There is actually a noticable improvement in response time with the TABLOCK compared to without it. As far as I can see TABLOCK behaves the same as TABLOCKX anyway - I don't see the advantage in this application. |
 |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2003-06-24 : 20:44:27
|
Mike,One more go at a rewrite...UPDATE Fact WITH (TABLOCK) SET Fact.[H_001] = ISNULL(A.AttributeId,22)FROM dbo.[lasAG_DWFactTable_0000000011] AS FactINNER JOIN lasAG_DWAttributeRange B WITH(NOLOCK) ON CAST([AccountCode] AS varbinary) BETWEEN B.RangeFrom AND B.RangeTo OR CAST([AccountCode] AS nvarchar) LIKE B.Mask COLLATE DATABASE_DEFAULTLEFT JOIN lasAG_DWDriver C WITH(NOLOCK) ON CAST([Database] AS varbinary) = C.DriverValue AND B.DriverId = C.DriverIdLEFT JOIN lasAG_DWAttribute A WITH(NOLOCK) ON A.AttributeId = B.AttributeId AND A.HierarchyId = 1 EDIT: I forgot the TOP 1.. can you alter it?DavidM"SQL-3 is an abomination.."Edited by - byrmol on 06/24/2003 20:47:40 |
 |
|
|
MikeArnold
Starting Member
11 Posts |
Posted - 2003-06-24 : 22:52:54
|
| David ... I'm running your suggested SQL statement now, and I'm up to 10 minutes for the "fast" set of parameters instead of 2m+. I checked the execution plan, and while it generates a plan very different to the "standard" plan, it appears that at one point it comes up with a cartesion product - in this case a rowset of 382,000,000 or so rows; that's why it's so slow, I guess.I'm very grateful for your help - any more suggestions are very welcome.PS I ran Performance Monitor for the 2 cases, and the results appear very different. The counters I used were Physical Disk - Avg Disk Queue Length, and Procesor - % processor time.The "fast" update displayed high cpu and high disk queue levels, while the "slow" update displayed high (but very steady) cpu usage, but very low levels of disk usage (with a very occasional burst).Very interesting, but no closer to solving the mystery.CheersMike |
 |
|
|
MikeArnold
Starting Member
11 Posts |
Posted - 2003-06-25 : 01:50:06
|
| Hi all ... NEWS FLASH I have conclusively proved that the execution speed is a function of the number of times that the correlated subselect returns NULL. I tweaked the AttributeRange data for HierarchyId 3 (the "slow" update) so that all the data in the FactTable matches at least one row of the 3-table join. The execution speed now matches that of the "fast" update - ie ~2 minutes.The question is, can anyone think of a way of speeding up the "slow" updates - ie where the subselect returns lots of nulls when correlated to the FactTable???Unfortunately the SQL is generated by a stored procedure distributed as part of a shrink-wrapped application, and is always going to be dependant on whatever the users enter (or don't enter) - so I need to generate "optimal" SQL - I havn't got the luxury of massaging the SQL by hand to match a particular set of circumstances.CheersMike |
 |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2003-06-25 : 02:49:53
|
| Mike,Do you have to have LEFT JOINS? That usually produces a bucket of NULLS..."Bucket of NULLS"Definition: Large quantities of crap..... ;-)DavidM"SQL-3 is an abomination.." |
 |
|
|
MikeArnold
Starting Member
11 Posts |
Posted - 2003-06-25 : 21:14:30
|
| Hi David ... Unfortunately it's the correlation that's producing the nulls - it's what happens when a row in the FactTable can't find at least one corresponding row in the 3-table join, in which case I plug a default value in - and before you ask, I can't use a column DEFAULT to deal with this situation because the default value is different depending on the value of the HierarchyId.I suspect that the reason why it's slow when there are lots of nulls is because each row in the FactTable (restricted over the DriverValue-[Database] join condition) has to search each row in the 3-table join result set (maybe a couple of hundred rows) before it can say for sure - "oh, no match, i'm null - better call the ISNULL function". In the other case, where the FactTable correlation always or nearly always finds a matching row in the 3-table join result set it only has to find the first one before it returns. Even so, I am surprised at the huge differential in execution time.CheersMikeFatDaddy |
 |
|
|
|