Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Refactor - Duration/Reads Lower, Writes Higher

Author  Topic 

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2014-12-11 : 16:58:55
Details for #sqlhelp question:

I am working my way through long-running, high CPU and high Reads stored procedures and am puzzled by this one.

Original code:

UPDATE t1
SET [status] = 'C'
FROM dbo.table1 t1
WHERE t1.planned_status = 'Y'
AND t1.[status] = 'N'
AND t1.[wave_type_id] = 1
AND NOT EXISTS (
SELECT *
FROM dbo.table2 t2
JOIN dbo.table3 t3
ON t3.order_number = t2.order_number
AND t3.wh_id = t2.wh_id
WHERE t2.[status] <> 'SHIPPED'
AND t3.wave_id = t1.wave_id
AND t2.[type_id] = 1);

UPDATE t1
SET [status] = 'N'
FROM dbo.table1 t1
WHERE t1.planned_status = 'Y'
AND t1.[status] = 'C'
AND t1.wave_type_id = 1
AND EXISTS (
SELECT *
FROM dbo.table2 t2
JOIN dbo.table3 t3
ON t3.order_number = t2.order_number
AND t3.wh_id = t2.wh_id
WHERE t2.[status] <> 'SHIPPED'
AND t3.wave_id = t1.wave_id
AND t2.[type_id] = 1);


They looked so similar and since it's scanning table3, I figured I'd combine them.

Refactored:

UPDATE t1
SET [status] =
CASE
WHEN t1.[status] = 'N' AND dt.wave_id IS NULL THEN 'C'
WHEN t1.[status] = 'C' AND dt.wave_id IS NOT NULL THEN 'N'
END
FROM dbo.table1 t1
LEFT JOIN (
SELECT DISTINCT t3.wave_id
FROM dbo.table2 t2
JOIN dbo.table3 t3
ON t3.order_number = t2.order_number
AND t3.wh_id = t2.wh_id
WHERE t2.[status] <> 'SHIPPED'
AND t2.[type_id] = 1) dt
ON t1.wave_id = dt.wave_id
WHERE t1.planned_status = 'Y'
AND t1.wave_type_id = 1;


CPU has improved a little, Reads have improved a lot, Duration has improved a bit...but Writes have gone up significantly.

Here's the aggregated results from Profiler:

StoredProcVersion DurationAvgMs DurationMinMs DurationMaxMs ReadsAvg ReadsMin ReadsMax WritesAvg WritesMin WritesMax CpuAvg CpuMin CpuMax
----------------- ------------- ------------- ------------- -------- -------- -------- --------- --------- --------- ------ ------ ------
Baseline 13441 7238 39801 2654503 2610548 2707097 11 0 56 7688 6770 8642
Refactored 7443 6106 16633 611615 603664 613221 4461 580 4605 6410 6068 6973


Why would Writes go up SO much?

If you see a more efficient way to do this, let me know. I'm all ears.

Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2014-12-11 : 17:26:27
I've added "AND t1.[status] IN ('C', 'N')" to the refactored code's WHERE clause. We are re-running a load test to see how it does. The status column can be C, N or NULL.

2nd code change:

UPDATE t1
SET [status] =
CASE
WHEN t1.[status] = 'N' AND dt.wave_id IS NULL THEN 'C'
WHEN t1.[status] = 'C' AND dt.wave_id IS NOT NULL THEN 'N'
END
FROM dbo.table1 t1
LEFT JOIN (
SELECT DISTINCT t3.wave_id
FROM dbo.table2 t2
JOIN dbo.table3 t3
ON t3.order_number = t2.order_number
AND t3.wh_id = t2.wh_id
WHERE t2.[status] <> 'SHIPPED'
AND t2.[type_id] = 1) dt
ON t1.wave_id = dt.wave_id
WHERE t1.planned_status = 'Y'
AND t1.[status] IN ('C', 'N')
AND t1.wave_type_id = 1;


Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2014-12-11 : 17:57:53
3rd code change as 1st and 2nd produced some inaccurate results:

UPDATE t1
SET [status] =
CASE
WHEN t1.[status] = 'N' AND dt.wave_id IS NULL THEN 'C'
WHEN t1.[status] = 'C' AND dt.wave_id IS NOT NULL THEN 'N'
ELSE t1.[status]
END
FROM dbo.table1 t1
LEFT JOIN (
SELECT DISTINCT t3.wave_id
FROM dbo.table2 t2
JOIN dbo.table3 t3
ON t3.order_number = t2.order_number
AND t3.wh_id = t2.wh_id
WHERE t2.[status] <> 'SHIPPED'
AND t2.[type_id] = 1) dt
ON t1.wave_id = dt.wave_id
WHERE t1.planned_status = 'Y'
AND (
(t1.[status] = 'N' AND dt.wave_id IS NULL)
OR
(t1.[status] = 'C' AND dt.wave_id IS NOT NULL)
)
AND t1.wave_type_id = 1;


Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2014-12-11 : 23:34:56
I don't have a better suggestion, but I'm guessing the DISTINCT is spooling somewhere and probably causing the writes to go up. However, once spooled, it would have fewer pages to read and process, leading to the CPU and read reductions you're seeing.

I imagine if you put this on #sqlhelp you had someone weigh in who knows what they're talking about. (I haven't checked).
Go to Top of Page

ScottPletcher
Aged Yak Warrior

550 Posts

Posted - 2014-12-12 : 12:46:33
I suggest this:

UPDATE t1
SET [status] = CASE WHEN t1.[status] = 'C' THEN 'N' ELSE 'C' END
FROM dbo.table1 t1
OUTER APPLY (
SELECT TOP (1) 1 AS check_exists
FROM dbo.table2 t2
JOIN dbo.table3 t3
ON t3.order_number = t2.order_number
AND t3.wh_id = t2.wh_id
WHERE t2.[status] <> 'SHIPPED'
AND t3.wave_id = t1.wave_id
AND t2.[type_id] = 1
) AS ca1
WHERE t1.planned_status = 'Y'
AND t1.[status] IN ('C', 'N')
AND t1.[wave_type_id] = 1
AND ((t1.[status] = 'C' AND ca1.check_exists IS NULL) OR
(t1.[status] = 'N' AND ca1.check_exists IS NOT NULL))

Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2014-12-12 : 12:59:26
We just completed a load test of my 3rd code change above, and here are the results:


StoredProcVersion DurationAvgMs DurationMinMs DurationMaxMs ReadsAvg ReadsMin ReadsMax WritesAvg WritesMin WritesMax CpuAvg CpuMin CpuMax
----------------- ------------- ------------- ------------- -------- -------- -------- --------- --------- --------- ------ ------ ------
Baseline 13441 7238 39801 2654503 2610548 2707097 11 0 56 7688 6770 8642
Refactored 7796 5126 22099 60355 52910 62064 3 0 23 5495 5070 6209


Scott, I will try out your suggestion! I like the idea of TOP 1. I've seen OUTER APPLY many times, but it isn't something I understand yet. Will dig into that.

I'll schedule a load test for your solution. Thank you!

Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2014-12-18 : 13:37:23
We've finally completed a load test that used Scott's solution. I'm a little confused by the results. The OUTER APPLY solution has lower duration and CPU but much higher reads than the LEFT JOIN/DISTINCT solution (but still lower reads than the baseline).


StoredProcVersion DurationAvgMs DurationMinMs DurationMaxMs ReadsAvg ReadsMin ReadsMax WritesAvg WritesMin WritesMax CpuAvg CpuMin CpuMax
-------------------------- ------------- ------------- ------------- -------- -------- --------- --------- --------- --------- ------ ------ -----------
Baseline 13441 7238 39801 2654503 2610548 2707097 11 0 56 7688 6770 8642
RefactoredLeftJoinDistinct 7796 5126 22099 60355 52910 62064 3 0 23 5495 5070 6209
RefactoredOuterApply 1435 481 4595 624327 396091 859621 6 0 26 863 483 1310


Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/
Go to Top of Page

mandm
Posting Yak Master

120 Posts

Posted - 2014-12-18 : 15:50:27
First off let me say that you guys and gals rock when it comes to this stuff and I am not in the same league as you when it comes to SQL Server knowledge. That said I do sometimes find that CTE's work well for situations like this so here is my stab at it.



WITH Sub_Query_CTE (wave_id)
AS
(

SELECT DISTINCT t3.wave_id
FROM dbo.table2 t2
JOIN dbo.table3 t3
ON t3.order_number = t2.order_number
AND t3.wh_id = t2.wh_id
WHERE t2.[status] <> 'SHIPPED'
AND t2.[type_id] = 1

)
UPDATE t1
SET [status] = CASE WHEN t1.[status] = 'N' AND dt.wave_id IS NULL THEN 'C'
WHEN t1.[status] = 'C' AND dt.wave_id IS NOT NULL THEN 'N'
ELSE t1.[status]
END
FROM dbo.table1 t1
LEFT JOIN Sub_Query_CTE AS dt
ON t1.wave_id = dt.wave_id
WHERE t1.planned_status = 'Y'
AND (
(t1.[status] = 'N' AND dt.wave_id IS NULL)
OR
(t1.[status] = 'C' AND dt.wave_id IS NOT NULL)
)
AND t1.wave_type_id = 1;




Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2014-12-18 : 15:55:40
The CTE solution is the same as the LEFT JOIN/DISTINCT solution. It's just a different way of writing it: CTE vs derived tables. So yes it'll be better than the baseline and should be equivalent to the solution I wrote and tested last week, but the OUTER APPLY solution from Scott beats all solutions thus far.

Editing this post as it sounds like I am dismissing your idea. I wanted to say THANK YOU for offering another solution. I will do a test to make sure that what I wrote above regarding CTE vs. derived tables being the same.

Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/
Go to Top of Page

ScottPletcher
Aged Yak Warrior

550 Posts

Posted - 2014-12-18 : 16:02:39
Very interesting. I'd have to see the query plan for my query. I'd hope/expect that SQL would use a lazy/table spool (Edit: or maybe, less hopefully, a work table) for the OUTER APPLY to prevent "re-lookups". That will generate more "reads" (logical reads), but the spool reads will be much more efficient vs. re-running the entire lookup query again.

Thus, if a spool is being used, I can see a fixed LEFT JOIN having technically less overall reads but still possibly taking more time to run.
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2014-12-18 : 16:20:37
I did post the showplan xml but have now deleted it because I realized it was for a different stored procedure that has very similar code. The whoisactive table doesn't contain the info for your query as it's too efficient now.

I'll grab it through a trace or XE session the next time we have a load test run. We're refreshing the PERF environment now, which will take at least a day.

Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/
Go to Top of Page

namman
Constraint Violating Yak Guru

285 Posts

Posted - 2014-12-29 : 18:53:27
Why not doing simple like this :


UPDATE t1
SET [status] = case when t1.status='C' then 'N' when t1.status='N' then 'C' else t1.status end
FROM dbo.table1 t1
WHERE t1.planned_status = 'Y'
AND t1.[wave_type_id] = 1
AND NOT EXISTS (
SELECT *
FROM dbo.table2 t2
JOIN dbo.table3 t3
ON t3.order_number = t2.order_number
AND t3.wh_id = t2.wh_id
WHERE t2.[status] <> 'SHIPPED'
AND t3.wave_id = t1.wave_id
AND t2.[type_id] = 1);
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2014-12-29 : 19:20:22
quote:
Originally posted by namman

Why not doing simple like this :


UPDATE t1
SET [status] = case when t1.status='C' then 'N' when t1.status='N' then 'C' else t1.status end
FROM dbo.table1 t1
WHERE t1.planned_status = 'Y'
AND t1.[wave_type_id] = 1
AND NOT EXISTS (
SELECT *
FROM dbo.table2 t2
JOIN dbo.table3 t3
ON t3.order_number = t2.order_number
AND t3.wh_id = t2.wh_id
WHERE t2.[status] <> 'SHIPPED'
AND t3.wave_id = t1.wave_id
AND t2.[type_id] = 1);




I did a brief test on it as I was suspicious it wasn't functionally equivalent to the original or the OUTER APPLY that we went with. When I switch it to a SELECT for testing purposes, it returns no data but the original does. I didn't dig into it.

Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/
Go to Top of Page

namman
Constraint Violating Yak Guru

285 Posts

Posted - 2014-12-29 : 20:21:51
Surprise ... if returning no data. I Just remove 1 condition from your original post to include all status


UPDATE t1
SET [status] = 'C'
SET [status] = case when t1.status='C' then 'N' when t1.status='N' then 'C' else t1.status end
FROM dbo.table1 t1
WHERE t1.planned_status = 'Y'
AND t1.[status] = 'N'
AND t1.[wave_type_id] = 1
AND NOT EXISTS (
SELECT *
FROM dbo.table2 t2
JOIN dbo.table3 t3
ON t3.order_number = t2.order_number
AND t3.wh_id = t2.wh_id
WHERE t2.[status] <> 'SHIPPED'
AND t3.wave_id = t1.wave_id
AND t2.[type_id] = 1);


Perhaps, I miss something ....
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2014-12-29 : 20:52:45
The original is now returning no data (due to the time of day), but yours is returning 118,000 rows. I think the issue is with status=C needs to be paired with EXISTS in the WHERE clause, while status=N needs to be paired with NOT EXISTS. Yours is both statuses (only two that exist in the table) and NOT EXISTS.

I don't understand the business logic of this stored procedure to be able to explain what it's doing. I can from a T-SQL standpoint, but I wouldn't be adding any value to this thread since those replying are T-SQL geniuses.

Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/
Go to Top of Page

namman
Constraint Violating Yak Guru

285 Posts

Posted - 2014-12-29 : 22:39:43
Got it, that what I miss....

Go to Top of Page
   

- Advertisement -