SQL Server Forums

 All Forums
 SQL Server 2012 Forums
 Transact-SQL (2012)
 Verify WHERE clause

tkizer
Almighty SQL Goddess

Posted - 12/03/2013 :  13:39:33
For performance reasons, I'm analyzing and rewriting purge code that's currently in production. Could someone verify that I've properly rewritten it so that the two versions are equivalent as far as data goes?

Current: WHERE DATEADD(day, TTL_DAYS, CREATED_TIME) < GETDATE()

There is an index on CREATED_TIME, but as you know we need to isolate the indexed column to one side of the WHERE clause. The execution plan currently shows a clustered index scan, and this is being done in a loop (deleting in batches) so the clustered index scan is happening numerous times.

Is this the equivalent to the above?:
New: WHERE CREATED_TIME < DATEADD(day, -TTL_DAYS, GETDATE())

Is there a better way (performance-wise) to write it?
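For reference, the batched delete loop in question looks roughly like this (the table name MyMainTable and the TOP size are made-up; TTL_DAYS and CREATED_TIME are the real columns, and the predicate is the proposed rewrite):

```sql
-- Rough sketch of the batched purge loop with the rewritten predicate.
-- MyMainTable and the 10000 batch size are assumptions.
DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (10000) M
    FROM MyMainTable AS M
    WHERE M.CREATED_TIME < DATEADD(day, -M.TTL_DAYS, GETDATE());

    SET @rows = @@ROWCOUNT;
END
```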

Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/

Lamprey
Flowing Fount of Yak Knowledge

Posted - 12/03/2013 :  13:45:43
Looks equivalent to me.

I don't think you are going to gain any performance there, but you might be able to take advantage of a computed column.
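A sketch of the computed-column idea (PURGE_AT and the index name are made-up; DATEADD over two datetime/int columns is deterministic, so the column can be persisted and indexed):

```sql
-- Hypothetical computed column: precompute the expiry date once so the
-- purge predicate compares an indexed column directly against GETDATE().
ALTER TABLE MyMainTable
    ADD PURGE_AT AS DATEADD(day, TTL_DAYS, CREATED_TIME) PERSISTED;

CREATE INDEX IX_MyMainTable_PurgeAt ON MyMainTable (PURGE_AT);

-- The WHERE clause then becomes sargable:
-- WHERE PURGE_AT < GETDATE()
```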


tkizer
Almighty SQL Goddess

Posted - 12/03/2013 :  13:51:51
I thought functions on the indexed column cause execution plan issues (a scan)? That was the case in earlier versions of SQL Server at least; I haven't yet tested it in 2012. The recommendation in the past was always to isolate the indexed column on one side of the WHERE clause.


Lamprey
Flowing Fount of Yak Knowledge

Posted - 12/03/2013 :  15:14:22
That is true, but since you are using another column in the calculation I think it will still scan. Unless TTL_DAYS is not a column...?

Obviously you can/should try it out and see if it makes a difference. :)

TG
Flowing Fount of Yak Knowledge

Posted - 12/03/2013 :  15:36:20
>>Is there a better way (performance-wise) to write it?

If you can't avoid the scan for each iteration then maybe insert the PKs for all the rows to delete into a temp table in one shot. Then chunk delete the real table JOINed to the temp by PK.
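Something like this, assuming a single-column PK called ID (all names here are made-up):

```sql
-- Step 1: capture the keys of every expired row in one pass.
SELECT M.ID
INTO #PurgeKeys
FROM MyMainTable AS M
WHERE M.CREATED_TIME < DATEADD(day, -M.TTL_DAYS, GETDATE());

-- Step 2: delete in chunks by joining back to the real table on the PK.
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (10000) M
    FROM MyMainTable AS M
         JOIN #PurgeKeys AS K ON K.ID = M.ID;

    SET @rows = @@ROWCOUNT;
END
```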

Be One with the Optimizer
TG

tkizer
Almighty SQL Goddess

Posted - 12/03/2013 :  15:46:19
Yes, TTL_DAYS is a column. It stands for "time to live", so it's basically the purge retention setting.

Using the temp table approach is how I've done it in the past. I'm not the author of this particular code; I'm just the one having to analyze it.

I definitely will be testing the performance of the current vs. new. Will post results when done.

Thank you for your help!


Kristen
Test

Posted - 12/03/2013 :  19:25:07
quote:
Originally posted by TG

>>Is there a better way (performance-wise) to write it?

If you can't avoid the scan for each iteration then maybe insert the PKs for all the rows to delete into a temp table in one shot. Then chunk delete the real table JOINed to the temp by PK.


That's what we do too. We insert into the TEMP table with an IDENTITY and an ORDER BY on the PK (actually it needs to be the clustered index keys, of course) of the table we are deleting from, so that when we delete in chunks it is done in PK order - hopefully we are deleting contiguous rows from the table, and if not then at least many index entries from the same page in the clustered index.

We have a 2-second WAIT between each loop iteration too (to allow other processes to run), and we calculate the elapsed time of each loop and increase/decrease the batch size dynamically - so if the workload on the server increases, the purge reduces its CPU impact.

Final thing: we back up TLogs at a 2-minute interval (normally 10 minutes) during the purge so that the purge itself is not responsible for log file extension. For us the purge represents the largest log transaction volume of any activity we perform, as at other times we are far more read than write.
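Roughly, the pacing loop looks like this (every name and threshold here is made-up; only the 2-second wait is from the description above, and #PurgeKeys is assumed to be the temp table of keys built beforehand):

```sql
-- Hypothetical pacing loop: delete in batches, pause between them, and
-- grow/shrink the batch size based on how long each batch took.
DECLARE @batchSize INT = 10000,
        @rows INT = 1,
        @start DATETIME,
        @ms INT;

WHILE @rows > 0
BEGIN
    SET @start = GETDATE();

    DELETE TOP (@batchSize) M
    FROM MyMainTable AS M
         JOIN #PurgeKeys AS K ON K.ID = M.ID;

    SET @rows = @@ROWCOUNT;
    SET @ms = DATEDIFF(millisecond, @start, GETDATE());

    -- Fast batch: the server is quiet, so take bigger bites.
    IF @ms < 1000
        SET @batchSize = @batchSize * 2;
    -- Slow batch: back off so other processes aren't starved.
    ELSE IF @ms > 5000 AND @batchSize > 1000
        SET @batchSize = @batchSize / 2;

    WAITFOR DELAY '00:00:02';  -- the 2s pause between iterations
END
```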

Kristen
Test

Posted - 12/03/2013 :  19:28:21
quote:
Originally posted by tkizer

New: WHERE CREATED_TIME < DATEADD(day, -TTL_DAYS, GETDATE())

Is there a better way (performance-wise) to write it?


Personally I would pre-calculate DATEADD(day, -TTL_DAYS, GETDATE()) into an @WorkingVariable and then do

WHERE CREATED_TIME < @WorkingVariable

but I expect the optimiser is smart enough to do the Calc upfront, so probably no difference.

Using an @Variable is sometimes helpful if we need a DEBUG in there to check what the calculated cutoff actually was, or need to log it, or some other jiggery-pokery, hence I prefer to use the @Variable from the get-go rather than modify the WHERE clause later to accommodate it (and then find it kills performance unexpectedly!)

Sorry ... waffling again!

tkizer
Almighty SQL Goddess

Posted - 12/03/2013 :  19:48:56
quote:
Originally posted by Kristen


Personally I would pre-calculate DATEADD(day, -TTL_DAYS, GETDATE()) into an @WorkingVariable and then do

WHERE CREATED_TIME < @WorkingVariable



TTL_DAYS can be different for each row in the table. Right now there are only 3 distinct values among the table's several million rows. It's a shared database that other applications use (I'm not clear on what for; I'm mostly a production DBA and don't get involved with business logic), and each application specifies its own data retention (the TTL_DAYS column). Right now only 3 applications are using it, and the values are 1, 7, and 60. So I could use a variable, but I would need to set it for each distinct TTL_DAYS.

I am hoping that isolating the CREATED_TIME column to the left is enough to get around the scan. I'm currently baselining the production version. My next test will be adding 3 indexes that are missing from various tables being purged, and then I'll test the new WHERE clause. If it doesn't pan out, then I'll look into rewriting it. I'd rather avoid a total rewrite because it'll mean the I&T team has to test it more thoroughly.

The purge job is currently failing in production due to an FK violation. When I started looking into why (it's due to using GETDATE() for each table rather than deleting by a set date stored in @purgeDate), I realized we've got a major performance issue when it runs. I'm glad I caught this perf issue now instead of later, when many more applications will be using it.
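The fix for the FK violation is just to capture the cutoff once and use it for every table, e.g. (ChildTable, ParentID, and ID are made-up names; @purgeDate is the variable mentioned above):

```sql
-- Capture one cutoff so parent and child tables are purged against the
-- same instant, instead of calling GETDATE() per table.
DECLARE @purgeDate DATETIME = GETDATE();

-- Children first, then parents, both using the same @purgeDate.
DELETE C
FROM ChildTable AS C
     JOIN MyMainTable AS M ON M.ID = C.ParentID
WHERE M.CREATED_TIME < DATEADD(day, -M.TTL_DAYS, @purgeDate);

DELETE M
FROM MyMainTable AS M
WHERE M.CREATED_TIME < DATEADD(day, -M.TTL_DAYS, @purgeDate);
```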


Kristen
Test

Posted - 12/04/2013 :  03:53:47
quote:
Originally posted by tkizer


TTL_DAYS can be different for each row in the table.


Apologies, it was late and I didn't read it carefully; your code is perfectly clear - now that it's daylight!

quote:
Right now there are only 3 distinct values amongst the several million row table. ... So I could use a variable, but I would need to set it for each distinct TTL_DAYS.


I've understood that you want to minimally fix it for now, but my gut feeling is that there would be merit in your DISTINCT TTL_DAYS solution:

SELECT DISTINCT TTL_DAYS,
       DATEADD(day, -TTL_DAYS, GETDATE()) AS [MyCutoffDate]
INTO #TEMP
FROM MyMainTable

DELETE M
FROM MyMainTable AS M
     JOIN #TEMP AS T
         ON T.TTL_DAYS = M.TTL_DAYS
WHERE M.CREATED_TIME < T.MyCutoffDate


If you put the PKeys into a #TEMP table and then use that as a "list" for the actual purge, in batches, the initial SELECT will be low-impact (even if it takes a while); it's the DELETEs that are more likely to BLOCK etc., and my aim is always to keep the elapsed time of those to a minimum.

Although if you have quiet time in the middle of the night, you could maybe just let the purge run brute-force.

But I'm sure you know all this, Tara, so I'm just talking out loud in case I say something that sparks an idea for you.

tkizer
Almighty SQL Goddess

Posted - 12/04/2013 :  17:27:15
The new WHERE clause didn't fix the scan, just as Lamprey suspected.

I'm going to find out if we can add a computed column, otherwise I'm going down the temp table/PK approach.


