SQL Server Forums
Profile | Register | Active Topics | Members | Search | Forum FAQ
 
Register Now and get your question answered!
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 SQL Server 2008 Forums
 SQL Server Administration (2008)
 regular drops of page life expectancy to zero
 New Topic  Reply to Topic
 Printer Friendly
Next Page
Author Previous Topic Topic Next Topic
Page: of 2

influent
Constraint Violating Yak Guru

USA
367 Posts

Posted - 10/27/2011 :  18:41:55  Show Profile  Reply with Quote
Every day on my production server (which is a beast with 64 GB of RAM and 64 cores) the page life expectancy suddenly drops to zero twice per day. It gradually goes up to around 60,000 before dropping, shortly after noon and shortly after midnight. My full backups are around these times, which makes me think the maintenance plan is somehow responsible. Does the Check Database Integrity Task clean out the buffer pool or something?

tkizer
Almighty SQL Goddess

USA
37316 Posts

Posted - 10/27/2011 :  19:39:22  Show Profile  Visit tkizer's Homepage  Reply with Quote
Update statistics will affect the PLE, plus rebuilding of indexes. I don't think CHECKDB does, but I could be wrong.

What all does your plan do?

Outside of the plan, I'd look into large scans occurring. Have you profiled it?

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

Edited by - tkizer on 10/27/2011 19:40:28
Go to Top of Page

influent
Constraint Violating Yak Guru

USA
367 Posts

Posted - 11/03/2011 :  22:02:21  Show Profile  Reply with Quote
Actually not all the backups run at these times, a couple start at 10 PM. Also the maintenance plans only create full and transaction log backups, they don't do any sort of index maintenance. I'll try profiling it soon. What template should I use?
Go to Top of Page

Sachin.Nand
Flowing Fount of Yak Knowledge

2937 Posts

Posted - 11/04/2011 :  00:21:21  Show Profile  Reply with Quote
quote:
Originally posted by influent

Actually not all the backups run at these times, a couple start at 10 PM. Also the maintenance plans only create full and transaction log backups, they don't do any sort of index maintenance. I'll try profiling it soon. What template should I use?



Is there any other job scheduled at the same time which might be scanning a huge table ?

PBUH

Go to Top of Page

tkizer
Almighty SQL Goddess

USA
37316 Posts

Posted - 11/04/2011 :  13:56:46  Show Profile  Visit tkizer's Homepage  Reply with Quote
quote:
Originally posted by influent

Actually not all the backups run at these times, a couple start at 10 PM. Also the maintenance plans only create full and transaction log backups, they don't do any sort of index maintenance. I'll try profiling it soon. What template should I use?



I don't use templates. If your system is using pure stored procedures for data access, then RPC:Completed is enough. If you've got prepared statements, then add SP:StmtCompleted. Also add in SP:Completed just in case. If you've got adhoc stuff, add SQL:BatchCompleted and SQL:StmtCompleted. You will likely have some duplicates in the trace with these, but that's okay. The goal is to capture the problem.

You'll be looking for very high reads. Be sure to save the trace to a file to avoid causing performance issues on production, and also run the trace from a different machine. The best possible way to trace is through a server-side trace, but it doesn't sound like you have enough experience to dip your hands into that.

Once you've captured the timeframe where the PLE plummets, load it into a table (you can use profiler with save as...) and then query it using:

SELECT TOP 1000 Reads, TextData, Duration/1000.0 AS DurationInMs
FROM YourTable
ORDER BY Reads DESC

Now you've got to do some analysis starting with the first row in the result set.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

influent
Constraint Violating Yak Guru

USA
367 Posts

Posted - 11/05/2011 :  01:57:32  Show Profile  Reply with Quote
quote:
The best possible way to trace is through a server-side trace, but it doesn't sound like you have enough experience to dip your hands into that.

Ouch! I shed a tear when I read that. I'm really not very intelligent though so I forgive you. Actually I've run server-side traces several times and that's what I was planning on doing, I'll schedule one for tomorrow night if I have time. I still don't understand why a server with way more RAM than it needs and several active databases would lose all of its PLE due to high reads, which is why I was wondering what template I wanted. I wasn't sure if I was looking for lock escalations or something weird. Doesn't the PLE dropping to zero mean that my entire buffer pool has been suddenly wiped out?
Go to Top of Page

tkizer
Almighty SQL Goddess

USA
37316 Posts

Posted - 11/05/2011 :  02:40:52  Show Profile  Visit tkizer's Homepage  Reply with Quote
I meant no offense of course. You shouldn't need anything special in the trace to find the culprit here.

"Way more RAM than it needs" isn't actually the case if your PLE is plummeting. Well sort of. The PLE plummeting has a high probability of a large scan due to a missing index. Have you run the "missing indexes" report to see what SQL Server thinks you are missing? That report plus very high reads in the trace should get you going on finding the root cause here.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

Sachin.Nand
Flowing Fount of Yak Knowledge

2937 Posts

Posted - 11/06/2011 :  01:13:39  Show Profile  Reply with Quote
quote:
Originally posted by tkizer

... The PLE plummeting has a high probability of a large scan due to a missing index...



Hmmm.....Isn't that what I had said in the first place ???

@influent

Set up a trace and a perfmon with this 3 counters CPU time,page faults and PLE value and see if when any high value of CPU time and page faults coincide with PLE value becoming 0.


PBUH

Go to Top of Page

influent
Constraint Violating Yak Guru

USA
367 Posts

Posted - 11/07/2011 :  16:50:27  Show Profile  Reply with Quote
I tried to run the trace at noon but the file filled up too fast so I didn't capture anything useful. I did notice another person seems to have had a similar issue with CHECKDB and PLE: http://www.sqlservercentral.com/Forums/Topic891578-146-1.aspx#bm891598

I'm removing CHECKDB from the maintenance plan for tonight to see how that goes. I guess I'm not even sure how CHECKDB helps me, will the job fail if it finds a problem?

From this post it sounds like I won't be able to get a reliable estimate of how much space CHECKDB will use in tempdb.
http://www.sqlskills.com/BLOGS/PAUL/post/How-does-DBCC-CHECKDB-WITH-ESTIMATEONLY-work.aspx

BTW, my "missing index" query only suggests two indexes, neither of which is likely to be very helpful.
Go to Top of Page

tkizer
Almighty SQL Goddess

USA
37316 Posts

Posted - 11/07/2011 :  16:52:31  Show Profile  Visit tkizer's Homepage  Reply with Quote
quote:
Originally posted by Sachin.Nand

quote:
Originally posted by tkizer

... The PLE plummeting has a high probability of a large scan due to a missing index...



Isn't that what I had said in the first place ???



Who said that you didn't? Your post was very short, so we are trying to elaborate this for the OP.

And furthermore, I said it in the very first reply to the OP. So what is your point?

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

Edited by - tkizer on 11/07/2011 16:53:27
Go to Top of Page

tkizer
Almighty SQL Goddess

USA
37316 Posts

Posted - 11/07/2011 :  16:56:46  Show Profile  Visit tkizer's Homepage  Reply with Quote
quote:
Originally posted by influent


I'm removing CHECKDB from the maintenance plan for tonight to see how that goes. I guess I'm not even sure how CHECKDB helps me, will the job fail if it finds a problem?



If there is data corruption, yes it will fail and a DBA should look into it immediately. This type of job should page.

quote:
Originally posted by influent


BTW, my "missing index" query only suggests two indexes, neither of which is likely to be very helpful.



What value do these two have for the "Impact" column?

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

influent
Constraint Violating Yak Guru

USA
367 Posts

Posted - 11/07/2011 :  18:14:01  Show Profile  Reply with Quote
6% and 42%. The one with 42% is a narrow index but it's for a table with more writes than reads, and the table has 3 non-clustered indexes already. There was already a similar index to the suggested one, it was just missing one column, so I updated the old index with that column and will see how it goes. Anyway I would be shocked if these tables had anything to do with the PLE drops.
Go to Top of Page

tkizer
Almighty SQL Goddess

USA
37316 Posts

Posted - 11/07/2011 :  18:18:06  Show Profile  Visit tkizer's Homepage  Reply with Quote
I think you are using a different "missing indexes" report than me. Impact column is a derived column. If any are over 10 million, you should consider adding them. And if they are waaaayyyy over 10 million, then definitely add them. I had one at 6 billion before I I added it, and boy did it make a huge performance difference when I finally added it (for the better of course).

http://sqlserverpedia.com/wiki/Find_Missing_Indexes

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

influent
Constraint Violating Yak Guru

USA
367 Posts

Posted - 11/07/2011 :  19:23:56  Show Profile  Reply with Quote
Doh! I think we're looking at the same query (effectively) but I was looking at the Avg User Impact column rather than the Total Cost column, which I assume is equivalent to the Impact column. Also I was only returning TOP 2 by accident. Now I get a much bigger list. One index is at 820 million and the other is at 181 million, for Total Cost. I will give those a shot tomorrow after I see what effect my removal of CHECKDB has. Thanks!
Go to Top of Page

tkizer
Almighty SQL Goddess

USA
37316 Posts

Posted - 11/07/2011 :  19:27:10  Show Profile  Visit tkizer's Homepage  Reply with Quote
I am referring to this column: (avg_total_user_cost * avg_user_impact) * (user_seeks + user_scans) AS Impact

I throw the entire thing into a derived table and then use Impact > 10000000 so that I don't have to be bothered with the output of the the stuff that doesn't matter (yet). I do look at the rest of the output too, just not initially.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

influent
Constraint Violating Yak Guru

USA
367 Posts

Posted - 11/08/2011 :  16:53:54  Show Profile  Reply with Quote
At this point I'm certain that CHECKDB is what was causing the PLE drops. Should I put it back in the maintenance plan since the PLE drops don't seem to be causing any problems?
Go to Top of Page

tkizer
Almighty SQL Goddess

USA
37316 Posts

Posted - 11/08/2011 :  17:24:55  Show Profile  Visit tkizer's Homepage  Reply with Quote
Well it depends. The best practice is to not run CheckDb on production and instead do a scheduled backup/restore to a test machine and do CheckDb there. That's for the user databases at least. You would do the CheckDb on production for the system databases.

If you don't have an automated backup/restore test system for CheckDb of user databases, then you definitely need to run CheckDb in production for those user databases.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

tkizer
Almighty SQL Goddess

USA
37316 Posts

Posted - 11/08/2011 :  17:27:29  Show Profile  Visit tkizer's Homepage  Reply with Quote
I'm going to ask Paul Randal about the PLE and CHECKDB. He wrote it, so he'll know what's going on.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

tkizer
Almighty SQL Goddess

USA
37316 Posts

Posted - 11/08/2011 :  17:28:55  Show Profile  Visit tkizer's Homepage  Reply with Quote
Nevermind, I just did a google search on the topic and yes it is impacted. If it completely plummets, then you probably don't have enough memory to counter just how many pages it is pulling into memory for the CHECKDB process. If it does ever affect performance, consider adding more RAM.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

Edited by - tkizer on 11/08/2011 17:29:52
Go to Top of Page

influent
Constraint Violating Yak Guru

USA
367 Posts

Posted - 11/08/2011 :  19:05:45  Show Profile  Reply with Quote
I have a nightly process that backs up the primary user databases, restores them with a different name (on the prod server), scrubs a few tables for testers, then backs up the scrubbed databases. Am I correct in assuming that if I were to set up a maintenance plan that runs CHECKDB against the scrubbed databases, that would still fill up memory and force the PLE to drop?
Go to Top of Page

tkizer
Almighty SQL Goddess

USA
37316 Posts

Posted - 11/08/2011 :  19:11:04  Show Profile  Visit tkizer's Homepage  Reply with Quote
I replied with yes initially, but I need someone else like Gail Shaw to answer this.

Memory is shared on an instance, but each database has its own pool. I am not sure if pages read into memory for CHECKDB of db1 would affect what pages are in memory for db2, assuming the db2 pages are "old".

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page
Page: of 2 Previous Topic Topic Next Topic  
Next Page
 New Topic  Reply to Topic
 Printer Friendly
Jump To:
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC Go To Top Of Page
This page was generated in 0.39 seconds. Powered By: Snitz Forums 2000