regular drops of page life expectancy to zero

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

SQL Server Administration (2008)

regular drops of page life expectancy to zero

Author

Topic

influent
Constraint Violating Yak Guru

367 Posts

Posted - 2011-10-27 : 18:41:55

Every day on my production server (which is a beast with 64 GB of RAM and 64 cores) the page life expectancy suddenly drops to zero twice per day. It gradually goes up to around 60,000 before dropping, shortly after noon and shortly after midnight. My full backups are around these times, which makes me think the maintenance plan is somehow responsible. Does the Check Database Integrity Task clean out the buffer pool or something?

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-10-27 : 19:39:22

Update statistics will affect the PLE, plus rebuilding of indexes. I don't think CHECKDB does, but I could be wrong.

What all does your plan do?

Outside of the plan, I'd look into large scans occurring. Have you profiled it?

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

influent
Constraint Violating Yak Guru

367 Posts

Posted - 2011-11-03 : 22:02:21

Actually not all the backups run at these times, a couple start at 10 PM. Also the maintenance plans only create full and transaction log backups, they don't do any sort of index maintenance. I'll try profiling it soon. What template should I use?

Sachin.Nand

2937 Posts

Posted - 2011-11-04 : 00:21:21

quote:
Originally posted by influent

Actually not all the backups run at these times, a couple start at 10 PM. Also the maintenance plans only create full and transaction log backups, they don't do any sort of index maintenance. I'll try profiling it soon. What template should I use?

Is there any other job scheduled at the same time which might be scanning a huge table ?

PBUH

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-11-04 : 13:56:46

quote:
Originally posted by influent

Actually not all the backups run at these times, a couple start at 10 PM. Also the maintenance plans only create full and transaction log backups, they don't do any sort of index maintenance. I'll try profiling it soon. What template should I use?

I don't use templates. If your system is using pure stored procedures for data access, then RPC:Completed is enough. If you've got prepared statements, then add SP:StmtCompleted. Also add in SP:Completed just in case. If you've got adhoc stuff, add SQL:BatchCompleted and SQL:StmtCompleted. You will likely have some duplicates in the trace with these, but that's okay. The goal is to capture the problem.

You'll be looking for very high reads. Be sure to save the trace to a file to avoid causing performance issues on production, and also run the trace from a different machine. The best possible way to trace is through a server-side trace, but it doesn't sound like you have enough experience to dip your hands into that.

Once you've captured the timeframe where the PLE plummets, load it into a table (you can use profiler with save as...) and then query it using:

SELECT TOP 1000 Reads, TextData, Duration/1000.0 AS DurationInMs
FROM YourTable
ORDER BY Reads DESC

Now you've got to do some analysis starting with the first row in the result set.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

influent
Constraint Violating Yak Guru

367 Posts

Posted - 2011-11-05 : 01:57:32

quote:
The best possible way to trace is through a server-side trace, but it doesn't sound like you have enough experience to dip your hands into that.

Ouch! I shed a tear when I read that. I'm really not very intelligent though so I forgive you. Actually I've run server-side traces several times and that's what I was planning on doing, I'll schedule one for tomorrow night if I have time. I still don't understand why a server with way more RAM than it needs and several active databases would lose all of its PLE due to high reads, which is why I was wondering what template I wanted. I wasn't sure if I was looking for lock escalations or something weird. Doesn't the PLE dropping to zero mean that my entire buffer pool has been suddenly wiped out?

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-11-05 : 02:40:52

I meant no offense of course. You shouldn't need anything special in the trace to find the culprit here.

"Way more RAM than it needs" isn't actually the case if your PLE is plummeting. Well sort of. The PLE plummeting has a high probability of a large scan due to a missing index. Have you run the "missing indexes" report to see what SQL Server thinks you are missing? That report plus very high reads in the trace should get you going on finding the root cause here.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

Sachin.Nand

2937 Posts

Posted - 2011-11-06 : 01:13:39

quote:
Originally posted by tkizer

... The PLE plummeting has a high probability of a large scan due to a missing index...

Hmmm.....Isn't that what I had said in the first place ???

@influent

Set up a trace and a perfmon with this 3 counters CPU time,page faults and PLE value and see if when any high value of CPU time and page faults coincide with PLE value becoming 0.

PBUH

influent
Constraint Violating Yak Guru

367 Posts

Posted - 2011-11-07 : 16:50:27

I tried to run the trace at noon but the file filled up too fast so I didn't capture anything useful. I did notice another person seems to have had a similar issue with CHECKDB and PLE: http://www.sqlservercentral.com/Forums/Topic891578-146-1.aspx#bm891598

I'm removing CHECKDB from the maintenance plan for tonight to see how that goes. I guess I'm not even sure how CHECKDB helps me, will the job fail if it finds a problem?

From this post it sounds like I won't be able to get a reliable estimate of how much space CHECKDB will use in tempdb.
http://www.sqlskills.com/BLOGS/PAUL/post/How-does-DBCC-CHECKDB-WITH-ESTIMATEONLY-work.aspx

BTW, my "missing index" query only suggests two indexes, neither of which is likely to be very helpful.

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-11-07 : 16:52:31

quote:
Originally posted by Sachin.Nand

quote:
Originally posted by tkizer

... The PLE plummeting has a high probability of a large scan due to a missing index...

Isn't that what I had said in the first place ???

Who said that you didn't? Your post was very short, so we are trying to elaborate this for the OP.

And furthermore, I said it in the very first reply to the OP. So what is your point?

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-11-07 : 16:56:46

quote:
Originally posted by influent

I'm removing CHECKDB from the maintenance plan for tonight to see how that goes. I guess I'm not even sure how CHECKDB helps me, will the job fail if it finds a problem?

If there is data corruption, yes it will fail and a DBA should look into it immediately. This type of job should page.

quote:
Originally posted by influent

BTW, my "missing index" query only suggests two indexes, neither of which is likely to be very helpful.

What value do these two have for the "Impact" column?

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

influent
Constraint Violating Yak Guru

367 Posts

Posted - 2011-11-07 : 18:14:01

6% and 42%. The one with 42% is a narrow index but it's for a table with more writes than reads, and the table has 3 non-clustered indexes already. There was already a similar index to the suggested one, it was just missing one column, so I updated the old index with that column and will see how it goes. Anyway I would be shocked if these tables had anything to do with the PLE drops.

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-11-07 : 18:18:06

I think you are using a different "missing indexes" report than me. Impact column is a derived column. If any are over 10 million, you should consider adding them. And if they are waaaayyyy over 10 million, then definitely add them. I had one at 6 billion before I I added it, and boy did it make a huge performance difference when I finally added it (for the better of course).

http://sqlserverpedia.com/wiki/Find_Missing_Indexes

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

influent
Constraint Violating Yak Guru

367 Posts

Posted - 2011-11-07 : 19:23:56

Doh! I think we're looking at the same query (effectively) but I was looking at the Avg User Impact column rather than the Total Cost column, which I assume is equivalent to the Impact column. Also I was only returning TOP 2 by accident. Now I get a much bigger list. One index is at 820 million and the other is at 181 million, for Total Cost. I will give those a shot tomorrow after I see what effect my removal of CHECKDB has. Thanks!

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-11-07 : 19:27:10

I am referring to this column: (avg_total_user_cost * avg_user_impact) * (user_seeks + user_scans) AS Impact

I throw the entire thing into a derived table and then use Impact > 10000000 so that I don't have to be bothered with the output of the the stuff that doesn't matter (yet). I do look at the rest of the output too, just not initially.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

influent
Constraint Violating Yak Guru

367 Posts

Posted - 2011-11-08 : 16:53:54

At this point I'm certain that CHECKDB is what was causing the PLE drops. Should I put it back in the maintenance plan since the PLE drops don't seem to be causing any problems?

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-11-08 : 17:24:55

Well it depends. The best practice is to not run CheckDb on production and instead do a scheduled backup/restore to a test machine and do CheckDb there. That's for the user databases at least. You would do the CheckDb on production for the system databases.

If you don't have an automated backup/restore test system for CheckDb of user databases, then you definitely need to run CheckDb in production for those user databases.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-11-08 : 17:27:29

I'm going to ask Paul Randal about the PLE and CHECKDB. He wrote it, so he'll know what's going on.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-11-08 : 17:28:55

Nevermind, I just did a google search on the topic and yes it is impacted. If it completely plummets, then you probably don't have enough memory to counter just how many pages it is pulling into memory for the CHECKDB process. If it does ever affect performance, consider adding more RAM.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

influent
Constraint Violating Yak Guru

367 Posts

Posted - 2011-11-08 : 19:05:45

I have a nightly process that backs up the primary user databases, restores them with a different name (on the prod server), scrubs a few tables for testers, then backs up the scrubbed databases. Am I correct in assuming that if I were to set up a maintenance plan that runs CHECKDB against the scrubbed databases, that would still fill up memory and force the PLE to drop?

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2011-11-08 : 19:11:04

I replied with yes initially, but I need someone else like Gail Shaw to answer this.

Memory is shared on an instance, but each database has its own pool. I am not sure if pages read into memory for CHECKDB of db1 would affect what pages are in memory for db2, assuming the db2 pages are "old".

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources