Author |
Topic |
influent
Constraint Violating Yak Guru
367 Posts |
Posted - 2011-10-27 : 18:41:55
|
Every day on my production server (which is a beast with 64 GB of RAM and 64 cores) the page life expectancy suddenly drops to zero twice per day. It gradually goes up to around 60,000 before dropping, shortly after noon and shortly after midnight. My full backups are around these times, which makes me think the maintenance plan is somehow responsible. Does the Check Database Integrity Task clean out the buffer pool or something? |
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
influent
Constraint Violating Yak Guru
367 Posts |
Posted - 2011-11-03 : 22:02:21
|
Actually not all the backups run at these times, a couple start at 10 PM. Also the maintenance plans only create full and transaction log backups, they don't do any sort of index maintenance. I'll try profiling it soon. What template should I use? |
|
|
Sachin.Nand
2937 Posts |
Posted - 2011-11-04 : 00:21:21
|
quote: Originally posted by influent Actually not all the backups run at these times, a couple start at 10 PM. Also the maintenance plans only create full and transaction log backups, they don't do any sort of index maintenance. I'll try profiling it soon. What template should I use?
Is there any other job scheduled at the same time which might be scanning a huge table ?PBUH |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2011-11-04 : 13:56:46
|
quote: Originally posted by influent Actually not all the backups run at these times, a couple start at 10 PM. Also the maintenance plans only create full and transaction log backups, they don't do any sort of index maintenance. I'll try profiling it soon. What template should I use?
I don't use templates. If your system is using pure stored procedures for data access, then RPC:Completed is enough. If you've got prepared statements, then add SP:StmtCompleted. Also add in SP:Completed just in case. If you've got adhoc stuff, add SQL:BatchCompleted and SQL:StmtCompleted. You will likely have some duplicates in the trace with these, but that's okay. The goal is to capture the problem.You'll be looking for very high reads. Be sure to save the trace to a file to avoid causing performance issues on production, and also run the trace from a different machine. The best possible way to trace is through a server-side trace, but it doesn't sound like you have enough experience to dip your hands into that.Once you've captured the timeframe where the PLE plummets, load it into a table (you can use profiler with save as...) and then query it using:SELECT TOP 1000 Reads, TextData, Duration/1000.0 AS DurationInMsFROM YourTableORDER BY Reads DESCNow you've got to do some analysis starting with the first row in the result set.Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/Subscribe to my blog |
|
|
influent
Constraint Violating Yak Guru
367 Posts |
Posted - 2011-11-05 : 01:57:32
|
quote: The best possible way to trace is through a server-side trace, but it doesn't sound like you have enough experience to dip your hands into that.
Ouch! I shed a tear when I read that. I'm really not very intelligent though so I forgive you. Actually I've run server-side traces several times and that's what I was planning on doing, I'll schedule one for tomorrow night if I have time. I still don't understand why a server with way more RAM than it needs and several active databases would lose all of its PLE due to high reads, which is why I was wondering what template I wanted. I wasn't sure if I was looking for lock escalations or something weird. Doesn't the PLE dropping to zero mean that my entire buffer pool has been suddenly wiped out? |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2011-11-05 : 02:40:52
|
I meant no offense of course. You shouldn't need anything special in the trace to find the culprit here."Way more RAM than it needs" isn't actually the case if your PLE is plummeting. Well sort of. The PLE plummeting has a high probability of a large scan due to a missing index. Have you run the "missing indexes" report to see what SQL Server thinks you are missing? That report plus very high reads in the trace should get you going on finding the root cause here.Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/Subscribe to my blog |
|
|
Sachin.Nand
2937 Posts |
Posted - 2011-11-06 : 01:13:39
|
quote: Originally posted by tkizer ... The PLE plummeting has a high probability of a large scan due to a missing index...
Hmmm.....Isn't that what I had said in the first place ???@influentSet up a trace and a perfmon with this 3 counters CPU time,page faults and PLE value and see if when any high value of CPU time and page faults coincide with PLE value becoming 0.PBUH |
|
|
influent
Constraint Violating Yak Guru
367 Posts |
Posted - 2011-11-07 : 16:50:27
|
I tried to run the trace at noon but the file filled up too fast so I didn't capture anything useful. I did notice another person seems to have had a similar issue with CHECKDB and PLE: http://www.sqlservercentral.com/Forums/Topic891578-146-1.aspx#bm891598I'm removing CHECKDB from the maintenance plan for tonight to see how that goes. I guess I'm not even sure how CHECKDB helps me, will the job fail if it finds a problem? From this post it sounds like I won't be able to get a reliable estimate of how much space CHECKDB will use in tempdb. http://www.sqlskills.com/BLOGS/PAUL/post/How-does-DBCC-CHECKDB-WITH-ESTIMATEONLY-work.aspxBTW, my "missing index" query only suggests two indexes, neither of which is likely to be very helpful. |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2011-11-07 : 16:52:31
|
quote: Originally posted by Sachin.Nand
quote: Originally posted by tkizer ... The PLE plummeting has a high probability of a large scan due to a missing index...
Isn't that what I had said in the first place ???
Who said that you didn't? Your post was very short, so we are trying to elaborate this for the OP.And furthermore, I said it in the very first reply to the OP. So what is your point?Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/Subscribe to my blog |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2011-11-07 : 16:56:46
|
quote: Originally posted by influent I'm removing CHECKDB from the maintenance plan for tonight to see how that goes. I guess I'm not even sure how CHECKDB helps me, will the job fail if it finds a problem?
If there is data corruption, yes it will fail and a DBA should look into it immediately. This type of job should page.quote: Originally posted by influent BTW, my "missing index" query only suggests two indexes, neither of which is likely to be very helpful.
What value do these two have for the "Impact" column?Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/Subscribe to my blog |
|
|
influent
Constraint Violating Yak Guru
367 Posts |
Posted - 2011-11-07 : 18:14:01
|
6% and 42%. The one with 42% is a narrow index but it's for a table with more writes than reads, and the table has 3 non-clustered indexes already. There was already a similar index to the suggested one, it was just missing one column, so I updated the old index with that column and will see how it goes. Anyway I would be shocked if these tables had anything to do with the PLE drops. |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
influent
Constraint Violating Yak Guru
367 Posts |
Posted - 2011-11-07 : 19:23:56
|
Doh! I think we're looking at the same query (effectively) but I was looking at the Avg User Impact column rather than the Total Cost column, which I assume is equivalent to the Impact column. Also I was only returning TOP 2 by accident. Now I get a much bigger list. One index is at 820 million and the other is at 181 million, for Total Cost. I will give those a shot tomorrow after I see what effect my removal of CHECKDB has. Thanks! |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
influent
Constraint Violating Yak Guru
367 Posts |
Posted - 2011-11-08 : 16:53:54
|
At this point I'm certain that CHECKDB is what was causing the PLE drops. Should I put it back in the maintenance plan since the PLE drops don't seem to be causing any problems? |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2011-11-08 : 17:24:55
|
Well it depends. The best practice is to not run CheckDb on production and instead do a scheduled backup/restore to a test machine and do CheckDb there. That's for the user databases at least. You would do the CheckDb on production for the system databases.If you don't have an automated backup/restore test system for CheckDb of user databases, then you definitely need to run CheckDb in production for those user databases.Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/Subscribe to my blog |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
influent
Constraint Violating Yak Guru
367 Posts |
Posted - 2011-11-08 : 19:05:45
|
I have a nightly process that backs up the primary user databases, restores them with a different name (on the prod server), scrubs a few tables for testers, then backs up the scrubbed databases. Am I correct in assuming that if I were to set up a maintenance plan that runs CHECKDB against the scrubbed databases, that would still fill up memory and force the PLE to drop? |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
Previous Page&nsp;
Next Page
|