Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Random SQL Timeout Errors

Author  Topic 

flyer299
Starting Member

6 Posts

Posted - 2010-05-26 : 12:09:46
Been tracking down a tough problem. Normally I am very good about optimizing my SPROCS to run very fast.

In this case, I have one procedure that is very optimized. I've looked at it every way possible with execution plans, stats, etc. When I run it in development and production it takes 0-1 seconds to run.

When the ASP.net Front end runs it, it runs very fast. Then out of the blue it starts to time out. My logs and Monitors indicates it now takes 30-45 seconds to run.

Here is the kicker. Once it times out, it will ALWAYS Time out. Until I make a change to the procedure... Then change it back. Then it runs for a day or two, then back we are. So I re exec the proc and it fixes it.

When we are in the mode of it timing out, I open SQL Manager Studio and exec the SPROC in a query window. It runs fast... 0-1 seconds. Looking at the execution plan, everything is perfect. I go tot he website and run the same procedure and it times out. I do an IISReset, it still times outs.

I belive that SQL is doing something in the background that isn't completing.

So far I've checked and done:
- Auto Stats On
- Aysnch Stats ON (thought this would fix it)
- Auto Shrink Off
- Auto Grow ON (and I've padded some extra empty space in the Data and Log file to make sure)

I am pulling my hair out on this one!

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-05-26 : 12:45:41
It's due to a bad plan in cache. To fix it, you just need to recompile the stored procedure as that clears the plans in cache.

The reason why it is running fast for you in SSMS and not fast from the website is because you are getting a different plan. In order to mimic the website call, you'll need to use the same exact parameters plus the same options (SET commands for the session).

To get around a bad plan from happening in the first place, you will likely need to break the stored procedure up into multiple stored procedures. But we'd have to see the code in order to say for sure what needs to be done. Even some pseudo code would help us.

I've got a recent blog regarding how to troubleshoot timeouts that you might be interested in: http://weblogs.sqlteam.com/tarad/archive/2010/03/08/Troubleshooting-Application-Timeouts-in-SQL-Server.aspx

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

flyer299
Starting Member

6 Posts

Posted - 2010-05-26 : 13:18:18
Here is the Code:

USE [MMS]
GO
/****** Object: StoredProcedure [dbo].[jjMMS_Report_WorkOrder_GetEquipment] Script Date: 05/26/2010 10:25:30 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER proc [dbo].[jjMMS_Report_WorkOrder_GetEquipment]
@strXML text,
@intJoBsite_id int
AS

declare @hDoc int
exec sp_xml_preparedocument @hDoc OUTPUT, @strXML


Select
*
into
#tmp2
From
OpenXMl(@hDoc,'/report/wo') with (intWorkOrder_id int) doc

Select
wte.intWorkTask_Equipment_id, wt.intWorkTask_id, strWorkTask_Equipment_subject, txtWorkTask_Equipment_details, bitWorkTask_Equipment_showMeter,
e.intEquipment_id, strEquipment_name, strEquipment_id, datEquipment_installed, datEquipment_warrantyExpire, strEquipment_serialnum, strEquipment_manufacture, strEquipment_model, strEquipment_location, strEquipment_govID,
r.strRoom_name, r.txtRoom_details,
--rd.strDepartment_name strRoomDepartment,
ed.strDepartment_name strEquipmentDepartment,
rtyp.strPickList_Display strRoomType_display, rtyp.strPickList_value strRoomType_value,
wtyp.strPickList_Display strWorkType_display, wtyp.strPickList_value strWorkType_value,
etyp.strPickList_Display strEquipmentType_display, etyp.strPickList_value strEquipmentType_value,
strFacility_name, strFacility_id, txtEquipment_details, strWorkTask_Equipment_govCode,
(select datLabor_endTime from jjMMS_PM_LastWorkForEquipment lwork where lwork.intJobsite_id = @intJobsite_id and lwork.intPM_id = wo.intPM_id and lwork.intequipment_id = e.intequipment_id) datLabor_endTime
Into
#tmp
From
#tmp2 doc
inner join WorkOrder wo on wo.intWorkOrder_id = doc.intWorkOrder_id
inner join WorkTask wt on Wo.intWorkOrder_id = wt.intWorkOrder_id
inner join WorkTask_Equipment wte on wte.intWorkTask_id = wt.intWorkTask_id
inner join Equipment e on e.intJobsite_id = @intJobsite_id and e.intEquipment_id = wte.intEquipment_id
inner join PickList wtyp on wtyp.intPickList_id = wte.intWorkTask_Equipment_typePL
inner join PickLIst etyp on etyp.intPickList_id = e.intEquipment_typePL
left outer join Room r on r.intRoom_id = e.intRoom_id
left outer join Department eD on ed.intDepartment_id = e.intDepartment_id
left outer join PickList rTyp on rtyp.intPickList_id = r.intRoom_typePL
left outer join Facility f on f.intFacility_id = r.intFacility_id and f.intJobsite_id = @intJobsite_id

Order BY
e.strEquipment_id, e.intEquipment_id

exec sp_xml_removedocument @hDoc

select
wo.intWorkOrder_id, wt.intWorkTask_id, e.*
from
workorder wo
inner join WorkTask wt on wt.intWorkOrdeR_id =wo.intWorkOrder_id
inner join #tmp e on e.intWorkTask_id = wt.intWorkTask_id
Order BY
strEquipment_id, intEquipment_id
for xml auto

drop table #tmp
drop table #tmp2






Go to Top of Page

flyer299
Starting Member

6 Posts

Posted - 2010-05-26 : 13:20:10
I understand the bad Plan in Cache... My question is, why do I recompile it and get a good plan. Then the good plan works for a couple hours to a couple days... then all of a sudden the plan is bad? What happened to make the plan go from good to bad? I was leaning towards stats being generated. Is that a possibility? Should I manually generate stats instead of auto?
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-05-26 : 13:22:02
Yes you should definitely be updating stats manually. You should be doing both. We use the same settings as you plus run sp_updatestats nightly.

The bad plan in cache could be due to out of date stats or a parameter value that SQL Server thinks will perform better with a different plan. If this different plan is used and is not the correct plan most of the time, then it is considered a bad plan.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

flyer299
Starting Member

6 Posts

Posted - 2010-05-26 : 13:30:49
Funny thing is... I ran sp_updatestats this morning at 6:20 am. Then at 9:00am, the timeout happened. But I think I will schedule an sp_updatestats as you suggest.

Also, you said eailer, "you'll need to use the same exact parameters plus the same options (SET commands for the session)."

When I did this, I cut and pasted the exact SQL command from our monitoring software in SSMS. I can not speak to the Set Command part of it, so that might be why. But I figured with the exact same parameters I would get the same result.

This one just seems strange to me. Like I said, I am normally very good at troubleshooting these things on my own. I am not quite the DBA that a lot of people like yourself are. But I am not a SQL newbie either asking questions like, "What does a select statemtent do?"

Thanks for your feedback so far and any more feedback you can offer. I appreciate it!
Go to Top of Page

webfred
Master Smack Fu Yak Hacker

8781 Posts

Posted - 2010-05-26 : 13:38:27
Maybe a kind of parameter sniffing?
Declare variables and set them to the values of the parameters and then use the variables instead of the parameters in the following commands?

edit: I am not sure about that but I wanted not to conceal my opinion...


No, you're never too old to Yak'n'Roll if you're too young to die.
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-05-26 : 13:54:25
quote:
Originally posted by flyer299


Also, you said eailer, "you'll need to use the same exact parameters plus the same options (SET commands for the session)."

When I did this, I cut and pasted the exact SQL command from our monitoring software in SSMS. I can not speak to the Set Command part of it, so that might be why. But I figured with the exact same parameters I would get the same result.



The SET options definitely are needed to get an exact test from SSMS.

You can check the application's SET options by running a SQL Profiler trace. You then can run those same exact commands plus the same exact parameter values and should be able to duplicate the query. Alternatively, you can compare the application's execution plan with what you are seeing in SSMS by adding the Showplan XML event to a trace. I'll bet that they are different.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

flyer299
Starting Member

6 Posts

Posted - 2010-05-26 : 14:14:03
Next time the error happens... SQL Profiler Trace will be turned on. I was hoping to solve it sooner. I hate to tell my customers... Sorry, Stay down for the 20 minutes it will take me to figure this out. But 20 minutes now will be better than a couple more weeks of this crazyness.

Thanks again for the feedback. When I get some more information, I will post it.
Go to Top of Page

flyer299
Starting Member

6 Posts

Posted - 2010-06-04 : 09:45:17
The issue still come back... Last time it came back, I enabled SQL profiler to do a trace. It really didn't tell me too much. I tracked it down to see the SPROC Running and I can see the Proc time out.

I ran exec SP_Recompile [Proc Name] and the problem went away.

I still can not figure out why out of 200+ Sprocs in the system, tons of activity, this is the ONLY one that chooses to crap out on a cached execution plan.

I can schedule a Recompile for once an hour or so, but that just doesn't seem like a good fix.

I am doing a Database Wide Update Stats once a night now. The error comming up does not happen anywhere near when the Update Stats happen. Update Stats is a 2 am, error happened at 8:45am. It ran with out problems MANY times between 2am and 8:45 am.

Would rebuilding all the indexes help? Maybe they are getting defraged?



Go to Top of Page

lazerath
Constraint Violating Yak Guru

343 Posts

Posted - 2010-06-04 : 12:24:44
You could put the WITH RECOMPILE option on the sproc, granted it will recompile every time, but at least it's the lessor of two evils.

If you recompiled every hour, you could still get a timeout if you got a bad plan during that hour and it would stay broken until the job ran.
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2010-06-04 : 12:31:27
There is a "oddity" with how statistcs work with SQL. One would expect that having the AUTO on will keep statisics up to date. But, this not nessecarily the case. I'm not intemently familar with the issue, but you might want to look into using the norecompute option..?
Go to Top of Page
   

- Advertisement -