Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Administration
 CXPACKET waits

Author  Topic 

MariusD
Starting Member

7 Posts

Posted - 2012-06-13 : 13:17:06
I am trying to troubleshoot erratic query run times on this server. I started by noticing that some sp run between < 1 minute and 4-5 hours at a time. Then, I noticed CXPACKET waits. MAXDOP was set to 2, cost threshold for parallelism set to 5. This is a 4 processor, NUMA based server, running both OLAP and OLTP databases (but no high volume transaction databases). I changed the threshold to 25 and ever since all sp's did behave nicely. However, I am still experiencing queries generated by Cognos which get stuck for hours waiting for CXPACKET. Top ten wait statistics, show CXPACKET waits accounting for 40% overall, everything else < 1% but bear in mind, those numbers are mostly for the old setting, maxdop = 2 and thershold = 5. ( 5 days with new settings and 6 months with old).

1) CXPACKET waits are 40% of all waits, seems like that is a problem?
2) Should I start by setting parallelism settings to best practices values, that is, maxdop=4 and threshold = 5 (or should I leave it at 25, since it works for teh sp's)?

Any suggestions are appreciated.. thanks!!!

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2012-06-13 : 13:40:48
Why do you have both OLAP and OLTP databases on the same server? Generally they do better with different MAXDOP settings.

I wouldn't say that MAXDOP=4 is the best practice here. Do you have an article that shows that?

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

jackv
Master Smack Fu Yak Hacker

2179 Posts

Posted - 2012-06-13 : 14:10:34
CXPPACKET is normal in a multi core environment. The tkizer observation is a good one re: OLTP\DSS . You've switched the maxdop which has solved some problems , but maybe caused others.
Try: 1) identify which queries are running in parallelism and why?
Are they frequent?
2)OLTP normally works better with lower parallelism setting, consider lowering parallelism - having first identified \ adjsuted (?) some queries

Jack Vamvas
--------------------
http://www.sqlserver-dba.com
Go to Top of Page

MariusD
Starting Member

7 Posts

Posted - 2012-06-13 : 14:41:17
quote:
Originally posted by tkizer

Why do you have both OLAP and OLTP databases on the same server? Generally they do better with different MAXDOP settings.
Good question... That's how it was set up long before I started here (3 months ago).

I wouldn't say that MAXDOP=4 is the best practice here. Do you have an article that shows that?
Here's a reply by Jonathan Kehayias to an article of Dave Pinal, where Dave was suggesting setting MAXDOP to 2 for a mixed environment and cost of threshold to 25.

http://blog.sqlauthority.com/2011/02/06/sql-server-cxpacket-parallelism-usual-solution-wait-type-day-6-of-28/

reply:
CXPACKET waits aren’t necessarily bad, or even a sign of a problem, they are normal and expected if queries are executing using parallelism. I wouldn’t recommend that CXPACKET alone ever be considered a reason to reduce or change the configuration of the ‘max degree of parallelism’ configuration option. Most newer servers are going to be NUMA based systems, and the best practice recommendation for this configuration option is to set it to the number of processor cores in a single NUMA node. Generally speaking this best practice is not followed, and most recommendations show changing the ‘max degree of parallelism’ configuration option to 1 or 2 whenever the topic of CXPACKET comes up.

My recommendation, especially on newer NUMA based systems is always to first set it following best practices to the number of physical cores in a single NUMA node and then monitor. You can find the number of schedulers in each NUMA node by querying the online_scheduler_count from sys.dm_os_nodes. Even for data warehouse environments, this should be set initially following this practice unless testing has shown that leaving it at 0 is actually best, which is possible.

After making the change monitor not just wait types, but also for how the system is performing. Look for queries that run under parallelism and test them manually using different levels of DOP using the OPTION(MAXDOP n) query hint to see if reducing parallelism actually improves or harms performance. You might find that reducing it for one query improves performance while the rest of the workload shows a performance decrease from that same tested reduction. In that case putting the query hint in, either as a plan guide for the individual query, or by changing the code if you have access, would yield better returns.

Raising the cost of threshold to 25 has definitely helped, I don't have any more problems with sp's, only once in a while a Cognos-generated query hangs for hours in a CXPACKET wait. I guess I'll wait to see what teh wait statitics show tomorrow, I have just reset them this morning.


Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

Go to Top of Page

MariusD
Starting Member

7 Posts

Posted - 2012-06-13 : 14:44:02
quote:
Originally posted by jackv

CXPPACKET is normal in a multi core environment. The tkizer observation is a good one re: OLTP\DSS . You've switched the maxdop which has solved some problems , but maybe caused others.
No, maxdop is the same, I raised the cost of threshld to 25 and it has helped, but it has not solved the problem entirely.

Try: 1) identify which queries are running in parallelism and why?
Are they frequent?
No, they are not frequent any more, I only caught one yesterday. Identifying them is almost impossible, as those are Cognos-generated (users drag and drop columns in Cognos)

2)OLTP normally works better with lower parallelism setting, consider lowering parallelism - having first identified \ adjsuted (?) some queries

Jack Vamvas
--------------------
http://www.sqlserver-dba.com

Go to Top of Page

jackv
Master Smack Fu Yak Hacker

2179 Posts

Posted - 2012-06-14 : 01:37:30
Marius , analyse the cached plans for queries using parallelism

Jack Vamvas
--------------------
http://www.sqlserver-dba.com
Go to Top of Page

MariusD
Starting Member

7 Posts

Posted - 2012-06-14 : 11:43:07
quote:
Originally posted by jackv

Marius , analyse the cached plans for queries using parallelism

Jack Vamvas
--------------------
http://www.sqlserver-dba.com



Yes, thank you, you are right, but this is very hard to do with Cognos generated queries.
I ran the wait statistics query again today and CXPACKET waits amount to 40% of all waits, once again. Since I have not seen anything hung for hours waiting for CXPACKET, I'm starting to think this is not the problem. I will continue to test different queries with different options (hints) for parallelism. Thank you!
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2012-06-14 : 12:45:00
Have you noticed any blocking when the erratic query times occur?

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

MariusD
Starting Member

7 Posts

Posted - 2012-06-15 : 11:11:17
No, I didn't, but I wasn't looking. It only happened once since I raised the threshold. I saw this Cognos query stuck waiting for CXPACKET for hours. So you are saying it could be that the thread it was waiting for was blocked by something else? In this case, Activity monitor would still show a CXPACKET wait and not a block by something else? I was looking at Activity monitor, at some point in time I believe this was the only process blocked by itself (that's what "blocked by" was showing). So I thought this a parallelism issue and didn't look any further.
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2012-06-15 : 12:49:16
Are you sure the wait type didn't change for those hours it was running? Even if there's a bottleneck, it should change.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

MariusD
Starting Member

7 Posts

Posted - 2012-06-16 : 09:33:24
I can't be sure about that, I watched it for about 1.5 hours and it did not change.
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2012-06-16 : 17:55:54
Can the Cognos query be changed to use the MAXDOP option? I really no nothing about Cognos.

Also, I'd recommend separating your OLAP and OLTP onto two different servers. They need two different MAXDOP settings. I wouldn't even just use two instances on the same server, I'd use two different servers.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog
Go to Top of Page

jackv
Master Smack Fu Yak Hacker

2179 Posts

Posted - 2012-06-17 : 02:06:57
MariusD - When analysing the statements - try to isolate just the Cognos statements. There must be a way via logon , or application nam eto isolate and identify on a SQL Server Trace.

Jack Vamvas
--------------------
http://www.sqlserver-dba.com
Go to Top of Page

MariusD
Starting Member

7 Posts

Posted - 2012-06-18 : 10:29:37
Yes, you are both right - I should try and track the Cognos queries only, I believe I can do that via user code. I am working on setting up a monitoring routine right now. Unfortunately, I cannot separate the 2 databases - the one that's constantly updated is too small to justify a separate server, and everything would have to be re-written for the data warehouse data imports/ updates (SSIS + Framework, about 25 packlages). Today it's been > week with the new threshold setting and other than that 1 query, everything has worked very well (roughly 800 SSIS steps daily, we track all execution times). I've seen other Cognos queries waiting for CXPACKET, but for a reasonable amount of time (minutes). Thanks again for your help!
Go to Top of Page

jackv
Master Smack Fu Yak Hacker

2179 Posts

Posted - 2012-06-18 : 10:44:21
Your welcome

Jack Vamvas
--------------------
http://www.sqlserver-dba.com
Go to Top of Page
   

- Advertisement -