Interesting Load Testing / Performance Problem

ferrethouse
Constraint Violating Yak Guru

352 Posts

Posted - 2012-09-26 : 17:29:27

I'm load testing one of our reports through the application. At 1 concurrent user the report averages a load time of 1.5 seconds. At 10 concurrent users the load time averages 30 seconds. I have enabled performance monitor metrics on both the web server and the database server, and neither indicates ANY CPU, memory, or IO contention. However, when I look at the database waits I do see a spike in CPU/memory-related waits, even though the CPU on the database server doesn't exceed 3% during the course of the load test. So why do my CPU waits spike when the CPU itself doesn't seem to be doing much?
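For reference, the two response times quoted above can be turned into throughput figures with Little's Law (N = X * R). This is only a rough sketch: it assumes the virtual users issue requests back to back with no think time, which the post does not state, and the variable names are made up for illustration.

```python
# Closed-loop load test arithmetic using Little's Law: N = X * R.
# Assumption (not stated in the post): virtual users send the next request
# immediately after the previous one completes, i.e. zero think time.

def throughput(users: int, response_time_s: float) -> float:
    """Completed reports per second for a closed loop with zero think time."""
    return users / response_time_s

single_user_rt = 1.5   # seconds at 1 concurrent user (from the post)
loaded_rt = 30.0       # seconds at 10 concurrent users (from the post)
users = 10

x1 = throughput(1, single_user_rt)
x10 = throughput(users, loaded_rt)

# Baseline: if a single fully serialized resource needed 1.5 s per report,
# 10 back-to-back users would each wait roughly 10 * 1.5 s.
serialized_rt = users * single_user_rt

print(f"throughput at 1 user:   {x1:.3f} reports/s")
print(f"throughput at 10 users: {x10:.3f} reports/s")
print(f"fully serialized estimate: {serialized_rt:.1f} s, observed: {loaded_rt:.1f} s")
```

Under that assumption the system completes fewer reports per second at 10 users than at 1, and the observed 30 seconds is worse than the fully serialized estimate, which points at contention somewhere rather than at raw resource exhaustion.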

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-09-26 : 17:56:11

What are the actual waits that you see? If you're querying sys.dm_os_wait_stats, are you clearing between tests with DBCC SQLPERF("sys.dm_os_wait_stats",CLEAR) to prevent accumulated waits from skewing your results?

ferrethouse
Constraint Violating Yak Guru

352 Posts

Posted - 2012-09-26 : 18:54:05

quote:
Originally posted by robvolk

What are the actual waits that you see? If you're querying sys.dm_os_wait_stats, are you clearing between tests with DBCC SQLPERF("sys.dm_os_wait_stats",CLEAR) to prevent accumulated waits from skewing your results?

I'm using Confio Ignite which just reports high "CPU Waits". I cleared dm_os_wait_stats, ran the load test, then queried it and got this...


wait_type	waiting_tasks_count	wait_time_ms	max_wait_time_ms	signal_wait_time_ms
BROKER_TASK_STOP	114	102001	5000	1
FT_IFTS_SCHEDULER_IDLE_WAIT	1	60000	60000	0
XE_TIMER_EVENT	2	60000	30000	60000
REQUEST_FOR_DEADLOCK_SEARCH	11	55000	5000	55000
DBMIRROR_EVENTS_QUEUE	104	52001	749	0
SQLTRACE_INCREMENTAL_FLUSH_SLEEP	13	52000	4000	0
LAZYWRITER_SLEEP	52	52000	1000	0
BROKER_TO_FLUSH	25	25600	1024	0
SLEEP_TASK	65	25600	1024	0
MSQL_XP	80	301	53	0
PREEMPTIVE_OS_GETPROCADDRESS	80	301	53	0
ASYNC_NETWORK_IO	67	98	27	0
PREEMPTIVE_OS_WAITFORSINGLEOBJECT	8	46	27	0
PREEMPTIVE_OLE_UNINIT	10	4	1	0
OLEDB	8971	2	1	0

I'm no expert but it looks normal. I'm just not sure how to improve the scalability of this report. There are no reported missing indexes.

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-09-26 : 20:11:38

Is this the only thing indicating CPU or memory waits? Because none of those results point to that conclusion.
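To make that concrete, here is a small sketch that totals the posted wait stats after setting aside the background and idle waits. It assumes the columns follow the sys.dm_os_wait_stats order (wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms, signal_wait_time_ms), and the BENIGN list is an assumption about which waits are normally ignored, not something stated in the thread.

```python
# Wait stats exactly as posted. Column order is assumed to match
# sys.dm_os_wait_stats: wait_type, waiting_tasks_count, wait_time_ms,
# max_wait_time_ms, signal_wait_time_ms.
POSTED = """
BROKER_TASK_STOP 114 102001 5000 1
FT_IFTS_SCHEDULER_IDLE_WAIT 1 60000 60000 0
XE_TIMER_EVENT 2 60000 30000 60000
REQUEST_FOR_DEADLOCK_SEARCH 11 55000 5000 55000
DBMIRROR_EVENTS_QUEUE 104 52001 749 0
SQLTRACE_INCREMENTAL_FLUSH_SLEEP 13 52000 4000 0
LAZYWRITER_SLEEP 52 52000 1000 0
BROKER_TO_FLUSH 25 25600 1024 0
SLEEP_TASK 65 25600 1024 0
MSQL_XP 80 301 53 0
PREEMPTIVE_OS_GETPROCADDRESS 80 301 53 0
ASYNC_NETWORK_IO 67 98 27 0
PREEMPTIVE_OS_WAITFORSINGLEOBJECT 8 46 27 0
PREEMPTIVE_OLE_UNINIT 10 4 1 0
OLEDB 8971 2 1 0
"""

# Assumption: these are timer/idle/background waits that are commonly
# excluded when looking for a bottleneck.
BENIGN = {
    "BROKER_TASK_STOP", "FT_IFTS_SCHEDULER_IDLE_WAIT", "XE_TIMER_EVENT",
    "REQUEST_FOR_DEADLOCK_SEARCH", "DBMIRROR_EVENTS_QUEUE",
    "SQLTRACE_INCREMENTAL_FLUSH_SLEEP", "LAZYWRITER_SLEEP",
    "BROKER_TO_FLUSH", "SLEEP_TASK",
}

def parse(text):
    """Turn the whitespace-separated dump into (name, tasks, wait, max, signal) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, tasks, wait_ms, max_ms, signal_ms = line.split()
        rows.append((name, int(tasks), int(wait_ms), int(max_ms), int(signal_ms)))
    return rows

remaining = [r for r in parse(POSTED) if r[0] not in BENIGN]
total_wait_ms = sum(r[2] for r in remaining)
total_signal_ms = sum(r[4] for r in remaining)

print(f"non-background wait types: {len(remaining)}")
print(f"total wait time:   {total_wait_ms} ms")
print(f"total signal wait: {total_signal_ms} ms")
```

Signal wait time is the portion spent waiting for a scheduler after the resource became available, so a total near zero for the remaining waits gives no sign of CPU pressure in this sample.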

Have you looked at the query plan and statistics IO for the report you're testing? You'll want to look for scans with high rowcounts compared to the resulting rowcount, and high physical reads.

Can you post the query and the plan? XML plan is fine.

If you are testing a web report, does the web UI render the report as HTML? It's very likely the rendering is taking most of your time. You can verify that by running the same query in Management Studio and measuring its performance.

ferrethouse
Constraint Violating Yak Guru

352 Posts

Posted - 2012-09-26 : 23:13:56

quote:
Originally posted by robvolk

Is this the only thing indicating CPU or memory waits? Because none of those results point to that conclusion.

Have you looked at the query plan and statistics IO for the report you're testing? You'll want to look for scans with high rowcounts compared to the resulting rowcount, and high physical reads.

Can you post the query and the plan? XML plan is fine.

If you are testing a web report, does the web UI render the report as HTML? It's very likely the rendering is taking most of your time. You can verify that by running the same query in Management Studio and measuring its performance.

Hi Rob,

I'll get you the plan and query in the morning. I'm using JMeter so I wouldn't think rendering would come into play. It just looks at response time. There are no recommended indexes for the database, which I would expect if there were scans, as my load testing is pretty much the only thing happening to this database. I forget the perfmon metric's name, but one of them relates to the CPU queue, and it does go to "1" periodically during the test, but that doesn't strike me as high.

The report runs quickly with only 1 user (1.5 seconds) so it is definitely a load problem. I just don't see it in any of the metrics I'm collecting. Maybe I'm not collecting the right ones.

I'm running this on pretty powerful AWS instances. But I also ran the same test against dedicated Rackspace hardware and got similar results.

ferrethouse
Constraint Violating Yak Guru

352 Posts

Posted - 2012-09-27 : 15:46:19

I turned on profiler and the report actually results in about 20 different queries. I ran each of them in SSMS looking for scans and recommended indexes and didn't see any. In the profiler the highest "duration" value is 6 which seems low to me. I can post the queries and their plans if you still think they may be of value but it will be a huge post.

Thanks!

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-09-27 : 16:02:07

If you're seeing durations of 6 milliseconds then it's not likely to be a query issue. It can't hurt to post queries and plans though, just in case they can still be tweaked.
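A quick back-of-the-envelope check of that reasoning, as a sketch only. It reads the Profiler duration as milliseconds, treats the roughly 20 queries as running one after another, and assumes the 6 ms ceiling also holds under load, none of which the thread confirms.

```python
# Upper bound on database time per report, using figures from the thread.
queries_per_report = 20   # "about 20 different queries"
max_duration_ms = 6       # highest Profiler duration, read here as milliseconds

# Worst case: every query takes the maximum and they run sequentially.
db_time_upper_ms = queries_per_report * max_duration_ms

single_user_ms = 1500     # 1.5 s at 1 concurrent user
loaded_ms = 30000         # 30 s at 10 concurrent users

share_single = db_time_upper_ms / single_user_ms
share_loaded = db_time_upper_ms / loaded_ms
unexplained_loaded_ms = loaded_ms - db_time_upper_ms

print(f"database time upper bound: {db_time_upper_ms} ms")
print(f"share of single-user response: {share_single:.1%}")
print(f"share of 10-user response:     {share_loaded:.1%}")
print(f"time not explained by queries under load: {unexplained_loaded_ms} ms")
```

On those assumptions the queries account for only a small slice of the response time, so most of the 30 seconds would have to be spent outside query execution.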

I'm not familiar with JMeter, but from what I'm reading about it I'd say Java and JDBC may have some influence on the performance (perhaps unfairly, but both have slow reputations, especially against SQL Server). If all of your versions and drivers are up to date then it may be a configuration issue.

You're using AWS; where are the SQL and web servers located in relation to each other? Check the network between them. With all these parts on top of SQL Server there's still the application and how it interacts with SQL Server. JDBC does (or used to do) a lot of client-side cursoring, which will kill performance on even fast queries, and if there's a long network path between them (especially AWS to a local server) you've got lots of places for delays to accumulate.
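As a rough illustration of how row-by-row fetching lets network delay pile up, here is a simple model. The function name, the row count, the fetch sizes, and the round-trip time are all hypothetical numbers chosen for the example; only the 6 ms query duration comes from the thread.

```python
def fetch_time_ms(rows: int, rows_per_round_trip: int,
                  rtt_ms: float, query_ms: float) -> float:
    """Estimated wall time to pull a result set: query time plus one
    network round trip per batch of rows fetched (a simplified model)."""
    round_trips = -(-rows // rows_per_round_trip)  # ceiling division
    return query_ms + round_trips * rtt_ms

# Hypothetical scenario: 5000 rows over a 0.5 ms round trip.
row_by_row = fetch_time_ms(rows=5000, rows_per_round_trip=1,
                           rtt_ms=0.5, query_ms=6.0)
one_batch = fetch_time_ms(rows=5000, rows_per_round_trip=5000,
                          rtt_ms=0.5, query_ms=6.0)

print(f"one row per round trip: {row_by_row:.1f} ms")
print(f"single batch:           {one_batch:.1f} ms")
```

In this model the query itself costs the same in both cases; the difference comes entirely from the number of round trips, which is why the driver's fetch behaviour is worth checking.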

ferrethouse
Constraint Violating Yak Guru

352 Posts

Posted - 2012-09-27 : 16:56:05

quote:
Originally posted by robvolk
You're using AWS, where are the SQL and web servers located in relation to each other? Check the network between them.

They are in a VPC in the same availability zone. Both servers are on 1000 IOPS optimized servers and all of the drives on both servers are 1000 IOPS EBS. There are 4 separate drives on the SQL box (OS, data file, tlogs, tempdb). So I think everything is set up optimally. I'll try more perfmon metrics. Maybe I can find a metric that is spiking which will identify the bottleneck.

Thanks for your help Rob. As always your feedback is valuable.
