
Performance Help with query - 144 million row table


sarora05
Starting Member

5 Posts

Posted - 2015-04-30 : 12:12:31
I need help with my query. This table has 144 million rows and is a stage table (we insert data into it from another table). I didn't have indexes before, and the job had been running for more than 9 hours. I added a nonclustered index on multiple columns (AdvertiserName, MediaPlanName, MediaPlanNumber, CreativeDescription) because this combination makes a row unique. But even now the execution plan shows a Table Scan, not a nonclustered index scan, and there is no improvement in performance.
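For reference, the index described above would look roughly like this (the index name is illustrative, and exact options may differ):

```sql
-- Sketch of the nonclustered index described above (name is illustrative)
CREATE NONCLUSTERED INDEX IX_Stage_MM240_Dedupe
ON dbo.Stage_MM240 (AdvertiserName, MediaPlanName, MediaPlanNumber, CreativeDescription);
```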

This is the query below; it is taking a very long time to summarize the data into a csv file using SSIS. How can I improve performance on this query? Please help!! :(
SELECT
AdvertiserName,
AdvertiserID,
MediaPlanNumber,
MediaPlanName,
PublishingSiteName,
SiteName,
Week_Begin_Monday,
CreativeDescription,
SUM(CAST(ViewCount AS BIGINT)) ViewCount,
SUM(CAST(ClickCount AS BIGINT)) ClickCount,
Media,
Segment_Name,
Segment_CD,
Group_Name,
Group_CD,
Channel,
LOB,
Creative_Message,
Creative_Category,
Creative_Type,
SUM(GRP) GRP,
Intended_Delivery_Screen

FROM Stage_MM240 WITH (NOLOCK)
GROUP BY AdvertiserName,
AdvertiserID,
MediaPlanNumber,
MediaPlanName,
PublishingSiteName,
SiteName,
Week_Begin_Monday,
CreativeDescription,
Media,
Segment_Name,
Segment_CD,
Group_Name,
Group_CD,
Channel,
LOB,
Creative_Message,
Creative_Category,
Creative_Type,
Intended_Delivery_Screen

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2015-04-30 : 12:21:46
There isn't much that can be done since you are summarizing the entire table. Why isn't there a where clause to filter it? What is the query waiting on?

Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/

sarora05
Starting Member

5 Posts

Posted - 2015-04-30 : 12:34:17
Hello Tara, thank you for responding! It's a big feed, and I was already filtering prior to inserting the data into this table, so I need to summarize all of it. I stopped the job and now I am trying to improve the performance of this query before I run it again. Maybe adding a different index? Not sure what to do. Thanks!!

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2015-04-30 : 12:47:05
Adding an index will have no effect since there are no JOINs or WHEREs. As Tara said, you are summarizing the entire table. So, the whole table has to be read. You'll just have to wait for it to finish.

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2015-04-30 : 12:59:49
But do check what it is waiting on to see if that can be resolved.
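For example, something like this from another session while the query runs (session_id 57 is a placeholder for the SPID running the summary query):

```sql
-- Check the current and most recent wait for the running request
SELECT r.session_id, r.status, r.command,
       r.wait_type, r.wait_time, r.last_wait_type
FROM sys.dm_exec_requests AS r
WHERE r.session_id = 57;  -- replace with the actual session id
```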

Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/

Kristen
Test

22859 Posts

Posted - 2015-04-30 : 14:20:28
I was reading something today that said a HEAP might perform better where the intention is to SCAN the whole table. So it might be worth dropping all indexes, including (and in particular!) any clustered index.

I'd hate to be wrong though ... as recreating a clustered index on 144M rows would be ugly

Anything else running on that server?

Anything else querying that table? (Scary that you have NOLOCK ... if it's not being accessed by anyone else then get rid of it? It is NOT a performance-enhancement band-aid!!)

If nothing else is querying that table - i.e. it's basically a data warehouse - should snapshots be turned off? Or the database set to READ_ONLY? (Would the DB being read-only actually provide any useful performance "hints" to SQL? It's not something I've tried.)

If there is anything else running on that server (I mean something like a non-SQL service), then perhaps SQL is configured to use all memory and is thrashing with whatever else is running and needs memory? (Might be that SQL is configured to use ALL memory, and not even reserve any for the O/S. I suppose if that is the case then dropping the MAX memory setting could help.)

What about TEMPDB? Is that on a separate drive from the database? Configured to be a reasonable size, rather than auto-growing? I presume this query will use acres of TEMPDB, but it's only a guess - maybe not? Can you pre-create TEMPDB to have a minimum number of VLFs - or is that just for "regular" DBs?

I seem to remember Tara mentioning having multiple files for TEMPDB ... and then I have a vague recollection of that approach going out of fashion! - what's the current Best Practice on that?

How about running this on, say, 10% of the data to analyse how much resources / TEMPDB / etc. it uses, and then using that information to best-guess how to configure the full-fat run?

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2015-04-30 : 14:51:07
Multiple data files for tempdb is still the recommended best practice. If there's a tempdb bottleneck, we'll see it in the waits. So what this query is waiting on while executing is important. Not just a one-time look, as it can be waiting on different things at different times. We care what it's waiting on the most. Get WhoIsActive set up and collecting data every 30 seconds or so.
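A rough sketch of that collection, assuming sp_WhoIsActive is already installed and a log table has been created (the table name is illustrative; in practice you'd schedule this via a SQL Agent job rather than an open loop):

```sql
-- Log sp_WhoIsActive output to a table every 30 seconds while the query runs
WHILE 1 = 1
BEGIN
    EXEC dbo.sp_WhoIsActive @destination_table = 'dbo.WhoIsActive_Log';
    WAITFOR DELAY '00:00:30';
END;
```

Afterwards, query the log table and look for the wait types that show up most often.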

Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/

jackv
Master Smack Fu Yak Hacker

2179 Posts

Posted - 2015-05-01 : 01:38:15
When you were checking the execution plan, what did the plan show for the SUM parts of the query?
What is the difference in run time between running the query on its own and running it as part of the import to csv?

Jack Vamvas
--------------------
http://www.sqlserver-dba.com
   
