Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
laailalalaa
Yak Posting Veteran
57 Posts |
Posted - 2011-04-04 : 14:30:02
|
i have a table T with (simplified)schema: T(Type, Interval, a, b, c, d). all the fields have int type. there is no primary key and no index set on the table and it has about 13 million records (the actual table contains 9 more int fields).i wrote a query that counts how many distinct d values appear for each triple (a,b,c), with Type = typeValue and Interval between I1 and I2 like this:SELECT a,b,c, COUNT(d) AS NoOfDValues FROM ( SELECT a, b, c, d FROM T WHERE Interval BETWEEN I1 AND I2 AND Type = typeValue GROUP BY a,b,c,d ) AS Q1 GROUP BY Q1.a, Q1.b, Q1.c for the data in my table, this query is executed in about 35 seconds. and i need to run it thousands of time from my app.how can i optimize both my table (only with indexes) and my query? thank you |
|
ajthepoolman
Constraint Violating Yak Guru
384 Posts |
Posted - 2011-04-04 : 14:40:48
|
One piece of optimization I would recommend is to make the select statement a stored procedure. I don't know how you are calling the database from you app, but don't hard code the select statement in your app source code. Instead, delcare a stored procedure and put the select in that. Then call the stored procedure from your app.Hey, it compiles. |
|
|
laailalalaa
Yak Posting Veteran
57 Posts |
Posted - 2011-04-04 : 14:46:04
|
it's already a stored procedure.actually i was wondering whether the query could be rewritten in a more efficient manner and/or what are the best indexes to create on the table (i tried some, but the execution time is roughly the same). |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2011-04-04 : 14:47:00
|
There's also:SELECT a, b, c, Count(DISTINCT d) FROM TWHERE Interval BETWEEN I1 AND I2 AND Type = typeValueGROUP BY a,b,c |
|
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2011-04-04 : 14:48:09
|
Create an index on (a, b, c, d)and/orCreate an index on (Type, Interval, a, b, c, d)or possiblyCreate an index on (a, b, c)and/orCreate an index on (Type, Interval, a, b, c)What have you tried?CoreyI Has Returned!! |
|
|
laailalalaa
Yak Posting Veteran
57 Posts |
Posted - 2011-04-04 : 14:50:52
|
thanks, robvolk, it does look nicer, but the exec time is the same. i think the key to the problem lies in defining proper indexes. anyone good at this?thank uquote: Originally posted by robvolk There's also:SELECT a, b, c, Count(DISTINCT d) FROM TWHERE Interval BETWEEN I1 AND I2 AND Type = typeValueGROUP BY a,b,c
|
|
|
Sachin.Nand
2937 Posts |
Posted - 2011-04-04 : 15:20:58
|
If you are going to use robovolk query create a single non clustered covering index on I1,I2,Type which includes columns(a,b,c,d)PBUH |
|
|
laailalalaa
Yak Posting Veteran
57 Posts |
Posted - 2011-04-04 : 15:29:13
|
thanks, seventhnight,i got the following results:Clustered on (a,b,c,d) - 15 secondsClustered on (Type, Interval, a, b, c, d) - 45 secsNonclustered on (a,b,c,d) and nonclust on (Type, Interval, a,b,c,d) - 35 secs... other combinations give over 40 seconds exec time...(i have tried much more rudimentary indexes than yours)i guess this query cannot be executed in an under 2 or 3 secs time. :-?quote: Originally posted by Seventhnight Create an index on (a, b, c, d)and/orCreate an index on (Type, Interval, a, b, c, d)or possiblyCreate an index on (a, b, c)and/orCreate an index on (Type, Interval, a, b, c)What have you tried?CoreyI Has Returned!!
|
|
|
laailalalaa
Yak Posting Veteran
57 Posts |
Posted - 2011-04-04 : 15:31:55
|
I1 and I2 are set at runtime. i need to run the query for thousands of different values for the pair(I1, I2).quote: Originally posted by Sachin.Nand If you are going to use robovolk query create a single non clustered covering index on I1,I2,Type which includes columns(a,b,c,d)PBUH
|
|
|
Sachin.Nand
2937 Posts |
Posted - 2011-04-04 : 15:34:32
|
Oops sorry...Let me rephrase it.If you are going to use robovolk query create a single non clustered covering index on Interval,Type which includes columns(a,b,c,d)PBUH |
|
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2011-04-04 : 15:39:47
|
Have you tried more than 1 index?how about:Clustered on (a,b,c,d) - 15 secondsandNonclustered on (Type, Interval)or possibly...Clustered on (Type,a,b,c,d) - ????andNonclustered on (Type, Interval)In some cases, I've done individual column indexes to see if SS can figure out its own way to make sense of it...Clustered on (a,b,c,d) - 15 secondsNonclustered on (Type)Nonclustered on (Interval)How specific are Type and Interval. The best bang out of an index is going to be when it allows you to skip the most number of records when processing.Sorry I can't give you a definite answer, but I've always been a bit of a trial and error kinda guy. quote: Originally posted by laailalalaa thanks, seventhnight,i got the following results:Clustered on (a,b,c,d) - 15 secondsClustered on (Type, Interval, a, b, c, d) - 45 secsNonclustered on (a,b,c,d) and nonclust on (Type, Interval, a,b,c,d) - 35 secs... other combinations give over 40 seconds exec time...(i have tried much more rudimentary indexes than yours)i guess this query cannot be executed in an under 2 or 3 secs time. :-?
CoreyI Has Returned!! |
|
|
laailalalaa
Yak Posting Veteran
57 Posts |
Posted - 2011-04-04 : 15:50:06
|
by "a single non clustered covering index on Interval,Type which includes columns(a,b,c,d)" u mean an index that includes columns Interval, Type, a,b,c,d?if so, the query runs in about 12 secs..:-?quote: Originally posted by Sachin.Nand Oops sorry...Let me rephrase it.If you are going to use robovolk query create a single non clustered covering index on Interval,Type which includes columns(a,b,c,d)PBUH
|
|
|
laailalalaa
Yak Posting Veteran
57 Posts |
Posted - 2011-04-04 : 15:53:13
|
update. i actually need to run this query about 50 million times. so this would give a total of about 160 thousand hours. i think i;m just gonna partition my data into different tables based on (a,b,c) values and run the query on partial data.thanks everyone though for your valuable suggestions |
|
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2011-04-04 : 15:56:36
|
how much does the data change? 13 mill rows... running 50million times... doesn't sound like it changes in parallel to the runs.Maybe you could cache some of the data? or build a reporting table with counts for smaller intervals that won't change?CoreyI Has Returned!! |
|
|
laailalalaa
Yak Posting Veteran
57 Posts |
Posted - 2011-04-04 : 16:07:13
|
nvm my question. did as u told and exec time is about 10 secs. similar suggestion here: http://social.msdn.microsoft.com/Forums/en/transactsql/thread/35994b3f-d858-4f70-a930-1b0c78b88c2equote: Originally posted by laailalalaa by "a single non clustered covering index on Interval,Type which includes columns(a,b,c,d)" u mean an index that includes columns Interval, Type, a,b,c,d?if so, the query runs in about 12 secs..:-?quote: Originally posted by Sachin.Nand Oops sorry...Let me rephrase it.If you are going to use robovolk query create a single non clustered covering index on Interval,Type which includes columns(a,b,c,d)PBUH
[url][/url][url][/url][url][/url] |
|
|
laailalalaa
Yak Posting Veteran
57 Posts |
Posted - 2011-04-04 : 16:08:51
|
data never changes in my table. all i need to do is select, select, select. dunno how to do caching or reporting tables.. maybe some hints on where to start?quote: Originally posted by Seventhnight how much does the data change? 13 mill rows... running 50million times... doesn't sound like it changes in parallel to the runs.Maybe you could cache some of the data? or build a reporting table with counts for smaller intervals that won't change?CoreyI Has Returned!!
|
|
|
Sachin.Nand
2937 Posts |
Posted - 2011-04-04 : 16:12:22
|
quote: Originally posted by laailalalaa nvm my question. did as u told and exec time is about 10 secs. similar suggestion here: http://social.msdn.microsoft.com/Forums/en/transactsql/thread/35994b3f-d858-4f70-a930-1b0c78b88c2equote: Originally posted by laailalalaa by "a single non clustered covering index on Interval,Type which includes columns(a,b,c,d)" u mean an index that includes columns Interval, Type, a,b,c,d?if so, the query runs in about 12 secs..:-?quote: Originally posted by Sachin.Nand Oops sorry...Let me rephrase it.If you are going to use robovolk query create a single non clustered covering index on Interval,Type which includes columns(a,b,c,d)PBUH
[url][/url][url][/url][url][/url]
Well I suggested the same thing what the optimiser suggested in the query hints..Its simple..Just look at columns in your query and the columns of the indexes I had suggested.Also you would have to read more about covering indexes in BOL.PBUH |
|
|
laailalalaa
Yak Posting Veteran
57 Posts |
Posted - 2011-04-04 : 16:21:51
|
i know, i got over that (still big exec time). i was reading on table partitioning, as suggested on msdn topic. maybe that'll do the trickquote: Originally posted by Sachin.Nand
quote: Originally posted by laailalalaa nvm my question. did as u told and exec time is about 10 secs. similar suggestion here: http://social.msdn.microsoft.com/Forums/en/transactsql/thread/35994b3f-d858-4f70-a930-1b0c78b88c2equote: Originally posted by laailalalaa by "a single non clustered covering index on Interval,Type which includes columns(a,b,c,d)" u mean an index that includes columns Interval, Type, a,b,c,d?if so, the query runs in about 12 secs..:-?quote: Originally posted by Sachin.Nand Oops sorry...Let me rephrase it.If you are going to use robovolk query create a single non clustered covering index on Interval,Type which includes columns(a,b,c,d)PBUH
[url][/url][url][/url][url][/url]
Well I suggested the same thing what the optimiser suggested in the query hints..Its simple..Just look at columns in your query and the columns of the indexes I had suggested.Also you would have to read more about covering indexes in BOL.PBUH
|
|
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2011-04-04 : 16:27:26
|
I don't know what you're intervals look like... but lets say its a date.So you might build a summary table that looks like:Type, ISeg, a, b, c, d, d_countwhere you would only have 1 row per: Type, ISeg, a, b, c, dif your 'I1' and 'I2' values repeat alot, consider just building a cache table where you store the results of a previous query.Cache table:TypeValue, I1, I2, a, b, c, d_count, expireswhere TypeValue, I1, I2 are the supplied values and a,b,c,d_count are the resultsadd an expires datetime to use to clean out the cache if not referenced frequentlyupdate the expires datetime everytime you pull data from the cache.Just some thoughts...quote: Originally posted by laailalalaa data never changes in my table. all i need to do is select, select, select. dunno how to do caching or reporting tables.. maybe some hints on where to start?quote: Originally posted by Seventhnight how much does the data change? 13 mill rows... running 50million times... doesn't sound like it changes in parallel to the runs.Maybe you could cache some of the data? or build a reporting table with counts for smaller intervals that won't change?CoreyI Has Returned!!
CoreyI Has Returned!! |
|
|
Sachin.Nand
2937 Posts |
Posted - 2011-04-04 : 16:37:14
|
quote: i know, i got over that (still big exec time). i was reading on table partitioning, as suggested on msdn topic. maybe that'll do the trick
So what is your "benchmark" execution time ?? zero ?Also how do you know partitioning will improve the performance?You are talking about data partitioning.What about index partitioning?Did you give a thought on that ? There will be hardly any performance improvement if you do not have index partitioning implemented.What about Disks?Are you planning to have it implemented on logical or physical partitioned disks ?PBUH |
|
|
LoztInSpace
Aged Yak Warrior
940 Posts |
Posted - 2011-04-05 : 03:42:20
|
I'd say a cluster on interval, type, a, b, c, d would be the only way to make this run properly against the given query (or Rob's). Without indexing the interval there is no way to know when to stop. The plan *has* to scan the whole table. I am surprised the clustered index runs slower but try putting the interval first in the CI. Don't forget though that going via an index can slow things down by adding tons of IO. It will certaintly affect your inserts.Can you tell us more about the intervals and the distibution of a, b, c and D? What kind of ranges are we talking about here? Is the interval incremental or pretty much random throughout the data?Are you timing this or looking at an execution plan? That can make a difference - run each attempt several times - you might get a better result one the plan has been created.Also, very important, do I1 and I2 live in the table (a table) or are they parameters? If they're in the table then there are many things that can make this better. |
|
|
Next Page
|
|
|
|
|