Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 optimize a query

Author  Topic 

laailalalaa
Yak Posting Veteran

57 Posts

Posted - 2011-04-04 : 14:30:02
i have a table T with (simplified)schema:
T(Type, Interval, a, b, c, d). all the fields have int type. there is no primary key and no index set on the table and it has about 13 million records (the actual table contains 9 more int fields).

i wrote a query that counts how many distinct d values appear for each triple (a,b,c), with Type = typeValue and Interval between I1 and I2 like this:

SELECT a,b,c, COUNT(d) AS NoOfDValues
FROM
(
SELECT a, b, c, d FROM T
WHERE Interval BETWEEN I1 AND I2 AND Type = typeValue
GROUP BY a,b,c,d
) AS Q1
GROUP BY Q1.a, Q1.b, Q1.c

for the data in my table, this query is executed in about 35 seconds. and i need to run it thousands of time from my app.

how can i optimize both my table (only with indexes) and my query?

thank you

ajthepoolman
Constraint Violating Yak Guru

384 Posts

Posted - 2011-04-04 : 14:40:48
One piece of optimization I would recommend is to make the select statement a stored procedure. I don't know how you are calling the database from you app, but don't hard code the select statement in your app source code. Instead, delcare a stored procedure and put the select in that. Then call the stored procedure from your app.

Hey, it compiles.
Go to Top of Page

laailalalaa
Yak Posting Veteran

57 Posts

Posted - 2011-04-04 : 14:46:04
it's already a stored procedure.
actually i was wondering whether the query could be rewritten in a more efficient manner and/or what are the best indexes to create on the table (i tried some, but the execution time is roughly the same).
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2011-04-04 : 14:47:00
There's also:

SELECT a, b, c, Count(DISTINCT d) FROM T
WHERE Interval BETWEEN I1 AND I2 AND Type = typeValue
GROUP BY a,b,c
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2011-04-04 : 14:48:09
Create an index on (a, b, c, d)
and/or
Create an index on (Type, Interval, a, b, c, d)

or possibly

Create an index on (a, b, c)
and/or
Create an index on (Type, Interval, a, b, c)


What have you tried?

Corey

I Has Returned!!
Go to Top of Page

laailalalaa
Yak Posting Veteran

57 Posts

Posted - 2011-04-04 : 14:50:52
thanks, robvolk, it does look nicer, but the exec time is the same. i think the key to the problem lies in defining proper indexes. anyone good at this?

thank u

quote:
Originally posted by robvolk

There's also:

SELECT a, b, c, Count(DISTINCT d) FROM T
WHERE Interval BETWEEN I1 AND I2 AND Type = typeValue
GROUP BY a,b,c


Go to Top of Page

Sachin.Nand

2937 Posts

Posted - 2011-04-04 : 15:20:58
If you are going to use robovolk query create a single non clustered covering index on I1,I2,Type which includes columns(a,b,c,d)

PBUH

Go to Top of Page

laailalalaa
Yak Posting Veteran

57 Posts

Posted - 2011-04-04 : 15:29:13
thanks, seventhnight,

i got the following results:

Clustered on (a,b,c,d) - 15 seconds
Clustered on (Type, Interval, a, b, c, d) - 45 secs
Nonclustered on (a,b,c,d) and nonclust on (Type, Interval, a,b,c,d) - 35 secs
... other combinations give over 40 seconds exec time...
(i have tried much more rudimentary indexes than yours)

i guess this query cannot be executed in an under 2 or 3 secs time. :-?
quote:
Originally posted by Seventhnight

Create an index on (a, b, c, d)
and/or
Create an index on (Type, Interval, a, b, c, d)

or possibly

Create an index on (a, b, c)
and/or
Create an index on (Type, Interval, a, b, c)


What have you tried?

Corey

I Has Returned!!

Go to Top of Page

laailalalaa
Yak Posting Veteran

57 Posts

Posted - 2011-04-04 : 15:31:55
I1 and I2 are set at runtime. i need to run the query for thousands of different values for the pair(I1, I2).

quote:
Originally posted by Sachin.Nand

If you are going to use robovolk query create a single non clustered covering index on I1,I2,Type which includes columns(a,b,c,d)

PBUH



Go to Top of Page

Sachin.Nand

2937 Posts

Posted - 2011-04-04 : 15:34:32
Oops sorry...Let me rephrase it.

If you are going to use robovolk query create a single non clustered covering index on Interval,Type which includes columns(a,b,c,d)


PBUH

Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2011-04-04 : 15:39:47
Have you tried more than 1 index?

how about:
Clustered on (a,b,c,d) - 15 seconds
and
Nonclustered on (Type, Interval)

or possibly...

Clustered on (Type,a,b,c,d) - ????
and
Nonclustered on (Type, Interval)

In some cases, I've done individual column indexes to see if SS can figure out its own way to make sense of it...

Clustered on (a,b,c,d) - 15 seconds
Nonclustered on (Type)
Nonclustered on (Interval)

How specific are Type and Interval. The best bang out of an index is going to be when it allows you to skip the most number of records when processing.

Sorry I can't give you a definite answer, but I've always been a bit of a trial and error kinda guy.


quote:
Originally posted by laailalalaa

thanks, seventhnight,

i got the following results:

Clustered on (a,b,c,d) - 15 seconds
Clustered on (Type, Interval, a, b, c, d) - 45 secs
Nonclustered on (a,b,c,d) and nonclust on (Type, Interval, a,b,c,d) - 35 secs
... other combinations give over 40 seconds exec time...
(i have tried much more rudimentary indexes than yours)

i guess this query cannot be executed in an under 2 or 3 secs time. :-?


Corey

I Has Returned!!
Go to Top of Page

laailalalaa
Yak Posting Veteran

57 Posts

Posted - 2011-04-04 : 15:50:06
by "a single non clustered covering index on Interval,Type which includes columns(a,b,c,d)" u mean an index that includes columns Interval, Type, a,b,c,d?

if so, the query runs in about 12 secs..:-?

quote:
Originally posted by Sachin.Nand

Oops sorry...Let me rephrase it.

If you are going to use robovolk query create a single non clustered covering index on Interval,Type which includes columns(a,b,c,d)


PBUH



Go to Top of Page

laailalalaa
Yak Posting Veteran

57 Posts

Posted - 2011-04-04 : 15:53:13
update. i actually need to run this query about 50 million times. so this would give a total of about 160 thousand hours. i think i;m just gonna partition my data into different tables based on (a,b,c) values and run the query on partial data.

thanks everyone though for your valuable suggestions
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2011-04-04 : 15:56:36
how much does the data change? 13 mill rows... running 50million times... doesn't sound like it changes in parallel to the runs.

Maybe you could cache some of the data? or build a reporting table with counts for smaller intervals that won't change?

Corey

I Has Returned!!
Go to Top of Page

laailalalaa
Yak Posting Veteran

57 Posts

Posted - 2011-04-04 : 16:07:13
nvm my question. did as u told and exec time is about 10 secs. similar suggestion here: http://social.msdn.microsoft.com/Forums/en/transactsql/thread/35994b3f-d858-4f70-a930-1b0c78b88c2e

quote:
Originally posted by laailalalaa

by "a single non clustered covering index on Interval,Type which includes columns(a,b,c,d)" u mean an index that includes columns Interval, Type, a,b,c,d?

if so, the query runs in about 12 secs..:-?

quote:
Originally posted by Sachin.Nand

Oops sorry...Let me rephrase it.

If you are going to use robovolk query create a single non clustered covering index on Interval,Type which includes columns(a,b,c,d)


PBUH





[url][/url][url][/url][url][/url]
Go to Top of Page

laailalalaa
Yak Posting Veteran

57 Posts

Posted - 2011-04-04 : 16:08:51
data never changes in my table. all i need to do is select, select, select.

dunno how to do caching or reporting tables.. maybe some hints on where to start?

quote:
Originally posted by Seventhnight

how much does the data change? 13 mill rows... running 50million times... doesn't sound like it changes in parallel to the runs.

Maybe you could cache some of the data? or build a reporting table with counts for smaller intervals that won't change?

Corey

I Has Returned!!

Go to Top of Page

Sachin.Nand

2937 Posts

Posted - 2011-04-04 : 16:12:22
quote:
Originally posted by laailalalaa

nvm my question. did as u told and exec time is about 10 secs. similar suggestion here: http://social.msdn.microsoft.com/Forums/en/transactsql/thread/35994b3f-d858-4f70-a930-1b0c78b88c2e

quote:
Originally posted by laailalalaa

by "a single non clustered covering index on Interval,Type which includes columns(a,b,c,d)" u mean an index that includes columns Interval, Type, a,b,c,d?

if so, the query runs in about 12 secs..:-?

quote:
Originally posted by Sachin.Nand

Oops sorry...Let me rephrase it.

If you are going to use robovolk query create a single non clustered covering index on Interval,Type which includes columns(a,b,c,d)


PBUH





[url][/url][url][/url][url][/url]



Well I suggested the same thing what the optimiser suggested in the query hints..Its simple..

Just look at columns in your query and the columns of the indexes I had suggested.Also you would have to read more about covering indexes in BOL.

PBUH

Go to Top of Page

laailalalaa
Yak Posting Veteran

57 Posts

Posted - 2011-04-04 : 16:21:51
i know, i got over that (still big exec time). i was reading on table partitioning, as suggested on msdn topic. maybe that'll do the trick

quote:
Originally posted by Sachin.Nand

quote:
Originally posted by laailalalaa

nvm my question. did as u told and exec time is about 10 secs. similar suggestion here: http://social.msdn.microsoft.com/Forums/en/transactsql/thread/35994b3f-d858-4f70-a930-1b0c78b88c2e

quote:
Originally posted by laailalalaa

by "a single non clustered covering index on Interval,Type which includes columns(a,b,c,d)" u mean an index that includes columns Interval, Type, a,b,c,d?

if so, the query runs in about 12 secs..:-?

quote:
Originally posted by Sachin.Nand

Oops sorry...Let me rephrase it.

If you are going to use robovolk query create a single non clustered covering index on Interval,Type which includes columns(a,b,c,d)


PBUH





[url][/url][url][/url][url][/url]



Well I suggested the same thing what the optimiser suggested in the query hints..Its simple..

Just look at columns in your query and the columns of the indexes I had suggested.Also you would have to read more about covering indexes in BOL.

PBUH



Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2011-04-04 : 16:27:26
I don't know what you're intervals look like... but lets say its a date.

So you might build a summary table that looks like:

Type, ISeg, a, b, c, d, d_count

where you would only have 1 row per: Type, ISeg, a, b, c, d

if your 'I1' and 'I2' values repeat alot, consider just building a cache table where you store the results of a previous query.

Cache table:

TypeValue, I1, I2, a, b, c, d_count, expires

where TypeValue, I1, I2 are the supplied values and a,b,c,d_count are the results
add an expires datetime to use to clean out the cache if not referenced frequently
update the expires datetime everytime you pull data from the cache.

Just some thoughts...





quote:
Originally posted by laailalalaa

data never changes in my table. all i need to do is select, select, select.

dunno how to do caching or reporting tables.. maybe some hints on where to start?

quote:
Originally posted by Seventhnight

how much does the data change? 13 mill rows... running 50million times... doesn't sound like it changes in parallel to the runs.

Maybe you could cache some of the data? or build a reporting table with counts for smaller intervals that won't change?

Corey

I Has Returned!!





Corey

I Has Returned!!
Go to Top of Page

Sachin.Nand

2937 Posts

Posted - 2011-04-04 : 16:37:14
quote:
i know, i got over that (still big exec time). i was reading on table partitioning, as suggested on msdn topic. maybe that'll do the trick


So what is your "benchmark" execution time ?? zero ?

Also how do you know partitioning will improve the performance?You are talking about data partitioning.What about index partitioning?Did you give a thought on that ? There will be hardly any performance improvement if you do not have index partitioning implemented.

What about Disks?Are you planning to have it implemented on logical or physical partitioned disks ?

PBUH

Go to Top of Page

LoztInSpace
Aged Yak Warrior

940 Posts

Posted - 2011-04-05 : 03:42:20
I'd say a cluster on interval, type, a, b, c, d would be the only way to make this run properly against the given query (or Rob's). Without indexing the interval there is no way to know when to stop. The plan *has* to scan the whole table. I am surprised the clustered index runs slower but try putting the interval first in the CI.
Don't forget though that going via an index can slow things down by adding tons of IO. It will certaintly affect your inserts.

Can you tell us more about the intervals and the distibution of a, b, c and D? What kind of ranges are we talking about here? Is the interval incremental or pretty much random throughout the data?
Are you timing this or looking at an execution plan? That can make a difference - run each attempt several times - you might get a better result one the plan has been created.

Also, very important, do I1 and I2 live in the table (a table) or are they parameters? If they're in the table then there are many things that can make this better.
Go to Top of Page
    Next Page

- Advertisement -