Big Table Query Performance Problem

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

Transact-SQL (2000)

Big Table Query Performance Problem

Author

Topic

Kristen
Test

22859 Posts

Posted - 2006-11-13 : 02:22:32

I have a table with about 5,000,000 rows.

PK is an identity (clustered). There is also a CreateDate column & index. Records are inserted in very-nearly date order, so the ID and CreateDate are more-or-less in the same order.

CreateDate covers about 3 years

The ID is used often in FKs, so needs to retain the Clustered Index.

There are reporting queries based on a range of CreateDate. They always results in a Clustered Index Scan of the PK.

I wondered if there is a way to get the MIN / MAX ID for the date range, and thus constrain the query by that, as well as a FineTuning on CreateDate.

However, I can't find a way to get the MIN / MAX ID without incurring an IndexScan

I even created a CreateDate, ID index so that it was covered

Everything results in an Index Scan on the PK.

The sort of figures I'm getting are:


SELECT	@MIN_ID = MIN(ID),
	@MAX_ID = MAX(ID)
FROM	dbo.MyTable
WHERE	    CreateDate >= @StartDate
	AND CreateDate < @EndDate

SQL Server Execution Times:
Table 'MyTable'. Scan count 2, logical reads 128049, physical reads 0, read-ahead reads 95553.
CPU time = 5906 ms, elapsed time = 23028 ms.

and then the big query:
[code]
SELECT ... stuff ...
FROM MyTable
WHERE ID >= @MIN_ID
AND ID <= @MAX_ID
AND CreateDate >= @StartDate
AND CreateDate < @EndDate
[code]
Table 'MyTable'. Scan count 1, logical reads 471, physical reads 0, read-ahead reads 0.
CPU time = 563 ms, elapsed time = 624 ms.

If I do NOT use the pre-calculated MIN/MAX ID the second big query gives:

Table 'MyTable'. Scan count 1, logical reads 128515, physical reads 0, read-ahead reads 126902.
CPU time = 2172 ms, elapsed time = 35841 ms.

so is not really any slower.

However, it does seem clear that IF I could "cheaply" get the MIN / MAX ID then the query would be really fast ...

Perhaps I should resort to a crib-list table where once a day I add the cut-off ID number for that date

Thanks chaps,

Kristen

harsh_athalye
Master Smack Fu Yak Hacker

5581 Posts

Posted - 2006-11-13 : 03:01:50

How about horizontal partitioning data based on CreateDate so that you will have identical tables for each year data and your search will be narrowed down to specific year?

Harsh Athalye
India.
"Nothing is Impossible"

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-11-13 : 03:14:14

I think I am more in the mood for a "lookup" table
CREATE TABLE DateIDs(CreateDate DATETIME, MinID INT, MaxID INT)
with covering index.

Peter Larsson
Helsingborg, Sweden

Kristen
Test

22859 Posts

Posted - 2006-11-13 : 04:17:33

"I think I am more in the mood for a "lookup" table"

Sadly me too. I was hoping, given that I have a CreateDate index (which presumably also includes the Clustered PK's [ID]) that ought to Cover it.

Hmmm ... Huzzah! I tried a LOCK HINT:


SELECT	@MIN_ID = MIN(ID),
	@MAX_ID = MAX(ID)
FROM	dbo.MyTable WITH (INDEX(MyCreateDateIndex))
WHERE	    CreateDate >= @StartDate
	AND CreateDate < @EndDate

which gives

Table 'MyTable'. Scan count 1, logical reads 38, physical reads 0, read-ahead reads 0.
CPU time = 9 ms, elapsed time = 9 ms.

I'm always weary of HINT though, what do you think?

Perhaps I ought to stick a NOLOCK in there whilst I'm bending the rules ...

Kristen

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-11-13 : 04:40:36

Maybe, if the query is not mission-critical...

Peter Larsson
Helsingborg, Sweden

Kristen
Test

22859 Posts

Posted - 2006-11-13 : 05:11:41

Yeah, that's the point isn't it

(This table records Session data on a web site. As such CreateDate does not change, once record inserted. There are a few updates that may occur later - to set the "End Date" for the session, number of pages visited, that sort of thing)

So on that basis I think it may be reasonable. But I hate overriding the Query Optimiser ... that HINT will get left in when we migrate to SQL2005, never get reviewed, and probably get to be a nuisance at some point ...

... which I suppose comes back to making my own LookUp table like you suggested in the first place

Kristen

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2006-11-13 : 05:16:58

what kind of exec plan does this give you?


SELECT	@MIN_ID = MIN(ID),
	@MAX_ID = MAX(ID)
FROM	dbo.MyTable
WHERE	ID >= 0
        AND CreateDate >= @StartDate
	AND CreateDate < @EndDate

Go with the flow & have fun! Else fight the flow

blog thingie: http://weblogs.sqlteam.com/mladenp

Kristen
Test

22859 Posts

Posted - 2006-11-13 : 05:55:11

[code]
StmtText
-------------------------------
|--Nested Loops(Inner Join)
|--Stream Aggregate(DEFINE:([Expr1002]=MIN([MyTable].[ID])))
| |--Top(1)
| |--Clustered Index Scan(OBJECT:([MyDB].[dbo].[MyTable].[PK_MyTable]),
WHERE:([MyTable].[CreateDate]>=[@StartDate] AND [MyTable].[CreateDate]<[@EndDate]]) ORDERED FORWARD)
|--Stream Aggregate(DEFINE:([Expr1003]=MAX([MyTable].[ID])))
|--Top(1)
|--Clustered Index Scan(OBJECT:([MyDB].[dbo].[MyTable].[PK_MyTable]),
WHERE:([MyTable].[CreateDate]>=[@StartDate] AND [MyTable].[CreateDate]<[@EndDate]]) ORDERED BACKWARD)
[/code]
whereas with the Hint: WITH (INDEX(MyCreateDateIndex)) I get:
[code]
StmtText
----------------------
|--Stream Aggregate(DEFINE:([Expr1002]=MIN([MyTable].[ID]), [Expr1003]=MAX([MyTable].[ID])))
|--Index Seek(OBJECT:([MyDB].[dbo].[MyTable].[MyCreateDateIndex]),
SEEK:([MyTable].[CreateDate] >= [@StartDate] AND [MyTable].[CreateDate] < @EndDate) ORDERED FORWARD)
[/code]
Actually I'm probably better off with two queries and force the index-seek to only span a single day for each of the MIN and MAX:
[code]
SELECT @MIN_ID = MIN(ID)
FROM dbo.MyTable WITH (INDEX(MyCreateDateIndex))
WHERE CreateDate >= @StartDate
AND CreateDate < DATEADD(Day, 1, @StartDate)

SELECT @MAX_ID = MAX(ID)
FROM dbo.MyTable WITH (INDEX(MyCreateDateIndex))
WHERE CreateDate >= DATEADD(Day, -1, @EndDate)
AND CreateDate < @EndDate
[/code]
The first, over a date range of 2 weeks, gives:

Table 'MyTable'. Scan count 1, logical reads 291, physical reads 0, read-ahead reads 289.
CPU time = 172 ms, elapsed time = 207 ms

the second pair of queries gives:

Table 'MyTable'. Scan count 1, logical reads 20, physical reads 0, read-ahead reads 0.
CPU time = 0 ms, elapsed time = 7 ms.

Table 'MyTable'. Scan count 1, logical reads 72, physical reads 0, read-ahead reads 0.
CPU time = 22 ms, elapsed time = 22 ms.

Might be a little less efficient for queries that only span a single day, but they are rare - a week or two is the normal query period

Kristen

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2006-11-13 : 05:59:22

how are your statistics on that table?
do you use FULLSCAN when updating them?

Go with the flow & have fun! Else fight the flow

blog thingie: http://weblogs.sqlteam.com/mladenp

Kristen
Test

22859 Posts

Posted - 2006-11-13 : 06:54:10

Stats updated each night, and using FULLSCAN

but thanks for thinking of that.

Perhaps I ought to just double check that they are up to date! <fx:toddles-off>

Kristen

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2006-11-13 : 07:00:21

you didn't answer my first question so i assume i didn't help adding ID > 0...

Go with the flow & have fun! Else fight the flow

blog thingie: http://weblogs.sqlteam.com/mladenp

Kristen
Test

22859 Posts

Posted - 2006-11-13 : 07:31:16

"ID >= 0"

Missed it, sorry.

It changes it, but still operates on the Clustered PK rather than the CreateDate index:


StmtText
----------------
  |--Nested Loops(Inner Join)
       |--Stream Aggregate(DEFINE:([Expr1002]=MIN([MyTable].[ID])))
       |    |--Top(1)
       |         |--Clustered Index Scan(OBJECT:([MyDB].[dbo].[MyTable].[PK_MyTable]),  
                     WHERE:([MyTable].[CreateDate]>=[@dtStart] AND [MyTable].[CreateDate]<[@dtEnd]) ORDERED FORWARD)
       |--Stream Aggregate(DEFINE:([Expr1003]=MAX([MyTable].[ID])))
            |--Top(1)
                 |--Clustered Index Scan(OBJECT:([MyDB].[dbo].[MyTable].[PK_MyTable]),  
                     WHERE:([MyTable].[CreateDate]>=[@dtStart] AND [MyTable].[CreateDate]<[@dtEnd]) ORDERED BACKWARD)

StmtText
-------------------------
  |--Nested Loops(Inner Join)
       |--Stream Aggregate(DEFINE:([Expr1002]=MIN([MyTable].[ID])))
       |    |--Top(1)
       |         |--Clustered Index Seek(OBJECT:([MyDB].[dbo].[MyTable].[PK_MyTable]), 
                     SEEK:([MyTable].[ID] > 0),  
                     WHERE:([MyTable].[CreateDate]>=[@dtStart] AND [MyTable].[CreateDate]<[@dtEnd]) ORDERED FORWARD)
       |--Stream Aggregate(DEFINE:([Expr1003]=MAX([MyTable].[ID])))
            |--Top(1)
                 |--Clustered Index Seek(OBJECT:([MyDB].[dbo].[MyTable].[PK_MyTable]), 
                     SEEK:([MyTable].[ID] > 0),  
                     WHERE:([MyTable].[CreateDate]>=[@dtStart] AND [MyTable].[CreateDate]<[@dtEnd]) ORDERED BACKWARD)