Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 Old Forums
 CLOSED - SQL Server 2005/Yukon
 SQL I/O Scan Counts

Author  Topic 

datamonkey
Starting Member

29 Posts

Posted - 2006-12-15 : 17:47:00
Hi,
I was curious if any happened to know why I might be getting more then one scan count when attempting to pull data from a fact table with a joined dimension. Verses applying the constraint directly to the fact table. In addition I have also attempted to force the constraint against the fact table but leaving the dimensional join in place and I get the same IO stat result.


select col1, sum(col2) from
tblfact ft
inner join tbltime td
on
ft.factdate = td.timedate
where td.timedate between '1/1/2006' and '1/3/2006'
group by ft.col1

io stats:
Table .'tblFact' Scan count 3, logical reads 1088, physical reads 4, read-ahead reads 226, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblTimeDim'. Scan count 1, logical reads 2, and physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

the only way I have been able to get the optimizer to return a single scan is by removing the dimensional join and applying the constraint to the fact table.

Is this normal...?
Any thoughts
Thanks

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2006-12-15 : 18:00:20
Do you have an index on factdate in the fact table?
Go to Top of Page

datamonkey
Starting Member

29 Posts

Posted - 2006-12-15 : 18:02:33
yep, there is index on both columns taking part in the join.
Go to Top of Page

datamonkey
Starting Member

29 Posts

Posted - 2006-12-15 : 18:17:59
Here is the show plan:

|--Compute Scalar(DEFINE:([Expr1013]=CASE WHEN [globalagg1017]=(0) THEN NULL ELSE [globalagg1019] END))
|--Stream Aggregate(GROUP BY:([TESTDB].[dbo].[tblFact].[Client_ID]) DEFINE:([globalagg1017]=SUM([partialagg1016]), [globalagg1019]=SUM([partialagg1018])))
|--Sort(ORDER BY:([TESTDB].[dbo].[tblFact].[col1] ASC))
|--Nested Loops(Inner Join, OUTER REFERENCES:([TESTDB].[dbo].[tblFact].[FactDate], [Expr1048]) WITH UNORDERED PREFETCH)
|--Hash Match(Aggregate, HASH:([TESTDB].[dbo].[tblFact].[col1], [TESTDB].[dbo].[tblFact].[factdate]), RESIDUAL:([TESTDB].[dbo].[tblFact].[col1] = [TESTDB].[dbo].[tblFact].[col1] AND [TESTDB].[dbo].[tblFact].[factdate] = [TESTDB].[dbo].[tblFact].[factdate]) DEFINE:([partialagg1016]=COUNT_BIG([tblFact].[dbo].[tblFact].[col2]), [partialagg1018]=SUM([TESTDB].[dbo].[tblFact].[col2])))
| |--Table Scan(OBJECT:([TESTDB].[dbo].[tblFact]), WHERE:([TESTDB].[dbo].[tblFact].[factdate]>='2006-01-01 00:00:00.000' AND [TESTDB].[dbo].[tblFact].[factdate]<='2006-01-03 00:00:00.000') PARTITION ID:((86)))
|--Index Seek(OBJECT:([TESTDBDB].[dbo].[tblTime].[IX_timedate] AS [td]), SEEK:([td].[timedate]=[TESTDB].[dbo].[tblFact].[factdate]), WHERE:([TESTDB].[dbo].[tblTime].[timedate] as [td].[timedate]>='2006-01-01 00:00:00.000' AND [TESTDB].[dbo].[tblTime].[timedate] as [td].[timedate]<='2006-01-03 00:00:00.000') ORDERED FORWARD)
Go to Top of Page

datamonkey
Starting Member

29 Posts

Posted - 2006-12-15 : 18:31:52
I figured that the optimizer is having to loop through the tblfact table in order to apply the join but I would suspect that it would return a single result set based on the partialagg1016 THEN apply the nested loop to bring in the tblTime table.( the multiple scan seems to be applied when the join is apply and is linked to the number of days within the constraint.) the odd thing is when the join is remove and the constraint is applied to the tblFact table it returns a single scan.

Why would a simple join increase the number scan applied to the fact table.
hmm...
Thanks
Go to Top of Page

datamonkey
Starting Member

29 Posts

Posted - 2006-12-18 : 21:43:19
Does anyone have any thoughts as to why this might be happening?
Thanks
Go to Top of Page

evjo
Starting Member

20 Posts

Posted - 2006-12-23 : 00:09:29
Do you have a index on Col1? the group by could be causing the large number of reads.
Go to Top of Page

datamonkey
Starting Member

29 Posts

Posted - 2006-12-26 : 13:46:24
Hi evjo, nope there is not an index on col1. though even when i do use an indexed field it still performs the multi-scan on the fact table.
Go to Top of Page

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2006-12-26 : 13:59:37
Is the fact table index on factdate a multi-column index or is it indexed only on that column? Also, when last did you rebuild that index?

Run DBCC SHOWCONTIG on the factdate index and post the results.
Go to Top of Page

datamonkey
Starting Member

29 Posts

Posted - 2006-12-26 : 18:24:08
The index is a multi-column index including the col1 and factdate columns

dbcc showcontig (tblfact,IX_Fact_Date)

DBCC SHOWCONTIG scanning 'tblFact' table...
Table: 'tblFact' (1887345788); index ID: 8, database ID: 26
LEAF level scan performed.
- Pages Scanned................................: 22388
- Extents Scanned..............................: 2799
- Extent Switches..............................: 2798
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 100.00% [2799:2799]
- Logical Scan Fragmentation ..................: 0.00%
- Extent Scan Fragmentation ...................: 0.25%
- Avg. Bytes Free per Page.....................: 793.1
- Avg. Page Density (full).....................: 90.20%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Go to Top of Page

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2006-12-26 : 18:53:07
OK, that may be it then - when you say the index is by Col1 and by FactDate, are they in that order - Col1 first then FactDate? If so, the query cannot use the index because the rows for the date range you want are not contigous in the index - they are by Col1 first and you do not have Col1 in the WHERE clause.

Your query is a like me asking you to find all the people with a first name of Anne in a phone book. The fact that people are arranged alphabetically by last name then first name really doesn't help you because there could be Annes for every last name in the book.

To prevent the table scans, you either need an index that is first by FactDate, or you need to specify a filter for Col1 in your query (but I assume you want all values for Col1 or you'd be doing it already).
Go to Top of Page

datamonkey
Starting Member

29 Posts

Posted - 2006-12-26 : 20:44:40
Hi snSQL, yep the IX_Fact_Date is comprised of the following columns
(Fact_Date, Col1) in that order.
Which I am pretty sure should return the least amount of scan due to the query will always have a date constraint in it. as such should narrow down the index down to defined list if Col1 to group by.
The Scan I’m seeing isn't so much as a table scan rather its a I/O scan (in terms of the optimizer having to loop through the index for every constraint in the constraint dataset. the odd thing is when the join to the time dimension is removed the number of I/O s are the dropped by 3rd yet the scan count is relative to the number of loops the date constraint had applied.
Doesn’t that seem odd?

Thanks
DM
Go to Top of Page

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2006-12-27 : 13:16:29
It does make sense because it cannot ignore the time dimension if you put it in the query. There is a big difference between joining the tables and filtering on the time dimension and just filtering on the fact table. If there are rows in the fact table between the given dates that do not have matching rows in the dimension then they would be excluded by the join but not by a filter directly on the fact table (you know if there are such rows, but SQL Server doesn't).

So SQL Server executes a join strategy that will get the rows as best as the indexing will allow and that's what you're seeing. In this case if you don't need to join to the dimension table, then don't and performance will be (and should be) better.
Go to Top of Page
   

- Advertisement -