I'm working on an OLAP project where a lot of tables have been copied from their original source into a data warehouse. These tables were generally copied over without their primary keys, and no tables were created with clustered indices. (A table with a primary key gets a clustered index on that PK by default, if no other clustered index is created.)

Analysis Services shows you the SQL statements it uses when it processes cubes. I took each SQL statement and ran it through the "Display Estimated Execution Plan" tool in SQL Query Analyzer. I saw that every one of these statements caused table scans.

I created clustered indices on the tables involved in each cube process. Clustered indices seemed to be the answer, because the SQL for a cube process generally looks like:

    select (lots of columns)
    from fact1, dim1, dim2, dim3
    where fact1.d1 = dim1.d1
      and fact1.d2 = dim2.d2
      and fact1.d3 = dim3.d3
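If you'd rather see the plan as text than use the graphical tool, Query Analyzer also supports the standard T-SQL SHOWPLAN options. A minimal sketch, reusing the table and column names from the example above (the column list is a stand-in for "lots of columns"):

```sql
-- Ask SQL Server for the estimated plan instead of executing the query
SET SHOWPLAN_TEXT ON
GO

select f.d1, f.d2, f.d3
from fact1 f, dim1, dim2, dim3
where f.d1 = dim1.d1
  and f.d2 = dim2.d2
  and f.d3 = dim3.d3
GO

SET SHOWPLAN_TEXT OFF
GO
```

Before the indices exist, the plan output is full of "Table Scan" operators; once the clustered indices are in place, those lines are replaced by index operations.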
These queries are low-selectivity -- each one touches a large fraction of the rows. So non-clustered indices, which the optimizer typically only uses when a query selects roughly 10% of the rows or fewer (sometimes 10%-15%), won't ever be used here. I also had to remember that a table can have only one clustered index, because a clustered index is really a re-ordering of the rows in the table -- you can't do that more than once.

I used this syntax:

    -- First, I tested the tables for uniqueness of keys
    select d1, count(*)
    from dim1
    group by d1
    having count(*) > 1
    -- This should come back with no rows

    -- Then I created the index
    create unique clustered index ix_dim1_d1 on dim1 (d1)
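To repeat this across a whole warehouse, it helps to first list every table that is still a heap (i.e. has no clustered index). A sketch of one way to do that -- this uses the `sys.indexes` catalog view, which is available in SQL Server 2005 and later (earlier versions would need the old `sysindexes` table instead):

```sql
-- Find all user tables that lack a clustered index.
-- In sys.indexes, index_id = 0 means the table is a heap.
select object_name(i.object_id) as table_name
from sys.indexes i
join sys.objects o on o.object_id = i.object_id
where i.index_id = 0      -- heap: no clustered index
  and o.type = 'U'        -- user tables only
order by table_name
```

Each table this returns is a candidate for the uniqueness check and `create unique clustered index` steps shown above.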
The result? No table scans at all, and a drop in cube processing time -- in one case from 20 minutes to 3. Pretty good! I recently read an article that says all tables should have a clustered index. I am now a believer!

--TimothyAllen
[url]http://www.timallen.org[/url]