Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Performance of low selectivity indexes

Author  Topic 

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2006-11-17 : 16:33:43
I did some testing today on indexes based on low selectivity values - for example if you have a large table with a status column and the status for all rows is either 1 or 0. Is there much point in indexing such a table on the status column. I compared this to selecting rows based on a join to a small (possibly temporary) table that holds the PK values of the rows you want.

What I found surprised me. I'd be interested to get feedback on your thoughts or scenarios where you have found these results to hold true or not.

Here's all the info, but I'll copy the conclusion here at the beginning in case you don't want to read all the details.

Conclusion: tinyint or smallint flags are indexed MUCH more effectively than bit flags for some reason (I don't know what that is - anyone?). Using a tinyint or smallint flag is much more simple than creating a separate table and joining it, so that's the way I'd go.

I tested on a table with about 4 million rows.

Table structure is like this
CREATE TABLE [BigTable] (
[btID] [int] IDENTITY PRIMARY KEY,
[col1] [int],
[col2] [datetime],
[col3] [varchar] (52),
[col4] [char] (13),
[status] [smallint],
[statusbit] [bit],
[statusbyte] [tinyint],
)
Create index statements
create index BTstatus on BigTable(status)
create index BTstatus on BigTable(statusbit)
create index BTstatus on BigTable(statusbyte)
I only ever put on one of these at a time while testing

I tested the following:
1. Status flag (bit, tinyint, smallint makes no difference) in the table and no index on the status flag vs join to an active rows table
2. Bit status flag in the table with an index on the status flag vs join to an active rows table
3. Tinyint status flag in the table with an index on the status flag vs join to an active rows table
4. Smallint status flag in the table with an index on the status flag vs join to an active rows table

First set of tests on 524 active rows

The average join performance was
Duration - 16 ms, 1918 reads.

Obviously using no index the performance is hopeless
Duration - 19 sec, 242666 reads.

But the indexed status flag performance was very interesting
Bit: Duration - 460 ms, 8950 reads.
Tinyint: Duration - 23 ms, 1826 reads.
Smallint: Duration - 23 ms, 1833 reads.

First set of tests on 8367 active rows

The average join performance was
Duration - 203 ms, 29128 reads.

No index the performance
Duration - 20 sec, 242671 reads.

But the indexed status flag performance was very interesting
Bit: Duration - 595 ms, 34937 reads.
Tinyint: Duration - 203 ms, 27796 reads.
Smallint: Duration - 203 ms, 27825 reads.

So an index on the status flag is not a good idea if you use a bit field,
- about 20 times slower on 4 times as many reads when selecting 524 out of 4 million rows
- about 3 times slower on 1.2 times as many reads when selecting 8367 out of 4 million rows
BUT if you use a tinyint for the flag the performance of the indexed status flag is about the same on slightly fewer reads!

Conclusion: tinyint or smallint flags are indexed MUCH more effectively than bit flags for some reason (I don't know what that is - anyone?). Using a tinyint or smallint flag is much more simple than creating a separate table and joining it, so that's the way I'd go.

PSamsig
Constraint Violating Yak Guru

384 Posts

Posted - 2006-11-19 : 10:36:16
My guess is that bit gets 'promoted' to int, would be interesting to see how int would have performed just for comparesen.

Im happy to see that the streight forward method of using an index performs pretty equal to the much more complicated join one, even in a simple in PK.

What server version did you test on ? (Can't be 7.0, since bit indexes isn't allowed, and EM can't disallow it too, but it can be done (and I have a few to change to tiny/small int now )

-- Alice came to a fork in the road. "Which road do I take?" she asked. "Where do you want to go?" responded the Cheshire cat. "I don't know," Alice answered. "Then," said the cat, "it doesn't matter."
Go to Top of Page

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2006-11-19 : 19:48:30
I did this on 2000, but I plan to test 2005 tomorrow and will post those results then.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2006-11-20 : 04:02:17
(Didn't want to employ I was being silent here, just haven't had time yet. I have a 15M row table copied into a Test database. I've got as far as Reindex and Up-Stats etc., and plan to run some test queries. The table has a column with 1/2/3 values in it representing about 10%, 89%, 1% respectively, so I should be able to test out some theories. There is another column where about 10 values are used on around 80% of the rows, and the rest of the rows have highly selective values, so I'm hoping to get a good idea for how well queries perform against the (supposedly!) more selective values.

Its running on a fire-breathing server too ...

I'll report back ... but it may take a little while before I gather together some spare time

Kristen
Go to Top of Page

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2006-11-20 : 16:21:36
When running the tests on SQL Server 2005 I realized that my previous numbers were out because the table was severely fragmented. After rebuilding the clustered index to defrag the table the performance of the flag without an index improved dramatically due to a table scan being much faster. As a result table scans were also selected by the optimizer sooner because a table scan was now much more efficient.

Here are the numbers I now have - they do indicate that you benefit from using a join table if the number of rows gets high enough that the optimizer no longer chooses the index.

I'm testing 2005 in a virtual machine, so the duration is inflated, the important thing is the number of reads, which is a little lower across the board for 2005.
AND in 2005
1. the performance of the bit index in 2005 is about the same as the other indexes
2. the optimizer makes much better choices, so the performance of the flag vs the join table was almost the same with 500, 5000 or 10000 active rows in a table with 4 million rows

SQL Server 2000
500 rows
Join table - 30 ms, 1742 reads
No index, bit flag - 710 ms, 41214 reads
No index, tinyint flag - 640 ms, 41214 reads
No index, smallint flag - 640 ms, 41214 reads
Indexed bit flag - 520 ms, 8827 reads
Indexed tinyint flag - 15 ms, 1703 reads
Indexed smallint flag - 16 ms, 1731 reads

5000 rows
Join table - 124 ms, 16723 reads
No index, bit flag - 820 ms, 41214 reads
No index, tinyint flag - 780 ms, 41214 reads
No index, smallint flag - 780 ms, 41214 reads
Indexed bit flag - 820 ms, 41214 reads (table scan)
Indexed tinyint flag - 780 ms, 41214 reads (table scan)
Indexed smallint flag - 780 ms, 41214 reads (table scan)

10000 rows
Join table - 1600 ms, 39559 reads
No index, bit flag - 1600 ms, 41214 reads
No index, tinyint flag - 1600 ms, 41214 reads
No index, smallint flag - 1600 ms, 41214 reads
Indexed bit flag - 1600 ms, 41214 reads (table scan)
Indexed tinyint flag - 1600 ms, 41214 reads (table scan)
Indexed smallint flag - 1600 ms, 41214 reads (table scan)

SQL Server 2005
500 rows
Join table - 50 ms, 1623 reads
No index, bit flag - 1250 ms, 32836 reads
No index, tinyint flag - 1030 ms, 32836 reads
No index, smallint flag - 860 ms, 32836 reads
Indexed bit flag - 150 ms, 1623 reads
Indexed tinyint flag - 140 ms, 1623 reads
Indexed smallint flag - 90 ms, 1625 reads

5000 rows
Join table - 280 ms, 15976 reads
No index, bit flag - 1400 ms, 32836 reads
No index, tinyint flag - 1175 ms, 32836 reads
No index, smallint flag - 1100 ms, 32836 reads
Indexed bit flag - 370 ms, 15974 reads
Indexed tinyint flag - 350 ms, 15974 reads
Indexed smallint flag - 280 ms, 15975 reads

10000 rows
Join table - 590 ms, 31929 reads
No index, bit flag - 1620 ms, 32836 reads
No index, tinyint flag - 1400 ms, 32836 reads
No index, smallint flag - 1400 ms, 32836 reads
Indexed bit flag - 675 ms, 31919 reads
Indexed tinyint flag - 600 ms, 31919 reads
Indexed smallint flag - 540 ms, 31921 reads
Go to Top of Page

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2006-11-20 : 21:29:39
Figured out the discrepancy on SQL Server 2000 with the bit index.

You need to cast the value to a bit in the WHERE clause so that the optimizer correctly uses the index.

This is just as fast as the tinyint or smallint because like them it uses an index seek.

SELECT * FROM table WHERE statusbit = cast(1 as bit)
Go to Top of Page
   

- Advertisement -