Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
snSQL
Master Smack Fu Yak Hacker
1837 Posts |
Posted - 2006-11-17 : 16:33:43
|
I did some testing today on indexes based on low selectivity values - for example if you have a large table with a status column and the status for all rows is either 1 or 0. Is there much point in indexing such a table on the status column. I compared this to selecting rows based on a join to a small (possibly temporary) table that holds the PK values of the rows you want.What I found surprised me. I'd be interested to get feedback on your thoughts or scenarios where you have found these results to hold true or not.Here's all the info, but I'll copy the conclusion here at the beginning in case you don't want to read all the details.Conclusion: tinyint or smallint flags are indexed MUCH more effectively than bit flags for some reason (I don't know what that is - anyone?). Using a tinyint or smallint flag is much more simple than creating a separate table and joining it, so that's the way I'd go.I tested on a table with about 4 million rows.Table structure is like thisCREATE TABLE [BigTable] ( [btID] [int] IDENTITY PRIMARY KEY, [col1] [int], [col2] [datetime], [col3] [varchar] (52), [col4] [char] (13), [status] [smallint], [statusbit] [bit], [statusbyte] [tinyint],)Create index statementscreate index BTstatus on BigTable(status)create index BTstatus on BigTable(statusbit)create index BTstatus on BigTable(statusbyte)I only ever put on one of these at a time while testingI tested the following: 1. Status flag (bit, tinyint, smallint makes no difference) in the table and no index on the status flag vs join to an active rows table 2. Bit status flag in the table with an index on the status flag vs join to an active rows table 3. Tinyint status flag in the table with an index on the status flag vs join to an active rows table 4. Smallint status flag in the table with an index on the status flag vs join to an active rows table First set of tests on 524 active rowsThe average join performance wasDuration - 16 ms, 1918 reads.Obviously using no index the performance is hopelessDuration - 19 sec, 242666 reads.But the indexed status flag performance was very interestingBit: Duration - 460 ms, 8950 reads.Tinyint: Duration - 23 ms, 1826 reads.Smallint: Duration - 23 ms, 1833 reads.First set of tests on 8367 active rowsThe average join performance wasDuration - 203 ms, 29128 reads.No index the performanceDuration - 20 sec, 242671 reads.But the indexed status flag performance was very interestingBit: Duration - 595 ms, 34937 reads.Tinyint: Duration - 203 ms, 27796 reads.Smallint: Duration - 203 ms, 27825 reads.So an index on the status flag is not a good idea if you use a bit field, - about 20 times slower on 4 times as many reads when selecting 524 out of 4 million rows - about 3 times slower on 1.2 times as many reads when selecting 8367 out of 4 million rowsBUT if you use a tinyint for the flag the performance of the indexed status flag is about the same on slightly fewer reads!Conclusion: tinyint or smallint flags are indexed MUCH more effectively than bit flags for some reason (I don't know what that is - anyone?). Using a tinyint or smallint flag is much more simple than creating a separate table and joining it, so that's the way I'd go. |
|
PSamsig
Constraint Violating Yak Guru
384 Posts |
Posted - 2006-11-19 : 10:36:16
|
My guess is that bit gets 'promoted' to int, would be interesting to see how int would have performed just for comparesen.Im happy to see that the streight forward method of using an index performs pretty equal to the much more complicated join one, even in a simple in PK.What server version did you test on ? (Can't be 7.0, since bit indexes isn't allowed, and EM can't disallow it too, but it can be done (and I have a few to change to tiny/small int now )-- Alice came to a fork in the road. "Which road do I take?" she asked. "Where do you want to go?" responded the Cheshire cat. "I don't know," Alice answered. "Then," said the cat, "it doesn't matter." |
 |
|
snSQL
Master Smack Fu Yak Hacker
1837 Posts |
Posted - 2006-11-19 : 19:48:30
|
I did this on 2000, but I plan to test 2005 tomorrow and will post those results then. |
 |
|
Kristen
Test
22859 Posts |
Posted - 2006-11-20 : 04:02:17
|
(Didn't want to employ I was being silent here, just haven't had time yet. I have a 15M row table copied into a Test database. I've got as far as Reindex and Up-Stats etc., and plan to run some test queries. The table has a column with 1/2/3 values in it representing about 10%, 89%, 1% respectively, so I should be able to test out some theories. There is another column where about 10 values are used on around 80% of the rows, and the rest of the rows have highly selective values, so I'm hoping to get a good idea for how well queries perform against the (supposedly!) more selective values.Its running on a fire-breathing server too ...I'll report back ... but it may take a little while before I gather together some spare timeKristen |
 |
|
snSQL
Master Smack Fu Yak Hacker
1837 Posts |
Posted - 2006-11-20 : 16:21:36
|
When running the tests on SQL Server 2005 I realized that my previous numbers were out because the table was severely fragmented. After rebuilding the clustered index to defrag the table the performance of the flag without an index improved dramatically due to a table scan being much faster. As a result table scans were also selected by the optimizer sooner because a table scan was now much more efficient.Here are the numbers I now have - they do indicate that you benefit from using a join table if the number of rows gets high enough that the optimizer no longer chooses the index.I'm testing 2005 in a virtual machine, so the duration is inflated, the important thing is the number of reads, which is a little lower across the board for 2005. AND in 2005 1. the performance of the bit index in 2005 is about the same as the other indexes2. the optimizer makes much better choices, so the performance of the flag vs the join table was almost the same with 500, 5000 or 10000 active rows in a table with 4 million rowsSQL Server 2000500 rowsJoin table - 30 ms, 1742 readsNo index, bit flag - 710 ms, 41214 readsNo index, tinyint flag - 640 ms, 41214 readsNo index, smallint flag - 640 ms, 41214 readsIndexed bit flag - 520 ms, 8827 readsIndexed tinyint flag - 15 ms, 1703 readsIndexed smallint flag - 16 ms, 1731 reads5000 rowsJoin table - 124 ms, 16723 readsNo index, bit flag - 820 ms, 41214 readsNo index, tinyint flag - 780 ms, 41214 readsNo index, smallint flag - 780 ms, 41214 readsIndexed bit flag - 820 ms, 41214 reads (table scan)Indexed tinyint flag - 780 ms, 41214 reads (table scan)Indexed smallint flag - 780 ms, 41214 reads (table scan)10000 rowsJoin table - 1600 ms, 39559 readsNo index, bit flag - 1600 ms, 41214 readsNo index, tinyint flag - 1600 ms, 41214 readsNo index, smallint flag - 1600 ms, 41214 readsIndexed bit flag - 1600 ms, 41214 reads (table scan)Indexed tinyint flag - 1600 ms, 41214 reads (table scan)Indexed smallint flag - 1600 ms, 41214 reads (table scan)SQL Server 2005500 rowsJoin table - 50 ms, 1623 readsNo index, bit flag - 1250 ms, 32836 readsNo index, tinyint flag - 1030 ms, 32836 readsNo index, smallint flag - 860 ms, 32836 readsIndexed bit flag - 150 ms, 1623 readsIndexed tinyint flag - 140 ms, 1623 readsIndexed smallint flag - 90 ms, 1625 reads5000 rowsJoin table - 280 ms, 15976 readsNo index, bit flag - 1400 ms, 32836 readsNo index, tinyint flag - 1175 ms, 32836 readsNo index, smallint flag - 1100 ms, 32836 readsIndexed bit flag - 370 ms, 15974 readsIndexed tinyint flag - 350 ms, 15974 reads Indexed smallint flag - 280 ms, 15975 reads 10000 rowsJoin table - 590 ms, 31929 readsNo index, bit flag - 1620 ms, 32836 readsNo index, tinyint flag - 1400 ms, 32836 readsNo index, smallint flag - 1400 ms, 32836 readsIndexed bit flag - 675 ms, 31919 reads Indexed tinyint flag - 600 ms, 31919 reads Indexed smallint flag - 540 ms, 31921 reads |
 |
|
snSQL
Master Smack Fu Yak Hacker
1837 Posts |
Posted - 2006-11-20 : 21:29:39
|
Figured out the discrepancy on SQL Server 2000 with the bit index.You need to cast the value to a bit in the WHERE clause so that the optimizer correctly uses the index.This is just as fast as the tinyint or smallint because like them it uses an index seek. SELECT * FROM table WHERE statusbit = cast(1 as bit) |
 |
|
|
|
|
|
|