| Author |
Topic  |
|
blindman
Flowing Fount of Yak Knowledge
USA
2365 Posts |
Posted - 01/05/2006 : 12:05:27
|
...also, bitwise operations are pretty fast, and I'd bet the performance of this statement:
SELECT *
FROM MyTable
WHERE (MyBitMask & 57) = 57
...would compare favorably with the performance of a logical WHERE clause such as this:
SELECT *
FROM MyTable
WHERE Rainy = 1
AND Cloudy = 1
AND Dark = 1
AND Windy = 1
|
 |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
USA
6997 Posts |
Posted - 01/05/2006 : 14:36:43
|
We can trade posts about bitmap indexes and so on all day, so I did some testing. Feel free to do you own. 
Here is the script I used. I got clustered index scans on the two lookup queries at the end, so it was not using the index.
create table NUM
(
N int not null primary key clustered,
RND int not null
)
go
/*
Load table NUM with a list of random integers
Takes a while to run because of the conversion
*/
insert into NUM
select
N = number,
-- Use newid to generate a random integer
RND = abs(convert(int,convert(varbinary(40),newid())))
from
-- Function available from Script Library
dbo.F_TABLE_NUMBER_RANGE(1,100000)
go
create table TEST_BITMAP
(
N int not null
primary key clustered,
BMP tinyint not null,
COL1 varchar(30) not null,
COL2 varchar(30) not null,
COL3 varchar(500) not null,
)
go
-- Load data into table
insert into TEST_BITMAP
select
N,
-- BMP will contain a value in the range of 0 to 32
BMP = RND%33,
COL1 = convert(varchar(30),RND),
COL2 = reverse(convert(varchar(30),RND)),
COL3 = replicate('123456789',50)
from
NUM
order by
N
go
-- Create index on bitmap column
create index IX_TEST_BITMAP_BMP ON TEST_BITMAP (BMP)
go
/*
Test with direct lookup on a single value
I got a clustered index scan
*/
select
N,BMP,COL1,COL2, COL3
from
TEST_BITMAP
where
BMP IN (24)
go
/*
Test with bitmap
I got a clustered index scan
*/
select
N,BMP,COL1,COL2, COL3
from
TEST_BITMAP
where
(BMP & 24) = 24
go
drop table NUM
go
drop table TEST_BITMAP
go
CODO ERGO SUM |
 |
|
|
blindman
Flowing Fount of Yak Knowledge
USA
2365 Posts |
Posted - 01/05/2006 : 15:13:51
|
Of course, I get the same results using your script as written.
But a few points... First, you stated that you doubted that SQL Server could effectively use the index. I consider a scan to be an effective use of an index. Not as effective as a seek, but definitely more effective than ignoring the index altogether, which is what the optimizer would likely do with an indexed bit column. As your code demonstrates, the optimizer does use the index to increase performance.
Second, you need to play around with the parameters a bit more if you are going to go to the trouble of creating all that sample code. Yeah, I get index scans when 100,000 records are populated with 32 possible values. But 32 values in a bitmask will only represent four distinct attributes. Add just one more attribute to your test (the poster listed five, anyway) and you bump your possible bitmask values up to 64.
And guess what? Then you get an index seek. Voila'! |
 |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
USA
6997 Posts |
Posted - 01/05/2006 : 18:18:01
|
quote: Originally posted by blindman
Of course, I get the same results using your script as written.
But a few points... First, you stated that you doubted that SQL Server could effectively use the index. I consider a scan to be an effective use of an index. Not as effective as a seek, but definitely more effective than ignoring the index altogether, which is what the optimizer would likely do with an indexed bit column. As your code demonstrates, the optimizer does use the index to increase performance.
Second, you need to play around with the parameters a bit more if you are going to go to the trouble of creating all that sample code. Yeah, I get index scans when 100,000 records are populated with 32 possible values. But 32 values in a bitmask will only represent four distinct attributes. Add just one more attribute to your test (the poster listed five, anyway) and you bump your possible bitmask values up to 64.
And guess what? Then you get an index seek. Voila'!
I'm not sure where you got 64, but 2 to the 5th power is 32 (2*2*2*2*2 = 32), and that's why I set the test up that way. I never said that an index with higher cardinality would not do an index seek.
The index scan was a clustered index scan, which is the same as a table scan on a table with a clustered index. It was not a scan on the index that I created in the script, so the index is not used.
I did not try to rig the test to keep it from using the index. In fact, I set it up so that the table was wide in order to make it more likely to use the index by reducing the rows per page to only 16.
To be fair, I re-tested with higher cardinality, and it will use an index for a index lookup in that case with the first query. I also retested with a smaller row size, about 130 bytes. In that case it would not use the index until I raised cardinality to 256. When I reduced it to about 60 bytes per row, I had to raise the cardinality to 512 to get it to use the index.
I could never get it to use the index for the lookup when using the second query with the bitmask, even when I made the cardinality so high that all rows were unique, so it doesn't look like it will ever use the index in that case.
CODO ERGO SUM |
 |
|
|
blindman
Flowing Fount of Yak Knowledge
USA
2365 Posts |
Posted - 01/05/2006 : 23:53:33
|
Thanks for doing more testing. Interesting results with changing the rowsize. Not surprised that the second query always scans.
By the way, 2^5 is 32, as you said. 2^4 is 16 2^3 is 8 2^2 is 4 and of course, 2^1 is 2 The bitmask must be able to store ALL combinations of these values. To store all the bits as true results in a bitmask of 2+4+8+16+32=64. And that's why you need to use 64 random numbers in your test to cover five attributes. |
 |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
USA
6997 Posts |
Posted - 01/06/2006 : 01:04:33
|
quote: Originally posted by blindman
Thanks for doing more testing. Interesting results with changing the rowsize. Not surprised that the second query always scans.
By the way, 2^5 is 32, as you said. 2^4 is 16 2^3 is 8 2^2 is 4 and of course, 2^1 is 2 The bitmask must be able to store ALL combinations of these values. To store all the bits as true results in a bitmask of 2+4+8+16+32=64. And that's why you need to use 64 random numbers in your test to cover five attributes.
Actually, 2+4+8+16+32 = 62
In any case, with 5 bits, you can have only 32 possible values, 0 to 31. 1+2+4+8+16=31
Binary 00000 = Decimal 0 Binary 00001 = Decimal 1 Binary 00010 = Decimal 2 Binary 00100 = Decimal 4 Binary 01000 = Decimal 8 Binary 10000 = Decimal 16 Binary 11111 = Decimal 31
With the Windows Calculator: Enter 31, click the Bin radio button, and it will display 11111
"There are only 10 kinds of people in the world, those that understand binary and those that don't." 
CODO ERGO SUM |
 |
|
|
Kristen
Test
United Kingdom
22191 Posts |
Posted - 01/06/2006 : 04:35:37
|
"You can't use aggregate functions on bit datatypes, so maybe that was what you were thinking about"
That was it - TaDa!.
"No argument on the "bits 4 and 5 and all others don't care" case. I think a scan would be unavoidable"
Just thinking on that a bit more (I expect I was being dense!!) two BIT columns are going to do a TABLE SCAN too ... so no worse off, and in the case of "bits 4 and 5 set to True and ALL OTHER BITS ZERO" an index may well be used, and thus perform better than BIT columns which are only ever going to SCAN.
MVJ: Very useful test script, ta.
Kristen |
 |
|
Topic  |
|
|
|