Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 bitmap column

Author  Topic 

ujjaval
Posting Yak Master

108 Posts

Posted - 2006-01-03 : 18:52:27
Hello all,

I need to store a certain type of data where possible answers are 'Yes' and 'No' for some number of fields. I meant, lets say in a table
I have following columns:

-Rainy (Yes/No)
-Windy (Yes/No)
-Sunny (Yes/No)
-Cloudy(Yes/No)
-Dark (Yes/No)

Please don't question the table design as I am taking this as an example table.
Now, instead of having 5 separate columns for these fields, I was thinking, if I can have 1 field named 'WeatherCondition' which stores all these column values into one column as number which can act as a bitmap.

So lets say I assign the following values to each of these field

1 - Rainy
2 - Windy
4 - Sunny
8 - Cloudy
16 - Dark

Now, for a particular row in that table lets say I have following values for these fields..

Rainy = yes
Windy = yes
Sunny = no
Cloudy = no
dark = yes

Now, as per the numbers assigned to these field above, I can sum the values which has 'yes' and store that into a single column 'WeatherCondition'. In the above case that number will be 19.

These values are easy to retrieve using any programming language in front-end with simple function for coding and decoding of these numbers.

But I would like to have this facility in SQL Server itself.
I mean when I want to insert any data into the table with single column 'WeatherCondition', I should be able to use following statement:

insert into example_table( fieldA, fieldB, WeatherCondition) values ('test','test','Rainy,Windy, Dark')

or I should be able to do following select statement:

select * from example_table where WeatherCondition='Rainy, Windy, Dark'.

Anybody knows how to do this in SQL Server?

Just for the information, I can't rely on programming language in frontend for this feature as many ppl will query directly on SQL Server for their research purpose. And also, keeping all such columns as separate columns makes table very wide.

Thanks in advance,
ujjaval



nathans
Aged Yak Warrior

938 Posts

Posted - 2006-01-03 : 19:54:15
Look up Bitwise operators in BOL. This should get you going:

set nocount on
declare @weather table (weather_id int, weather_name varchar(25), bit_group int)
insert into @weather
select 1, 'Rainy', 1 union all
select 2, 'Windy', 2 union all
select 3, 'Sunny', 4 union all
select 4, 'Cloudy', 8 union all
select 5, 'Dark', 16


declare @field table (field_id int, field_name varchar(25), field_weather_sum int)
insert into @field
select 1, 'Qualcomm Stadium', 4 union all
select 2, 'Safeco Field', 11


declare @field_id int
set @field_id = 1

select field_name, weather_name
from @weather w
inner join @field f
on (f.field_weather_sum & w.bit_group) > 0
where f.field_id = @field_id


Nathan Skerl
Go to Top of Page

Frank Kalis
Constraint Violating Yak Guru

413 Posts

Posted - 2006-01-04 : 02:50:12
I wouldn't use bitmasks in SQL Server. Used in a WHERE clause they will always result in a scan, which is likely to hurt performance. Consider this:

SELECT TOP 200 Number = IDENTITY(INT)
INTO #test
FROM Sysobjects S1
CROSS JOIN Sysobjects S2
CROSS JOIN Sysobjects S3

GO
/*ALTER TABLE #test
ADD c2 CHAR(10)*/

CREATE UNIQUE NONCLUSTERED INDEX ix_test ON #test(Number)

SELECT *
FROM #test
WHERE Number & 1983 = 177

DROP TABLE #test
GO


Also, I don't agree that using separated columns will widen your table. If you use the BIT data type, you can pack 32 BIT columns into the same space one INTEGER column occupies.

--
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
Heute schon gebloggt? http://www.insidesql.de/blogs
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2006-01-04 : 07:19:55
"And also, keeping all such columns as separate columns makes table very wide"

As Frank said: this is not the case. From BoL:

quote:
Microsoft SQL Server optimizes the storage used for bit columns. If there are 8 or fewer bit columns in a table, the columns are stored as 1 byte. If there are from 9 through 16 bit columns, they are stored as 2 bytes, and so on.


However, to keep the bit/byte count to a minimum they will ideally be declared as NOT NULL - but then you would have to have a value of either YES or NO (but there again your BITMAP plan doesn't allow for NULLs/UNKNOWNs

Kristen
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2006-01-04 : 11:08:27
quote:
Originally posted by Frank Kalis

I wouldn't use bitmasks in SQL Server. Used in a WHERE clause they will always result in a scan, which is likely to hurt performance.
Bit values will also result in scans. SQL Server rarely uses an index on a bit datatype.
In contrast, you can benefit from an index on a bitmask, and searches such as "return all records where RAINY AND CLOUDY" would use the index very efficiently, and would be faster than searching on bit values.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2006-01-04 : 12:47:27
"you can benefit from an index on a bitmask"

I'm probably being thick. How do you search for bits 4 & 5 set on a bitmask (in a way that will make use of an index)?

Kristen
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2006-01-04 : 13:57:31
With bits 4 and 5 set to True, your search bitmask = 11000, which is the bitmask equivalent of numeric 24. So, search where bitmask = 24.

Granted, this returns records where ONLY bits 4 and 5 are set, but it would use an index and would be the fasted search method.
Go to Top of Page

ujjaval
Posting Yak Master

108 Posts

Posted - 2006-01-04 : 17:35:19
Thank you for all your inputs.

Yesterday I tried implementing the bitmap column for this data that I mentioned. What I did was created an INTEGER column which stores a number that ultimately is the combination of Rainy, Windy, Sunny.. data in terms of 1 and 0 in binary format. I had also implemented stored procedure that can encode and decode this number to and from the raw data.

And I think this is the good way to store such data. Then I have a trouble running the queries on it in SQL Server. The users of database should be able to run queries as in Nathan Skerl mentioned. But somehow I don't think it as the good way. Though I don't have any promising reason for that.

So, all I am left with is have a single large varchar column that can store this data which is really bad. The other was to have BIT columns for each data field as in Rainy, Windy etc. And the third one is even worse that I have separate columns for each data field storing 'yes' and 'no' in it.

None of this is best solution. But still can't decide which one is most suitable. On the second thought BIT columns looks better. But that widens the table. So, then thought comes to have only a single large varchar column.

Ohh! I am confused.

'select * from example_table where WeatherCondition='Rainy, Windy, Dark'.

And the SQL Server should have some function in it that can encode given WeatherCondition into a number and then select statement runs.So, I thought I could make a function for that instead of procedure which can be used in above query as

'select * from example_table where WeatherCondition=function_encode('Rainy, Windy, Dark')'.

But then being a novice user I realized that function feature is not supported in SQL Server 7 which I have to use right now.

Then, the other option was to have a join operation for this as in with a table storing the bitmap integer value and its 'Rainy, Windy etc. data combination as
quote:

Go to Top of Page

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-01-04 : 19:29:49
quote:
Originally posted by blindman

With bits 4 and 5 set to True, your search bitmask = 11000, which is the bitmask equivalent of numeric 24. So, search where bitmask = 24.

Granted, this returns records where ONLY bits 4 and 5 are set, but it would use an index and would be the fasted search method.



I doubt if SQL Server could effectively use an index to search such a low cardinality column, with only 32 possible values. If you wanted to have a true search for all the possible values where 4 and 5 are true, you would have to combine this with all the combinations of the other 3 bits for 8 possible values. A table scan would probably be faster than an index search and bookmark lookups.

If you really want to use a bitmap, you could use a tinyint to reduce the amount of space used to one byte.



CODO ERGO SUM
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2006-01-05 : 05:04:45
"On the second thought BIT columns looks better. But that widens the table"

I reckon you are a bit off-beam on that!

If you have multiple BIT columns, scattered through your table, SQL Server combines them all into a single byte (or multiple bytes if you have more than 8 of them), so its actually very efficient.

If the BIT columns allow NULL then an additional Bit is allocated to record whether the bit-column is NULL, or not (so you can save a bit by defining the column as NOT NULL - assuming you can set a sensible default value on it, or you will always know the value, when the record is first created).

However, you cannot put an index on a bit column - and it would be useless anyway 'coz SQL isn't going to bother to use an index on a column that has only two values (unless 99.9999% of the rows have one value, and the remaining tiny minority have the other value!)

If the user chooses "Rainy, Windy, Dark" (from Check Boxes perhaps?) you need to generate the SQL

SELECT *
FROM MyTable
WHERE MyRainyBitColumn = 1
AND MyWindyBitColumn = 1
AND MyDarkBitColumn = 1

Alternatively you could have parameters @Rainy, @Windy and @Dark which are set to 1 or 0 for MUST match, or NULL for "Don't care" and then you could do:

SELECT *
FROM MyTable
WHERE (@Rainy IS NULL OR MyRainyBitColumn = @Rainy)
AND (@Windy IS NULL OR MyWindyBitColumn = @Windy)
AND (@Dark IS NULL OR MyDarkBitColumn = @Dark)

Kristen
Go to Top of Page

Frank Kalis
Constraint Violating Yak Guru

413 Posts

Posted - 2006-01-05 : 05:47:56
quote:
Bit values will also result in scans. SQL Server rarely uses an index on a bit datatype.
In contrast, you can benefit from an index on a bitmask, and searches such as "return all records where RAINY AND CLOUDY" would use the index very efficiently, and would be faster than searching on bit values.


You obviously have a point here regarding the scans, but I would agree with Michael about the efficiency of an index. On the other hand, SQL Server can combine indexes depending on the queries, so even an index on a BIT column might turn out useful.
Anyway, probably the most compelling reason to avoid a bitmask is that it is multivalued, thus violating 1NF. So, at least to me, using separated columns is a much cleaner approach.

--
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
Heute schon gebloggt? http://www.insidesql.de/blogs
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2006-01-05 : 06:02:57
How about a 1:many linked "Attributes" table?

Kristen
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2006-01-05 : 10:02:15
Michael Valentine Jones:
"I doubt if SQL Server could effectively use an index to search such a low cardinality column, with only 32 possible values."
--The bitmask column would actually be very high cardinality, as opposed to bit datatypes which only take two values.

"If you wanted to have a true search for all the possible values where 4 and 5 are true, you would have to combine this with all the combinations of the other 3 bits for 8 possible values."
--Yes, as I stated, the index could only be used when the exact values were being sought.

Kristen:
"However, you cannot put an index on a bit column "
Actually, you can put an index on a bit column, but as you point out it would be unlikely to be used and would not be efficient.

Frank Kalis:
"Anyway, probably the most compelling reason to avoid a bitmask is that it is multivalued, thus violating 1NF. "
--The schema design proposed by the original poster violates rules of normalization. Ideally, these attributes should be stored in a subtable, which would allow additional attributes to be added without redesigning the schema. Having separate columns for each attribute (bit, tinyint, or whatever) is definitely NOT normalized and is not something I would recommend. The use of a bitmask is closer to logical normalization, allowing up to 64 separate attributes before requiring a change to the schema. And while a subtable is the usual method for handling N attributes, the bitmask column does allow you to easily perform some operations which are difficult with the use of a subtable, such as finding cases where two records have an identical set of attributes.
Go to Top of Page

mmarovic
Aged Yak Warrior

518 Posts

Posted - 2006-01-05 : 10:21:56
Actually, cardinality is not deciding factor for index efficiency, it is data distribution. Even index on bit column can be efficient if query searches on value that appears in 5% of rows.
Go to Top of Page

Frank Kalis
Constraint Violating Yak Guru

413 Posts

Posted - 2006-01-05 : 10:29:42
I'm not going to argue on this with you as my English is too weak for such a discussion.

--
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
Heute schon gebloggt? http://www.insidesql.de/blogs
Go to Top of Page

Frank Kalis
Constraint Violating Yak Guru

413 Posts

Posted - 2006-01-05 : 10:30:23
Oops, Mirko, you came in between. My posting was aimed at the last comment from blindman.

--
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
Heute schon gebloggt? http://www.insidesql.de/blogs
Go to Top of Page

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-01-05 : 10:55:18
quote:
Originally posted by blindman
Michael Valentine Jones:
"I doubt if SQL Server could effectively use an index to search such a low cardinality column, with only 32 possible values."
--The bitmask column would actually be very high cardinality, as opposed to bit datatypes which only take two values....


My point was not the cardinality of a bitmap compared to a bit column. My point was that it was such low cardinality that a table scan would probably be a faster, and that SQL Server would probably ignore any index and do a table scan.

You could also create an index with all five bit columns. It would have the same cardinality as the index on the bitmap index, and SQL Server would be just as likely to ignore that.

If a data row is a typical size, 80 bytes for example, there would be about 90 rows on a page. With a random distribution, the chances are that SQL Server would have to access almost every data page after doing an index lookup, so it would just be faster to scan the table to begin with if these columns are your only search criteria.



CODO ERGO SUM
Go to Top of Page

mmarovic
Aged Yak Warrior

518 Posts

Posted - 2006-01-05 : 11:06:47
quote:
Originally posted by Michael Valentine Jones

quote:
Originally posted by blindman
Michael Valentine Jones:
"I doubt if SQL Server could effectively use an index to search such a low cardinality column, with only 32 possible values."
--The bitmask column would actually be very high cardinality, as opposed to bit datatypes which only take two values....


My point was not the cardinality of a bitmap compared to a bit column. My point was that it was such low cardinality that a table scan would probably be a faster, and that SQL Server would probably ignore any index and do a table scan.

You could also create an index with all five bit columns. It would have the same cardinality as the index on the bitmap index, and SQL Server would be just as likely to ignore that.

If a data row is a typical size, 80 bytes for example, there would be about 90 rows on a page. With a random distribution, the chances are that SQL Server would have to access almost every data page after doing an index lookup, so it would just be faster to scan the table to begin with if these columns are your only search criteria.



CODO ERGO SUM

That was not my point either. My point is: If your query is going to return 5% of values from the table having significant enough number of rows based on one column value, that index will provide better performance, regardless of column cardinality (if cardinality is number of possible or actuall column values). In our example, if we are searching for Rainy day in Sahara, index on Rainy column would be used. Random distribution is a key assumptation you made, my point is that efficiency depends on data distribution, so what if distribution is not random?
Go to Top of Page

mmarovic
Aged Yak Warrior

518 Posts

Posted - 2006-01-05 : 11:26:11
quote:
Originally posted by Frank Kalis

I'm not going to argue on this with you as my English is too weak for such a discussion.

--
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
Heute schon gebloggt? http://www.insidesql.de/blogs


Frank, I got it, but it was funy anyway
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2006-01-05 : 11:33:59
"Actually, you can put an index on a bit column"

Sorry my mistake. I'm sure I tried to do something with a BIT column recently and had to cast it to INT. I thought maybe it was ORDER BY but BoL is silent on that, so presumably it wasn't that either

I don't get the Index issue for a different reason though:

<fx:pouring petrol on the flames!>

"With bits 4 and 5 set to True, your search bitmask = 11000, which is the bitmask equivalent of numeric 24. So, search where bitmask = 24.

Granted, this returns records where ONLY bits 4 and 5 are set, but it would use an index and would be the fasted search method.
"

So, apart from the specific case where I want "bits 4 and 5 set to True and ALL OTHER BITS ZERO" I would say:

SELECT *
FROM MyTable
WHERE (MyBitMask & 24) = 24

which isn't going to use an index on MyBitMask, is it?

(I'm making the assuming that the more common query would be "bits 4 and 5 set to True and ALL OTHER DON'T CARE" - of course if this isn't the case then the index is good-to-go!!)

Kristen
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2006-01-05 : 11:57:33
You can't use aggregate functions on bit datatypes, so maybe that was what you were thinking about. But also, Enterprise Manager throws an error if you try to index a bit column, saying that it is not allowed, though you CAN create the index through query analyzer.

No argument on the "bits 4 and 5 and all others don't care" case. I think a scan would be unavoidable.

I still take issue with MVJ stating that the bitmask column would have a low cardinality. With just six different attributes you have 64 potential bitmask values, and of course the number of potential values doubles with each new attribute. And if the optimizer didn't make use of the index in a "Where [bitmask] = 24" clause, then I don't know when it ever would.
Go to Top of Page
    Next Page

- Advertisement -