Basic Tutorial on Bitfield Arithmetic
This article describes basic operations in manipulating bitfields using boolean operations. Although this article focuses in Java, most programming languages use the same syntax.
What is a Bitfield?
All modern computers use binary arithmetic - that means, the most basic unit of information is a bit - it’s value can either be 0 or 1. On almost all hardware implementations of binary arithmetic, you can’t adress and modify bits directly, but you have to use bytes (= 8 bits). Bitfields are vectors of bits where each bit expresses a specific piece of information that can either be true or false. Depending on the application the bitfield can occupy more or less space.
Possible applications for bitfields include, but are not limited to Bloom Filters and Game artificial intelligences. In the latter case they are called bitboards if they represent a specific state in a game board.
You might also use words (2 bytes = short in most configurations), double words (4 bytes = int in most configurations) or quad words (8 bytes = long in most configurations) instead of single bytes for addressing. On 64-bit platforms, double words or quad words are usually most efficient but anything beyond quad words (i.e. 64 bits) needs more than one instruction in the CPU, which usually makes computation inefficient (there are some tricks involving SIMD, but this is beyond the scope of this article).
Setting one or more bits in a bitfield
Setting a single bit in a bitfield is extremely easy, because it only involves the boolean OR operation. The bit to be set needs to be expressed as mask first - if $n$ is the number of the bit to be set, the appropriate mask is $2^n$. Then, you can use the following expression involving bitwise (!) OR to set the bit
bitfield |= mask;
If you want to set multiple bits, just OR all masks like this:
bitfield |= mask1 | mask2 | mask3;
Note that this code sets the bit regardless of whether they have been set before or not.
When using any programming language always verify that you are using the bitwise boolean operators. C/C++ usually doesn’t print a warning message if you use operators like ||
or &&
but these don’t work bitwise but treat the left-side and the right-side arguments as false only if they are zero (that is, there is no bit set).
Unsetting bits in a bitfield
Unsetting bits is almost as easy as checking them: You AND the bitfield with the negated (NOT) mask. Because of how the AND and NOT operations work, the bits that are not set in the original mask, are set in the negated mask and therefore remain the same as before. The negated mask is zero for the bits to be unset, therefore the result of the AND operation is always zero for these bits.
Code example:
bitfield &= ~mask;
Just like above, you can bitwise-OR masks to do multiple unsets at once - but remember to put parentheses around them.
bitfield &= ~(mask1 | mask2 | mask3);
Checking bits in a bitfield
Checking a bit in a bitfield is as easy as ANDing the bitfield with the corresponding mask. If the resulting number is zero, the checked bit is not set, if it isn’t zero the checked bit ist set.
byte result = bitfield & mask;
If you want to use multiple masks at once, remember that the result is zero if and only if none of the bits corresponding to the masks are set. You can’t see (without further operations) how many and which bits are set, if at least one is set.
byte result = bitfield & (mask1 | mask2 | mask3);
Efficiency of bitfields and possible alternatives
Obviously, bitfields use about 1/8 of the space of using one byte per attribute (most programming languages use one byte per attribute when using _boolean _arrays or similar). If your application needs a deterministic storage, a bitfield is the least space-consuming uncompressed type of storage. Because it’s uncompressed, no time-consuming compression and decompression operations need to be done.
Although bitfields need at least one boolean operation per read/write access, these operations are extremely fast (~1 clock cycle) on all types of processors. Additionally, using bitfields yields much better memory locality (because of the reduced size) that using 1-byte-per-attribute storages and therefore causes much fewer page faults. Overall, these effects yields better time performance in almost any case on real hardware although in theory they are slightly slower that 1-bit-per-attribute-storages.
If you can accept a given probability for false-positives, you should consider Bloom Filters as an alternative. Writing to them requires more time, but they require significantly less space while conserving zero false-negative probability.