Bit field
|
|
This article is written like a manual or guidebook. Please help rewrite this article from a neutral point of view. (December 2011) |
A bit field is a common idiom used in computer programming to compactly store multiple logical values as a short series of bits where each of the single bits can be addressed separately. A bit field is most commonly used to represent integral types of known, fixed bit-width. A well-known usage of bit-fields is to represent single bit flags with each flag stored in a separate bit.[citation needed]
A bit field is distinguished from a bit array in that the latter is used to store a large set of bits indexed by integers and is often wider than any integral type supported by the language. Bit fields, on the other hand, typically fit within a machine word, and the denotation of bits is independent of their numerical index.
Contents |
[edit] Implementation
Although languages such as C or C++ have built-in support for bit fields, these can be still implemented manually, even in languages lacking native bit field support. It suffices to have a set of integer constants, to which each a power of two is assigned, that semantically associates each individual bit with its respective semantic state.
The bitwise operators AND, OR, and NOT are used in combination to set/unset flags, or to determine whether certain flags are set/unset, respectively. In order to do the latter, a bit mask is required with all its bits disabled except for those that are supposed to be tested. If AND-ing the value of the bit set with the to-be-tested bit mask results in the same bit mask, then all these bits are enabled in the original bit set.
[edit] Examples
Example implementation in C:
typedef int Preference; #define Preference_LikesIceCream (1 << 0) /* 0x01 */ #define Preference_PlaysGolf (1 << 1) /* 0x02 */ #define Preference_WatchesTV (1 << 2) /* 0x04 */ #define Preference_ReadsBooks (1 << 3) /* 0x08 */ Preference Preference_new(void) { return 0; } void Preference_set(Preference *this, int flag) { *this |= flag; } void Preference_reset(Preference *this, int flag) { *this &= ~flag; } void Preference_reverse(Preference *this, int flag) { *this ^= flag; } bool Preference_get(Preference *this, int flag) { return (*this & flag) != 0; }
This object-oriented class can be used as follows:
Preference preference = Preference_new(); Preference_set(&preference, Preference_LikesIceCream); Preference_set(&preference, Preference_PlaysGolf); assert(Preference_get(&preference, Preference_PlaysGolf)); assert(!Preference_get(&preference, Preference_WatchesTV));
For the flag, it is recommended to use int instead of unsigned char because an unsigned char will only support values up to 255 which can be easily exceeded in bit fields as the values grow exponentially (2^n), i.e., this value will be exceeded after 8 items. A value higher than int will reduce performance as flags are usually processed using one assembly instruction. With a higher value than int, more instructions may be required on some architectures, notably x86. Also, most compilers implement enumerations (like the above) using the int as well. Therefore, higher values than 31 couldn't be represented in an enumeration anyway (log(2147483647)/log(2) = 31, whereas 2147483647 is the upper limit of an int, values <0 aren't used). In theory, compilers could have used an unsigned type to make the available range wider. However, this is not the case for most compilers.
Instead of using hardcoded numerical representations for the powers of two (0x08), the use of the bit shift operator (1 << 3) with an incrementing shift operand is used for easier readability.
Kernighan and Ritchie's book, The C Programming Language, describes a method for defining and accessing fields directly. Using this method, bitwise operators are not needed as bit members can be accessed the same as members of a structure without the need to create a object-oriented class like the one above. An example using C's struct keyword and C++'s bool data type follows:
typedef struct Preferences { bool likesIceCream : 1; bool playsGolf : 1; bool watchesTv : 1; bool readsBooks : 1; } Preferences; Preferences fred; fred.likesIceCream = true; fred.playsGolf = true; fred.watchesTv = true; fred.readsBooks = false; if (fred.likesIceCream) { /* ... */ }
[edit] Drawbacks of the structure-based approach
Bit members in structures as presented above have potential practical drawbacks. First, the ordering of bits in memory is CPU dependent and memory padding rules can vary between compilers. In addition, less well-optimized compilers sometimes generate poor quality code for reading and writing bit members and there are potentially thread safety issues relating to bit fields because most machines cannot manipulate arbitrary sets of bits in memory, but must instead load and store whole words.[1] For instance, the following code would not be thread-safe, despite the use of a mutex for each member:
typedef struct Foo { int flag : 1; int counter : 15; } Foo; Foo myFoo; /* ... */ /* In thread 1 */ pthread_mutex_lock(&myMutexForFlag); myFoo.flag = !myFoo.flag; pthread_mutex_unlock(&myMutexForFlag); /* In thread 2 */ pthread_mutex_lock(&myMutexForCounter); myFoo.counter++; pthread_mutex_unlock(&myMutexForCounter);
The root of the problem is that on most machines it is impossible to load and store flag and counter separately, when both are stored in the same word. (In order for this to be thread-safe you should use a single mutex to protect both flag and counter, instead of two.)
[edit] Homogeneity and mathematical structure
In the examples above, the individual power-of-two values are declared as macros (ending up being the machine word). Since bit fields are in essence destined to be combined with the bitwise OR operator, such code fails the type safety principle when combining Preference_PlaysGolf | Preference_LikesIceCream that does not belong to the enumeration.
typedef enum Preference { /* This is a counter-example, don't use, read below. */ Preference_LikesIceCream = 1 << 0, Preference_PlaysGolf = 1 << 1, Preference_LikesWatchingTv = 1 << 2, Preference_ReadsBooks = 1 << 3 } Preference;
Quantities defined as the combination of bits are actually elements of the elementary abelian group (Z/2Z)n; and the relation defined as a <= b when (a & b) = a only creates a partial order, i.e. 1011b is greater than 1010b but 1011b and 1101b are not comparable (whereas the suffix b denotes that the numbers are binary[2]).[3]
This remark is of interest when designing variable importance debug channels (ranging from informative to fatal); the regular integer comparison cannot be used to filter out part of the messages.[which?]
Nevertheless, a bit field can be safely and elegantly implemented using a bit array where the bit indices for each flag are values of an enumerated type (like the EnumSet class in Java); this avoids the dangers of direct bitwise manipulations.
[edit] See also
- Mask (computing)
- Bitboard, used in chess and similar games.
- Bit array
- Flag word
[edit] External links
- Explanation from a book
- Description from another wiki
- Use case in a C++ guide
- C++ libbit bit library (alternative URL)
[edit] References
- ^ Threads Cannot Be Implemented As a Library
- ^ Chua, Hock-Chuan. "Integers, Floating-point Numbers, and Character Sets". http://www3.ntu.edu.sg/home/ehchua/programming/java/DataRepresentation.html.
- ^ PengOne. "Elementary abelian groups". http://stackoverflow.com/questions/6697680/elementary-abelian-groups.