Bit Manipulation Instruction Sets
Bit Manipulation Instruction Sets (BMI sets) are extensions to the x86 instruction set architecture for microprocessors from Intel and AMD. Their purpose is to speed up bit manipulation. All the instructions in these sets are non-SIMD and operate only on general-purpose registers.
Intel published two sets, BMI1 and BMI2, both introduced with the Haswell microarchitecture. AMD published two more: ABM (Advanced Bit Manipulation), a subset of SSE4a that Intel implements as part of SSE4.2 and BMI1, and TBM (Trailing Bit Manipulation), an extension to BMI1 introduced with Piledriver processors.
ABM (Advanced Bit Manipulation)
Only AMD treats ABM as a single instruction set; Intel considers POPCNT part of SSE4.2 and LZCNT part of BMI1. POPCNT has a separate CPUID flag; however, Intel uses AMD's ABM flag to indicate LZCNT support (since LZCNT completes ABM).
| Instruction | Description |
| --- | --- |
| POPCNT | Population count |
| LZCNT | Leading zeros count |
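The following minimal Python sketch shows what the two ABM instructions compute. It assumes a 32-bit operand size by default. `popcnt` and `lzcnt` are hypothetical helper names chosen for illustration, not an existing API. The sketch models results only, not flags or performance.

```python
def popcnt(x, width=32):
    """Model of POPCNT: number of set bits in a width-bit operand."""
    x &= (1 << width) - 1  # treat x as an unsigned width-bit register value
    return bin(x).count("1")


def lzcnt(x, width=32):
    """Model of LZCNT: number of leading zero bits; equals width when x == 0."""
    x &= (1 << width) - 1
    return width - x.bit_length()
```

LZCNT is defined for a zero input and returns the operand size, and the model reproduces this. The older BSR instruction leaves its destination undefined in that case.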
BMI1 (Bit Manipulation Instruction Set 1)
The instructions below are those enabled by the BMI bit in CPUID. Intel officially considers LZCNT part of BMI1 but advertises LZCNT support using the ABM CPUID feature flag. BMI1 is available in AMD's Jaguar, Piledriver and newer processors, and in Intel's Haswell and newer processors.
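Support for these instructions is advertised through separate CPUID flags, so software normally checks each flag before use. The sketch below decodes the relevant bits from raw CPUID register values. The bit positions follow the Intel and AMD manuals; the dictionary layout and the `decode_features` name are assumptions for illustration. Python cannot execute CPUID itself, so the register values are assumed to come from native code, for example GCC's `__get_cpuid`/`__get_cpuid_count` helpers in `<cpuid.h>`.

```python
# (CPUID leaf, output register, bit index) for each feature flag.
FEATURE_BITS = {
    "POPCNT": (0x00000001, "ecx", 23),
    "ABM/LZCNT": (0x80000001, "ecx", 5),  # AMD's ABM flag, also used by Intel for LZCNT
    "BMI1": (0x00000007, "ebx", 3),       # leaf 7, subleaf 0
    "BMI2": (0x00000007, "ebx", 8),       # leaf 7, subleaf 0
    "TBM": (0x80000001, "ecx", 21),       # AMD only
}


def decode_features(regs):
    """regs maps (leaf, register name) to the 32-bit value CPUID returned.

    Missing entries are treated as 0, i.e. feature absent.
    """
    found = set()
    for name, (leaf, reg, bit) in FEATURE_BITS.items():
        if (regs.get((leaf, reg), 0) >> bit) & 1:
            found.add(name)
    return found
```

For example, `decode_features({(7, "ebx"): (1 << 3) | (1 << 8), (0x80000001, "ecx"): 1 << 5})` reports BMI1, BMI2 and ABM/LZCNT.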
| Instruction | Description |
| --- | --- |
| ANDN | Logical and not |
| BEXTR | Bit field extract (with register) |
| BLSI | Extract lowest set isolated bit |
| BLSMSK | Get mask up to lowest set bit |
| BLSR | Reset lowest set bit |
| TZCNT | Count the number of trailing zero bits |
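Each BMI1 operation can be modelled in a few lines. The following Python sketch assumes 32-bit operands and uses hypothetical function names. It reproduces the instructions' results but not their flag outputs. BLSI, BLSMSK and BLSR correspond to classic two's-complement identities; for example, `x & -x` isolates the lowest set bit.

```python
# Model of the BMI1 instructions for 32-bit operands (hypothetical helpers).
WIDTH = 32
MASK = (1 << WIDTH) - 1  # all-ones value of a 32-bit register


def andn(a, b):
    """ANDN: bitwise AND of the inverted first source with the second source."""
    return ~a & b & MASK


def bextr(src, control):
    """BEXTR: start = control[7:0], length = control[15:8].

    Bits beyond the operand read as 0, so a start >= 32 yields 0.
    """
    start = control & 0xFF
    length = (control >> 8) & 0xFF
    return ((src & MASK) >> start) & ((1 << length) - 1)


def blsi(x):
    """BLSI: isolate the lowest set bit (0 stays 0)."""
    x &= MASK
    return x & -x


def blsmsk(x):
    """BLSMSK: ones up to and including the lowest set bit (all ones for 0)."""
    x &= MASK
    return (x ^ (x - 1)) & MASK


def blsr(x):
    """BLSR: clear the lowest set bit."""
    x &= MASK
    return x & (x - 1)


def tzcnt(x):
    """TZCNT: number of trailing zero bits; defined as WIDTH for x == 0."""
    x &= MASK
    if x == 0:
        return WIDTH
    return (x & -x).bit_length() - 1
```

For instance, `tzcnt(0b101000)` is 3 and `blsr(0b101000)` is `0b100000`. Like LZCNT, TZCNT is defined for zero and returns the operand size, whereas the older BSF leaves its destination undefined.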
BMI2 (Bit Manipulation Instruction Set 2)
Intel introduced BMI2 together with BMI1 in its Haswell line of processors. Only AMD has produced processors that support BMI1 without BMI2; AMD added BMI2 support with its Excavator architecture.
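| Instruction | Description |
| --- | --- |
| BZHI | Zero high bits starting with specified bit position |
| MULX | Unsigned multiply without affecting flags; writes the high and low halves of the product to two arbitrary destination registers |
| PDEP | Parallel bits deposit |
| PEXT | Parallel bits extract |
| RORX | Rotate right logical without affecting flags |
| SARX | Shift arithmetic right without affecting flags |
| SHRX | Shift logical right without affecting flags |
| SHLX | Shift logical left without affecting flags |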
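Here is a similar Python sketch for the BMI2 instructions other than PDEP and PEXT. It models results only and uses hypothetical names. MULX is shown in its 64-bit form, where one factor is the implicit RDX register. The shift and rotate helpers mask the count to 5 bits for 32-bit operands (6 bits for 64-bit), as the hardware does. MULX, RORX, SARX, SHRX and SHLX do not modify the arithmetic flags, which is what distinguishes them from MUL, ROR, SAR, SHR and SHL.

```python
def bzhi(src, index_reg, width=32):
    """BZHI: clear bits of src at positions >= n, where n = index_reg[7:0]."""
    mask = (1 << width) - 1
    n = index_reg & 0xFF
    if n >= width:
        return src & mask  # index past the operand size: source returned unchanged
    return src & ((1 << n) - 1)


def mulx(rdx, src):
    """MULX (64-bit): unsigned RDX * src, returned as (high, low) halves."""
    mask64 = (1 << 64) - 1
    product = (rdx & mask64) * (src & mask64)
    return product >> 64, product & mask64


def rorx(x, imm, width=32):
    """RORX: rotate right by an immediate count (masked to the operand size)."""
    mask = (1 << width) - 1
    r = imm & (width - 1)
    x &= mask
    return ((x >> r) | (x << (width - r))) & mask


def shlx(x, count, width=32):
    """SHLX: logical left shift; count masked to 5 (or 6) bits."""
    mask = (1 << width) - 1
    return ((x & mask) << (count & (width - 1))) & mask


def shrx(x, count, width=32):
    """SHRX: logical right shift; zeros enter from the top."""
    mask = (1 << width) - 1
    return (x & mask) >> (count & (width - 1))


def sarx(x, count, width=32):
    """SARX: arithmetic right shift; copies of the sign bit enter from the top."""
    mask = (1 << width) - 1
    x &= mask
    if x >> (width - 1):  # sign bit set: reinterpret as a negative value
        x -= 1 << width
    return (x >> (count & (width - 1))) & mask  # Python's >> is arithmetic
```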
Parallel bit deposit and extract
The PDEP and PEXT instructions are new generalized bitwise pack and unpack instructions. They take two inputs: a source and a selector. The selector is a bitmap selecting the bits that are to be packed or unpacked. PEXT copies the selected bits of the source into contiguous low-order bits of the destination. PDEP deposits contiguous low-order bits of the source into the selected bit positions of the destination. This can be used to extract any bit field of the input, and even to perform many bit-level shuffles that previously would have been expensive. With the same selector, the two instructions are inverses of each other. Although these instructions resemble some SIMD instructions, PDEP and PEXT (like the rest of the BMI instruction sets) operate on general-purpose registers.
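The selection logic described above can be sketched in Python as two loops over the selector bits, lowest first. This is a software model with hypothetical names, not a description of how the hardware implements the instructions. The inverse relationship holds exactly for the selected bits: depositing what was extracted gives back `src & selector`.

```python
def pext(src, selector):
    """PEXT model: gather the bits of src selected by selector into the low end."""
    result, out_pos = 0, 0
    while selector:
        low = selector & -selector  # lowest remaining selector bit
        if src & low:
            result |= 1 << out_pos
        out_pos += 1
        selector &= selector - 1  # clear that selector bit
    return result


def pdep(src, selector):
    """PDEP model: scatter the low bits of src into the selector's 1-bit positions."""
    result, in_pos = 0, 0
    while selector:
        low = selector & -selector
        if (src >> in_pos) & 1:
            result |= low
        in_pos += 1
        selector &= selector - 1
    return result


# Round trip with the same selector keeps exactly the selected bits.
assert pdep(pext(0xDEADBEEF, 0x00FF00FF), 0x00FF00FF) == 0xDEADBEEF & 0x00FF00FF
```

In C and C++, GCC and Clang expose the hardware instructions as the intrinsics `_pext_u32`/`_pext_u64` and `_pdep_u32`/`_pdep_u64` from `<immintrin.h>` when BMI2 code generation is enabled, for example with `-mbmi2`.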
Below are a few 8-bit examples of these operations:
| Input | Selector example | Parallel bit extract | Parallel bit deposit |
| --- | --- | --- | --- |
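PEXT and PDEP only move bits, so their effect on an 8-bit value can be shown symbolically: each input bit is named with a letter, most significant first, as in the Input column. The sketch below computes such rows for two selectors picked for illustration. The helper names are hypothetical.

```python
def pext_symbolic(bits, selector):
    """Symbolic PEXT on strings written MSB first: selected bits move to the low end."""
    picked = [b for b, s in zip(bits, selector) if s == "1"]
    return "0" * (len(bits) - len(picked)) + "".join(picked)


def pdep_symbolic(bits, selector):
    """Symbolic PDEP: the lowest k source bits fill the k selected positions in order."""
    k = selector.count("1")
    source = iter(bits[len(bits) - k:])  # lowest k bits, still MSB first
    return "".join(next(source) if s == "1" else "0" for s in selector)


for sel in ("11110000", "01010101"):
    print("abcdefgh", sel, pext_symbolic("abcdefgh", sel), pdep_symbolic("abcdefgh", sel))
# abcdefgh 11110000 0000abcd efgh0000
# abcdefgh 01010101 0000bdfh 0e0f0g0h
```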
TBM (Trailing Bit Manipulation)
TBM consists of instructions that complement the set started by BMI1. Because of this complementary nature, they need not be used directly; an optimizing compiler can generate them when they are supported. AMD introduced TBM together with BMI1 in its Piledriver line of processors; AMD Jaguar processors do not support TBM.
| Instruction | Description |
| --- | --- |
| BEXTR | Bit field extract (with immediate) |
| BLCFILL | Fill from lowest clear bit |
| BLCI | Isolate lowest clear bit |
| BLCIC | Isolate lowest clear bit and complement |
| BLCMSK | Mask from lowest clear bit |
| BLCS | Set lowest clear bit |
| BLSFILL | Fill from lowest set bit |
| BLSIC | Isolate lowest set bit and complement |
| T1MSKC | Inverse mask from trailing ones |
| TZMSK | Mask from trailing zeros |
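Each TBM instruction listed above computes a short combination of NOT, AND, OR and XOR with `x + 1` or `x - 1`, which is why an optimizing compiler may recognize the equivalent expression and emit the single instruction. The following Python sketch writes those combinations out for 32-bit operands. The function names mirror the mnemonics but are hypothetical helpers, and flags are not modelled. The immediate form of BEXTR is omitted because it extracts a field the same way as the BMI1 register form; only the start/length control word is supplied as an immediate.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def _u32(v):
    """Wrap a Python int to an unsigned 32-bit register value."""
    return v & MASK


def blcfill(x): return _u32(x & (x + 1))   # clear the trailing ones
def blci(x):    return _u32(x | ~(x + 1))  # all ones except the lowest clear bit
def blcic(x):   return _u32(~x & (x + 1))  # only the lowest clear bit set
def blcmsk(x):  return _u32(x ^ (x + 1))   # ones up to and including lowest clear bit
def blcs(x):    return _u32(x | (x + 1))   # set the lowest clear bit
def blsfill(x): return _u32(x | (x - 1))   # set all bits below the lowest set bit
def blsic(x):   return _u32(~x | (x - 1))  # all ones except the lowest set bit
def t1mskc(x):  return _u32(~x | (x + 1))  # zeros exactly at the trailing ones
def tzmsk(x):   return _u32(~x & (x - 1))  # ones exactly at the trailing zeros
```

With `x = 0b01010111` (lowest clear bit is bit 3), for example, `blcmsk(x)` is `0b1111` and `blcs(x)` is `0b01011111`.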
Supporting CPUs
- AMD Barcelona-based processors (ABM supported)
- AMD Bulldozer-based processors (ABM supported)
- AMD Piledriver-based processors (ABM, BMI1 and TBM supported)
- AMD Steamroller-based processors (ABM, BMI1 and TBM supported)
- AMD Excavator-based processors (ABM, BMI1, BMI2 and TBM supported)
- AMD Bobcat-based processors (ABM supported)
- AMD Jaguar-based processors (ABM and BMI1 supported)
See also
- AES instruction set
- CLMUL instruction set
- FMA instruction set
- Advanced Vector Extensions (AVX)
- XOP instruction set