Quadruple-precision floating-point format

From Wikipedia, the free encyclopedia


In computing, quadruple precision (also commonly shortened to quad precision) is a binary floating-point computer number format that occupies 16 bytes (128 bits on modern computers, where a byte is 8 bits) in computer memory.

In IEEE 754-2008, the 128-bit base-2 format is officially referred to as binary128.

IEEE 754 quadruple precision binary floating-point format: binary128

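The IEEE 754 standard specifies a binary128 as having:

  • Sign bit: 1 bit
  • Exponent width: 15 bits
  • Significand precision: 113 bits (112 explicitly stored)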

The format is written with an implicit leading bit of value 1 unless the stored exponent is all zeros. Thus only 112 bits of the significand appear in the memory format, but the total precision is 113 bits (approximately 34 decimal digits, since \log_{10}(2^{113}) \approx 34.016). The bits are laid out as follows:

[Figure: IEEE 754 quadruple floating-point format (sign: 1 bit, exponent: 15 bits, significand: 112 bits)]
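As a minimal sketch (not part of the standard), a 128-bit pattern can be held in a Python integer and split into its three fields with shifts and masks. The helper name `split_binary128` is hypothetical, and the sketch assumes the sign occupies the most significant bit, followed by the 15 exponent bits and the 112 stored significand bits.

```python
import math

# binary128 field widths: sign, exponent, stored significand (fraction)
SIGN_BITS, EXP_BITS, FRAC_BITS = 1, 15, 112

def split_binary128(bits: int) -> tuple[int, int, int]:
    """Split a 128-bit pattern (a Python int) into (sign, exponent, fraction).

    Hypothetical helper for illustration only."""
    if not 0 <= bits < 1 << 128:
        raise ValueError("expected an unsigned 128-bit integer")
    fraction = bits & ((1 << FRAC_BITS) - 1)                # low 112 bits
    exponent = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)  # next 15 bits
    sign = bits >> (FRAC_BITS + EXP_BITS)                   # top bit
    return sign, exponent, fraction

# 112 stored bits plus the implicit leading bit give 113 bits of precision.
precision = FRAC_BITS + 1
decimal_digits = precision * math.log10(2)

print(split_binary128(0x3fff_0000_0000_0000_0000_0000_0000_0000))  # (0, 16383, 0)
print(round(decimal_digits, 3))  # 34.016
```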

Exponent encoding

The quadruple-precision binary floating-point exponent is encoded using an offset binary representation, with the zero offset being 16383, which is also known as the exponent bias in the IEEE 754 standard.

  • Emin = 0x0001−0x3fff = −16382
  • Emax = 0x7ffe−0x3fff = 16383
  • Exponent bias = 0x3fff = 16383

Thus, as defined by the offset binary representation, to get the true exponent, the offset of 16383 must be subtracted from the stored exponent.
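A small Python sketch of the bias arithmetic, assuming the stored exponent has already been extracted as an integer; `true_exponent` is a hypothetical helper name, and it only accepts the range used by normal numbers.

```python
BIAS = 0x3fff  # 16383, the binary128 exponent bias

def true_exponent(stored: int) -> int:
    """Unbias a stored exponent in the normal range 0x0001..0x7ffe."""
    if not 0x0001 <= stored <= 0x7ffe:
        raise ValueError("0x0000 and 0x7fff encode special values")
    return stored - BIAS

print(true_exponent(0x0001))  # -16382 (Emin)
print(true_exponent(0x7ffe))  # 16383 (Emax)
print(true_exponent(0x3fff))  # 0, the exponent of 1.0
```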

The stored exponents 0x0000 and 0x7fff are interpreted specially.

Exponent            | Significand zero | Significand non-zero    | Equation
0x0000              | 0, −0            | subnormal numbers       | (-1)^{\text{signbit}} \times 2^{-16382} \times 0.\text{significandbits}_2
0x0001, ..., 0x7ffe | normalized value | normalized value        | (-1)^{\text{signbit}} \times 2^{\text{exponentbits}_2 - 16383} \times 1.\text{significandbits}_2
0x7fff              | ±infinity        | NaN (quiet, signalling) |
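The table can be turned into a decoder. The following Python sketch uses `fractions.Fraction` to return the exact value of a finite pattern and Python floats for the infinities and NaN. `decode_binary128` is a hypothetical name; the sketch does not distinguish quiet from signalling NaNs, and because a `Fraction` has no negative zero, −0 decodes to 0.

```python
from fractions import Fraction

BIAS = 16383     # exponent bias
FRAC_BITS = 112  # stored significand bits

def decode_binary128(bits: int):
    """Return the exact value of a binary128 pattern as a Fraction,
    or a Python float for infinities and NaN (hypothetical helper)."""
    sign = (bits >> 127) & 1
    exponent = (bits >> FRAC_BITS) & 0x7fff
    fraction = bits & ((1 << FRAC_BITS) - 1)
    if exponent == 0x7fff:
        if fraction == 0:
            return float("-inf") if sign else float("inf")
        return float("nan")  # quiet vs. signalling is not distinguished here
    if exponent == 0:
        # Zero or subnormal: 2^-16382 x 0.fraction (no implicit leading 1).
        magnitude = Fraction(fraction, 1 << FRAC_BITS) * Fraction(2) ** (1 - BIAS)
    else:
        # Normal: 2^(exponent - 16383) x 1.fraction (implicit leading 1).
        magnitude = (1 + Fraction(fraction, 1 << FRAC_BITS)) * Fraction(2) ** (exponent - BIAS)
    return -magnitude if sign else magnitude

print(decode_binary128(0x3fff << 112))  # 1
print(decode_binary128(0xc000 << 112))  # -2
```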

The maximum representable value is ≈ 1.1897 × 10^4932.
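The maximum follows from the largest normal exponent (Emax = 16383) and an all-ones significand, 2 − 2^{-112}. A quick check with Python's `decimal` module, assuming 40 significant digits of working precision are enough for display:

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 40  # working precision for this calculation only
    # Largest finite binary128: (2 - 2^-112) x 2^16383
    max_quad = (2 - Decimal(2) ** -112) * Decimal(2) ** 16383

print(f"{max_quad:.4e}")  # 1.1897e+4932
```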

Quadruple precision examples

These examples are given in bit representation, in hexadecimal, of the floating-point value. This includes the sign, (biased) exponent, and significand.
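To see how such a pattern is assembled, the following Python sketch packs a sign bit, a biased exponent and a 112-bit fraction into one integer and prints it in the grouped hexadecimal form used in the examples. The helpers `pack_binary128` and `to_groups` are hypothetical names chosen for illustration.

```python
def pack_binary128(sign: int, exponent: int, fraction: int) -> int:
    """Assemble a 128-bit pattern from sign (1 bit), biased exponent (15 bits)
    and stored fraction (112 bits). Hypothetical helper for illustration."""
    if sign not in (0, 1) or not 0 <= exponent <= 0x7fff or not 0 <= fraction < 1 << 112:
        raise ValueError("field out of range")
    return (sign << 127) | (exponent << 112) | fraction

def to_groups(bits: int) -> str:
    """Format a 128-bit pattern as eight groups of four hex digits."""
    digits = f"{bits:032x}"
    return " ".join(digits[i:i + 4] for i in range(0, 32, 4))

# 1 = 1.0 x 2^0: biased exponent 0 + 16383 = 0x3fff, fraction 0
print(to_groups(pack_binary128(0, 0x3fff, 0)))  # 3fff 0000 0000 0000 0000 0000 0000 0000
# -2 = -1.0 x 2^1: sign 1, biased exponent 1 + 16383 = 0x4000, fraction 0
print(to_groups(pack_binary128(1, 0x4000, 0)))  # c000 0000 0000 0000 0000 0000 0000 0000
```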

3fff 0000 0000 0000 0000 0000 0000 0000   = 1
c000 0000 0000 0000 0000 0000 0000 0000   = -2

7ffe ffff ffff ffff ffff ffff ffff ffff   ≈  1.189731495357231765085759326628007 × 10^4932 (max quadruple precision)

0000 0000 0000 0000 0000 0000 0000 0000   = 0
8000 0000 0000 0000 0000 0000 0000 0000   = -0

7fff 0000 0000 0000 0000 0000 0000 0000   = infinity
ffff 0000 0000 0000 0000 0000 0000 0000   = -infinity
                                
3ffd 5555 5555 5555 5555 5555 5555 5555   ≈  1/3

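By default (round to nearest, ties to even), 1/3 rounds down, as it does in double precision, because the significand has an odd number of bits (113, counting the implicit leading bit). The last stored bit is therefore a 1 of the repeating 01 pattern, so the bits beyond the rounding point are 0101..., which is less than half a unit in the last place.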
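The rounding of 1/3 can be reproduced with exact rational arithmetic. This Python sketch rounds a positive value in the normal range to a 113-bit significand using round-half-to-even (the IEEE 754 default rounding mode) and reports the discarded part in units in the last place. `round_to_binary128` is a hypothetical helper; subnormals, overflow, zero and negative inputs are deliberately not handled.

```python
from fractions import Fraction

PRECISION = 113  # significand bits, including the implicit leading 1
BIAS = 16383     # binary128 exponent bias

def round_to_binary128(x: Fraction) -> tuple[int, int, Fraction]:
    """Round positive, normal-range x to binary128 with ties-to-even.

    Returns (biased exponent field, 112-bit fraction field, discarded part in ulps).
    Hypothetical helper: no subnormals, overflow, zero or negative input."""
    if x <= 0:
        raise ValueError("this sketch handles positive values only")
    # Find e with 1 <= x / 2^e < 2.
    e = 0
    while x / Fraction(2) ** e >= 2:
        e += 1
    while x / Fraction(2) ** e < 1:
        e -= 1
    # Scale so the 113 significand bits form the integer part.
    scaled = x / Fraction(2) ** e * 2 ** (PRECISION - 1)
    n = int(scaled)   # truncated significand, 2^112 <= n < 2^113
    rem = scaled - n  # bits beyond the rounding point, in ulps
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and n % 2 == 1):
        n += 1  # round up (ties go to the even neighbour)
        if n == 1 << PRECISION:  # significand carried over to 2.0
            n >>= 1
            e += 1
    return e + BIAS, n - (1 << (PRECISION - 1)), rem

exp_field, frac_field, rem = round_to_binary128(Fraction(1, 3))
print(f"{exp_field:04x} {frac_field:028x}")  # 3ffd 5555555555555555555555555555
print(rem)  # 1/3: less than half an ulp, so 1/3 rounds down
```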
