Quadruple-precision floating-point format
In computing, quadruple precision (also commonly shortened to quad precision) is a binary floating-point computer numbering format that occupies 16 bytes (128 bits in modern computers) in computer memory.
In IEEE 754-2008 the 128-bit base 2 format is officially referred to as binary128.
IEEE 754 quadruple precision binary floating-point format: binary128
The IEEE 754 standard specifies a binary128 as having:
- Sign bit: 1 bit
- Exponent width: 15 bits
- Significand precision: 113 bits (112 explicitly stored)
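The format is written with an implicit leading bit with value 1 unless the stored exponent is all zeros. Thus only 112 bits of the significand appear in the memory format, but the total precision is 113 bits (approximately 34 decimal digits, $\log_{10}(2^{113}) \approx 34.016$). The bits are laid out as follows, from most to least significant: 1 sign bit (bit 127), 15 exponent bits (bits 126 to 112), and 112 significand bits (bits 111 to 0).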
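As an illustration of the layout, the following Python sketch splits a 128-bit pattern into its three fields. It assumes the pattern is held as an unsigned integer with the sign in bit 127, the exponent in bits 126 to 112, and the stored significand in bits 111 to 0. The function name `split_binary128` is hypothetical and not part of any standard library.

```python
# Field widths taken from the binary128 description; names are illustrative.
EXP_BITS = 15
FRAC_BITS = 112


def split_binary128(bits: int) -> tuple:
    """Split an unsigned 128-bit integer into (sign, biased exponent, stored significand)."""
    if not 0 <= bits < 1 << 128:
        raise ValueError("expected an unsigned 128-bit integer")
    sign = bits >> 127                                      # top bit
    exponent = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)  # next 15 bits
    fraction = bits & ((1 << FRAC_BITS) - 1)                # low 112 bits
    return sign, exponent, fraction


# The pattern c000 0000 ... 0000: sign 1, stored exponent 0x4000, significand 0.
print(split_binary128(0xC000 << 112))  # (1, 16384, 0)
```

Python integers are arbitrary precision, so no 128-bit integer type is needed for this kind of bit manipulation.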
Exponent encoding
The quadruple precision binary floating-point exponent is encoded using an offset binary representation with a zero offset of 16383; the IEEE 754 standard calls this offset the exponent bias.
- Emin = 0x0001−0x3fff = −16382
- Emax = 0x7ffe−0x3fff = 16383
- Exponent bias = 0x3fff = 16383
Thus, as defined by the offset binary representation, the true exponent is obtained by subtracting the offset of 16383 from the stored exponent.
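The bias arithmetic can be sketched in a few lines of Python. The helper names `true_exponent` and `stored_exponent` are hypothetical; the range checks simply mirror the Emin and Emax values listed above and exclude the two reserved stored exponents.

```python
EXPONENT_BIAS = 0x3FFF  # 16383


def true_exponent(stored: int) -> int:
    """Convert the stored (biased) exponent of a normal number to its true exponent."""
    if not 0x0001 <= stored <= 0x7FFE:
        raise ValueError("0x0000 and 0x7fff are reserved for special values")
    return stored - EXPONENT_BIAS


def stored_exponent(true: int) -> int:
    """Convert a true exponent in [Emin, Emax] to its stored (biased) form."""
    if not -16382 <= true <= 16383:
        raise ValueError("true exponent outside the normal range")
    return true + EXPONENT_BIAS


print(true_exponent(0x0001))    # -16382 (Emin)
print(true_exponent(0x3FFF))    # 0
print(true_exponent(0x7FFE))    # 16383 (Emax)
print(hex(stored_exponent(1)))  # 0x4000
```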
The stored exponents 0x0000 and 0x7fff are interpreted specially.
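| Exponent | Significand zero | Significand non-zero | Equation |
|---|---|---|---|
| 0x0000 | 0, −0 | subnormal numbers | $(-1)^{\text{sign}} \times 2^{-16382} \times 0.\text{significand}_2$ |
| 0x0001, ..., 0x7ffe | normalized value | normalized value | $(-1)^{\text{sign}} \times 2^{\text{exponent} - 16383} \times 1.\text{significand}_2$ |
| 0x7fff | ±infinity | NaN (quiet, signalling) | |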
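The interpretation rules in the table can be sketched as a small decoder. This is a minimal illustration, assuming the pattern is given as an unsigned 128-bit Python integer; the names `decode_binary128` and `pow2` are hypothetical. Finite values are returned as exact `Fraction` objects, while infinities and NaN fall back to Python floats, so the sign of zero and the NaN payload are not preserved.

```python
from fractions import Fraction
import math

BIAS = 16383
FRAC_BITS = 112


def pow2(e: int) -> Fraction:
    """Exact power of two as a Fraction, for positive or negative e."""
    return Fraction(1 << e) if e >= 0 else Fraction(1, 1 << -e)


def decode_binary128(bits: int):
    """Decode a binary128 bit pattern given as an unsigned 128-bit integer."""
    sign = -1 if (bits >> 127) & 1 else 1
    exponent = (bits >> FRAC_BITS) & 0x7FFF
    fraction = Fraction(bits & ((1 << FRAC_BITS) - 1), 1 << FRAC_BITS)

    if exponent == 0x7FFF:
        # All-ones exponent: infinity if the significand is zero, otherwise NaN.
        return sign * math.inf if fraction == 0 else math.nan
    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, fixed exponent of -16382.
        return sign * fraction * pow2(-16382)
    # Normal number: implicit leading 1, true exponent = stored exponent - bias.
    return sign * (1 + fraction) * pow2(exponent - BIAS)


print(decode_binary128(0x3FFF << 112))  # 1
print(decode_binary128(0xC000 << 112))  # -2
print(decode_binary128(0x7FFF << 112))  # inf
```

Using `Fraction` keeps every finite binary128 value exact, which a Python `float` (double precision) could not do.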
The maximum representable value is $\approx 1.1897 \times 10^{4932}$.
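The maximum can be checked numerically. The largest finite value has an all-ones significand, that is $(2 - 2^{-112}) \times 2^{16383}$. The sketch below evaluates this with Python's `decimal` module; the function name `max_binary128` and the choice of 16 guard digits are assumptions made for illustration.

```python
from decimal import Decimal, localcontext


def max_binary128(digits: int = 34) -> str:
    """Return the largest finite binary128 value, rounded to `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits + 16  # extra guard digits for the intermediate results
        # Significand 1.111...1 (113 ones) is 2 - 2^-112; the largest true exponent is 16383.
        largest = (2 - Decimal(2) ** -112) * Decimal(2) ** 16383
        return f"{largest:.{digits - 1}E}"


print(max_binary128())  # 1.189731495357231765085759326628007E+4932
```

The `decimal` module is used rather than converting a large integer to a string, because recent Python versions limit the number of digits in integer-to-string conversions by default.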
Quadruple precision examples
These examples give the bit representation of the floating-point value in hexadecimal. Each includes the sign, (biased) exponent, and significand.
3fff 0000 0000 0000 0000 0000 0000 0000 = 1
c000 0000 0000 0000 0000 0000 0000 0000 = -2
7ffe ffff ffff ffff ffff ffff ffff ffff ≈ 1.189731495357231765085759326628007 × 10^4932 (max quadruple precision)
0000 0000 0000 0000 0000 0000 0000 0000 = 0
8000 0000 0000 0000 0000 0000 0000 0000 = -0
7fff 0000 0000 0000 0000 0000 0000 0000 = infinity
ffff 0000 0000 0000 0000 0000 0000 0000 = -infinity
3ffd 5555 5555 5555 5555 5555 5555 5555 ≈ 1/3
By default, 1/3 rounds down, as it does in double precision, because the significand has an odd number of bits (113). The bits beyond the rounding point are therefore 0101..., which is less than 1/2 of a unit in the last place.
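The rounding of 1/3 can be reproduced with exact rational arithmetic. The following is a minimal sketch of round-to-nearest, ties-to-even encoding for positive values in the normal range only; the function name `encode_positive_normal` is hypothetical, and zero, negative values, subnormals, and overflow are deliberately left out.

```python
from fractions import Fraction
import math

BIAS = 16383
FRAC_BITS = 112


def encode_positive_normal(x: Fraction) -> int:
    """Round a positive Fraction to a binary128 bit pattern (nearest, ties to even).

    Assumes x lies in the normal range; other inputs are not handled here.
    """
    if x <= 0:
        raise ValueError("this sketch handles positive values only")
    # Find the true exponent e such that 1 <= x / 2^e < 2.
    e = 0
    while x >= 2 ** (e + 1):
        e += 1
    while x < Fraction(2) ** e:
        e -= 1
    # Scale so that one unit in the last place equals 1.
    scaled = x / Fraction(2) ** e * 2 ** FRAC_BITS
    low = math.floor(scaled)
    remainder = scaled - low  # discarded part, as a fraction of one ulp
    if remainder > Fraction(1, 2) or (remainder == Fraction(1, 2) and low % 2 == 1):
        low += 1
    if low == 1 << (FRAC_BITS + 1):  # rounding carried into the next binade
        low >>= 1
        e += 1
    # Drop the implicit leading 1 and attach the biased exponent.
    return ((e + BIAS) << FRAC_BITS) | (low - (1 << FRAC_BITS))


# For 1/3 the discarded part is exactly 1/3 of an ulp, so the value rounds down.
print(f"{encode_positive_normal(Fraction(1, 3)):032x}")  # 3ffd5555555555555555555555555555
```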
See also
- IEEE Standard for Floating-Point Arithmetic (IEEE 754)
- ISO/IEC 10967, Language Independent Arithmetic
- Primitive data type
- long double

