Jump to content

Quadruple-precision floating-point format: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
Ishvara7 (talk | contribs)
No edit summary
Ishvara7 (talk | contribs)
No edit summary
Line 47: Line 47:
=== Implementations ===
=== Implementations ===


Quadruple precision is currently implemented in [[C (programming language)|C]]/[[C++]] and [[Fortran]] as [[long double]] and REAL*16<ref>{{cite web|title=Single, Double, and Long Double Precision |url=http://docs.sun.com/app/docs/doc/801-7639/6i1ucu1ul?a=view|work=|publisher=Sun Microsystems|date=|accessdate=2010-01-23}}</ref> respectively. Not all compilers support the quad-precison type. Microsoft [[Visual C++]], for example, makes the [[long double]] type synomymous with the double-precison type<ref>MSDN homepage, about Visual C++ compiler [http://msdn.microsoft.com/en-us/library/9cx8xs15.aspx]</ref>. The [[GNU]] Fortran compiler also lacks support for the quad-precison, but proprietary compilers such as the [[Intel Fortran Compiler]] do support quad-precison<ref>{{cite web|title= Intel Fortran Compiler Product Brief |url=http://h21007.www2.hp.com/portal/download/files/unprot/intel/product_brief_Fortran_Linux.pdf|work=|publisher=Su|date=|accessdate=2010-01-23}}</ref>.
Quadruple precision is currently implemented in [[C (programming language)|C]]/[[C++]] and [[Fortran]] as [[long double]] and REAL*16<ref>{{cite web|title=Single, Double, and Long Double Precision |url=http://docs.sun.com/app/docs/doc/801-7639/6i1ucu1ul?a=view|work=|publisher=Sun Microsystems|date=|accessdate=2010-01-23}}</ref> respectively. Not all compilers support the quad-precison type. Microsoft [[Visual C++]], for example, makes the [[long double]] type synomymous with the double-precison type<ref>MSDN homepage, about Visual C++ compiler [http://msdn.microsoft.com/en-us/library/9cx8xs15.aspx]</ref>. The [[GNU]] Fortran compiler also lacks support for the quad-precison, but proprietary compilers such as the [[Intel Fortran Compiler]] do support quad-precison<ref>{{cite web|title= Intel Fortran Compiler Product Brief |url=http://h21007.www2.hp.com/portal/download/files/unprot/intel/product_brief_Fortran_Linux.pdf|work=|publisher=Su|date=|accessdate=2010-01-23}}</ref>. The fourth-generation programming language, [[MATLAB]], supports varying degrees of precision including quadruple precision with use of the "Variable Precision Arithmetc function"<ref>{{cite web|title= Technical Solutions |url=http://www.mathworks.com/support/solutions/en/data/1-1AGHW/index.html?product=SM&solution=1-1AGHW|work=|publisher=Su|date=|accessdate=2010-01-23}}</ref>. .


=== Quadruple precision examples ===
=== Quadruple precision examples ===

Revision as of 06:46, 24 January 2010

In computing, quadruple precision (also commonly shortened to quad precision) is a binary floating-point computer numbering format that occupies 16 bytes (128 bits in modern computers) in computer memory.

In IEEE 754-2008 the 128-bit base 2 format is officially referred to as binary128.

IEEE 754 quadruple precision binary floating-point format: binary128

The IEEE 754 standard specifies a binary128 as having:

The format is written with an implicit lead bit with value 1 unless the exponent is stored with all zeros. Thus only 112 bits of the significand appear in the memory format, but the total precision is 113 bits (approximately 34 decimal digits, ). The bits are laid out as follows:

Exponent encoding

The quadruple precision binary floating-point exponent is encoded using an offset binary representation, with the zero offset being 16383; also known as exponent bias in the IEEE 754 standard.

  • Emin = 0x0001−0x3fff = −16382
  • Emax = 0x7ffe−0x3fff = 16383
  • Exponent bias = 0x3fff = 16383

Thus, as defined by the offset binary representation, in order to get the true exponent the offset of 16383 has to be subtracted from the stored exponent.

The stored exponents 0x0000 and 0x7fff are interpreted specially.

Exponent Significand zero Significand non-zero Equation
0x0000 0, −0 subnormal numbers
0x0001, ..., 0x7ffe normalized value
0x7fff ±infinity NaN (quiet, signalling)

The maximum representable value is ≈ 1.1897 × 104932.

Implementations

Quadruple precision is currently implemented in C/C++ and Fortran as long double and REAL*16[1] respectively. Not all compilers support the quad-precison type. Microsoft Visual C++, for example, makes the long double type synomymous with the double-precison type[2]. The GNU Fortran compiler also lacks support for the quad-precison, but proprietary compilers such as the Intel Fortran Compiler do support quad-precison[3]. The fourth-generation programming language, MATLAB, supports varying degrees of precision including quadruple precision with use of the "Variable Precision Arithmetc function"[4]. .

Quadruple precision examples

These examples are given in bit representation, in hexadecimal, of the floating point value. This includes the sign, (biased) exponent, and significand.

3fff 0000 0000 0000 0000 0000 0000 0000   = 1
c000 0000 0000 0000 0000 0000 0000 0000   = -2

7ffe ffff ffff ffff ffff ffff ffff ffff   ≈  1.189731495357231765085759326628007 × 104932 (max quadruple precision)

0000 0000 0000 0000 0000 0000 0000 0000   = 0
8000 0000 0000 0000 0000 0000 0000 0000   = -0

7fff 0000 0000 0000 0000 0000 0000 0000   = infinity
ffff 0000 0000 0000 0000 0000 0000 0000   = -infinity
				
3ffd 5555 5555 5555 5555 5555 5555 5555   ≈  1/3

By default, 1/3 rounds down like double precision, because of the odd number of bits in the significand. So the bits beyond the rounding point are 0101... which is less than 1/2 of a unit in the last place.

See also

References

  1. ^ "Single, Double, and Long Double Precision". Sun Microsystems. Retrieved 2010-01-23.
  2. ^ MSDN homepage, about Visual C++ compiler [1]
  3. ^ "Intel Fortran Compiler Product Brief" (PDF). Su. Retrieved 2010-01-23.
  4. ^ "Technical Solutions". Su. Retrieved 2010-01-23.

External links