Fast inverse square root
Fast inverse square root (sometimes referred to as Fast InvSqrt() or by the hexadecimal constant 0x5f3759df) is a method of calculating x-½, the reciprocal (or multiplicative inverse) of a square root for a 32-bit floating point number in IEEE 754 floating point format. The algorithm was probably developed at Silicon Graphics in the early 1990s, and an implementation appeared in 1999 in the Quake III Arena source code, but the method did not appear on public forums such as Usenet until 2002 or 2003.[1] At the time, the primary advantage of the algorithm came from avoiding computationally expensive floating point operations in favor of integer operations. Inverse square roots are used to compute angles of incidence and reflection for lighting and shading in computer graphics.
The algorithm accepts a 32-bit unsigned floating point number as the input and stores a halved value for later use. Then, treating the bits representing the floating point number as a 32-bit integer, a logical shift right of one bit is performed and the result subtracted from the "magic" value 0x5f3759df. This is the first approximation of the inverse square root of the input. Treating the bits again as floating point it runs one iteration of Newton's method to return a more precise approximation. This computes an approximation of the inverse square root of a floating point number approximately four times faster than floating point division.
The algorithm was originally attributed to John Carmack, but an investigation showed that the code had deeper roots in both the hardware and software side of computer graphics. Adjustments and alterations passed through both Silicon Graphics and 3dfx Interactive, with Gary Tarolli's implementation for the SGI Indigo as the earliest known use. It is not known how the constant was originally derived, though investigation has shed some light on possible methods.
Contents |
[edit] Motivation
The inverse square root of a floating point number is used in calculating a normalized vector.[2] Since a 3D graphics program uses these normalized vectors to determine lighting and reflection, millions of these calculations must be done per second. Before the creation of specialized hardware to handle transform and lighting, software computations could be slow. Specifically, when the code was developed in the early 1990s, most floating point processing power lagged behind the speed of integer processing.[1]
To normalize a vector, the length of the vector is determined by calculating its Euclidean norm: the square root of the sum of squares of the vector components. When each component of the vector is divided by that length, the new vector will be a unit vector pointing in the same direction.
is the Euclidean norm of the vector, analogous to the calculation of the Euclidean distance between two points in Euclidean space.
is the normalized (unit) vector. Using
to represent
,
, which relates the unit vector to the inverse square root of the distance components.
Quake III Arena used the fast inverse square root algorithm to speed graphics processing unit computation, but the algorithm has since been implemented in some dedicated hardware vertex shaders using Field-programmable gate arrays (FPGA).[3]
[edit] Overview of the code
The following code is the fast inverse square root implementation from Quake III Arena, stripped of C preprocessor directives, but including the exact original comment text:[4]
float Q_rsqrt( float number ) { long i; float x2, y; const float threehalfs = 1.5F; x2 = number * 0.5F; y = number; i = * ( long * ) &y; // evil floating point bit level hacking i = 0x5f3759df - ( i >> 1 ); // what the fuck? y = * ( float * ) &i; y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration // y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed return y; }
In order to determine the inverse square root, an approximation for x − 1 / 2 would be determined by the software, then some numerical method would revise that approximation until it came within an acceptable error range of the actual result. Common software methods in the early 1990s drew a first approximation from a lookup table.[5] This bit of code proved faster than table lookups and approximately four times faster than regular floating point division.[6] Some loss of precision occurred, but was offset by the significant gains in performance.[7] The algorithm was designed with the IEEE 754-2008 32-bit floating point specification in mind, but investigation from Chris Lomont and later Charles McEniry showed that it could be implemented in other floating point specifications.
The advantages in speed offered by the fast inverse square root kludge came from treating the longword[note 1] containing the floating point number as an integer then subtracting it from a specific constant, 0x5f3759df. The purpose of the constant is not immediately clear to someone viewing the code, so, like other such constants found in code, it is often called a "magic number".[1][8][9][10] This integer subtraction and bit shift results in an longword which when treated as a floating point number is a rough approximation for the inverse square root of the input number. One iteration of Newton's method is performed to gain some precision, and the code is finished. The algorithm generates reasonably accurate results using a unique first approximation for Newton's method, however it is much slower and less accurate than using the SSE instruction rsqrtss on x86 processors, and also released in 1999.[11]
[edit] Aliasing from floating point to integer
Breaking this down requires some understanding of how a floating point number is stored. A floating point number represents a rational number expressed in three portions across 32 bits. The first, the sign bit, is 0 for positive numbers and 1 for negative numbers. The next 8 bits form the exponent, which is biased in order to result in a range of values from −126 to 127. The significand comprises the next 23 bits and represents the significant digits of the number stored. This is expressed as x = ( − 1)Si(1 + m)2E − B where the bias B = 127.[12] The value of the exponent determines whether the significand (referred to as the mantissa by Lomont 2003 and McEniry 2007) represents a fraction or an integer.[13]
| sign bit | |||||||||
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | = | 127 |
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | = | 2 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | = | 1 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | = | 0 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | = | −1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | = | −2 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | = | −127 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | = | −128 |
A positive, signed integer represented in a two's complement system has a first bit of 0, followed by a binary representation of the value. Aliasing the floating point number as a two-complement integer gives an integer value of I = E * 223 + M where I is the integer value of the number, E the exponent and M the significand. Since the inverse square root function deals only with positive numbers, the floating point sign bit (Si) must be 0. This ensures that the resulting signed integer is also positive. The aliasing makes possible the computationally inexpensive operations that follow. The first of these, the 1-bit right shift, divides the integer by 2.[14]
[edit] The "magic number"
| S(ign) | E(xponent) | M(antissa) |
|---|---|---|
| 1 bit | b bits | (n-1-b) bits |
| n bits[15] | ||
The selection of 0x5f3759df as a constant prompted much of the original speculation surrounding the fast inverse square root function. In an attempt to determine how a programmer might have originally determined that constant as a mechanism to approximate the inverse square root, Charles McEniry first determined how the choice of any constant R could give a first approximation for the inverse square root. Recalling the integer and floating point comparison from above, note that x, our floating point number, is
and Ix, our integer value for that same number, is Ix = ExL + Mx.[note 2] These identities introduce a few new elements, which are simply restatements of values for the exponent and significand.
where L = 2n − 1 − b.- ex = Ex − B where B = 2b − 1 − 1.
Per these identities, a bit shift operation and integer subtraction can reliably output an approximation for the inverse square root of a floating point number. The illustration from McEniry 2007 proceeds:
taking the binary logarithm or log 2 of both sides. The binary logarithm is the inverse function of f(n) = 2n and makes the multiplied terms in the floating point numbers x and y reduce to addition. The relationship between log 2(x) and log 2(x − 1 / 2) is linear, allowing an equation to be constructed which can express x and y0 (The input and first approximation) as a linear combination of terms.[12] McEniry introduces a term σ which serves as an approximation for
in an intermediate step toward approximating R.[note 3] Since
,
,
can now be defined. This definition affords a first approximation of the binary logarithm. For our purposes, σ is a real number bounded by [0,1/3]—for an R equal to 0x5f3759df, σ = 0.0450461875791687011756.[15]
Using the identities for Mx, Ex, B and L:
Rearranging of terms leads to:
The integer value of a strictly positive floating point number x is Ix = ExL + Mx. This gives an expression for the integer value of y (where
, our first approximation for the inverse square root) in terms of the integer components of x. Specifically,
where
.
The equation
is the line i = 0x5f3759df - (i>>1); in Fast InvSqrt(), the integer approximation for y0 is the integer value for x, shifted to the right and subtracted from R.[15] McEniry's proof here shows only that a constant R can be used to approximate the integer value of the inverse square root of a floating point number. It does not prove that R assumes the value used in the code itself.
A relatively simple explanation for how a bit shift and a subtraction operation using the expected value of R results in division of the exponent E by negative two can be found in Chris Lomont's paper exploring the constant. As an example, for 10000 = 104, a division of the exponent by −2 would produce 10000 − 1 / 2 = 10 − 2 = 1 / 100. Since the exponent is biased, the true value of the exponent (here e) is e = E − 127, making the value of the biased result − e / 2 + 127.[16] Subtracting the integer from R (the "magic" number 0x5f3759df) forces the least significant bit of the exponent to carry into the significand and when returned to floating point notation, outputs a floating point number very close to the inverse square root of the input. The specific value of R was chosen to minimize the expected error in division of the exponent as well as the expected error in shifting the significand. 0xbe represents an integer value which minimizes the expected error resulting from division of the floating point exponent through bit shift operations—notably the value of 0xbe shifted one to the right is 0x5f, the first digits in the magic number R.[17]
[edit] Accuracy
As noted above, the approximation is surprisingly accurate. The graph on the right plots the error of the function (that is, the error of the approximation after it has been improved by running one iteration of Newton's method), for inputs starting at 0.01, where the standard library gives 10.0 as a result, while InvSqrt() gives 9.982522, making the difference 0.017479, or 0.175%. The absolute error only drops from then on, while the relative error stays within the same bounds across all orders of magnitude.
[edit] Newton's method
After performing those integer operations, the algorithm once again treats the longword as a floating point number (x = *(float*)&i;) and performs a floating point multiplication operation (x = x*(1.5f - xhalf*x*x);). The floating point operation represents a single iteration of Newton's method of finding roots for a given equation. For this example,
is the inverse square root, or, as a function of y,
.
- As
represents a general expression of Newton's method with
as the first approximation,
is the particularized expression where
and
.
- Hence
x = x*(1.5f - xhalf*x*x);is the same as
The first approximation is generated above through the integer operations and input into the last two lines of the function. Repeated iterations of the algorithm, using the output of the function (yn + 1) as the input of the next iteration, cause the algorithm to converge on the root with increasing precision.[18] For the purposes of the Quake III engine, only one iteration was used. A second iteration remained in the code but was commented out.[10]
[edit] History and investigation
The source code for Quake III was not released until QuakeCon 2005, but copies of the fast inverse square root code appeared on Usenet and other forums as early as 2002 or 2003.[1] Initial speculation pointed to John Carmack as the probable author of the code, but he demurred and suggested it was written by Terje Mathisen, an accomplished assembly programmer who had previously helped id Software with Quake optimization. Mathisen had written an implementation of a similar bit of code in the late 1990s, but the original authors proved to be much further back in the history of 3D computer graphics with Gary Tarolli's implementation for the SGI Indigo as a possible earliest known use. Jim Blinn also demonstrated a simple approximation of the inverse square root in a 1997 column for IEEE Computer Graphics and Applications.[19] Rys Sommefeldt concluded that the original algorithm was devised by Greg Walsh at Ardent Computer in consultation with Cleve Moler of MATLAB fame, though no conclusive proof of authorship exists.[20]
It is not known precisely how the exact value for the magic number was determined. Chris Lomont developed a function to minimize approximation error by choosing the magic number R over a range. He first computed the optimal constant for the linear approximation step as 0x5f37642f, close to 0x5f3759df, but this new constant gave slightly less accuracy after one iteration of Newton's method.[21] Lomont then searched for a constant optimal even after one and two Newton iterations and found 0x5f375a86, which is more accurate than the original at every iteration stage.[21] He concluded by asking whether the exact value of the original constant was chosen through derivation or trial and error.[22] Lomont pointed out that the "magic number" for 64 bit IEEE754 size type double is 0x5fe6ec85e7de30da, but in fact it is close to 0x5fe6eb50c7aa19f9. Charles McEniry performed a similar but more sophisticated optimization over likely values for R. His initial brute force search resulted in the same constant that Lomont determined.[23] When he attempted to find the constant through weighted bisection, the specific value of R used in the function occurred, leading McEniry to believe that the constant may have originally been derived through "bisecting to a given tolerance".[24]
[edit] See also
[edit] Notes
- ^ Use of the type
longreduces the portability of this code on modern systems. For the code to execute properly,sizeof(long)must be 4 bytes, otherwise negative outputs may result. Under many modern 64-bit systems,sizeof(long)is 8 bytes. - ^ Floating point numbers are normalized—the significand is expressed as a number
. See David Goldberg (March 1991). "What Every Computer Scientist Should Know About Floating-Point Arithmetic". ACM Computing Surveys 23 (1): 5–48. doi:10.1145/103162.103163. for further explanation. - ^ Lomont 2003 approaches the determination of R in a different fashion, splitting R into R1 and R2 for the significand and exponent bits of R.
[edit] References
- ^ a b c d Sommefeldt, Rys (November 29, 2006). "Origin of Quake3's Fast InvSqrt()". Beyond3D. http://www.beyond3d.com/content/articles/8/. Retrieved 2009-02-12.
- ^ Blinn, Jim (2003). Jim Blinn's Corner: Notation, notation notation. Morgan Kaufmann. p. 130. ISBN 1558608605.
- ^ Middendorf, Lars; Mühlbauer, Felix; Umlauf, George; Bodba, Christophe (June 1, 2007). "Embedded Vertex Shader in FPGA". In Rettberg, Achin. Embedded System Design: Topics, Techniques and Trends. IFIP TC10 Working Conference:International Embedded Systems Symposium (IESS), et al.. Irvine, California: Springer. pp. 155–164. ISBN 978-0-387-72257-3.
- ^ "quake3-1.32b/code/game/q_math.c". Quake III Arena. id Software. ftp://ftp.idsoftware.com/idstuff/source/quake3-1.32b-source.zip. Retrieved 2010-11-15.
- ^ Eberly, David (2001). 3D Game Engine Design. Morgan Kaufmann. p. 504. ISBN 9781558605930.
- ^ Lomont, Chris (February, 2003). "Fast Inverse Square Root". p. 1. http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf. Retrieved 2009-02-13.
- ^ McEniry, Charles (August, 2007). "The Mathematics Behind the Fast Inverse Square Root Function Code". p. 1. http://www.daxia.com/bibis/upload/406Fast_Inverse_Square_Root.pdf. Retrieved 2009-02-13.
- ^ Lomont 2003, p. 3
- ^ McEniry 2007, p. 2, 16
- ^ a b Eberly, David (2002). Fast Inverse Square Root. Geometric Tools. p. 2. http://www.geometrictools.com/Documentation/FastInverseSqrt.pdf. Retrieved 2009-03-22
- ^ Ruskin, Elan (2009-10-16). "Timing square root". Some Assembly Required. http://assemblyrequired.crashworks.org/2009/10/16/timing-square-root/. Retrieved 2010-09-13.
- ^ a b McEniry 2007, p. 2
- ^ Hennessey, John; Patterson, David A. (1998). Computer Organization and Design (2nd ed.). San Francisco, CA: Morgan Kaufmann Publishers. p. 276. ISBN 9781558604919. http://books.google.com/?id=7-ZQAAAAMAAJ.
- ^ Hennessey and Patterson, p. 305
- ^ a b c McEniry 2007, p. 3
- ^ Hennessey and Patterson, p. 278, 282
- ^ Lomont 2003, p. 5
- ^ McEniry 2007, p. 6
- ^ Blinn, Jim (July 1997). "Floating Point Tricks". Computer Graphics & Applications, IEEE 17 (4): 80–84. doi:10.1109/38.595279.
- ^ Sommefeldt, Rys (2006-12-19). "Origin of Quake3's Fast InvSqrt() - Part Two". Beyond3D. http://www.beyond3d.com/content/articles/15/. Retrieved 2008-04-19.
- ^ a b Lomont 2003, p.10
- ^ Lomont 2003, p. 10–11
- ^ McEniry 2007, p. 11–12
- ^ McEniry 2007, p. 16
[edit] Further reading
- Kushner, David (Aug 2002). "The wizardry of Id". IEEE Spectrum 39 (8): 42–47. doi:10.1109/MSPEC.2002.1021943.
[edit] External links
- Quake III Arena source code
- Margolin, Tomer (27 August 2005). "Magical Square Root Implementation In Quake III". CodeMaestro. The Coding Experience. http://www.codemaestro.com/reviews/9.
|
||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||
is the Euclidean norm of the vector, analogous to the calculation of the
is the normalized (unit) vector. Using
to represent
,
, which relates the unit vector to the inverse square root of the distance components.
where 





where
.
.
represents a general expression of Newton's method with
as the first approximation,
is the particularized expression where
and
.
. See