Fast inverse square root

From Wikipedia, the free encyclopedia
Jump to: navigation, search
Lighting and reflection calculations (shown here in the first-person shooter OpenArena) use the fast inverse square root code to compute angles of incidence and reflection.

Fast inverse square root (sometimes referred to as Fast InvSqrt() or by the hexadecimal constant 0x5f3759df) is a method of calculating x−½, the reciprocal (or multiplicative inverse) of a square root for a 32-bit floating point number in IEEE 754 floating point format. The algorithm was probably developed at Silicon Graphics in the early 1990s, and an implementation appeared in 1999 in the Quake III Arena source code, but the method did not appear on public forums such as Usenet until 2002 or 2003.[1] At the time, the primary advantage of the algorithm came from avoiding computationally expensive floating point operations in favor of integer operations. Inverse square roots are used to compute angles of incidence and reflection for lighting and shading in computer graphics.

The algorithm accepts a 32-bit floating point number as the input and stores a halved value for later use. Then, treating the bits representing the floating point number as a 32-bit integer, a logical shift right of one bit is performed and the result subtracted from the "magic constant" 0x5f3759df. This is the first approximation of the inverse square root of the input. Treating the bits again as floating point it runs one iteration of Newton's method to return a more precise approximation. This computes an approximation of the inverse square root of a floating point number approximately four times faster than floating point division.

The algorithm was originally attributed to John Carmack, but an investigation showed that the code had deeper roots in both the hardware and software side of computer graphics. Adjustments and alterations passed through both Silicon Graphics and 3dfx Interactive, with Gary Tarolli's implementation for the SGI Indigo as the earliest known use. It is not known how the constant was originally derived, though investigation has shed some light on possible methods.

Motivation[edit]

Surface normals are used extensively in lighting and shading calculations, requiring the calculation of norms for vectors. A field of vectors normal to a surface is shown here.

The inverse square root of a floating point number is used in calculating a normalized vector.[2] Since a 3D graphics program uses these normalized vectors to determine lighting and reflection, millions of these calculations must be done per second. Before the creation of specialized hardware to handle transform and lighting, software computations could be slow. Specifically, when the code was developed in the early 1990s, most floating point processing power lagged behind the speed of integer processing.[1]

To normalize a vector, the length of the vector is determined by calculating its Euclidean norm: the square root of the sum of squares of the vector components. When each component of the vector is divided by that length, the new vector will be a unit vector pointing in the same direction.

\|\boldsymbol{v}\| = \sqrt{v_1^2+v_2^2+v_3^2} is the Euclidean norm of the vector, analogous to the calculation of the Euclidean distance between two points in Euclidean space.
\boldsymbol{\hat{v}} = \boldsymbol{v} / \|\boldsymbol{v}\| is the normalized (unit) vector. Using \|\boldsymbol{v}\|^2 to represent v_1^2+v_2^2+v_3^2,
\boldsymbol{\hat{v}} = \boldsymbol{v} / \sqrt{\|\boldsymbol{v}\|^2}, which relates the unit vector to the inverse square root of the distance components.

Quake III Arena used the fast inverse square root algorithm to speed graphics processing unit computation, but the algorithm has since been implemented in some dedicated hardware vertex shaders using field-programmable gate arrays (FPGA).[3]

Overview of the code[edit]

The following code is the fast inverse square root implementation from Quake III Arena, stripped of C preprocessor directives, but including the exact original comment text:[4]

float Q_rsqrt( float number )
{
	long i;
	float x2, y;
	const float threehalfs = 1.5F;
 
	x2 = number * 0.5F;
	y  = number;
	i  = * ( long * ) &y;                       // evil floating point bit level hacking
	i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
	y  = * ( float * ) &i;
	y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//      y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed
 
	return y;
}

In order to determine the inverse square root, an approximation for x^{-1/2} would be determined by the software, then some numerical method would revise that approximation until it came within an acceptable error range of the actual result. Common software methods in the early 1990s drew a first approximation from a lookup table.[5] This bit of code proved faster than table lookups and approximately four times faster than regular floating point division.[6] Some loss of precision occurred, but was offset by the significant gains in performance.[7] The algorithm was designed with the IEEE 754-1985 32-bit floating point specification in mind, but investigation from Chris Lomont and later Charles McEniry showed that it could be implemented in other floating point specifications.

The advantages in speed offered by the fast inverse square root kludge came from treating the longword[note 1] containing the floating point number as an integer then subtracting it from a specific constant, 0x5f3759df. The purpose of the constant is not immediately clear to someone viewing the code, so, like other such constants found in code, it is often called a "magic number".[1][8][9][10] This integer subtraction and bit shift results in a longword which when treated as a floating point number is a rough approximation for the inverse square root of the input number. One iteration of Newton's method is performed to gain some precision, and the code is finished. The algorithm generates reasonably accurate results using a unique first approximation for Newton's method; however, it is much slower and less accurate than using the SSE instruction rsqrtss on x86 processors also released in 1999.[11]

Aliasing from floating point to integer and back[edit]

Float w significand.svg

Breaking this down requires some understanding of how a floating point number is stored. A floating point number represents a rational number expressed in three portions across 32 bits. The example decimal value of 0.15625 is 0.00101 in binary which is to be normalised by multiplying by a suitable power of two so as to have a leading "one" bit, thus 1.01×2−3 which is to say an exponent of -3 and a significand of 1.01. Since all normalised numbers have their leading bit one, in this format it is not represented in storage but is implicit so that the represented significand is M = .01. The storage form then has the first bit, the sign bit, 0 for positive numbers and 1 for negative numbers. The next 8 bits form the exponent, which is biased in order to result in a range of values from −127 to 128 (if interpreted as an eight-bit two's complement number) so that the required exponent of -3 is represented as E = 124 (= -3 + 127). The significand comprises the next 23 bits and represents the significant digits of the number stored. This is expressed as x = (-1)^{\mathrm{Sign}}(1 + M)2^{E - B} where the bias B=127.[12] So, for the example, 0.15625_{10} = (-1)^{0}(1 + .01_2)2^{124 - 127} The value of the exponent determines whether the significand (referred to as the mantissa by Lomont 2003 and McEniry 2007) represents a fraction or an integer.[13]

sign bit
0 1 1 1 1 1 1 1 = 127
0 0 0 0 0 0 1 0 = 2
0 0 0 0 0 0 0 1 = 1
0 0 0 0 0 0 0 0 = 0
1 1 1 1 1 1 1 1 = −1
1 1 1 1 1 1 1 0 = −2
1 0 0 0 0 0 0 1 = −127
1 0 0 0 0 0 0 0 = −128
8-bit two's-complement integers

A positive, signed integer represented in a two's complement system has a first bit of 0, followed by a binary representation of the value. Aliasing the floating point number as a two-complement integer gives an integer value of I = E*2^{23} + M where I is the integer value of the number, E the exponent as stored (with bias applied and interpreted as an unsigned eight bit number) and M the significand as stored (with leading one omitted) and interpreted as an integer, not a fractional number. Since the inverse square root function deals only with positive numbers, the floating point sign bit (Sign) will always be 0 and so the resulting signed integer is also positive.

Notice in particular that the implicit leading-one bit as well as the exponent field means that the aliasing produces an integer very different from what would be the result of a float-to-integer value conversion. For the example, E = 124 and as an integer M becomes 221 because its only "on" bit is for power 21. That is, in binary it is 01000000000000000000000. not .01000000000000000000000. Thus I = 124*2^{23}+ 2^{21} which is 1,040,187,392 + 2,097,152 or 1,042,284,544. This "conversion" or rather, reinterpretation is done immediately, without the need for the complex adjustments that a non-zero exponent would require. Then follow a few likewise computationally inexpensive operations to obtain the initial estimate that will be improved via an application of Newton's method.

The first of these, the 1-bit right shift, divides the integer by two,[14] which result is then subtracted from a magic integer and the result re-aliased back to a floating point number to become the initial value. Notice that this shift halves the value in what will again be the mantissa field (but not the value of the mantissa because the implicit 1 remains: 1.(x) becomes 1.(x/2) not (1.x)/2), the value in what will again be the exponent field (treating it as unsigned), and also shifts the low-order exponent field bit 23 (zero in the example) into the high order end of the mantissa field (bit 22). The implicit-on bit of the floating point interpretation is not involved during these integer operations, as follow:

Seeeeeeee(1)mmmmmmmmmmmmmmmmmmmmmmm    1 Sign, 8 exponent bits, (implicit bit), 23 mantissa bits.
001111100   01000000000000000000000    1,042,284,544 The example value, i
000111110   00100000000000000000000      521,142,272 Shift right one.   i/2
010111110   01101110101100111011111    1,597,463,007 The magic number = 5F3759DF
010000000   01001110101100111011111    1,076,320,735 The result of 5F3759DF - i/2

Which is recast as a floating point number: the sign bit is still zero, the exponent field has E = 128 and the mantissa field has .01001110101100111011111 in binary which is 2578911/223 or 0.3074301481247 so that the result is y = (1 + .3074301481247)2^{128 - 127} = 2.614

\frac{1}{\sqrt{0.15625}} = 2.5298221281347, so this is a good first approximation. One application of the iteration gives 2.5255 and a second 2.529811. A third would exhaust the precision of the variable.

If instead the value were double, 0.3125, the mantissa would be the same but the exponent would be one higher, thus turning on bit 23 which via the halving of the integer value would turn up in the mantissa field (as bit 22) and with the implicit 1 bit results in an initial value of 1.807 as an approximation to 1.788854; the first iteration improves this to 1.788564.

The "magic number"[edit]

S(ign) E(xponent) M(antissa)
1 bit b bits (n-1-b) bits
n bits[15]

The selection of 0x5f3759df as a constant prompted much of the original speculation surrounding the fast inverse square root function. In an attempt to determine how a programmer might have originally determined that constant as a mechanism to approximate the inverse square root, Charles McEniry first determined how the choice of any constant R could give a first approximation for the inverse square root. Recalling the integer and floating point comparison from above, note that x, our floating point number, is \scriptstyle x=(1+m_x)2^{e_x} and I_x, our integer value for that same number, is I_x=E_xL+M_x.[note 2] These identities introduce a few new elements, which are simply restatements of values for the exponent and significand.

\textstyle m_x=\frac{M_x}{L} where L=2^{n-1-b}.
e_x=E_x-B where B=2^{b-1}-1.

The illustration from McEniry 2007 proceeds:

y=\frac{1}{\sqrt{x}}
\log_2{(y)}=-\frac{1}{2}\log_2{(x)}
\log_2(1+m_y)+e_y=-\frac{1}{2}\log_2{(1+m_x)}-\frac{1}{2}e_x

taking the binary logarithm or \log_2 of both sides. The binary logarithm is the inverse function of f(n)=2^n and makes the multiplied terms in the floating point numbers x and y reduce to addition. The relationship between \log_2{(x)} and \log_2{(x^{-1/2})} is linear, allowing an equation to be constructed which can express x and y_0 (The input and first approximation) as a linear combination of terms.[12] McEniry introduces a term \sigma which serves as an approximation for \scriptstyle \log_2{(1+x)} in an intermediate step toward approximating R.[note 3] Since \scriptstyle 0\le x< 1, \scriptstyle \log_2{(1+x)}\approx {x}, \scriptstyle \log_2{(1+x)}\cong x+\sigma can now be defined. This definition affords a first approximation of the binary logarithm. For our purposes, \sigma is a real number bounded by [0,1/3]—for an R equal to 0x5f3759df, \sigma=0.0450461875791687011756.[15]

m_y+\sigma+e_y=-\frac{1}{2}m_x-\frac{1}{2}\sigma-\frac{1}{2}e_x

Using the identities for M_x, E_x, B and L:

M_y+(E_y-B)L=-\frac{3}{2}\sigma{L}-\frac{1}{2}M_x-\frac{1}{2}(E_x-B)L

Rearranging of terms leads to:

E_yL+M_y=\frac{3}{2}(B-\sigma)L-\frac{1}{2}(E_xL+M_x)

The integer value of a strictly positive floating point number x is I_x=E_xL+M_x. This gives an expression for the integer value of y (where \textstyle y=\frac{1}{\sqrt{x}}, our first approximation for the inverse square root) in terms of the integer components of x. Specifically,

I_y=E_yL+M_y=R-\frac{1}{2}(E_xL+M_x)=R-\frac{1}{2}I_x where R=\frac{3}{2}(B-\sigma)L.

The equation \scriptstyle I_y=R-\frac{1}{2}I_x is the line i = 0x5f3759df - (i>>1); in Fast InvSqrt(), the integer approximation for y_0 is the integer value for x, shifted to the right and subtracted from R.[15] McEniry's proof here shows only that a constant R can be used to approximate the integer value of the inverse square root of a floating point number. It does not prove that R assumes the value used in the code itself.

A relatively simple explanation for how a bit shift and a subtraction operation using the expected value of R results in division of the exponent E by negative two can be found in Chris Lomont's paper exploring the constant. As an example, for 10000=10^4, a division of the exponent by −2 would produce 10000^{-1/2}=10^{-2}=1/100. Since the exponent is biased, the true value of the exponent (here e) is e=E-127, making the value of the biased result -e/2+127.[16] Subtracting the integer from R (the "magic" number 0x5f3759df) forces the least significant bit of the exponent to carry into the significand and when returned to floating point notation, outputs a floating point number very close to the inverse square root of the input. The specific value of R was chosen to minimize the expected error in division of the exponent as well as the expected error in shifting the significand. 0xbe represents an integer value which minimizes the expected error resulting from division of the floating point exponent through bit shift operations—notably the value of 0xbe shifted one to the right is 0x5f, the first digits in the magic number R.[17]

Accuracy[edit]

A graph showing the difference between the heuristic Fast Inverse Square Root and the inversion of square root supplied by libstdc.[citation needed] (Note logscale on both axes.)

As noted above, the approximation is surprisingly accurate. The graph on the right plots the error of the function (that is, the error of the approximation after it has been improved by running one iteration of Newton's method), for inputs starting at 0.01, where the standard library gives 10.0 as a result, while InvSqrt() gives 9.982522, making the difference 0.017479, or 0.175%. The absolute error only drops from then on, while the relative error stays within the same bounds across all orders of magnitude.

Newton's method[edit]

After performing those integer operations, the algorithm once again treats the longword as a floating point number (y = *(float*)&i;) and performs a floating point multiplication operation (y = y*(1.5f - xhalf*y*y);). The floating point operation represents a single iteration of Newton's method of finding roots for a given equation. For this example,

y=\frac{1}{\sqrt{x}} is the inverse square root, or, as a function of y,
f(y)=\frac{1}{y^2}-x=0.
As y_{n+1} = y_{n} - \frac{f(y_n)}{f'(y_n)} represents a general expression of Newton's method with \, y_n as the first approximation,
y_{n+1} = \frac{y_{n}(3-xy_n^2)}{2} is the particularized expression where f(y)=\frac{1}{y^2}-x and f'(y)=\frac{-2}{y^3}.
Hence y = y*(1.5f - xhalf*y*y); is the same as \, y_{n+1} = y_{n}(1.5-\frac{xy_n^2}{2}) = \frac{y_{n}(3-xy_n^2)}{2}

The first approximation is generated above through the integer operations and input into the last two lines of the function. Repeated iterations of the algorithm, using the output of the function (y_{n+1}) as the input of the next iteration, cause the algorithm to converge on the root with increasing precision.[18] For the purposes of the Quake III engine, only one iteration was used. A second iteration remained in the code but was commented out.[10]

History and investigation[edit]

John Carmack, co-founder of id Software, is commonly associated with the code, though he actually did not write it.

The source code for Quake III was not released until QuakeCon 2005, but copies of the fast inverse square root code appeared on Usenet and other forums as early as 2002 or 2003.[1] Initial speculation pointed to John Carmack as the probable author of the code, but he demurred and suggested it was written by Terje Mathisen, an accomplished assembly programmer who had previously helped id Software with Quake optimization. Mathisen had written an implementation of a similar bit of code in the late 1990s, but the original authors proved to be much further back in the history of 3D computer graphics with Gary Tarolli's implementation for the SGI Indigo as a possible earliest known use. Rys Sommefeldt concluded that the original algorithm was devised by Greg Walsh at Ardent Computer in consultation with Cleve Moler of MATLAB fame.[19] Cleve Moler learned about this trick from code written by Velvel Kahan and K.C. Ng at Berkeley around 1986 (see the comment section at the end of fdlibm code for sqrt[20]).[21] Jim Blinn also demonstrated a simple approximation of the inverse square root in a 1997 column for IEEE Computer Graphics and Applications.[22]

It is not known precisely how the exact value for the magic number was determined. Chris Lomont developed a function to minimize approximation error by choosing the magic number R over a range. He first computed the optimal constant for the linear approximation step as 0x5f37642f, close to 0x5f3759df, but this new constant gave slightly less accuracy after one iteration of Newton's method.[23] Lomont then searched for a constant optimal even after one and two Newton iterations and found 0x5f375a86, which is more accurate than the original at every iteration stage.[23] He concluded by asking whether the exact value of the original constant was chosen through derivation or trial and error.[24] Lomont pointed out that the "magic number" for 64 bit IEEE754 size type double is 0x5fe6ec85e7de30da, but in fact it was shown to be exactly 0x5fe6eb50c7b537a9.[25] Charles McEniry performed a similar but more sophisticated optimization over likely values for R. His initial brute force search resulted in the same constant that Lomont determined.[26] When he attempted to find the constant through weighted bisection, the specific value of R used in the function occurred, leading McEniry to believe that the constant may have originally been derived through "bisecting to a given tolerance".[27]

See also[edit]

Notes[edit]

  1. ^ Use of the type long reduces the portability of this code on modern systems. For the code to execute properly, sizeof(long) must be 4 bytes, otherwise negative outputs may result. Under many modern 64-bit systems, sizeof(long) is 8 bytes.
  2. ^ Floating point numbers are normalized—the significand is expressed as a number m_x\in[0,1). See David Goldberg (March 1991). "What Every Computer Scientist Should Know About Floating-Point Arithmetic". ACM Computing Surveys 23 (1): 5–48. doi:10.1145/103162.103163.  for further explanation.
  3. ^ Lomont 2003 approaches the determination of R in a different fashion, splitting R into R_1 and R_2 for the significand and exponent bits of R.

References[edit]

  1. ^ a b c d Sommefeldt, Rys (November 29, 2006). "Origin of Quake3's Fast InvSqrt()". Beyond3D. Retrieved 2009-02-12. 
  2. ^ Blinn 2003, p. 130.
  3. ^ Middendorf 2007, pp. 155-164.
  4. ^ "quake3-1.32b/code/game/q_math.c". Quake III Arena. id Software. Retrieved 2010-11-15. 
  5. ^ Eberly 2001, p. 504.
  6. ^ Lomont 2003, p. 1.
  7. ^ McEniry 2007, p. 1.
  8. ^ Lomont 2003, p. 3.
  9. ^ McEniry 2007, p. 2, 16.
  10. ^ a b Eberly 2002, p. 2.
  11. ^ Ruskin, Elan (2009-10-16). "Timing square root". Some Assembly Required. Retrieved 2010-09-13. 
  12. ^ a b McEniry 2007, p. 2.
  13. ^ Hennessey & Patterson 1998, p. 276.
  14. ^ Hennessey & Patterson 1998, p. 305.
  15. ^ a b c McEniry 2007, p. 3.
  16. ^ Hennessey & Patterson 1998, p. 278, 282.
  17. ^ Lomont 2003, p. 5.
  18. ^ McEniry 2007, p. 6.
  19. ^ Sommefeldt, Rys (2006-12-19). "Origin of Quake3's Fast InvSqrt() - Part Two". Beyond3D. Retrieved 2008-04-19. 
  20. ^ "sqrt implementation in fdlibm". 
  21. ^ http://blogs.mathworks.com/cleve/2012/06/19/symplectic-spacewar/#comment-13.  Missing or empty |title= (help)
  22. ^ Blinn 1997, pp. 80–84.
  23. ^ a b Lomont 2003, p. 10.
  24. ^ Lomont 2003, pp. 10–11.
  25. ^ Matthew Robertson (2012-04-24). "A Brief History of InvSqrt". UNBSJ. 
  26. ^ McEniry 2007, pp. 11–12.
  27. ^ McEniry 2007, p. 16.

Documents[edit]

External links[edit]