
Fast inverse square root



Fast inverse square root (sometimes referred to as Fast InvSqrt() or by the constant 0x5f3759df) is a method of calculating the reciprocal of the square root of a 32-bit IEEE 754 floating point number. The algorithm was probably developed at Silicon Graphics in the early 1990s, and an implementation appeared in 1999 in the Quake III Arena source code, but the method did not appear on public forums such as Usenet until 2002 or 2003. The algorithm generates reasonably accurate results using a unique first approximation for Newton's method. At the time, the primary advantage of the algorithm came from avoiding computationally expensive floating point operations in favor of integer operations. Inverse square roots are used to normalize vectors in computer graphics.

The algorithm accepts a 32-bit floating point number as input and stores a halved copy of it for the later Newton step. Treating the bits of the floating point number as a 32-bit integer, it then performs a logical shift right of one bit and subtracts the result from the "magic" value 0x5f3759df. This is the first approximation of the inverse square root of the input. The algorithm then reinterprets that result as a floating point number and runs one iteration of Newton's method to return a more precise approximation of the inverse square root of the argument. This gives an approximation of the inverse square root of a floating point number approximately four times faster than floating point division.

Though originally attributed to John Carmack, an investigation showed that the code had deeper roots in both the hardware and software sides of computer graphics. Adjustments and alterations passed through both Silicon Graphics and 3dfx Interactive, with Gary Tarolli's implementation for the SGI Indigo as the earliest known use. It is not known how the constant was originally derived, though investigation has shed some light on possible methods.

Overview of the code

float InvSqrt (float x)
{
    float xhalf = 0.5f*x;        // half of the input, saved for the Newton step
    int i = *(int*)&x;           // reinterpret the bits of the float as an integer
    i = 0x5f3759df - (i>>1);     // shift right one bit, subtract from the magic constant
    x = *(float*)&i;             // reinterpret the bits back as a float
    x = x*(1.5f - xhalf*x*x);    // one iteration of Newton's method
    return x;
}
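
As an illustration, the routine can be compared with the C library's sqrtf; this test harness is not part of the original source, and the sample inputs are arbitrary:

#include <stdio.h>
#include <math.h>

float InvSqrt (float x); /* the function listed above */

int main(void)
{
    /* Compare the approximation with the library result for a few inputs. */
    float inputs[] = { 0.5f, 2.0f, 10.0f, 100.0f };
    for (int k = 0; k < 4; k++) {
        float approx = InvSqrt(inputs[k]);
        float exact  = 1.0f / sqrtf(inputs[k]);
        printf("x = %8.3f  approx = %.6f  exact = %.6f\n",
               inputs[k], approx, exact);
    }
    return 0;
}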

The code outputs an approximation of the "inverse square root", or reciprocal of the square root, of the input number. The inverse square root of a floating point number is used in calculating a normalized vector. The length of a vector is its Euclidean norm: the square root of the sum of the squares of its components. When each component of the vector is divided by that length, the new vector will be a unit vector pointing in the same direction.

$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + v_3^2}$ is the Euclidean norm of the vector, analogous to the calculation of the Euclidean distance between two points in Euclidean space.
$\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}$ is the normalized (unit) vector. Using $x$ to represent $v_1^2 + v_2^2 + v_3^2$,
$\hat{\mathbf{v}} = \mathbf{v} \cdot \frac{1}{\sqrt{x}}$, which relates the unit vector to the inverse square root of the distance components.
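
A minimal sketch of such a use, with an illustrative vec3 type and normalize helper that do not come from the Quake III source:

#include <stdio.h>

/* InvSqrt as listed in the overview above. */
float InvSqrt (float x)
{
    float xhalf = 0.5f*x;
    int i = *(int*)&x;
    i = 0x5f3759df - (i>>1);
    x = *(float*)&i;
    x = x*(1.5f - xhalf*x*x);
    return x;
}

/* Illustrative three-component vector type. */
typedef struct { float x, y, z; } vec3;

/* One multiply per component replaces a division by the length. */
vec3 normalize(vec3 v)
{
    float s = InvSqrt(v.x*v.x + v.y*v.y + v.z*v.z);
    vec3 r = { v.x*s, v.y*s, v.z*s };
    return r;
}

int main(void)
{
    vec3 v = { 3.0f, 4.0f, 0.0f };       /* length 5 */
    vec3 u = normalize(v);
    printf("%f %f %f\n", u.x, u.y, u.z); /* approximately 0.6 0.8 0.0 */
    return 0;
}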

Since a 3D graphics program uses these normalized vectors to determine lighting and reflection, millions of these calculations must be done per second.[1] Before the creation of specialized hardware to do so, software computations could be slow, as the floating point processing power lagged behind the speed of integer processing.[2]

To determine this inverse square root, the software would compute an approximation for $\frac{1}{\sqrt{x}}$, then some numerical method would revise that approximation until it came within an acceptable error range of the actual result. Common software methods in the early 1990s drew a first approximation from a lookup table.[3] This bit of code proved faster than table lookups and approximately four times faster than regular floating point division.[4] Some loss of precision occurred, but was offset by the significant gains in performance.[5]

The advantages in speed offered by the "fast inverse square root" hack came from treating the bits of the floating point number as an integer, shifting it right by one bit, and subtracting the result from a specific constant, 0x5f3759df. The provenance of the constant is not immediately clear to someone viewing the code, so the number came to be referred to as "magic".[2][6][7][8] This integer subtraction and bit shift gives an integer which, when converted back into floating point notation, is a rough approximation for the inverse square root of the input number. One iteration of Newton's method is performed to gain some precision, and the code is finished.

Conversion from floating point to integer

Breaking this down requires some understanding of how a floating point number is stored in a register. A floating point number represents a rational number expressed in three portions across 32 bits. The first, the sign bit, is 0 for positive numbers and 1 for negative numbers. The next 8 bits form the exponent, which is biased in order to result in a range of values from −126 to 127. The significand comprises the final 23 bits and represents the significant digits of the number stored. The value represented is $(1 + \frac{M}{2^{23}}) \times 2^{E-B}$, where $M$ is the significand, $E$ the biased exponent, and the bias $B = 127$.[9] The value of the exponent determines whether the significand (referred to as the mantissa by Lomont 2003 and McEniry 2007) represents a fraction or an integer.[10]
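
These fields can be inspected directly using the same type-punning idiom as the listing above; the following demonstration program and its sample value are purely illustrative:

#include <stdio.h>
#include <math.h>

int main(void)
{
    float x = 6.25f;                      /* arbitrary positive example */
    unsigned int bits = *(unsigned int*)&x;

    unsigned int S = bits >> 31;          /* 1 sign bit */
    unsigned int E = (bits >> 23) & 0xff; /* 8 exponent bits, biased by 127 */
    unsigned int M = bits & 0x7fffff;     /* 23 significand bits */

    /* Reassemble the value as (1 + M/2^23) * 2^(E - 127). */
    double value = (1.0 + M / 8388608.0) * pow(2.0, (int)E - 127);
    printf("S=%u E=%u M=%u reassembles to %f\n", S, E, M, value);
    return 0;
}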

Signed 8-bit integers in two's complement representation; the leftmost bit is the sign bit.

0 1 1 1 1 1 1 1 = 127
0 0 0 0 0 0 1 0 = 2
0 0 0 0 0 0 0 1 = 1
0 0 0 0 0 0 0 0 = 0
1 1 1 1 1 1 1 1 = −1
1 1 1 1 1 1 1 0 = −2
1 0 0 0 0 0 0 1 = −127
1 0 0 0 0 0 0 0 = −128

Conversely, a non-negative integer represented in a two's complement system contains a sign bit of 0 followed by the binary value of the integer. Interpreting the bits of the floating point number as an integer gives an integer value of $I = E \times 2^{23} + M$, where $I$ is the integer value of the number, $E$ the biased exponent and $M$ the significand. The sign bit ($S_i$) has been ignored since the inverse square root function deals only with positive numbers, resulting in a value of 0 for $S_i$ (otherwise the value would be $I = S_i \times 2^{31} + E \times 2^{23} + M$). Treating the argument as an integer allows for the computationally inexpensive bit shift operations that follow. Shifting the bits one place to the right divides the integer by 2.[11]
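
A brief illustrative check of this identity, and of the halving effect of the shift, using the same type-punning idiom as the listing above:

#include <stdio.h>

int main(void)
{
    float x = 6.25f;            /* arbitrary positive example */
    int i = *(int*)&x;
    int E = (i >> 23) & 0xff;   /* biased exponent field */
    int M = i & 0x7fffff;       /* significand field */

    /* For a positive float, the aliased integer is I = E * 2^23 + M. */
    printf("i          = %d\n", i);
    printf("E*2^23 + M = %d\n", E * 8388608 + M);

    /* For non-negative i, shifting one bit to the right divides by two. */
    printf("i >> 1     = %d, i / 2 = %d\n", i >> 1, i / 2);
    return 0;
}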

Where the "magic number" enters in

Bit layout of an n-bit floating point number: S (1 bit), E (b bits), M (n − 1 − b bits), for n bits in total.[12]

The selection of 0x5f3759df as a constant prompted much of the original speculation surrounding the fast inverse square root function. In an attempt to determine how a programmer might have originally determined that constant as a mechanism to approximate the inverse square root, Charles McEniry first determined how the choice of any constant R could give a first approximation for the inverse square root. Recalling the integer and floating point comparison from above, note that $x$, our floating point number, is $2^{e_x}(1 + m_x)$ and $I_x$, our integer value for that same number, is $E_x L + M_x$.[note 1] These identities introduce a few new elements, which are simply restatements of values for the exponent and significand.

$e_x = E_x - B$ where $B = 2^{b-1} - 1$.
$m_x = \frac{M_x}{L}$ where $L = 2^{n-1-b}$.

Per these identities, a bit shift operation and integer subtraction can reliably output an approximation for the inverse square root of a floating point number. The proof from McEniry 2007 proceeds:

$\log_2(y) = -\frac{1}{2}\log_2(x)$ for $y = \frac{1}{\sqrt{x}}$,

taking the binary logarithm or $\log_2$ of both sides. The binary logarithm is the inverse function of $2^x$ and makes the multiplied terms in the floating point numbers x and y reduce to addition. The relationship between $\log_2(x)$ and $I_x$ is linear, allowing an equation to be constructed which can express x and y (the input and first approximation) as a linear combination of terms.[13] The next step in the proof introduces a term $\sigma$ which plays a part in the determination of the significand in the magic number R.[note 2] For our purposes, $\sigma$ is a real number bounded by [0, 1/3]: for an R equal to 0x5f3759df, $\sigma = 0.0450466$.[12] For any $v \in [0, 1)$, $\log_2(1 + v) \approx v + \sigma$. $\log_2(x)$ can now be approximated, since the values of $m_x$ are bounded between 0 and 1, allowing us to approximate the binary logarithm using $m_x$ and $\sigma$.
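
The quality of this linear approximation is easy to check numerically. The following sketch, which is illustrative rather than drawn from any cited source, measures the largest error of $\log_2(1 + v) \approx v + \sigma$ over [0, 1) using the value of $\sigma$ implied by 0x5f3759df:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double sigma = 0.0450466;   /* sigma implied by R = 0x5f3759df */
    double worst = 0.0;
    for (double v = 0.0; v < 1.0; v += 0.001) {
        double err = fabs(log2(1.0 + v) - (v + sigma));
        if (err > worst) worst = err;
    }
    /* The approximation is never off by more than about 0.045. */
    printf("largest error on [0,1): %f\n", worst);
    return 0;
}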

Using the identities for $e_x$, $m_x$, and $\log_2(1 + m_x)$:

$\log_2(x) = e_x + \log_2(1 + m_x) \approx E_x - B + \frac{M_x}{L} + \sigma$

Rearranging of terms leads to:

$E_x L + M_x \approx L(\log_2(x) + B - \sigma)$

The integer value of a strictly positive floating point number x is $I_x = E_x L + M_x$. This gives an expression for the integer value of y (where $y = \frac{1}{\sqrt{x}}$ is our first approximation for the inverse square root) in terms of the integer components of x. Specifically,

$I_y \approx \frac{3}{2}L(B - \sigma) - \frac{1}{2}I_x = R - \frac{I_x}{2},$

where $R = \frac{3}{2}L(B - \sigma)$.

The equation $I_y = R - \frac{I_x}{2}$ is the line i = 0x5f3759df - (i>>1); in Fast InvSqrt(): the integer approximation for $I_y$ is the integer value for x, shifted one bit to the right and subtracted from R.[12] McEniry's proof here shows only that a constant R can be used to approximate the integer value of the inverse square root of a floating point number. It does not prove that R assumes the value used in the code itself.
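
The relationship between R and $\sigma$ can nevertheless be checked numerically. In this illustrative sketch, assuming the single precision parameters B = 127 and L = 2^23 together with the value of $\sigma$ quoted above, R evaluates to the magic constant once rounded:

#include <stdio.h>

int main(void)
{
    /* R = (3/2) * L * (B - sigma), with L = 2^23 and B = 127. */
    double L = 8388608.0;
    double B = 127.0;
    double sigma = 0.0450466;
    double R = 1.5 * L * (B - sigma);
    /* Rounding to the nearest integer gives 0x5f3759df. */
    printf("R = %.1f = 0x%x\n", R, (unsigned int)(R + 0.5));
    return 0;
}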

A relatively simple explanation for how a bit shift and a subtraction operation using the expected value of R results in division of the exponent E by negative two can be found in Chris Lomont's paper exploring the constant. As an example, for $x = 2^e$, a division of the exponent by −2 would produce $2^{-e/2} = \frac{1}{\sqrt{x}}$. Since the exponent is biased, the true value of the exponent (here e) is $e = E - B$, making the value of the biased result $B - \frac{e}{2} = \frac{3B - E}{2}$.[14] Subtracting the shifted integer from R (the "magic" number 0x5f3759df) forces the least significant bit of the exponent to carry into the significand, and when returned to floating point notation, the result is a floating point number very close to the inverse square root of the input. The specific value of R was chosen to minimize the expected error in division of the exponent as well as the expected error in shifting the significand. 0xbe represents an integer value which minimizes the expected error resulting from division of the floating point exponent through bit shift operations; notably, the value of 0xbe shifted one to the right is 0x5f, the first digits in the magic number R.[15]
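
The effect on the exponent can be seen by applying only the shift-and-subtract step to exact powers of two; this check is illustrative and omits the Newton iteration:

#include <stdio.h>
#include <math.h>

int main(void)
{
    for (int e = -8; e <= 8; e += 2) {
        float x = (float)pow(2.0, e);  /* x = 2^e, significand bits all zero */
        int i = *(int*)&x;
        i = 0x5f3759df - (i >> 1);     /* first approximation only */
        float y = *(float*)&i;
        printf("e = %3d  approx = %10.6f  exact = %10.6f\n",
               e, y, (float)pow(2.0, -e / 2.0));
    }
    return 0;
}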

Newton's method

After performing those integer operations, the algorithm returns the number to floating point notation (x = *(float*)&i;) and performs a floating point multiplication operation (x = x*(1.5f - xhalf*x*x);). The floating point operation represents a single iteration of Newton's method of finding roots for a given equation. For this example,

$y = \frac{1}{\sqrt{x}}$ is the inverse square root, or, as a function of y,
$f(y) = \frac{1}{y^2} - x = 0$.
As $y_{n+1} = y_n - \frac{f(y_n)}{f'(y_n)}$ represents a general expression of Newton's method with $y_0$ as the first approximation,
$y_{n+1} = \frac{y_n(3 - xy_n^2)}{2}$ is the particularized expression where $f(y) = \frac{1}{y^2} - x$ and $f'(y) = -\frac{2}{y^3}$.
Hence x = x*(1.5f - xhalf*x*x); is the same as $y_1 = y_0(1.5 - \frac{x}{2}y_0^2)$.

The first approximation is generated above through the integer operations and input into the last two lines of the function. Repeated iterations of the algorithm, using the output of the function ($y_{n+1}$) as the input of the next iteration, cause the algorithm to converge on the root with increasing precision.[16] For the purposes of the Quake III engine, only one iteration was used. A second iteration remained in the code but was commented out.[8]
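
How quickly the approximation converges can be seen by repeating the Newton step; this illustrative program is not part of the original source, and the input value is arbitrary:

#include <stdio.h>
#include <math.h>

int main(void)
{
    float x = 10.0f;
    float xhalf = 0.5f * x;
    int i = *(int*)&x;
    i = 0x5f3759df - (i >> 1);
    float y = *(float*)&i;              /* first approximation */

    float exact = 1.0f / sqrtf(x);
    printf("0 iterations: %.8f (error %.2e)\n", y, fabs(y - exact));
    for (int n = 1; n <= 3; n++) {
        y = y * (1.5f - xhalf * y * y); /* one Newton iteration */
        printf("%d iterations: %.8f (error %.2e)\n", n, y, fabs(y - exact));
    }
    return 0;
}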

History and investigation

John Carmack, co-founder of id Software, photographed at the 2006 E3.

The source code for Quake III was not released until QuakeCon 2005, but copies of the fast inverse square root code appeared on Usenet and other forums as early as 2002 or 2003.[2] Initial speculation pointed to John Carmack as the probable author of the code, but he demurred and suggested it was written by Terje Mathisen, an accomplished assembly programmer. Mathisen had written an implementation of a similar bit of code in the late 1990s, but the original authors proved to be much further back in the history of 3D computer graphics. Gary Tarolli at Silicon Graphics worked with (though he did not invent) the code to speed graphics processing on the SGI Indigo, taking advantage of a constraint particular to the time: "fast integer and slow floating point" computation.[2]

It is not known precisely how the exact value for the magic number was determined. Chris Lomont developed a function to minimize approximation error by choosing the magic number R over a range. He predicted a new constant close to 0x5f3759df, but his constant was more accurate before one iteration of Newton's method and less accurate than the original constant after one iteration. He concluded that the exact value of the constant could have been chosen through trial and error.[17] Charles McEniry performed a similar but more sophisticated optimization over likely values for R. His initial brute-force search resulted in the same constant that Lomont determined.[18] When he attempted to find the constant through weighted bisection, he arrived at the specific value of R used in the function, leading McEniry to believe that the constant may have originally been derived through "bisecting to a given tolerance".[19]

Notes

  1. ^ Floating point numbers are normalized: the significand is expressed as a number $m$ with $1 \le m < 2$. See David Goldberg (March 1991). "What Every Computer Scientist Should Know About Floating-Point Arithmetic". ACM Computing Surveys (CSUR). 23 (1): 5–48. doi:10.1145/103162.103163 for further explanation.
  2. ^ Lomont 2003 approaches the determination of R in a different fashion, splitting R into separate terms for the significand and exponent bits of R.

References

  1. ^ Striegel, Jason (December 4, 2008). "Quake's fast inverse square root". Hackszine. O'Reilly Media. Retrieved 2009-02-13.
  2. ^ a b c d Sommefeldt, Rys (November 29, 2006). "Origin of Quake3's Fast InvSqrt()". Beyond3D. Retrieved 2009-02-12.
  3. ^ Eberly, David (2001). 3D Game Engine Design. Morgan Kaufmann. p. 504. ISBN 9781558605930.
  4. ^ Lomont, Chris (February 2003). "Fast Inverse Square Root" (PDF). p. 1. Retrieved 2009-02-13.
  5. ^ McEniry, Charles (August 2007). "The Mathematics Behind the Fast Inverse Square Root Function Code" (PDF). p. 1. Retrieved 2009-02-13.
  6. ^ Lomont 2003, p. 3
  7. ^ McEniry 2007, p. 2, 16
  8. ^ a b Eberly, David (2002), Fast Inverse Square Root (PDF), Geometric Tools, p. 2
  9. ^ McEniry 2007, p. 2
  10. ^ Hennessy, John L.; Patterson, David A. (1998). Computer Organization and Design (Second ed.). San Francisco, CA: Morgan Kaufmann Publishers. p. 276. ISBN 9781558604919.
  11. ^ Hennessy and Patterson, p. 305
  12. ^ a b c McEniry 2007, p. 3
  13. ^ McEniry 2007, p. 2
  14. ^ Hennessy and Patterson, p. 278, 282
  15. ^ Lomont 2003, p. 5
  16. ^ McEniry 2007, p. 6
  17. ^ Lomont 2003, p. 10–11
  18. ^ McEniry 2007, p. 11–12
  19. ^ McEniry 2007, p. 16

External links and further reading