Numeric precision in Microsoft Excel

From Wikipedia, the free encyclopedia
Jump to: navigation, search

As with other spreadsheets, Microsoft Excel works only to limited accuracy because it retains only a certain number of figures to describe numbers (it has limited precision). Excel nominally works with 8-byte numbers by default, a modified 1985 version of the IEEE 754 specification[1] (Besides numbers, Excel uses a few other data types.[2]) Although Excel can display 30 decimal points, its precision for a specified number is confined to 15 significant figures, and calculations may have an accuracy that is even less due to three issues: round off,[3] truncation, and binary storage.

Accuracy and binary storage[edit]

Excel maintains 15 figures in its numbers, but they are not always accurate: the bottom line should be the same as the top line.
Of course, 1 + x − 1 = x. The discrepancy indicates the error. All errors but the last are beyond the 15-th decimal.

In the top figure the fraction 1/9000 in Excel is displayed. Although this number has a decimal representation that is an infinite string of ones, Excel displays only the leading 15 figures. In the second line, the number one is added to the fraction, and again Excel displays only 15 figures. In the third line, one is subtracted from the sum using Excel. Because the sum has only eleven 1's after the decimal, the true difference when ‘1’ is subtracted is three 0's followed by a string of eleven 1's. However, the difference reported by Excel is three 0's followed by a 15-digit string of thirteen 1's and two extra erroneous digits. Thus, the numbers Excel calculates with are not the numbers that it displays. Moreover, the error in Excel's answer is not just round-off error. How was this answer obtained?

The inaccuracy in Excel calculations is more complicated than errors due to a precision of 15 significant figures. Excel's storage of numbers in binary format also affects its accuracy.[4] To illustrate, the lower figure tabulates the simple addition 1 + x − 1 for several values of x. All the values of x begin at the 15-th decimal, so Excel must take them into account. Before calculating the sum 1 + x, Excel first approximates x as a binary number. If this binary version of x is a simple power of 2, the 15-digit decimal approximation to x is stored in the sum, and the top two examples of the figure indicate recovery of x without error. In the third example, x is a more complicated binary number, x = 1.110111⋯111 × 2−49 (15 bits altogether). Here x is approximated by the 4-bit binary 1.111 × 2−49 (some insight into this approximation can be found using geometric progression: x = 1.11 × 2−49 + 2−52 × (1 − 2−11) ≈ 1.11 × 2−49 + 2−52 = 1.111 × 2−49 ) and the decimal equivalent of this crude 4-bit approximation is used. In the fourth example, x is a decimal number not equivalent to a simple binary (although it agrees with the binary of the third example to the precision displayed). The decimal input is approximated by a binary and then that decimal is used. These two middle examples in the figure show that some error is introduced.

The last two examples illustrate what happens if x is a rather small number. In the second from last example, x = 1.110111⋯111 × 2−50; 15 bits altogether. the binary is replaced very crudely by a single power of 2 (in this example, 2−49) and its decimal equivalent is used. In the bottom example, a decimal identical with the binary above to the precision shown, is nonetheless approximated differently from the binary, and is eliminated by truncation to 15 significant figures, making no contribution to 1 + x − 1, leading to x = 0.[5]

For x′s that are not simple powers of 2, a noticeable error in 1 + x − 1 can occur even when x is quite large. For example, if x = 1/1000, then 1 + x − 1 = 9.9999999999989 × 10−4, an error in the 13 significant figure. In this case, if Excel simply added and subtracted the decimal numbers, avoiding the conversion to binary and back again to decimal, no round-off error would occur and accuracy actually would be better. Excel has the option to "Set precision as displayed".[6] With this option, depending upon circumstance, accuracy may turn out to be better or worse, but you will know exactly what Excel is doing. (It should be noted, however, that only the selected precision is retained, and one cannot recover extra digits by reversing this option.) Some similar examples can be found at this link.[7]

In short, a variety of accuracy behavior is introduced by the combination of representing a number with a limited number of binary digits, along with truncating numbers beyond the fifteenth significant figure.[8] Excel's treatment of numbers beyond 15 significant figures sometimes contributes better accuracy to the final few significant figures of a computation than working directly with only 15 significant figures, and sometimes not.

For the reasoning behind the conversion to binary representation and back to decimal, and for more detail about accuracy in Excel and VBA consult these links.[9]

Examples where precision is no indicator of accuracy[edit]

Statistical functions[edit]

Error in Excel 2007 calculation of standard deviation. All four columns have the same deviation of 0.5

Accuracy in Excel-provided functions can be an issue. Micah Altman et al. provide this example:[10] The population standard deviation given by:

 \sqrt{ \frac{ \Sigma (x - \bar{x})^2 }{n}  } = \sqrt{  \frac{ \Sigma \left[ x - \left( \Sigma x \right) /n \right] ^2}{n}  } \ ,

is mathematically equivalent to:

\sqrt{ \frac{ n\Sigma x^2 - \left( \Sigma x \right) ^2 }{n^2}  } \ .

However, the first form keeps better numerical accuracy for large values of x, because squares of differences between x and xav leads to less round-off than the differences between the much larger numbers Σx2 and (Σx)2. The built-in Excel function STDEVP(), however, uses the less accurate formulation because it is faster computationally.[11]

Both the "compatibility" function STDEVP and the "consistency" function STDEV.P in Excel 2010 return the 0.5 population standard deviation for the given set of values. However, numerical inaccuracy still can be shown using this example by extending the existing figure to include 1015, whereupon the erroneous standard deviation found by Excel 2010 will be zero.

Subtraction of Subtraction Results[edit]

Doing simple subtractions may lead to errors as two cells may display the same numeric value while storing two separate values. An example of this occurs in an sheet where the following cells are set to the following numeric values:

A1:= 28.552
A2:= 27.399
A3:= 26.246

and the following cells contain the following formulas

B1: = A1 - A2
B2: = A2 - A3

Both cells B1 and B2 display 1.1530. However, if cell C1 contains the formula B1-B2 then C1 does not display 0 as would be expected, but displays -3.55271E-15 instead.

Round-off error[edit]

User computations must be carefully organized to ensure round-off error does not become an issue. An example occurs in solving a quadratic equation:

ax^2 + b x +c = 0 \ .

The solutions (the roots) of this equation are exactly determined by the quadratic formula:

x= \frac{-b \pm \sqrt{b^2-4ac} }{2a}.

When one of these roots is very large compared to the other, that is, when the square root is close to the value b, the evaluation of the root corresponding to subtraction of the two terms becomes very inaccurate due to round-off.

It is possible to determine the round-off error by using the Taylor series formula for the square root:[12]

\sqrt{b^2-4ac} = b \ \sqrt{1-\frac{4ac}{b^2}} \approx b \left( 1 -\frac{2ac}{b^2} + \frac{2 a^2 c^2 }{b^4} + \cdots \right ).

Consequently,

 b - \sqrt{b^2-4ac} \approx b \left (  \frac{2ac}{b^2} - \frac{2 a^2 c^2 }{b^4} + \cdots \right ),

indicating that, as b becomes larger, the first surviving term, say ε:

 \varepsilon = \frac{2ac}{b},

becomes smaller and smaller. The numbers for b and the square root become nearly the same, and the difference becomes small:

b - \sqrt{b^2-4ac} \approx b - b + \varepsilon.

Under these circumstances, all the significant figures go into expressing b. For example, if the precision is 15 figures, and these two numbers, b and the square root, are the same to 15 figures, the difference will be zero instead of the difference ε.

A better accuracy can be obtained from a different approach, outlined below.[13] If we denote the two roots by r1 and r2, the quadratic equation can be written:

\left(x - r_1\right) \left( x - r_2 \right) = x^2 - \left( r_1  + r_2 \right) x + r_1 \  r_2 = 0.

When the root r1 >> r2, the sum (r1 + r2 ) ≈ r1 and comparison of the two forms shows approximately:

 r_1  \approx -\frac{b}{a},

while

 r_1 \  r_2 = \frac{c}{a}.

Thus, we find the approximate form:

r_2 = \frac {c}{a \ r_1} \approx -\frac {c}{b}.

These results are not subject to round-off error, but they are not accurate unless b 2 is large compared to ac.

Excel graph of the difference between two evaluations of the smallest root of a quadratic: direct evaluation using the quadratic formula (accurate at smaller b) and an approximation for widely spaced roots (accurate for larger b). The difference reaches a minimum at the large dots, and round-off causes squiggles in the curves beyond this minimum.

The bottom line is that in doing this calculation using Excel, as the roots become farther apart in value, the method of calculation will have to switch from direct evaluation of the quadratic formula to some other method so as to limit round-off error. The point to switch methods varies according to the size of coefficients a and b.

In the figure, Excel is used to find the smallest root of the quadratic equation x2 + bx + c = 0 for c = 4 and c = 4 × 105. The difference between direct evaluation using the quadratic formula and the approximation described above for widely spaced roots is plotted vs. b. Initially the difference between the methods declines because the widely spaced root method becomes more accurate at larger b-values. However, beyond some b-value the difference increases because the quadratic formula (good for smaller b-values) becomes worse due to round-off, while the widely spaced root method (good for large b-values) continues to improve. The point to switch methods is indicated by large dots, and is larger for larger c -values. At large b-values, the upward sloping curve is Excel's round-off error in the quadratic formula, whose erratic behavior causes the curves to squiggle.

A different field where accuracy is an issue is the area of numerical computing of integrals and the solution of differential equations. Examples are Simpson's rule, the Runge–Kutta method, and the Numerov algorithm for the Schrödinger equation.[14] Using Visual Basic for Applications, any of these methods can be implemented in Excel. Numerical methods use a grid where functions are evaluated. The functions may be interpolated between grid points or extrapolated to locate adjacent grid points. These formulas involve comparisons of adjacent values. If the grid is spaced very finely, round-off error will occur, and the less the precision used, the worse the round-off error. If spaced widely, accuracy will suffer. If the numerical procedure is thought of as a feedback system, this calculation noise may be viewed as a signal that is applied to the system, which will lead to instability unless the system is carefully designed.[15]

Accuracy within VBA[edit]

Although Excel nominally works with 8-byte numbers by default, VBA has a variety of data types. The Double data type is 8 bytes, the Integer data type is 2 bytes, and the general purpose 16 byte Variant data type can be converted to a 12 byte Decimal data type using the VBA conversion function CDec.[16] Choice of variable types in a VBA calculation involves consideration of storage requirements, accuracy and speed.

References[edit]

  1. ^ "Floating-point arithmetic may give inaccurate results in Excel". Revision 8.2 ; article ID: 78113. Microsoft support. June 30, 2010. Retrieved 2010-07-02. 
  2. ^ Steve Dalton (2007). "Table 2.3: Worksheet data types and limits". Financial Applications Using Excel Add-in Development in C/C++ (2nd ed.). Wiley. pp. 13–14. ISBN 0-470-02797-5. 
  3. ^ Round-off is the loss of accuracy when numbers that differ by small amounts are subtracted. Because each number has only fifteen significant digits, their difference is inaccurate when there aren't enough significant digits to express the difference.
  4. ^ Robert de Levie (2004). "Algorithmic accuracy". Advanced Excel for scientific data analysis. Oxford University Press. p. 44. ISBN 0-19-515275-1. 
  5. ^ To input a number as binary, the number is submitted as a string of powers of 2: 2^(−50)*(2^0 + 2^−1 + ⋯). To input a number as decimal, the decimal number is typed in directly.
  6. ^ This option is found on the "Excel options/Advanced" tab. See How to correct rounding errors: Method 2
  7. ^ Excel addition strangeness
  8. ^ Robert de Levie (2004). cited work. pp. 45–46. ISBN 0-19-515275-1. 
  9. ^ Micah Altman, Jeff Gill, Michael McDonald (2004). "§2.1.1 Revealing example: Computing the coefficient standard deviation". Numerical issues in statistical computing for the social scientist. Wiley-IEEE. p. 12. ISBN 0-471-23633-0. 
  10. ^ Robert de Levie (2004). Advanced Excel for scientific data analysis. Oxford University Press. pp. 45–46. ISBN 0-19-515275-1. 
  11. ^ IS Gradshteyn & IM Ryzhik (2007). "§1.112 Power series". Table of integrals, series and products (7th ed.). Academic Press. p. 25. ISBN 0-12-373637-4. 
  12. ^ This approximate method is used often in the design of feedback amplifiers, where the two roots represent the response times of the system. See the article on step response.
  13. ^ Anders Blom Computer algorithms for solving the Schrödinger and Poisson equations, Department of Physics, Lund University, 2002.
  14. ^ R. W. Hamming (1986). Numerical Methods for Scientists and Engineers (2nd ed.). Courier Dover Publications. ISBN 0-486-65241-6.  This book discusses round-off, truncation and stability extensively. For example, see Chapter 21: Indefinite integrals – feedback, page 357.
  15. ^ John Walkenbach (2010). "Defining data types". Excel 2010 Power Programming with VBA. Wiley. pp. 198 ff and Table 8–1. ISBN 0-470-47535-8.