# Newton polynomial

(Redirected from Newton polynomials)

In the mathematical field of numerical analysis, a Newton polynomial, named after its inventor Isaac Newton, is the interpolation polynomial for a given set of data points in the Newton form. The Newton polynomial is sometimes called Newton's divided differences interpolation polynomial because the coefficients of the polynomial are calculated using divided differences.

(The other difference formulas, such as those of Gauss, Bessel and Stirling, can be derived from Newton's, by renaming of the x-values of the data points.)

For any given finite set of data points, there is only one polynomial, of least possible degree, that passes through all of them. Thus, it is more appropriate to speak of "the Newton form of the interpolation polynomial" rather than of "the Newton interpolation polynomial". Like the Lagrange form, it is merely another way to write the same polynomial.

## Definition

Given a set of k + 1 data points

$(x_0, y_0),\ldots,(x_k, y_k)$

where no two xj are the same, the interpolation polynomial in the Newton form is a linear combination of Newton basis polynomials

$N(x) := \sum_{j=0}^{k} a_{j} n_{j}(x)$

with the Newton basis polynomials defined as

$n_j(x) := \prod_{i=0}^{j-1} (x - x_i)$

for j > 0 and $n_0(x) \equiv 1$.

The coefficients are defined as

$a_j := [y_0,\ldots,y_j]$

where

$[y_0,\ldots,y_j]$

is the notation for divided differences.

Thus the Newton polynomial can be written as

$N(x) = [y_0] + [y_0,y_1](x-x_0) + \cdots + [y_0,\ldots,y_k](x-x_0)(x-x_1)\cdots(x-x_{k-1}).$

The Newton Polynomial above can be expressed in a simplified form when $x_0, x_1, \dots, x_k$ are arranged consecutively with equal space. Introducing the notation $h = x_{i+1}-x_i$ for each $i=0,1,\dots,k-1$ and $x=x_0+sh$, the difference $x-x_i$ can be written as $(s-i)h$. So the Newton Polynomial above becomes:

\begin{align} N(x) &= [y_0] + [y_0,y_1]sh + \cdots + [y_0,\ldots,y_k] s (s-1) \cdots (s-k+1){h}^{k} \\ &= \sum_{i=0}^{k}s(s-1) \cdots (s-i+1){h}^{i}[y_0,\ldots,y_i] \\ &= \sum_{i=0}^{k}{s \choose i}i!{h}^{i}[y_0,\ldots,y_i] \end{align}

is called the Newton Forward Divided Difference Formula.

If the nodes are reordered as ${x}_{k},{x}_{k-1},\dots,{x}_{0}$, the Newton Polynomial becomes:

$N(x)=[y_k]+[{y}_{k}, {y}_{k-1}](x-{x}_{k})+\cdots+[{y}_{k},\ldots,{y}_{0}](x-{x}_{k})(x-{x}_{k-1})\cdots(x-{x}_{1})$

If ${x}_{k},\;{x}_{k-1},\;\dots,\;{x}_{0}$ are equally spaced with x=${x}_{k}+sh$ and ${x}_{i}={x}_{k}-(k-i)h$ for i = 0, 1, ..., k, then,

\begin{align} N(x) &= [{y}_{k}]+ [{y}_{k}, {y}_{k-1}]sh+\cdots+[{y}_{k},\ldots,{y}_{0}]s(s+1)\cdots(s+k-1){h}^{k} \\ &=\sum_{i=0}^{k}{(-1)}^{i}{-s \choose i}i!{h}^{i}[{y}_{k},\ldots,{y}_{k-i}] \end{align}

is called the Newton Backward Divided Difference Formula.

## Significance

Newton's formula is of interest because it is the straightforward and natural differences-version of Taylor's polynomial. Taylor's polynomial tells where a function will go, based on its y value, and its derivatives (its rate of change, and the rate of change of its rate of change, etc.) at one particular x value. Newton's formula is Taylor's polynomial based on finite differences instead of instantaneous rates of change.

As with other difference formulas, the degree of a Newton's interpolating polynomial can be increased by adding more terms and points without discarding existing ones. Newton's form has the simplicity that the new points are always added at one end: Newton's forward formula can add new points to the right, and Newton's backwards formula can add new points to the left.

The accuracy of polynomial interpolation depends on how close the interpolated point is to the middle of the x values of the set of points used. Obviously, as new points are added at one end, that middle becomes farther and farther from the first data point. Therefore, if it isn't known how many points will be needed for the desired accuracy, the middle of the x-values might be far from where the interpolation is done.

Gauss, Stirling, and Bessel all developed formulae to remedy that problem.[citation needed]

Gauss's formula alternately adds new points at the left and right ends, thereby keeping the set of points centered near the same place (near the evaluated point). When so doing, it uses terms from Newton's formula, with data points and x values renamed in keeping with one's choice of what data point is designated as the x0 data point.

Stirling's formula remains centered about a particular data point, for use when the evaluated point is nearer to a data point than to a middle of two data points.

Bessel's formula remains centered about a particular middle between two data points, for use when the evaluated point is nearer to a middle than to a data point.

Bessel and Stirling achieve that by sometimes using the average of two differences, and sometimes using the average of two products of binomials in x, where Newton's or Gauss's would use just one difference or product. Stirling's uses an average difference in odd-degree terms (whose difference uses an even number of data points); Bessels uses an average difference in even-degree terms (whose difference uses an odd number of data points).

## Strengths and weaknesses of various formulae

Gauss vs Bessel & Stirling:

Bessel and Stirling require a bit more work than Gauss does. The desirability of using Bessel or Stirling depends on whether or not their small improvement in accuracy, over Gauss, is needed.

Bessel vs Stirling:

The choice between Bessel and Stirling depends on whether the interpolated point is closer to a data point, or closer to a middle between two data points.

But it should be pointed out that a polynomial interpolation's error approaches zero, as the interpolation point approaches a data-point. Therefore, Stirling's formula brings its accuracy improvement where it's least needed. ...and Bessel brings its accuracy improvement where it's most needed.

So, Bessel's formula could be said to be the most consistently accurate difference formula, and, in general, the most consistently accurate of the familiar polynomial interpolation formulas.

Divided-Difference Methods vs Lagrange:

Lagrange is sometimes said to require less work, and is sometimes recommended for problems in which it's known, in advance, from previous experience, how many terms are needed for sufficient accuracy.

The divided difference method have the advantage that more data points can be added, for improved accuracy, without re-doing the whole problem. The terms based on the previous data points can continue to be used. With the ordinary Lagrange formula, to do the problem with more data points would require re-doing the whole problem.

There is a "Barycentric" version of Lagrange that avoids the need to re-do the entire calculation when adding a new data point. But it requires that the values of each term be recorded.

But the ability, of Gauss, Bessel and Stirling, to keep the data points centered close to the interpolated point gives them an advantage over Lagrange, when it isn't known, in advance, how many data points will be needed.

Additionally, suppose that one wants to find out if, for some particular type of problem, linear interpolation is sufficiently accurate. That can be determined by evaluating the quadratic term of a divided difference formula. If the quadratic term is negligible--meaning that the linear term is sufficiently accurate without adding the quadratic term--then linear interpolation is sufficiently accurate. (If the problem is sufficiently important, or if the quadratic term is nearly big enough to matter, then one might want to determine whether the _sum_ of the quadratic and cubic terms is large enough to matter in the problem.)

Of course only a divided-difference method can be used for such a determination.

For that purpose, the divided-difference formula &/or its x0 point should be chosen so that the formula will use, for its linear term, the two data points between which the linear interpolation of interest would be done.

The divided difference formulas are more versatile, useful in more kinds of problems.

The Lagrange formula is at its best when all the interpolation will be done at one x value, with only the data points' y values varying from one problem to another, and when it's known, from past experience, how many terms are needed for sufficient accuracy.

With the Newton form of the interpolating polynomial a compact and effective algorithm exists for combining the terms to find the coefficients of the polynomial.[1]

### Accuracy

When, with Stirling's or Bessel's, the last term used includes the average of two differences, then one more point is being used than Newton's or other polynomial interpolations would use for the same polynomial degree. So, in that instance, Stirling's or Bessel's is not putting an N−1 degree polynomial through N points, but is, instead, trading equivalence with Newton's for better centering and accuracy, giving those methods sometimes potentially greater accuracy, for a given polynomial degree, than other polynomial interpolations.

## General case

For the special case of xi = i, there is a closely related set of polynomials, also called the Newton polynomials, that are simply the binomial coefficients for general argument. That is, one also has the Newton polynomials $p_n(z)$ given by

$p_n(z)={z \choose n}= \frac{z(z-1)\cdots(z-n+1)}{n!}$

In this form, the Newton polynomials generate the Newton series. These are in turn a special case of the general difference polynomials which allow the representation of analytic functions through generalized difference equations.

## Main idea

Solving an interpolation problem leads to a problem in linear algebra where we have to solve a system of linear equations. Using a standard monomial basis for our interpolation polynomial we get the very complicated Vandermonde matrix. By choosing another basis, the Newton basis, we get a system of linear equations with a much simpler lower triangular matrix which can be solved faster.

For k + 1 data points we construct the Newton basis as

$n_j(x) := \prod_{i=0}^{j-1} (x - x_i) \qquad j=0,\ldots,k.$

Using these polynomials as a basis for $\Pi_k$ we have to solve

$\begin{bmatrix} 1 & & \ldots & & 0 \\ 1 & x_1-x_0 & & & \\ 1 & x_2-x_0 & (x_2-x_0)(x_2-x_1) & & \vdots \\ \vdots & \vdots & & \ddots & \\ 1 & x_k-x_0 & \ldots & \ldots & \prod_{j=0}^{k-1}(x_k - x_j) \end{bmatrix} \begin{bmatrix} a_0 \\ \\ \vdots \\ \\ a_{k} \end{bmatrix} = \begin{bmatrix} y_0 \\ \\ \vdots \\ \\ y_{k} \end{bmatrix}$

to solve the polynomial interpolation problem.

This system of equations can be solved iteratively by solving

$\sum_{i=0}^{j} a_{i} n_{i}(x_j) = y_j \qquad j = 0,\dots,k.$

## Taylor polynomial

The limit of the Newton polynomial if all nodes coincide is a Taylor polynomial, because the divided differences become derivatives.

$\lim_{(x_0,\dots,x_n)\to(z,\dots,z)} f[x_0] + f[x_0,x_1]\cdot(\xi-x_0) + \dots + f[x_0,\dots,x_n]\cdot(\xi-x_0)\cdot\dots\cdot(\xi-x_{n-1}) =$
$= f(z) + f'(z)\cdot(\xi-z) + \dots + \frac{f^{(n)}(z)}{n!}\cdot(\xi-z)^n$

## Application

As can be seen from the definition of the divided differences new data points can be added to the data set to create a new interpolation polynomial without recalculating the old coefficients. And when a data point changes we usually do not have to recalculate all coefficients. Furthermore if the xi are distributed equidistantly the calculation of the divided differences becomes significantly easier. Therefore the divided-difference formulas are usually preferred over the Lagrange form for practical purposes.

### Example

The divided differences can be written in the form of a table. For example, for a function f is to be interpolated on points $x_0, \ldots, x_n$. Write

$\begin{matrix} x_0 & f(x_0) & & \\ & & {f(x_1)-f(x_0)\over x_1 - x_0} & \\ x_1 & f(x_1) & & {{f(x_2)-f(x_1)\over x_2 - x_1}-{f(x_1)-f(x_0)\over x_1 - x_0} \over x_2 - x_0} \\ & & {f(x_2)-f(x_1)\over x_2 - x_1} & \\ x_2 & f(x_2) & & \vdots \\ & & \vdots & \\ \vdots & & & \vdots \\ & & \vdots & \\ x_n & f(x_n) & & \\ \end{matrix}$

Then the interpolating polynomial is formed as above using the topmost entries in each column as coefficients.

For example, suppose we are to construct the interpolating polynomial to f(x) = tan(x) using divided differences, at the points

 $x_0=-\tfrac{3}{2}$ $x_1=-\tfrac{3}{4}$ $x_2=0$ $x_3=\tfrac{3}{4}$ $x_4=\tfrac{3}{2}$ $f(x_0)=-14.1014$ $f(x_1)=-0.931596$ $f(x_2)=0$ $f(x_3)=0.931596$ $f(x_4)=14.1014$

Using six digits of accuracy, we construct the table

$\begin{matrix} -\tfrac{3}{2} & -14.1014 & & & &\\ & & 17.5597 & & &\\ -\tfrac{3}{4} & -0.931596 & & -10.8784 & &\\ & & 1.24213 & & 4.83484 & \\ 0 & 0 & & 0 & & 0\\ & & 1.24213 & & 4.83484 &\\ \tfrac{3}{4} & 0.931596 & & 10.8784 & &\\ & & 17.5597 & & &\\ \tfrac{3}{2} & 14.1014 & & & &\\ \end{matrix}$

Thus, the interpolating polynomial is

$-14.1014+17.5597(x+\tfrac{3}{2})-10.8784(x+\tfrac{3}{2})(x+\tfrac{3}{4}) +4.83484(x+\tfrac{3}{2})(x+\tfrac{3}{4})(x)+0(x+\tfrac{3}{2})(x+\tfrac{3}{4})(x)(x-\tfrac{3}{4}) =$
$=-0.00005-1.4775x-0.00001x^2+4.83484x^3$

Given more digits of accuracy in the table, the first and third coefficients will be found to be zero.