Newton's method in optimization

In mathematics, Newton's method is a well-known algorithm for finding roots of equations in one or more dimensions. It can also be used to find local maxima and local minima of functions by noticing that if a real number x* is a stationary point of a function f(x), then x* is a root of the derivative f'(x), and therefore one can solve for x* by applying Newton's method to f'(x).

Thus, provided that f(x) is a twice-differentiable function and the initial guess x_0 is chosen close enough to x*, the sequence (x_n) defined by

    x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}, \qquad n = 0, 1, 2, \ldots

will converge towards x*.
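
For illustration, a minimal sketch of this one-dimensional iteration in Python is given below; the example function, its derivatives, the starting point and the tolerance are assumptions chosen for this sketch, not part of the article.

    # One-dimensional Newton's method for optimization: x <- x - f'(x)/f''(x).
    # The example derivatives and starting point are assumptions for illustration.
    def newton_1d(df, d2f, x0, tol=1e-10, max_iter=50):
        x = x0
        for _ in range(max_iter):
            step = df(x) / d2f(x)
            x -= step
            if abs(step) < tol:
                break
        return x

    # Example: f(x) = x**4 - 3*x**3 + 2, so f'(x) = 4x^3 - 9x^2 and f''(x) = 12x^2 - 18x.
    df = lambda x: 4 * x**3 - 9 * x**2
    d2f = lambda x: 12 * x**2 - 18 * x
    print(newton_1d(df, d2f, x0=3.0))   # converges to the local minimum at x = 9/4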

This iterative scheme can be generalized to several dimensions by replacing the derivative with the gradient, ∇f(x), and the reciprocal of the second derivative with the inverse of the Hessian matrix, H f(x). One obtains the iterative scheme

    x_{n+1} = x_n - [H f(x_n)]^{-1} \nabla f(x_n), \qquad n \ge 0.
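
As a sketch of this iteration in code, assuming NumPy and user-supplied gradient and Hessian routines (the two-dimensional test function below is a made-up example, not one from the article), the step is usually computed by solving a linear system, which is equivalent to applying the inverse Hessian:

    import numpy as np

    # grad_f and hess_f are the gradient and Hessian of the hypothetical
    # example function f(x, y) = x**4 + x*y + (1 + y)**2.
    def grad_f(v):
        return np.array([4 * v[0]**3 + v[1], v[0] + 2 * (1 + v[1])])

    def hess_f(v):
        return np.array([[12 * v[0]**2, 1.0],
                         [1.0,          2.0]])

    x = np.array([1.0, 1.0])                            # initial guess (assumption)
    for _ in range(20):
        step = np.linalg.solve(hess_f(x), grad_f(x))    # equals [H f(x)]^{-1} grad f(x)
        x = x - step
        if np.linalg.norm(step) < 1e-10:
            break
    print(x)                                            # approximate stationary point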

Usually Newton's method is modified to include a small step size 0 < γ ≤ 1 instead of γ = 1:

    x_{n+1} = x_n - \gamma [H f(x_n)]^{-1} \nabla f(x_n).

This is often done to ensure that the Wolfe conditions are satisfied at each step x_n → x_{n+1} of the iteration.
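
In practice γ is often chosen by a backtracking line search. The sketch below enforces only the sufficient-decrease (Armijo) part of the Wolfe conditions, which is a simplification of the full conditions mentioned above; the parameter defaults are conventional choices, not values taken from the article.

    import numpy as np

    def backtracking_step_size(f, grad_f, x, direction, gamma0=1.0, c=1e-4, shrink=0.5):
        # Shrink gamma until the Armijo sufficient-decrease test holds.
        # Only the first Wolfe condition is checked; the curvature condition
        # is omitted for brevity.
        gamma = gamma0
        slope = float(np.dot(grad_f(x), direction))   # directional derivative along the step
        for _ in range(50):                           # guard against an endless loop
            if f(x + gamma * direction) <= f(x) + c * gamma * slope:
                break
            gamma *= shrink
        return gamma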

The geometric interpretation of Newton's method is that at each iteration one approximates f(x) by a quadratic function around x_n, and then takes a step towards the maximum/minimum of that quadratic function.
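
Concretely, the quadratic in question is the second-order Taylor approximation of f around x_n; setting its gradient to zero recovers the Newton step. The derivation below is standard and is written out here for clarity rather than quoted from the article:

    m_n(x) = f(x_n) + \nabla f(x_n)^T (x - x_n) + \tfrac{1}{2} (x - x_n)^T [H f(x_n)] (x - x_n),

    \nabla m_n(x) = \nabla f(x_n) + [H f(x_n)] (x - x_n) = 0
        \;\Longrightarrow\; x = x_n - [H f(x_n)]^{-1} \nabla f(x_n).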

Newton's method converges much faster towards a local maximum or minimum than gradient descent. In fact, every local minimum x* has a neighborhood N such that, if we start with x_0 ∈ N, Newton's method with step size γ = 1 converges quadratically (if the Hessian is invertible in that neighborhood).

Finding the inverse of the Hessian is an expensive operation, so the linear equation

    [H f(x_n)] \, p_n = -\nabla f(x_n)

is often solved approximately (but to great accuracy) for the Newton step p_n, using a method such as conjugate gradient. There also exist various quasi-Newton methods, where an approximation for the Hessian is used instead.
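
As an illustration, a minimal conjugate gradient solver for this system is sketched below, assuming the Hessian is a symmetric positive-definite NumPy array; the tolerance and iteration cap are arbitrary choices for the example, and in practice only Hessian-vector products are needed rather than the full matrix.

    import numpy as np

    def newton_step_cg(hess, grad, tol=1e-8, max_iter=None):
        # Approximately solve [H] p = -grad by the conjugate gradient method.
        n = grad.shape[0]
        max_iter = max_iter or 10 * n
        p = np.zeros(n)
        r = -grad - hess @ p          # residual b - A p with b = -grad and p = 0
        d = r.copy()
        rs_old = r @ r
        for _ in range(max_iter):
            if np.sqrt(rs_old) < tol:
                break
            Hd = hess @ d
            alpha = rs_old / (d @ Hd)
            p += alpha * d
            r -= alpha * Hd
            rs_new = r @ r
            d = r + (rs_new / rs_old) * d
            rs_old = rs_new
        return p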

If the Hessian is close to a non-invertible matrix, the inverted Hessian can be numerically unstable and the solution may diverge. In this case, certain workarounds have been tried in the past, with varied success on certain problems. One can, for example, modify the Hessian by adding a correction matrix B_n so as to make H f(x_n) + B_n positive definite. One approach is to diagonalize H f(x_n) and choose B_n so that H f(x_n) + B_n has the same eigenvectors as H f(x_n), but with each negative eigenvalue replaced by a small positive value ε > 0.
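
A sketch of this eigenvalue modification, assuming NumPy and a symmetric Hessian; the threshold eps is an arbitrary small value chosen for the example:

    import numpy as np

    def make_positive_definite(hessian, eps=1e-6):
        # Replace eigenvalues below eps with eps while keeping the eigenvectors;
        # this amounts to adding a correction matrix B_n so that H + B_n is
        # positive definite. eps is an arbitrary threshold for this sketch.
        eigvals, eigvecs = np.linalg.eigh(hessian)   # symmetric eigendecomposition
        eigvals = np.maximum(eigvals, eps)           # clamp negative (and tiny) eigenvalues
        return eigvecs @ np.diag(eigvals) @ eigvecs.T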

Proof

If the function f is quadratic, Newton's method finds the stationary point in a single iteration; otherwise it will take further iterations.

A quadratic f(x) and its first and second derivatives are defined as:

    f(x) = a x^2 + b x + c, \qquad f'(x) = 2 a x + b, \qquad f''(x) = 2 a.

Starting with an initial guess x_0, the optimal solution x_1 (where the derivative is 0) will be at:

    x_1 = x_0 - \frac{f'(x_0)}{f''(x_0)} = x_0 - \frac{2 a x_0 + b}{2 a} = -\frac{b}{2 a},

which is exactly the stationary point of f, regardless of the choice of x_0.
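
As a quick numerical check of the single-iteration claim, the following snippet applies one Newton step to an arbitrary quadratic; the coefficients and starting point are made-up values for this check:

    # One Newton step on a quadratic lands exactly on its stationary point.
    a, b, c = 2.0, -3.0, 1.0           # f(x) = 2x^2 - 3x + 1, minimizer at -b/(2a) = 0.75
    x0 = 10.0                          # arbitrary starting point
    x1 = x0 - (2 * a * x0 + b) / (2 * a)
    print(x1, -b / (2 * a))            # both print 0.75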

References

  • Avriel, Mordecai (2003). Nonlinear Programming: Analysis and Methods. Dover Publishing. ISBN 0-486-43227-0.
  • Nocedal, Jorge & Wright, Stephen J. (1999). Numerical Optimization. Springer-Verlag. ISBN 0-387-98793-2.