Nonlinear conjugate gradient method
The minimum of is obtained when the gradient is 0:
Whereas linear conjugate gradient seeks a solution to the linear equation , the nonlinear conjugate gradient method is generally used to find the local minimum of a nonlinear function using its gradient alone. It works when the function is approximately quadratic near the minimum, which is the case when the function is twice differentiable at the minimum and the second derivative is non-singular there.
Given a function of variables to minimize, its gradient indicates the direction of maximum increase. One simply starts in the opposite (steepest descent) direction:
with an adjustable step length and performs a line search in this direction until it reaches the minimum of :
After this first iteration in the steepest direction , the following steps constitute one iteration of moving along a subsequent conjugate direction , where :
- Calculate the steepest direction: ,
- Compute according to one of the formulas below,
- Update the conjugate direction:
- Perform a line search: optimize ,
- Update the position: ,
With a pure quadratic function the minimum is reached within N iterations (excepting roundoff error), but a non-quadratic function will make slower progress. Subsequent search directions lose conjugacy requiring the search direction to be reset to the steepest descent direction at least every N iterations, or sooner if progress stops. However, resetting every iteration turns the method into steepest descent. The algorithm stops when it finds the minimum, determined when no progress is made after a direction reset (i.e. in the steepest descent direction), or when some tolerance criterion is reached.
Within a linear approximation, the parameters and are the same as in the linear conjugate gradient method but have been obtained with line searches. The conjugate gradient method can follow narrow (ill-conditioned) valleys, where the steepest descent method slows down and follows a criss-cross pattern.
Four of the best known formulas for are named after their developers:
These formulas are equivalent for a quadratic function, but for nonlinear optimization the preferred formula is a matter of heuristics or taste. A popular choice is , which provides a direction reset automatically.
Newton-based methods – Newton-Raphson Algorithm, Quasi-Newton methods (e.g., BFGS method) – tend to converge in fewer iterations, although each iteration typically requires more computation than a conjugate gradient iteration, as Newton-like methods require computing the Hessian (matrix of second derivatives) in addition to the gradient. Quasi-Newton methods also require more memory to operate (see also the limited-memory L-BFGS method).
- An Introduction to the Conjugate Gradient Method Without the Agonizing Pain by Jonathan Richard Shewchuk.
- A NONLINEAR CONJUGATE GRADIENT METHOD WITH A STRONG GLOBAL CONVERGENCE PROPERTY by Y. H. DAI and Y. YUAN.
- R. Fletcher and C. M. Reeves, "Function minimization by conjugate gradients", Comput. J. 7 (1964), 149–154.
- E. Polak and G. Ribière, "Note sur la convergence de directions conjugu´ee", Rev. Francaise Informat Recherche Operationelle, 3e Ann´ee 16 (1969), 35–43.
- M. R. Hestenes and E. Stiefel, "Methods of conjugate gradients for solving linear systems", J. Research Nat. Bur. Standards 49 (1952), 409–436 (1953).
- Y.-H. Dai and Y. Yuan, "A nonlinear conjugate gradient method with a strong global convergence property", SIAM J. Optim. 10 (1999), no. 1, 177–182.
- J. R. Shewchuk, "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain", August 1994, https://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf.