Wolfe conditions

From Wikipedia, the free encyclopedia
Jump to: navigation, search

In the unconstrained minimization problem, the Wolfe conditions are a set of inequalities for performing inexact line search, especially in quasi-Newton methods, first published by Philip Wolfe in 1969.[1][2]

In these methods the idea is to find

\min_x f(\mathbf{x})

for some smooth f:\mathbb R^n\to\mathbb R. Each step often involves approximately solving the subproblem

\min_{\alpha} f(\mathbf{x}_k + \alpha \mathbf{p}_k)

where \mathbf{x}_k is the current best guess, \mathbf{p}_k \in \mathbb R^n is a search direction, and \alpha \in \mathbb R is the step length.

The inexact line searches provide an efficient way of computing an acceptable step length \alpha that reduces the objective function 'sufficiently', rather than minimizing the objective function over \alpha\in\mathbb R^+ exactly. A line search algorithm can use Wolfe conditions as a requirement for any guessed \alpha, before finding a new search direction \mathbf{p}_k.

Armijo rule and curvature[edit]

Denote a univariate function \phi restricted to the direction \mathbf{p}_k as \phi(\alpha)=f(\mathbf{x}_k+\alpha\mathbf{p}_k). A step length \alpha_k is said to satisfy the Wolfe conditions if the following two inequalities hold:

i) f(\mathbf{x}_k+\alpha_k\mathbf{p}_k)\leq f(\mathbf{x}_k)+c_1\alpha_k\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k),
ii) \mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k+\alpha_k\mathbf{p}_k) \geq c_2\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k),

with 0<c_1<c_2<1. (In examining condition (ii), recall that to ensure that \mathbf{p}_k is a descent direction, we have \mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k) < 0 .)

c_1 is usually chosen to be quite small while c_2 is much larger; Nocedal gives example values of c_1=10^{-4}[3] and c_2=0.9 for Newton or quasi-Newton methods and c_2=0.1 for the nonlinear conjugate gradient method. Inequality i) is known as the Armijo rule [4] and ii) as the curvature condition; i) ensures that the step length \alpha_k decreases f 'sufficiently', and ii) ensures that the slope has been reduced sufficiently.

Strong Wolfe condition on curvature[edit]

The Wolfe conditions, however, can result in a value for the step length that is not close to a minimizer of \phi. If we modify the curvature condition to the following,

iia) \big|\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k+\alpha_k\mathbf{p}_k)\big|\leq c_2\big|\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k)\big|

then i) and iia) together form the so-called strong Wolfe conditions, and force \alpha_k to lie close to a critical point of \phi.

The principal reason for imposing the Wolfe conditions in an optimization algorithm where  \mathbf{x}_{k+1} = \mathbf{x}_k + \alpha \mathbf{p}_k is to ensure convergence of the gradient to zero. In particular, if the cosine of the angle between \mathbf{p}_k and the gradient,

 \cos \theta_k = \frac {\nabla f(\mathbf{x}_k)^{\mathrm T}\mathbf{p}_k }{\| \nabla f(\mathbf{x}_k)\| \|\mathbf{p}_k\| }

is bounded away from zero and the i) and ii) hold, then  \nabla f(\mathbf{x}_k) \rightarrow 0 .

An additional motivation, in the case of a quasi-Newton method is that if  \mathbf{p}_k = -B_k^{-1} \nabla f(\mathbf{x}_k) , where the matrix  B_k is updated by the BFGS or DFP formula, then if  B_k is positive definite ii) implies  B_{k+1} is also positive definite.

References[edit]

  1. ^ Wolfe, P. (1969). "Convergence Conditions for Ascent Methods". SIAM Review 11 (2): 226–000. doi:10.1137/1011036. JSTOR 2028111.  edit
  2. ^ Wolfe, P. (1971). "Convergence Conditions for Ascent Methods. II: Some Corrections". SIAM Review 13 (2): 185–000. doi:10.1137/1013035.  edit
  3. ^ Nocedal, Jorge; Wright, Stephen (1999). Numerical Optimization. 
  4. ^ Armijo, Larry (1966). "Minimization of functions having Lipschitz continuous first partial derivatives". Pacific J. Math. 16 (1): 1–3.