# Backtracking line search

In (unconstrained) minimization, a backtracking line search is a line search method, based on the Armijo–Goldstein condition, for determining how far to move along a given search direction. It starts with a relatively large estimate of the step size for movement along the search direction and iteratively shrinks the step size (i.e., "backtracks") until the observed decrease of the objective function adequately corresponds to the decrease that is expected, based on the local gradient of the objective function.

## Motivation

Given a starting position ${\displaystyle \mathbf {x} }$ and a search direction ${\displaystyle \mathbf {p} }$, the task of a line search is to determine a step size ${\displaystyle \alpha }$ that adequately reduces the objective function ${\displaystyle f:\mathbb {R} ^{n}\to \mathbb {R} }$ (assumed smooth), i.e., to find a value of ${\displaystyle \alpha }$ that reduces ${\displaystyle f(\mathbf {x} +\alpha \,\mathbf {p} )}$ relative to ${\displaystyle f(\mathbf {x} )}$. However, it is usually undesirable to devote substantial resources to finding the value of ${\displaystyle \alpha }$ that precisely minimizes ${\displaystyle f}$ along the search direction, because the computing resources needed to locate that minimum more precisely could instead be used to identify a better search direction. Once the line search has identified an improved point, a further line search will ordinarily be performed from it in a new direction. The goal, then, is just to identify a value of ${\displaystyle \alpha }$ that provides a reasonable amount of improvement in the objective function, rather than to find the actual minimizing value of ${\displaystyle \alpha }$.

The backtracking line search starts with a large estimate of ${\displaystyle \alpha }$ and iteratively shrinks it. The shrinking continues until a value is found that is small enough to provide a decrease in the objective function that adequately matches the decrease that is expected to be achieved, based on the local function gradient ${\displaystyle \nabla f(\mathbf {x} )\,.}$

Define the local slope of the function of ${\displaystyle \alpha }$ along the search direction ${\displaystyle \mathbf {p} }$ as ${\displaystyle m=\mathbf {p} ^{\mathrm {T} }\,\nabla f(\mathbf {x} )\,.}$ It is assumed that ${\displaystyle \mathbf {p} }$ is a unit vector in a direction in which some local decrease is possible, i.e., it is assumed that ${\displaystyle m<0}$.

Based on a selected control parameter ${\displaystyle c\,\in \,(0,1)}$, the Armijo–Goldstein condition tests whether a step-wise movement from a current position ${\displaystyle \mathbf {x} }$ to a modified position ${\displaystyle \mathbf {x} +\alpha \,\mathbf {p} }$ achieves an adequately corresponding decrease in the objective function. The condition is fulfilled if ${\displaystyle f(\mathbf {x} +\alpha \,\mathbf {p} )\leq f(\mathbf {x} )+\alpha \,c\,m\,.}$
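The test above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `armijo_satisfied`, the callables `f` and `grad_f`, and the quadratic test function are assumptions introduced here and are not part of the original description.

```python
import numpy as np


def armijo_satisfied(f, grad_f, x, p, alpha, c=0.5):
    """Check f(x + alpha*p) <= f(x) + alpha*c*m, with m = p^T grad f(x).

    All names here are hypothetical; c must lie in (0, 1).
    """
    m = float(np.dot(p, grad_f(x)))  # local slope along p (assumed negative)
    return f(x + alpha * p) <= f(x) + alpha * c * m


# Illustrative objective (an assumption): f(x) = x . x, with gradient 2x.
f = lambda x: float(np.dot(x, x))
grad_f = lambda x: 2.0 * x

x = np.array([1.0, 0.0])
p = np.array([-1.0, 0.0])  # descent direction, since m = -2 < 0

print(armijo_satisfied(f, grad_f, x, p, alpha=2.0))  # step overshoots the minimum
print(armijo_satisfied(f, grad_f, x, p, alpha=0.5))  # smaller step
```

With these inputs the step of length 2 lands on a point with the same function value as the start and so fails the test, while the step of length 0.5 passes it.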

This condition, when used appropriately as part of a line search, can ensure that the step size is not excessively large. However, this condition is not sufficient on its own to ensure that the step size is nearly optimal, since any value of ${\displaystyle \alpha }$ that is sufficiently small will satisfy the condition.

Thus, the backtracking line search strategy starts with a relatively large step size, and repeatedly shrinks it by a factor ${\displaystyle \tau \,\in \,(0,1)}$ until the Armijo–Goldstein condition is fulfilled.

The search will terminate after a finite number of steps for any positive values of ${\displaystyle c}$ and ${\displaystyle \tau }$ that are less than 1. For example, Armijo used ${\displaystyle 1/2}$ for both ${\displaystyle c}$ and ${\displaystyle \tau }$ in a paper he published in 1966.
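As a small worked illustration (the one-dimensional objective, starting point, and initial step below are assumptions chosen for easy arithmetic, not taken from any reference), take ${\displaystyle f(x)=x^{2}}$, ${\displaystyle x=1}$, ${\displaystyle p=-1}$, and ${\displaystyle c=\tau =1/2}$, with an initial step size of 2:

```latex
\begin{align*}
m &= p\,f'(x) = (-1)(2) = -2 \\[4pt]
\alpha = 2:\quad & f(x+\alpha p) = f(-1) = 1,
  & f(x)+\alpha\,c\,m &= 1 + 2\cdot\tfrac12\cdot(-2) = -1,
  & 1 &\not\leq -1 \quad\text{(shrink)} \\
\alpha = 1:\quad & f(x+\alpha p) = f(0) = 0,
  & f(x)+\alpha\,c\,m &= 1 + 1\cdot\tfrac12\cdot(-2) = 0,
  & 0 &\leq 0 \quad\text{(accept)}
\end{align*}
```

In this particular case a single shrinking step suffices, and the accepted step happens to coincide with the exact minimizer along the search direction; in general the accepted step only guarantees an adequate decrease.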

## Algorithm

Starting with a maximum candidate step size value ${\displaystyle \alpha _{0}>0\,}$, using search control parameters ${\displaystyle \tau \,\in \,(0,1)}$ and ${\displaystyle c\,\in \,(0,1)}$, the backtracking line search algorithm can be expressed as follows:

1. Set ${\displaystyle t=-c\,m}$ and iteration counter ${\displaystyle j\,=\,0}$.
2. Until the condition is satisfied that ${\displaystyle f(\mathbf {x} )-f(\mathbf {x} +\alpha _{j}\,\mathbf {p} )\geq \alpha _{j}\,t,}$ repeatedly increment ${\displaystyle j}$ and set ${\displaystyle \alpha _{j}=\tau \,\alpha _{j-1}\,.}$
3. Return ${\displaystyle \alpha _{j}}$ as the solution.

In other words, reduce ${\displaystyle \alpha _{0}}$ by a factor of ${\displaystyle \tau \,}$ in each iteration until the Armijo–Goldstein condition is fulfilled.
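The three steps above might be implemented along the following lines. This is a minimal sketch under stated assumptions: the function name, the `grad_f` callable, the default parameter values, and the `max_iter` safeguard are illustrative additions and not part of the algorithm as described.

```python
import numpy as np


def backtracking_line_search(f, grad_f, x, p, alpha0=1.0, tau=0.5, c=0.5,
                             max_iter=50):
    """Shrink alpha by tau until f(x) - f(x + alpha*p) >= alpha*t, t = -c*m.

    Hypothetical helper: max_iter is a safeguard added for this sketch.
    """
    m = float(np.dot(p, grad_f(x)))  # local slope along p
    if m >= 0:
        raise ValueError("p must be a descent direction (m < 0)")
    t = -c * m          # step 1: required decrease per unit step size
    fx = f(x)           # f(x) is reused in every test, so evaluate it once
    alpha = alpha0
    for _ in range(max_iter):
        # step 2: Armijo-Goldstein test for the current candidate
        if fx - f(x + alpha * p) >= alpha * t:
            return alpha  # step 3: accept the first step size that passes
        alpha *= tau      # backtrack
    raise RuntimeError("no acceptable step size found within max_iter")


# Illustrative objective (an assumption): f(x) = x . x, with gradient 2x.
f = lambda x: float(np.dot(x, x))
grad_f = lambda x: 2.0 * x

x = np.array([1.0, 0.0])
p = np.array([-1.0, 0.0])
alpha = backtracking_line_search(f, grad_f, x, p, alpha0=2.0)
print(alpha, f(x + alpha * p))
```

Caching `f(x)` outside the loop means each backtracking step costs only one new function evaluation, which is in keeping with the aim of spending little effort on any single search direction.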

## References

• Armijo, Larry (1966). "Minimization of functions having Lipschitz continuous first partial derivatives". Pacific J. Math. 16 (1): 1–3. doi:10.2140/pjm.1966.16.1.
• Dennis, J. E.; Schnabel, R. B. (1996). Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Philadelphia: SIAM Publications. ISBN 978-0-898713-64-0.
• Nocedal, Jorge; Wright, Stephen J. (2000). Numerical Optimization. Springer-Verlag. ISBN 0-387-98793-2.