Frank–Wolfe algorithm

From Wikipedia, the free encyclopedia
  (Redirected from Frank-Wolfe algorithm)
Jump to: navigation, search

In mathematical optimization, the reduced gradient method of Frank and Wolfe is an iterative method for nonlinear programming. Also known as the Frank–Wolfe algorithm and the convex combination algorithm, the reduced gradient method was proposed by Marguerite Frank and Phil Wolfe in 1956[1] as an algorithm for quadratic programming. In phase one, the reduced gradient method finds a feasible solution to the linear constraints, if one exists. Thereafter, at each iteration, the method takes a descent step in the negative gradient direction, so reducing the objective function; this gradient descent step is "reduced" to remain in the polyhedral feasible region of the linear constraints. Because quadratic programming is a generalization of linear programming, the reduced gradient method is a generalization of Dantzig's simplex algorithm for linear programming.

The reduced gradient method is an iterative method for nonlinear programming, a method that need not be restricted to quadratic programming. While the method is slower than competing methods and has been abandoned as a general purpose method of nonlinear programming, it remains widely used for specially structured problems of large scale optimization. In particular, the reduced gradient method remains popular and effective for finding approximate minimum–cost flows in transportation networks, which often have enormous size.

Contents

[edit] Problem statement

Minimize  f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^{\mathrm{T}} E\mathbf{x} +  \mathbf{h}^{\mathrm{T}} \mathbf{x}
subject to  \mathbf{x} \epsilon \mathbf{P}.

Where the n×n matrix E is positive semidefinite, h is an n×1 vector, and \mathbf{P} represents a feasible region defined by a mix of linear inequality and equality constraints (for example Ax ≤ b, Cx = d).

[edit] Algorithm

Step 1. Initialization. Let k \leftarrow 0 and let x_k \! be any point in \mathbf{P}.

Step 2. Convergence test. If  \nabla f(\mathbf{x})=\frac{1}{2}(E+E^T)\mathbf{x}+\mathbf{h}=\mathbf{0} then Stop, we have found the minimum.

Step 3. Direction-finding subproblem. The approximation of the problem that is obtained by replacing the function f with its first-order Taylor expansion around x_k \! is found. Solve for \bar{x}_k:

Minimize f(x_k) + \nabla^T f(x_k) \bar{x}_k
Subject to \bar{x}_k \epsilon \mathbf{P}
(note that this is a Linear Program. x_k \! is fixed during Step 3, while the minimization takes place by varying \bar{x}_k and is equivalent to minimization of \nabla^T f(x_k) \bar{x}_k).

Step 4. Step size determination. Find \lambda \! that minimizes  f(x_k+\lambda(\bar{x}_k-x_k)) subject to 0 \le \lambda \le 1 . If \nabla f(x_k)^T(\bar{x}_k-x_k) \ge 0 \! then Stop, we have found the minimum in  x_k\!.

Step 5. Update. Let x_{k+1}\leftarrow x_k+\lambda(\bar{x}_k-x_k), let k \leftarrow k+1 and go back to Step 2.

[edit] Comments

The algorithm generally makes good progress towards the optimum during the first few iterations, but convergence often slows down substantially when close to the minimum point. For this reason the algorithm is perhaps best used to find an approximate solution. It can be shown that the worst case convergence rate is sublinear; however, in practice the convergence rate has been observed to improve in case of many constraints.[2]

[edit] References

     (The formulation(2) in this article should take away f(xk) to make the following discussion valid and understandable.)

[edit] Notes

  1. ^ Frank, M.; Wolfe, P. (1956). "An algorithm for quadratic programming". Naval Research Logistics Quarterly 3 (1-2): 95–110. doi:10.1002/nav.3800030109. 
  2. ^ "Nonlinear Programming", Dimitri Bertsekas, 2003, page 222. Athena Scientific, ISBN 1-886529-00-0.
Personal tools
Namespaces

Variants
Actions
Navigation
Interaction
Toolbox
Print/export
Languages