# Augmented Lagrangian method

Augmented Lagrangian methods are a class of algorithms for solving constrained optimization problems. They resemble penalty methods in that they replace a constrained optimization problem by a series of unconstrained problems and add a penalty term to the objective; the difference is that the augmented Lagrangian method adds yet another term, designed to mimic a Lagrange multiplier. The augmented Lagrangian is related to, but not identical with, the method of Lagrange multipliers.

Viewed differently, the unconstrained objective is the Lagrangian of the constrained problem, with an additional penalty term (the augmentation).

The method was originally known as the method of multipliers and was studied extensively in the 1970s and 1980s as an alternative to penalty methods. It was first discussed by Magnus Hestenes[1] and by Michael Powell[2] in 1969. R. Tyrrell Rockafellar studied the method in relation to Fenchel duality, particularly in relation to proximal-point methods, Moreau–Yosida regularization, and maximal monotone operators. These methods were used in structural optimization. The method was also studied by Dimitri Bertsekas, notably in his 1982 book,[3] together with extensions involving nonquadratic regularization functions, such as entropic regularization, which gives rise to the "exponential method of multipliers", a method that handles inequality constraints with a twice differentiable augmented Lagrangian function.

Since the 1970s, sequential quadratic programming (SQP) and interior point methods (IPM) have received increasing attention, in part because they more easily use sparse matrix subroutines from numerical software libraries, and in part because IPMs come with proven complexity results via the theory of self-concordant functions. The augmented Lagrangian method was rejuvenated by the optimization systems LANCELOT, ALGENCAN[4][5] and AMPL, which allowed sparse matrix techniques to be used on seemingly dense but "partially separable" problems. The method is still useful for some problems.[6] Around 2007, there was a resurgence of augmented Lagrangian methods in fields such as total-variation denoising and compressed sensing. In particular, a variant of the standard augmented Lagrangian method that uses partial updates (similar to the Gauss–Seidel method for solving linear equations), known as the alternating direction method of multipliers or ADMM, gained some attention.

## General method

Consider the following constrained problem:

${\displaystyle \min f(\mathbf {x} )}$

subject to

${\displaystyle c_{i}(\mathbf {x} )=0~\forall i\in {\mathcal {E}},}$

where ${\displaystyle {\mathcal {E}}}$ denotes the indices for equality constraints. This problem can be solved as a series of unconstrained minimization problems. For reference, we first list the kth step of the penalty method approach:

${\displaystyle \min \Phi _{k}(\mathbf {x} )=f(\mathbf {x} )+\mu _{k}~\sum _{i\in {\mathcal {E}}}~c_{i}(\mathbf {x} )^{2}.}$

The penalty method solves this problem, then at the next iteration it re-solves the problem using a larger value of ${\displaystyle \mu _{k}}$ (and using the old solution as the initial guess or "warm-start").
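To make this behavior concrete, here is a minimal Python sketch of the quadratic penalty method on a toy problem chosen for illustration (it is not taken from the sources): minimize ${\displaystyle f(\mathbf {x} )={\tfrac {1}{2}}\|\mathbf {x} -\mathbf {p} \|^{2}}$ subject to the single linear constraint ${\displaystyle c(\mathbf {x} )=\mathbf {w} ^{T}\mathbf {x} -b=0}$. Because each penalty subproblem then has a closed-form minimizer, no inner solver or warm start is needed. The function names, the starting value ${\displaystyle \mu _{0}=1}$, and the growth factor 10 are assumptions made for the example.

```python
# Quadratic penalty method on an assumed toy problem:
#   minimize   f(x) = 0.5 * ||x - p||^2
#   subject to c(x) = w.x - b = 0
# The subproblem f(x) + mu * c(x)^2 has a closed-form minimizer here.


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def penalty_subproblem(p, w, b, mu):
    """Exact minimizer of 0.5*||x - p||^2 + mu*(w.x - b)^2."""
    n = dot(w, w)
    # Stationarity: x - p + 2*mu*(w.x - b)*w = 0; solve for s = w.x first.
    s = (dot(w, p) + 2.0 * mu * b * n) / (1.0 + 2.0 * mu * n)
    return [pi - 2.0 * mu * (s - b) * wi for pi, wi in zip(p, w)]


def penalty_method(p, w, b, mu0=1.0, growth=10.0, iters=6):
    """Increase mu geometrically; record (mu, x, c(x)) after each solve."""
    history = []
    mu = mu0
    for _ in range(iters):
        x = penalty_subproblem(p, w, b, mu)  # exact solve, so no warm start needed
        history.append((mu, x, dot(w, x) - b))
        mu *= growth
    return history


if __name__ == "__main__":
    for mu, x, viol in penalty_method(p=[3.0, 1.0], w=[1.0, 1.0], b=1.0):
        print(f"mu={mu:>9.0f}  x=({x[0]:.6f}, {x[1]:.6f})  c(x)={viol:.2e}")
```

For this problem the constraint violation at the kth step works out to ${\displaystyle c(\mathbf {x} _{k})=(\mathbf {w} ^{T}\mathbf {p} -b)/(1+2\mu _{k}\|\mathbf {w} \|^{2})}$: it shrinks only in proportion to ${\displaystyle 1/\mu _{k}}$ and vanishes only in the limit ${\displaystyle \mu _{k}\to \infty }$.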

The augmented Lagrangian method uses the following unconstrained objective:

${\displaystyle \min \Phi _{k}(\mathbf {x} )=f(\mathbf {x} )+{\frac {\mu _{k}}{2}}~\sum _{i\in {\mathcal {E}}}~c_{i}(\mathbf {x} )^{2}+\sum _{i\in {\mathcal {E}}}~\lambda _{i}c_{i}(\mathbf {x} )}$

and after each iteration, in addition to updating ${\displaystyle \mu _{k}}$, the variable ${\displaystyle \lambda }$ is also updated according to the rule

${\displaystyle \lambda _{i}\leftarrow \lambda _{i}+\mu _{k}c_{i}(\mathbf {x} _{k})}$

where ${\displaystyle \mathbf {x} _{k}}$ is the solution to the unconstrained problem at the kth step, i.e. ${\displaystyle \mathbf {x} _{k}={\text{argmin}}\Phi _{k}(\mathbf {x} )}$.
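The update rule can be motivated by comparing first-order conditions (an informal derivation, assuming ${\displaystyle f}$ and the ${\displaystyle c_{i}}$ are differentiable). Since ${\displaystyle \mathbf {x} _{k}}$ minimizes ${\displaystyle \Phi _{k}}$,

${\displaystyle \nabla \Phi _{k}(\mathbf {x} _{k})=\nabla f(\mathbf {x} _{k})+\sum _{i\in {\mathcal {E}}}\left(\lambda _{i}+\mu _{k}c_{i}(\mathbf {x} _{k})\right)\nabla c_{i}(\mathbf {x} _{k})=0,}$

while at a solution ${\displaystyle \mathbf {x} ^{*}}$ of the constrained problem, with multipliers ${\displaystyle \lambda ^{*}}$, stationarity of the Lagrangian ${\displaystyle f+\sum _{i}\lambda _{i}c_{i}}$ reads

${\displaystyle \nabla f(\mathbf {x} ^{*})+\sum _{i\in {\mathcal {E}}}\lambda _{i}^{*}\nabla c_{i}(\mathbf {x} ^{*})=0.}$

Matching the coefficients of ${\displaystyle \nabla c_{i}}$ suggests ${\displaystyle \lambda _{i}+\mu _{k}c_{i}(\mathbf {x} _{k})}$ as the next multiplier estimate, which is exactly the rule above. The same comparison for the penalty method gives the implicit estimate ${\displaystyle 2\mu _{k}c_{i}(\mathbf {x} _{k})\approx \lambda _{i}^{*}}$, so ${\displaystyle c_{i}(\mathbf {x} _{k})\approx \lambda _{i}^{*}/(2\mu _{k})}$ and the constraint violation vanishes only as ${\displaystyle \mu _{k}\to \infty }$.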

The variable ${\displaystyle \lambda }$ is an estimate of the Lagrange multiplier, and the accuracy of this estimate improves as the iterations proceed. The major advantage of the method is that, unlike the penalty method, it is not necessary to take ${\displaystyle \mu \rightarrow \infty }$ in order to solve the original constrained problem. Instead, because of the presence of the Lagrange multiplier term, ${\displaystyle \mu }$ can stay much smaller, thus avoiding ill-conditioning.[6] Nevertheless, practical implementations commonly project the multiplier estimates onto a large bounded set (safeguarding), which avoids numerical instabilities and leads to strong theoretical convergence guarantees.[5]
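The following Python sketch runs the method of multipliers on a toy problem chosen for illustration (an assumption, not taken from the sources): minimize ${\displaystyle {\tfrac {1}{2}}\|\mathbf {x} -\mathbf {p} \|^{2}}$ subject to ${\displaystyle \mathbf {w} ^{T}\mathbf {x} -b=0}$, for which each minimization of ${\displaystyle \Phi _{k}}$ has a closed-form solution. ${\displaystyle \mu }$ is held fixed throughout, and the multiplier safeguard is a simplified version of the projection described above; the bound `lam_max` and all function names are hypothetical.

```python
# Method of multipliers on an assumed toy problem:
#   minimize   f(x) = 0.5 * ||x - p||^2
#   subject to c(x) = w.x - b = 0
# The subproblem f(x) + (mu/2)*c(x)^2 + lam*c(x) has a closed-form minimizer.


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def alm_subproblem(p, w, b, mu, lam):
    """Exact minimizer of 0.5*||x - p||^2 + (mu/2)*(w.x - b)^2 + lam*(w.x - b)."""
    n = dot(w, w)
    # Stationarity: x - p + (mu*(w.x - b) + lam)*w = 0; solve for s = w.x first.
    s = (dot(w, p) + (mu * b - lam) * n) / (1.0 + mu * n)
    return [pi - (mu * (s - b) + lam) * wi for pi, wi in zip(p, w)]


def augmented_lagrangian(p, w, b, mu=1.0, iters=30, lam_max=1e6):
    """Fixed-mu method of multipliers with a simple multiplier safeguard."""
    lam = 0.0
    x = list(p)
    for _ in range(iters):
        # Safeguard: solve the subproblem with lam projected onto [-lam_max, lam_max].
        lam_bar = min(max(lam, -lam_max), lam_max)
        x = alm_subproblem(p, w, b, mu, lam_bar)
        # Multiplier update: lam <- lam + mu * c(x_k).
        lam = lam_bar + mu * (dot(w, x) - b)
    return x, lam


if __name__ == "__main__":
    x, lam = augmented_lagrangian(p=[3.0, 1.0], w=[1.0, 1.0], b=1.0)
    print(f"x=({x[0]:.6f}, {x[1]:.6f})  lambda={lam:.6f}  c(x)={x[0] + x[1] - 1.0:.2e}")
```

For this problem one can check that the multiplier error is multiplied by ${\displaystyle 1/(1+\mu \|\mathbf {w} \|^{2})}$ at every iteration, so with ${\displaystyle \mu =1}$ fixed and ${\displaystyle \mathbf {p} =(3,1)}$, ${\displaystyle \mathbf {w} =(1,1)}$, ${\displaystyle b=1}$, the iterates approach the exact solution ${\displaystyle \mathbf {x} ^{*}=(1.5,-0.5)}$, ${\displaystyle \lambda ^{*}=1.5}$ without any increase in ${\displaystyle \mu }$.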

The method can be extended to handle inequality constraints. For a discussion of practical improvements, see Nocedal & Wright (2006)[6] and Birgin & Martínez (2014).[5]

## Alternating direction method of multipliers

The alternating direction method of multipliers (ADMM) is a variant of the augmented Lagrangian scheme that uses partial updates for the primal variables. This method is often applied to solve problems such as

${\displaystyle \min _{x}f(x)+g(x).}$

This is equivalent to the constrained problem

${\displaystyle \min _{x,y}f(x)+g(y),\quad {\text{subject to}}\quad x=y.}$

Though this change may seem trivial, the problem can now be attacked using methods of constrained optimization (in particular, the augmented Lagrangian method), and the objective function is separable in x and y. Each augmented Lagrangian step requires minimizing over x and y jointly (a proximal problem in both variables at once); the ADMM technique instead solves this problem approximately by first minimizing over x with y fixed, and then over y with x fixed. Rather than iterating these partial minimizations until convergence (as in a block Gauss–Seidel scheme), the algorithm proceeds directly to updating the dual variable and then repeats the process. This is not equivalent to the exact minimization, but it can still be shown that the method converges to the right answer under suitable assumptions, for example when f and g are closed, proper, and convex and the unaugmented Lagrangian has a saddle point. Because of this approximation, the algorithm is distinct from the pure augmented Lagrangian method.
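The alternating pattern just described can be sketched in Python for an assumed instance of ${\displaystyle \min _{x}f(x)+g(x)}$ (not taken from the sources): ${\displaystyle f(x)={\tfrac {1}{2}}\|x-a\|^{2}}$ and ${\displaystyle g(y)=\kappa \|y\|_{1}}$, an ℓ1-regularized denoising problem whose exact solution is the elementwise soft-thresholding of ${\displaystyle a}$ at level ${\displaystyle \kappa }$. Both partial minimizations then have closed forms. The sketch uses the common "scaled" form of ADMM, with ${\displaystyle u=\lambda /\rho }$ for the multiplier ${\displaystyle \lambda }$ of the constraint ${\displaystyle x-y=0}$ and a fixed penalty ${\displaystyle \rho }$; the function names, ${\displaystyle \rho =1}$, and the iteration count are illustrative choices.

```python
import math


def soft_threshold(v, t):
    """Elementwise proximal operator of t*||.||_1."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def admm_l1_denoise(a, kappa, rho=1.0, iters=200):
    """Scaled-form ADMM for min 0.5*||x - a||^2 + kappa*||y||_1 s.t. x = y."""
    x = [0.0] * len(a)
    y = [0.0] * len(a)
    u = [0.0] * len(a)  # scaled dual variable u = lambda / rho
    for _ in range(iters):
        # x-step (y fixed): argmin_x 0.5*||x - a||^2 + (rho/2)*||x - y + u||^2
        x = [(ai + rho * (yi - ui)) / (1.0 + rho) for ai, yi, ui in zip(a, y, u)]
        # y-step (new x fixed): argmin_y kappa*||y||_1 + (rho/2)*||x - y + u||^2
        y = soft_threshold([xi + ui for xi, ui in zip(x, u)], kappa / rho)
        # Dual step: accumulate the constraint residual x - y.
        u = [ui + xi - yi for ui, xi, yi in zip(u, x, y)]
    return x, y


if __name__ == "__main__":
    a = [3.0, -0.5, 1.2]
    x, y = admm_l1_denoise(a, kappa=1.0)
    print("ADMM y:            ", [round(v, 6) for v in y])
    print("soft-threshold of a:", [round(v, 6) for v in soft_threshold(a, 1.0)])
```

Each pass performs one x-step and one y-step instead of a joint minimization, which is exactly the approximation that separates ADMM from the pure augmented Lagrangian method; comparing the result with the closed-form soft-threshold is a convenient sanity check.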

ADMM can be viewed as an application of the Douglas–Rachford splitting algorithm, and the Douglas–Rachford algorithm is in turn an instance of the proximal point algorithm; see Eckstein and Bertsekas (1992) for details.[7] Several modern software packages that solve basis pursuit and its variants use ADMM; such packages include YALL1[8] (2009), SpaRSA[9] (2009) and SALSA[10] (2009). There are also packages that use ADMM to solve more general problems, some of which can exploit multiple computing cores, such as SnapVX[11] (2015) and parADMM[12] (2016).

## Stochastic optimization

Stochastic optimization considers the problem of minimizing a loss function with access only to noisy samples of the function or of its gradient. The goal is to update the estimate of the optimal parameter (minimizer) with each new sample. ADMM was originally a batch method; however, with some modifications it can also be used for stochastic optimization. Since in the stochastic setting we only have access to noisy samples of the gradient, we use an inexact approximation of the Lagrangian:

${\displaystyle {\hat {\mathcal {L}}}_{\rho ,k}=f(x_{k})+\langle \nabla f(x_{k},\zeta _{k+1}),x\rangle +g(y)-z^{T}(Ax+By-c)+{\frac {\rho }{2}}\Vert Ax+By-c\Vert ^{2}+{\frac {\Vert x-x_{k}\Vert ^{2}}{2\eta _{k+1}}},}$

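where the problem being solved is ${\displaystyle \min _{x,y}f(x)+g(y)}$ subject to ${\displaystyle Ax+By=c}$, ${\displaystyle \nabla f(x_{k},\zeta _{k+1})}$ is a noisy gradient of ${\displaystyle f}$ at ${\displaystyle x_{k}}$ computed from the random sample ${\displaystyle \zeta _{k+1}}$, ${\displaystyle z}$ is the dual variable, ${\displaystyle \rho >0}$ is the penalty parameter, and ${\displaystyle \eta _{k+1}}$ is a time-varying step size.[13]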
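A minimal Python sketch of one way to turn this into an algorithm, under several assumptions not stated in the text: a scalar problem with ${\displaystyle f(x)=\mathbb {E} [{\tfrac {1}{2}}(x-\zeta )^{2}]}$ (so the sampled gradient is ${\displaystyle x-\zeta }$), ${\displaystyle g(y)=\kappa |y|}$, the splitting ${\displaystyle A=1}$, ${\displaystyle B=-1}$, ${\displaystyle c=0}$ (that is, the constraint ${\displaystyle x=y}$), the schedule ${\displaystyle \eta _{k+1}=\eta _{0}/{\sqrt {k+1}}}$, and the running average of the x iterates as the returned estimate. Each iteration minimizes ${\displaystyle {\hat {\mathcal {L}}}_{\rho ,k}}$ over x in closed form, then minimizes it over y, then takes a dual step consistent with the ${\displaystyle -z^{T}(Ax+By-c)}$ sign convention. The function names and parameter values are hypothetical.

```python
import math
import random


def soft_threshold(v, t):
    """Proximal operator of t*|.| for a scalar v."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def stochastic_admm(sample, grad, kappa, rho=1.0, eta0=1.0, iters=20000, seed=0):
    """Stochastic ADMM sketch for min E[f(x, zeta)] + kappa*|y| s.t. x - y = 0.

    sample(rng) draws zeta; grad(x, zeta) returns the sampled gradient at x.
    Returns (running average of the x iterates, last y iterate).
    """
    rng = random.Random(seed)
    x = y = z = 0.0
    x_sum = 0.0
    for k in range(iters):
        eta = eta0 / math.sqrt(k + 1)  # assumed step-size schedule eta_{k+1}
        g_k = grad(x, sample(rng))  # noisy gradient at the current x_k
        # x-step: zero the derivative of the linearized Lagrangian,
        #   g_k - z + rho*(x - y) + (x - x_k)/eta = 0, and solve for x.
        x = (x / eta - g_k + z + rho * y) / (rho + 1.0 / eta)
        # y-step: argmin_y kappa*|y| + z*y + (rho/2)*(x - y)^2
        y = soft_threshold(x - z / rho, kappa / rho)
        # Dual step for the -z*(x - y) term.
        z = z - rho * (x - y)
        x_sum += x
    return x_sum / iters, y


if __name__ == "__main__":
    x_bar, y_last = stochastic_admm(
        sample=lambda rng: rng.gauss(3.0, 0.5),
        grad=lambda x, zeta: x - zeta,
        kappa=1.0,
    )
    print(f"averaged x = {x_bar:.4f}, last y = {y_last:.4f}")
```

With ${\displaystyle \zeta }$ drawn from a normal distribution with mean 3 and ${\displaystyle \kappa =1}$, the exact minimizer is the soft-threshold of 3 at level 1, namely 2, so the averaged iterate should land near 2; the precise value depends on the random seed.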

ADMM is a popular method for large-scale online and distributed optimization[14] and has been employed in many applications.[15][16][17] ADMM is often applied to solve regularized problems, where the function optimization and regularization can be carried out locally and then coordinated globally via constraints. Regularized optimization problems are especially relevant in the high-dimensional regime, since regularization is a natural mechanism to overcome ill-posedness and to encourage parsimony in the optimal solution (e.g., sparsity and low rank). Because ADMM solves regularized problems efficiently, it has good potential for stochastic optimization in high dimensions.

## Software

Open source and non-free/commercial implementations of the augmented Lagrangian method:

• Accord.NET (C# implementation of augmented Lagrangian optimizer)
• ALGLIB (C# and C++ implementations of preconditioned augmented Lagrangian solver)
• PENNON (GPL 3, commercial license available)
• LANCELOT (free "internal use" license, paid commercial options)
• MINOS (also uses an augmented Lagrangian method for some types of problems).
• REASON (Apache 2.0 license). Available online.[18]
• ALGENCAN (Fortran implementation of augmented Lagrangian method with safeguards). Available online.[19]

## References

1. Hestenes, M. R. (1969). "Multiplier and gradient methods". Journal of Optimization Theory and Applications. 4 (5): 303–320. doi:10.1007/BF00927673. S2CID 121584579.
2. Powell, M. J. D. (1969). "A method for nonlinear constraints in minimization problems". In Fletcher, R. (ed.). Optimization. New York: Academic Press. pp. 283–298. ISBN 0-12-260650-7.
3. Bertsekas, Dimitri P. (1996) [1982]. Constrained optimization and Lagrange multiplier methods. Athena Scientific.
4. Andreani, R.; Birgin, E. G.; Martínez, J. M.; Schuverdt, M. L. (2007). "On Augmented Lagrangian Methods with General Lower-Level Constraints". SIAM Journal on Optimization. 18 (4): 1286–1309. doi:10.1137/060654797.
5. Birgin & Martínez (2014).
6. Nocedal & Wright (2006), chapter 17.
7. Eckstein, J.; Bertsekas, D. P. (1992). "On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators". Mathematical Programming. 55 (1–3): 293–318. CiteSeerX 10.1.1.141.6246. doi:10.1007/BF01581204. S2CID 15551627.
8. "YALL1: Your ALgorithms for L1". yall1.blogs.rice.edu.
9. "SpaRSA". www.lx.it.pt.
10. "(C)SALSA: A Solver for Convex Optimization Problems in Image Recovery". cascais.lx.it.pt.
11. "SnapVX". snap.stanford.edu.
12. "parADMM/engine". February 6, 2021, via GitHub.
13. Ouyang, H.; He, N.; Tran, L.; Gray, A. G. (2013). "Stochastic alternating direction method of multipliers". Proceedings of the 30th International Conference on Machine Learning: 80–88.
14. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. (2011). "Distributed optimization and statistical learning via the alternating direction method of multipliers". Foundations and Trends in Machine Learning. 3 (1): 1–122. CiteSeerX 10.1.1.360.1664. doi:10.1561/2200000016.
15. Wahlberg, B.; Boyd, S.; Annergren, M.; Wang, Y. (2012). "An ADMM algorithm for a class of total variation regularized estimation problems". arXiv:1203.1828 [stat.ML].
16. Esser, E.; Zhang, X.; Chan, T. (2010). "A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science". SIAM Journal on Imaging Sciences. 3 (4): 1015–1046. doi:10.1137/09076934X.
17. Mota, J. F. C.; Xavier, J. M. F.; Aguiar, P. M. Q.; Püschel, M. (2012). "Distributed ADMM for model predictive control and congestion control". 2012 IEEE 51st Annual Conference on Decision and Control (CDC): 5110–5115. doi:10.1109/CDC.2012.6426141. ISBN 978-1-4673-2066-5. S2CID 12128421.
18. "Bitbucket". bitbucket.org.
19. "TANGO Project". www.ime.usp.br.