# Coordinate descent

Coordinate descent is a non-derivative optimization algorithm. To find a local minimum of a function, one does line search along one coordinate direction at the current point in each iteration. One uses different coordinate directions cyclically throughout the procedure. On non-separable functions the algorithm may fail to find the optimum in a reasonable number of function evaluations. To improve the convergence an appropriate coordinate system can be gradually learned, such that new search coordinates obtained using PCA are as decorrelated as possible with respect to the objective function (see Adaptive coordinate descent for more details).

## Description

Coordinate descent is based on the idea that the minimization of a multivariable function $F(\mathbf{x})$ can be achieved by minimizing it along one direction at a time. Instead of varying descent direction according to gradient, one fixes descent direction at the outset. For instance, one chooses some basis as the search directions: $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n$. One cyclically iterates through each direction, one at a time, minimizing the objective function with respect to that coordinate direction. It follows that, if $\mathbf{x}^k$ is given, the $i$th coordinate of $\mathbf{x}^{k+1}$ is given by

$\mathbf{x}^{k+1}_i = \underset{y\in\mathbb R}{\operatorname{arg\,min}}\; f(x^{k+1}_1,...,x^{k+1}_{i-1},y,x^k_{i+1},...,x^k_n);$

Thus, one begins with an initial guess $\mathbf{x}^0$ for a local minimum of $F$, and get a sequence $\mathbf{x}^0, \mathbf{x}^1, \mathbf{x}^2, \dots$ iteratively.

By doing line search in each iteration, We automatically have

$F(\mathbf{x}^0)\ge F(\mathbf{x}^1)\ge F(\mathbf{x}^2)\ge \cdots,$

It can be shown that this sequence has similar convergence properties as steepest descent. No improvement after one cycle of line search along coordinate directions implies a stationary point is reached.

This process is illustrated below.

### Examples

Coordinate descent has problems with non-smooth functions. The following picture shows that coordinate descent iteration may get stuck at a non-stationary point if the level curves of a function are not smooth.

## Applications

Coordinate descent algorithms are used in machine learning, e.g. for training linear support vector machines[1] (see LIBLINEAR) and non-negative matrix factorization.[2]