= Smooth maximum =

In mathematics, a smooth maximum of an indexed family $x_1, \ldots, x_n$ of numbers is a smooth approximation to the maximum function $\max(x_1,\ldots,x_n),$ meaning a parametric family of functions $m_\alpha(x_1,\ldots,x_n)$ such that for every $\alpha$, the function $m_\alpha$ is smooth, and the family converges to the maximum function, $m_\alpha \to \max$, as $\alpha\to\infty$. The concept of smooth minimum is similarly defined. In many cases, a single family approximates both: the maximum as the parameter goes to positive infinity and the minimum as the parameter goes to negative infinity; in symbols, $m_\alpha \to \max$ as $\alpha \to \infty$ and $m_\alpha \to \min$ as $\alpha \to -\infty$. The term can also be used loosely for a specific smooth function that behaves similarly to a maximum, without necessarily being part of a parametrized family.

== Examples ==

=== Boltzmann operator ===

For large positive values of the parameter $\alpha$, the following formulation is a smooth approximation of the maximum function. For negative values of $\alpha$ that are large in absolute value, it approximates the minimum.

$\mathcal{S}_\alpha (x_1,\ldots,x_n) = \frac{\sum_{i=1}^n x_i e^{\alpha x_i}}{\sum_{i=1}^n e^{\alpha x_i}}$

$\mathcal{S}_\alpha$ has the following properties:
1. $\mathcal{S}_\alpha\to \max$ as $\alpha\to\infty$
2. $\mathcal{S}_0$ is the arithmetic mean of its inputs
3. $\mathcal{S}_\alpha\to \min$ as $\alpha\to -\infty$
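
The three properties above can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation; the function name `boltzmann` is chosen here for illustration, and the shift by the largest exponent is an added assumption to avoid overflow (it cancels between numerator and denominator, so the value is unchanged).

```python
import math

def boltzmann(xs, alpha):
    """Boltzmann operator: average of xs weighted by exp(alpha * x)."""
    # Subtract the largest exponent so math.exp never overflows.
    # The common factor cancels in the ratio below.
    shift = max(alpha * x for x in xs)
    weights = [math.exp(alpha * x - shift) for x in xs]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, xs)) / total

xs = [1.0, 2.0, 3.0]
print(boltzmann(xs, 0.0))    # arithmetic mean of xs
print(boltzmann(xs, 50.0))   # close to max(xs)
print(boltzmann(xs, -50.0))  # close to min(xs)
```

With $\alpha = 0$ all weights are equal, which is why the result reduces to the arithmetic mean.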

The gradient of $\mathcal{S}_{\alpha}$ is closely related to softmax and is given by

$\nabla_{x_i}\mathcal{S}_\alpha (x_1,\ldots,x_n) = \frac{e^{\alpha x_i}}{\sum_{j=1}^n e^{\alpha x_j}} [1 + \alpha(x_i - \mathcal{S}_\alpha (x_1,\ldots,x_n))].$

Because the gradient is available in closed form in terms of the softmax weights, the operator is convenient for optimization techniques that use gradient descent.
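
As a sanity check on the gradient formula, the sketch below compares it with a central finite difference. The helper names (`softmax_weights`, `boltzmann_grad`, `finite_difference_grad`) and the step size `h` are assumptions made for this illustration only.

```python
import math

def softmax_weights(xs, alpha):
    """Softmax of alpha * xs, shifted for numerical stability."""
    shift = max(alpha * x for x in xs)
    exps = [math.exp(alpha * x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def boltzmann(xs, alpha):
    """Boltzmann operator written as a softmax-weighted average."""
    return sum(w * x for w, x in zip(softmax_weights(xs, alpha), xs))

def boltzmann_grad(xs, alpha):
    """Closed-form gradient: softmax_i * (1 + alpha * (x_i - S_alpha))."""
    s = boltzmann(xs, alpha)
    weights = softmax_weights(xs, alpha)
    return [w * (1.0 + alpha * (x - s)) for w, x in zip(weights, xs)]

def finite_difference_grad(xs, alpha, h=1e-6):
    """Central-difference approximation, used only for comparison."""
    grad = []
    for i in range(len(xs)):
        up = list(xs)
        down = list(xs)
        up[i] += h
        down[i] -= h
        grad.append((boltzmann(up, alpha) - boltzmann(down, alpha)) / (2 * h))
    return grad

xs = [0.5, -1.0, 2.0]
print(boltzmann_grad(xs, 1.5))
print(finite_difference_grad(xs, 1.5))  # should agree to several digits
```

The components of the gradient sum to 1, since the softmax weights sum to 1 and their weighted average of $x_i - \mathcal{S}_\alpha$ is zero.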

This operator is sometimes called the Boltzmann operator, after the Boltzmann distribution.

=== LogSumExp ===

Another smooth maximum is LogSumExp:

$\mathrm{LSE}_\alpha(x_1, \ldots, x_n) = \frac{1}{\alpha} \log \sum_{i=1}^n \exp(\alpha x_i)$
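
Evaluating this formula directly can overflow when $\alpha x_i$ is large. A common workaround, sketched here under the assumption that $\alpha \neq 0$, is to factor out the largest exponent before summing; the function name `logsumexp` is illustrative.

```python
import math

def logsumexp(xs, alpha=1.0):
    """(1/alpha) * log(sum(exp(alpha * x))), assuming alpha != 0."""
    # Factor out the largest exponent m: log(sum(exp(a_i))) =
    # m + log(sum(exp(a_i - m))), and every a_i - m is <= 0.
    m = max(alpha * x for x in xs)
    return (m + math.log(sum(math.exp(alpha * x - m) for x in xs))) / alpha

print(logsumexp([1.0, 2.0, 3.0], alpha=100.0))  # close to max
print(logsumexp([1000.0, 1000.0]))              # 1000 + log(2), no overflow
```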

LogSumExp can also be normalized if the $x_i$ are all non-negative, yielding a function with domain $[0,\infty)^n$ and range $[0, \infty)$:
$g(x_1, \ldots, x_n) = \log \left( \sum_{i=1}^n \exp(x_i) - (n-1) \right)$

The $(n - 1)$ term corrects for the fact that $\exp(0) = 1$ by canceling out all but one zero exponential, and $\log 1 = 0$ if all $x_i$ are zero.
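
A small sketch of the normalized variant follows. The name `normalized_lse` and the explicit check for negative inputs are assumptions for illustration, and no overflow protection is attempted here.

```python
import math

def normalized_lse(xs):
    """log(sum(exp(x_i)) - (n - 1)) for non-negative inputs."""
    if any(x < 0 for x in xs):
        raise ValueError("all inputs must be non-negative")
    n = len(xs)
    return math.log(sum(math.exp(x) for x in xs) - (n - 1))

print(normalized_lse([0.0, 0.0, 0.0]))  # log(1) = 0
print(normalized_lse([5.0, 0.0, 0.0]))  # approximately 5, the maximum
```

When all but one input is zero, the zero inputs contribute exactly $n - 1$ to the sum, so the result equals the single nonzero input up to rounding.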

=== Mellowmax ===

The mellowmax operator is defined as follows:
$\mathrm{mm}_\alpha(x) = \frac{1}{\alpha} \log \left( \frac{1}{n} \sum_{i=1}^n \exp(\alpha x_i) \right)$
It is a non-expansive operator. As $\alpha \to \infty$, it acts like a maximum; as $\alpha \to 0$, like an arithmetic mean; and as $\alpha \to -\infty$, like a minimum. The operator can be viewed as a particular instantiation of the quasi-arithmetic mean. It can also be derived from information-theoretic principles as a way of regularizing policies with a cost function defined by KL divergence. It has previously been used in other areas, such as power engineering.
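
The limiting behaviour can be observed numerically. This is a minimal sketch assuming $\alpha \neq 0$; the function name `mellowmax` and the stabilizing shift are choices made for this example.

```python
import math

def mellowmax(xs, alpha):
    """(1/alpha) * log(mean(exp(alpha * x))), assuming alpha != 0."""
    n = len(xs)
    # Factor out the largest exponent to avoid overflow in math.exp.
    m = max(alpha * x for x in xs)
    mean_exp = sum(math.exp(alpha * x - m) for x in xs) / n
    return (m + math.log(mean_exp)) / alpha

xs = [1.0, 2.0, 3.0]
print(mellowmax(xs, 200.0))   # slightly below max(xs)
print(mellowmax(xs, 1e-6))    # close to the arithmetic mean
print(mellowmax(xs, -200.0))  # slightly above min(xs)
```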

==== Connection between LogSumExp and Mellowmax ====

LogSumExp and Mellowmax are the same function up to the additive constant $\frac{\log n}{\alpha}$; that is, $\mathrm{LSE}_\alpha = \mathrm{mm}_\alpha + \frac{\log n}{\alpha}$. For $\alpha > 0$, LogSumExp is never smaller than the true max: it exceeds the true max by at most $\frac{\log n}{\alpha}$, which is attained when all $n$ arguments are equal, and it equals the true max exactly when all but one argument are $-\infty$. Similarly, Mellowmax is never larger than the true max: it falls below the true max by at most $\frac{\log n}{\alpha}$, which is attained when all but one argument are $-\infty$, and it equals the true max exactly when all $n$ arguments are equal.
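
The relationship and the two-sided bounds can be verified on a small example. The sketch below evaluates both formulas directly, without overflow protection, so it assumes moderate inputs and $\alpha > 0$; the function names and sample values are illustrative.

```python
import math

def logsumexp(xs, alpha):
    """Direct evaluation of (1/alpha) * log(sum(exp(alpha * x)))."""
    return math.log(sum(math.exp(alpha * x) for x in xs)) / alpha

def mellowmax(xs, alpha):
    """Direct evaluation of (1/alpha) * log(mean(exp(alpha * x)))."""
    return math.log(sum(math.exp(alpha * x) for x in xs) / len(xs)) / alpha

xs = [0.3, -1.2, 2.5, 2.4]
alpha = 3.0
gap = math.log(len(xs)) / alpha  # the constant log(n) / alpha

lse = logsumexp(xs, alpha)
mm = mellowmax(xs, alpha)

print(lse - mm, gap)                        # the two values should match
print(max(xs) <= lse <= max(xs) + gap)      # LogSumExp overestimates
print(max(xs) - gap <= mm <= max(xs))       # Mellowmax underestimates
```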

=== p-Norm ===

Another smooth maximum is the p-norm:

$\| (x_1, \ldots, x_n) \|_p = \left( \sum_{i=1}^n |x_i|^p \right)^\frac{1}{p}$

which converges to $\| (x_1, \ldots, x_n) \|_\infty = \max_{1\leq i\leq n} |x_i|$ as $p \to \infty$. It therefore approximates the maximum itself when all $x_i$ are non-negative.

An advantage of the p-norm is that, for $p \geq 1$, it is a norm. As such it is absolutely homogeneous: $\| (\lambda x_1, \ldots, \lambda x_n) \|_p = |\lambda| \cdot \| (x_1, \ldots, x_n) \|_p$, and it satisfies the triangle inequality.
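
The convergence and the homogeneity property can be illustrated with a short sketch. The function name `p_norm` is hypothetical, and dividing by the largest magnitude before raising to the power $p$ is an added assumption to avoid overflow for large $p$.

```python
def p_norm(xs, p):
    """p-norm of xs, computed relative to the largest magnitude."""
    m = max(abs(x) for x in xs)
    if m == 0:
        return 0.0
    # Each ratio abs(x) / m lies in [0, 1], so the powers cannot overflow.
    return m * sum((abs(x) / m) ** p for x in xs) ** (1.0 / p)

xs = [1.0, 2.0, 3.0]
print(p_norm(xs, 2))     # Euclidean norm
print(p_norm(xs, 100))   # close to max(|x_i|)
print(p_norm([-2.0 * x for x in xs], 4), 2.0 * p_norm(xs, 4))  # homogeneity
```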

=== Smooth maximum unit ===

The following binary operator is called the Smooth Maximum Unit (SMU):
$\begin{align}
\textstyle\max_\varepsilon(a, b)
&= \frac{a + b + |a - b|_\varepsilon}{2} \\
&= \frac{a + b + \sqrt{(a - b)^2 + \varepsilon}}{2}
\end{align}$
where $\varepsilon \geq 0$ is a parameter. As $\varepsilon \to 0$, $|\cdot|_\varepsilon \to |\cdot|$ and thus $\textstyle\max_\varepsilon \to \max$.
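
A direct transcription of the second form of the definition is sketched below; the function name `smooth_max` is chosen for illustration.

```python
import math

def smooth_max(a, b, eps):
    """Smooth maximum unit: (a + b + sqrt((a - b)^2 + eps)) / 2, eps >= 0."""
    return (a + b + math.sqrt((a - b) ** 2 + eps)) / 2

print(smooth_max(1.0, 3.0, 0.0))    # eps = 0 recovers the exact maximum
print(smooth_max(1.0, 3.0, 1e-4))   # slightly above the maximum
print(smooth_max(2.0, 2.0, 1e-4))   # at a == b the excess is sqrt(eps) / 2
```

Since $\sqrt{(a-b)^2 + \varepsilon} \geq |a - b|$, the result is never below $\max(a, b)$, and the excess is largest where $a = b$.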

== See also ==
- LogSumExp
- Softmax function
- Generalized mean
