= Huber loss =

In statistics, the Huber loss is a loss function used in robust regression that is less sensitive to outliers in data than the squared error loss. A variant for classification is also sometimes used.

==Definition==

The Huber loss function describes the penalty incurred by an estimation procedure $f$. Huber (1964) defines the loss function piecewise by
$L_\delta (a) = \begin{cases}
 \frac{1}{2}{a^2} & \text{for } |a| \le \delta, \\[4pt]
 \delta \cdot \left(|a| - \frac{1}{2}\delta\right) & \text{otherwise.}
\end{cases}$

This function is quadratic for small values of $a$ and linear for large values, with the two pieces agreeing in both value and slope at the two points where $|a| = \delta$. The variable $a$ often refers to the residual, that is, the difference between the observed and predicted values, $a = y - f(x)$, so the definition can be expanded to

$L_\delta(y, f(x)) = \begin{cases}
 \frac{1}{2} {\left(y - f(x)\right)}^2 & \text{for } \left|y - f(x)\right| \le \delta, \\[4pt]
 \delta \cdot \left(\left|y - f(x)\right| - \frac{1}{2}\delta\right) & \text{otherwise.}
\end{cases}$
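
The piecewise definition can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function names `huber_loss` and `total_huber_loss` and the default `delta = 1.0` are choices made here, not part of the definition.

```python
def huber_loss(residual: float, delta: float = 1.0) -> float:
    """Huber loss of a single residual a = y - f(x), assuming delta > 0."""
    abs_r = abs(residual)
    if abs_r <= delta:
        # Quadratic branch for small residuals.
        return 0.5 * residual ** 2
    # Linear branch for large residuals.
    return delta * (abs_r - 0.5 * delta)


def total_huber_loss(y_true, y_pred, delta: float = 1.0) -> float:
    """Sum of Huber losses over paired observations and predictions."""
    return sum(huber_loss(y - p, delta) for y, p in zip(y_true, y_pred))
```

With `delta = 1.0`, a residual of 0.5 falls in the quadratic branch and gives 0.125, while a residual of 3.0 falls in the linear branch and gives 2.5.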

The Huber loss is the convolution of the absolute value function with the rectangular function, scaled and translated. It thus "smooths out" the corner of the absolute value function at the origin.
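
One way to make the scaling and translation explicit is to take the rectangular function to be $\frac{1}{2\delta}$ on $[-\delta, \delta]$ and zero elsewhere (a normalization assumed here for illustration). The convolution then evaluates to

$\frac{1}{2\delta}\int_{a-\delta}^{a+\delta} |t|\,dt = \begin{cases}
 \frac{a^2}{2\delta} + \frac{\delta}{2} & \text{for } |a| \le \delta, \\[4pt]
 |a| & \text{otherwise,}
\end{cases}$

so that, under this normalization,

$L_\delta(a) = \delta \cdot \left(\frac{1}{2\delta}\int_{a-\delta}^{a+\delta} |t|\,dt - \frac{\delta}{2}\right).$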

==Motivation==
Two very commonly used loss functions are the squared loss, $L(a) = a^2$, and the absolute loss, $L(a)=|a|$. The squared loss results in an arithmetic mean-unbiased estimator, and the absolute loss results in a median-unbiased estimator (in the one-dimensional case; a geometric median-unbiased estimator in the multi-dimensional case). The squared loss has the disadvantage that it tends to be dominated by outliers: when summing over a set of $a$'s (as in $\sum_{i=1}^n L(a_i)$), the sample mean is influenced too much by a few particularly large $a$-values when the distribution is heavy-tailed. In terms of estimation theory, the asymptotic relative efficiency of the mean is poor for heavy-tailed distributions.

As defined above, the Huber loss function is strongly convex in a uniform neighborhood of its minimum $a=0$; at the boundary of this uniform neighborhood, the Huber loss function has a differentiable extension to an affine function at points $a=-\delta$ and $a = \delta$. These properties allow it to combine much of the sensitivity of the mean-unbiased, minimum-variance estimator of the mean (using the quadratic loss function) and the robustness of the median-unbiased estimator (using the absolute value function).
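
The trade-off can be illustrated by estimating a location parameter from a small sample containing one outlier. The sketch below is only an illustration under stated assumptions: the sample `data`, the choice `delta = 1.0`, and the helper name `huber_location` are hypothetical, and bisection is just one simple way to find the minimizer of the summed Huber loss.

```python
from statistics import mean, median


def huber_location(values, delta: float = 1.0, tol: float = 1e-10) -> float:
    """Location mu minimizing sum_i L_delta(x_i - mu), found by bisection."""

    def score(mu: float) -> float:
        # The derivative of the Huber loss is the residual clipped to
        # [-delta, delta]; a minimizer is where the clipped residuals sum to zero.
        return sum(max(-delta, min(delta, x - mu)) for x in values)

    lo, hi = min(values), max(values)
    # score is non-increasing in mu, non-negative at lo and non-positive at hi.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


data = [0.0, 1.0, 1.5, 2.0, 4.0, 50.0]  # hypothetical sample with one outlier
estimates = {
    "mean": mean(data),
    "median": median(data),
    "huber": huber_location(data, delta=1.0),
}
```

For this sample the mean is pulled to 9.75 by the single outlier, while the Huber estimate (about 1.83) stays close to the median of 1.75.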

==Pseudo-Huber loss function==

The Pseudo-Huber loss function can be used as a smooth approximation of the Huber loss function. It combines the best properties of the L2 squared loss and the L1 absolute loss by being strongly convex close to the target/minimum and less steep for extreme values. The parameter $\delta$ controls both the scale at which the function transitions from L2-like behavior near the minimum to L1-like behavior for extreme values and the steepness at extreme values. The Pseudo-Huber loss function has continuous derivatives of all orders. It is defined as

$L_\delta (a) = \delta^2\left(\sqrt{1+(a/\delta)^2}-1\right).$

As such, this function approximates $a^2/2$ for small values of $a$, and approximates a straight line with slope $\delta$ for large values of $a$.

While the above is the most common form, other smooth approximations of the Huber loss function also exist.
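
The limiting behavior of the form defined above can be checked numerically. The following is a minimal sketch; the function names `pseudo_huber_loss` and `pseudo_huber_slope` are chosen here for illustration, and the slope is the derivative of the defining formula worked out by hand.

```python
import math


def pseudo_huber_loss(a: float, delta: float = 1.0) -> float:
    """Pseudo-Huber loss delta^2 * (sqrt(1 + (a/delta)^2) - 1), assuming delta > 0."""
    return delta ** 2 * (math.sqrt(1.0 + (a / delta) ** 2) - 1.0)


def pseudo_huber_slope(a: float, delta: float = 1.0) -> float:
    """Derivative of the Pseudo-Huber loss with respect to a."""
    return a / math.sqrt(1.0 + (a / delta) ** 2)
```

For small `a` the loss is close to `a**2 / 2`, and for large positive `a` the slope approaches `delta`, in line with the approximations stated above.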

==Variant for classification==
For classification purposes, a variant of the Huber loss called modified Huber is sometimes used. Given a prediction $f(x)$ (a real-valued classifier score) and a true binary class label $y \in \{+1, -1\}$, the modified Huber loss is defined as

$L(y, f(x)) = \begin{cases}
 \max(0, 1 - y \, f(x))^2 & \text{for }\, \, y \, f(x) > -1, \\[4pt]
 -4y \, f(x) & \text{otherwise.}
\end{cases}$

The term $\max(0, 1 - y \, f(x))$ is the hinge loss used by support vector machines; the quadratically smoothed hinge loss is a generalization of $L$.
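
The two branches of the modified Huber loss can be sketched as follows. This is a minimal illustration; the function name `modified_huber_loss` and the argument name `score` (standing for $f(x)$) are chosen here and are not taken from any particular library.

```python
def modified_huber_loss(y: int, score: float) -> float:
    """Modified Huber loss for a label y in {+1, -1} and a real-valued score f(x)."""
    margin = y * score
    if margin > -1:
        # Squared hinge branch; zero once the margin reaches 1.
        return max(0.0, 1.0 - margin) ** 2
    # Linear branch for badly misclassified points.
    return -4.0 * margin
```

The two branches meet at a margin of $-1$, where both give a loss of 4, so the loss is continuous there.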

==Applications==
The Huber loss function is used in robust statistics, M-estimation and additive modelling.

==See also==
- Winsorizing
- Robust regression
- M-estimator
- Visual comparison of different M-estimators
