Huber loss function

From Wikipedia, the free encyclopedia

In statistical theory, the Huber loss function is a function used in robust estimation that allows the construction of estimates in which the effect of outliers is reduced, while non-outliers are treated in a more standard way.


The Huber loss function describes the penalty incurred by an estimation procedure. Huber (1964[1]) defines the loss function piecewise by

 L_\delta (a) = \begin{cases} \frac{1}{2} a^2 & \text{for } |a| \le \delta, \\ \delta \left( |a| - \frac{1}{2}\delta \right) & \text{otherwise.} \end{cases}

This function is quadratic for small values of a and linear for large values, with equal values and slopes of the two sections at the two points where |a| = δ. In use, the variable a often refers to the residuals, that is, the difference between the observed and predicted values, i.e.  a = y - \hat{y} .
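The piecewise definition above translates directly into code. The following NumPy sketch is illustrative (the function name and the example residuals are not from the source):

```python
import numpy as np

def huber_loss(a, delta=1.0):
    """Huber loss: quadratic for |a| <= delta, linear beyond that point."""
    a = np.asarray(a, dtype=float)
    quadratic = 0.5 * a**2
    linear = delta * (np.abs(a) - 0.5 * delta)
    return np.where(np.abs(a) <= delta, quadratic, linear)

# Residuals a = y - y_hat; the last one is an "outlier" residual
y = np.array([1.0, 2.0, 3.0, 10.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.0])
print(huber_loss(y - y_hat, delta=1.0))
```

Note how the two branches agree in value and slope at |a| = δ: at a = δ the quadratic branch gives δ²/2 and the linear branch gives δ(δ − δ/2) = δ²/2.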


For estimating parameters, it is desirable for a loss function to have the following properties (for all values of a of the parameter space):

  1. It is greater than or equal to the 0-1 loss function (which is defined as L(a)=0 if a=0 and L(a)=1 otherwise).
  2. It is continuous (or lower semicontinuous).

Two very commonly used loss functions are the squared loss, L(a) = a^2, and the absolute loss, L(a) = |a|. The absolute loss is not differentiable at exactly one point, a = 0, where it is subdifferentiable with its convex subdifferential equal to the interval [−1, +1]; the absolute-value loss function results in a median-unbiased estimator, which can be evaluated for particular data sets by linear programming. The squared loss has the disadvantage that it tends to be dominated by outliers: when summing over a set of a's (as in \sum_{i=1}^n L(a_i) ), the sample mean is influenced too much by a few particularly large a-values when the distribution is heavy-tailed. In terms of estimation theory, the asymptotic relative efficiency of the mean is poor for heavy-tailed distributions.
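The contrast between the two losses can be seen numerically: the squared loss is minimized by the sample mean, the absolute loss by the sample median. A small illustration (the data are made up for the example):

```python
import numpy as np

# Squared loss is minimized by the mean, absolute loss by the median.
# A single large outlier drags the mean far more than the median.
data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
print(np.mean(data))    # 22.0 -- pulled strongly toward the outlier
print(np.median(data))  # 3.0  -- barely affected
```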

As defined above, the Huber loss function is convex in a uniform neighborhood of its minimum a = 0; at the boundary of this neighborhood, at the points a = −δ and a = δ, it has a differentiable extension to an affine function. These properties allow it to combine much of the sensitivity of the mean-unbiased, minimum-variance estimator of the mean (using the quadratic loss function) with the robustness of the median-unbiased estimator (using the absolute-value loss function).

The log-cosh loss function, defined as  L(a) = \log(\cosh(a)) , behaves similarly to the Huber loss function.
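The similarity can be checked numerically: log(cosh(a)) is approximately a²/2 for small a and approximately |a| − log 2 (linear with slope 1) for large a, mirroring the quadratic/linear behavior of the Huber loss. A quick check:

```python
import math

# For small a, log(cosh(a)) is close to the quadratic a^2 / 2 ...
a = 0.1
print(math.log(math.cosh(a)), a**2 / 2)

# ... and for large a it is close to |a| - log(2), i.e. linear with slope 1.
a = 10.0
print(math.log(math.cosh(a)), abs(a) - math.log(2))
```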

Pseudo-Huber loss function

The Pseudo-Huber loss function can be used as a smooth approximation of the Huber loss function; it ensures that derivatives of all orders are continuous. It is defined as[citation needed]

L_\delta (a) = \delta^2(\sqrt{1+(a/\delta)^2}-1).

As such, this function approximates a^2/2 for small values of a and approximates a straight line with slope \delta for large values of a.
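The defining formula above is a one-liner in code. A sketch, with numerical checks of both limiting behaviors (the function name is illustrative):

```python
import numpy as np

def pseudo_huber(a, delta=1.0):
    """Pseudo-Huber loss: smooth everywhere, with all derivatives continuous."""
    a = np.asarray(a, dtype=float)
    return delta**2 * (np.sqrt(1.0 + (a / delta)**2) - 1.0)

# Small a: close to the quadratic a^2 / 2
print(pseudo_huber(0.01), 0.5 * 0.01**2)

# Large a: the slope approaches delta (here delta = 1)
print(pseudo_huber(101.0) - pseudo_huber(100.0))
```

Unlike the Huber loss, which is only once differentiable at |a| = δ, this function is infinitely differentiable everywhere, which can matter for second-order optimization methods.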


The Huber loss function is used in robust statistics, M-estimation and additive modelling.[2]

References


  1. ^ Huber, Peter J. (1964), "Robust Estimation of a Location Parameter", The Annals of Mathematical Statistics, Vol. 35, No. 1, 73–101
  2. ^ Friedman, J. H. (2001), "Greedy Function Approximation: A Gradient Boosting Machine", The Annals of Statistics, Vol. 29, No. 5 (Oct. 2001), 1189–1232