Hellinger distance


In probability and statistics, the Hellinger distance is used to quantify the similarity between two probability distributions. It is a type of f-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909.[1][2]


Measure theory

To define the Hellinger distance in terms of measure theory, let P and Q denote two probability measures that are absolutely continuous with respect to a third probability measure λ. The square of the Hellinger distance between P and Q is defined as the quantity

H^2(P,Q) = \frac{1}{2}\displaystyle \int \left(\sqrt{\frac{dP}{d\lambda}} - \sqrt{\frac{dQ}{d\lambda}}\right)^2 d\lambda.

Here, dP/dλ and dQ/dλ are the Radon–Nikodym derivatives of P and Q, respectively. This definition does not depend on λ, so the Hellinger distance between P and Q does not change if λ is replaced with a different probability measure with respect to which both P and Q are absolutely continuous. For compactness, the above formula is often written as

H^2(P,Q) = \frac{1}{2}\int \left(\sqrt{dP} - \sqrt{dQ}\right)^2.

Probability theory using Lebesgue measure

To define the Hellinger distance in terms of elementary probability theory, we take λ to be Lebesgue measure, so that dP/dλ and dQ/dλ are simply probability density functions. If we denote the densities as f and g, respectively, the squared Hellinger distance can be expressed as a standard calculus integral

\frac{1}{2}\int \left(\sqrt{f(x)} - \sqrt{g(x)}\right)^2 dx = 1 - \int \sqrt{f(x) g(x)} \, dx,

where the second form can be obtained by expanding the square and using the fact that the integral of a probability density over its domain must be one.
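
The expansion described above can be written out step by step. This is only a worked version of the sentence, under the stated assumption that f and g each integrate to one over the common domain:

```latex
\frac{1}{2}\int \left(\sqrt{f(x)} - \sqrt{g(x)}\right)^2 dx
  = \frac{1}{2}\int f(x)\,dx + \frac{1}{2}\int g(x)\,dx - \int \sqrt{f(x)\,g(x)}\,dx
  = \frac{1}{2} + \frac{1}{2} - \int \sqrt{f(x)\,g(x)}\,dx
  = 1 - \int \sqrt{f(x)\,g(x)}\,dx .
```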

The Hellinger distance H(P,Q) satisfies the property (derivable from the Cauchy–Schwarz inequality)

0\le H(P,Q) \le 1.
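
One way to see the bound, sketched here for densities f and g with respect to Lebesgue measure, is to apply the Cauchy–Schwarz inequality to the functions \sqrt{f} and \sqrt{g}:

```latex
0 \le \int \sqrt{f(x)\,g(x)}\,dx
  \le \left(\int f(x)\,dx\right)^{1/2} \left(\int g(x)\,dx\right)^{1/2} = 1,
\qquad\text{hence}\qquad
0 \le H^2(P,Q) = 1 - \int \sqrt{f(x)\,g(x)}\,dx \le 1 .
```

Taking square roots gives the stated range for H(P,Q).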

Discrete distributions

For two discrete probability distributions P=(p_1, \ldots, p_k) and Q=(q_1, \ldots, q_k), their Hellinger distance is defined as

  H(P, Q) = \frac{1}{\sqrt{2}} \; \sqrt{\sum_{i=1}^{k} (\sqrt{p_i} - \sqrt{q_i})^2},

which is directly related to the Euclidean norm of the difference of the square root vectors, i.e.

H(P, Q) = \frac{1}{\sqrt{2}} \; \bigl\|\sqrt{P} - \sqrt{Q} \bigr\|_2 .
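
As an illustration, the discrete formula translates almost directly into code. The following is a minimal sketch using only the Python standard library; the function name `hellinger` and the example vectors are hypothetical, and the inputs are assumed to be valid probability vectors of equal length.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as sequences."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # Squared Euclidean norm of the difference of the square-root vectors.
    s = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(s) / math.sqrt(2)

# Hypothetical example distributions on three outcomes.
p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]

print(hellinger(p, p))                    # identical distributions
print(hellinger(p, q))                    # partially overlapping supports
print(hellinger([1.0, 0.0], [0.0, 1.0]))  # disjoint supports
```

Identical inputs give distance zero, and distributions with disjoint supports reach the maximum value of one.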

Connection with the statistical distance

The Hellinger distance H(P,Q) and the total variation distance (or statistical distance) \delta(P,Q) are related as follows:[3]

H^2(P,Q) \leq \delta(P,Q) \leq \sqrt 2 H(P,Q)\,.

These inequalities follow immediately from the inequalities between the 1-norm and the 2-norm.
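
The two inequalities can be spot-checked numerically. The sketch below is a sanity check rather than a proof: it draws random discrete distributions and counts violations. All helper names are hypothetical, and the total variation distance is assumed to be normalised as half the 1-norm of P − Q.

```python
import math
import random

def hellinger(p, q):
    # Discrete Hellinger distance with the 1/2 factor inside the square root.
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2)

def total_variation(p, q):
    # Assumed normalisation: half the 1-norm of the difference.
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def random_distribution(k, rng):
    # Normalise k uniform random weights so that they sum to one.
    w = [rng.random() for _ in range(k)]
    total = sum(w)
    return [x / total for x in w]

rng = random.Random(0)
eps = 1e-12  # tolerance for floating-point rounding
violations = 0
for _ in range(1000):
    p = random_distribution(5, rng)
    q = random_distribution(5, rng)
    h, d = hellinger(p, q), total_variation(p, q)
    if not (h * h <= d + eps and d <= math.sqrt(2) * h + eps):
        violations += 1

print(violations)
```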


The maximum distance 1 is achieved when P assigns probability zero to every set to which Q assigns a positive probability, and vice versa.

Sometimes the factor 1/2 in front of the integral is omitted, in which case the Hellinger distance ranges from zero to the square root of two.

The Hellinger distance is related to the Bhattacharyya coefficient BC(P,Q), in terms of which it can be written as

H(P,Q) = \sqrt{1 - BC(P,Q)}.
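
For discrete distributions, this relation can be checked against the direct definition. The sketch below assumes the usual discrete form of the Bhattacharyya coefficient, the sum of \sqrt{p_i q_i} over all outcomes, which this article does not define explicitly. The function names and example vectors are hypothetical.

```python
import math

def bhattacharyya_coefficient(p, q):
    # Assumed discrete form: sum of sqrt(p_i * q_i).
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def hellinger_from_bc(p, q):
    # Clamp at zero because rounding can make 1 - BC slightly negative
    # when the two distributions are nearly identical.
    return math.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(p, q)))

def hellinger_direct(p, q):
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2)

# Hypothetical example distributions.
p = [0.1, 0.2, 0.7]
q = [0.3, 0.3, 0.4]

print(hellinger_from_bc(p, q))
print(hellinger_direct(p, q))
```

The two printed values should agree up to floating-point rounding.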

Hellinger distances are used in the theory of sequential and asymptotic statistics.[4][5]


The squared Hellinger distance between two normal distributions P \sim \mathcal{N}(\mu_1,\sigma_1^2) and Q \sim \mathcal{N}(\mu_2,\sigma_2^2) is:

  H^2(P, Q) = 1 - \sqrt{\frac{2\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}} \,  e^{-\frac{1}{4}\frac{(\mu_1-\mu_2)^2}{\sigma_1^2+\sigma_2^2}}.
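
The closed form for normal distributions can be compared with a direct numerical evaluation of 1 - \int \sqrt{f(x) g(x)} \, dx. The sketch below uses a simple midpoint rule. The function names, the parameter values, the integration window of twelve standard deviations, and the number of steps are all illustrative assumptions.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def h2_normal_closed_form(mu1, s1, mu2, s2):
    # Closed form for the squared Hellinger distance between two normals.
    var_sum = s1 ** 2 + s2 ** 2
    return 1 - math.sqrt(2 * s1 * s2 / var_sum) * math.exp(
        -0.25 * (mu1 - mu2) ** 2 / var_sum
    )

def h2_normal_numeric(mu1, s1, mu2, s2, n=20000):
    # Midpoint rule on a window wide enough to hold nearly all the mass.
    lo = min(mu1 - 12 * s1, mu2 - 12 * s2)
    hi = max(mu1 + 12 * s1, mu2 + 12 * s2)
    dx = (hi - lo) / n
    bc = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        bc += math.sqrt(normal_pdf(x, mu1, s1) * normal_pdf(x, mu2, s2))
    return 1 - bc * dx

closed = h2_normal_closed_form(0.0, 1.0, 1.0, 2.0)
numeric = h2_normal_numeric(0.0, 1.0, 1.0, 2.0)
print(closed, numeric)
```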

The squared Hellinger distance between two exponential distributions P \sim \mathrm{Exp}(\alpha) and Q \sim \mathrm{Exp}(\beta) is:

  H^2(P, Q) = 1 - \frac{2 \sqrt{\alpha \beta}}{\alpha + \beta}.
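
This expression can be derived in one line from the density form of the squared distance. The derivation assumes that \alpha and \beta are rate parameters, so that the densities are \alpha e^{-\alpha x} and \beta e^{-\beta x} on x \ge 0:

```latex
\int_0^\infty \sqrt{\alpha e^{-\alpha x}\,\beta e^{-\beta x}}\,dx
  = \sqrt{\alpha\beta} \int_0^\infty e^{-\frac{(\alpha+\beta)x}{2}}\,dx
  = \frac{2\sqrt{\alpha\beta}}{\alpha+\beta},
\qquad\text{so}\qquad
H^2(P,Q) = 1 - \frac{2\sqrt{\alpha\beta}}{\alpha+\beta}.
```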

The squared Hellinger distance between two Weibull distributions P \sim \mathrm{W}(k,\alpha) and Q \sim \mathrm{W}(k,\beta) (where k is a common shape parameter and \alpha, \beta are the respective scale parameters) is:

  H^2(P, Q) = 1 - \frac{2 (\alpha \beta)^{k/2}}{\alpha^k + \beta^k}.
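
The Weibull formula can be spot-checked numerically in the same way. The sketch below assumes the standard scale parametrisation of the density, (k/\alpha)(x/\alpha)^{k-1} e^{-(x/\alpha)^k} for x > 0. It uses the illustrative shape k = 2 so that the integrand is smooth at the origin. The function names, the integration cutoff, and the number of steps are assumptions.

```python
import math

def weibull_pdf(x, k, scale):
    # Assumed parametrisation: shape k, scale parameter `scale`, x > 0.
    r = x / scale
    return (k / scale) * r ** (k - 1) * math.exp(-(r ** k))

def h2_weibull_closed_form(k, a, b):
    return 1 - 2 * (a * b) ** (k / 2) / (a ** k + b ** k)

def h2_weibull_numeric(k, a, b, n=20000):
    # Midpoint rule on [0, 10 * max scale]; the tail beyond is negligible for k = 2.
    hi = 10 * max(a, b)
    dx = hi / n
    bc = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        bc += math.sqrt(weibull_pdf(x, k, a) * weibull_pdf(x, k, b))
    return 1 - bc * dx

closed = h2_weibull_closed_form(2, 1.0, 2.0)
numeric = h2_weibull_numeric(2, 1.0, 2.0)
print(closed, numeric)
```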

The squared Hellinger distance between two Poisson distributions with rate parameters \alpha and \beta, so that P \sim \mathrm{Poisson}(\alpha) and Q \sim \mathrm{Poisson}(\beta), is:

  H^2(P,Q) = 1-e^{-\frac{1}{2}(\sqrt{\alpha} - \sqrt{\beta})^2}.
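
For the Poisson case, the closed form can be compared with a truncated version of the discrete sum 1 - \sum_k \sqrt{p_k q_k}. The sketch below is illustrative only. The function names, the rates, and the truncation point are assumptions, and the truncation is adequate only when both rates are small relative to the number of terms.

```python
import math

def poisson_pmf(k, rate):
    # Computed in log space to avoid overflow in rate**k and k!.
    return math.exp(-rate + k * math.log(rate) - math.lgamma(k + 1))

def h2_poisson_closed_form(a, b):
    return 1 - math.exp(-0.5 * (math.sqrt(a) - math.sqrt(b)) ** 2)

def h2_poisson_sum(a, b, terms=200):
    # Truncated sum of sqrt(p_k * q_k) over k = 0, ..., terms - 1.
    bc = sum(
        math.sqrt(poisson_pmf(k, a) * poisson_pmf(k, b)) for k in range(terms)
    )
    return 1 - bc

closed = h2_poisson_closed_form(2.0, 5.0)
summed = h2_poisson_sum(2.0, 5.0)
print(closed, summed)
```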

Notes


  1. Nikulin, M.S. (2001), "Hellinger distance", in Hazewinkel, Michiel, Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4
  2. Hellinger, Ernst (1909), "Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen", Journal für die reine und angewandte Mathematik (in German) 136: 210–271, JFM 40.0393.01
  3. Harsha's lecture notes on communication complexity
  4. Torgerson, Erik (1991). Comparison of Statistical Experiments, volume 36 of Encyclopedia of Mathematics. Cambridge University Press.
  5. Liese, Friedrich and Miescke, Klaus-J. (2008). Statistical Decision Theory: Estimation, Testing, and Selection. Springer. ISBN 0-387-73193-8.


  • Yang, Grace Lo; Le Cam, Lucien M. (2000). Asymptotics in Statistics: Some Basic Concepts. Berlin: Springer. ISBN 0-387-95036-2. 
  • Vaart, A. W. van der. Asymptotic Statistics (Cambridge Series in Statistical and Probabilistic Mathematics). Cambridge, UK: Cambridge University Press. ISBN 0-521-78450-6. 
  • Pollard, David E. (2002). A user's guide to measure theoretic probability. Cambridge, UK: Cambridge University Press. ISBN 0-521-00289-3.