# Hellinger distance

In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions. It is a type of f-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909.

## Definition

### Measure theory

To define the Hellinger distance in terms of measure theory, let P and Q denote two probability measures that are absolutely continuous with respect to a third probability measure λ. The square of the Hellinger distance between P and Q is defined as the quantity

$H^{2}(P,Q)={\frac {1}{2}}\displaystyle \int \left({\sqrt {\frac {dP}{d\lambda }}}-{\sqrt {\frac {dQ}{d\lambda }}}\right)^{2}d\lambda .$

Here, $dP/d\lambda$ and $dQ/d\lambda$ are the Radon–Nikodym derivatives of P and Q, respectively. This definition does not depend on λ, so the Hellinger distance between P and Q does not change if λ is replaced with a different probability measure with respect to which both P and Q are absolutely continuous. For compactness, the above formula is often written as

$H^{2}(P,Q)={\frac {1}{2}}\int \left({\sqrt {dP}}-{\sqrt {dQ}}\right)^{2}.$

### Probability theory using Lebesgue measure

To define the Hellinger distance in terms of elementary probability theory, we take λ to be the Lebesgue measure, so that $dP/d\lambda$ and $dQ/d\lambda$ are simply probability density functions. If we denote the densities by f and g, respectively, the squared Hellinger distance can be expressed as a standard calculus integral

$H^{2}(P,Q)={\frac {1}{2}}\int \left({\sqrt {f(x)}}-{\sqrt {g(x)}}\right)^{2}\,dx=1-\int {\sqrt {f(x)g(x)}}\,dx,$

where the second form can be obtained by expanding the square and using the fact that the integral of a probability density over its domain equals 1.
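This integral form can be checked numerically. The following sketch (using NumPy; the two unit-variance normal densities are arbitrary examples of my choosing) evaluates both expressions on a fine grid:

```python
import numpy as np

# Two example densities on a common grid: normal pdfs with unit variance.
x = np.linspace(-10.0, 10.0, 100_001)
dx = x[1] - x[0]

def normal_pdf(x, mu, sigma):
    # Standard normal density formula
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

f = normal_pdf(x, 0.0, 1.0)
g = normal_pdf(x, 1.0, 1.0)

# Squared Hellinger distance, both forms from the text:
#   H^2 = (1/2) * integral of (sqrt(f) - sqrt(g))^2 dx
#   H^2 = 1 - integral of sqrt(f * g) dx
h2_direct = 0.5 * np.sum((np.sqrt(f) - np.sqrt(g)) ** 2) * dx
h2_bc = 1.0 - np.sum(np.sqrt(f * g)) * dx

print(h2_direct, h2_bc)  # the two forms agree up to discretization error
```

For this particular pair the exact value is $1-e^{-1/8}\approx 0.1175$ (a special case of the closed form for normal distributions given later in this article), which the grid approximation reproduces closely.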

The Hellinger distance $H(P,Q)$ satisfies the property (derivable from the Cauchy–Schwarz inequality)

$0\leq H(P,Q)\leq 1.$

### Discrete distributions

For two discrete probability distributions $P=(p_{1},\ldots ,p_{k})$ and $Q=(q_{1},\ldots ,q_{k})$ , their Hellinger distance is defined as

$H(P,Q)={\frac {1}{\sqrt {2}}}\;{\sqrt {\sum _{i=1}^{k}({\sqrt {p_{i}}}-{\sqrt {q_{i}}})^{2}}},$ which is directly related to the Euclidean norm of the difference of the square root vectors, i.e.

$H(P,Q)={\frac {1}{\sqrt {2}}}\;{\bigl \|}{\sqrt {P}}-{\sqrt {Q}}{\bigr \|}_{2}.$

Also, $1-H^{2}(P,Q)=\sum _{i=1}^{k}{\sqrt {p_{i}q_{i}}}.$

## Connection with the statistical distance

The Hellinger distance $H(P,Q)$ and the total variation distance (or statistical distance) $\delta (P,Q)$ are related as follows:

$H^{2}(P,Q)\leq \delta (P,Q)\leq {\sqrt {2}}H(P,Q)\,.$

These inequalities follow immediately from the inequalities between the 1-norm and the 2-norm.
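As a quick numerical illustration (a sketch using NumPy; the two small discrete distributions are arbitrary examples), one can compute both distances and confirm the bounds:

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2.0)

def total_variation(p, q):
    """Total variation (statistical) distance: half the 1-norm of the difference."""
    return 0.5 * np.sum(np.abs(np.asarray(p, float) - np.asarray(q, float)))

# Arbitrary example distributions over four outcomes.
p = [0.1, 0.2, 0.3, 0.4]
q = [0.25, 0.25, 0.25, 0.25]

h = hellinger(p, q)
tv = total_variation(p, q)
print(h, tv)
print(h**2 <= tv <= np.sqrt(2.0) * h)  # True: H^2 <= delta <= sqrt(2) H
```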

## Properties

The Hellinger distance forms a bounded metric on the space of probability distributions over a given probability space.

The maximum distance 1 is achieved when P assigns probability zero to every set to which Q assigns a positive probability, and vice versa.
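For example, two discrete distributions with disjoint supports attain this maximum (a minimal NumPy sketch):

```python
import numpy as np

# P and Q put all their mass on disjoint outcomes, so every product p_i * q_i is zero
# and the Bhattacharyya-type sum of sqrt(p_i * q_i) vanishes.
p = np.array([0.5, 0.5, 0.0, 0.0])
q = np.array([0.0, 0.0, 0.5, 0.5])

h = np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2.0)
print(h)  # ≈ 1.0, the maximum possible value
```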

Sometimes the factor $1/2$ in front of the integral is omitted, in which case the Hellinger distance ranges from zero to the square root of two.

The Hellinger distance is related to the Bhattacharyya coefficient $BC(P,Q)$ as it can be defined as

$H(P,Q)={\sqrt {1-BC(P,Q)}}.$

Hellinger distances are used in the theory of sequential and asymptotic statistics.
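A short numeric check of this identity for discrete distributions (the example values are arbitrary):

```python
import numpy as np

# Arbitrary example distributions over three outcomes.
p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])

bc = np.sum(np.sqrt(p * q))                               # Bhattacharyya coefficient
h = np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)  # Hellinger distance

print(h, np.sqrt(1.0 - bc))  # the two values coincide
```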

The squared Hellinger distance between two normal distributions $P\,\sim \,{\mathcal {N}}(\mu _{1},\sigma _{1}^{2})$ and $Q\,\sim \,{\mathcal {N}}(\mu _{2},\sigma _{2}^{2})$ is:

$H^{2}(P,Q)=1-{\sqrt {\frac {2\sigma _{1}\sigma _{2}}{\sigma _{1}^{2}+\sigma _{2}^{2}}}}\,e^{-{\frac {1}{4}}{\frac {(\mu _{1}-\mu _{2})^{2}}{\sigma _{1}^{2}+\sigma _{2}^{2}}}}.$

The squared Hellinger distance between two multivariate normal distributions $P\,\sim \,{\mathcal {N}}(\mu _{1},\Sigma _{1})$ and $Q\,\sim \,{\mathcal {N}}(\mu _{2},\Sigma _{2})$ is:

$H^{2}(P,Q)=1-{\frac {\det(\Sigma _{1})^{1/4}\det(\Sigma _{2})^{1/4}}{\det \left({\frac {\Sigma _{1}+\Sigma _{2}}{2}}\right)^{1/2}}}\exp \left\{-{\frac {1}{8}}(\mu _{1}-\mu _{2})^{T}\left({\frac {\Sigma _{1}+\Sigma _{2}}{2}}\right)^{-1}(\mu _{1}-\mu _{2})\right\}$

The squared Hellinger distance between two exponential distributions $P\,\sim \,{\rm {{Exp}(\alpha )}}$ and $Q\,\sim \,{\rm {{Exp}(\beta )}}$ is:

$H^{2}(P,Q)=1-{\frac {2{\sqrt {\alpha \beta }}}{\alpha +\beta }}.$

The squared Hellinger distance between two Weibull distributions $P\,\sim \,{\rm {{W}(k,\alpha )}}$ and $Q\,\sim \,{\rm {{W}(k,\beta )}}$ (where $k$ is a common shape parameter and $\alpha \,,\beta$ are the scale parameters, respectively) is:

$H^{2}(P,Q)=1-{\frac {2(\alpha \beta )^{k/2}}{\alpha ^{k}+\beta ^{k}}}.$

The squared Hellinger distance between two Poisson distributions with rate parameters $\alpha$ and $\beta$, so that $P\,\sim \,{\rm {{Poisson}(\alpha )}}$ and $Q\,\sim \,{\rm {{Poisson}(\beta )}}$, is:

$H^{2}(P,Q)=1-e^{-{\frac {1}{2}}({\sqrt {\alpha }}-{\sqrt {\beta }})^{2}}.$

The squared Hellinger distance between two Beta distributions $P\,\sim \,{\text{Beta}}(a_{1},b_{1})$ and $Q\,\sim \,{\text{Beta}}(a_{2},b_{2})$ is:

$H^{2}(P,Q)=1-{\frac {B\left({\frac {a_{1}+a_{2}}{2}},{\frac {b_{1}+b_{2}}{2}}\right)}{\sqrt {B(a_{1},b_{1})B(a_{2},b_{2})}}},$

where $B$ is the Beta function.
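As a sanity check on closed forms of this kind, the univariate normal formula above can be compared against a direct numerical evaluation of $1-\int {\sqrt {f(x)g(x)}}\,dx$ (a NumPy sketch; the parameters are arbitrary example values):

```python
import numpy as np

def h2_normal(mu1, s1, mu2, s2):
    """Closed-form squared Hellinger distance between N(mu1, s1^2) and N(mu2, s2^2)."""
    v = s1**2 + s2**2
    return 1.0 - np.sqrt(2.0 * s1 * s2 / v) * np.exp(-0.25 * (mu1 - mu2) ** 2 / v)

def normal_pdf(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

# Arbitrary example parameters.
mu1, s1, mu2, s2 = 0.0, 1.0, 2.0, 1.5

# Direct numerical evaluation of 1 - integral of sqrt(f * g) dx on a fine grid.
x = np.linspace(-30.0, 30.0, 200_001)
dx = x[1] - x[0]
numeric = 1.0 - np.sum(np.sqrt(normal_pdf(x, mu1, s1) * normal_pdf(x, mu2, s2))) * dx

print(h2_normal(mu1, s1, mu2, s2), numeric)  # agree to high precision
```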