= Hellinger distance =

In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions. It is a type of f-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909.

It is sometimes called the Jeffreys distance.

==Definition==

===Measure theory===
To define the Hellinger distance in terms of measure theory, let $P$ and $Q$ denote two probability measures on a measurable space $\mathcal{X}$ that are absolutely continuous with respect to an auxiliary measure $\lambda$. Such a measure always exists, e.g. $\lambda = P + Q$. The square of the Hellinger distance between $P$ and $Q$ is defined as the quantity

$H^2(P,Q) = \frac{1}{2}\displaystyle \int_{\mathcal{X}} \left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2 \lambda(dx).$

Here, $P(dx) = p(x)\lambda(dx)$ and $Q(dx) = q(x) \lambda(dx)$; that is, $p$ and $q$ are the Radon–Nikodym derivatives of $P$ and $Q$, respectively, with respect to $\lambda$. This definition does not depend on $\lambda$: the Hellinger distance between $P$ and $Q$ does not change if $\lambda$ is replaced with a different measure with respect to which both $P$ and $Q$ are absolutely continuous. For compactness, the above formula is often written as

$H^2(P,Q) = \frac{1}{2}\int_{\mathcal{X}} \left(\sqrt{P(dx)} - \sqrt{Q(dx)}\right)^2.$
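The independence from $\lambda$ can be checked numerically on a finite space, where every measure is simply a list of point masses. The sketch below is only an illustration, not a proof. The helper name `squared_hellinger_wrt` and the example masses are hypothetical, and each reference measure is assumed to put positive mass on every point.

```python
import math

# Hypothetical helper, for illustration only.
def squared_hellinger_wrt(P, Q, lam):
    """Squared Hellinger distance on a finite space, computed with respect to a
    reference measure `lam` (positive point masses) that dominates P and Q."""
    total = 0.0
    for P_i, Q_i, lam_i in zip(P, Q, lam):
        p = P_i / lam_i  # Radon-Nikodym derivative dP/dlam at point i
        q = Q_i / lam_i  # Radon-Nikodym derivative dQ/dlam at point i
        total += (math.sqrt(p) - math.sqrt(q)) ** 2 * lam_i
    return 0.5 * total

P = [0.1, 0.4, 0.5]
Q = [0.3, 0.3, 0.4]

# Three different dominating measures: counting measure, P + Q, arbitrary weights.
counting = [1.0, 1.0, 1.0]
p_plus_q = [a + b for a, b in zip(P, Q)]
weights = [2.0, 0.5, 7.0]

vals = [squared_hellinger_wrt(P, Q, lam) for lam in (counting, p_plus_q, weights)]
print(vals)  # the three values agree up to floating-point rounding
```

On a finite space the agreement is also plain algebra: each term $\left(\sqrt{P_i/\lambda_i} - \sqrt{Q_i/\lambda_i}\right)^2 \lambda_i$ equals $\left(\sqrt{P_i} - \sqrt{Q_i}\right)^2$.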

===Probability theory using Lebesgue measure===
To define the Hellinger distance in terms of elementary probability theory, we take λ to be the Lebesgue measure, so that dP / dλ and dQ / dλ are simply probability density functions. If we denote the densities as f and g, respectively, the squared Hellinger distance can be expressed as a standard calculus integral

$H^2(f,g) =\frac{1}{2}\int \left(\sqrt{f(x)} - \sqrt{g(x)}\right)^2 \, dx = 1 - \int \sqrt{f(x) g(x)} \, dx,$

where the second form can be obtained by expanding the square and using the fact that the integral of a probability density over its domain equals 1.
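A minimal numerical sketch showing that both forms give the same value. It assumes two normal densities (chosen only for illustration) and approximates the integrals with a midpoint rule on the truncated interval $[-30, 30]$, where the neglected tail mass is negligible. The helper names are hypothetical.

```python
import math

# Hypothetical helper names, for illustration only.
def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def midpoint_integral(func, a, b, n=100_000):
    """Approximate the integral of func over [a, b] with the composite midpoint rule."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def f(x):
    return normal_pdf(x, 0.0, 1.0)

def g(x):
    return normal_pdf(x, 1.0, 2.0)

# First form: one half of the integrated squared difference of the root densities.
h2_first = 0.5 * midpoint_integral(lambda x: (math.sqrt(f(x)) - math.sqrt(g(x))) ** 2, -30.0, 30.0)

# Second form: one minus the integral of sqrt(f * g).
h2_second = 1.0 - midpoint_integral(lambda x: math.sqrt(f(x) * g(x)), -30.0, 30.0)

print(h2_first, h2_second)  # agree up to quadrature and rounding error
```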

The Hellinger distance H(P, Q) satisfies the property (derivable from the Cauchy–Schwarz inequality)

 $0\le H(P,Q) \le 1.$
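One way to derive the bound starts from the second form of $H^2$ above. The integrand $\sqrt{f g}$ is non-negative, and by the Cauchy–Schwarz inequality

 $0 \le \int \sqrt{f(x) g(x)}\,dx \le \left(\int f(x)\,dx\right)^{1/2} \left(\int g(x)\,dx\right)^{1/2} = 1,$

so $H^2(f,g) = 1 - \int \sqrt{f(x) g(x)}\,dx$ lies in $[0, 1]$, and therefore so does $H(f,g)$.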

===Discrete distributions===
For two discrete probability distributions $P=(p_1, \ldots, p_k)$ and $Q=(q_1, \ldots, q_k)$,
their Hellinger distance is defined as

 $H(P, Q) = \frac{1}{\sqrt{2}} \; \sqrt{\sum_{i=1}^k (\sqrt{p_i} - \sqrt{q_i})^2},$

which is directly related to the Euclidean norm of the difference of the square root vectors, i.e.
 $H(P, Q) = \frac{1}{\sqrt{2}} \; \bigl\|\sqrt{P} - \sqrt{Q} \bigr\|_2 .$

Also, $1 - H^2(P,Q) = \sum_{i=1}^k \sqrt{p_i q_i}.$
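A direct sketch of the discrete formulas using only the Python standard library. The function names are hypothetical and do not come from any particular package. The sketch computes $H$ from the sum, computes it again from the Euclidean norm of the square-root vectors via `math.dist` (Python 3.8+), and checks the identity $1 - H^2 = \sum_i \sqrt{p_i q_i}$.

```python
import math

# Hypothetical helper names, for illustration only.
def hellinger_discrete(p, q):
    """Hellinger distance between two discrete distributions given as
    equal-length sequences of probabilities (each assumed to sum to 1)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    squared_sum = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(squared_sum) / math.sqrt(2.0)

def hellinger_via_norm(p, q):
    """Same quantity, written as the Euclidean distance between sqrt(P) and sqrt(Q)."""
    return math.dist([math.sqrt(x) for x in p], [math.sqrt(x) for x in q]) / math.sqrt(2.0)

def bhattacharyya_coefficient(p, q):
    """Sum of sqrt(p_i * q_i)."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

p = [0.36, 0.48, 0.16]
q = [1 / 3, 1 / 3, 1 / 3]

H = hellinger_discrete(p, q)
print(H, hellinger_via_norm(p, q))                  # equal up to rounding
print(1 - H ** 2, bhattacharyya_coefficient(p, q))  # equal up to rounding
```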

==Properties==
The Hellinger distance forms a bounded metric on the space of probability distributions over a given measurable space.

The maximum distance 1 is achieved when P assigns probability zero to every set to which Q assigns a positive probability, and vice versa.

Sometimes the factor $1/2$ in front of the integral defining $H^2$ (equivalently, the factor $1/\sqrt{2}$ in front of $H$) is omitted, in which case the Hellinger distance ranges from zero to the square root of two.
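The following small illustration covers both conventions. It uses the discrete formula with a hypothetical `normalized` flag to switch between them. For two mutually singular distributions the distance reaches its maximum: 1 with the normalizing factor, and $\sqrt{2}$ without it.

```python
import math

# Hypothetical helper and flag, for illustration only.
def hellinger(p, q, normalized=True):
    """Discrete Hellinger distance. With normalized=False the 1/sqrt(2) factor is
    dropped, giving the alternative convention whose range is [0, sqrt(2)]."""
    d = math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))
    return d / math.sqrt(2.0) if normalized else d

# P and Q are mutually singular: each puts zero mass where the other puts positive mass.
P = [0.5, 0.5, 0.0, 0.0]
Q = [0.0, 0.0, 0.25, 0.75]
print(hellinger(P, Q))                    # maximal value under the 1/sqrt(2) convention
print(hellinger(P, Q, normalized=False))  # maximal value without the factor
```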

The Hellinger distance is related to the Bhattacharyya coefficient $BC(P,Q) = \int \sqrt{p(x) q(x)}\, \lambda(dx)$ by

 $H(P,Q) = \sqrt{1 - BC(P,Q)}.$

Hellinger distances are used in the theory of sequential and asymptotic statistics.

The squared Hellinger distance between two normal distributions $P \sim \mathcal{N}(\mu_1,\sigma_1^2)$ and $Q \sim \mathcal{N}(\mu_2,\sigma_2^2)$ is:
 $H^2(P, Q) = 1 - \sqrt{\frac{2\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}} \, e^{-\frac{1}{4}\frac{(\mu_1-\mu_2)^2}{\sigma_1^2+\sigma_2^2}}.$
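The closed form can be checked against direct numerical integration of $1 - \int \sqrt{f g}\,dx$. This is a sketch. The function names, the parameter values and the truncated integration range $[-20, 20]$ are assumptions made for the example.

```python
import math

# Hypothetical helper names, for illustration only.
def hellinger_sq_normal(mu1, sigma1, mu2, sigma2):
    """Closed-form squared Hellinger distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    s = sigma1 ** 2 + sigma2 ** 2
    return 1.0 - math.sqrt(2.0 * sigma1 * sigma2 / s) * math.exp(-0.25 * (mu1 - mu2) ** 2 / s)

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def hellinger_sq_numeric(f, g, a, b, n=100_000):
    """1 minus a midpoint-rule approximation of the integral of sqrt(f g) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += math.sqrt(f(x) * g(x))
    return 1.0 - h * total

closed = hellinger_sq_normal(0.0, 1.0, 1.5, 0.5)
numeric = hellinger_sq_numeric(lambda x: normal_pdf(x, 0.0, 1.0),
                               lambda x: normal_pdf(x, 1.5, 0.5), -20.0, 20.0)
print(closed, numeric)  # agree to many decimal places
```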

The squared Hellinger distance between two multivariate normal distributions $P \sim \mathcal{N}(\mu_1,\Sigma_1)$ and $Q \sim \mathcal{N}(\mu_2,\Sigma_2)$ is
 $H^2(P, Q) = 1 - \frac{ \det (\Sigma_1)^{1/4} \det (\Sigma_2) ^{1/4}} { \det \left( \frac{\Sigma_1 + \Sigma_2}{2}\right)^{1/2} }
              \exp\left\{-\frac{1}{8}(\mu_1 - \mu_2)^T
              \left(\frac{\Sigma_1 + \Sigma_2}{2}\right)^{-1}
              (\mu_1 - \mu_2)
              \right\}$
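A sketch of the multivariate formula with NumPy, which is assumed to be available. The name `hellinger_sq_mvn` is hypothetical. The sketch solves a linear system rather than forming the inverse of $(\Sigma_1 + \Sigma_2)/2$ explicitly. In one dimension it reduces to the univariate formula above.

```python
import numpy as np

# Hypothetical helper name, for illustration only.
def hellinger_sq_mvn(mu1, cov1, mu2, cov2):
    """Squared Hellinger distance between N(mu1, cov1) and N(mu2, cov2).
    Covariance matrices are assumed symmetric positive definite."""
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    cov1 = np.asarray(cov1, dtype=float)
    cov2 = np.asarray(cov2, dtype=float)
    avg_cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    # (mu1 - mu2)^T avg_cov^{-1} (mu1 - mu2), computed via a linear solve.
    quad_form = diff @ np.linalg.solve(avg_cov, diff)
    coef = (np.linalg.det(cov1) ** 0.25 * np.linalg.det(cov2) ** 0.25
            / np.sqrt(np.linalg.det(avg_cov)))
    return float(1.0 - coef * np.exp(-quad_form / 8.0))

mu1, cov1 = [0.0, 0.0], [[1.0, 0.3], [0.3, 2.0]]
mu2, cov2 = [1.0, -0.5], [[1.5, 0.0], [0.0, 1.0]]
print(hellinger_sq_mvn(mu1, cov1, mu2, cov2))
```

For high-dimensional or badly conditioned covariances, working with `np.linalg.slogdet` instead of `det` would be a more robust choice.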

The squared Hellinger distance between two exponential distributions $P \sim \mathrm{Exp}(\alpha)$ and $Q \sim \mathrm{Exp}(\beta)$ is:
 $H^2(P, Q) = 1 - \frac{2 \sqrt{\alpha \beta}}{\alpha + \beta}.$
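The following sketch compares the closed form with numerical integration. It assumes the rate parameterization $f(x) = \alpha e^{-\alpha x}$ for $x \ge 0$. The formula is unchanged if $\alpha$ and $\beta$ are read as scale parameters instead, because replacing them by $1/\alpha$ and $1/\beta$ leaves $2\sqrt{\alpha\beta}/(\alpha+\beta)$ invariant. The helper names and the integration range $[0, 40]$ are hypothetical choices.

```python
import math

# Hypothetical helper names, for illustration only.
def hellinger_sq_exponential(alpha, beta):
    """Closed-form squared Hellinger distance between Exp(alpha) and Exp(beta)."""
    return 1.0 - 2.0 * math.sqrt(alpha * beta) / (alpha + beta)

def exponential_pdf(x, rate):
    return rate * math.exp(-rate * x)

def one_minus_bc_numeric(f, g, a, b, n=100_000):
    """1 minus a midpoint-rule approximation of the integral of sqrt(f g) over [a, b]."""
    h = (b - a) / n
    return 1.0 - h * sum(math.sqrt(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h)) for i in range(n))

alpha, beta = 1.0, 3.0
closed = hellinger_sq_exponential(alpha, beta)
numeric = one_minus_bc_numeric(lambda x: exponential_pdf(x, alpha),
                               lambda x: exponential_pdf(x, beta), 0.0, 40.0)
print(closed, numeric)
print(hellinger_sq_exponential(1 / alpha, 1 / beta))  # same value as `closed`, up to rounding
```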

The squared Hellinger distance between two Weibull distributions $P \sim \mathrm{W}(k,\alpha)$ and $Q \sim \mathrm{W}(k,\beta)$ (where $k$ is a common shape parameter and $\alpha, \beta$ are the respective scale parameters) is:
 $H^2(P, Q) = 1 - \frac{2 (\alpha \beta)^{k/2}}{\alpha^k + \beta^k}.$
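A sketch of the Weibull formula, assuming the density $f(x) = \frac{k}{\alpha}\left(\frac{x}{\alpha}\right)^{k-1} e^{-(x/\alpha)^k}$ for $x \ge 0$ (shape $k$, scale $\alpha$). The formula is checked by numerical integration for $k = 2$. For $k = 1$ it reduces to the exponential-distribution formula above. The helper names are hypothetical.

```python
import math

# Hypothetical helper names, for illustration only.
def hellinger_sq_weibull(k, alpha, beta):
    """Squared Hellinger distance between Weibull(k, alpha) and Weibull(k, beta)
    with common shape k and scales alpha, beta."""
    return 1.0 - 2.0 * (alpha * beta) ** (k / 2) / (alpha ** k + beta ** k)

def weibull_pdf(x, k, scale):
    return (k / scale) * (x / scale) ** (k - 1) * math.exp(-((x / scale) ** k))

def one_minus_bc_numeric(f, g, a, b, n=100_000):
    """1 minus a midpoint-rule approximation of the integral of sqrt(f g) over [a, b]."""
    h = (b - a) / n
    return 1.0 - h * sum(math.sqrt(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h)) for i in range(n))

k, alpha, beta = 2.0, 1.0, 2.0
closed = hellinger_sq_weibull(k, alpha, beta)
numeric = one_minus_bc_numeric(lambda x: weibull_pdf(x, k, alpha),
                               lambda x: weibull_pdf(x, k, beta), 0.0, 20.0)
print(closed, numeric)
```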

The squared Hellinger distance between two Poisson distributions with rate parameters $\alpha$ and $\beta$, so that $P \sim \mathrm{Poisson}(\alpha)$ and $Q \sim \mathrm{Poisson}(\beta)$, is:
 $H^2(P,Q) = 1-e^{-\frac{1}{2} (\sqrt{\alpha} - \sqrt{\beta})^2}.$
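The closed form can be compared with a direct evaluation of $1 - \sum_j \sqrt{P(j)\,Q(j)}$. The sketch below truncates the sum at a hypothetical cutoff `kmax` and works with log-probabilities via `math.lgamma`, so that $j!$ never overflows. The rates are assumed to be positive.

```python
import math

# Hypothetical helper names, for illustration only.
def hellinger_sq_poisson(alpha, beta):
    """Closed-form squared Hellinger distance between Poisson(alpha) and Poisson(beta)."""
    return 1.0 - math.exp(-0.5 * (math.sqrt(alpha) - math.sqrt(beta)) ** 2)

def hellinger_sq_poisson_series(alpha, beta, kmax=500):
    """1 - sum_{j=0}^{kmax} sqrt(P(j) Q(j)), computed from log-pmfs.
    kmax is a truncation point chosen for this example."""
    bc = 0.0
    for j in range(kmax + 1):
        log_p = -alpha + j * math.log(alpha) - math.lgamma(j + 1)
        log_q = -beta + j * math.log(beta) - math.lgamma(j + 1)
        bc += math.exp(0.5 * (log_p + log_q))
    return 1.0 - bc

print(hellinger_sq_poisson(3.0, 7.0), hellinger_sq_poisson_series(3.0, 7.0))
```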

The squared Hellinger distance between two beta distributions $P \sim \text{Beta}(a_1,b_1)$ and $Q \sim \text{Beta}(a_2, b_2)$ is:
 $H^2(P,Q) = 1 - \frac{B\left(\frac{a_1 + a_2}{2}, \frac{b_1 + b_2}{2}\right)}{\sqrt{B(a_1, b_1) B(a_2, b_2)}}$
where $B$ is the beta function.
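A sketch of the beta formula that works on the log scale with `math.lgamma`. This avoids overflow of the beta function for large parameters. The helper names are hypothetical. The example uses $\text{Beta}(1,1)$, the uniform density on $[0,1]$, and $\text{Beta}(2,1)$, which has density $2x$. For this pair the coefficient is $\int_0^1 \sqrt{2x}\,dx = 2\sqrt{2}/3$.

```python
import math

# Hypothetical helper names, for illustration only.
def log_beta_function(a, b):
    """log B(a, b) via log-gamma, which avoids overflow for large arguments."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def hellinger_sq_beta(a1, b1, a2, b2):
    """Squared Hellinger distance between Beta(a1, b1) and Beta(a2, b2)."""
    log_bc = (log_beta_function((a1 + a2) / 2, (b1 + b2) / 2)
              - 0.5 * (log_beta_function(a1, b1) + log_beta_function(a2, b2)))
    return 1.0 - math.exp(log_bc)

# Beta(1, 1) vs Beta(2, 1): compare with 1 - 2*sqrt(2)/3 computed by hand.
print(hellinger_sq_beta(1, 1, 2, 1), 1 - 2 * math.sqrt(2) / 3)
```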

The squared Hellinger distance between two gamma distributions $P \sim \text{Gamma}(a_1,b_1)$ and $Q \sim \text{Gamma}(a_2, b_2)$, with shape parameters $a_1, a_2$ and rate parameters $b_1, b_2$, is:
 $H^2(P,Q) = 1 - \Gamma\left({\scriptstyle\frac{a_1 + a_2}{2}}\right)\left(\frac{b_1+b_2}{2}\right)^{-(a_1+a_2)/2}\sqrt{\frac{b_1^{a_1}b_2^{a_2}}{\Gamma(a_1)\Gamma(a_2)}}$
where $\Gamma$ is the gamma function.
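A sketch of the gamma formula, assuming the shape–rate parameterization with density $b^a x^{a-1} e^{-b x}/\Gamma(a)$. The formula above corresponds to this convention. The sketch works on the log scale with `math.lgamma` and checks the result against numerical integration. When $a_1 = a_2 = 1$ it reduces to the exponential-distribution formula. The helper names and the integration range $[0, 60]$ are hypothetical.

```python
import math

# Hypothetical helper names, for illustration only.
def hellinger_sq_gamma(a1, b1, a2, b2):
    """Squared Hellinger distance between Gamma(a1, b1) and Gamma(a2, b2),
    with shapes a1, a2 and rates b1, b2 (all positive)."""
    a = (a1 + a2) / 2
    log_bc = (math.lgamma(a) - a * math.log((b1 + b2) / 2)
              + 0.5 * (a1 * math.log(b1) + a2 * math.log(b2)
                       - math.lgamma(a1) - math.lgamma(a2)))
    return 1.0 - math.exp(log_bc)

def gamma_pdf(x, a, b):
    """Gamma density with shape a and rate b, evaluated for x > 0."""
    return math.exp(a * math.log(b) + (a - 1) * math.log(x) - b * x - math.lgamma(a))

def one_minus_bc_numeric(f, g, lo, hi, n=100_000):
    """1 minus a midpoint-rule approximation of the integral of sqrt(f g) over [lo, hi]."""
    h = (hi - lo) / n
    return 1.0 - h * sum(math.sqrt(f(lo + (i + 0.5) * h) * g(lo + (i + 0.5) * h)) for i in range(n))

closed = hellinger_sq_gamma(2.0, 1.0, 3.0, 2.0)
numeric = one_minus_bc_numeric(lambda x: gamma_pdf(x, 2.0, 1.0),
                               lambda x: gamma_pdf(x, 3.0, 2.0), 0.0, 60.0)
print(closed, numeric)
```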

== Connection with total variation distance ==

The Hellinger distance $H(P,Q)$ and the total variation distance (or statistical distance) $\delta(P,Q) = \sup_A |P(A) - Q(A)| = \frac{1}{2}\int |p(x) - q(x)|\,\lambda(dx)$ are related as follows:

 $H^2(P,Q) \leq \delta(P,Q) \leq \sqrt{2}H(P,Q)\,.$

The constants in these inequalities depend on the normalization chosen. The bounds above assume the factor $1/2$ in the definition of $H^2$ (equivalently, $1/\sqrt{2}$ in $H$).

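The lower bound follows from the pointwise inequality $\left(\sqrt{p} - \sqrt{q}\right)^2 \le \left|\sqrt{p} - \sqrt{q}\right|\left(\sqrt{p} + \sqrt{q}\right) = |p - q|$. For the upper bound, write $|p - q| = \left|\sqrt{p} - \sqrt{q}\right|\left(\sqrt{p} + \sqrt{q}\right)$, apply the Cauchy–Schwarz inequality, and use $\int \left(\sqrt{p} + \sqrt{q}\right)^2 \lambda(dx) \le 4$.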
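A quick randomized check of the two inequalities on discrete distributions, where $\delta(P,Q) = \frac{1}{2}\sum_i |p_i - q_i|$. This is an illustration, not a proof. The helper names, the number of trials, and the small tolerance that absorbs floating-point rounding are hypothetical choices made for the example.

```python
import math
import random

# Hypothetical helper names, for illustration only.
def hellinger(p, q):
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def total_variation(p, q):
    """sup_A |P(A) - Q(A)|, which for discrete distributions is half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def random_distribution(k, rng):
    """Random probability vector of length k (normalized uniform weights)."""
    w = [rng.random() for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)  # fixed seed so the check is reproducible
violations = 0
for _ in range(10_000):
    p = random_distribution(5, rng)
    q = random_distribution(5, rng)
    h = hellinger(p, q)
    d = total_variation(p, q)
    # H^2 <= delta <= sqrt(2) * H, with a tiny tolerance for rounding.
    if not (h ** 2 <= d + 1e-12 and d <= math.sqrt(2.0) * h + 1e-12):
        violations += 1
print(violations)
```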

==See also==
- Statistical distance
- Kullback–Leibler divergence
- Bhattacharyya distance
- Total variation distance
- Fisher information metric
