# Total variation distance of probability measures

In probability theory, the total variation distance is a statistical distance between probability distributions. It is sometimes called the statistical distance, statistical difference, or variational distance.

## Definition

Consider two probability spaces ${\displaystyle (\Omega ,{\mathcal {F}},P)}$ and ${\displaystyle (\Omega ,{\mathcal {F}},Q)}$ sharing the same sample space ${\displaystyle \Omega }$ and event space ${\displaystyle {\mathcal {F}}}$. The total variation distance between the two probability measures ${\displaystyle P}$ and ${\displaystyle Q}$ is defined as:[1]

${\displaystyle \delta (P,Q)=\sup _{A\in {\mathcal {F}}}\left|P(A)-Q(A)\right|.}$

Informally, this is the largest possible difference between the probabilities that the two probability distributions can assign to the same event.
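For a small finite sample space, the supremum in the definition can be evaluated directly by enumerating every event. The following is a minimal sketch under that assumption; the function name `tv_by_enumeration` and the dictionary representation of the distributions are illustrative choices, not part of the definition, and the approach is only feasible for a handful of outcomes because the number of events grows exponentially.

```python
from itertools import chain, combinations


def tv_by_enumeration(p, q):
    """Total variation distance found by checking every event of a finite space.

    p and q are dicts mapping outcomes to probabilities (illustrative format).
    """
    outcomes = sorted(set(p) | set(q))
    # On a finite sample space every subset of outcomes is an event.
    events = chain.from_iterable(
        combinations(outcomes, r) for r in range(len(outcomes) + 1)
    )
    # Largest absolute difference |P(A) - Q(A)| over all events A.
    return max(
        abs(sum(p.get(w, 0.0) for w in A) - sum(q.get(w, 0.0) for w in A))
        for A in events
    )


p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}
print(tv_by_enumeration(p, q))
```

In this example the event consisting of the single outcome `a` has probability 0.5 under `p` and 0.2 under `q`, and no other event separates the two distributions by more, so the distance is 0.3 up to floating-point rounding.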

## Properties

### Relation to other distances

The total variation distance is related to the Kullback–Leibler divergence by Pinsker's inequality:

${\displaystyle \delta (P,Q)\leq {\sqrt {{\frac {1}{2}}D_{\mathrm {KL} }(P\parallel Q)}}.}$

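One also has the following inequality, due to Bretagnolle and Huber[2] (see also Tsybakov[3]). Pinsker's bound exceeds 1 when ${\displaystyle D_{\mathrm {KL} }(P\parallel Q)>2}$ and is then vacuous, because the total variation distance never exceeds 1; the following bound stays below 1 for every finite value of the divergence: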

${\displaystyle \delta (P,Q)\leq {\sqrt {1-e^{-D_{\mathrm {KL} }(P\parallel Q)}}}.}$
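The difference between the two bounds can be illustrated numerically on a finite sample space. The sketch below assumes distributions given as lists of probabilities over the same outcomes, with the divergence measured in nats; the helper names are hypothetical. The distance itself is computed from the event on which ${\displaystyle P}$ exceeds ${\displaystyle Q}$, which attains the supremum in the definition.

```python
import math


def kl_divergence(p, q):
    # D_KL(P || Q) in nats; outcomes with p_i = 0 contribute nothing.
    # Assumes q_i > 0 wherever p_i > 0 (otherwise the divergence is infinite).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def total_variation(p, q):
    # The event {i : p_i > q_i} maximizes P(A) - Q(A) on a finite space.
    return sum(max(pi - qi, 0.0) for pi, qi in zip(p, q))


def pinsker_bound(kl):
    return math.sqrt(kl / 2)


def bretagnolle_huber_bound(kl):
    return math.sqrt(1 - math.exp(-kl))


# Two nearly disjoint distributions, chosen so that the divergence exceeds 2.
p = [0.99, 0.01]
q = [0.01, 0.99]
kl = kl_divergence(p, q)
print("KL divergence:", kl)
print("total variation:", total_variation(p, q))
print("Pinsker bound:", pinsker_bound(kl))
print("Bretagnolle-Huber bound:", bretagnolle_huber_bound(kl))
```

For this pair the Pinsker bound is larger than 1 and so carries no information, while the Bretagnolle–Huber bound lies strictly between the true distance and 1.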

When the sample space ${\displaystyle \Omega }$ is countable, the total variation distance is related to the L1 norm by the identity:[4]

${\displaystyle \delta (P,Q)={\frac {1}{2}}\|P-Q\|_{1}={\frac {1}{2}}\sum _{\omega \in \Omega }|P(\omega )-Q(\omega )|.}$
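On a finite sample space the identity gives a direct way to compute the distance, and it also identifies an event that attains the supremum: the set of outcomes to which ${\displaystyle P}$ assigns more mass than ${\displaystyle Q}$. The following sketch, with hypothetical function names and distributions stored as dictionaries, computes both quantities so that they can be compared.

```python
def tv_half_l1(p, q):
    """Half the L1 distance between two distributions on a finite space."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in support)


def tv_maximizing_event(p, q):
    """Return the event A = {w : P(w) > Q(w)} together with P(A) - Q(A)."""
    support = set(p) | set(q)
    event = {w for w in support if p.get(w, 0.0) > q.get(w, 0.0)}
    gap = sum(p.get(w, 0.0) - q.get(w, 0.0) for w in event)
    return event, gap


p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}
event, gap = tv_maximizing_event(p, q)
print(event, gap, tv_half_l1(p, q))
```

The two values agree because the positive and negative parts of ${\displaystyle P-Q}$ carry the same total mass, so each accounts for half of the L1 norm.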

The total variation distance is related to the Hellinger distance ${\displaystyle H(P,Q)}$ as follows:[5]

${\displaystyle H^{2}(P,Q)\leq \delta (P,Q)\leq {\sqrt {2}}H(P,Q)\,.}$

These inequalities follow immediately from the inequalities between the 1-norm and the 2-norm.
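The two-sided bound can be checked numerically on randomly generated finite distributions. The sketch below assumes the normalization ${\displaystyle H^{2}(P,Q)={\frac {1}{2}}\sum _{\omega }({\sqrt {P(\omega )}}-{\sqrt {Q(\omega )}})^{2}}$, under which the Hellinger distance takes values in the unit interval; other conventions change the constants. The function names, sample sizes and tolerance are illustrative.

```python
import math
import random


def hellinger(p, q):
    # Assumed normalization: H^2 = (1/2) * sum (sqrt(p_i) - sqrt(q_i))^2.
    s = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(0.5 * s)


def total_variation(p, q):
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def random_distribution(n, rng):
    # Normalize n uniform random weights into a probability vector.
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def check_hellinger_bounds(trials=1000, n=5, seed=0, tol=1e-12):
    """Return True if H^2 <= TV <= sqrt(2) * H holds in every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        p = random_distribution(n, rng)
        q = random_distribution(n, rng)
        h = hellinger(p, q)
        tv = total_variation(p, q)
        if not (h * h <= tv + tol and tv <= math.sqrt(2) * h + tol):
            return False
    return True


print(check_hellinger_bounds())
```

A passing check is only a sanity test on sampled inputs and does not replace the proof.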

### Connection to transportation theory

The total variation distance (equivalently, half the L1 norm of ${\displaystyle P-Q}$) arises as the optimal transportation cost when the cost function is ${\displaystyle c(x,y)={\mathbf {1} }_{x\neq y}}$, that is,

${\displaystyle {\frac {1}{2}}\|P-Q\|_{1}=\delta (P,Q)=\inf\{\mathbb {P} (X\neq Y):{\text{Law}}(X)=P,{\text{Law}}(Y)=Q\}=\inf _{\pi }\operatorname {E} _{\pi }[{\mathbf {1} }_{x\neq y}],}$

where the expectation is taken with respect to the probability measure ${\displaystyle \pi }$ on the space where ${\displaystyle (x,y)}$ lives, and the infimum is taken over all such ${\displaystyle \pi }$ with marginals ${\displaystyle P}$ and ${\displaystyle Q}$, respectively.[6]
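For finite distributions, a joint distribution attaining the infimum can be written down explicitly. The sketch below uses a standard construction, often called a maximal coupling: it places as much mass as possible on the diagonal ${\displaystyle x=y}$ and spreads the remaining excess of ${\displaystyle P}$ over the remaining excess of ${\displaystyle Q}$. The function name and the dictionary representation are illustrative assumptions.

```python
def maximal_coupling(p, q):
    """Build a joint distribution with marginals p and q minimizing P(X != Y).

    Returns (joint, tv), where joint maps pairs (x, y) to probabilities and
    tv is the total mass placed off the diagonal.
    """
    outcomes = sorted(set(p) | set(q))
    # Mass shared by both distributions can be kept on the diagonal.
    overlap = {w: min(p.get(w, 0.0), q.get(w, 0.0)) for w in outcomes}
    excess_p = {w: p.get(w, 0.0) - overlap[w] for w in outcomes}
    excess_q = {w: q.get(w, 0.0) - overlap[w] for w in outcomes}
    tv = sum(excess_p.values())

    joint = {}
    for x in outcomes:
        if overlap[x] > 0:
            joint[(x, x)] = overlap[x]
    if tv > 0:
        # The excesses have disjoint supports, so this mass is off-diagonal.
        for x in outcomes:
            for y in outcomes:
                mass = excess_p[x] * excess_q[y] / tv
                if mass > 0:
                    joint[(x, y)] = joint.get((x, y), 0.0) + mass
    return joint, tv


p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}
joint, tv = maximal_coupling(p, q)
mismatch = sum(m for (x, y), m in joint.items() if x != y)
print(joint)
print(mismatch, tv)
```

Here the probability that the two coordinates differ equals the off-diagonal mass, which coincides with the total variation distance between `p` and `q`.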