# Variation of information

In probability theory and information theory, the variation of information or shared information distance is a measure of the distance between two clusterings (partitions of elements). It is closely related to mutual information; indeed, it is a simple linear expression involving the mutual information. Unlike the mutual information, however, the variation of information is a true metric, in that it obeys the triangle inequality.

## Definition

Suppose we have two partitions $X$ and $Y$ of a set $A$ into disjoint subsets, namely $X=\{X_{1},X_{2},\ldots ,X_{k}\}$ and $Y=\{Y_{1},Y_{2},\ldots ,Y_{l}\}$ . Let $n=\sum _{i}|X_{i}|=\sum _{j}|Y_{j}|=|A|$ , $p_{i}=|X_{i}|/n$ , $q_{j}=|Y_{j}|/n$ , $r_{ij}=|X_{i}\cap Y_{j}|/n$ . Then the variation of information between the two partitions is:

$\mathrm {VI} (X;Y)=-\sum _{i,j}r_{ij}\left[\log(r_{ij}/p_{i})+\log(r_{ij}/q_{j})\right]$ .

This is equivalent to the shared information distance between the random variables $i$ and $j$ (the block indices of a random element of $A$ ) with respect to the uniform probability measure on $A$ defined by $\mu (B):=|B|/n$ for $B\subseteq A$ .
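The definition above translates directly into code. The following is a minimal sketch (the function name and example partitions are illustrative, not from the source) that sums $-r_{ij}\left[\log(r_{ij}/p_{i})+\log(r_{ij}/q_{j})\right]$ over all block pairs with nonempty intersection:

```python
import math

def variation_of_information(X, Y):
    """Variation of information (in nats) between two partitions of the
    same finite set, each given as a list of disjoint blocks (sets)."""
    n = sum(len(block) for block in X)  # n = |A|
    vi = 0.0
    for Xi in X:
        for Yj in Y:
            r = len(Xi & Yj) / n        # r_ij
            if r > 0.0:                 # terms with r_ij = 0 contribute nothing
                p = len(Xi) / n         # p_i
                q = len(Yj) / n         # q_j
                vi -= r * (math.log(r / p) + math.log(r / q))
    return vi

# Hypothetical example: two partitions of {1, ..., 5}
X = [{1, 2, 3}, {4, 5}]
Y = [{1, 2}, {3, 4, 5}]
print(variation_of_information(X, Y))  # ~0.7638 nats
print(variation_of_information(X, X))  # 0.0: VI is a metric, so d(X, X) = 0
```

Natural logarithms are used here, so the result is in nats; using `math.log2` would give bits instead.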

## Identities

The variation of information satisfies

$\mathrm {VI} (X;Y)=H(X)+H(Y)-2I(X,Y)$ ,

where $H(X)$ is the entropy of $X$ , and $I(X,Y)$ is mutual information between $X$ and $Y$ with respect to the uniform probability measure on $A$ . This can be rewritten as

$\mathrm {VI} (X;Y)=H(X,Y)-I(X,Y)$ ,

where $H(X,Y)$ is the joint entropy of $X$ and $Y$ , or

$\mathrm {VI} (X;Y)=H(X|Y)+H(Y|X)$ ,

where $H(X|Y)$ and $H(Y|X)$ are the respective conditional entropies.
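The three identities can be checked numerically. The sketch below (using a hypothetical joint distribution $r_{ij}$ for two partitions of a 5-element set) computes the entropies with respect to the uniform measure and confirms that all three expressions agree:

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Joint distribution r_ij for X = {{1,2,3},{4,5}}, Y = {{1,2},{3,4,5}}
r = [[2/5, 1/5],
     [0.0, 2/5]]
p = [sum(row) for row in r]                              # marginals p_i
q = [sum(row[j] for row in r) for j in range(len(r[0]))] # marginals q_j

H_X = entropy(p)
H_Y = entropy(q)
H_XY = entropy([rij for row in r for rij in row])        # joint entropy H(X,Y)
I = H_X + H_Y - H_XY                                     # mutual information I(X,Y)

vi1 = H_X + H_Y - 2 * I                 # H(X) + H(Y) - 2 I(X,Y)
vi2 = H_XY - I                          # H(X,Y) - I(X,Y)
vi3 = (H_XY - H_Y) + (H_XY - H_X)       # H(X|Y) + H(Y|X)
print(vi1, vi2, vi3)                    # all three agree
```

Algebraically each expression reduces to $2H(X,Y)-H(X)-H(Y)$, which is why the three values coincide.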

The variation of information can also be bounded, either in terms of the number of elements:

$\mathrm {VI} (X;Y)\leq \log(n)$ ,

or in terms of a maximum number of clusters, $K^{*}$ :

$\mathrm {VI} (X;Y)\leq 2\log(K^{*})$ .
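The bound $\log(n)$ is attained in the extreme case where one partition splits $A$ into singletons and the other lumps all of $A$ into one block. A small sketch (helper name is illustrative) demonstrating this:

```python
import math

def vi_from_joint(r):
    """Variation of information (in nats) from a joint distribution r_ij,
    given as a list of rows; marginals are recovered by summation."""
    p = [sum(row) for row in r]                              # p_i
    q = [sum(row[j] for row in r) for j in range(len(r[0]))] # q_j
    vi = 0.0
    for i, row in enumerate(r):
        for j, rij in enumerate(row):
            if rij > 0:
                vi -= rij * (math.log(rij / p[i]) + math.log(rij / q[j]))
    return vi

# Extreme case on n = 8 elements: X is all singletons, Y is one block,
# so r_ij = 1/n for every i and the single j.
n = 8
r = [[1 / n] for _ in range(n)]
print(vi_from_joint(r), math.log(n))  # equal: the log(n) bound is attained
```

Here $H(X)=\log(n)$, $H(Y)=0$, and $I(X,Y)=0$, so $\mathrm{VI}(X;Y)=\log(n)$ exactly.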