# Variation of information

In probability theory and information theory, the variation of information or shared information distance is a measure of the distance between two clusterings (partitions of elements). It is closely related to mutual information; indeed, it is a simple linear expression involving the mutual information. Unlike the mutual information, however, the variation of information is a true metric, in that it obeys the triangle inequality. Moreover, it is a universal metric, in that if any other distance measure judges two items to be close, then the variation of information will also judge them close.[1]

## Definition

Suppose we have two partitions $X$ and $Y$ of a set $A$ into disjoint subsets, namely $X = \{X_{1}, X_{2}, \ldots, X_{k}\}$ and $Y = \{Y_{1}, Y_{2}, \ldots, Y_{l}\}$. Let $n = \sum_{i} |X_{i}| = \sum_{j} |Y_{j}| = |A|$, $p_{i} = |X_{i}| / n$, $q_{j} = |Y_{j}| / n$, and $r_{ij} = |X_{i} \cap Y_{j}| / n$. Then the variation of information between the two partitions is:

$VI(X; Y ) = - \sum_{i,j} r_{ij} \left[\log(r_{ij}/p_i)+\log(r_{ij}/q_j) \right]$.

This is equivalent to the shared information distance between the random variables $i$ and $j$ with respect to the uniform probability measure on $A$ defined by $\mu(B) := |B|/n$ for $B \subseteq A$. The variation of information satisfies

$VI(X; Y) = H(X) + H(Y) - 2I(X, Y)$,

where $H(X)$ and $H(Y)$ are the entropies of $X$ and $Y$, and $I(X, Y)$ is the mutual information between $X$ and $Y$, all taken with respect to the uniform probability measure on $A$.
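The two expressions above can be compared numerically. The following is a minimal sketch, not a reference implementation. It assumes each partition is given as a list of cluster labels, one per element of $A$, and that logarithms are natural (the definition does not fix a base). The function names `variation_of_information`, `entropy`, and `mutual_information` are chosen here for illustration.

```python
from collections import Counter
from math import log, isclose


def variation_of_information(labels_x, labels_y):
    """VI from the definition: -sum_ij r_ij [log(r_ij/p_i) + log(r_ij/q_j)].

    Assumption: labels_x[t] and labels_y[t] name the clusters that element t
    belongs to in partitions X and Y; natural logarithm is used.
    """
    if len(labels_x) != len(labels_y) or not labels_x:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_x)
    count_x = Counter(labels_x)                    # |X_i|
    count_y = Counter(labels_y)                    # |Y_j|
    count_xy = Counter(zip(labels_x, labels_y))    # |X_i ∩ Y_j|, non-empty only
    vi = 0.0
    for (i, j), c in count_xy.items():
        r = c / n
        p = count_x[i] / n
        q = count_y[j] / n
        vi -= r * (log(r / p) + log(r / q))
    return vi


def entropy(labels):
    """Entropy of a partition under the uniform measure on A."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(labels_x, labels_y):
    """Mutual information I(X, Y) = sum_ij r_ij log(r_ij / (p_i q_j))."""
    n = len(labels_x)
    count_x = Counter(labels_x)
    count_y = Counter(labels_y)
    count_xy = Counter(zip(labels_x, labels_y))
    return sum(
        (c / n) * log((c / n) / ((count_x[i] / n) * (count_y[j] / n)))
        for (i, j), c in count_xy.items()
    )


# Two partitions of a four-element set that cut it in "orthogonal" ways.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
direct = variation_of_information(x, y)
via_identity = entropy(x) + entropy(y) - 2 * mutual_information(x, y)
print(isclose(direct, via_identity))  # the two expressions agree
```

In this example the two partitions are independent, so the mutual information is zero and the variation of information equals $H(X) + H(Y) = 2 \log 2$. For identical partitions the same function returns zero.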

## References

1. Alexander Kraskov, Harald Stögbauer, Ralph G. Andrzejak, and Peter Grassberger, "Hierarchical Clustering Based on Mutual Information" (2003), arXiv:q-bio/0311039