# Pointwise mutual information

Pointwise mutual information (PMI),[1] or point mutual information, is a measure of association used in information theory and statistics.

## Definition

The PMI of a pair of outcomes x and y belonging to discrete random variables X and Y quantifies the discrepancy between the probability of their coincidence under the joint distribution and the probability of their coincidence under the individual distributions, assuming independence. Mathematically:

$\operatorname{pmi}(x;y) \equiv \log\frac{p(x,y)}{p(x)p(y)} = \log\frac{p(x|y)}{p(x)} = \log\frac{p(y|x)}{p(y)}.$

The mutual information (MI) of the random variables X and Y is the expected value of the PMI over all possible outcomes (with respect to the joint distribution $p(x,y)$).

The measure is symmetric ($\operatorname{pmi}(x;y)=\operatorname{pmi}(y;x)$). It can take positive or negative values, but is zero if X and Y are independent. Note that even though PMI may be negative or positive, its expected value over all joint events (MI) is non-negative. PMI attains its maximum when the outcomes x and y are perfectly associated (i.e. $p(x|y)=1$ or $p(y|x)=1$), yielding the following bounds:

$-\infty \leq \operatorname{pmi}(x;y) \leq \min\left[ -\log p(x), -\log p(y) \right] .$
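The upper bound follows in one step from the conditional form of the definition, because a conditional probability cannot exceed 1:

$\operatorname{pmi}(x;y) = \log\frac{p(x|y)}{p(x)} \leq \log\frac{1}{p(x)} = -\log p(x),$

and, by symmetry, $\operatorname{pmi}(x;y) \leq -\log p(y)$. The lower bound is approached as $p(x,y) \to 0$ while $p(x)$ and $p(y)$ stay fixed.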

Finally, $\operatorname{pmi}(x;y)$ will increase if $p(x|y)$ is fixed but $p(x)$ decreases.

Here is an example to illustrate:

| x | y | p(x, y) |
|---|---|---------|
| 0 | 0 | 0.1 |
| 0 | 1 | 0.7 |
| 1 | 0 | 0.15 |
| 1 | 1 | 0.05 |

Using this table we can marginalize to get the following additional table for the individual distributions:

| value | p(x) | p(y) |
|-------|------|------|
| 0 | 0.8 | 0.25 |
| 1 | 0.2 | 0.75 |

With this example, we can compute four values for $\operatorname{pmi}(x;y)$. Using base-2 logarithms:

| pair | value |
|------|-------|
| pmi(x=0;y=0) | −1 |
| pmi(x=0;y=1) | 0.222392421 |
| pmi(x=1;y=0) | 1.584962501 |
| pmi(x=1;y=1) | −1.584962501 |

(For reference, the mutual information $\operatorname{I}(X;Y)$ would then be 0.214170945.)
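The figures in this example can be reproduced with a short script. The following is a minimal Python sketch, not a reference implementation: the helper names `marginals`, `pmi` and `mutual_information` are chosen here for illustration, and returning `-inf` for a pair with zero joint probability is an assumption about how to handle that edge case.

```python
import math

# Joint distribution p(x, y) copied from the example table.
joint = {(0, 0): 0.10, (0, 1): 0.70, (1, 0): 0.15, (1, 1): 0.05}


def marginals(joint):
    """Sum the joint table over the other variable to obtain p(x) and p(y)."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py


def pmi(joint, x, y, base=2.0):
    """log p(x,y) / (p(x) p(y)); -inf (an assumed convention) if p(x,y) = 0."""
    px, py = marginals(joint)
    pxy = joint.get((x, y), 0.0)
    if pxy == 0.0:
        return float("-inf")
    return math.log(pxy / (px[x] * py[y]), base)


def mutual_information(joint, base=2.0):
    """Expected PMI under the joint distribution; zero cells contribute nothing."""
    return sum(p * pmi(joint, x, y, base)
               for (x, y), p in joint.items() if p > 0.0)


for x, y in sorted(joint):
    print(f"pmi(x={x}; y={y}) = {pmi(joint, x, y):.9f}")
print(f"I(X;Y) = {mutual_information(joint):.9f}")
```

Weighting each PMI value by its joint probability, as `mutual_information` does, is what makes the negative entries compatible with a non-negative mutual information.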

## Similarities to mutual information

Pointwise mutual information has many of the same relationships as the mutual information. In particular,

\begin{align} \operatorname{pmi}(x;y) &= h(x) + h(y) - h(x,y) \\ &= h(x) - h(x|y) \\ &= h(y) - h(y|x) \end{align}

where $h(x)$ is the self-information, $-\log_2 p(X=x)$.
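These three forms can be checked numerically against the example distribution from the Definition section. The sketch below is illustrative only: `h` and `pmi_forms` are names introduced here, and the conditional self-information is taken to be $h(x|y) = -\log_2 p(x|y)$.

```python
import math

# Joint and marginal probabilities from the example in the Definition section.
joint = {(0, 0): 0.10, (0, 1): 0.70, (1, 0): 0.15, (1, 1): 0.05}
px = {0: 0.8, 1: 0.2}
py = {0: 0.25, 1: 0.75}


def h(p):
    """Self-information in bits of an outcome with probability p."""
    return -math.log2(p)


def pmi_forms(x, y):
    """Return PMI computed through each of the three self-information forms."""
    pxy = joint[(x, y)]
    via_joint = h(px[x]) + h(py[y]) - h(pxy)
    via_x = h(px[x]) - h(pxy / py[y])  # h(x|y) uses p(x|y) = p(x,y) / p(y)
    via_y = h(py[y]) - h(pxy / px[x])  # h(y|x) uses p(y|x) = p(x,y) / p(x)
    return via_joint, via_x, via_y


for pair in sorted(joint):
    print(pair, pmi_forms(*pair))
```

For every pair the three numbers agree up to floating-point rounding.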

## Normalized pointwise mutual information (npmi)

Pointwise mutual information can be normalized to the interval [-1,+1], giving -1 (in the limit) for outcomes that never occur together, 0 for independence, and +1 for complete co-occurrence:

$\operatorname{npmi}(x;y) = \frac{\operatorname{pmi}(x;y)}{-\log \left[ p(x, y) \right] }$
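As a rough illustration of the three reference values, here is a small Python sketch. The function name `npmi` and its argument names are chosen for this example; returning the limiting value -1 when $p(x,y)=0$ and rejecting $p(x,y)=1$ (where the ratio is 0/0) are assumptions about edge-case handling, not part of the definition.

```python
import math


def npmi(p_xy, p_x, p_y):
    """Normalized PMI from a joint probability and the two marginals."""
    if p_xy == 0.0:
        return -1.0  # assumed convention: the limiting value for p(x,y) -> 0
    if p_xy >= 1.0:
        raise ValueError("npmi is undefined (0/0) when p(x,y) = 1")
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


print(npmi(0.25, 0.5, 0.5))    # independence: p(x,y) = p(x) p(y)
print(npmi(0.25, 0.25, 0.25))  # complete co-occurrence: p(x,y) = p(x) = p(y)
print(npmi(0.0, 0.5, 0.5))     # outcomes that never occur together
```

The base of the logarithm cancels in the ratio, so natural logarithms are used here without affecting the result.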

## Chain-rule for pmi

Pointwise mutual information follows the chain rule, that is,

$\operatorname{pmi}(x;yz) = \operatorname{pmi}(x;y) + \operatorname{pmi}(x;z|y)$

This is easily proven by:

\begin{align} \operatorname{pmi}(x;y) + \operatorname{pmi}(x;z|y) & {} = \log\frac{p(x,y)}{p(x)p(y)} + \log\frac{p(x,z|y)}{p(x|y)p(z|y)} \\ & {} = \log \left[ \frac{p(x,y)}{p(x)p(y)} \frac{p(x,z|y)}{p(x|y)p(z|y)} \right] \\ & {} = \log \frac{p(x|y)p(y)p(x,z|y)}{p(x)p(y)p(x|y)p(z|y)} \\ & {} = \log \frac{p(x,yz)}{p(x)p(yz)} \\ & {} = \operatorname{pmi}(x;yz) \end{align}
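The identity can also be sanity-checked numerically. The sketch below assumes an arbitrary, randomly generated joint distribution over three binary variables; `random_joint`, `prob` and `chain_rule_gap` are hypothetical helper names introduced only for this check.

```python
import itertools
import math
import random


def random_joint(seed=0):
    """Strictly positive random joint distribution over (x, y, z) in {0,1}^3."""
    rng = random.Random(seed)
    weights = {xyz: rng.random() + 0.01
               for xyz in itertools.product((0, 1), repeat=3)}
    total = sum(weights.values())
    return {xyz: w / total for xyz, w in weights.items()}


def prob(joint, x=None, y=None, z=None):
    """Marginal probability of the given coordinates (None means summed out)."""
    return sum(p for (a, b, c), p in joint.items()
               if (x is None or a == x)
               and (y is None or b == y)
               and (z is None or c == z))


def chain_rule_gap(joint, x, y, z):
    """pmi(x;yz) - [pmi(x;y) + pmi(x;z|y)], which should be zero."""
    p_x, p_y = prob(joint, x=x), prob(joint, y=y)
    p_xy, p_yz = prob(joint, x=x, y=y), prob(joint, y=y, z=z)
    p_xyz = joint[(x, y, z)]
    pmi_x_y = math.log(p_xy / (p_x * p_y))
    # Conditional PMI: log p(x,z|y) / (p(x|y) p(z|y)).
    pmi_x_z_given_y = math.log((p_xyz / p_y) / ((p_xy / p_y) * (p_yz / p_y)))
    pmi_x_yz = math.log(p_xyz / (p_x * p_yz))
    return pmi_x_yz - (pmi_x_y + pmi_x_z_given_y)


joint = random_joint()
for xyz in itertools.product((0, 1), repeat=3):
    print(xyz, chain_rule_gap(joint, *xyz))
```

Each printed gap should be zero up to floating-point rounding, whatever seed is used.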

## References

1. ^ Kenneth Ward Church and Patrick Hanks (March 1990). "Word association norms, mutual information, and lexicography". Comput. Linguist. 16 (1): 22–29.