# Pointwise mutual information


Pointwise mutual information (PMI),[1] or point mutual information, is a measure of association used in information theory and statistics. Whereas mutual information (MI), which builds upon PMI, is an average over all possible events, PMI refers to a single event.

## Definition

The PMI of a pair of outcomes x and y belonging to discrete random variables X and Y quantifies the discrepancy between the probability of their coincidence under the joint distribution and the probability of their coincidence under the individual distributions, assuming independence. Mathematically:

$\operatorname{pmi}(x;y) \equiv \log\frac{p(x,y)}{p(x)p(y)} = \log\frac{p(x|y)}{p(x)} = \log\frac{p(y|x)}{p(y)}.$

The mutual information (MI) of the random variables X and Y is the expected value of the PMI over all possible outcomes (with respect to the joint distribution $p(x,y)$).
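Written out, this statement reads as follows, where the sum runs over outcome pairs with $p(x,y) > 0$ and $\operatorname{I}(X;Y)$ denotes the mutual information:

$\operatorname{I}(X;Y) = \sum_{x,y} p(x,y)\,\operatorname{pmi}(x;y) = \sum_{x,y} p(x,y) \log\frac{p(x,y)}{p(x)p(y)}.$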

The measure is symmetric ($\operatorname{pmi}(x;y)=\operatorname{pmi}(y;x)$). It can take positive or negative values, and is zero if X and Y are independent. Note that even though PMI may be negative or positive, its expected value over all joint events (MI) is non-negative. PMI is maximal when X and Y are perfectly associated (i.e. $p(x|y)=1$ or $p(y|x)=1$), yielding the following bounds:

$-\infty \leq \operatorname{pmi}(x;y) \leq \min\left[ -\log p(x), -\log p(y) \right] .$

Finally, $\operatorname{pmi}(x;y)$ will increase if $p(x|y)$ is fixed but $p(x)$ decreases.
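The upper bound can be read off the conditional forms of the definition. Because $p(x|y) \leq 1$ and $p(y|x) \leq 1$,

$\operatorname{pmi}(x;y) = \log\frac{p(x|y)}{p(x)} \leq -\log p(x), \qquad \operatorname{pmi}(x;y) = \log\frac{p(y|x)}{p(y)} \leq -\log p(y).$

Both inequalities hold at once, so PMI is bounded by the smaller of the two right-hand sides. The lower bound $-\infty$ is approached as $p(x,y) \to 0$ while $p(x)$ and $p(y)$ stay fixed.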

Here is an example to illustrate:

| x | y | p(x, y) |
|---|---|---------|
| 0 | 0 | 0.1 |
| 0 | 1 | 0.7 |
| 1 | 0 | 0.15 |
| 1 | 1 | 0.05 |

Using this table we can marginalize to get the following additional table for the individual distributions:

| value | p(x) | p(y) |
|-------|------|------|
| 0 | 0.8 | 0.25 |
| 1 | 0.2 | 0.75 |

With this example, we can compute four values for $\operatorname{pmi}(x;y)$. Using base-2 logarithms:

| x | y | pmi(x;y) |
|---|---|----------|
| 0 | 0 | −1 |
| 0 | 1 | 0.222392421 |
| 1 | 0 | 1.584962501 |
| 1 | 1 | −1.584962501 |

(For reference, the mutual information $\operatorname{I}(X;Y)$ would then be 0.214170945 bits.)
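The numbers in this example can be reproduced with a short script. The following is a minimal sketch, not a reference implementation; the names `joint`, `marginals`, `pmi`, and `mutual_information` are chosen here for illustration, and the `pmi` function assumes the requested pair has nonzero joint probability.

```python
import math

# Joint distribution p(x, y) from the example table.
joint = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.15, (1, 1): 0.05}


def marginals(joint):
    """Sum the joint distribution over each variable to get p(x) and p(y)."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py


def pmi(joint, x, y):
    """Base-2 PMI of the outcome pair (x, y); assumes p(x, y) > 0."""
    px, py = marginals(joint)
    return math.log2(joint[(x, y)] / (px[x] * py[y]))


def mutual_information(joint):
    """Expected PMI under the joint distribution, skipping zero-probability pairs."""
    return sum(p * pmi(joint, x, y) for (x, y), p in joint.items() if p > 0)


for (x, y) in joint:
    print(f"pmi(x={x}; y={y}) = {pmi(joint, x, y):.9f}")
print(f"I(X;Y) = {mutual_information(joint):.9f}")
```

Recomputing the marginals on every call keeps the sketch short; for larger distributions they would normally be computed once and reused.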

## Similarities to mutual information

Pointwise mutual information satisfies many of the same relationships as mutual information. In particular,

\begin{align} \operatorname{pmi}(x;y) &= h(x) + h(y) - h(x,y) \\ &= h(x) - h(x|y) \\ &= h(y) - h(y|x) \end{align}

where $h(x)$ is the self-information, or $-\log_2 p(X=x)$, and $h(x,y)$ and $h(x|y)$ are defined in the same way from $p(x,y)$ and $p(x|y)$.
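These identities can be checked numerically. The sketch below reuses the joint and marginal probabilities from the example in the Definition section; the helper names `h` and `pmi_forms` are assumptions made for illustration.

```python
import math

# Joint and marginal probabilities from the example in the Definition section.
joint = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.15, (1, 1): 0.05}
p_x = {0: 0.8, 1: 0.2}
p_y = {0: 0.25, 1: 0.75}


def h(p):
    """Self-information, in bits, of an outcome with probability p."""
    return -math.log2(p)


def pmi_forms(x, y):
    """Return PMI computed directly and via the three self-information identities."""
    pxy = joint[(x, y)]
    direct = math.log2(pxy / (p_x[x] * p_y[y]))
    via_joint = h(p_x[x]) + h(p_y[y]) - h(pxy)       # h(x) + h(y) - h(x,y)
    via_x_given_y = h(p_x[x]) - h(pxy / p_y[y])      # h(x) - h(x|y)
    via_y_given_x = h(p_y[y]) - h(pxy / p_x[x])      # h(y) - h(y|x)
    return direct, via_joint, via_x_given_y, via_y_given_x


for (x, y) in joint:
    print((x, y), pmi_forms(x, y))
```

The four values in each printed tuple should agree up to floating-point rounding.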

## Normalized pointwise mutual information (npmi)

Pointwise mutual information can be normalized to the interval [-1,+1], giving -1 (in the limit) for outcomes that never occur together, 0 for independence, and +1 for complete co-occurrence.[2]

$\operatorname{npmi}(x;y) = \frac{\operatorname{pmi}(x;y)}{-\log \left[ p(x, y) \right] }$
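A minimal sketch of this normalization follows, assuming the three probabilities are supplied directly. The function name `npmi` and its handling of edge cases are choices made for this illustration: it returns the limiting value -1 when $p(x,y)=0$ and rejects $p(x,y)=1$, where the ratio is 0/0. The base of the logarithm cancels in the ratio, so natural logarithms are used.

```python
import math


def npmi(p_xy, p_x, p_y):
    """Normalized PMI of one outcome pair from its joint and marginal probabilities.

    Assumes p_x > 0 and p_y > 0. Returns -1.0 when p_xy == 0 (the limiting
    value) and raises ValueError when p_xy >= 1, where the ratio is undefined.
    """
    if p_xy == 0:
        return -1.0
    if p_xy >= 1:
        raise ValueError("npmi is undefined when p(x, y) = 1")
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / (-math.log(p_xy))


print(npmi(0.25, 0.5, 0.5))  # independent outcomes
print(npmi(0.2, 0.2, 0.2))   # x and y always occur together
print(npmi(0.0, 0.3, 0.4))   # x and y never occur together (limit)
```

The three calls correspond to the three reference points of the normalized scale: independence, complete co-occurrence, and no co-occurrence.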

## Chain rule for PMI

Like MI,[3] PMI follows the chain rule, that is,

$\operatorname{pmi}(x;yz) = \operatorname{pmi}(x;y) + \operatorname{pmi}(x;z|y)$

This follows from the definitions:

\begin{align} \operatorname{pmi}(x;y) + \operatorname{pmi}(x;z|y) & {} = \log\frac{p(x,y)}{p(x)p(y)} + \log\frac{p(x,z|y)}{p(x|y)p(z|y)} \\ & {} = \log \left[ \frac{p(x,y)}{p(x)p(y)} \frac{p(x,z|y)}{p(x|y)p(z|y)} \right] \\ & {} = \log \frac{p(x|y)p(y)p(x,z|y)}{p(x)p(y)p(x|y)p(z|y)} \\ & {} = \log \frac{p(x,yz)}{p(x)p(yz)} \\ & {} = \operatorname{pmi}(x;yz) \end{align}
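The chain rule can also be checked numerically. The sketch below uses a hypothetical joint distribution over three binary variables (the weights are arbitrary and not taken from any source); the helper names `prob` and `chain_rule_sides` are assumptions made for illustration.

```python
import math
from itertools import product

# Hypothetical joint distribution p(x, y, z): arbitrary positive weights, normalized.
weights = [1, 2, 3, 4, 5, 6, 7, 8]
total = sum(weights)
joint = {xyz: w / total for xyz, w in zip(product((0, 1), repeat=3), weights)}

NAMES = ("x", "y", "z")


def prob(**fixed):
    """Marginal probability of a partial assignment, e.g. prob(x=0, z=1)."""
    return sum(
        p
        for outcome, p in joint.items()
        if all(outcome[NAMES.index(k)] == v for k, v in fixed.items())
    )


def chain_rule_sides(x, y, z):
    """Return (pmi(x;yz), pmi(x;y) + pmi(x;z|y)) in bits."""
    p_xyz = prob(x=x, y=y, z=z)
    p_xy = prob(x=x, y=y)
    p_yz = prob(y=y, z=z)
    p_x = prob(x=x)
    p_y = prob(y=y)
    pmi_x_yz = math.log2(p_xyz / (p_x * p_yz))
    pmi_x_y = math.log2(p_xy / (p_x * p_y))
    # pmi(x;z|y) = log [ p(x,z|y) / (p(x|y) p(z|y)) ]
    pmi_x_z_given_y = math.log2((p_xyz / p_y) / ((p_xy / p_y) * (p_yz / p_y)))
    return pmi_x_yz, pmi_x_y + pmi_x_z_given_y


for x, y, z in joint:
    lhs, rhs = chain_rule_sides(x, y, z)
    print((x, y, z), lhs, rhs)
```

For every outcome triple, the two printed values should agree up to floating-point rounding.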

## References

1. ^ Church, Kenneth Ward; Hanks, Patrick (March 1990). "Word association norms, mutual information, and lexicography". Comput. Linguist. 16 (1): 22–29.
2. ^ Bouma, Gerlof (2009). "Normalized (Pointwise) Mutual Information in Collocation Extraction" (PDF). Proceedings of the Biennial GSCL Conference.
3. ^ Williams, Paul L. Information Dynamics: Its Theory and Application to Embodied Cognitive Systems (PDF).
• Fano, R M (1961). "chapter 2". Transmission of Information: A Statistical Theory of Communications. MIT Press, Cambridge, MA. ISBN 978-0262561693.