Pointwise mutual information
Pointwise mutual information (PMI), or point mutual information, is a measure of association used in information theory and statistics.
Definition
The PMI of a pair of outcomes x and y belonging to discrete random variables X and Y quantifies the discrepancy between the probability of their coincidence given their joint distribution and the probability of their coincidence given only their individual distributions, assuming independence. Mathematically:
$$\operatorname{pmi}(x;y) \equiv \log\frac{p(x,y)}{p(x)\,p(y)} = \log\frac{p(x|y)}{p(x)} = \log\frac{p(y|x)}{p(y)}.$$
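The mutual information (MI) of the random variables X and Y is the expected value of the PMI over all possible outcomes, taken with respect to the joint distribution:
$$\operatorname{I}(X;Y) = \sum_{x,y} p(x,y)\,\operatorname{pmi}(x;y).$$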
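The two definitions above can be sketched in a few lines of Python. This is a minimal sketch that assumes the joint distribution is given as a dict mapping (x, y) pairs to probabilities. The helper names `pmi` and `mutual_information` are illustrative and do not come from any library. Logarithms default to base 2, as in the example below. Cells with p(x, y) = 0 are skipped, following the usual convention 0 · log 0 = 0.

```python
import math


def pmi(p_xy: float, p_x: float, p_y: float, base: float = 2.0) -> float:
    """Pointwise mutual information log[p(x,y) / (p(x) p(y))].

    Returns -inf when p(x,y) == 0, i.e. when x and y never co-occur.
    """
    if p_xy == 0.0:
        return -math.inf
    return math.log(p_xy / (p_x * p_y), base)


def mutual_information(joint: dict, base: float = 2.0) -> float:
    """I(X;Y) as the expectation of pmi(x;y) under the joint p(x,y).

    `joint` maps (x, y) pairs to probabilities. Cells with p(x,y) == 0 are
    skipped, following the usual convention 0 * log 0 = 0.
    """
    # Marginals p(x) and p(y), obtained by summing the joint table.
    p_x: dict = {}
    p_y: dict = {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    # Expectation of pmi(x;y), weighted by p(x,y).
    return sum(
        p * pmi(p, p_x[x], p_y[y], base)
        for (x, y), p in joint.items()
        if p > 0.0
    )


# Two independent fair coins: every pmi(x;y) is 0, so the mutual information is 0.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(mutual_information(independent))  # 0.0
```

The sketch computes the marginals inside `mutual_information`, so the caller only needs to supply the joint table.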
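The measure is symmetric: pmi(x;y) = pmi(y;x). It can take positive or negative values, and it is zero when X and Y are independent. PMI is largest when X and Y are perfectly associated (that is, when p(x|y) = 1 or p(y|x) = 1), which yields the following bounds:
$$-\infty \leq \operatorname{pmi}(x;y) \leq \min\left[ -\log p(x), -\log p(y) \right].$$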
Finally, pmi(x;y) will increase if p(x|y) is fixed but p(x) decreases, since pmi(x;y) = log [p(x|y) / p(x)].
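Three properties can be checked numerically: symmetry, the upper bound min[−log p(x), −log p(y)], and the growth of pmi(x;y) as p(x) shrinks while p(x|y) is held fixed. The Python sketch below does this with made-up probabilities chosen only for illustration. The helpers `pmi` and `pmi_upper_bound` are hypothetical names.

```python
import math


def pmi(p_xy: float, p_x: float, p_y: float, base: float = 2.0) -> float:
    """pmi(x;y) = log[p(x,y) / (p(x) p(y))]; -inf if x and y never co-occur."""
    if p_xy == 0.0:
        return -math.inf
    return math.log(p_xy / (p_x * p_y), base)


def pmi_upper_bound(p_x: float, p_y: float, base: float = 2.0) -> float:
    """min[-log p(x), -log p(y)], the largest value pmi(x;y) can reach."""
    return min(-math.log(p_x, base), -math.log(p_y, base))


# Symmetry: swapping the roles of x and y gives the same value.
assert pmi(0.15, 0.2, 0.25) == pmi(0.15, 0.25, 0.2)

# Perfect association (x occurs exactly when y does): pmi reaches the bound.
assert math.isclose(pmi(0.25, 0.25, 0.25), pmi_upper_bound(0.25, 0.25))

# Hold p(x|y) = 0.5 and p(y) = 0.4 fixed, so p(x,y) = 0.2, and let p(x) shrink.
p_x_given_y, p_y = 0.5, 0.4
p_xy = p_x_given_y * p_y
p_x_values = (0.8, 0.6, 0.4, 0.3)
values = [pmi(p_xy, p_x, p_y) for p_x in p_x_values]
assert all(a < b for a, b in zip(values, values[1:]))  # pmi increases
assert all(v <= pmi_upper_bound(p_x, p_y) for v, p_x in zip(values, p_x_values))
```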
Here is an example to illustrate:
| x | y | p(x, y) |
|---|---|---|
| 0 | 0 | 0.1 |
| 0 | 1 | 0.7 |
| 1 | 0 | 0.15 |
| 1 | 1 | 0.05 |
Using this table we can marginalize to get the following additional table for the individual distributions:
| Outcome | p(x) | p(y) |
|---|---|---|
| 0 | 0.8 | 0.25 |
| 1 | 0.2 | 0.75 |
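Marginalizing means summing the joint table over the other variable. Here is a small Python sketch, assuming the joint table is stored as a dict keyed by (x, y). The name `marginals` is a hypothetical helper.

```python
from collections import defaultdict


def marginals(joint: dict) -> tuple:
    """Sum a joint table {(x, y): p(x, y)} over y to get p(x), and over x to get p(y)."""
    p_x: defaultdict = defaultdict(float)
    p_y: defaultdict = defaultdict(float)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p
    return dict(p_x), dict(p_y)


# The joint table from the example above.
joint = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.15, (1, 1): 0.05}
p_x, p_y = marginals(joint)
for outcome in (0, 1):
    # Rounded to hide floating-point noise such as 0.7999999999999999.
    print(outcome, round(p_x[outcome], 10), round(p_y[outcome], 10))
```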
With this example, we can compute four values for pmi(x;y). Using base-2 logarithms:
| Outcome pair | Value (bits) |
|---|---|
| pmi(x=0;y=0) | −1 |
| pmi(x=0;y=1) | 0.222392421 |
| pmi(x=1;y=0) | 1.584962501 |
| pmi(x=1;y=1) | −1.584962501 |
(For reference, the mutual information I(X;Y) would then be 0.214170945 bits.)
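The four PMI values and the mutual information can be reproduced with a few lines of Python. This sketch hard-codes the example's joint table and uses `math.log2` for base-2 logarithms. Its printed values should match the table above up to rounding.

```python
import math

# Joint distribution p(x, y) from the example table.
joint = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.15, (1, 1): 0.05}

# Marginals obtained by summing out the other variable.
p_x = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
p_y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

# Base-2 PMI for each of the four outcome pairs.
pmi_table = {
    (x, y): math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in joint.items()
}

# Mutual information: the PMI values weighted by p(x, y).
mi = sum(joint[pair] * value for pair, value in pmi_table.items())

for (x, y), value in pmi_table.items():
    print(f"pmi(x={x};y={y}) = {value:.9f}")
print(f"I(X;Y) = {mi:.9f}")
```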
Similarities to mutual information
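Pointwise mutual information has many of the same relationships as the mutual information. In particular,
$$\begin{aligned}
\operatorname{pmi}(x;y) &= h(x) + h(y) - h(x,y) \\
&= h(x) - h(x|y) \\
&= h(y) - h(y|x)
\end{aligned}$$
where h(x) = −log₂ p(X = x) is the self-information of x. The quantities h(x, y) = −log₂ p(x, y) and h(x|y) = −log₂ p(x|y) are defined analogously.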
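The following is a quick numerical check of these identities. It uses the example's outcome pair x = 0, y = 1, where p(x) = 0.8, p(y) = 0.75 and p(x, y) = 0.7. It compares the direct PMI with each self-information form. The helper `h` is an illustrative name, and all logarithms are base 2.

```python
import math


def h(p: float) -> float:
    """Self-information in bits: -log2 p."""
    return -math.log2(p)


# Outcome pair x = 0, y = 1 from the example table.
p_x, p_y, p_xy = 0.8, 0.75, 0.7
p_x_given_y = p_xy / p_y
p_y_given_x = p_xy / p_x

pmi_value = math.log2(p_xy / (p_x * p_y))
forms = [
    h(p_x) + h(p_y) - h(p_xy),  # h(x) + h(y) - h(x,y)
    h(p_x) - h(p_x_given_y),    # h(x) - h(x|y)
    h(p_y) - h(p_y_given_x),    # h(y) - h(y|x)
]
assert all(math.isclose(pmi_value, f) for f in forms)
```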
Normalized pointwise mutual information (npmi)
Pointwise mutual information can be normalized to the interval [−1, +1]. The normalized value is −1 (in the limit) for outcomes that never occur together, 0 for independence, and +1 for complete co-occurrence:
$$\operatorname{npmi}(x;y) = \frac{\operatorname{pmi}(x;y)}{h(x,y)}, \qquad h(x,y) = -\log p(x,y).$$
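Here is a minimal Python sketch of normalization by the joint self-information h(x, y) = −log p(x, y). This is the choice that produces the limiting values −1, 0 and +1 described above. Two conventions are assumed here rather than taken from the source. First, p(x, y) = 0 is mapped to −1. Second, p(x, y) = 1 is rejected, because there the ratio is 0/0. The function name `npmi` is illustrative.

```python
import math


def npmi(p_xy: float, p_x: float, p_y: float) -> float:
    """Normalized PMI: pmi(x;y) / h(x,y), with h(x,y) = -log p(x,y).

    The logarithm base cancels in the ratio, so natural logs are used.
    Returns -1.0 for p(x,y) == 0, the limiting value for outcomes that
    never co-occur.
    """
    if p_xy == 0.0:
        return -1.0
    if p_xy == 1.0:
        # h(x,y) = 0 here, so the ratio is 0/0 and undefined.
        raise ValueError("npmi is undefined when p(x,y) == 1")
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / (-math.log(p_xy))


print(npmi(0.3, 0.3, 0.3))     # complete co-occurrence: close to +1
print(npmi(0.1, 0.2, 0.5))     # independence, p(x,y) = p(x) p(y): close to 0
print(npmi(1e-300, 0.5, 0.5))  # almost never together: approaches -1
```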
Chain rule for pmi
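Pointwise mutual information follows the chain rule, that is,
$$\operatorname{pmi}(x;yz) = \operatorname{pmi}(x;y) + \operatorname{pmi}(x;z|y).$$
This is easily proven by:
$$\begin{aligned}
\operatorname{pmi}(x;y) + \operatorname{pmi}(x;z|y) &= \log\frac{p(x,y)}{p(x)p(y)} + \log\frac{p(x,z|y)}{p(x|y)p(z|y)} \\
&= \log \left[ \frac{p(x,y)}{p(x)p(y)} \frac{p(x,z|y)}{p(x|y)p(z|y)} \right] \\
&= \log \frac{p(x|y)p(y)p(x,z|y)}{p(x)p(y)p(x|y)p(z|y)} \\
&= \log \frac{p(x,z|y)\,p(y)}{p(x)\,p(z|y)\,p(y)} \\
&= \log \frac{p(x,yz)}{p(x)p(yz)} \\
&= \operatorname{pmi}(x;yz)
\end{aligned}$$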
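The chain rule can also be checked numerically. This Python sketch draws a random joint distribution over three binary variables and computes each term from marginals. The seed and the +0.01 offset that keeps every probability positive are arbitrary choices. The sketch reports the largest absolute gap between pmi(x;yz) and pmi(x;y) + pmi(x;z|y), which should be at the level of floating-point rounding. The helpers `marginal` and `chain_rule_gap` are hypothetical names.

```python
import itertools
import math
import random

random.seed(0)

# A random joint distribution p(x, y, z) over three binary variables.
weights = {xyz: random.random() + 0.01 for xyz in itertools.product((0, 1), repeat=3)}
total = sum(weights.values())
p = {xyz: w / total for xyz, w in weights.items()}


def marginal(keep: tuple) -> dict:
    """Marginalize p(x, y, z) onto the variable positions listed in `keep`."""
    out: dict = {}
    for xyz, prob in p.items():
        key = tuple(xyz[i] for i in keep)
        out[key] = out.get(key, 0.0) + prob
    return out


p_x, p_y = marginal((0,)), marginal((1,))
p_xy, p_yz = marginal((0, 1)), marginal((1, 2))


def chain_rule_gap(x: int, y: int, z: int) -> float:
    """pmi(x;yz) - [pmi(x;y) + pmi(x;z|y)]; should be ~0 for every (x, y, z)."""
    pmi_x_yz = math.log2(p[(x, y, z)] / (p_x[(x,)] * p_yz[(y, z)]))
    pmi_x_y = math.log2(p_xy[(x, y)] / (p_x[(x,)] * p_y[(y,)]))
    # Conditionals: p(x,z|y) = p(x,y,z)/p(y), p(x|y) = p(x,y)/p(y), p(z|y) = p(y,z)/p(y).
    py = p_y[(y,)]
    pmi_x_z_given_y = math.log2(
        (p[(x, y, z)] / py) / ((p_xy[(x, y)] / py) * (p_yz[(y, z)] / py))
    )
    return pmi_x_yz - (pmi_x_y + pmi_x_z_given_y)


worst = max(abs(chain_rule_gap(*xyz)) for xyz in p)
print(worst)
```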
