Sanov's theorem

In information theory, Sanov's theorem gives a bound on the probability of observing an atypical sequence of samples from a given probability distribution. In the language of large deviations theory, Sanov's theorem identifies the rate function for large deviations of the empirical measure of a sequence of i.i.d. random variables.

Let $A$ be a set of probability distributions over an alphabet $X$, and let $q$ be an arbitrary distribution over $X$ (where $q$ may or may not be in $A$). Suppose we draw $n$ i.i.d. samples from $q$, represented by the vector $x^n = (x_1, x_2, \ldots, x_n)$. Further, let us ask that the empirical measure, $\hat{p}_{x^n}$, of the samples falls within the set $A$; formally, we write $\{\hat{p}_{x^n} \in A\}$. Then,

$$q^n\left(\hat{p}_{x^n} \in A\right) \le (n+1)^{|X|}\, 2^{-n D_{\mathrm{KL}}(p^* \| q)},$$

where

  • $q^n$ is the product distribution, shorthand for $q^n(x^n) = q(x_1)\,q(x_2) \cdots q(x_n)$, and
  • $p^* = \arg\min_{p \in A} D_{\mathrm{KL}}(p \,\|\, q)$ is the information projection of $q$ onto $A$.

In words, the probability of drawing an atypical distribution decays exponentially in $n$, at a rate given by the KL divergence from the true distribution to the atypical one; when a whole set of possible atypical distributions is considered, the exponent is dominated by a single element of the set, the one closest to $q$ in KL divergence, namely the information projection.
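
To make the bound concrete, here is a small numerical sketch (an illustration added for this discussion, not part of the theorem statement). It assumes a hypothetical setup: $q$ is a fair coin and $A = \{p : p(\text{heads}) \ge 0.7\}$; since $D_{\mathrm{KL}}(p \,\|\, q)$ is convex in $p$ and increasing on $[1/2, 1]$, the information projection is the boundary point with $p^*(\text{heads}) = 0.7$. The script compares the exact probability that the empirical measure falls in $A$ with the Sanov upper bound.

    import math

    def kl_bernoulli(p, q):
        """D(p || q) in bits between Bernoulli(p) and Bernoulli(q)."""
        return sum(a * math.log2(a / b)
                   for a, b in ((p, q), (1 - p, 1 - q)) if a > 0)

    # Hypothetical setup: q is a fair coin, A = {p : p(heads) >= 0.7},
    # so the information projection of q onto A is p* with p*(heads) = 0.7.
    q, p_star = 0.5, 0.7
    d = kl_bernoulli(p_star, q)  # D(p* || q), roughly 0.1187 bits

    for n in (50, 200, 1000):
        # Exact probability that the empirical measure lands in A:
        # at least ceil(0.7 * n) heads in n tosses of a fair coin.
        k_min = math.ceil(p_star * n)
        prob = sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2 ** n
        # Sanov upper bound (n+1)^|X| * 2^(-n * D), with |X| = 2.
        bound = (n + 1) ** 2 * 2 ** (-n * d)
        print(f"n={n:5d}  P = {prob:.3e}  Sanov bound = {bound:.3e}")

For small $n$ the polynomial prefactor $(n+1)^{|X|}$ makes the bound loose (it can even exceed 1), but the exponential factor $2^{-n D_{\mathrm{KL}}(p^* \| q)}$ controls the decay, which is the content of the theorem.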

Furthermore, if $A$ is the closure of its interior,

$$\lim_{n \to \infty} \frac{1}{n} \log_2 q^n\left(\hat{p}_{x^n} \in A\right) = -D_{\mathrm{KL}}(p^* \| q).$$
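
Continuing the same hypothetical coin example, this limit can be checked numerically: the finite-$n$ exponent $-\tfrac{1}{n} \log_2 q^n(\hat{p}_{x^n} \in A)$ should decrease toward $D_{\mathrm{KL}}(p^* \| q)$ as $n$ grows. A sketch under the same assumptions as above:

    import math

    def kl_bernoulli(p, q):
        """D(p || q) in bits between Bernoulli(p) and Bernoulli(q)."""
        return sum(a * math.log2(a / b)
                   for a, b in ((p, q), (1 - p, 1 - q)) if a > 0)

    q, p_star = 0.5, 0.7  # same hypothetical setup as above
    d = kl_bernoulli(p_star, q)

    for n in (100, 1000, 10000):
        k_min = math.ceil(p_star * n)
        # Number of length-n outcomes whose empirical measure lies in A.
        hits = sum(math.comb(n, k) for k in range(k_min, n + 1))
        # P = hits / 2^n, so -(1/n) log2 P = (n - log2(hits)) / n.
        # Taking log2 of the big integer directly avoids float underflow.
        rate = (n - math.log2(hits)) / n
        print(f"n={n:6d}  -1/n log2 P = {rate:.4f}  D(p*||q) = {d:.4f}")

The gap closes only at an $O(\log n / n)$ pace, reflecting the polynomial prefactor in the finite-$n$ bound.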

References

  • Cover, Thomas M.; Thomas, Joy A. (2006). Elements of Information Theory (2nd ed.). Hoboken, New Jersey: Wiley-Interscience. p. 362.
  • Sanov, I. N. (1957). "On the probability of large deviations of random variables" [О вероятности больших отклонений случайных величин]. Mat. Sbornik 42(84), No. 1, 11–44 (in Russian).