# Sanov's theorem

In mathematics and information theory, Sanov's theorem gives a bound on the probability of observing an atypical sequence of samples from a given probability distribution. In the language of large deviations theory, Sanov's theorem identifies the rate function for large deviations of the empirical measure of a sequence of i.i.d. random variables.

Let A be a set of probability distributions over an alphabet X, and let q be an arbitrary distribution over X (where q may or may not be in A). Suppose we draw n i.i.d. samples from q, represented by the vector ${\displaystyle x^{n}=x_{1},x_{2},\ldots ,x_{n}}$. Then, we have the following bound on the probability that the empirical measure ${\displaystyle {\hat {p}}_{x^{n}}}$ of the samples falls within the set A:

${\displaystyle q^{n}({\hat {p}}_{x^{n}}\in A)\leq (n+1)^{|X|}2^{-nD_{\mathrm {KL} }(p^{*}||q)}}$,

where

• ${\displaystyle q^{n}}$ is the joint probability distribution on ${\displaystyle X^{n}}$, and
• ${\displaystyle p^{*}}$ is the information projection of q onto A.

In words, the probability of observing an atypical empirical distribution decays exponentially, at a rate given by the KL divergence between that distribution and the true one. When A contains many possible atypical distributions, the exponent is governed by a single dominant one: the distribution in A closest to q in KL divergence, which is exactly the information projection ${\displaystyle p^{*}}$.
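As a concrete illustration, the information projection and the resulting bound can be computed numerically for a two-letter alphabet. The sketch below is plain Python; the specific choices q = (0.3, 0.7) and A = {p : p(1) ≥ 1/2} are illustrative assumptions, not taken from the text. It finds p* by a grid search over A and evaluates the right-hand side of the bound.

```python
import math

def kl(p, q):
    """KL divergence D(p || q) in bits, for two finite distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative setup (assumed, not from the text): binary alphabet,
# true distribution q, and A = {p : p(1) >= 1/2}.
q = (0.3, 0.7)

# Grid search for the information projection p* = argmin_{p in A} D(p || q).
grid = [0.5 + i / 1000 for i in range(500)]  # p(1) ranges over [0.5, 0.999]
p_star = min(((p1, 1 - p1) for p1 in grid), key=lambda p: kl(p, q))

# Sanov-type upper bound on q^n(empirical measure in A) for a given n.
n = 1000
bound = (n + 1) ** len(q) * 2 ** (-n * kl(p_star, q))
print(p_star, kl(p_star, q), bound)
```

Since D(p‖q) is increasing in p(1) on [q(1), 1], the grid search recovers p* = (1/2, 1/2); the bound is astronomically small for n = 1000, illustrating the exponential decay.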

Furthermore, if A is the closure of its interior, then

${\displaystyle \lim _{n\to \infty }{\frac {1}{n}}\log q^{n}({\hat {p}}_{x^{n}}\in A)=-D_{\mathrm {KL} }(p^{*}||q).}$
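This limit can be checked numerically in the binomial case, where the probability has a closed form. The sketch below assumes an illustrative Bernoulli setup (q = (0.3, 0.7), A = {p : p(1) ≥ 1/2}, so that p* = (1/2, 1/2)); it computes the exact probability that the empirical frequency of the first symbol is at least 1/2 and compares the normalized log-probability with D(p*‖q).

```python
import math

def log2_binom_pmf(n, k, q1):
    """log2 of the Binomial(n, q1) probability of exactly k successes."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb / math.log(2) + k * math.log2(q1) + (n - k) * math.log2(1 - q1)

def rate(n, q1, t):
    """-(1/n) log2 P(empirical frequency >= t) under n Bernoulli(q1) draws."""
    terms = [log2_binom_pmf(n, k, q1) for k in range(math.ceil(n * t), n + 1)]
    m = max(terms)  # log-sum-exp for numerical stability
    log2_p = m + math.log2(sum(2 ** (u - m) for u in terms))
    return -log2_p / n

q1, t = 0.3, 0.5
# D(p* || q) with p* = (t, 1 - t), the information projection onto A.
d_star = t * math.log2(t / q1) + (1 - t) * math.log2((1 - t) / (1 - q1))
for n in (100, 1000, 10000):
    print(n, rate(n, q1, t), d_star)
```

The computed rates approach D(p*‖q) ≈ 0.1258 bits from above as n grows, consistent with the limit (up to the polynomial prefactor, which vanishes at scale 1/n).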

## Technical statement

Define:

• ${\textstyle \Sigma }$ is a finite set with ${\textstyle |\Sigma |\geq 2}$, interpreted as the alphabet.
• ${\textstyle \Delta (\Sigma )}$ is the probability simplex over the alphabet, i.e. the set of probability vectors indexed by ${\textstyle \Sigma }$; it is a subset of ${\textstyle \mathbb {R} ^{\Sigma }}$.
• ${\textstyle L_{n}}$ is a random variable taking values in ${\textstyle \Delta (\Sigma )}$: draw ${\textstyle n}$ i.i.d. samples from the distribution ${\textstyle \mu }$; then ${\textstyle L_{n}}$ is the empirical measure of the sample, i.e. the vector of observed symbol frequencies.
• ${\textstyle {\mathcal {L}}_{n}}$ is the set of values that ${\textstyle L_{n}}$ can take, namely

${\displaystyle {\mathcal {L}}_{n}=\{(a_{1}/n,\dots ,a_{|\Sigma |}/n):\sum _{i}a_{i}=n,\,a_{i}\in \mathbb {N} \}.}$

Then Sanov's theorem states:[1]
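The set ${\textstyle {\mathcal {L}}_{n}}$ is finite: each coordinate ${\textstyle a_{i}/n}$ takes at most ${\textstyle n+1}$ values, so ${\textstyle |{\mathcal {L}}_{n}|\leq (n+1)^{|\Sigma |}}$, which is the polynomial factor appearing in the bound of the previous section; the exact count is a stars-and-bars binomial coefficient. A short enumeration sketch (the parameters n = 6 and a 3-letter alphabet are arbitrary illustrative choices):

```python
from itertools import product
from math import comb

def type_space(n, k):
    """All values L_n can take: vectors (a_1/n, ..., a_k/n) with sum a_i = n."""
    return [tuple(a / n for a in counts)
            for counts in product(range(n + 1), repeat=k)
            if sum(counts) == n]

n, k = 6, 3  # illustrative: 6 samples, 3-letter alphabet
types = type_space(n, k)
assert len(types) == comb(n + k - 1, k - 1)  # stars and bars: C(8, 2) = 28
assert len(types) <= (n + 1) ** k            # the crude polynomial bound
print(len(types))  # → 28
```

Because the number of types grows only polynomially in n while each type class has exponentially small probability, the polynomial factor does not affect the exponential rate.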

• For every measurable subset ${\textstyle S\subseteq \Delta (\Sigma )}$,

${\displaystyle -\inf _{\nu \in int(S)}D(\nu \|\mu )\leq \liminf _{n}{\frac {1}{n}}\ln P_{\mu }(L_{n}\in S)\leq \limsup _{n}{\frac {1}{n}}\ln P_{\mu }(L_{n}\in S)\leq -\inf _{\nu \in cl(S)}D(\nu \|\mu )}$
• For every open subset ${\textstyle U\subseteq \Delta (\Sigma )}$,

${\displaystyle -\lim _{n}\inf _{\nu \in U\cap {\mathcal {L}}_{n}}D(\nu \|\mu )=\lim _{n}{\frac {1}{n}}\ln P_{\mu }(L_{n}\in U)=-\inf _{\nu \in U}D(\nu \|\mu )}$

Here, ${\displaystyle int(S)}$ means the interior, and ${\displaystyle cl(S)}$ means the closure.

## References

1. Dembo, Amir; Zeitouni, Ofer (2010). *Large Deviations Techniques and Applications*. Stochastic Modelling and Applied Probability. Vol. 38. pp. 16–17. doi:10.1007/978-3-642-03311-7. ISBN 978-3-642-03310-0. ISSN 0172-4568.
• Sanov, I. N. (1957). "On the probability of large deviations of random variables" [О вероятности больших отклонений случайных величин]. *Mat. Sbornik* 42(84), No. 1, 11–44. (In Russian.)