In information theory, Sanov's theorem gives a bound on the probability of observing an atypical sequence of samples from a given probability distribution.
Let A be a set of probability distributions over an alphabet X, and let q be an arbitrary distribution over X (where q may or may not be in A). Suppose we draw n i.i.d. samples from q, represented by the vector $x^n = x_1, x_2, \ldots, x_n$. Then
$$q^n(\{x^n : {\hat{p}}_{x^n} \in A\}) \leq (n+1)^{|X|}\, 2^{-n D_{\mathrm{KL}}(p^* \| q)},$$
where $q^n(x^n)$ is shorthand for $q(x_1) q(x_2) \cdots q(x_n)$ and $q^n(S)$ is shorthand for $\sum_{x^n \in S} q^n(x^n)$, ${\hat{p}}_{x^n}$ is the empirical distribution of the sample $x^n$, and $p^*$ is the information projection of q onto A.
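The bound can be checked numerically on a small example (an illustrative sketch, not part of the source): take $q$ to be a fair coin, so $|X| = 2$, and let $A = \{p : p(\text{heads}) \ge 0.7\}$. Since the KL divergence from Bernoulli(t) to the fair coin is increasing in t for t > 1/2, the information projection here is $p^* = \mathrm{Bernoulli}(0.7)$, and the event $\{\hat{p}_{x^n} \in A\}$ is just "at least $\lceil 0.7n \rceil$ heads". All parameter choices below are hypothetical.

```python
from math import ceil, comb, log2

def kl_bernoulli(p: float, q: float) -> float:
    """D_KL(Bernoulli(p) || Bernoulli(q)) in bits."""
    out = 0.0
    for pi, qi in ((p, q), (1.0 - p, 1.0 - q)):
        if pi > 0.0:
            out += pi * log2(pi / qi)
    return out

q = 0.5          # sampling distribution: fair coin
threshold = 0.7  # A = {p : p(heads) >= 0.7}; information projection p* = Bernoulli(0.7)
n = 500

# Exact probability that the empirical distribution lands in A:
# at least ceil(0.7 * n) heads among n flips of the fair coin.
k_min = ceil(threshold * n)
exact = sum(comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(k_min, n + 1))

# Sanov bound: (n + 1)^|X| * 2^(-n * D_KL(p* || q)), with |X| = 2 here.
bound = (n + 1) ** 2 * 2.0 ** (-n * kl_bernoulli(threshold, q))

print(f"exact = {exact:.3e}, bound = {bound:.3e}")
```

For n = 500 the exact tail probability is on the order of $10^{-19}$ while the bound is on the order of $10^{-13}$: the bound holds, with the polynomial factor $(n+1)^{|X|}$ accounting for the slack.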
Furthermore, if A is a closed set,
$$\lim_{n \to \infty} \frac{1}{n} \log q^n(\{x^n : {\hat{p}}_{x^n} \in A\}) = -D_{\mathrm{KL}}(p^* \| q).$$
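The convergence of the exponent can also be observed numerically in a fair-coin example (an illustrative sketch with hypothetical parameters, not from the source): with $A = \{p : p(\text{heads}) \ge 0.7\}$ and $p^* = \mathrm{Bernoulli}(0.7)$, the normalized log-probability should approach $-D_{\mathrm{KL}}(p^* \| q) \approx -0.1187$ bits per sample. Because q is the fair coin, every length-n sequence has probability $2^{-n}$, so the probability can be computed exactly by counting sequences with integer arithmetic (avoiding floating-point underflow for large n).

```python
from math import ceil, comb, log2

def kl_bernoulli(p: float, q: float) -> float:
    """D_KL(Bernoulli(p) || Bernoulli(q)) in bits."""
    out = 0.0
    for pi, qi in ((p, q), (1.0 - p, 1.0 - q)):
        if pi > 0.0:
            out += pi * log2(pi / qi)
    return out

threshold = 0.7                        # A = {p : p(heads) >= 0.7}
limit = -kl_bernoulli(threshold, 0.5)  # limiting exponent, about -0.1187 bits/sample

rates = []
for n in (100, 400, 1600):
    # Under the fair coin, q^n({x^n : p_hat in A}) equals
    # (# length-n sequences with at least ceil(0.7 n) heads) / 2^n.
    count = sum(comb(n, k) for k in range(ceil(threshold * n), n + 1))
    rates.append((log2(count) - n) / n)  # (1/n) log2 of that probability
    print(f"n = {n:4d}: rate = {rates[-1]:+.5f}   (limit {limit:+.5f})")
```

The printed rates climb toward the limit from below; the remaining gap shrinks roughly like $(\log n)/n$, consistent with the polynomial prefactor in the finite-n bound.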