Sanov's theorem
Revision as of 02:59, 25 April 2011
In information theory, Sanov's theorem gives a bound on the probability of observing an atypical sequence of samples from a given probability distribution.
Let A be a set of probability distributions over an alphabet X, and let q be an arbitrary distribution over X (where q may or may not be in A). Suppose we draw n i.i.d. samples from q, represented by the vector <math>x^n = x_1, x_2, \ldots, x_n</math>. Then
- <math>q^n(x^n : \hat{p}_{x^n} \in A) \le (n+1)^{|X|} 2^{-n D_{\mathrm{KL}}(p^* \| q)},</math>
where
- <math>q^n(x^n)</math> is shorthand for <math>q(x_1)q(x_2)\cdots q(x_n)</math> and <math>q^n(S)</math> is shorthand for <math>\sum_{x^n\in S}q^n(x^n)</math>,
- <math>\hat{p}_{x^n}</math> is the empirical distribution of the sample <math>x^n</math>, and
- <math>p^*</math> is the information projection of q onto A.
Furthermore, if A is a closed set,
- <math>\lim_{n\to\infty}\frac{1}{n}\log q^n(x^n : \hat{p}_{x^n} \in A) = -D_{\mathrm{KL}}(p^* \| q).</math>
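Both the finite-n bound and the asymptotic rate can be checked numerically. The sketch below is an illustration of my own, not from the article: it assumes the binary alphabet X = {0, 1}, takes q = Bernoulli(0.3) and A = {p : p(1) ≥ 1/2}, so the information projection is p* = Bernoulli(0.5) (the point of A closest to q in KL divergence), and the exact probability that the empirical distribution lands in A is a binomial tail sum.

```python
import math

# Illustrative setup (my choice, not from the article):
# X = {0, 1}, q = Bernoulli(0.3), A = { p : p(1) >= 1/2 }.
# The information projection of q onto A is p* = Bernoulli(0.5).

def kl_bernoulli(p, q):
    """D_KL(Bernoulli(p) || Bernoulli(q)) in bits."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

n = 200
q1 = 0.3                      # q(1)
D = kl_bernoulli(0.5, q1)     # D_KL(p* || q)

# Sanov's bound: q^n(p-hat in A) <= (n+1)^|X| * 2^(-n D), with |X| = 2.
bound = (n + 1) ** 2 * 2 ** (-n * D)

# Exact value of q^n(p-hat in A): the empirical distribution lies in A
# exactly when at least n/2 of the n samples equal 1 (a binomial tail).
exact = sum(math.comb(n, k) * q1 ** k * (1 - q1) ** (n - k)
            for k in range(n // 2, n + 1))

print(f"exact probability = {exact:.3e}")
print(f"Sanov bound       = {bound:.3e}")
# For closed A, (1/n) log2 q^n(p-hat in A) -> -D as n -> infinity:
print(f"(1/n) log2(exact) = {math.log2(exact) / n:.4f}  vs  -D = {-D:.4f}")
```

For n = 200 the exact tail probability sits several orders of magnitude below the polynomial-times-exponential bound, and the per-sample exponent is already close to −D, consistent with the closed-set refinement.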