Sanov's theorem
Revision as of 02:59, 25 April 2011
In information theory, Sanov's theorem gives a bound on the probability of observing an atypical sequence of samples from a given probability distribution.
Let A be a set of probability distributions over an alphabet X, and let q be an arbitrary distribution over X (where q may or may not be in A). Suppose we draw n i.i.d. samples from q, represented by the vector <math>x^n = x_1, x_2, \ldots, x_n</math>. Then
- <math>q^n(x^n : \hat{p}_{x^n} \in A) \le (n+1)^{|X|} 2^{-n D_{\mathrm{KL}}(p^* \| q)},</math>
where
- <math>q^n(x^n)</math> is shorthand for <math>q(x_1)q(x_2)\cdots q(x_n)</math> and <math>q^n(S)</math> is shorthand for <math>\sum_{x^n\in S}q^n(x^n)</math>,
- <math>\hat{p}_{x^n}</math> is the empirical distribution of the sample <math>x^n</math>, and
- <math>p^*</math> is the information projection of q onto A.
Furthermore, if A is a closed set,
- <math>\lim_{n\to\infty}\frac{1}{n}\log q^n(x^n : \hat{p}_{x^n} \in A) = -D_{\mathrm{KL}}(p^* \| q).</math>
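Both the finite-n bound and the asymptotic rate can be checked numerically. The sketch below is an illustration of my own, not from the article: it assumes the binary alphabet X = {0, 1}, takes q = Bernoulli(0.3) and A = {p : p(1) ≥ 1/2}, so the information projection is p* = Bernoulli(0.5) (the point of A closest to q in KL divergence), and the exact probability that the empirical distribution lands in A is a binomial tail sum.

```python
import math

# Illustrative setup (my choice, not from the article):
# X = {0, 1}, q = Bernoulli(0.3), A = { p : p(1) >= 1/2 }.
# The information projection of q onto A is p* = Bernoulli(0.5).

def kl_bernoulli(p, q):
    """D_KL(Bernoulli(p) || Bernoulli(q)) in bits."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

n = 200
q1 = 0.3                      # q(1)
D = kl_bernoulli(0.5, q1)     # D_KL(p* || q)

# Sanov's bound: q^n(p-hat in A) <= (n+1)^|X| * 2^(-n D), with |X| = 2.
bound = (n + 1) ** 2 * 2 ** (-n * D)

# Exact value of q^n(p-hat in A): the empirical distribution lies in A
# exactly when at least n/2 of the n samples equal 1 (a binomial tail).
exact = sum(math.comb(n, k) * q1 ** k * (1 - q1) ** (n - k)
            for k in range(n // 2, n + 1))

print(f"exact probability = {exact:.3e}")
print(f"Sanov bound       = {bound:.3e}")
# For closed A, (1/n) log2 q^n(p-hat in A) -> -D as n -> infinity:
print(f"(1/n) log2(exact) = {math.log2(exact) / n:.4f}  vs  -D = {-D:.4f}")
```

For n = 200 the exact tail probability sits several orders of magnitude below the polynomial-times-exponential bound, and the per-sample exponent is already close to −D, consistent with the closed-set refinement.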