# Error exponent

In information theory, the error exponent of a channel code or source code is the rate at which the error probability decays exponentially with the block length of the code. Formally, it is the limiting ratio of the negative logarithm of the error probability to the block length, for large block lengths. For example, if the probability of error of a decoder drops as ${\displaystyle e^{-n\alpha }}$, where n is the block length, the error exponent is α. Many information-theoretic theorems are asymptotic in nature; for example, the channel coding theorem states that for any rate less than the channel capacity, the probability of error of the channel code can be made to go to zero as the block length goes to infinity. In practical situations, there are limits on the delay of the communication, so the block length must be finite. Therefore, it is important to study how quickly the probability of error drops as the block length grows.
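As a rough numerical illustration of this definition, the sketch below computes the negative logarithm of the error probability divided by n for an n-fold repetition code with majority decoding over a binary symmetric channel. The repetition code, the crossover probability `p`, and all function names are assumptions chosen for illustration and do not come from the text above; the limiting value used for comparison is the standard Chernoff exponent of a binomial tail.

```python
import math

def log_binomial_pmf(n, k, p):
    """Natural log of P(Binomial(n, p) = k), via lgamma to avoid underflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def log_error_probability(n, p):
    """ln P_error for an n-fold repetition code (n odd, assumed for illustration)
    with majority decoding over a binary symmetric channel with crossover p."""
    # Majority decoding fails when more than half of the n symbols are flipped.
    logs = [log_binomial_pmf(n, k, p) for k in range(n // 2 + 1, n + 1)]
    peak = max(logs)
    # Log-sum-exp keeps the tail sum representable for large n.
    return peak + math.log(sum(math.exp(v - peak) for v in logs))

def empirical_exponent(n, p):
    """-(1/n) ln P_error at a finite block length n."""
    return -log_error_probability(n, p) / n

def limiting_exponent(p):
    """Chernoff exponent of the event 'more than half the bits flip', p < 1/2."""
    return -0.5 * math.log(4 * p * (1 - p))

if __name__ == "__main__":
    for n in (11, 101, 1001):
        print(n, empirical_exponent(n, 0.1))
    print("limit", limiting_exponent(0.1))
```

The finite-n values approach the limit slowly because the error probability also carries a sub-exponential factor that the exponent ignores.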

## Error exponent in channel coding

### For time-invariant DMC's

The channel coding theorem states that for any ε > 0 and for any rate less than the channel capacity, there is an encoding and decoding scheme that ensures the probability of block error is less than ε for a sufficiently long block length. Conversely, for any rate greater than the channel capacity, the probability of block error at the receiver goes to one as the block length goes to infinity.

Assume a channel coding setup as follows: the channel can transmit any of ${\displaystyle M=2^{nR}\;}$ messages by transmitting the corresponding codeword (which is of length n). Each component in the codebook is drawn i.i.d. according to some probability distribution with probability mass function Q. At the decoding end, maximum likelihood (ML) decoding is done.

Given that X(1), the first message, is transmitted and ${\displaystyle y_{1}^{n}}$ is received, the probability that X(1) is incorrectly detected as X(2) is:

${\displaystyle P_{\mathrm {error} \ 1\to 2}=\sum _{x_{1}^{n}(2)}Q(x_{1}^{n}(2))1(p(y_{1}^{n}|x_{1}^{n}(2))>p(y_{1}^{n}|x_{1}^{n}(1)))}$

The indicator function ${\displaystyle 1(p(y_{1}^{n}|x_{1}^{n}(2))>p(y_{1}^{n}|x_{1}^{n}(1)))}$ has the upper bound

${\displaystyle \left({\frac {p(y_{1}^{n}|x_{1}^{n}(2))}{p(y_{1}^{n}|x_{1}^{n}(1))}}\right)^{s}}$

for ${\displaystyle s>0\;}$. Thus,

${\displaystyle P_{\mathrm {error} \ 1\to 2}\leq \sum _{x_{1}^{n}(2)}Q(x_{1}^{n}(2))\left({\frac {p(y_{1}^{n}|x_{1}^{n}(2))}{p(y_{1}^{n}|x_{1}^{n}(1))}}\right)^{s}.}$
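The indicator bound used in this step holds for every pair of positive likelihoods, which is easy to confirm numerically. The following minimal sketch does so for a few sample values; the function names and the sample likelihoods are hypothetical and serve only as an illustration.

```python
def indicator_greater(a, b):
    """The indicator 1(a > b) for two likelihood values a and b."""
    return 1.0 if a > b else 0.0

def ratio_bound(a, b, s):
    """The upper bound (a / b)^s used in place of the indicator, for s > 0."""
    return (a / b) ** s

def bound_holds(a, b, s):
    """True when 1(a > b) <= (a / b)^s."""
    return indicator_greater(a, b) <= ratio_bound(a, b, s)

if __name__ == "__main__":
    # Hypothetical likelihood values, chosen only to exercise both cases.
    samples = [0.01, 0.2, 0.5, 0.9]
    for s in (0.25, 0.5, 1.0, 2.0):
        print(s, all(bound_holds(a, b, s) for a in samples for b in samples))
```

When a > b the ratio exceeds 1, so any positive power of it is at least 1; otherwise the indicator is 0 and the bound is nonnegative.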

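Since there are a total of M messages, the ordinary union bound shows that the probability that X(1) is confused with any other message is at most M times the above expression. Since each entry in the codebook is i.i.d., the notation X(2) can be replaced simply by X. Using a variation of the union bound (the "hokey" union bound), in which the sum of the pairwise error probabilities is raised to a power ${\displaystyle \rho \in [0,1]}$, the probability of confusing X(1) with any message is bounded by: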

${\displaystyle P_{\mathrm {error} \ 1\to \mathrm {any} }\leq M^{\rho }\left(\sum _{x_{1}^{n}}Q(x_{1}^{n})\left({\frac {p(y_{1}^{n}|x_{1}^{n})}{p(y_{1}^{n}|x_{1}^{n}(1))}}\right)^{s}\right)^{\rho }.}$
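The union-bound variation used in this step can be justified in one line. Write ${\displaystyle A_{m}}$ for the event that message 1 is confused with message m (a symbol introduced here for illustration only). A probability never exceeds 1, and ${\displaystyle \min(1,t)\leq t^{\rho }}$ for any ${\displaystyle t\geq 0}$ and ${\displaystyle \rho \in [0,1]}$, so that

${\displaystyle P\left(\bigcup _{m\neq 1}A_{m}\right)\leq \min \left\{1,\sum _{m\neq 1}P(A_{m})\right\}\leq \left(\sum _{m\neq 1}P(A_{m})\right)^{\rho },\qquad 0\leq \rho \leq 1.}$

Because the M − 1 competing codewords are identically distributed, the right-hand side is at most ${\displaystyle M^{\rho }}$ times the ρ-th power of the pairwise bound. Setting ρ = 1 recovers the ordinary union bound.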

Averaging over all combinations of ${\displaystyle X_{1}^{n}(1),y_{1}^{n}}$:

${\displaystyle P_{\mathrm {error} \ 1\to \mathrm {any} }\leq M^{\rho }\sum _{y_{1}^{n}}\left(\sum _{x_{1}^{n}(1)}Q(x_{1}^{n}(1))[p(y_{1}^{n}|x_{1}^{n}(1))]^{1-s\rho }\right)\left(\sum _{x_{1}^{n}}Q(x_{1}^{n})[p(y_{1}^{n}|x_{1}^{n})]^{s}\right)^{\rho }.}$

Choosing ${\displaystyle s=1-s\rho }$ (that is, ${\displaystyle s={\frac {1}{1+\rho }}}$) and combining the two sums over ${\displaystyle x_{1}^{n}}$ in the above formula:

${\displaystyle P_{\mathrm {error} \ 1\to \mathrm {any} }\leq M^{\rho }\sum _{y_{1}^{n}}\left(\sum _{x_{1}^{n}}Q(x_{1}^{n})[p(y_{1}^{n}|x_{1}^{n})]^{\frac {1}{1+\rho }}\right)^{1+\rho }.}$

Using the independence of the elements of the codeword and the discrete memoryless nature of the channel:

${\displaystyle P_{\mathrm {error} \ 1\to \mathrm {any} }\leq M^{\rho }\prod _{i=1}^{n}\sum _{y_{i}}\left(\sum _{x_{i}}Q_{i}(x_{i})[p_{i}(y_{i}|x_{i})]^{\frac {1}{1+\rho }}\right)^{1+\rho }}$

Using the fact that each element of the codeword is identically distributed and thus stationary:

${\displaystyle P_{\mathrm {error} \ 1\to \mathrm {any} }\leq M^{\rho }\left(\sum _{y}\left(\sum _{x}Q(x)[p(y|x)]^{\frac {1}{1+\rho }}\right)^{1+\rho }\right)^{n}.}$
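The last two steps reduce a sum over length-n sequences to the n-th power of a single-letter sum. The sketch below checks that identity by brute force for a small binary-input, binary-output channel. The channel matrix `P` (with `P[x][y]` standing for p(y|x)), the input distribution `Q`, and the function names are assumptions made for illustration.

```python
import itertools
import math

def single_letter_sum(rho, Q, P):
    """sum_y ( sum_x Q(x) p(y|x)^(1/(1+rho)) )^(1+rho), with P[x][y] = p(y|x)."""
    s = 1.0 / (1.0 + rho)
    return sum(
        sum(Q[x] * P[x][y] ** s for x in range(len(Q))) ** (1.0 + rho)
        for y in range(len(P[0]))
    )

def block_sum(rho, Q, P, n):
    """The same quantity computed over all length-n input and output sequences,
    assuming i.i.d. codeword symbols and a memoryless, time-invariant channel."""
    s = 1.0 / (1.0 + rho)
    total = 0.0
    for ys in itertools.product(range(len(P[0])), repeat=n):
        inner = 0.0
        for xs in itertools.product(range(len(Q)), repeat=n):
            q = math.prod(Q[x] for x in xs)                   # Q(x_1^n)
            p = math.prod(P[x][y] for x, y in zip(xs, ys))    # p(y_1^n | x_1^n)
            inner += q * p ** s
        total += inner ** (1.0 + rho)
    return total

if __name__ == "__main__":
    Q = [0.3, 0.7]                    # hypothetical input distribution
    P = [[0.9, 0.1], [0.2, 0.8]]      # hypothetical channel matrix
    for rho in (0.0, 0.5, 1.0):
        print(rho, block_sum(rho, Q, P, 3), single_letter_sum(rho, Q, P) ** 3)
```

The brute-force sum grows exponentially with n, so it is only practical for very short blocks; the single-letter form is what makes the bound computable.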

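Replacing M by ${\displaystyle e^{nR}}$ (that is, measuring the rate R in nats; with R in bits, ${\displaystyle M=2^{nR}}$ and R in the bound below is replaced by ${\displaystyle R\ln 2}$) and defining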

${\displaystyle E_{o}(\rho ,Q)=-\ln \left(\sum _{y}\left(\sum _{x}Q(x)[p(y|x)]^{\frac {1}{1+\rho }}\right)^{1+\rho }\right),}$

the probability of error becomes

${\displaystyle P_{\mathrm {error} }\leq \exp(-n(E_{o}(\rho ,Q)-\rho R)).}$

Q and ${\displaystyle \rho }$ should be chosen so that the bound is tightest. Thus, the error exponent can be defined as

${\displaystyle E_{r}(R)=\max _{Q}\max _{\rho \in [0,1]}\left[E_{o}(\rho ,Q)-\rho R\right].\;}$
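To make the definition concrete, the following sketch evaluates the function E_o(ρ, Q) and the resulting exponent for a binary symmetric channel. It is a minimal illustration under stated assumptions: the rate R is in nats, the maximization over Q is skipped in favor of a fixed uniform input distribution (which is optimal for a symmetric channel), the maximization over ρ uses a simple grid, and the function names are hypothetical.

```python
import math

def e0(rho, Q, P):
    """E_o(rho, Q) = -ln sum_y ( sum_x Q(x) p(y|x)^(1/(1+rho)) )^(1+rho),
    where P[x][y] stands for p(y|x)."""
    s = 1.0 / (1.0 + rho)
    total = sum(
        sum(Q[x] * P[x][y] ** s for x in range(len(Q))) ** (1.0 + rho)
        for y in range(len(P[0]))
    )
    return -math.log(total)

def random_coding_exponent(R, Q, P, steps=1000):
    """max over rho in [0, 1] of E_o(rho, Q) - rho * R for a FIXED Q,
    using a uniform grid of steps + 1 values of rho. R is in nats."""
    return max(e0(k / steps, Q, P) - (k / steps) * R for k in range(steps + 1))

if __name__ == "__main__":
    p = 0.1                               # hypothetical crossover probability
    P = [[1 - p, p], [p, 1 - p]]          # binary symmetric channel
    Q = [0.5, 0.5]                        # uniform input distribution
    for R in (0.0, 0.1, 0.2, 0.3, 0.4):
        print(R, random_coding_exponent(R, Q, P))
```

For this channel the capacity is about 0.368 nats, so the exponent is positive for the first four rates and zero at R = 0.4, where the maximum is attained at ρ = 0.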

## Error exponent in source coding

### For time-invariant discrete memoryless sources

The source coding theorem states that for any ${\displaystyle \varepsilon >0}$ and any discrete-time i.i.d. source such as ${\displaystyle X}$ and for any rate greater than the entropy of the source, there is a large enough ${\displaystyle n}$ and an encoder that takes ${\displaystyle n}$ i.i.d. repetitions of the source, ${\displaystyle X^{1:n}}$, and maps them to ${\displaystyle n(H(X)+\varepsilon )}$ binary bits such that the source symbols ${\displaystyle X^{1:n}}$ are recoverable from the binary bits with probability at least ${\displaystyle 1-\varepsilon }$.

Let ${\displaystyle M=e^{nR}\,\!}$ be the total number of possible messages. Next, map each of the possible source output sequences to one of the messages randomly, using a uniform distribution and independently of everything else. When a source sequence is generated, the corresponding message ${\displaystyle M=m\,}$ is transmitted to the destination. The message is decoded to one of the possible source strings. In order to minimize the probability of error, the decoder decodes to the source sequence ${\displaystyle X_{1}^{n}}$ that maximizes ${\displaystyle P(X_{1}^{n}|A_{m})}$, where ${\displaystyle A_{m}\,}$ denotes the event that message ${\displaystyle m}$ was transmitted. This rule is equivalent to finding the source sequence ${\displaystyle X_{1}^{n}}$, among the set of source sequences that map to message ${\displaystyle m}$, that maximizes ${\displaystyle P(X_{1}^{n})}$. This reduction follows from the fact that the messages were assigned randomly and independently of everything else.

Thus, as an example of when an error occurs, suppose that the source sequence ${\displaystyle X_{1}^{n}(1)}$ was mapped to message ${\displaystyle 1}$, as was the source sequence ${\displaystyle X_{1}^{n}(2)}$. If ${\displaystyle X_{1}^{n}(1)\,}$ was generated at the source, but ${\displaystyle P(X_{1}^{n}(2))>P(X_{1}^{n}(1))}$, then an error occurs.

Let ${\displaystyle S_{i}\,}$ denote the event that the source sequence ${\displaystyle X_{1}^{n}(i)}$ was generated at the source, so that ${\displaystyle P(S_{i})=P(X_{1}^{n}(i))\,.}$ Then the probability of error can be broken down as ${\displaystyle P(E)=\sum _{i}P(E|S_{i})P(S_{i})\,.}$ Thus, attention can be focused on finding an upper bound on ${\displaystyle P(E|S_{i})\,}$.

Let ${\displaystyle A_{i'}\,}$ denote the event that the source sequence ${\displaystyle X_{1}^{n}(i')}$ was mapped to the same message as the source sequence ${\displaystyle X_{1}^{n}(i)}$ and that ${\displaystyle P(X_{1}^{n}(i'))\geq P(X_{1}^{n}(i))}$. Thus, letting ${\displaystyle X_{i,i'}\,}$ denote the event that the two source sequences ${\displaystyle i\,}$ and ${\displaystyle i'\,}$ map to the same message, we have that

${\displaystyle P(A_{i'})=P\left(X_{i,i'}\bigcap \left\{P(X_{1}^{n}(i'))\geq P(X_{1}^{n}(i))\right\}\right)\,}$

and, using the fact that ${\displaystyle P(X_{i,i'})={\frac {1}{M}}\,}$ and that this event is independent of everything else, we have that

${\displaystyle P(A_{i'})={\frac {1}{M}}P(P(X_{1}^{n}(i'))\geq P(X_{1}^{n}(i)))\,.}$

A simple upper bound for the probability on the right-hand side above can be established as

${\displaystyle \left[P(P(X_{1}^{n}(i'))\geq P(X_{1}^{n}(i)))\right]\leq \left({\frac {P(X_{1}^{n}(i'))}{P(X_{1}^{n}(i))}}\right)^{s}\,}$

for some arbitrary real number ${\displaystyle s>0\,.}$ This upper bound can be verified by noting that ${\displaystyle P(P(X_{1}^{n}(i'))\geq P(X_{1}^{n}(i)))\,}$ equals either ${\displaystyle 1\,}$ or ${\displaystyle 0\,}$, because the probabilities of the given source sequences are deterministic. If ${\displaystyle P(X_{1}^{n}(i'))\geq P(X_{1}^{n}(i))\,,}$ then ${\displaystyle {\frac {P(X_{1}^{n}(i'))}{P(X_{1}^{n}(i))}}\geq 1\,}$, so its s-th power is at least 1 and the inequality holds in that case. The inequality holds in the other case as well because

${\displaystyle \left({\frac {P(X_{1}^{n}(i'))}{P(X_{1}^{n}(i))}}\right)^{s}\geq 0\,}$

for all possible source strings. Thus, combining everything and introducing some ${\displaystyle \rho \in [0,1]\,}$, we have that

${\displaystyle P(E|S_{i})\leq P\left(\bigcup _{i'\neq i}A_{i'}\right)\leq \left(\sum _{i'\neq i}P(A_{i'})\right)^{\rho }\leq \left({\frac {1}{M}}\sum _{i'\neq i}\left({\frac {P(X_{1}^{n}(i'))}{P(X_{1}^{n}(i))}}\right)^{s}\right)^{\rho }\,.}$

The inequalities follow from a variation on the union bound. Finally, applying this upper bound to the summation for ${\displaystyle P(E)\,}$, we have that:

${\displaystyle P(E)=\sum _{i}P(E|S_{i})P(S_{i})\leq \sum _{i}P(X_{1}^{n}(i))\left({\frac {1}{M}}\sum _{i'}\left({\frac {P(X_{1}^{n}(i'))}{P(X_{1}^{n}(i))}}\right)^{s}\right)^{\rho }\,.}$

The sum can now be taken over all ${\displaystyle i'\,}$, because that only increases the bound. This ultimately yields

${\displaystyle P(E)\leq {\frac {1}{M^{\rho }}}\sum _{i}P(X_{1}^{n}(i))^{1-s\rho }\left(\sum _{i'}P(X_{1}^{n}(i'))^{s}\right)^{\rho }\,.}$

Now for simplicity let ${\displaystyle 1-s\rho =s\,}$ so that ${\displaystyle s={\frac {1}{1+\rho }}\,.}$ Substituting this new value of ${\displaystyle s\,}$ into the above bound on the probability of error and using the fact that ${\displaystyle i'\,}$ is just a dummy variable in the sum gives the following as an upper bound on the probability of error:

${\displaystyle P(E)\leq {\frac {1}{M^{\rho }}}\left(\sum _{i}P(X_{1}^{n}(i))^{\frac {1}{1+\rho }}\right)^{1+\rho }\,.}$

Here ${\displaystyle M=e^{nR}\,\!}$ and the components of ${\displaystyle X_{1}^{n}(i)\,}$ are i.i.d. Thus, simplifying the above equation yields

${\displaystyle P(E)\leq \exp \left(-n\left[\rho R-\ln \left(\sum _{x_{i}}P(x_{i})^{\frac {1}{1+\rho }}\right)(1+\rho )\right]\right).}$

The term in the exponent should be maximized over ${\displaystyle \rho \,}$ in order to achieve the tightest upper bound on the probability of error.
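The block-level bound derived above can be compared with the exact union probability for a small example. The sketch below does this for a Bernoulli(p) source with p < 1/2, where a sequence is at least as likely as another exactly when it has no more ones. The Bernoulli source, the parameter values, and the function names are assumptions for illustration; `M` is passed directly as an integer number of messages.

```python
import math

def union_probability(n, p, M):
    """Exact value of sum_i P(S_i) * P(union over i' != i of A_i') under random
    binning into M messages, for a Bernoulli(p) source with p < 1/2."""
    total = 0.0
    at_least_as_likely = 0                # number of sequences with weight <= k
    for k in range(n + 1):
        at_least_as_likely += math.comb(n, k)
        prob = p ** k * (1 - p) ** (n - k)        # probability of one weight-k sequence
        rivals = at_least_as_likely - 1           # exclude the sequence itself
        # Each rival lands in the same bin independently with probability 1/M.
        collide = 1.0 - (1.0 - 1.0 / M) ** rivals
        total += math.comb(n, k) * prob * collide
    return total

def block_bound(n, p, M, rho):
    """M^(-rho) * ( sum_i P(x(i))^(1/(1+rho)) )^(1+rho), summed by sequence weight."""
    s = 1.0 / (1.0 + rho)
    total = sum(math.comb(n, k) * (p ** k * (1 - p) ** (n - k)) ** s
                for k in range(n + 1))
    return total ** (1.0 + rho) / M ** rho

if __name__ == "__main__":
    n, p, M = 12, 0.1, 256                # hypothetical parameters
    print("union probability", union_probability(n, p, M))
    for rho in (0.0, 0.25, 0.5, 1.0):
        print(rho, block_bound(n, p, M, rho))
```

The union probability is itself an upper bound on P(E), so the comparison shows how much is lost in the later steps of the derivation for each choice of ρ.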

Letting ${\displaystyle E_{0}(\rho )=\ln \left(\sum _{x_{i}}P(x_{i})^{\frac {1}{1+\rho }}\right)(1+\rho )\,,}$ we see that the error exponent for the source coding case is:

${\displaystyle E_{r}(R)=\max _{\rho \in [0,1]}\left[\rho R-E_{0}(\rho )\right].\,}$
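The source coding exponent can be evaluated in the same way as the channel coding one. The sketch below computes E_0(ρ) and a grid approximation of E_r(R) for a binary source. The source distribution, the grid resolution, and the function names are assumptions for illustration, and the rate R is in nats.

```python
import math

def source_e0(rho, P):
    """E_0(rho) = (1 + rho) * ln( sum_x P(x)^(1/(1+rho)) ) for a source pmf P."""
    return (1.0 + rho) * math.log(sum(px ** (1.0 / (1.0 + rho)) for px in P))

def source_error_exponent(R, P, steps=1000):
    """max over rho in [0, 1] of rho * R - E_0(rho), on a uniform grid. R in nats."""
    return max((k / steps) * R - source_e0(k / steps, P) for k in range(steps + 1))

def entropy_nats(P):
    """Entropy of the pmf P in nats, for comparison with the rate."""
    return -sum(px * math.log(px) for px in P if px > 0)

if __name__ == "__main__":
    P = [0.9, 0.1]                        # hypothetical binary source
    print("entropy", entropy_nats(P))
    for R in (0.3, 0.4, 0.5, 0.6):
        print(R, source_error_exponent(R, P))
```

For this source the entropy is about 0.325 nats, so the exponent is zero at R = 0.3 and positive at the larger rates.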