User:InfoTheorist/Shannon's theorem


In information theory, a result is known as a converse if it provides an upper bound on the achievable rate for a given channel. Thus a converse, combined with a matching achievability result (lower bound), determines the capacity of a channel. In general, two types of converse results have been studied. The first type, known as a weak converse, provides an upper bound on the rate achievable with a given probability of error ε. This implies that if a transmitter sends at a rate higher than the bound provided by the weak converse, the probability of error will be greater than ε. A strong converse, on the other hand, provides a much stronger result: if a transmitter sends at a rate higher than the given bound, the probability of error will not only be greater than ε, but will converge to one as codes with larger blocklengths are used.

In Shannon's noisy-channel coding theorem, Fano's inequality is used to obtain a weak converse. However, while Fano's inequality provides a non-vanishing lower bound on the probability of error when the transmitter's rate exceeds the channel capacity, it is not sufficient to prove that the probability of error converges to one as the blocklength goes to infinity.
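
As an illustrative sketch (not part of the original argument), one common form of the weak converse obtained from Fano's inequality is $P_e^{(n)} \geq 1 - C/R - 1/(nR)$ for any $(2^{nR}, n)$ code. The following Python fragment evaluates this bound assuming, purely for concreteness, a binary symmetric channel with crossover probability 0.1 and a rate $R = 0.8$ above capacity; these parameters are assumptions chosen only for illustration.

import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Assumed example: BSC with crossover probability 0.1, so C = 1 - h2(0.1) (about 0.531 bits).
C = 1 - h2(0.1)
R = 0.8  # a rate above capacity

for n in (10, 100, 1000, 10000):
    fano_bound = 1 - C / R - 1 / (n * R)   # weak converse: P_e >= 1 - C/R - 1/(nR)
    print(n, round(fano_bound, 4))

Note that this lower bound converges to $1 - C/R \approx 0.336$, a constant strictly less than one, which illustrates why the weak converse alone cannot show that the probability of error tends to one.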

Problem Formulation

Let $\mathcal{X}$ and $\mathcal{Y}$ be sets. A channel, with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$, is a sequence of conditional probability distributions $p_1, p_2, \ldots$ where

$$p_n(y^n \mid x^n), \qquad x^n \in \mathcal{X}^n, \; y^n \in \mathcal{Y}^n,$$

is, for every $x^n \in \mathcal{X}^n$, a probability distribution on $\mathcal{Y}^n$.

The channel is said to be discrete if both $\mathcal{X}$ and $\mathcal{Y}$ are finite sets. A channel is called memoryless if for every positive integer $n$ we have

$$p_n(y^n \mid x^n) = \prod_{k=1}^{n} q_k(y_k \mid x_k)$$

for a sequence of single-letter transition probabilities $q_1, q_2, \ldots$.

We say a channel is stationary if every stationary input results in a stationary output. A memoryless channel is stationary if the functions $q_k$ are equal for all $k$.

Therefore a stationary memoryless channel can be simply represented as the triple $(\mathcal{X}, p(y \mid x), \mathcal{Y})$, where $p(y \mid x)$ denotes the common single-letter transition probability.
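
As a minimal sketch of this representation (assuming, for concreteness, a binary symmetric channel with crossover probability 0.1 -- an example, not part of the original text), a discrete stationary memoryless channel can be stored as an input alphabet, a row-stochastic transition matrix $p(y \mid x)$, and an output alphabet:

# A stationary memoryless channel as the triple (input alphabet, p(y|x), output alphabet).
X_alphabet = [0, 1]
Y_alphabet = [0, 1]
eps = 0.1                                   # assumed crossover probability
p_y_given_x = [[1 - eps, eps],              # row x = 0
               [eps, 1 - eps]]              # row x = 1

# Each row of p(y|x) is a probability distribution over the output alphabet.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in p_y_given_x)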

A $(2^{nR}, n)$ code consists of a message set

$$\mathcal{M} = \{1, 2, \ldots, 2^{nR}\},$$

an encoder

$$x^n : \mathcal{M} \to \mathcal{X}^n,$$

and a decoder

$$g : \mathcal{Y}^n \to \mathcal{M}.$$

The average probability of error of the code is given by

$$P_e^{(n)} = \frac{1}{2^{nR}} \sum_{m=1}^{2^{nR}} \Pr\bigl\{ g(Y^n) \neq m \;\bigm|\; X^n = x^n(m) \bigr\}.$$

The value of n is known as the blocklength of the code.

A rate $R$ (which is a nonnegative number) is said to be achievable if there exists a sequence of $(2^{nR}, n)$ codes with $P_e^{(n)}$ going to zero as $n$ goes to infinity. The noisy-channel coding theorem states that a rate $R$ is achievable if and only if $R$ is smaller than the capacity $C$ of the channel, where

$$C = \max_{p_X} I(X; Y).$$
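
The following sketch approximates this maximization numerically, assuming (purely for illustration) a binary symmetric channel with crossover probability 0.1; it evaluates $I(X;Y)$ over a grid of input distributions and compares the maximum with the closed form $1 - H_b(\varepsilon)$.

import math

eps = 0.1                                    # assumed BSC crossover probability
P = [[1 - eps, eps], [eps, 1 - eps]]         # P[x][y] = p(y|x)

def mutual_information(p0):
    """I(X;Y) in bits for the input distribution (p0, 1 - p0)."""
    px = [p0, 1 - p0]
    py = [sum(px[x] * P[x][y] for x in range(2)) for y in range(2)]
    return sum(px[x] * P[x][y] * math.log2(P[x][y] / py[y])
               for x in range(2) for y in range(2) if px[x] > 0)

# C = max over input distributions of I(X;Y); crude grid search over p0.
C = max(mutual_information(k / 10000) for k in range(10001))
print(C)                                                          # about 0.531
print(1 + eps * math.log2(eps) + (1 - eps) * math.log2(1 - eps))  # closed form 1 - H_b(eps)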

Wolfowitz's theorem states that for any discrete memoryless channel with capacity $C$ and any $(2^{nR}, n)$ code with $M = 2^{nR}$ codewords and rate $R > C$,

$$P_e^{(n)} \;\geq\; 1 - \frac{4A}{n(R - C)^2} - 2^{-n(R - C)/2}$$

for some positive constant $A$ dependent on the channel but not on $n$ or $M$. In particular, $P_e^{(n)} \to 1$ as $n \to \infty$. The proof that follows is based on Gallager's book.[1]

Proof

For the proof we first require a lemma. This lemma is essentially a special case of the method of Lagrange multipliers for a concave function defined on the standard simplex $\Delta = \{ (p_1, \ldots, p_K) \in \mathbb{R}^K : p_k \geq 0,\ \sum_{k=1}^{K} p_k = 1 \}$. It is then followed by a corollary which simply applies the lemma to the mutual information.

Lemma

Let $f : \Delta \to \mathbb{R}$ be a concave function. Suppose $f$ has continuous partial derivatives on its domain. Then $p^* \in \Delta$ maximizes $f$ iff there exists some real $\lambda$ such that for every $k$ with $p^*_k > 0$

$$\frac{\partial f}{\partial p_k}(p^*) = \lambda,$$

and for every $k$ with $p^*_k = 0$

$$\frac{\partial f}{\partial p_k}(p^*) \leq \lambda.$$
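
As a quick numerical check of the lemma (an added illustration, not part of the original text), consider the Shannon entropy $H(p) = -\sum_k p_k \log_2 p_k$, which is concave on $\Delta$ and maximized by the uniform distribution; there its partial derivatives $\partial H / \partial p_k = -\log_2 p_k - \log_2 e$ are all equal, as the lemma requires.

import math

K = 4
p_star = [1.0 / K] * K  # uniform distribution, the maximizer of entropy on the simplex

# Partial derivative of H(p) = -sum_k p_k log2 p_k with respect to p_k.
grads = [-math.log2(pk) - math.log2(math.e) for pk in p_star]

# All partial derivatives equal a common value lambda, as the lemma requires
# for components with p*_k > 0.
print(grads)                            # four identical values
print(max(grads) - min(grads) < 1e-12)  # True
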
Proof of Lemma

Suppose $p^*$ satisfies the above conditions. We'll show that $f$ achieves its maximum at $p^*$. Let $p$ be any element of $\Delta$. By the concavity of $f$, for any $\theta \in (0, 1]$ we have

$$f\bigl(p^* + \theta(p - p^*)\bigr) = f\bigl((1 - \theta)p^* + \theta p\bigr) \;\geq\; (1 - \theta) f(p^*) + \theta f(p),$$

thus

$$\frac{f\bigl(p^* + \theta(p - p^*)\bigr) - f(p^*)}{\theta} \;\geq\; f(p) - f(p^*).$$

Allowing $\theta \to 0$ and making use of the continuity of the partial derivatives results in

$$f(p) - f(p^*) \;\leq\; \sum_{k=1}^{K} (p_k - p^*_k)\,\frac{\partial f}{\partial p_k}(p^*).$$

By the assumed conditions, $\sum_{k} p_k \frac{\partial f}{\partial p_k}(p^*) \leq \lambda \sum_{k} p_k = \lambda$ while $\sum_{k} p^*_k \frac{\partial f}{\partial p_k}(p^*) = \lambda \sum_{k} p^*_k = \lambda$, so the right-hand side is at most zero and $f(p) \leq f(p^*)$.

For the other direction suppose $p^*$ maximizes $f$. Then for every $p \in \Delta$ and every $\theta \in (0, 1]$,

$$f\bigl(p^* + \theta(p - p^*)\bigr) \;\leq\; f(p^*).$$

This implies

$$\frac{f\bigl(p^* + \theta(p - p^*)\bigr) - f(p^*)}{\theta} \;\leq\; 0,$$

and by the continuity of the partial derivatives,

$$\sum_{k=1}^{K} (p_k - p^*_k)\,\frac{\partial f}{\partial p_k}(p^*) \;\leq\; 0. \qquad (1)$$

Since $p^* \in \Delta$, at least one of its components, say $p^*_j$, is strictly positive. Now let $i$ be an arbitrary element of $\{1, \ldots, K\}$. Furthermore, for every $k$, let $e_k$ denote the element of $\mathbb{R}^K$ that consists of all zeros but one one in the $k$th position. Define

$$p = p^* + p^*_j (e_i - e_j) \in \Delta.$$

Then inequality (1) simplifies to

$$p^*_j \left( \frac{\partial f}{\partial p_i}(p^*) - \frac{\partial f}{\partial p_j}(p^*) \right) \leq 0, \qquad \text{i.e.,} \qquad \frac{\partial f}{\partial p_i}(p^*) \leq \frac{\partial f}{\partial p_j}(p^*).$$

In addition, if $p^*_i > 0$ and we define

$$p = p^* + p^*_i (e_j - e_i) \in \Delta,$$

then (1) results in

$$\frac{\partial f}{\partial p_j}(p^*) \leq \frac{\partial f}{\partial p_i}(p^*).$$

Thus if we define $\lambda$ as

$$\lambda = \frac{\partial f}{\partial p_j}(p^*),$$

the result follows.

Corollary

For any discrete memoryless channel the input distribution $p^*$ achieves capacity iff there exists a real number $\lambda$ such that $I(x) = \lambda$ for every $x \in \mathcal{X}$ with $p^*(x) > 0$, and $I(x) \leq \lambda$ for every $x \in \mathcal{X}$ with $p^*(x) = 0$, where

$$I(x) = \sum_{y \in \mathcal{Y}} p(y \mid x) \log \frac{p(y \mid x)}{q^*(y)}, \qquad q^*(y) = \sum_{x \in \mathcal{X}} p^*(x)\, p(y \mid x).$$

Furthermore, $\lambda$ is the capacity of the channel.
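
The sketch below checks this condition numerically, assuming (as a hypothetical example, not taken from the original text) a binary symmetric channel with crossover probability 0.1 and the uniform input distribution: both values of $I(x)$ coincide and equal the capacity $1 - H_b(0.1)$.

import math

def h2(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

eps = 0.1                       # assumed crossover probability
P = [[1 - eps, eps],            # P[x][y] = p(y|x) for a binary symmetric channel
     [eps, 1 - eps]]
p_star = [0.5, 0.5]             # candidate capacity-achieving input distribution

# Capacity-achieving output distribution q*(y) = sum_x p*(x) p(y|x)
q_star = [sum(p_star[x] * P[x][y] for x in range(2)) for y in range(2)]

# I(x) = sum_y p(y|x) log2( p(y|x) / q*(y) )
I_x = [sum(P[x][y] * math.log2(P[x][y] / q_star[y]) for y in range(2)) for x in range(2)]

print(I_x)                 # both entries equal 1 - h2(eps), the BSC capacity
print(1 - h2(eps))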

Proof of Corollary

The proof of the corollary is straightforward and follows directly from the lemma, applied to the mutual information $I(X;Y)$, which is a concave function of the input distribution $p$ with continuous partial derivatives. To see this, note that for any $x$,

$$\frac{\partial I(X;Y)}{\partial p(x)} = \sum_{y \in \mathcal{Y}} p(y \mid x) \log \frac{p(y \mid x)}{q(y)} - \log e = I(x) - \log e, \qquad q(y) = \sum_{x' \in \mathcal{X}} p(x')\, p(y \mid x').$$

Since

$$I(X;Y) = \sum_{x \in \mathcal{X}} p(x)\, I(x),$$

this implies that if $I(x)$ equals a common value $\lambda$ for every $x$ with $p(x) > 0$, then $I(X;Y) = \lambda$; in particular, at a capacity-achieving distribution this common value is $C$. Now using the lemma, the claim follows.

Proof of the Strong Converse

For any two random variables $X$ and $Y$ define the information density as

$$i(x; y) = \log \frac{p_{XY}(x, y)}{p_X(x)\, p_Y(y)} = \log \frac{p_{Y \mid X}(y \mid x)}{p_Y(y)}.$$

Note that

$$\mathbb{E}\bigl[\, i(X; Y) \,\bigr] = I(X; Y)$$

and

$$p_{Y \mid X}(y \mid x) = p_Y(y)\, 2^{\, i(x; y)}.$$
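
Both identities can be verified numerically; the sketch below does so assuming, purely for illustration, a binary symmetric channel with crossover probability 0.1 and a uniform input distribution.

import math

eps, px = 0.1, [0.5, 0.5]                      # assumed BSC and input distribution
P = [[1 - eps, eps], [eps, 1 - eps]]           # P[x][y] = p(y|x)
py = [sum(px[x] * P[x][y] for x in range(2)) for y in range(2)]

def i_dens(x, y):
    """Information density i(x;y) = log2( p(y|x) / p(y) )."""
    return math.log2(P[x][y] / py[y])

# E[i(X;Y)] recovers the mutual information I(X;Y).
I = sum(px[x] * P[x][y] * i_dens(x, y) for x in range(2) for y in range(2))
print(I)                                        # about 0.531 bits for eps = 0.1

# p(y|x) = p(y) * 2^{i(x;y)} holds for every pair (x, y).
print(all(abs(P[x][y] - py[y] * 2 ** i_dens(x, y)) < 1e-12
          for x in range(2) for y in range(2)))
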
Let $q^*$ be the capacity-achieving output distribution. For any positive integer $n$, define

$$q^{*n}(y^n) = \prod_{k=1}^{n} q^*(y_k).$$

For any $x^n \in \mathcal{X}^n$, define

$$i(x^n; y^n) = \log \frac{p(y^n \mid x^n)}{q^{*n}(y^n)} = \sum_{k=1}^{n} i(x_k; y_k),$$

where

$$i(x_k; y_k) = \log \frac{p(y_k \mid x_k)}{q^*(y_k)}.$$
Consider a $(2^{nR}, n)$ code with codewords $x^n(1), \ldots, x^n(M)$, $M = 2^{nR}$, and decoding regions $D_1, \ldots, D_M \subseteq \mathcal{Y}^n$. Then the probability that the $m$th codeword is decoded correctly is given by

$$\Pr(\text{correct} \mid m) = \sum_{y^n \in D_m} p\bigl(y^n \mid x^n(m)\bigr).$$

Fix positive $\varepsilon$. For every $m$, define the set

$$B_m = \bigl\{\, y^n \in \mathcal{Y}^n : i\bigl(x^n(m); y^n\bigr) \geq n(C + \varepsilon) \,\bigr\}.$$

Then

$$\Pr(\text{correct} \mid m) = \sum_{y^n \in D_m \cap B_m} p\bigl(y^n \mid x^n(m)\bigr) + \sum_{y^n \in D_m \cap B_m^c} p\bigl(y^n \mid x^n(m)\bigr).$$

Based on the definition of $B_m$, every $y^n \in B_m^c$ satisfies $p(y^n \mid x^n(m)) = q^{*n}(y^n)\, 2^{\, i(x^n(m);\, y^n)} < q^{*n}(y^n)\, 2^{\, n(C + \varepsilon)}$, so the second sum can be upper bounded as

$$\sum_{y^n \in D_m \cap B_m^c} p\bigl(y^n \mid x^n(m)\bigr) \;\leq\; 2^{\, n(C + \varepsilon)} \sum_{y^n \in D_m} q^{*n}(y^n) \;=\; 2^{\, n(C + \varepsilon)}\, q^{*n}(D_m).$$

Using Chebyshev's inequality we can find an upper bound on the first sum:

$$\sum_{y^n \in D_m \cap B_m} p\bigl(y^n \mid x^n(m)\bigr) \;\leq\; \Pr\bigl\{\, i\bigl(x^n(m); Y^n\bigr) \geq n(C + \varepsilon) \,\bigm|\, X^n = x^n(m) \,\bigr\} \;\leq\; \frac{\operatorname{Var}\bigl[\, i\bigl(x^n(m); Y^n\bigr) \,\big|\, X^n = x^n(m) \bigr]}{(n\varepsilon)^2},$$

where we have used the fact that $i(x^n(m); Y^n) = \sum_{k=1}^{n} i\bigl(x_k(m); Y_k\bigr)$ is a sum of $n$ independent terms whose conditional mean satisfies, by the corollary,

$$\mathbb{E}\Bigl[\, i\bigl(x^n(m); Y^n\bigr) \,\Big|\, X^n = x^n(m) \Bigr] = \sum_{k=1}^{n} I\bigl(x_k(m)\bigr) \;\leq\; nC,$$

so that $i\bigl(x^n(m); Y^n\bigr) \geq n(C + \varepsilon)$ forces a deviation of at least $n\varepsilon$ from the mean.

If we define

$$A = \max_{x \in \mathcal{X}} \operatorname{Var}\bigl[\, i(x; Y) \,\big|\, X = x \bigr]$$

(note that $A$ only depends on the channel and is independent of $n$ and $M$), then $\operatorname{Var}\bigl[\, i\bigl(x^n(m); Y^n\bigr) \,\big|\, X^n = x^n(m) \bigr] \leq nA$ and

$$\sum_{y^n \in D_m \cap B_m} p\bigl(y^n \mid x^n(m)\bigr) \;\leq\; \frac{nA}{(n\varepsilon)^2} \;=\; \frac{A}{n\,\varepsilon^2}.$$

Therefore

$$\Pr(\text{correct} \mid m) \;\leq\; \frac{A}{n\,\varepsilon^2} + 2^{\, n(C + \varepsilon)}\, q^{*n}(D_m).$$

Since the decoding regions $D_1, \ldots, D_M$ are disjoint, so that $\sum_{m=1}^{M} q^{*n}(D_m) \leq 1$, we get

$$1 - P_e^{(n)} \;=\; \frac{1}{M} \sum_{m=1}^{M} \Pr(\text{correct} \mid m) \;\leq\; \frac{A}{n\,\varepsilon^2} + \frac{2^{\, n(C + \varepsilon)}}{M} \;=\; \frac{A}{n\,\varepsilon^2} + 2^{-n(R - C - \varepsilon)}.$$

Should $R > C$, setting $\varepsilon = (R - C)/2$ results in

$$P_e^{(n)} \;\geq\; 1 - \frac{4A}{n(R - C)^2} - 2^{-n(R - C)/2},$$

so the probability of error converges to one as $n \to \infty$, which completes the proof of the strong converse.
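
To get a feel for the rate of convergence, the following sketch evaluates this lower bound assuming, for illustration only, a binary symmetric channel with crossover probability 0.1, a rate $R = 0.8$ above capacity, and $A$ taken as the per-letter variance of the information density; none of these specific values come from the original text.

import math

eps = 0.1                                   # assumed BSC crossover probability
P = [[1 - eps, eps], [eps, 1 - eps]]        # p(y|x)
q_star = [0.5, 0.5]                          # capacity-achieving output distribution
C = 1 + eps * math.log2(eps) + (1 - eps) * math.log2(1 - eps)   # capacity, 1 - h2(eps)

def var_i(x):
    """Variance of the information density i(x;Y) = log2(p(y|x)/q*(y)) given X = x."""
    mean = sum(P[x][y] * math.log2(P[x][y] / q_star[y]) for y in range(2))
    return sum(P[x][y] * (math.log2(P[x][y] / q_star[y]) - mean) ** 2 for y in range(2))

A = max(var_i(x) for x in range(2))

R = 0.8                                      # a rate above capacity (C is about 0.531)
for n in (10, 100, 1000, 10000):
    bound = 1 - 4 * A / (n * (R - C) ** 2) - 2 ** (-n * (R - C) / 2)
    print(n, round(bound, 4))                # vacuous (negative) for small n, approaches 1 as n grows

For small blocklengths the bound is vacuous, but it approaches one roughly like $1 - 4A / \bigl(n(R - C)^2\bigr)$, in contrast with the constant bound obtained from Fano's inequality.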


References

  • Gallager, Robert G. (1968). Information Theory and Reliable Communication. Wiley. pp. 87–89, 173–175.

InfoTheorist (talk) 00:24, 5 November 2014 (UTC)