Classical capacity

From Wikipedia, the free encyclopedia
Jump to: navigation, search

In quantum information theory, the classical capacity of a quantum channel is the maximum rate at which classical data can be sent over it error-free in the limit of many uses of the channel. Holevo, Schumacher, and Westmoreland proved the following lower bound on the classical capacity of any quantum channel \mathcal{N}:

\chi(\mathcal{N}) = \max_{\rho^{XA}} I(X;B)_{\mathcal{N}(\rho)}

where \rho^{XA} is a classical-quantum state of the following form:

\rho^{XA} = \sum_x p_X(x) \vert x \rangle \langle x \vert^X \otimes \rho_x^A ,

p_X(x) is a probability distribution, and each \rho_x^A is a density operator that can be input to the channel \mathcal{N}.

Achievability using sequential decoding[edit]

We briefly review the HSW coding theorem (the statement of the achievability of the Holevo information rate I(X;B) for communicating classical data over a quantum channel). We first review the minimal amount of quantum mechanics needed for the theorem. We then cover quantum typicality, and finally we prove the theorem using a recent sequential decoding technique.

Review of Quantum Mechanics[edit]

In order to prove the HSW coding theorem, we really just need a few basic things from quantum mechanics. First, a quantum state is a unit trace, positive operator known as a density operator. Usually, we denote it by \rho, \sigma, \omega, etc. The simplest model for a quantum channel is known as a classical-quantum channel:

x\mapsto \rho_{x}.

The meaning of the above notation is that inputting the classical letter x at the transmitting end leads to a quantum state \rho_{x} at the receiving end. It is the task of the receiver to perform a measurement to determine the input of the sender. If it is true that the states \rho_{x} are perfectly distinguishable from one another (i.e., if they have orthogonal supports such that \mathrm{Tr}\,\left\{  \rho_{x}\rho_{x^{\prime}}\right\}  =0 for x\neq x^{\prime}
), then the channel is a noiseless channel. We are interested in situations for which this is not the case. If it is true that the states \rho_{x} all commute with one another, then this is effectively identical to the situation for a classical channel, so we are also not interested in these situations. So, the situation in which we are interested is that in which the states \rho_{x} have overlapping support and are non-commutative.

The most general way to describe a quantum measurement is with a positive operator-valued measure (POVM). We usually denote the elements of a POVM as \left\{  \Lambda_{m}\right\}  _{m}. These operators should satisfy positivity and completeness in order to form a valid POVM:

\Lambda_{m}   \geq0\ \ \ \ \forall m
\sum_{m}\Lambda_{m}  =I.

The probabilistic interpretation of quantum mechanics states that if someone measures a quantum state \rho using a measurement device corresponding to the POVM \left\{  \Lambda_{m}\right\}  , then the probability p\left(
m\right) for obtaining outcome m is equal to

p\left(  m\right)  =\text{Tr}\left\{  \Lambda_{m}\rho\right\}  ,

and the post-measurement state is

\rho_{m}^{\prime}=\frac{1}{p\left(  m\right)  }\sqrt{\Lambda_{m}}\rho

if the person measuring obtains outcome m. These rules are sufficient for us to consider classical communication schemes over cq channels.

Quantum Typicality[edit]

The reader can find a good review of this topic in the article about the typical subspace.

Gentle Operator Lemma[edit]

The following lemma is important for our proofs. It demonstrates that a measurement that succeeds with high probability on average does not disturb the state too much on average:

Lemma: [Winter] Given an ensemble \left\{  p_{X}\left(  x\right)  ,\rho_{x}\right\}  with expected density operator \rho\equiv\sum_{x}p_{X}\left(  x\right)  \rho_{x}, suppose that an operator \Lambda such that I\geq\Lambda\geq0 succeeds with high probability on the state \rho:

\text{Tr}\left\{  \Lambda\rho\right\}  \geq1-\epsilon.

Then the subnormalized state \sqrt{\Lambda}\rho_{x}\sqrt{\Lambda} is close in expected trace distance to the original state \rho_{x}:

\mathbb{E}_{X}\left\{  \left\Vert \sqrt{\Lambda}\rho_{X}\sqrt{\Lambda}
-\rho_{X}\right\Vert _{1}\right\}  \leq2\sqrt{\epsilon}.

(Note that \left\Vert A\right\Vert _{1} is the nuclear norm of the operator A so that \left\Vert A\right\Vert _{1}\equivTr\left\{  \sqrt{A^{\dagger}
A}\right\}  .)

The following inequality is useful for us as well. It holds for any operators \rho, \sigma, \Lambda such that 0\leq\rho,\sigma,\Lambda\leq I:

\text{Tr}\left\{  \Lambda\rho\right\}  \leq\text{Tr}\left\{  \Lambda
\sigma\right\}  +\left\Vert \rho-\sigma\right\Vert _{1}.






The quantum information-theoretic interpretation of the above inequality is that the probability of obtaining outcome \Lambda from a quantum measurement acting on the state \rho is upper bounded by the probability of obtaining outcome \Lambda on the state \sigma summed with the distinguishability of the two states \rho and \sigma.

Non-Commutative Union Bound[edit]

Lemma: [Sen's bound] The following bound holds for a subnormalized state \sigma such that 0\leq\sigma and Tr\left\{  \sigma\right\}  \leq1 with \Pi_{1}, ... , \Pi_{N} being projectors: 
\text{Tr}\left\{  \sigma\right\}  -\text{Tr}\left\{  \Pi_{N}\cdots\Pi
_{1}\ \sigma\ \Pi_{1}\cdots\Pi_{N}\right\}  \leq2\sqrt{\sum_{i=1}^{N}
\text{Tr}\left\{  \left(  I-\Pi_{i}\right)  \sigma\right\}  },

We can think of Sen's bound as a "non-commutative union bound" because it is analogous to the following union bound from probability theory:

\Pr\left\{  \left(  A_{1}\cap\cdots\cap A_{N}\right)  ^{c}\right\}
=\Pr\left\{  A_{1}^{c}\cup\cdots\cup A_{N}^{c}\right\}  \leq\sum_{i=1}^{N}
\Pr\left\{  A_{i}^{c}\right\}  ,

where A_{1}, \ldots, A_{N} are events. The analogous bound for projector logic would be

\text{Tr}\left\{  \left(  I-\Pi_{1}\cdots\Pi_{N}\cdots\Pi_{1}\right)
\rho\right\}  \leq\sum_{i=1}^{N}\text{Tr}\left\{  \left(  I-\Pi_{i}\right)
\rho\right\}  ,

if we think of \Pi_{1}\cdots\Pi_{N} as a projector onto the intersection of subspaces. Though, the above bound only holds if the projectors \Pi_{1}, ..., \Pi_{N} are commuting (choosing \Pi_{1}=\left\vert +\right\rangle
\left\langle +\right\vert , \Pi_{2}=\left\vert 0\right\rangle \left\langle
0\right\vert , and \rho=\left\vert 0\right\rangle \left\langle 0\right\vert
gives a counterexample). If the projectors are non-commuting, then Sen's bound is the next best thing and suffices for our purposes here.

HSW Theorem with the non-commutative union bound[edit]

We now prove the HSW theorem with Sen's non-commutative union bound. We divide up the proof into a few parts: codebook generation, POVM construction, and error analysis.

Codebook Generation. We first describe how Alice and Bob agree on a random choice of code. They have the channel x\rightarrow\rho_{x} and a distribution p_{X}\left(  x\right)  . They choose M classical sequences x^{n} according to the IID\ distribution p_{X^{n}}\left(  x^{n}\right)  . After selecting them, they label them with indices as \left\{  x^{n}\left(
m\right)  \right\}  _{m\in\left[  M\right]  }. This leads to the following quantum codewords:

\rho_{x^{n}\left(  m\right)  }=\rho_{x_{1}\left(  m\right)  }\otimes
\cdots\otimes\rho_{x_{n}\left(  m\right)  }.

The quantum codebook is then \left\{  \rho_{x^{n}\left(  m\right)  }\right\}
. The average state of the codebook is then

\mathbb{E}_{X^{n}}\left\{  \rho_{X^{n}}\right\}  =\sum_{x^{n}}p_{X^{n}}\left(
x^{n}\right)  \rho_{x^{n}}=\rho^{\otimes n},






where \rho=\sum_{x}p_{X}\left(  x\right)  \rho_{x}.

POVM Construction . Sens' bound from the above lemma suggests a method for Bob to decode a state that Alice transmits. Bob should first ask "Is the received state in the average typical subspace?" He can do this operationally by performing a typical subspace measurement corresponding to \left\{  \Pi_{\rho,\delta}
^{n},I-\Pi_{\rho,\delta}^{n}\right\}  . Next, he asks in sequential order, "Is the received codeword in the m^{\text{th}} conditionally typical subspace?" This is in some sense equivalent to the question, "Is the received codeword the m^{\text{th}} transmitted codeword?" He can ask these questions operationally by performing the measurements corresponding to the conditionally typical projectors \left\{  \Pi_{\rho_{x^{n}\left(  m\right)
},\delta},I-\Pi_{\rho_{x^{n}\left(  m\right)  },\delta}\right\}  .

Why should this sequential decoding scheme work well? The reason is that the transmitted codeword lies in the typical subspace on average:

\mathbb{E}_{X^{n}}\left\{  \text{Tr}\left\{  \Pi_{\rho,\delta}\ \rho_{X^{n}
}\right\}  \right\}    =\text{Tr}\left\{  \Pi_{\rho,\delta}\ \mathbb{E}
_{X^{n}}\left\{  \rho_{X^{n}}\right\}  \right\}
 =\text{Tr}\left\{  \Pi_{\rho,\delta}\ \rho^{\otimes n}\right\}

where the inequality follows from (\ref{eq:1st-typ-prop}). Also, the projectors \Pi_{\rho_{x^{n}\left(  m\right)  },\delta} are "good detectors" for the states \rho_{x^{n}\left(  m\right)
} (on average) because the following condition holds from conditional quantum typicality:

\mathbb{E}_{X^{n}}\left\{  \text{Tr}\left\{  \Pi_{\rho_{X^{n}},\delta}
\ \rho_{X^{n}}\right\}  \right\}  \geq1-\epsilon.

Error Analysis . The probability of detecting the m^{\text{th}} codeword correctly under our sequential decoding scheme is equal to

\text{Tr}\left\{  \Pi_{\rho_{X^{n}\left(  m\right)  },\delta}\hat{\Pi}
_{\rho_{X^{n}\left(  m-1\right)  },\delta}\cdots\hat{\Pi}_{\rho_{X^{n}\left(
1\right)  },\delta}\ \Pi_{\rho,\delta}^{n}\ \rho_{x^{n}\left(  m\right)
}\ \Pi_{\rho,\delta}^{n}\ \hat{\Pi}_{\rho_{X^{n}\left(  1\right)  },\delta
}\cdots\hat{\Pi}_{\rho_{X^{n}\left(  m-1\right)  },\delta}\Pi_{\rho
_{X^{n}\left(  m\right)  },\delta}\right\}  ,

where we make the abbreviation \hat{\Pi}\equiv I-\Pi. (Observe that we project into the average typical subspace just once.) Thus, the probability of an incorrect detection for the m^{\text{th}} codeword is given by

1-\text{Tr}\left\{  \Pi_{\rho_{X^{n}\left(  m\right)  },\delta}\hat{\Pi}
_{\rho_{X^{n}\left(  m-1\right)  },\delta}\cdots\hat{\Pi}_{\rho_{X^{n}\left(
1\right)  },\delta}\ \Pi_{\rho,\delta}^{n}\ \rho_{x^{n}\left(  m\right)
}\ \Pi_{\rho,\delta}^{n}\ \hat{\Pi}_{\rho_{X^{n}\left(  1\right)  },\delta
}\cdots\hat{\Pi}_{\rho_{X^{n}\left(  m-1\right)  },\delta}\Pi_{\rho
_{X^{n}\left(  m\right)  },\delta}\right\}  ,

and the average error probability of this scheme is equal to

1-\frac{1}{M}\sum_{m}\text{Tr}\left\{  \Pi_{\rho_{X^{n}\left(  m\right)
},\delta}\hat{\Pi}_{\rho_{X^{n}\left(  m-1\right)  },\delta}\cdots\hat{\Pi
}_{\rho_{X^{n}\left(  1\right)  },\delta}\ \Pi_{\rho,\delta}^{n}\ \rho
_{x^{n}\left(  m\right)  }\ \Pi_{\rho,\delta}^{n}\ \hat{\Pi}_{\rho
_{X^{n}\left(  1\right)  },\delta}\cdots\hat{\Pi}_{\rho_{X^{n}\left(
m-1\right)  },\delta}\Pi_{\rho_{X^{n}\left(  m\right)  },\delta}\right\}  .

Instead of analyzing the average error probability, we analyze the expectation of the average error probability, where the expectation is with respect to the random choice of code:

1-\mathbb{E}_{X^{n}}\left\{  \frac{1}{M}\sum_{m}\text{Tr}\left\{  \Pi
_{\rho_{X^{n}\left(  m\right)  },\delta}\hat{\Pi}_{\rho_{X^{n}\left(
m-1\right)  },\delta}\cdots\hat{\Pi}_{\rho_{X^{n}\left(  1\right)  },\delta
}\ \Pi_{\rho,\delta}^{n}\ \rho_{X^{n}\left(  m\right)  }\ \Pi_{\rho,\delta
}^{n}\ \hat{\Pi}_{\rho_{X^{n}\left(  1\right)  },\delta}\cdots\hat{\Pi}
_{\rho_{X^{n}\left(  m-1\right)  },\delta}\Pi_{\rho_{X^{n}\left(  m\right)
},\delta}\right\}  \right\}  .






Our first step is to apply Sen's bound to the above quantity. But before doing so, we should rewrite the above expression just slightly, by observing that

1   =\mathbb{E}_{X^{n}}\left\{  \frac{1}{M}\sum_{m}\text{Tr}\left\{
\rho_{X^{n}\left(  m\right)  }\right\}  \right\}
 =\mathbb{E}_{X^{n}}\left\{  \frac{1}{M}\sum_{m}\text{Tr}\left\{  \Pi
_{\rho,\delta}^{n}\rho_{X^{n}\left(  m\right)  }\right\}  +\text{Tr}\left\{
\hat{\Pi}_{\rho,\delta}^{n}\rho_{X^{n}\left(  m\right)  }\right\}  \right\}
 =\mathbb{E}_{X^{n}}\left\{  \frac{1}{M}\sum_{m}\text{Tr}\left\{  \Pi
_{\rho,\delta}^{n}\rho_{X^{n}\left(  m\right)  }\Pi_{\rho,\delta}^{n}\right\}
\right\}  +\frac{1}{M}\sum_{m}\text{Tr}\left\{  \hat{\Pi}_{\rho,\delta}
^{n}\mathbb{E}_{X^{n}}\left\{  \rho_{X^{n}\left(  m\right)  }\right\}
 =\mathbb{E}_{X^{n}}\left\{  \frac{1}{M}\sum_{m}\text{Tr}\left\{  \Pi
_{\rho,\delta}^{n}\rho_{X^{n}\left(  m\right)  }\Pi_{\rho,\delta}^{n}\right\}
\right\}  +\text{Tr}\left\{  \hat{\Pi}_{\rho,\delta}^{n}\rho^{\otimes
 \leq\mathbb{E}_{X^{n}}\left\{  \frac{1}{M}\sum_{m}\text{Tr}\left\{
\Pi_{\rho,\delta}^{n}\rho_{X^{n}\left(  m\right)  }\Pi_{\rho,\delta}
^{n}\right\}  \right\}  +\epsilon

Substituting into (3) (and forgetting about the small \epsilon term for now) gives an upper bound of

\mathbb{E}_{X^{n}}\left\{  \frac{1}{M}\sum_{m}\text{Tr}\left\{  \Pi
_{\rho,\delta}^{n}\rho_{X^{n}\left(  m\right)  }\Pi_{\rho,\delta}^{n}\right\}

-\mathbb{E}_{X^{n}}\left\{  \frac{1}{M}\sum_{m}\text{Tr}\left\{  \Pi
_{\rho_{X^{n}\left(  m\right)  },\delta}\hat{\Pi}_{\rho_{X^{n}\left(
m-1\right)  },\delta}\cdots\hat{\Pi}_{\rho_{X^{n}\left(  1\right)  },\delta
}\ \Pi_{\rho,\delta}^{n}\ \rho_{X^{n}\left(  m\right)  }\ \Pi_{\rho,\delta
}^{n}\ \hat{\Pi}_{\rho_{X^{n}\left(  1\right)  },\delta}\cdots\hat{\Pi}
_{\rho_{X^{n}\left(  m-1\right)  },\delta}\Pi_{\rho_{X^{n}\left(  m\right)
},\delta}\right\}  \right\}  .

We then apply Sen's bound to this expression with \sigma=\Pi_{\rho,\delta
}^{n}\rho_{X^{n}\left(  m\right)  }\Pi_{\rho,\delta}^{n} and the sequential projectors as \Pi_{\rho_{X^{n}\left(  m\right)  },\delta}, \hat{\Pi}
_{\rho_{X^{n}\left(  m-1\right)  },\delta}, ..., \hat{\Pi}_{\rho
_{X^{n}\left(  1\right)  },\delta}. This gives the upper bound 
\mathbb{E}_{X^{n}}\left\{  \frac{1}{M}\sum_{m}2\left[  \text{Tr}\left\{
\left(  I-\Pi_{\rho_{X^{n}\left(  m\right)  },\delta}\right)  \Pi_{\rho
,\delta}^{n}\rho_{X^{n}\left(  m\right)  }\Pi_{\rho,\delta}^{n}\right\}
+\sum_{i=1}^{m-1}\text{Tr}\left\{  \Pi_{\rho_{X^{n}\left(  i\right)  },\delta
}\Pi_{\rho,\delta}^{n}\rho_{X^{n}\left(  m\right)  }\Pi_{\rho,\delta}
^{n}\right\}  \right]  ^{1/2}\right\}  .
Due to concavity of the square root, we can bound this expression from above by

 2\left[  \mathbb{E}_{X^{n}}\left\{  \frac{1}{M}\sum_{m}\text{Tr}\left\{
\left(  I-\Pi_{\rho_{X^{n}\left(  m\right)  },\delta}\right)  \Pi_{\rho
,\delta}^{n}\rho_{X^{n}\left(  m\right)  }\Pi_{\rho,\delta}^{n}\right\}
+\sum_{i=1}^{m-1}\text{Tr}\left\{  \Pi_{\rho_{X^{n}\left(  i\right)  },\delta
}\Pi_{\rho,\delta}^{n}\rho_{X^{n}\left(  m\right)  }\Pi_{\rho,\delta}
^{n}\right\}  \right\}  \right]  ^{1/2}
 \leq2\left[  \mathbb{E}_{X^{n}}\left\{  \frac{1}{M}\sum_{m}\text{Tr}\left\{
\left(  I-\Pi_{\rho_{X^{n}\left(  m\right)  },\delta}\right)  \Pi_{\rho
,\delta}^{n}\rho_{X^{n}\left(  m\right)  }\Pi_{\rho,\delta}^{n}\right\}
+\sum_{i\neq m}\text{Tr}\left\{  \Pi_{\rho_{X^{n}\left(  i\right)  },\delta
}\Pi_{\rho,\delta}^{n}\rho_{X^{n}\left(  m\right)  }\Pi_{\rho,\delta}
^{n}\right\}  \right\}  \right]  ^{1/2},

where the second bound follows by summing over all of the codewords not equal to the m^{\text{th}} codeword (this sum can only be larger).

We now focus exclusively on showing that the term inside the square root can be made small. Consider the first term:

 \mathbb{E}_{X^{n}}\left\{  \frac{1}{M}\sum_{m}\text{Tr}\left\{  \left(
I-\Pi_{\rho_{X^{n}\left(  m\right)  },\delta}\right)  \Pi_{\rho,\delta}
^{n}\rho_{X^{n}\left(  m\right)  }\Pi_{\rho,\delta}^{n}\right\}  \right\}
 \leq\mathbb{E}_{X^{n}}\left\{  \frac{1}{M}\sum_{m}\text{Tr}\left\{  \left(
I-\Pi_{\rho_{X^{n}\left(  m\right)  },\delta}\right)  \rho_{X^{n}\left(
m\right)  }\right\}  +\left\Vert \rho_{X^{n}\left(  m\right)  }-\Pi
_{\rho,\delta}^{n}\rho_{X^{n}\left(  m\right)  }\Pi_{\rho,\delta}
^{n}\right\Vert _{1}\right\}

where the first inequality follows from (1) and the second inequality follows from the Gentle Operator Lemma and the properties of unconditional and conditional typicality. Consider now the second term and the following chain of inequalities:

  \sum_{i\neq m}\mathbb{E}_{X^{n}}\left\{  \text{Tr}\left\{  \Pi_{\rho
_{X^{n}\left(  i\right)  },\delta}\ \Pi_{\rho,\delta}^{n}\ \rho_{X^{n}\left(
m\right)  }\ \Pi_{\rho,\delta}^{n}\right\}  \right\}
  =\sum_{i\neq m}\text{Tr}\left\{  \mathbb{E}_{X^{n}}\left\{  \Pi
_{\rho_{X^{n}\left(  i\right)  },\delta}\right\}  \ \Pi_{\rho,\delta}
^{n}\ \mathbb{E}_{X^{n}}\left\{  \rho_{X^{n}\left(  m\right)  }\right\}
\ \Pi_{\rho,\delta}^{n}\right\}
 =\sum_{i\neq m}\text{Tr}\left\{  \mathbb{E}_{X^{n}}\left\{  \Pi
_{\rho_{X^{n}\left(  i\right)  },\delta}\right\}  \ \Pi_{\rho,\delta}
^{n}\ \rho^{\otimes n}\ \Pi_{\rho,\delta}^{n}\right\}
  \leq\sum_{i\neq m}2^{-n\left[  H\left(  B\right)  -\delta\right]
}\ \text{Tr}\left\{  \mathbb{E}_{X^{n}}\left\{  \Pi_{\rho_{X^{n}\left(
i\right)  },\delta}\right\}  \ \Pi_{\rho,\delta}^{n}\right\}

The first equality follows because the codewords X^{n}\left(  m\right)  and X^{n}\left(  i\right)  are independent since they are different. The second equality follows from (2). The first inequality follows from (\ref{eq:3rd-typ-prop}). Continuing, we have

  \leq\sum_{i\neq m}2^{-n\left[  H\left(  B\right)  -\delta\right]
}\ \mathbb{E}_{X^{n}}\left\{  \text{Tr}\left\{  \Pi_{\rho_{X^{n}\left(
i\right)  },\delta}\right\}  \right\}
  \leq\sum_{i\neq m}2^{-n\left[  H\left(  B\right)  -\delta\right]
}\ 2^{n\left[  H\left(  B|X\right)  +\delta\right]  }
  =\sum_{i\neq m}2^{-n\left[  I\left(  X;B\right)  -2\delta\right]  }
  \leq M\ 2^{-n\left[  I\left(  X;B\right)  -2\delta\right]  }.

The first inequality follows from \Pi_{\rho,\delta}^{n}\leq I and exchanging the trace with the expectation. The second inequality follows from (\ref{eq:2nd-cond-typ}). The next two are straightforward.

Putting everything together, we get our final bound on the expectation of the average error probability:

1-\mathbb{E}_{X^{n}}\left\{  \frac{1}{M}\sum_{m}\text{Tr}\left\{  \Pi
_{\rho_{X^{n}\left(  m\right)  },\delta}\hat{\Pi}_{\rho_{X^{n}\left(
m-1\right)  },\delta}\cdots\hat{\Pi}_{\rho_{X^{n}\left(  1\right)  },\delta
}\ \Pi_{\rho,\delta}^{n}\ \rho_{X^{n}\left(  m\right)  }\ \Pi_{\rho,\delta
}^{n}\ \hat{\Pi}_{\rho_{X^{n}\left(  1\right)  },\delta}\cdots\hat{\Pi}
_{\rho_{X^{n}\left(  m-1\right)  },\delta}\Pi_{\rho_{X^{n}\left(  m\right)
},\delta}\right\}  \right\}
\leq\epsilon+2\left[  \left(  \epsilon+2\sqrt{\epsilon}\right)
+M\ 2^{-n\left[  I\left(  X;B\right)  -2\delta\right]  }\right]  ^{1/2}.

Thus, as long as we choose M=2^{n\left[  I\left(  X;B\right)  -3\delta
\right]  }, there exists a code with vanishing error probability.

See also[edit]