# Classical capacity

In quantum information theory, the classical capacity of a quantum channel is the maximum rate at which classical data can be sent over it error-free in the limit of many uses of the channel. Holevo, Schumacher, and Westmoreland proved the following lower bound on the classical capacity of any quantum channel $\mathcal{N}$:

$\chi(\mathcal{N}) = \max_{\rho^{XA}} I(X;B)_{\mathcal{N}(\rho)}$

where $\rho^{XA}$ is a classical-quantum state of the following form:

$\rho^{XA} = \sum_x p_X(x) \vert x \rangle \langle x \vert^X \otimes \rho_x^A ,$

$p_X(x)$ is a probability distribution, and each $\rho_x^A$ is a density operator that can be input to the channel $\mathcal{N}$.

## Achievability using sequential decoding

We briefly review the HSW coding theorem (the statement of the achievability of the Holevo information rate $I(X;B)$ for communicating classical data over a quantum channel). We first review the minimal amount of quantum mechanics needed for the theorem. We then cover quantum typicality, and finally we prove the theorem using a recent sequential decoding technique.

## Review of Quantum Mechanics

In order to prove the HSW coding theorem, we really just need a few basic things from quantum mechanics. First, a quantum state is a unit trace, positive operator known as a density operator. Usually, we denote it by $\rho$, $\sigma$, $\omega$, etc. The simplest model for a quantum channel is known as a classical-quantum channel:

$x\rightarrow\rho_{x}.$

The meaning of the above notation is that inputting the classical letter $x$ at the transmitting end leads to a quantum state $\rho_{x}$ at the receiving end. It is the task of the receiver to perform a measurement to determine the input of the sender. If it is true that the states $\rho_{x}$ are perfectly distinguishable from one another (i.e., if they have orthogonal supports such that Tr$\left\{ \rho_{x}\rho_{x^{\prime}}\right\} =0$ for $x\neq x^{\prime}$), then the channel is a noiseless channel. We are interested in situations for which this is not the case. If it is true that the states $\rho_{x}$ all commute with one another, then this is effectively identical to the situation for a classical channel, so we are also not interested in these situations. So, the situation in which we are interested is that in which the states $\rho_{x}$ have overlapping support and are non-commutative.

The most general way to describe a quantum measurement is with a [[positive operator-valued measure]] (POVM). We usually denote the elements of a POVM as $\left\{ \Lambda_{m}\right\} _{m}$. These operators should satisfy positivity and completeness in order to form a valid POVM:

$\Lambda_{m} \geq0\ \ \ \ \forall m$
$\sum_{m}\Lambda_{m} =I.$

The probabilistic interpretation of quantum mechanics states that if someone measures a quantum state $\rho$ using a measurement device corresponding to the POVM $\left\{ \Lambda_{m}\right\}$, then the probability $p\left( m\right)$ for obtaining outcome $m$ is equal to

$p\left( m\right) =\text{Tr}\left\{ \Lambda_{m}\rho\right\} ,$

and the post-measurement state is

$\rho_{m}^{\prime}=\frac{1}{p\left( m\right) }\sqrt{\Lambda_{m}}\rho \sqrt{\Lambda_{m}},$

if the person measuring obtains outcome $m$. These rules are sufficient for us to consider classical communication schemes over cq channels.

## Quantum Typicality

The reader can find a good review of this topic in the article about the typical subspace.

## Gentle Operator Lemma

The following lemma is important for our proofs. It demonstrates that a measurement that succeeds with high probability on average does not disturb the state too much on average:

Lemma: [Winter] Given an ensemble $\left\{ p_{X}\left( x\right) ,\rho_{x}\right\}$ with expected density operator $\rho\equiv\sum_{x}p_{X}\left( x\right) \rho_{x}$, suppose that an operator $\Lambda$ such that $I\geq\Lambda\geq0$ succeeds with high probability on the state $\rho$:

$\text{Tr}\left\{ \Lambda\rho\right\} \geq1-\epsilon.$

Then the subnormalized state $\sqrt{\Lambda}\rho_{x}\sqrt{\Lambda}$ is close in expected trace distance to the original state $\rho_{x}$:

$\mathbb{E}_{X}\left\{ \left\Vert \sqrt{\Lambda}\rho_{X}\sqrt{\Lambda} -\rho_{X}\right\Vert _{1}\right\} \leq2\sqrt{\epsilon}.$

(Note that $\left\Vert A\right\Vert _{1}$ is the nuclear norm of the operator $A$ so that $\left\Vert A\right\Vert _{1}\equiv$Tr$\left\{ \sqrt{A^{\dagger} A}\right\}$.)

The following inequality is useful for us as well. It holds for any operators $\rho$, $\sigma$, $\Lambda$ such that $0\leq\rho,\sigma,\Lambda\leq I$:

$\text{Tr}\left\{ \Lambda\rho\right\} \leq\text{Tr}\left\{ \Lambda \sigma\right\} +\left\Vert \rho-\sigma\right\Vert _{1}.$

The quantum information-theoretic interpretation of the above inequality is that the probability of obtaining outcome $\Lambda$ from a quantum measurement acting on the state $\rho$ is upper bounded by the probability of obtaining outcome $\Lambda$ on the state $\sigma$ summed with the distinguishability of the two states $\rho$ and $\sigma$.

## Non-Commutative Union Bound

Lemma: [Sen's bound] The following bound holds for a subnormalized state $\sigma$ such that $0\leq\sigma$ and $Tr\left\{ \sigma\right\} \leq1$ with $\Pi_{1}$, ... , $\Pi_{N}$ being projectors: $\text{Tr}\left\{ \sigma\right\} -\text{Tr}\left\{ \Pi_{N}\cdots\Pi _{1}\ \sigma\ \Pi_{1}\cdots\Pi_{N}\right\} \leq2\sqrt{\sum_{i=1}^{N} \text{Tr}\left\{ \left( I-\Pi_{i}\right) \sigma\right\} },$

We can think of Sen's bound as a "non-commutative union bound" because it is analogous to the following union bound from probability theory:

$\Pr\left\{ \left( A_{1}\cap\cdots\cap A_{N}\right) ^{c}\right\} =\Pr\left\{ A_{1}^{c}\cup\cdots\cup A_{N}^{c}\right\} \leq\sum_{i=1}^{N} \Pr\left\{ A_{i}^{c}\right\} ,$

where $A_{1}$, \ldots, $A_{N}$ are events. The analogous bound for projector logic would be

$\text{Tr}\left\{ \left( I-\Pi_{1}\cdots\Pi_{N}\cdots\Pi_{1}\right) \rho\right\} \leq\sum_{i=1}^{N}\text{Tr}\left\{ \left( I-\Pi_{i}\right) \rho\right\} ,$

if we think of $\Pi_{1}\cdots\Pi_{N}$ as a projector onto the intersection of subspaces. Though, the above bound only holds if the projectors $\Pi_{1}$, ..., $\Pi_{N}$ are commuting (choosing $\Pi_{1}=\left\vert +\right\rangle \left\langle +\right\vert$, $\Pi_{2}=\left\vert 0\right\rangle \left\langle 0\right\vert$, and $\rho=\left\vert 0\right\rangle \left\langle 0\right\vert$ gives a counterexample). If the projectors are non-commuting, then Sen's bound is the next best thing and suffices for our purposes here.

## HSW Theorem with the non-commutative union bound

We now prove the HSW theorem with Sen's non-commutative union bound. We divide up the proof into a few parts: codebook generation, POVM construction, and error analysis.

Codebook Generation. We first describe how Alice and Bob agree on a random choice of code. They have the channel $x\rightarrow\rho_{x}$ and a distribution $p_{X}\left( x\right)$. They choose $M$ classical sequences $x^{n}$ according to the IID\ distribution $p_{X^{n}}\left( x^{n}\right)$. After selecting them, they label them with indices as $\left\{ x^{n}\left( m\right) \right\} _{m\in\left[ M\right] }$. This leads to the following quantum codewords:

$\rho_{x^{n}\left( m\right) }=\rho_{x_{1}\left( m\right) }\otimes \cdots\otimes\rho_{x_{n}\left( m\right) }.$

The quantum codebook is then $\left\{ \rho_{x^{n}\left( m\right) }\right\}$. The average state of the codebook is then

$\mathbb{E}_{X^{n}}\left\{ \rho_{X^{n}}\right\} =\sum_{x^{n}}p_{X^{n}}\left( x^{n}\right) \rho_{x^{n}}=\rho^{\otimes n},$

where $\rho=\sum_{x}p_{X}\left( x\right) \rho_{x}$.

POVM Construction . Sens' bound from the above lemma suggests a method for Bob to decode a state that Alice transmits. Bob should first ask "Is the received state in the average typical subspace?" He can do this operationally by performing a typical subspace measurement corresponding to $\left\{ \Pi_{\rho,\delta} ^{n},I-\Pi_{\rho,\delta}^{n}\right\}$. Next, he asks in sequential order, "Is the received codeword in the $m^{\text{th}}$ conditionally typical subspace?" This is in some sense equivalent to the question, "Is the received codeword the $m^{\text{th}}$ transmitted codeword?" He can ask these questions operationally by performing the measurements corresponding to the conditionally typical projectors $\left\{ \Pi_{\rho_{x^{n}\left( m\right) },\delta},I-\Pi_{\rho_{x^{n}\left( m\right) },\delta}\right\}$.

Why should this sequential decoding scheme work well? The reason is that the transmitted codeword lies in the typical subspace on average:

$\mathbb{E}_{X^{n}}\left\{ \text{Tr}\left\{ \Pi_{\rho,\delta}\ \rho_{X^{n} }\right\} \right\} =\text{Tr}\left\{ \Pi_{\rho,\delta}\ \mathbb{E} _{X^{n}}\left\{ \rho_{X^{n}}\right\} \right\}$
$=\text{Tr}\left\{ \Pi_{\rho,\delta}\ \rho^{\otimes n}\right\}$
$\geq1-\epsilon,$

where the inequality follows from (\ref{eq:1st-typ-prop}). Also, the projectors $\Pi_{\rho_{x^{n}\left( m\right) },\delta}$ are "good detectors" for the states $\rho_{x^{n}\left( m\right) }$ (on average) because the following condition holds from conditional quantum typicality:

$\mathbb{E}_{X^{n}}\left\{ \text{Tr}\left\{ \Pi_{\rho_{X^{n}},\delta} \ \rho_{X^{n}}\right\} \right\} \geq1-\epsilon.$

Error Analysis . The probability of detecting the $m^{\text{th}}$ codeword correctly under our sequential decoding scheme is equal to

$\text{Tr}\left\{ \Pi_{\rho_{X^{n}\left( m\right) },\delta}\hat{\Pi} _{\rho_{X^{n}\left( m-1\right) },\delta}\cdots\hat{\Pi}_{\rho_{X^{n}\left( 1\right) },\delta}\ \Pi_{\rho,\delta}^{n}\ \rho_{x^{n}\left( m\right) }\ \Pi_{\rho,\delta}^{n}\ \hat{\Pi}_{\rho_{X^{n}\left( 1\right) },\delta }\cdots\hat{\Pi}_{\rho_{X^{n}\left( m-1\right) },\delta}\Pi_{\rho _{X^{n}\left( m\right) },\delta}\right\} ,$

where we make the abbreviation $\hat{\Pi}\equiv I-\Pi$. (Observe that we project into the average typical subspace just once.) Thus, the probability of an incorrect detection for the $m^{\text{th}}$ codeword is given by

$1-\text{Tr}\left\{ \Pi_{\rho_{X^{n}\left( m\right) },\delta}\hat{\Pi} _{\rho_{X^{n}\left( m-1\right) },\delta}\cdots\hat{\Pi}_{\rho_{X^{n}\left( 1\right) },\delta}\ \Pi_{\rho,\delta}^{n}\ \rho_{x^{n}\left( m\right) }\ \Pi_{\rho,\delta}^{n}\ \hat{\Pi}_{\rho_{X^{n}\left( 1\right) },\delta }\cdots\hat{\Pi}_{\rho_{X^{n}\left( m-1\right) },\delta}\Pi_{\rho _{X^{n}\left( m\right) },\delta}\right\} ,$

and the average error probability of this scheme is equal to

$1-\frac{1}{M}\sum_{m}\text{Tr}\left\{ \Pi_{\rho_{X^{n}\left( m\right) },\delta}\hat{\Pi}_{\rho_{X^{n}\left( m-1\right) },\delta}\cdots\hat{\Pi }_{\rho_{X^{n}\left( 1\right) },\delta}\ \Pi_{\rho,\delta}^{n}\ \rho _{x^{n}\left( m\right) }\ \Pi_{\rho,\delta}^{n}\ \hat{\Pi}_{\rho _{X^{n}\left( 1\right) },\delta}\cdots\hat{\Pi}_{\rho_{X^{n}\left( m-1\right) },\delta}\Pi_{\rho_{X^{n}\left( m\right) },\delta}\right\} .$

Instead of analyzing the average error probability, we analyze the expectation of the average error probability, where the expectation is with respect to the random choice of code:

$1-\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{ \Pi _{\rho_{X^{n}\left( m\right) },\delta}\hat{\Pi}_{\rho_{X^{n}\left( m-1\right) },\delta}\cdots\hat{\Pi}_{\rho_{X^{n}\left( 1\right) },\delta }\ \Pi_{\rho,\delta}^{n}\ \rho_{X^{n}\left( m\right) }\ \Pi_{\rho,\delta }^{n}\ \hat{\Pi}_{\rho_{X^{n}\left( 1\right) },\delta}\cdots\hat{\Pi} _{\rho_{X^{n}\left( m-1\right) },\delta}\Pi_{\rho_{X^{n}\left( m\right) },\delta}\right\} \right\} .$

Our first step is to apply Sen's bound to the above quantity. But before doing so, we should rewrite the above expression just slightly, by observing that

$1 =\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{ \rho_{X^{n}\left( m\right) }\right\} \right\}$
$=\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{ \Pi _{\rho,\delta}^{n}\rho_{X^{n}\left( m\right) }\right\} +\text{Tr}\left\{ \hat{\Pi}_{\rho,\delta}^{n}\rho_{X^{n}\left( m\right) }\right\} \right\}$
$=\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{ \Pi _{\rho,\delta}^{n}\rho_{X^{n}\left( m\right) }\Pi_{\rho,\delta}^{n}\right\} \right\} +\frac{1}{M}\sum_{m}\text{Tr}\left\{ \hat{\Pi}_{\rho,\delta} ^{n}\mathbb{E}_{X^{n}}\left\{ \rho_{X^{n}\left( m\right) }\right\} \right\}$
$=\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{ \Pi _{\rho,\delta}^{n}\rho_{X^{n}\left( m\right) }\Pi_{\rho,\delta}^{n}\right\} \right\} +\text{Tr}\left\{ \hat{\Pi}_{\rho,\delta}^{n}\rho^{\otimes n}\right\}$
$\leq\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{ \Pi_{\rho,\delta}^{n}\rho_{X^{n}\left( m\right) }\Pi_{\rho,\delta} ^{n}\right\} \right\} +\epsilon$

Substituting into (\ref{eq:error-term}) (and forgetting about the small $\epsilon$ term for now) gives an upper bound of

$\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{ \Pi _{\rho,\delta}^{n}\rho_{X^{n}\left( m\right) }\Pi_{\rho,\delta}^{n}\right\} \right\}$
$-\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{ \Pi _{\rho_{X^{n}\left( m\right) },\delta}\hat{\Pi}_{\rho_{X^{n}\left( m-1\right) },\delta}\cdots\hat{\Pi}_{\rho_{X^{n}\left( 1\right) },\delta }\ \Pi_{\rho,\delta}^{n}\ \rho_{X^{n}\left( m\right) }\ \Pi_{\rho,\delta }^{n}\ \hat{\Pi}_{\rho_{X^{n}\left( 1\right) },\delta}\cdots\hat{\Pi} _{\rho_{X^{n}\left( m-1\right) },\delta}\Pi_{\rho_{X^{n}\left( m\right) },\delta}\right\} \right\} .$

We then apply Sen's bound to this expression with $\sigma=\Pi_{\rho,\delta }^{n}\rho_{X^{n}\left( m\right) }\Pi_{\rho,\delta}^{n}$ and the sequential projectors as $\Pi_{\rho_{X^{n}\left( m\right) },\delta}$, $\hat{\Pi} _{\rho_{X^{n}\left( m-1\right) },\delta}$, ..., $\hat{\Pi}_{\rho _{X^{n}\left( 1\right) },\delta}$. This gives the upper bound $\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}2\left[ \text{Tr}\left\{ \left( I-\Pi_{\rho_{X^{n}\left( m\right) },\delta}\right) \Pi_{\rho ,\delta}^{n}\rho_{X^{n}\left( m\right) }\Pi_{\rho,\delta}^{n}\right\} +\sum_{i=1}^{m-1}\text{Tr}\left\{ \Pi_{\rho_{X^{n}\left( i\right) },\delta }\Pi_{\rho,\delta}^{n}\rho_{X^{n}\left( m\right) }\Pi_{\rho,\delta} ^{n}\right\} \right] ^{1/2}\right\} .$ Due to concavity of the square root, we can bound this expression from above by

$2\left[ \mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{ \left( I-\Pi_{\rho_{X^{n}\left( m\right) },\delta}\right) \Pi_{\rho ,\delta}^{n}\rho_{X^{n}\left( m\right) }\Pi_{\rho,\delta}^{n}\right\} +\sum_{i=1}^{m-1}\text{Tr}\left\{ \Pi_{\rho_{X^{n}\left( i\right) },\delta }\Pi_{\rho,\delta}^{n}\rho_{X^{n}\left( m\right) }\Pi_{\rho,\delta} ^{n}\right\} \right\} \right] ^{1/2}$
$\leq2\left[ \mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{ \left( I-\Pi_{\rho_{X^{n}\left( m\right) },\delta}\right) \Pi_{\rho ,\delta}^{n}\rho_{X^{n}\left( m\right) }\Pi_{\rho,\delta}^{n}\right\} +\sum_{i\neq m}\text{Tr}\left\{ \Pi_{\rho_{X^{n}\left( i\right) },\delta }\Pi_{\rho,\delta}^{n}\rho_{X^{n}\left( m\right) }\Pi_{\rho,\delta} ^{n}\right\} \right\} \right] ^{1/2},$

where the second bound follows by summing over all of the codewords not equal to the $m^{\text{th}}$ codeword (this sum can only be larger).

We now focus exclusively on showing that the term inside the square root can be made small. Consider the first term:

$\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{ \left( I-\Pi_{\rho_{X^{n}\left( m\right) },\delta}\right) \Pi_{\rho,\delta} ^{n}\rho_{X^{n}\left( m\right) }\Pi_{\rho,\delta}^{n}\right\} \right\}$
$\leq\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{ \left( I-\Pi_{\rho_{X^{n}\left( m\right) },\delta}\right) \rho_{X^{n}\left( m\right) }\right\} +\left\Vert \rho_{X^{n}\left( m\right) }-\Pi _{\rho,\delta}^{n}\rho_{X^{n}\left( m\right) }\Pi_{\rho,\delta} ^{n}\right\Vert _{1}\right\}$
$\leq\epsilon+2\sqrt{\epsilon}.$

where the first inequality follows from (\ref{eq:trace-inequality}) and the second inequality follows from the Gentle Operator Lemma and the properties of unconditional and conditional typicality. Consider now the second term and the following chain of inequalities:

$\sum_{i\neq m}\mathbb{E}_{X^{n}}\left\{ \text{Tr}\left\{ \Pi_{\rho _{X^{n}\left( i\right) },\delta}\ \Pi_{\rho,\delta}^{n}\ \rho_{X^{n}\left( m\right) }\ \Pi_{\rho,\delta}^{n}\right\} \right\}$
$=\sum_{i\neq m}\text{Tr}\left\{ \mathbb{E}_{X^{n}}\left\{ \Pi _{\rho_{X^{n}\left( i\right) },\delta}\right\} \ \Pi_{\rho,\delta} ^{n}\ \mathbb{E}_{X^{n}}\left\{ \rho_{X^{n}\left( m\right) }\right\} \ \Pi_{\rho,\delta}^{n}\right\}$
$=\sum_{i\neq m}\text{Tr}\left\{ \mathbb{E}_{X^{n}}\left\{ \Pi _{\rho_{X^{n}\left( i\right) },\delta}\right\} \ \Pi_{\rho,\delta} ^{n}\ \rho^{\otimes n}\ \Pi_{\rho,\delta}^{n}\right\}$
$\leq\sum_{i\neq m}2^{-n\left[ H\left( B\right) -\delta\right] }\ \text{Tr}\left\{ \mathbb{E}_{X^{n}}\left\{ \Pi_{\rho_{X^{n}\left( i\right) },\delta}\right\} \ \Pi_{\rho,\delta}^{n}\right\}$

The first equality follows because the codewords $X^{n}\left( m\right)$ and $X^{n}\left( i\right)$ are independent since they are different. The second equality follows from (\ref{eq:avg-state}). The first inequality follows from (\ref{eq:3rd-typ-prop}). Continuing, we have

$\leq\sum_{i\neq m}2^{-n\left[ H\left( B\right) -\delta\right] }\ \mathbb{E}_{X^{n}}\left\{ \text{Tr}\left\{ \Pi_{\rho_{X^{n}\left( i\right) },\delta}\right\} \right\}$
$\leq\sum_{i\neq m}2^{-n\left[ H\left( B\right) -\delta\right] }\ 2^{n\left[ H\left( B|X\right) +\delta\right] }$
$=\sum_{i\neq m}2^{-n\left[ I\left( X;B\right) -2\delta\right] }$
$\leq M\ 2^{-n\left[ I\left( X;B\right) -2\delta\right] }.$

The first inequality follows from $\Pi_{\rho,\delta}^{n}\leq I$ and exchanging the trace with the expectation. The second inequality follows from (\ref{eq:2nd-cond-typ}). The next two are straightforward.

Putting everything together, we get our final bound on the expectation of the average error probability:

$1-\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{ \Pi _{\rho_{X^{n}\left( m\right) },\delta}\hat{\Pi}_{\rho_{X^{n}\left( m-1\right) },\delta}\cdots\hat{\Pi}_{\rho_{X^{n}\left( 1\right) },\delta }\ \Pi_{\rho,\delta}^{n}\ \rho_{X^{n}\left( m\right) }\ \Pi_{\rho,\delta }^{n}\ \hat{\Pi}_{\rho_{X^{n}\left( 1\right) },\delta}\cdots\hat{\Pi} _{\rho_{X^{n}\left( m-1\right) },\delta}\Pi_{\rho_{X^{n}\left( m\right) },\delta}\right\} \right\}$
$\leq\epsilon+2\left[ \left( \epsilon+2\sqrt{\epsilon}\right) +M\ 2^{-n\left[ I\left( X;B\right) -2\delta\right] }\right] ^{1/2}.$

Thus, as long as we choose $M=2^{n\left[ I\left( X;B\right) -3\delta \right] }$, there exists a code with vanishing error probability.