# User:Ajraymond/PolarCodes

In information theory, a polar code is a linear block error-correcting code developed by Erdal Arıkan [1]. It is the first code to provably achieve the channel capacity of symmetric binary-input, discrete, memoryless channels (B-DMC).

## History

Capacity-approaching codes such as LDPC codes and Turbo codes have garnered a large body of research due to their excellent error-correction performance, throughput, and latency characteristics.

Comparatively, polar codes are still in their infancy, having been introduced in 2008. They differ from capacity-approaching schemes in that they provably achieve the capacity of several communication channels in the limit of infinite code length. Although initially only defined for symmetric B-DMC channels, polar codes were later extended to all discrete memoryless channels[2].

## Description

Linear block codes work by encoding an information vector ${\displaystyle \mathbf {u} }$ into a codeword ${\displaystyle \mathbf {x} }$ by multiplying it with a generator matrix ${\displaystyle \mathbf {F} }$, using ${\displaystyle \mathbf {x} =\mathbf {u} \cdot \mathbf {F} }$. This resulting vector is then transmitted over a communication channel. A decoder receives a noisy version of ${\displaystyle \mathbf {x} }$ called ${\displaystyle \mathbf {y} }$ from the output of the channel. This decoder then estimates the original information vector ${\displaystyle \mathbf {u} }$, yielding ${\displaystyle \mathbf {\hat {u}} }$.

An ${\displaystyle (N,k)}$ polar code is a code whose codewords contain ${\displaystyle k}$ bits of information represented in a vector of length ${\displaystyle N}$. Each of those ${\displaystyle N}$ bits is the result of a parity function, defined by the generator matrix ${\displaystyle \mathbf {F} }$, over one or more bits of the information vector ${\displaystyle \mathbf {u} }$. Alternatively, polar codes can also be represented using a systematic approach[3], where ${\displaystyle k}$ bits of ${\displaystyle \mathbf {x} }$ are used to store the information vector ${\displaystyle \mathbf {u} }$ while the remaining ${\displaystyle N-k}$ bits contain parity information.

Polar codes are based on the observation that specific bits in the input data tend to be better protected from noise than others. Arıkan observed that as the code length ${\displaystyle N}$ grows larger, individual bits in the input word tend to become either very well or very poorly protected. Polar codes are constructed by identifying those well-protected bit indices in the information vector ${\displaystyle \mathbf {u} }$ and using them to transmit information. These indices are called the information set, while the remaining positions form the frozen set, which is usually set to a predetermined value known by both the encoder and the decoder.

Polar codes were initially constructed for the binary erasure channel through the use of the erasure probability of the decoded bits. Later research provided approximate construction methods for other communication channels[4].

Although their capacity-achieving characteristic was initially proved under successive cancellation decoding, the belief propagation algorithm has also been applied to the decoding of polar codes[citation needed].

## Code Construction

Originally, polar codes were constructed using a generator matrix created using the Kronecker power of the base matrix ${\displaystyle \mathbf {F_{2}} ={\begin{bmatrix}1&0\\1&1\end{bmatrix}}}$. This construction method yields polar codes whose lengths are powers of two.

For example, the generator matrix of an ${\displaystyle N=8}$ polar code is:
${\displaystyle \mathbf {F_{8}} =\mathbf {F_{2}^{\otimes {3}}} ={\begin{bmatrix}F_{2}&0_{2}&0_{2}&0_{2}\\F_{2}&F_{2}&0_{2}&0_{2}\\F_{2}&0_{2}&F_{2}&0_{2}\\F_{2}&F_{2}&F_{2}&F_{2}\end{bmatrix}}={\begin{bmatrix}1&0&0&0&0&0&0&0\\1&1&0&0&0&0&0&0\\1&0&1&0&0&0&0&0\\1&1&1&1&0&0&0&0\\1&0&0&0&1&0&0&0\\1&1&0&0&1&1&0&0\\1&0&1&0&1&0&1&0\\1&1&1&1&1&1&1&1\end{bmatrix}}}$
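As an illustration (a minimal sketch, not part of the original text), the Kronecker-power construction can be expressed in a few lines of Python; the function names below are hypothetical:

```python
# Sketch: build F_N as the n-th Kronecker power of the 2x2 base
# matrix F_2, using plain Python lists of 0/1 integers.

def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def polar_generator(n):
    """F_2 raised to the n-th Kronecker power (size 2^n x 2^n)."""
    F2 = [[1, 0], [1, 1]]
    F = F2
    for _ in range(n - 1):
        F = kron(F, F2)
    return F

F8 = polar_generator(3)  # the 8x8 matrix shown above
```

With `n = 3` this reproduces the ${\displaystyle 8\times 8}$ matrix above, row for row.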

Polar codes can also be illustrated using a graph representation. In this case, the information vector ${\displaystyle \mathbf {u} }$ is presented on the left-hand side of the graph and the resulting codeword ${\displaystyle \mathbf {x} }$ is obtained on the right-hand side. The ${\displaystyle \bigoplus }$ symbols represent XOR operations.

Later research proved that polarization could be obtained by constructing polar codes using any lower triangular base matrix[citation needed].

### Example

Consider a ${\displaystyle (8,4)}$ polar code with frozen indices ${\displaystyle \{u_{0},u_{1},u_{2},u_{4}\}}$ set to ${\displaystyle 0}$. The information bits ${\displaystyle [0\ 1\ 1\ 1]}$ are stored, in ascending index order, in the remaining positions, giving the information vector ${\displaystyle \mathbf {u} =[0\ 0\ 0\ 0\ 0\ 1\ 1\ 1]}$. Using ${\displaystyle \mathbf {F_{8}} }$ as defined above, the corresponding codeword is ${\displaystyle \mathbf {x} =\mathbf {u} \cdot \mathbf {F_{8}} =[1\ 0\ 0\ 1\ 1\ 0\ 0\ 1]}$.
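The encoding can be checked numerically. The sketch below (an illustration, not from the original text) assumes 0-based frozen indices ${\displaystyle \{0,1,2,4\}}$ and information bits placed into the non-frozen positions in ascending index order:

```python
# Encode the (8,4) example: place info bits into non-frozen positions
# of u, then compute x = u . F8 over GF(2).

def kron(A, B):
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

F2 = [[1, 0], [1, 1]]
F8 = kron(kron(F2, F2), F2)  # F_2 Kronecker-cubed

frozen = {0, 1, 2, 4}        # frozen indices, fixed to 0
info_bits = [0, 1, 1, 1]

u = [0] * 8
it = iter(info_bits)
for i in range(8):
    if i not in frozen:
        u[i] = next(it)

# x = u . F8 (mod 2)
x = [sum(u[i] * F8[i][j] for i in range(8)) % 2 for j in range(8)]
```

Under these assumptions the computation yields ${\displaystyle \mathbf {u} =[0\ 0\ 0\ 0\ 0\ 1\ 1\ 1]}$ and ${\displaystyle \mathbf {x} =[1\ 0\ 0\ 1\ 1\ 0\ 0\ 1]}$.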

## Decoding

### Successive Cancellation Decoding

The successive cancellation decoding algorithm is based on the sum-product algorithm. It makes use of the equality and parity constraints introduced by the encoding graph to successively estimate the bits of the information vector, yielding ${\displaystyle \mathbf {\hat {u}} }$.

Successive cancellation decoding requires ${\displaystyle N\cdot \log N}$ operations to decode a received vector, regardless of the channel conditions. However, the data dependencies of the decoding graph severely limit the parallelism that can be exploited in the decoding process; even a fully-parallel implementation requires ${\displaystyle 2N-2}$ time steps. Since the number of operations is fixed, the throughput achievable by this decoding algorithm does not depend on the [signal-to-noise ratio] of the underlying channel.

The following figure describes the graph used to decode an ${\displaystyle N=8}$ polar code. A received vector ${\displaystyle \mathbf {y} }$ is presented on the left-hand side of the graph, and the resulting estimated information vector ${\displaystyle \mathbf {\hat {u}} }$ is obtained on the right-hand side. The ${\displaystyle \bigoplus }$ symbols represent the parity check function ${\displaystyle f}$ whereas ${\displaystyle \circ }$ represents the equality constraint function ${\displaystyle g}$.

Let ${\displaystyle l_{x}\triangleq P(x=0)/P(x=1)}$ denote the likelihood ratio of ${\displaystyle x}$ and ${\displaystyle L_{x}\triangleq \ln(l_{x})}$ its log-likelihood ratio.

#### Equations

The following equations are used in the successive cancellation decoding of polar codes when the inputs are expressed as likelihood ratios:

{\displaystyle {\begin{aligned}f(l_{a},l_{b})&={\frac {1+l_{a}\cdot l_{b}}{l_{a}+l_{b}}}\\g(l_{a},l_{b},{\hat {f}})&=l_{b}\cdot l_{a}^{1-2\cdot {\hat {f}}}\end{aligned}}}

Equivalent equations can be derived in the logarithmic domain:

{\displaystyle {\begin{aligned}L_{f}(L_{a},L_{b})&=2\cdot \tanh ^{-1}(\tanh(L_{a}/2)\cdot \tanh(L_{b}/2))\\L_{g}(L_{a},L_{b},{\hat {f}})&=(1-2\cdot {\hat {f}})\cdot L_{a}+L_{b}.\end{aligned}}}

The equation for ${\displaystyle L_{f}}$ can in turn be simplified through the use of the min-sum approximation[5] used in LDPC codes:

${\displaystyle L_{f}(L_{a},L_{b})\approx {\text{sign}}(L_{a})\cdot {\text{sign}}(L_{b})\cdot {\text{min}}(|L_{a}|,|L_{b}|)}$
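Putting these equations to work, the following sketch (an illustration, not part of the original article) implements recursive successive cancellation decoding in the log domain, using the min-sum approximation of ${\displaystyle f}$ and the log-domain ${\displaystyle g}$. It assumes the natural-order generator matrix ${\displaystyle \mathbf {F_{2}^{\otimes {n}}} }$ with no bit-reversal permutation, and reuses the ${\displaystyle (8,4)}$ code with frozen indices ${\displaystyle \{0,1,2,4\}}$ as a hypothetical test vector:

```python
# Sketch of recursive successive cancellation decoding in the LLR
# domain. Pairing (i, i + N/2) matches the natural-order F_2 Kronecker
# construction (no bit-reversal permutation).

def f(La, Lb):
    """Min-sum approximation of the parity-check function."""
    sign = (1 if La >= 0 else -1) * (1 if Lb >= 0 else -1)
    return sign * min(abs(La), abs(Lb))

def g(La, Lb, f_hat):
    """Equality-constraint function in the log domain."""
    return Lb + (1 - 2 * f_hat) * La

def sc_decode(llr, frozen):
    """Return (estimated bits, re-encoded partial sums)."""
    N = len(llr)
    if N == 1:
        # Frozen bits are forced to 0; others use a threshold decision.
        u = 0 if (frozen[0] or llr[0] >= 0) else 1
        return [u], [u]
    half = N // 2
    # f-step: estimate the left half first.
    left_llr = [f(llr[i], llr[i + half]) for i in range(half)]
    u_left, beta_left = sc_decode(left_llr, frozen[:half])
    # g-step: reuse the left-half decisions for the right half.
    right_llr = [g(llr[i], llr[i + half], beta_left[i]) for i in range(half)]
    u_right, beta_right = sc_decode(right_llr, frozen[half:])
    beta = [beta_left[i] ^ beta_right[i] for i in range(half)] + beta_right
    return u_left + u_right, beta

# Noiseless BPSK LLRs for the codeword x = [1 0 0 1 1 0 0 1]:
# binary 0 maps to a positive LLR, binary 1 to a negative LLR.
llr = [-4.0, 4.0, 4.0, -4.0, -4.0, 4.0, 4.0, -4.0]
frozen = [True, True, True, False, True, False, False, False]
u_hat, x_hat = sc_decode(llr, frozen)
```

On these noiseless inputs the decoder recovers ${\displaystyle \mathbf {\hat {u}} =[0\ 0\ 0\ 0\ 0\ 1\ 1\ 1]}$ and re-encodes it to the transmitted codeword.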

#### Derivation

Assume a binary phase-shift keying modulation scheme where binary value ${\displaystyle 0}$ is encoded as ${\displaystyle 1}$ and binary value ${\displaystyle 1}$ is encoded as ${\displaystyle -1}$. A threshold detector determines the binary value associated with a soft information value according to the following rule:

${\displaystyle {\text{Threshold}}(X)={\begin{cases}{\text{binary }}0&{\text{when }}X\geq 0\\{\text{binary }}1&{\text{otherwise}}.\end{cases}}}$

To simplify notation, let ${\displaystyle {\hat {X}}}$ refer to the binary value associated with soft information value ${\displaystyle X}$. Thus,

${\displaystyle {\hat {X}}={\begin{cases}{\text{binary }}0&{\text{when }}X\geq 0\\{\text{binary }}1&{\text{otherwise}}.\end{cases}}}$

The following derivations are based on the following graph, where ${\displaystyle \oplus }$ is a parity operator and ${\displaystyle \circ }$, an equality operator. Soft information ${\displaystyle (a,b)}$ is presented on the left-hand side whereas the results of both operators are presented on the right-hand side of the graph.

    a-----+-----f
          |
    b-----o-----g

##### Parity-Check Constraint

Let the likelihood ratio of function ${\displaystyle f}$ be:

${\displaystyle l_{f}(a,b)\triangleq P({\hat {f}}=0)/P({\hat {f}}=1).}$

According to the parity constraint (${\displaystyle \oplus }$), the hard decisions on both inputs ${\displaystyle (a,b)}$ and the output ${\displaystyle f}$ must satisfy ${\displaystyle {\hat {a}}\oplus {\hat {b}}\oplus {\hat {f}}=0}$.

There are two possibilities for ${\displaystyle {\hat {a}}}$ and ${\displaystyle {\hat {b}}}$ to satisfy this constraint when ${\displaystyle {\hat {f}}=0}$. Therefore, the probability ${\displaystyle P({\hat {f}}=0)}$ is

${\displaystyle P({\hat {f}}=0)=P({\hat {a}}=0)\cdot P({\hat {b}}=0)+P({\hat {a}}=1)\cdot P({\hat {b}}=1).}$

Similarly for ${\displaystyle {\hat {f}}=1}$:

${\displaystyle P({\hat {f}}=1)=P({\hat {a}}=0)\cdot P({\hat {b}}=1)+P({\hat {a}}=1)\cdot P({\hat {b}}=0).}$

Returning to the definition of ${\displaystyle l_{f}}$:

{\displaystyle {\begin{aligned}l_{f}(a,b)&={\frac {P({\hat {a}}=0)\cdot P({\hat {b}}=0)+P({\hat {a}}=1)\cdot P({\hat {b}}=1)}{P({\hat {a}}=0)\cdot P({\hat {b}}=1)+P({\hat {a}}=1)\cdot P({\hat {b}}=0)}}\\&={\frac {1+{\frac {P({\hat {a}}=0)}{P({\hat {a}}=1)}}\cdot {\frac {P({\hat {b}}=0)}{P({\hat {b}}=1)}}}{{\frac {P({\hat {a}}=0)}{P({\hat {a}}=1)}}+{\frac {P({\hat {b}}=0)}{P({\hat {b}}=1)}}}}\\&={\frac {1+l_{a}\cdot l_{b}}{l_{a}+l_{b}}}.\end{aligned}}}
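The closed form can be sanity-checked numerically; the probability values below are arbitrary assumptions chosen for the example:

```python
# Check that (1 + la*lb) / (la + lb) matches the parity-probability
# definition of l_f for assumed input probabilities.

def lf_from_probs(pa0, pb0):
    """l_f computed directly as P(f_hat = 0) / P(f_hat = 1)."""
    pa1, pb1 = 1 - pa0, 1 - pb0
    p_f0 = pa0 * pb0 + pa1 * pb1
    p_f1 = pa0 * pb1 + pa1 * pb0
    return p_f0 / p_f1

def lf_closed_form(la, lb):
    return (1 + la * lb) / (la + lb)

pa0, pb0 = 0.8, 0.6                        # assumed example probabilities
la, lb = pa0 / (1 - pa0), pb0 / (1 - pb0)  # likelihood ratios
direct = lf_from_probs(pa0, pb0)
closed = lf_closed_form(la, lb)
```

Both expressions agree to within floating-point precision.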
##### Equality Constraint

Similarly, equality equation ${\displaystyle g}$ is derived as follows:

${\displaystyle l_{g}(a,b,{\hat {f}})\triangleq P({\hat {g}}=0)/P({\hat {g}}=1).}$

According to the equality constraint (${\displaystyle \circ }$), the hard decisions on both sides of the node must match. Depending on the value of ${\displaystyle {\hat {f}}}$, two combinations of ${\displaystyle {\hat {a}}}$ and ${\displaystyle {\hat {b}}}$ satisfy this constraint for ${\displaystyle {\hat {g}}=0}$:

{\displaystyle {\begin{aligned}P({\hat {g}}=0|{\hat {f}}=0)&=P({\hat {a}}=0)\cdot P({\hat {b}}=0)\\P({\hat {g}}=0|{\hat {f}}=1)&=P({\hat {a}}=1)\cdot P({\hat {b}}=0).\end{aligned}}}

Similarly for ${\displaystyle {\hat {g}}=1}$:

{\displaystyle {\begin{aligned}P({\hat {g}}=1|{\hat {f}}=0)&=P({\hat {a}}=1)\cdot P({\hat {b}}=1)\\P({\hat {g}}=1|{\hat {f}}=1)&=P({\hat {a}}=0)\cdot P({\hat {b}}=1).\end{aligned}}}

Returning to the definition of ${\displaystyle l_{g}}$:

{\displaystyle {\begin{aligned}l_{g}(a,b|{\hat {f}}=0)&={\frac {P({\hat {a}}=0)\cdot P({\hat {b}}=0)}{P({\hat {a}}=1)\cdot P({\hat {b}}=1)}}=l_{a}\cdot l_{b}\\l_{g}(a,b|{\hat {f}}=1)&={\frac {P({\hat {a}}=1)\cdot P({\hat {b}}=0)}{P({\hat {a}}=0)\cdot P({\hat {b}}=1)}}={\frac {l_{b}}{l_{a}}}.\end{aligned}}}

Both equations can be combined, yielding:

${\displaystyle l_{g}(a,b,{\hat {f}})=l_{b}\cdot l_{a}^{1-2\cdot {\hat {f}}}.}$
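The combined expression can be verified against the two conditional forms; the likelihood-ratio values below are assumptions chosen for the example:

```python
# Check that l_b * l_a**(1 - 2*f_hat) reproduces the two conditional
# forms: l_a * l_b when f_hat = 0, and l_b / l_a when f_hat = 1.

def lg_combined(la, lb, f_hat):
    return lb * la ** (1 - 2 * f_hat)

la, lb = 4.0, 1.5  # assumed example likelihood ratios
case0 = lg_combined(la, lb, 0)  # should equal l_a * l_b
case1 = lg_combined(la, lb, 1)  # should equal l_b / l_a
```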

## Relation to Reed-Muller codes

Polar codes can be viewed as a specific case of the Reed-Muller codes with parameters ..., where the frozen bits are not chosen according to their weight, but rather ... [citation needed].