# Decoding methods

In coding theory, decoding is the process of translating received messages into codewords of a given code. There have been many common methods of mapping messages to codewords. These are often used to recover messages sent over a noisy channel, such as a binary symmetric channel.

## Notation

${\displaystyle C\subset \mathbb {F} _{2}^{n}}$ is considered a binary code with the length ${\displaystyle n}$; ${\displaystyle x,y}$ shall be elements of ${\displaystyle \mathbb {F} _{2}^{n}}$; and ${\displaystyle d(x,y)}$ is the distance between those elements.

## Ideal observer decoding

One may be given the message ${\displaystyle x\in \mathbb {F} _{2}^{n}}$, then ideal observer decoding generates the codeword ${\displaystyle y\in C}$. The process results in this solution:

${\displaystyle \mathbb {P} (y{\mbox{ sent}}\mid x{\mbox{ received}})}$

For example, a person can choose the codeword ${\displaystyle y}$ that is most likely to be received as the message ${\displaystyle x}$ after transmission.

### Decoding conventions

Each codeword does not have an expected possibility: there may be more than one codeword with an equal likelihood of mutating into the received message. In such a case, the sender and receiver(s) must agree ahead of time on a decoding convention. Popular conventions include:

1. Request that the codeword be resent – automatic repeat-request.
2. Choose any random codeword from the set of most likely codewords which is nearer to that.
3. If another code follows, mark the ambiguous bits of the codeword as erasures and hope that the outer code disambiguates them

## Maximum likelihood decoding

Given a received codeword ${\displaystyle x\in \mathbb {F} _{2}^{n}}$ maximum likelihood decoding picks a codeword ${\displaystyle y\in C}$ that maximizes

${\displaystyle \mathbb {P} (x{\mbox{ received}}\mid y{\mbox{ sent}})}$,

that is, the codeword ${\displaystyle y}$ that maximizes the probability that ${\displaystyle x}$ was received, given that ${\displaystyle y}$ was sent. If all codewords are equally likely to be sent then this scheme is equivalent to ideal observer decoding. In fact, by Bayes Theorem,

{\displaystyle {\begin{aligned}\mathbb {P} (x{\mbox{ received}}\mid y{\mbox{ sent}})&{}={\frac {\mathbb {P} (x{\mbox{ received}},y{\mbox{ sent}})}{\mathbb {P} (y{\mbox{ sent}})}}\\&{}=\mathbb {P} (y{\mbox{ sent}}\mid x{\mbox{ received}})\cdot {\frac {\mathbb {P} (x{\mbox{ received}})}{\mathbb {P} (y{\mbox{ sent}})}}.\end{aligned}}}

Upon fixing ${\displaystyle \mathbb {P} (x{\mbox{ received}})}$, ${\displaystyle x}$ is restructured and ${\displaystyle \mathbb {P} (y{\mbox{ sent}})}$ is constant as all codewords are equally likely to be sent. Therefore, ${\displaystyle \mathbb {P} (x{\mbox{ received}}\mid y{\mbox{ sent}})}$ is maximised as a function of the variable ${\displaystyle y}$ precisely when ${\displaystyle \mathbb {P} (y{\mbox{ sent}}\mid x{\mbox{ received}})}$ is maximised, and the claim follows.

As with ideal observer decoding, a convention must be agreed to for non-unique decoding.

The maximum likelihood decoding problem can also be modeled as an integer programming problem.[1]

The maximum likelihood decoding algorithm is an instance of the "marginalize a product function" problem which is solved by applying the generalized distributive law.[2]

## Minimum distance decoding

Given a received codeword ${\displaystyle x\in \mathbb {F} _{2}^{n}}$, minimum distance decoding picks a codeword ${\displaystyle y\in C}$ to minimise the Hamming distance:

${\displaystyle d(x,y)=\#\{i:x_{i}\not =y_{i}\}}$

i.e. choose the codeword ${\displaystyle y}$ that is as close as possible to ${\displaystyle x}$.

Note that if the probability of error on a discrete memoryless channel ${\displaystyle p}$ is strictly less than one half, then minimum distance decoding is equivalent to maximum likelihood decoding, since if

${\displaystyle d(x,y)=d,\,}$

then:

{\displaystyle {\begin{aligned}\mathbb {P} (y{\mbox{ received}}\mid x{\mbox{ sent}})&{}=(1-p)^{n-d}\cdot p^{d}\\&{}=(1-p)^{n}\cdot \left({\frac {p}{1-p}}\right)^{d}\\\end{aligned}}}

which (since p is less than one half) is maximised by minimising d.

Minimum distance decoding is also known as nearest neighbour decoding. It can be assisted or automated by using a standard array. Minimum distance decoding is a reasonable decoding method when the following conditions are met:

1. The probability ${\displaystyle p}$ that an error occurs is independent of the position of the symbol.
2. Errors are independent events – an error at one position in the message does not affect other positions.

These assumptions may be reasonable for transmissions over a binary symmetric channel. They may be unreasonable for other media, such as a DVD, where a single scratch on the disk can cause an error in many neighbouring symbols or codewords.

As with other decoding methods, a convention must be agreed to for non-unique decoding.

## Syndrome decoding

Syndrome decoding is a highly efficient method of decoding a linear code over a noisy channel, i.e. one on which errors are made. In essence, syndrome decoding is minimum distance decoding using a reduced lookup table. This is allowed by the linearity of the code.[3]

Suppose that ${\displaystyle C\subset \mathbb {F} _{2}^{n}}$ is a linear code of length ${\displaystyle n}$ and minimum distance ${\displaystyle d}$ with parity-check matrix ${\displaystyle H}$. Then clearly ${\displaystyle C}$ is capable of correcting up to

${\displaystyle t=\left\lfloor {\frac {d-1}{2}}\right\rfloor }$

errors made by the channel (since if no more than ${\displaystyle t}$ errors are made then minimum distance decoding will still correctly decode the incorrectly transmitted codeword).

Now suppose that a codeword ${\displaystyle x\in \mathbb {F} _{2}^{n}}$ is sent over the channel and the error pattern ${\displaystyle e\in \mathbb {F} _{2}^{n}}$ occurs. Then ${\displaystyle z=x+e}$ is received. Ordinary minimum distance decoding would lookup the vector ${\displaystyle z}$ in a table of size ${\displaystyle |C|}$ for the nearest match - i.e. an element (not necessarily unique) ${\displaystyle c\in C}$ with

${\displaystyle d(c,z)\leq d(y,z)}$

for all ${\displaystyle y\in C}$. Syndrome decoding takes advantage of the property of the parity matrix that:

${\displaystyle Hx=0}$

for all ${\displaystyle x\in C}$. The syndrome of the received ${\displaystyle z=x+e}$ is defined to be:

${\displaystyle Hz=H(x+e)=Hx+He=0+He=He}$

To perform ML decoding in a binary symmetric channel, one has to look-up a precomputed table of size ${\displaystyle 2^{n-k}}$, mapping ${\displaystyle He}$ to ${\displaystyle e}$.

Note that this is already of significantly less complexity than that of a standard array decoding.

However, under the assumption that no more than ${\displaystyle t}$ errors were made during transmission, the receiver can look up the value ${\displaystyle He}$ in a further reduced table of size

${\displaystyle {\begin{matrix}\sum _{i=0}^{t}{\binom {n}{i}}\\\end{matrix}}}$

(for a binary code). The table is against pre-computed values of ${\displaystyle He}$ for all possible error patterns ${\displaystyle e\in \mathbb {F} _{2}^{n}}$.

Knowing what ${\displaystyle e}$ is, it is then trivial to decode ${\displaystyle x}$ as:

${\displaystyle x=z-e}$

For binary codes, if both ${\displaystyle k}$ and ${\displaystyle n-k}$ are not too big, and assuming the code generating matrix is in standard form, syndrome decoding can be computed using 2 precomputed lookup tables and 2 XORs only.[4]

Let ${\displaystyle z}$ be the received noisy codeword, i.e. ${\displaystyle z=x\oplus e}$. Using the encoding lookup table of size ${\displaystyle 2^{k}}$, the codeword ${\displaystyle z'}$ that corresponds to the first ${\displaystyle k}$ bits of ${\displaystyle z}$ is found.

The syndrome is then computed as the last ${\displaystyle n-k}$ bits of ${\displaystyle s=z\oplus z'}$ (the first ${\displaystyle k}$ bits of the XOR are zero [since the generating matrix is in standard form] and discarded). Using the syndrome, the error ${\displaystyle e}$ is computed using the syndrome lookup table of size ${\displaystyle 2^{n-k}}$, and the decoding is then computed via ${\displaystyle x=z\oplus e}$ (for the codeword, or the first ${\displaystyle k}$ bits of ${\displaystyle x}$ for the original word).

The number of entries in the two lookup tables is ${\displaystyle 2^{k}+2^{n-k}}$, which is significantly smaller than ${\displaystyle 2^{n}}$ required for standard array decoding that requires only ${\displaystyle 1}$ lookup. Additionally, the precomputed encoding lookup table can be used for the encoding, and is thus often useful to have.

## Information set decoding

This is a family of Las Vegas-probabilistic methods all based on the observation that it is easier to guess enough error-free positions, than it is to guess all the error-positions.

The simplest form is due to Prange: Let ${\displaystyle G}$ be the ${\displaystyle k\times n}$ generator matrix of ${\displaystyle C}$ used for encoding. Select ${\displaystyle k}$ columns of ${\displaystyle G}$ at random, and denote by ${\displaystyle G'}$ the corresponding submatrix of ${\displaystyle G}$. With reasonable probability ${\displaystyle G'}$ will have full rank, which means that if we let ${\displaystyle c'}$ be the sub-vector for the corresponding positions of any codeword ${\displaystyle c=mG}$ of ${\displaystyle C}$ for a message ${\displaystyle m}$, we can recover ${\displaystyle m}$ as ${\displaystyle m=c'G'^{-1}}$. Hence, if we were lucky that these ${\displaystyle k}$ positions of the received word ${\displaystyle y}$ contained no errors, and hence equalled the positions of the sent codeword, then we may decode.

If ${\displaystyle t}$ errors occurred, the probability of such a fortunate selection of columns is given by ${\displaystyle \textstyle {\binom {n-t}{k}}/{\binom {n}{k}}}$.

This method has been improved in various ways, e.g. by Stern[5] and Canteaut and Sendrier.[6]

## Partial response maximum likelihood

Partial response maximum likelihood (PRML) is a method for converting the weak analog signal from the head of a magnetic disk or tape drive into a digital signal.

## Viterbi decoder

A Viterbi decoder uses the Viterbi algorithm for decoding a bitstream that has been encoded using forward error correction based on a convolutional code. The Hamming distance is used as a metric for hard decision Viterbi decoders. The squared Euclidean distance is used as a metric for soft decision decoders.