Linear code

In mathematics and information theory, a linear code is an important type of block code used in error correction and detection schemes. Linear codes allow for more efficient encoding and decoding algorithms than other codes (cf. syndrome decoding).

Linear codes are applied in methods of transmitting symbols (e.g., bits) on a communications channel so that, if errors occur in the communication, some errors can be detected by the recipient of a message block. The codewords in a linear block code are blocks of symbols which are encoded using more symbols than the original value to be sent. A linear code of length n transmits blocks containing n symbols. For example, the (7,4) Hamming code is a binary linear code which represents 4-bit values each using 7-bit values. In this way, the recipient can detect errors of up to 2 bits per block.[1] As there are sixteen (16) distinct 4-bit values expressed in binary, the size of the (7,4) Hamming code is sixteen.

Formal definition

A linear code of length n and rank k is a linear subspace C with dimension k of the vector space F_q^n, where F_q is the finite field with q elements. Such a code with parameter q is called a q-ary code (e.g., when q = 5, the code is a 5-ary code). If q = 2 or q = 3, the code is described as a binary code or a ternary code, respectively.

Remark: We want to give F_q^n the usual standard basis because each coordinate represents a "bit" which is transmitted across a "noisy channel" with some small probability of transmission error (see the binary symmetric channel (BSC) for more details). If some other basis is used, then this BSC model cannot be used and the Hamming metric (defined below) does not measure the number of errors in transmission, as we want it to.

Properties

As a linear subspace of F_q^n, the entire code C (which may be very large) may be represented as the span of a minimal set of codewords (known as a basis in linear algebra). These basis codewords are often collated in the rows of a matrix G known as a generating matrix for the code C. When G has the block matrix form G = (Ik | A), where Ik denotes the k × k identity matrix and A is some k × (n − k) matrix, then we say G is in standard form.
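As an illustration, here is a minimal sketch (assuming Python with NumPy; the matrix shown is one common standard-form generating matrix for the (7,4) Hamming code mentioned above) of how a message block is encoded by multiplying it with G over GF(2):

    import numpy as np

    # One common generating matrix in standard form G = (Ik | A)
    # for the binary (7,4) Hamming code.
    G = np.array([
        [1, 0, 0, 0, 1, 1, 0],
        [0, 1, 0, 0, 1, 0, 1],
        [0, 0, 1, 0, 0, 1, 1],
        [0, 0, 0, 1, 1, 1, 1],
    ])

    def encode(message, G):
        """Encode a k-bit message into an n-bit codeword (arithmetic over GF(2))."""
        return np.mod(np.array(message) @ G, 2)

    msg = [1, 0, 1, 1]
    print(encode(msg, G))   # -> [1 0 1 1 0 1 0]

Each codeword consists of the original 4 message bits followed by 3 parity bits, matching the (7,4) parameters described above.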

A matrix H whose kernel is C is called a check matrix of C (or sometimes a parity check matrix). If C is a code with a generating matrix G in standard form, G = (Ik | A), then H = (−A^t | In−k), where A^t denotes the transpose of A, is a check matrix for C.
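Continuing the sketch above (same assumptions), a check matrix can be assembled from the standard-form generating matrix, and a received word can be tested by computing its syndrome; over GF(2) the sign of −A^t does not matter:

    # Split G = (Ik | A) and build H = (-A^t | In-k); over GF(2), -A^t = A^t.
    k, n = G.shape
    A = G[:, k:]
    H = np.hstack([A.T, np.eye(n - k, dtype=int)])

    # Every codeword c satisfies H c = 0 (mod 2); a non-zero syndrome signals an error.
    codeword = encode(msg, G)
    print(np.mod(H @ codeword, 2))   # -> [0 0 0], no error detected
    received = codeword.copy()
    received[2] ^= 1                 # flip one bit in transit
    print(np.mod(H @ received, 2))   # non-zero syndrome, error detected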

The subspace definition also guarantees that the minimum Hamming distance d between a given codeword c0 and the other codewords c ≠ c0 is constant (it does not depend on c0). Since the difference c − c0 of two codewords in C is also a codeword (i.e., an element of the subspace C), and d(c, c0) = d(c − c0, 0), we see that

    d = min { d(c, c0) : c in C, c ≠ c0 } = min { d(c, 0) : c in C, c ≠ 0 },

that is, the minimum distance of C equals the minimum Hamming weight of its non-zero codewords.
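As an illustration (a sketch under the same NumPy assumptions as the earlier snippets), the minimum distance of a small linear code can be found by enumerating all q^k codewords and taking the minimum weight among the non-zero ones:

    from itertools import product

    def minimum_distance(G):
        """Minimum distance of the binary code generated by G, computed as the
        minimum Hamming weight over all non-zero codewords."""
        k, _ = G.shape
        weights = []
        for message in product([0, 1], repeat=k):
            codeword = np.mod(np.array(message) @ G, 2)
            if codeword.any():              # skip the zero codeword
                weights.append(int(codeword.sum()))
        return min(weights)

    print(minimum_distance(G))   # -> 3 for the (7,4) Hamming code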

Nearest Neighbor Algorithm

The parameter d is closely related to the error correcting ability of the code. The following construction/algorithm illustrates this (called the nearest neighbor decoding algorithm):

Input: A "received vector" v in F_q^n.

Output: A codeword c in C closest to v.

   * Enumerate the elements of the ball of (Hamming) radius t about the received word v, denoted Bt(v). Initially set c = "fail".
   * For each w in Bt(v), check whether w is in C. If so, put c = w and proceed to the final step; otherwise, discard w and move to the next element.
   * Return c.

Note: "fail is not returned unless t> (d-1)/2.. We say that a linear C is t-error correcting if there is at most one codeword in Bt(w), for each w in .

Codes in general are often denoted by the letter C. A linear code of length n, of rank k (i.e., having k codewords in its basis and k rows in its generating matrix), and of minimum Hamming weight d is referred to as an [n, k, d] code.

Remark. This [n, k, d] notation should not be confused with the (n, M, d) notation used to denote a non-linear code of length n, size M (i.e., having M codewords), and minimum Hamming distance d.

Singleton bound

Lemma (Singleton bound): Every linear [n, k, d] code C satisfies k + d ≤ n + 1.

A code C whose parameters satisfy k+d=n+1 is called maximum distance separable or MDS. Such codes, when they exist, are in some sense best possible.
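For instance, the (7,4) Hamming code from the earlier sketches has k + d = 4 + 3 = 7 ≤ 8 = n + 1, so it satisfies the bound without attaining equality (it is not MDS). A one-line check under the same assumptions:

    n, k, d = 7, 4, minimum_distance(G)
    print(k + d <= n + 1)   # -> True; equality would make the code MDS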

If C1 and C2 are two codes of length n and if there is a permutation p in the symmetric group Sn for which (c1, ..., cn) is in C1 if and only if (cp(1), ..., cp(n)) is in C2, then we say C1 and C2 are permutation equivalent. More generally, if there is a monomial matrix which sends C1 isomorphically to C2, then we say C1 and C2 are equivalent.
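A small sketch (same NumPy assumptions; the permutation chosen here is arbitrary) of producing a permutation-equivalent code by reordering the coordinates of every codeword:

    # Applying a fixed coordinate permutation to every codeword yields a
    # permutation-equivalent code with the same length, size and minimum distance.
    p = [6, 0, 1, 2, 3, 4, 5]    # an arbitrary permutation of {0, ..., 6}
    C1 = [np.mod(np.array(m) @ G, 2) for m in product([0, 1], repeat=G.shape[0])]
    C2 = [c[p] for c in C1]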

Lemma: Any linear code is permutation equivalent to a code which is in standard form.

Examples

Some examples of linear codes include:

   * Hamming codes, such as the (7,4) code discussed above
   * Reed-Solomon codes, used for example on compact discs (see Uses below)

Uses

Binary linear codes (see the formal definition above) are ubiquitous in electronic devices and digital storage media. For example, the Reed-Solomon code is used to store digital data on a compact disc.

Footnotes

  1. ^ Thomas M. Cover and Joy A. Thomas (1991). Elements of Information Theory. John Wiley & Sons, Inc. pp. 210–211. ISBN 0-471-06259-6.