# 6b/8b encoding

In telecommunications, 6b/8b is a line code that expands 6-bit codes to 8-bit symbols in order to maintain DC balance in a communications system.

Each 8-bit output symbol contains 4 zero bits and 4 one bits, so the code can, like a parity bit, detect all single-bit errors.
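
The weight check behaves like a parity check, because flipping any one bit of a balanced symbol changes its number of 1 bits from 4 to 3 or 5. The following minimal Python sketch illustrates this; the helper name `is_balanced` is hypothetical. It tests only the bit count, so it accepts any balanced pattern, including ones the code never emits.

```python
def is_balanced(symbol: int) -> bool:
    """True if an 8-bit value has exactly four 1 bits (and hence four 0 bits)."""
    return 0 <= symbol <= 0xFF and bin(symbol).count("1") == 4


symbol = 0b10000111
assert is_balanced(symbol)
# Flipping any single bit changes the count of 1 bits to 3 or 5.
assert not any(is_balanced(symbol ^ (1 << bit)) for bit in range(8))
```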

The number of 8-bit patterns with exactly 4 bits set is the binomial coefficient $\tbinom{8}{4} = 70$. Excluding the patterns 11110000 and 00001111 leaves 68 coded patterns: 64 data codes plus 4 additional control codes.
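
The count can be confirmed by brute-force enumeration. This short Python check simply lists every 8-bit value with four 1 bits and then drops the two excluded patterns.

```python
from math import comb

# All 8-bit values with exactly four 1 bits.
balanced = [s for s in range(256) if bin(s).count("1") == 4]
assert len(balanced) == comb(8, 4) == 70

# Remove the two patterns that begin and end with a run of four.
usable = [s for s in balanced if s not in (0b11110000, 0b00001111)]
assert len(usable) == 68  # 64 data codes + 4 control codes
```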

## Coding rules

The 64 possible 6-bit input codes can be classified according to their disparity, the number of 1 bits minus the number of 0 bits:

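| Ones | Zeros | Disparity | Number of codes |
|:---:|:---:|:---:|---:|
| 0 | 6 | −6 | 1 |
| 1 | 5 | −4 | 6 |
| 2 | 4 | −2 | 15 |
| 3 | 3 | 0 | 20 |
| 4 | 2 | +2 | 15 |
| 5 | 1 | +4 | 6 |
| 6 | 0 | +6 | 1 |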
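
The distribution in the table follows directly from the definition of disparity. A small Python sketch (the function name `disparity6` is illustrative) reproduces the counts by tallying all 64 inputs:

```python
from collections import Counter


def disparity6(value: int) -> int:
    """Number of 1 bits minus number of 0 bits in a 6-bit value."""
    ones = bin(value).count("1")
    return ones - (6 - ones)


counts = Counter(disparity6(v) for v in range(64))
print(sorted(counts.items()))
# [(-6, 1), (-4, 6), (-2, 15), (0, 20), (2, 15), (4, 6), (6, 1)]
```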

The 6-bit input codes are mapped to 8-bit output symbols as follows:

- The 20 6-bit codes with disparity 0 are prefixed with 10.
  - Example: 000111 → 10000111
  - Example: 101010 → 10101010
- The 14 6-bit codes with disparity +2, other than 001111, are prefixed with 00.
  - Example: 010111 → 00010111
- The 14 6-bit codes with disparity −2, other than 110000, are prefixed with 11.
  - Example: 101000 → 11101000
- The remaining 20 codes (the 12 with disparity ±4, the 2 with disparity ±6, 001111, 110000, and the 4 control codes) are assigned to symbols beginning with 01, as follows:
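
| Type | Input | Output | Type | Input | Output | Pattern |
|---|---|---|---|---|---|---|
| −6 | 000000 | 01011001 | +6 | 111111 | 01100110 | `01_xx__x` |
| −4 | 000001 | 01110001 | +4 | 111110 | 01001110 | `01xx____` |
| | 000010 | 01110010 | | 111101 | 01001101 | `01xx____` |
| | 000100 | 01100101 | | 111011 | 01011010 | `01x____x` |
| | 001000 | 01101001 | | 110111 | 01010110 | `01x____x` |
| | 010000 | 01010011 | | 101111 | 01101100 | `01____xx` |
| | 100000 | 01100011 | | 011111 | 01011100 | `01____xx` |
| −2 | 110000 | 01110100 | +2 | 001111 | 01001011 | `01___x__` |
| Control | K 000111 | 01000111 | Control | K 111000 | 01111000 | |
| | K 010101 | 01010101 | | K 101010 | 01101010 | |

In each row, the right-hand Input and Output are the complements of the left-hand ones: complementing a 6-bit input complements the last six bits of its 8-bit output. In the Pattern column, each `_` is an output bit copied from the input bit in the same position (counting the six bits after the 01 prefix), and each `x` is a bit set to 1 in the left-hand code and to 0 in its complement. Each control code (K) is sent as 01 followed by its 6-bit value.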
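
The mapping rules above can be sketched as a small Python encoder and decoder. This is a minimal illustration, not a reference implementation. Values and symbols are plain ints, and a control code is requested with a hypothetical `control=True` flag. The names `encode`, `decode`, `SPECIAL`, `CONTROL_VALUES` and `DECODE` are assumptions made for this sketch. The right half of the 01 table is derived by complementing the left half rather than being listed.

```python
from typing import Tuple

# Left half of the 01-prefixed table: 6-bit input -> 8-bit output.
_LEFT_HALF = {
    0b000000: 0b01011001,  # disparity -6
    0b000001: 0b01110001,  # disparity -4
    0b000010: 0b01110010,
    0b000100: 0b01100101,
    0b001000: 0b01101001,
    0b010000: 0b01010011,
    0b100000: 0b01100011,
    0b110000: 0b01110100,  # disparity -2 code kept out of the 11 prefix
}

# Right half: complementing the input complements the low six output bits.
SPECIAL = dict(_LEFT_HALF)
for value, symbol in _LEFT_HALF.items():
    SPECIAL[value ^ 0b111111] = symbol ^ 0b00111111

# Control codes: a K flag plus one of four 6-bit values, sent as 01 + value.
CONTROL_VALUES = (0b000111, 0b111000, 0b010101, 0b101010)

# Two-bit prefix for the remaining data codes, keyed by disparity.
_PREFIX = {0: 0b10, 2: 0b00, -2: 0b11}


def disparity6(value: int) -> int:
    """Number of 1 bits minus number of 0 bits in a 6-bit value."""
    ones = bin(value).count("1")
    return ones - (6 - ones)


def encode(value: int, control: bool = False) -> int:
    """Map a 6-bit data value (or K control value) to its 8-bit symbol."""
    if not 0 <= value < 64:
        raise ValueError("input must be a 6-bit value")
    if control:
        if value not in CONTROL_VALUES:
            raise ValueError("not one of the four control codes")
        return 0b01000000 | value
    if value in SPECIAL:
        return SPECIAL[value]
    return (_PREFIX[disparity6(value)] << 6) | value


# Inverse table built from the encoder: symbol -> (value, is_control).
DECODE = {encode(v): (v, False) for v in range(64)}
DECODE.update({encode(k, control=True): (k, True) for k in CONTROL_VALUES})


def decode(symbol: int) -> Tuple[int, bool]:
    """Return (6-bit value, is_control); raise ValueError for an invalid symbol."""
    try:
        return DECODE[symbol]
    except KeyError:
        raise ValueError(f"invalid 6b/8b symbol {symbol:08b}") from None
```

Building `DECODE` from `encode` keeps the two directions consistent by construction. Because the 16 special codes and 4 control codes together fill all 20 balanced patterns that begin with 01, the 68 symbols come out distinct.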

Every symbol contains exactly four 0 bits and four 1 bits, so no symbol contains more than four consecutive identical bits. Because the patterns 11110000 and 00001111 are excluded, no symbol begins or ends with more than three identical bits. Thus, the longest run of identical bits that can appear in the encoded stream is 6. In other words, this is a (0,5) RLL code, with a worst-case running disparity of +3 to −3.
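
These run-length and disparity bounds can be checked by brute force. The Python sketch below uses hypothetical helper names. It relies on the count given earlier: the 68 symbols are exactly the balanced 8-bit patterns other than 11110000 and 00001111, so the check does not depend on which symbol encodes which input. Each symbol is balanced, so the running disparity returns to 0 at every symbol boundary. Checking each symbol on its own therefore covers the whole stream.

```python
from itertools import product

# The 68 valid symbols: balanced 8-bit patterns minus 11110000 and 00001111.
SYMBOLS = [format(s, "08b") for s in range(256)
           if bin(s).count("1") == 4 and s not in (0b11110000, 0b00001111)]


def longest_run(bits: str) -> int:
    """Length of the longest run of identical characters in a bit string."""
    best = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def leading_run(bits: str) -> int:
    """Length of the run of identical bits at the start of a bit string."""
    return len(bits) - len(bits.lstrip(bits[0]))


def max_running_disparity(bits: str) -> int:
    """Largest |ones - zeros| over all prefixes of a bit string."""
    rd, worst = 0, 0
    for b in bits:
        rd += 1 if b == "1" else -1
        worst = max(worst, abs(rd))
    return worst


within = max(longest_run(s) for s in SYMBOLS)
at_edges = max(max(leading_run(s), leading_run(s[::-1])) for s in SYMBOLS)
across = max(longest_run(a + b) for a, b in product(SYMBOLS, repeat=2))
rd = max(max_running_disparity(s) for s in SYMBOLS)
print(within, at_edges, across, rd)  # 4 3 6 3
```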

Any occurrence of 6 consecutive identical bits constitutes a comma sequence (also called a sync mark or syncword) that identifies the symbol boundaries precisely: those 6 bits straddle a symbol boundary, with exactly 3 of them at the end of one symbol and 3 at the start of the next.
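
A receiver could use this property to recover symbol alignment from an unframed bit stream. The Python sketch below is illustrative only. It assumes an error-free stream held as a string of `'0'`/`'1'` characters, and `find_symbol_offset` is a hypothetical helper. It finds the first run of exactly six identical bits and places the boundary three bits into that run.

```python
from typing import Optional


def find_symbol_offset(stream: str) -> Optional[int]:
    """Return the bit offset (0-7) at which symbols start, or None if no comma is seen.

    A run of exactly six identical bits has three bits on each side of a
    symbol boundary, so the boundary lies three bits into the run.
    """
    run_start = 0
    for i in range(1, len(stream) + 1):
        # A run ends at the end of the stream or when the bit value changes.
        if i == len(stream) or stream[i] != stream[run_start]:
            if i - run_start == 6:
                return (run_start + 3) % 8
            run_start = i
    return None


# Three stray bits, then the symbols 11101000, 00011110, 10000111.
stream = "101" + "11101000" + "00011110" + "10000111"
print(find_symbol_offset(stream))  # 3
```

If no comma has arrived yet, the function returns `None`. A real receiver would keep searching, since alignment can only be established once a symbol ending in three identical bits is followed by one starting with three of the same value.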