# Binary Reed–Solomon encoding

Binary Reed–Solomon (BRS) coding is a type of Reed–Solomon (RS) code used to recover from node data loss in a distributed storage environment. It has the maximum distance separable (MDS) property. Its encoding and decoding rates exceed those of conventional RS coding and optimal Cauchy Reed–Solomon (CRS) coding.

## Background

RS coding is a fault-tolerant encoding method for distributed storage environments. Suppose we wish to distribute data across k individual devices for improved storage capacity or bandwidth, for example in a hardware RAID setup. Such a configuration risks significant data loss in the event of device failure. Reed–Solomon encoding produces a storage coding system that is robust to the simultaneous failure of any subset of m nodes. To do this, we add m additional nodes to the system, for a total of n = k + m storage nodes.

Traditional RS encoding uses a Vandermonde matrix as the coding matrix and its inverse as the decoding matrix. All traditional RS encoding and decoding operations are carried out over a large finite field.

Because BRS encoding and decoding employ only shift and XOR operations, they are much faster than traditional RS coding. The BRS coding algorithm was proposed by the Advanced Network Technology Laboratory of Peking University, which also released an open-source implementation. In tests in a real environment, BRS encoding and decoding were faster than CRS. In the design and implementation of a distributed storage system, using BRS coding gives the system fault-tolerant regeneration capability.

## Principle

### BRS encoding principle

The structure of traditional Reed–Solomon codes is based on finite fields, whereas the BRS code is based on shift and XOR operations. BRS encoding is based on the Vandermonde matrix, and its encoding steps are as follows:

1. Divide the original data evenly into ${\displaystyle k}$ blocks of ${\displaystyle L}$ bits each, denoted

${\displaystyle S=(s_{0},s_{1},...,s_{k-1})}$

where ${\displaystyle s_{i}=s_{i,0}s_{i,1}...s_{i,L-1}}$ , ${\displaystyle i=0,1,2,...,k-1}$ .

2. Build the calibration (parity) data blocks ${\displaystyle M}$, which consist of ${\displaystyle n-k}$ blocks in total:

${\displaystyle M=(m_{0},m_{1},...,m_{n-k-1})}$

where ${\displaystyle m_{i}=\sum _{j=0}^{k-1}s_{j}(r_{j}^{i})}$ , ${\displaystyle i=0,1,...,n-k-1}$ .

The addition here is the XOR operation, and ${\displaystyle r_{j}^{i}}$ is the number of "0" bits added to the front of the original data block ${\displaystyle s_{j}}$. The shifted blocks are XORed together to form the parity data block ${\displaystyle m_{i}}$. The shifts ${\displaystyle r_{j}^{i}}$ are given by:

${\displaystyle (r_{0}^{a},r_{1}^{a},...,r_{k-1}^{a})=(0,a,2a,...,(k-1)a)}$

where ${\displaystyle a=0,1,...,n-k-1}$ .

3. Store one block on each node: the nodes ${\displaystyle N_{i}\ (i=0,1,...,n-1)}$ store ${\displaystyle s_{0},s_{1},...,s_{k-1},m_{0},m_{1},...,m_{n-k-1}}$ respectively.
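The encoding steps above can be sketched in a few lines of Python. This is a minimal illustration, not the released open-source implementation: the function name `brs_encode` and the representation of each block as a list of 0/1 integers are assumptions made here for readability.

```python
def brs_encode(data_blocks, n):
    """Return the n - k parity blocks for k equal-length data blocks.

    Each block is a list of 0/1 integers (an assumed representation).
    """
    k = len(data_blocks)
    L = len(data_blocks[0])
    parity_blocks = []
    for a in range(n - k):
        # Shift vector for parity block a is (0, a, 2a, ..., (k-1)a),
        # so the longest shifted block has L + a*(k-1) bits.
        m = [0] * (L + a * (k - 1))
        for j, block in enumerate(data_blocks):
            shift = j * a  # number of "0" bits prepended to s_j
            for t, bit in enumerate(block):
                m[t + shift] ^= bit  # XOR accumulates the shifted blocks
        parity_blocks.append(m)
    return parity_blocks


# Hypothetical input: k = 3 data blocks of L = 6 bits, n = 6 nodes.
data = [[1, 0, 1, 1, 0, 0], [0, 1, 1, 0, 1, 0], [1, 1, 0, 0, 0, 1]]
parity = brs_encode(data, n=6)
print([len(m) for m in parity])  # [6, 8, 10]
```

Note that the parity blocks grow with the shift: block ${\displaystyle m_{a}}$ is ${\displaystyle a(k-1)}$ bits longer than a data block. A production implementation would operate on machine words rather than individual bits.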

### BRS encoding example

Let ${\displaystyle n=6,k=3}$ . Writing ${\displaystyle ID_{a}}$ for the shift vector ${\displaystyle (r_{0}^{a},r_{1}^{a},...,r_{k-1}^{a})}$, we have ${\displaystyle ID_{0}=(0,0,0)}$, ${\displaystyle ID_{1}=(0,1,2)}$ and ${\displaystyle ID_{2}=(0,2,4)}$. The original data blocks are ${\displaystyle s_{i}=s_{i,0}s_{i,1}...s_{i,L-1}}$ , where ${\displaystyle i=0,1,...,k-1}$ . The calibration data blocks are ${\displaystyle m_{i}=m_{i,0}m_{i,1}...m_{i,L+i\times (k-1)-1}}$ , where ${\displaystyle i=0,1,...,n-k-1}$ .

The calibration data blocks are calculated as follows, where ${\displaystyle \oplus }$ denotes bitwise XOR (the bit indices shown correspond to ${\displaystyle L=6}$):

${\displaystyle m_{0}=s_{0}(0)\oplus s_{1}(0)\oplus s_{2}(0)}$, so ${\displaystyle m_{0}=(m_{0,0}m_{0,1}...m_{0,5})}$

${\displaystyle m_{1}=s_{0}(0)\oplus s_{1}(1)\oplus s_{2}(2)}$, so ${\displaystyle m_{1}=(m_{1,0}m_{1,1}...m_{1,7})}$

${\displaystyle m_{2}=s_{0}(0)\oplus s_{1}(2)\oplus s_{2}(4)}$, so ${\displaystyle m_{2}=(m_{2,0}m_{2,1}...m_{2,9})}$
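Written out bit by bit, each parity bit in this example is the XOR of at most three data bits. The following expansion uses a bit index ${\displaystyle t}$ and the convention that ${\displaystyle s_{j,t}=0}$ whenever ${\displaystyle t}$ falls outside ${\displaystyle 0,...,L-1}$; both are notational assumptions introduced here for illustration:

${\displaystyle m_{0,t}=s_{0,t}\oplus s_{1,t}\oplus s_{2,t}}$

${\displaystyle m_{1,t}=s_{0,t}\oplus s_{1,t-1}\oplus s_{2,t-2}}$

${\displaystyle m_{2,t}=s_{0,t}\oplus s_{1,t-2}\oplus s_{2,t-4}}$

For instance, ${\displaystyle m_{1,0}=s_{0,0}}$, because the shifted copies of ${\displaystyle s_{1}}$ and ${\displaystyle s_{2}}$ both begin with padding zeros.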

### BRS decoding principle

In the BRS code, the original data is divided into ${\displaystyle k}$ blocks, ${\displaystyle S=(s_{0},s_{1},...,s_{k-1})}$. Encoding produces ${\displaystyle n-k}$ calibration data blocks, ${\displaystyle M=(m_{0},m_{1},...,m_{n-k-1})}$.

Decoding has one necessary condition: the number of undamaged calibration data blocks must be greater than or equal to the number of missing original data blocks. Otherwise the data cannot be repaired.

The decoding process is analyzed below.

Without loss of generality, let ${\displaystyle n=6}$ , ${\displaystyle k=3}$ . Then

${\displaystyle m_{0}=s_{0}+s_{1}+s_{2}}$

${\displaystyle m_{1}=s_{0}+xs_{1}+x^{2}s_{2}}$

${\displaystyle m_{2}=s_{0}+x^{2}s_{1}+x^{4}s_{2}}$
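Here each block is read as a polynomial in ${\displaystyle x}$ with bit coefficients, so that multiplying by ${\displaystyle x}$ corresponds to adding one "0" bit to the front of the block. Under that reading, the parity equations can be collected into a single matrix equation, which makes the Vandermonde structure explicit. The matrix layout below is a notational sketch added for illustration:

${\displaystyle {\begin{pmatrix}m_{0}\\m_{1}\\m_{2}\end{pmatrix}}={\begin{pmatrix}1&1&1\\1&x&x^{2}\\1&x^{2}&x^{4}\end{pmatrix}}{\begin{pmatrix}s_{0}\\s_{1}\\s_{2}\end{pmatrix}}}$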

Suppose ${\displaystyle s_{0}}$ is intact and ${\displaystyle s_{1},s_{2}}$ are lost. Choose ${\displaystyle m_{1}}$ and ${\displaystyle m_{2}}$ for the repair, and let

${\displaystyle m_{1}^{*}=m_{1}+s_{0}}$

${\displaystyle m_{2}^{*}=m_{2}+s_{0}}$

Because ${\displaystyle m_{1}}$, ${\displaystyle m_{2}}$ and ${\displaystyle s_{0}}$ are known, ${\displaystyle m_{1}^{*}}$ and ${\displaystyle m_{2}^{*}}$ are known, so that

${\displaystyle s_{1,i-2}=m_{2,i}^{*}+s_{2,i-4}}$

${\displaystyle s_{2,i-2}=m_{1,i}^{*}+s_{1,i-1}}$

According to the iterative formulas above, with bits whose index falls outside ${\displaystyle 0,...,L-1}$ taken to be 0, each iteration yields two bit values, one bit of ${\displaystyle s_{1}}$ and one bit of ${\displaystyle s_{2}}$. Each original data block is ${\displaystyle L}$ bits long, so after ${\displaystyle L}$ iterations all unknown bits of the original data blocks are recovered. Other loss patterns are decoded by analogy.
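The iteration for this particular case (${\displaystyle n=6}$, ${\displaystyle k=3}$, with ${\displaystyle s_{1}}$ and ${\displaystyle s_{2}}$ lost) can be sketched in Python as follows. This is an illustrative sketch only: the names `bit` and `brs_decode_s1_s2` are hypothetical, blocks are assumed to be lists of 0/1 integers, and bits outside a block are treated as 0.

```python
def bit(block, i):
    """Return block[i], treating out-of-range positions as 0 (padding)."""
    return block[i] if 0 <= i < len(block) else 0


def brs_decode_s1_s2(s0, m1, m2):
    """Recover s1 and s2 from s0, m1, m2 for the n = 6, k = 3 example."""
    L = len(s0)
    # m1* = m1 + s0 and m2* = m2 + s0 (XOR), with s0 zero-padded.
    m1_star = [b ^ bit(s0, i) for i, b in enumerate(m1)]
    m2_star = [b ^ bit(s0, i) for i, b in enumerate(m2)]

    s1 = [0] * L
    s2 = [0] * L
    for j in range(L):
        # s_{1,i-2} = m2*_{i} + s_{2,i-4}, with i = j + 2
        s1[j] = m2_star[j + 2] ^ bit(s2, j - 2)
        # s_{2,i-2} = m1*_{i} + s_{1,i-1}, with i = j + 1
        if j >= 1:
            s2[j - 1] = m1_star[j + 1] ^ s1[j]
    # Last bit of s2: the s1 term lies past the end of the block, so it is 0.
    s2[L - 1] = m1_star[L + 1]
    return s1, s2


# Hypothetical data: s0 is intact, m1 and m2 are the surviving parity blocks.
s0 = [1, 0, 1, 1, 0, 0]
m1 = [1, 0, 1, 1, 0, 1, 0, 1]
m2 = [1, 0, 1, 0, 0, 1, 1, 0, 0, 1]
s1, s2 = brs_decode_s1_s2(s0, m1, m2)
print(s1, s2)
```

The loop alternates between the two parity blocks: each newly recovered bit of one lost block unlocks the next bit of the other, which is why only XOR operations and index offsets are needed.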

## Performance

Some experiments show that, on a single-core processor, the BRS encoding rate is about 6 times that of RS coding and about 1.5 times that of CRS coding. This meets the target of improving encoding speed by at least 200% compared with RS coding.

Under the same conditions, across different numbers of erasures, the BRS decoding rate is about 4 times that of RS coding and about 1.3 times that of CRS coding. This meets the target of improving decoding speed by 100% compared with RS coding.

## Applications

Distributed systems are now widely used. Using erasure codes in the underlying storage layer of a distributed storage system can increase the fault tolerance of the system. Compared with the traditional replication strategy, erasure coding can also exponentially improve the reliability of the system for the same redundancy.

BRS encoding can be applied to distributed storage systems. For example, it can be used as the underlying data encoding in HDFS. Because of its performance advantages and the similarity of the encoding method, BRS encoding can replace CRS encoding in distributed systems.

## Usage

An open-source implementation of BRS encoding, written in C, is available on GitHub. In the design and implementation of a distributed storage system, BRS encoding can be used to store data and to provide the system's fault tolerance.

## References

1. H. Hou, K. W. Shum, M. Chen and H. Li, "BASIC Regeneration Code: Binary Addition and Shift for Exact Repair", IEEE ISIT 2013.
2. Jun Chen, Hui Li, Hanxu Hou, Bing Zhu, Tai Zhou, Lijia Lu, Yumeng Zhang, "A new Zigzag MDS code with optimal encoding and efficient decoding", 2014 IEEE International Conference on Big Data (Big Data), IEEE, 2014.