Consistent Overhead Byte Stuffing

Consistent Overhead Byte Stuffing (COBS) is an algorithm for encoding data bytes that results in efficient, reliable, unambiguous packet framing regardless of packet content, thus making it easy for receiving applications to recover from malformed packets. In essence, it replaces each zero byte with a count of the bytes up to and including the next zero, so that the encoded data contains no zero bytes.

Byte stuffing is a process that transforms a sequence of data bytes that may contain 'illegal' or 'reserved' values into a potentially longer sequence that contains no occurrences of those values. The extra length of the transformed sequence is typically referred to as the overhead of the algorithm. The COBS algorithm tightly bounds the worst-case overhead, limiting it to no more than one byte in 254. The algorithm is computationally inexpensive and its average overhead is low compared to other unambiguous framing algorithms.[1]

Packet framing and stuffing

When packet data is sent over any serial medium, some protocol is required to demarcate packet boundaries. This is done by using a special bit sequence or character value as a framing marker that indicates where the boundaries between packets fall. Data stuffing is the process that transforms the packet data before transmission to eliminate any accidental occurrences of that framing marker, so that when the receiver detects the marker, it knows without ambiguity that it indicates a boundary between packets.

COBS takes an input consisting of bytes in the range [0,255] and produces an output consisting of bytes only in the range [1,255]. Having eliminated all zero bytes from the data, a zero byte can now be used unambiguously to mark boundaries between packets. This allows the receiver to synchronize reliably with the beginning of the next packet, even after an error. It also allows new listeners, which might join a broadcast stream at any time, to reliably detect the beginning of the first complete packet in the received byte stream.
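As a rough illustration of why a zero-free encoding makes framing simple, the sketch below scans a received byte stream and hands back one zero-delimited frame at a time. The name `next_frame` and its signature are assumptions made for this example; they are not part of COBS itself.

```c
#include <stddef.h>

/*
 * Hypothetical helper: find the next zero-delimited frame in buf,
 * starting the search at index *pos.
 * On success, stores the frame's start and length (delimiter excluded),
 * advances *pos past the delimiter, and returns 1.
 * Returns 0 if no delimiter (and so no complete frame) remains.
 */
int next_frame(const unsigned char *buf, size_t len, size_t *pos,
               const unsigned char **frame, size_t *frame_len)
{
  size_t i;
  for (i = *pos; i < len; i++)
  {
    if (buf[i] == 0)
    {
      *frame = buf + *pos;
      *frame_len = i - *pos;
      *pos = i + 1;
      return 1;
    }
  }
  return 0;
}
```

A listener that joins mid-stream would discard the first frame returned this way, since it may be only the tail of a packet, and start decoding from the second.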

With COBS, all packets up to 254 bytes in length are encoded with an overhead of exactly one byte. For packets over 254 bytes in length, the overhead is at most one byte for every 254 bytes of packet data. The maximum overhead is therefore about 0.4% (1/254) of the packet size, rounded up to a whole number of bytes. The average overhead is also low (about 0.23% of the packet size, rounded up to a whole number of bytes), and for packets of any given length the amount of overhead is virtually constant, regardless of the packet contents.
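Taking the bound stated above at face value, the worst-case overhead for a packet of n bytes can be written compactly. The symbols n and O_max are introduced here only for illustration.

```latex
O_{\max}(n) = \left\lceil \frac{n}{254} \right\rceil \ \text{bytes},
\qquad
\frac{O_{\max}(n)}{n} \approx \frac{1}{254} \approx 0.39\%
\quad \text{for large } n .
```

For example, a 1500-byte packet would incur at most \lceil 1500/254 \rceil = 6 bytes of overhead, which is 0.4% of the packet size.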

Optional Zero Pair Elimination

An optional optimization, which can reduce overhead for payloads that commonly contain pairs of zero bytes, is to reduce the maximum encodable sequence length, freeing some codes to encode sequences terminated by a pair of zeros. In this variant, codes in the range [1,223] have the same meaning as in the normal mode, the code 224 encodes a sequence of 223 bytes with no zero termination, and the remaining 31 codes [225,255] encode sequences of length [0,30] terminated by a pair of zero bytes. This variation can achieve negative overhead (compression) for some sequences; however, it complicates the encoding and decoding process.
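The code assignment for this variant can be sketched as a small lookup. This is only an illustration of the ranges given above: the function name `zpe_interpret` is hypothetical, and it assumes that code 225 stands for zero data bytes followed by the zero pair, so that the 31 codes from 225 to 255 cover 0 to 30 data bytes.

```c
/*
 * Hypothetical helper for the zero-pair-elimination variant.
 * Given a code byte, reports how many data bytes follow it and how
 * many zero bytes the block implies after them.
 * Returns 0 for the invalid code 0, and 1 otherwise.
 * Assumption: code 225 means "no data bytes, then two zeros".
 */
int zpe_interpret(unsigned char code, unsigned *data_len, unsigned *zeros)
{
  if (code == 0)
    return 0;
  if (code <= 223)
  {
    *data_len = code - 1u;    /* same meaning as in normal COBS */
    *zeros = 1;
  }
  else if (code == 224)
  {
    *data_len = 223;          /* full-length run, no implied zero */
    *zeros = 0;
  }
  else
  {
    *data_len = code - 225u;  /* short run ending in a zero pair */
    *zeros = 2;
  }
  return 1;
}
```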

Packet format

A COBS block consists of a sequence of non-zero bytes of length between 1 and 255. Call the first byte of a block the code byte and the (possibly empty) subsequence of the remaining bytes in the block the payload. The code byte always equals the length of the block, including the code byte itself. If the code byte is 255, the block encodes just its 254-byte payload. If the code byte is not 255, the block encodes the result of appending a zero byte to the payload.
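The block rule above can be sketched for a single block as follows. The function name `decode_block` and its error convention are assumptions for this example, and for brevity it does not check that the payload itself is free of zero bytes.

```c
#include <stddef.h>

/*
 * Illustrative sketch: decode one COBS block starting at blk.
 * avail is the number of encoded bytes available at blk;
 * out must have room for 254 bytes.
 * Returns the number of decoded bytes written, or 0 if the block is
 * malformed (code byte of zero, or block longer than avail).
 * A valid block always decodes to at least one byte.
 */
size_t decode_block(const unsigned char *blk, size_t avail, unsigned char *out)
{
  size_t code, i, n = 0;
  if (avail == 0 || blk[0] == 0 || blk[0] > avail)
    return 0;
  code = blk[0];
  for (i = 1; i < code; i++)   /* payload is the code - 1 bytes after the code */
    out[n++] = blk[i];
  if (code != 0xFF)            /* every code except 255 implies a zero */
    out[n++] = 0;
  return n;
}
```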

To encode a message, there are three steps:

  1. Append a zero byte to the message.
  2. Split the result into a list of pieces that can be encoded by COBS blocks. (Notice that there is always a unique way of doing this.)
  3. Encode each piece and concatenate the result.
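The three encoding steps can be sketched fairly literally in C. This is a minimal illustration rather than a reference implementation: the name `cobs_encode_steps` is hypothetical, and the appended zero of step 1 is treated as a virtual byte at index `len` instead of being copied into the message.

```c
#include <stddef.h>

/*
 * Sketch of the three encoding steps.
 * Returns the encoded length; out needs len + len/254 + 1 bytes.
 */
size_t cobs_encode_steps(const unsigned char *msg, size_t len,
                         unsigned char *out)
{
  size_t in = 0, n = 0;
  while (in <= len)            /* <= so the virtual zero is consumed too */
  {
    size_t run = 0, i;
    /* Step 2: a piece holds up to 254 non-zero bytes ... */
    while (run < 254 && in + run < len && msg[in + run] != 0)
      run++;
    /* Step 3: emit the code byte, then the payload. */
    out[n++] = (unsigned char)(run + 1);
    for (i = 0; i < run; i++)
      out[n++] = msg[in + i];
    /* ... and also swallows the zero after them unless it is full. */
    in += (run == 254) ? run : run + 1;
  }
  return n;
}
```

For the input 0x11 0x22 0x00 0x33 this yields 0x03 0x11 0x22 0x02 0x33.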

To decode a message, there are three steps:

  1. Split the message into a list of COBS blocks. (Notice that there is always a unique way of doing this.)
  2. Decode each block and concatenate the result.
  3. Remove the trailing zero byte.
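The decoding steps can be sketched in the same spirit, with some basic validation added. The name `cobs_decode_steps`, the capacity parameter, and the error convention are assumptions for this example. It follows the format exactly as described above, so it treats a message whose last block has code 255 as malformed because the appended zero would be missing.

```c
#include <stddef.h>

/*
 * Sketch of the three decoding steps with basic validation.
 * enc/len: encoded message, without any zero frame delimiter.
 * out/cap: output buffer and its capacity; it needs room for the
 *          decoded message plus the one trailing zero.
 * Returns the decoded length, or (size_t)-1 if the input is malformed
 * or out is too small.
 */
size_t cobs_decode_steps(const unsigned char *enc, size_t len,
                         unsigned char *out, size_t cap)
{
  size_t in = 0, n = 0;
  int ends_with_zero = 0;
  while (in < len)
  {
    size_t code = enc[in], i;
    /* Step 1: the code byte gives the length of the next block. */
    if (code == 0 || code > len - in)
      return (size_t)-1;
    /* Step 2: copy the payload, then the implied zero if there is one. */
    for (i = 1; i < code; i++)
    {
      if (enc[in + i] == 0 || n == cap)
        return (size_t)-1;
      out[n++] = enc[in + i];
    }
    ends_with_zero = (code != 0xFF);
    if (ends_with_zero)
    {
      if (n == cap)
        return (size_t)-1;
      out[n++] = 0;
    }
    in += code;
  }
  /* Step 3: drop the zero that the encoder appended. */
  if (!ends_with_zero)
    return (size_t)-1;
  return n - 1;
}
```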

Example encodings:

     Plaintext                  Encoded with COBS
  1. 0x00                       0x01 0x01
  2. 0x11 0x22 0x00 0x33        0x03 0x11 0x22 0x02 0x33
  3. 0x11 0x00 0x00 0x00        0x02 0x11 0x01 0x01 0x01
  4. 0x01 0x02 ... 0xFF         0xFF 0x01 0x02 ... 0xFE 0x02 0xFF

Implementation

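/*
 * StuffData byte stuffs "length" bytes of
 * data at the location pointed to by "ptr",
 * writing the output to the location pointed
 * to by "dst". The output can be up to
 * length + length/254 + 1 bytes long.
 */

/* Store the finished block's code byte and reserve room for the next one. */
#define FinishBlock(X) (*code_ptr = (X), code_ptr = dst++, code = 0x01)

void StuffData(const unsigned char *ptr,
               unsigned long length, unsigned char *dst)
{
  const unsigned char *end = ptr + length;
  unsigned char *code_ptr = dst++;  /* where the current block's code byte goes */
  unsigned char code = 0x01;        /* length of the current block so far */

  while (ptr < end)
  {
    if (*ptr == 0)
      FinishBlock(code);            /* a zero byte ends the block */
    else
    {
      *dst++ = *ptr;
      code++;
      if (code == 0xFF)
        FinishBlock(code);          /* block is full: 254 data bytes */
    }
    ptr++;
  }

  FinishBlock(code);                /* final block, ended by the appended zero */
}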
 
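/*
 * UnStuffData decodes "length" bytes of
 * data at the location pointed to by "ptr",
 * writing the output to the location pointed
 * to by "dst". The output ends with the zero
 * byte appended during encoding, which the
 * caller should discard. The input is trusted:
 * a code byte that claims more data than is
 * present makes this read past the input.
 */

void UnStuffData(const unsigned char *ptr,
                 unsigned long length, unsigned char *dst)
{
  const unsigned char *end = ptr + length;
  while (ptr < end)
  {
    int i, code = *ptr++;           /* code byte = block length */
    for (i = 1; i < code; i++)
      *dst++ = *ptr++;              /* copy the payload */
    if (code < 0xFF)
      *dst++ = 0;                   /* implied zero after a short block */
  }
}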
 
/*
 * Defensive UnStuffData, which stops copying
 * at the end of the input, so poorly conditioned
 * data at *ptr cannot make it read past the input
 * or write more than "length" bytes to *dst.
 * It is an alternative to the version above; a
 * program should define only one of the two.
 */
void UnStuffData(const unsigned char *ptr,
                 unsigned long length, unsigned char *dst)
{
  const unsigned char *end = ptr + length;
  while (ptr < end)
  {
    int i, code = *ptr++;
    for (i = 1; ptr < end && i < code; i++)
      *dst++ = *ptr++;
    if (code < 0xFF)
      *dst++ = 0;
  }
}
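Since StuffData in the listing above does not report how much it wrote, a caller needs some way to size the destination buffer. The sketch below is one possible approach; the names `StuffedSizeBound` and `StuffedSize` are hypothetical and do not appear in the original listing. The second function replays StuffData's loop without storing anything.

```c
#include <stddef.h>

/* Worst-case number of bytes StuffData writes for a length-byte input. */
size_t StuffedSizeBound(size_t length)
{
  return length + length / 254 + 1;
}

/*
 * Exact number of bytes StuffData would write for this input,
 * obtained by replaying its loop without storing any output.
 */
size_t StuffedSize(const unsigned char *ptr, size_t length)
{
  size_t out = 1;              /* the first code byte */
  unsigned code = 0x01;
  size_t i;
  for (i = 0; i < length; i++)
  {
    if (ptr[i] == 0)
    {
      out++;                   /* code byte of the next block */
      code = 0x01;
    }
    else
    {
      out++;                   /* the data byte itself */
      if (++code == 0xFF)
      {
        out++;                 /* code byte of the next block */
        code = 0x01;
      }
    }
  }
  return out;
}
```

One consequence worth noting: for a zero-free input of exactly 254 bytes, this listing emits a final 0x01 code byte after the full block, so it writes 256 bytes rather than 255.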

References

  1. Cheshire, Stuart; Baker, Mary. "Consistent Overhead Byte Stuffing" (PDF). ACM. Retrieved November 23, 2010.
