Talk:Prefix code

From Wikipedia, the free encyclopedia
Jump to: navigation, search

Is UTF8 a prefix code in use today? If so, I think it should be included. 71.158.69.195 22:27, 20 March 2007 (UTC)

Comma-Free Codes[edit]

Comma-Free codes are not the same thing as prefix codes. Any set with 11 (or any code word comprised of only one bit for that matter) cannot be a comma-free code. This is shown in "Comma-Free Codes" by Golomb, Gordon and Welch in the Canadian Journal of Mathematics vol. 10, no. 2, pp. 202-209 (1958).

What definition of "comma" does Golomb use in that paper? --68.0.124.33 (talk) 14:21, 14 April 2008 (UTC)

Do I get it right that comma-free code is the same as self-synchronizing code? Jaan Vajakas (talk) 16:42, 23 October 2010 (UTC)

I don't think so, at least according to the definitions given here. The code with the single codeword 0101 is comma-free (the sequence 01010101 formed by two consecutive codewords does not have a codeword starting from its second symbol) but not self-synchronizing (it does have a codeword starting from its third symbol). —David Eppstein (talk) 18:45, 23 October 2010 (UTC)
The article says that code is comma-free if "the substring starting at the second symbol and ending at the second-last symbol does not contain any codewords". In your example, the aforementioned substring is 101010 and it does contain the codeword 0101 as a substring (but, indeed, not as a prefix). So if you are right then it should be specified in the article that "contains" means "contains as a prefix", to make the definition unambiguous. However, I don't quite see why such a definition would be useful, so I still rather believe that "contains" means "contains as a substring" (and hence comma-free = self-synchronizing). Jaan Vajakas (talk) 20:42, 23 October 2010 (UTC)
Probably you're right. —David Eppstein (talk) 22:31, 23 October 2010 (UTC)
I edited the pages Self-synchronizing code and Prefix code to reflect that self-synchronizing code = comma-free code, and redirected Comma-free code to Self-synchronizing code. Jaan Vajakas (talk) 19:28, 4 December 2010 (UTC)

With a prefix code, a receiver can identify each word without requiring a special marker between words[edit]

A very poor wording (although not a false statement), so the reader might imply that prefix property is a criterion of unique decoding of any coded string, while it is in fact a criterion of instant decoding, a slightly stronger requirement. It, sadly, was translated to several other languages.

N=2:

0  0?
01

N=3:

00 0?
001
01 0?

N=4:

000 0?
0001
001 0?
010 0?
0101

All coded strings
starting with 1
or containing
two adjacent 1
are invalid.

There are non-prefix uniquely decodable codes. These seem not have any practical significance, but are perfectly possible. Is the code {0, 01} uniquely decodable? Certainly yes, because it is "suffix code" (a prefix code read from right to left). Are coded strings instantly decodable, transmitted from left to left? Certainly not, but any coded string become decodable after receiving next (no more than) 1 bit. From any leftmost N bits (N>1) of a valid coded string we always can decode either all N bits (if the last bit is 1) or N−1 bits (if the last bit is 0). It is possible to construct even self-synchronized codes like this, which are not prefix codes. Any thoughts? Incnis Mrsi (talk) 12:54, 4 March 2011 (UTC)

Are these all prefix codes?[edit]

"If every word in the code has the same length, the code is called a fixed-length code, or a block code (though the term block code is also used for fixed-size error-correcting codes in channel coding). For example, ISO 8859-15 letters are always 8 bits long. UTF-32/UCS-4 letters are always 32 bits long. ATM packets are always 424 bits long. A block code of fixed length k bits can encode up to 2^{k} source symbols."

This implies that ISO 8859-15, UTF-32/UCS-4 and ATM packets are all prefix codes. ATM packets aren't, UTF-32/UCS-4 is and I don't know about ISO 8859-15. And in what sense are ATM packets a code as discussed in this article? Dougher (talk) 14:53, 28 December 2012 (UTC)

Doesn't it imply that they're fixed-length codes... AnonMoos (talk) 03:08, 29 December 2012 (UTC)

Weak definition[edit]

The lead uses the examples of {9,55,59} as a prefix code, and explains {5,9,55,59} is not "because 5 is a prefix of both 59 and 55". But then a code of {5,56,59} is uniquely determined, as is {5,59}, even though they share prefixes.

I'd fix this myself, but I am still unsure of the complete definition of prefix code, and I don't want to use an example that is also a bitfix code, or something similar. Can we get an expert? SamuelRiv (talk) 00:11, 5 May 2013 (UTC)

I was wondering the same thing. I'm going to change the first to just {9, 55}, though if {9, 55, 59} is actually a valid code, it would probably be better (though perhaps an explanation would be in order). --67.180.84.49 (talk) 20:00, 25 December 2013 (UTC)