Talk:Huffman coding

From Wikipedia, the free encyclopedia
Jump to: navigation, search

Pain in the ass[edit]

It's a real pain in the ass how every mathematics-related article on wikipedia assumes you have a masters degree, at least. I understood Huffman coding as explained in my first undergrad book on data structures and algorithms, and reading this completely wiped out any intuition that I gained previously. — Preceding unsigned comment added by (talk) 05:52, 4 July 2012 (UTC)

I'm in the same boat. I know Huffman coding is simple, and I used to be able to do it, but this article was nowhere near the refresher I had hoped it would be. (talk) 19:25, 7 December 2012 (UTC)
If you can spell out the problems with the material, someone more well-versed in the subject may be able to make improvements to the article.
Sebastian Garth (talk) 19:44, 8 December 2012 (UTC)

basic example image is wrong?[edit]

hi! i think the example in the basic example, the one in the captation of the image is wrong. it shoud be

a3 110
a4 111

and not:

a3 111
a4 110

both the image is wrong and the table. i think one can't mix the way the 0/1 is set. PedroCarvalho (talk) 06:23, 28 November 2007 (UTC)

well, you can mix the way 0/1 is set, the output code will be good anyway, you just have to remember it when you decode it! anyway, you'd better not to change the way 0/1 is set, so I have fixed both image and caption. Alessio Damato 13:59, 4 December 2007 (UTC)
it still mixes the way 0/1 is set. In the first split it gives 0 to the least common (a1: 0.4), and 1 to the most common ({a2,a3,a4}: 0.6), but in the two other splits is gives 0 to the most common and 1 to the least common. -Thomas Nygreen (talk) 09:59, 9 May 2008 (UTC)
Huffman code generators can select 0/1 in any manner they choose (including randomly) and still generate a valid Huffman code. Nevertheless the diagram ought to be consistent with the pseudocode. Dcoetzee 16:30, 9 May 2008 (UTC)
Exactly like it says in the text below it, the hoffman coding should be done from left to right, taking the lower probability each time. This is NOT what the picture shows, so it is clearly wrong. Yomach (talk) 06:27, 26 June 2010 (UTC)
Isn't the frequency for the 'f' character also wrong? Should be 2 not 3? Moulsonp (talk) 10:54, 11 May 2012 (UTC)


The algorithm in "Basic technique" is ambiguous. It's not clear how the two queues are used. It seems that the second queue is never read from, only added to.

[BREAK INTO DISCUSSION: 30 July 2007 - Simply does not compute - I have been trying to make sense of this 'pseudo code' for a couple of hours now. I am no slouch, but I just can't get the algorithm from this description. Somebody - please post something more intelligible, then remove this comment. - TropicalCoder (Google me) EXIT DISCUSSION]

~ I believe I may have better solved the example "Sample-1". I may be wrong, but I found the algorithm ill explained in my notes so I visited wikipedia to better understand it... I tried to solve it in the most logical way to me... building a complete binary tree with apropriate depth and using the percentage costs as guides to choose which boundaries to choose.

What I got was this: There is 1 character with weight 30; 1 with weight 20; 3 * 15; 1 * 10 and 3 * 5 for a total sum of 120. Dividing all by 120 and using the percentage to define how much of the tree to take away you get that 30/120 = 25%... and you go on like that... using this technic I got:

Character Weight Coding
h 30 00
e 20 010
b 15 011
d 15 100
g 15 101
a 10 1100
c 5 1101
f 5 1110
i 5 1111

Total cost is exactly the same, but the coding varies in size between 2 and 4 bits, and in #Basic_technique it says that "It is generally beneficial to minimize the variance of codeword length." I believe that makes my solution better, so if anyone with more than a mere day's knowledge on the matter agrees, then by all means make the correction :) --nunocordeiro 02:02, 11 July 2006 (UTC)

You got the right answer for the wrong reasons. You used Shannon-Fano coding, which in this case happens yield a result identical (in terms of each symbol's weight) to the bottom-merge Huffman code, the Huffman code with the lowest variance and lowest maximum length (among optimal codes). In general, these are not the same, and Shannon-Fano is suboptimal. The example should be replaced by one that either yields only one Huffman code (again, in terms of symbol weights, so {0,1} is the same code as {1,0}) or explain bottom-merge Huffman coding. I'll put this on my to do list, but if someone wants to fix this, go ahead. Calbaer 02:10, 27 July 2006 (UTC)

Removed "*Animation of the Huffman Algorithm: Algorithm Simulation by Simon Mescal" since the link is dead. If it comes back up move back in.

What is meant by "cost" in this article?

  • Yeah, I was wondering too look, what I found [1], now I see. The cost is the cost to send each letter in the target alphabet. For example if we send letters as in Morse with "." & "_" we will spend more on "_". Gnomz007 15:07, 16 Nov 2004 (UTC)

Or to qualify a little more, the letter S in morse is three dots in rapid sucession. Three time periods. The letter 0 is three dashes. Each dash takes up at least two time periods. So, at least six time periods. Therefore, sending O is at least twice as expensive (in time!) as sending an S. You would think that a byte is always 8 characters long and so this has no meaning. However, most US english only uses about 7 characters. If you find that no 8-bit characters are being used in the file -- or if you find a way to mark them as such -- you can compress the file down to 7/8th its original size right there. Consider some of the unicode encodings, where most characters take 8 bits, but if you want an extended character, you can use extra bytes is you flag them. -- 18:55, 4 October 2005 (UTC)

Oh no! This is wrong, wrong, wrong (in the context of the article). For most applications, costs are probabilities, pure and simple. But they need not be, because they are just multiplicative factors (or, as the Basic technique section has it, weights) for the expression to be minimized. Adding to the confusion, the Huffman coding with unequal letter costs subsection uses cost in the way you two and the linked paper are discussing, as output costs! It seems as though an information theorist wrote most the article but a computer scientist wrote the Problem definition section. Anyway, I think most people who read the entire article would come away very confused. Looks like it's time for a rewrite (or at least a find/replace of the word 'cost').... Calbaer 02:22, 27 July 2006 (UTC)

Back-story on the Discovery of Huffman Coding[edit]

Dr. David A. Huffman was the Chairperson of the Information and Computer Science Department at University of California at Santa Cruz (UCSC) in the early 1970's. He taught an undergraduate course, "The Algebraic Foundations of Cybernetics" in 1971-73. During one lecture he taught "Minimum Redundancy Coding". A student noticed, in some textbooks, it was called "Huffman Coding" -- and asked him if that was the same principle.

Dr. Huffman replied by relating an interesting story.

As a student at MIT, David Huffman had been working on an unsolved, challenging problem in Information Theory. Instead of studying for the final, he had decided to devote all his free time to work on this problem. The night before the final, with no solution yet, in a moment of despair, he threw his large stack of notes into the trash. He described the next moment as a flash of insight in which the solution occurred to him.

He went on to explain that he was still working on many difficult, interesting problems that, when solved, could be represented in textbooks for centuries. He did not want such a "simple idea" as minimum reduncancy coding to bear his name. Students wondered and discused among themselves: was this humility, or hubris?

not one-to-one on finite messages[edit]

The Huffman coding is not one-to-one for finite messages. Suppose the message consists completely of the same character (e.g. a file full of 'aaaaaaaaaaaaaaaaaaaa...'); then that character would be assigned the empty bitstring; and the message would encode to the empty string. But such messages of different lengths would all encode to the empty string, hence the information about the length is lost. If you think about decoding such a message, you would also see the problem: you have an empty encoded message, but you also have a symbol whose code is the empty bitstring, so you don't know how many characters to write out.

This problem does not occur for infinite messages, of course; since there is only one infinite message consisting solely of the same character, for any given character.

Perhaps this should be mentioned in the article? --Spoon! (talk) 04:18, 12 January 2008 (UTC)

This is not entirely correct. The weights in the Huffman algorithm represent probabilities, not observed frequencies: so you can only assign the empty bitstring to a character if you know that such character will occur with certainty. Hence this is a degenerate case, where the only information being transmitted is the length of the string. In practice, if all you have is finite sample data, then you need to use something alike to Laplace's rule of succession to assign probabilities. —Preceding unsigned comment added by Classicalecon (talkcontribs) 11:50, 11 February 2008 (UTC)
Another thing to keep in mind is that the empty string is not part of a valid Huffman code. A valid Huffman code has no codeword of length less than one. If the most likely outcome is more than 40% likely, its codeword will only have one bit, but it cannot have less, even if it is 99.9999% likely. If it is 100% likely — or even just much more likely than 50% — then a practical compressor wouldn't use a Huffman code per symbol, but perhaps instead a run-length encoding on the entire sequence. If there are relatively few possible sequences, that run-length code could be a Huffman code. Calbaer (talk) 22:27, 11 February 2008 (UTC)

Length-limited Huffman coding[edit]

The section on length-limited Huffman coding needs to be clarified, since finding the optimal length-limited code given the initial weight distribution is trivial. In what sense is this an unsolved problem? —Preceding unsigned comment added by Classicalecon (talkcontribs) 10:56, 11 February 2008 (UTC)

How is it trivial? If the Huffman code has a length longer than the length limit, we have to throw away this code and find the optimal length-limited code using a different method, e.g., the Package-Merge algorithm. It might be trivial to find a length-limited code in general — a fixed-length code certainly limits length — but to find an optimal length-limited code is another matter. No method is known to solve this problem in linear or logarithmic time. Calbaer (talk) 22:27, 11 February 2008 (UTC)
The article section currently claims that it isn't linearithmic, but I don't see how that's the case. The package-merge algorithm finishes in O(nL) time. But if L is greater than log n in base phi, code length is not a limiting factor. (See Witten, Moffat, and Bell, Managing Gigabytes, ISBN 9780442018634, pp. 345-350, for a proof.) Otherwise, L ≤ log n, and O(nL) ≤ O(n log n). So why isn't that linearithmic? --Damian Yerrick (talk | stalk) 21:21, 24 April 2012 (UTC)
Because Huffman coding is used where the weights/probabilities/frequencies are not all identical. If the probability of the jth item were 2-j, then the optimal (length-unlimited) code would have two codewords of length n-1. If you wanted to limit lengths to n/2, then the Package-Merge method would take O(n2). Calbaer (talk) 20:33, 6 June 2014 (UTC)

Basic technique?[edit]

In the article it describe the basic technique like so:

  1. Start with as many leaves as there are symbols.
  2. Enqueue all leaf nodes into the first queue (by probability in increasing order so that the least likely item is in the head of the queue).
  3. While there is more than one node in the queues:
        1. Dequeue the two nodes with the lowest weight.
        2. Create a new internal node, with the two just-removed nodes as children (either node can be either child) and the sum of their weights as the new weight.
        3. Enqueue the new node into the rear of the second queue.
  4. The remaining node is the root node; the tree has now been generated.

However, it never says what the second queue is used for! From what I understand shouldn't this node be inserted back into the first queue and the queue sorted? I'm having real problems working out how to generate the Huffman tree (since it differs from a normal binary tree as the highest weight should only be a single bit). —Preceding unsigned comment added by (talk) 19:30, 6 May 2008 (UTC)

This description is correct but confusing. The step "Dequeue the two nodes with the lowest weight" is implicitly examining the fronts of both queues. A conceptually simpler implementation uses priority queues, but is not linear time. I will attempt to clarify the exposition. Dcoetzee 19:35, 6 May 2008 (UTC)

Alternate Pseudocode[edit]

I think a more programming-focused version of the pseudocode should be added to the article, as the current explanation is a bit difficult for programmers to implement and somewhat ambiguous. Archer1742 (talk) 10:29, 26 March 2009 (UTC)

Removal of links[edit]

Please explain your reason for mass removal of links from this article. Please do not simply refer to WP:LINKFARM. Explain why each link you removed was bad.Biophys (talk) 14:56, 9 April 2009 (UTC)

How does one decode?[edit]

I have a solid background in comp theory, but I can't see from this article, unless I might start writing the math from scratch, how one takes the encoding and reforms a string like the French phrase in the example. The article linked in prefix code [2], also woefully inexplicable. explains me the idea of using the tree plus the encoded string, and so now I understand. But why isn't there an algorithm, pseudocode, and/or illustration of decoding?

Also, some proof of optimality would be nice. SamuelRiv (talk) 00:01, 11 February 2011 (UTC)

Yes, I see what you mean. Well, I can put together at least a brief overview of the topic, I think. It'll probably need some additional expansion, of course... Sebastian Garth (talk) 17:24, 11 February 2011 (UTC)
I just read the section and it is very useful. The thing that was bothering me is that even though I know how the prefix tree works now, the article seems to assume not just familiarity, but working comfort with the prefix tree. As such, the beginning sections completely skip over a basic, diagrammed rundown of prefix tree compression, such that the formulation of the problem makes sense in context, the importance of representing characters with probabilities has actual meaning in terms of prefix code, and the later animation (which also should be reduced in size, btw) can be explainable quickly after the introduction.
Really what was so confusing about the first section is it is representing all characters by probabilities. Again, I have a lot of background, but my immediate question was how the probabilities are useful in any sense of encoding and decoding. Of course, they are (generally) not - they are useful strictly in optimizing the algorithm. Ideally, the prefix code article would be readable as well - by the way, this article introducing prefix coding is what I used to figure this out - but a summary could easily be done on Wikipedia so that one only needs a little comp sci to get through. SamuelRiv (talk) 19:04, 12 February 2011 (UTC)

This article strikes me as a little too math-y and abstract for most Wikipedia readers.

I think that it would be good to have a section on basic practical implementation issues so that people can see how Huffman coding can be very fast to encode and decode, even for large alphabets---the encoder constructs a flat encoding table from the Huffman tree, mapping input symbols to their encoded lengths and actual codes. The decoder uses an inverse table to map codes back to the input symbols. Once the frequencies are sorted and codes are assigned, the actual encoding or decoding can be very, very fast. I think explaining that would make the abstract stuff more concrete for many readers, especially programmers who aren't fluent in formal theory.

Some useful detail:

For small alphabets with a reasonable longest code length, the decoder can just use a dense table (1-dimensional array) with an entry for every possible bit pattern whose length is the length of the longest code.

Suppose, for example, that your maximum code length is 9, but you have mostly shorter codes like 0x101. To decode the next code, you can just take the next 9 bits (not knowing whether you need them all) and index into a 2^9 = 512-element decoding table. For short codes like Ox101 there are duplicate entries for all possible 9-bit patterns that start with that short pattern (101000001, 101000010, ... 101111111). No matter which of those bit patterns you use to index into the array, you get back the same code length (3) and encoded bit pattern (0x101), in fast constant time.

For large alphabets or other situations where you may have some longer codes, you can get most of the same benefit by using a small table like that and a conditional branch---take the next 9 bits, index into the table as above, and check the code length. If it's 9 or less, proceed as above. If it's not, that's because you're actually looking at the first 9 bits of a longer-than-9-bit code, so you branch to a somewhat slower routine for the longer codes.

This is generally extremely fast, because short codes occur much more frequently than longer codes---that is why they're shorter. In the common case it's just a handful of instructions, hitting a small table that fits in small super-high-speed Level 0 cache memory. (And with only one conditional branch, which is usually not taken and won't cause a pipeline stall on modern processors with branch predictors that notice things like that.) The routine to handle long codes can be pretty fast too, with time bounded by the code length, so that the total time to decode a file is proportinal to the entropy (encoded file size). (talk) 17:45, 1 May 2016 (UTC)

Ambiguity alert[edit]

Hey there, this is not 100% clear is it? Huffman avoided the major flaw of the suboptimal Shannon-Fano coding by building the tree from the bottom up instead of from the top down. So was the flaw created "by building the tree from the bottom up", or was the "bottom up" way the good method? Heh, if you have no idea about H. C. at first, you might be more confused than you were at the beginning. -andy (talk) 15:54, 27 October 2012 (UTC)

Second paragraph is poorly worded[edit]

Nuf' said. — Preceding unsigned comment added by (talk) 10:49, 25 August 2013 (UTC)

External links[edit]

My removal of the external links, none of which appear to meet our guideline on external links was reverted. Can I ask how the remaining links meet that guideline? Thargor Orlando (talk) 17:05, 21 January 2014 (UTC)

Asymmetric numeral systems (ANS) - combining speed of Huffman with accuracy of arithmetic?[edit]

There is lots of going on about this alternative to arithmetic coding, which have similar complexity to Huffman coding, but handles fractional bits of information - have accuracy like arithmetic coding.

Some implementations: Blogs:

Maybe it should be mentioned somewhere in the article? — Preceding unsigned comment added by (talk) 21:58, 4 February 2014 (UTC)

I have added some very brief subsection, but ANS rather requires a separate article. — Preceding unsigned comment added by (talk) 15:47, 28 February 2014 (UTC)

It had one, but that was deleted due to original research: Wikipedia:Articles_for_deletion/Asymmetric_binary_system. The subsection here seems to have been a way to get that content back in. No idea what's happened since then, but back then one compression expert mentioned that a fatal flaw of the system was the need to decode in reverse order [3]. That means no context-adaptivity (which the inventor of the code acknowledged). Since the primary advantage of arithmetic coding is, in practice, that adaptivity, I don't see the practical application of ANS, let alone its viability as a distinct subsection in the article of a completely different coding technique (without reliable sources, to boot). Calbaer (talk) 20:45, 6 June 2014 (UTC)
Not to say that ANS is useless, just that the evidence of notability has to be more than its inventor extolling its virtues and some folks implementing it in personal libraries. Calbaer (talk) 20:50, 6 June 2014 (UTC)
I removed it. The first comment here is from a Purdue domain, and Purdue is where the ANS inventor was located when this was posted. He has been using Wikipedia to try to promote this for years in violation of WP:SOAP, WP:FORUM, WP:NOR, WP:RS, and WP:notability. Calbaer (talk) 00:02, 7 June 2014 (UTC)
Maybe look at some current benchmarks (you can test yourself) e.g. , (zhuff is LZ4 + FSE). — Preceding unsigned comment added by (talk) 05:55, 7 June 2014 (UTC)
Maybe look at the Wikipedia policies and guidelines and follow them. It's been 6.5 years since Wikipedia:Articles_for_deletion/Asymmetric_binary_system, where several people tried to explain to you why this is not an appropriate place for this content. If you really feel it is, read the policies and guidelines, then say how it doesn't violate them. You clearly have either not bothered to read them, or have ignored them, repeatedly polluting several articles with your blatant and annoying self-promotion. This article, for example, had a whole section about your recent-but-arcane algorithm, while other well-established alternative algorithms such as LZW and arithmetic coding, used by most people in industrial societies every day of their lives, merited only a brief mention, and that just so folks could know there were commonly used alternatives to Huffman coding with various pros and cons. Your algorithm isn't among those well-established commonly used ones.
After 6.5 years, the method hasn't been published in a peer-reviewed paper, adopted by a commercial or open-source product (I'm not counting libraries), or otherwise garnered notability; a few excited hobbyists on the web do not make something "notable," nor does any benchmark, no matter how good. Data compression is a very small community, and I know a couple of the people you've worked and been in contact with. Your position in the community is not that of a crank or a hack. But, on Wikipedia, that's precisely what you come off as. Unfortunately, your material being here reminds me of the Wikipedia Seigenthaler biography incident: Both take advantage of a period of time in which no one who saw knew to fix the misplaced content. Under WP:AGF, I'll have to take this as a sign of willful ignorance rather than dishonesty. However, I suggest you no longer write about your own work. If it is as notable as you think, someone else inevitably will. And if they're familiar with Wikipedia, they'll do a better job: Writing for a general audience rather than a technical one, including reliable sources, making the notability of the work obvious. But, even then, it won't be in this article, because your compression method is not a form of Huffman coding. Calbaer (talk) 15:29, 7 June 2014 (UTC)
Also, since you seem to reply to critiques but not read policies and guidelines, I should make it clear that, even if your algorithm were presented in a peer reviewed paper or used in a mobile app, that alone does not merit its inclusion here. I cite the lack of these for reasons why it isn't notable, but notability requires a higher bar than that alone, and not every notable invention merits inclusion on Wikipedia. I have several optimal coding algorithms presented in peer-reviewed papers. They are somewhat relevant to this topic, but mentioning them would hinder, not help, someone who comes here trying to understand Huffman coding. And they wouldn't merit their own pages on Wikipedia, useful though I think they might be. Though it isn't in a policy or guideline that I know of, one thing you should ask yourself before contributing is: Does this help Wikipedia and its readers, or does this just help me? I'm not here to promote myself and my work, and you shouldn't either. Calbaer (talk) 17:11, 7 June 2014 (UTC)
You really don't like this guy. Anyway, don't you think that Wikipedia should somehow mention that there are now methods, like a few month old FSE (or maybe some of your codings you've mentioned?), which are both faster and providing better compression than Huffman - that we could save time/energy/disk space? — Preceding unsigned comment added by (talk) 22:29, 7 June 2014 (UTC)
Hi, I was wondering who would call Matt Mahoney (PAQ) and Bulat Ziganshin (FreeArc) as "a few excited hobbyists", or wouldn't like that, while comparing with FSE, prefix codes look really miserable now. — Preceding unsigned comment added by (talk) 23:22, 7 June 2014 (UTC)
I don't know who you think you're fooling. First you contributed self-promotion through a named Wikipedia account, then, when that article was deleted, you contributed to tangentially related articles as an anon through your Purdue IP. Then you followed up on the talk page via your U.S. home IP. After you moved back to Poland, you posted a few hours later from your Polish home IP. And you followed it up by saying I really don't like "that guy" from your Polish T-Mobile IP. It's all the same guy. You think you're posting anonymously and that people will think the other posts are not you, but the sockpuppeting is painfully obvious (and yet another violation of Wikipedia policy). Perhaps you'll try a VPN next, but by this point I think we can assume that any anonymous follow-up or follow-up from a newly created account is just you again.
Then you top it off by trying to dig up and post irrelevant information about me. I guess I knew going in that you were rude and deceptive, but to be so blatant about it does nonetheless surprise me. It especially surprises me because behavior on such a public forum holds the potential for seriously harming your professional reputation. Your work being unnotable is not a reflection of either a lack of intellect or any deficiency as a human being. But your behavior on Wikipedia is another matter.
Matt Mahoney told you that your code suffers from the fatal flaw of not being context-adaptive. Yet you cite him as a defender, evidence that your work is notable in the context of Wikipedia? Given what he and others have said, your code seems slower than prefix coding (which doesn't require multiplies) and lacks the main advantage of arithmetic coding (context adaptivity). It's fun for toy problems, but not as a replacement for current technologies. Matt is interested in this because he's interested in new methods. But just because another academic shows interest in your work does not make it notable. As you've been repeatedly told, even if your method were the best ever, that alone would not make it notable in this context. Wikipedia is not the forum for advancing research. You must know this, since you've been told this repeatedly for years. This is not an alternative to the peer review process. It is an encyclopedia.
The Huffman coding article needs a lot of work; it suffers from bloat and confuses novices, and stuff like this only adds to these problems. Please stop contributing to Wikipedia until you can be constructive rather than destructive, and can stop willfully violating the policies of the website. Calbaer (talk) 00:36, 8 June 2014 (UTC)

A few compressors have recently switched to ANS - maybe it is worth mentioning it somewhere: zhuff (huff->tANS), lzturbo (huff->tANS), LZA (range coding->rANS), ZSTD (only tANS) and they look quite interesting in Matt's benchmarks ( ): size for enwiki8, comp and decomp time (ns/byte):

lzturbo 1.2 –39 | 26,915,461 | 582 | 2.8

LZA 0.70b –mx9 –b7 –h7 | 27,111,239 | 378 | 10

WinRAR 5.00 –ma5 –m5 | 27,835,431 | 1004 | 31

WinRAR 5.00 –ma5 –m2 | 29,758,785 | 228 | 30

lzturbo 1.2 –32 | 30,979,376 | 19 | 2.7

zhuff 0.97 –c2 | 34,907,478 | 24 | 3.5

gzip 1.3.5 –9 | 36,445,248 | 101 | 17

pkzip 2.0.4 –ex | 36,556,552 | 171 | 50

ZSTD 0.0.1 40,024,854 | 7.7 | 3.8

WinRAR 5.00 –ma5 –m1 | 40,565,268 | 54 | 31 — Preceding unsigned comment added by (talk) 12:55, 30 January 2015 (UTC)

calbaer, you say that the primary advantage of arithmetic coding is for adaptivity, by which you seem to mean very fine-grained adaptation to changes in symbol frequency distributions.

My understanding is that the primary advantage of arithmetic coding is simply the ability to use fractional bits to encode symbols with non-power-of-two probabilities. For example, in PPM the biggest win of arithmetic coding is for encoding sequences of symbols with probabilities very close to 1, using significantly less than 1 bit per symbol. (As in compressing the string "United States of America"---once you've seen "United Sta" you can encode "tes" in less than a bit, and once you've seen "United States of A" you can likewise encode "merica" with less than one bit for all 6 symbols.) As Charles Bloom has pointed out, the other "probabilities" used in PPM are usually pretty bogus because they're too sparse to be statistically significant. (By the time you've seen enough occurrences of a context to do real statistics, that context is usually masked by new higher-order contexts with the same sparsity problem.) So in practice, the biggest win of arithmetic coding is often just to efficiently encode a probability that's very nearly 1.

It is common for the entropy coder to be faced with relatively stable statistics from a fixed model for a particular domain, or computed for large blocks of input, or as the output of an adaptive model that decorrelates away most of the instabilities. A good adaptive predictor may tend to make very different predictions depending on what it's seen lately, but output residuals that look a whole lot more like noise with a particular characteristic frequency distribution. That's especially true for an adaptively approximate predictor, which may predict only those bits it's likely to be able to predict, and regard the rest as incompressible noise, essentially splitting the data into the prediction, the residual (prediction error), and the too-random-to-bother-with details.

For use cases where very fine-grained adaptation isn't needed, *or* where it's handled by some decorrelating transform or model-switching ahead of the entropy coder, ANS has a lot of advantages over traditional range coding. It's faster to decode (the fastest way to decode Huffman codes is usually ANS-style), it's very easy to parallelize SIMD-style (or on GPU, but I don't know if anybody's actually done that yet), and it's easy to interleave outputs of different models with no metadata to mark the switch. (The reverse-order thing is not generally a killer, either, for a blockwise or sliding-window asymmetrical compressor. The compressor can do things in whatever order makes for the fastest decompression---see GLZA for an extreme example.)

That is why the cool kids are already doing it---including Charles Bloom, who invented PPMZ and LZP, and Yann Collett. ANS is used in most of the new fast asymmetrical compressors, both free software and commercial products(Collet's STD et al., Bloom & Giesen's Oodle LZNIB, etc.)---and others have made noises about switching when they get around to it (e.g., Kennon Conrad (GLZA), Richard Geldreich (LZham)).

I am not one of those people---I've never used ANS, yet, and don't have a dog in the fight---but it's pretty clear that this is an expert consensus. Nobody knows more about how to compress data well and decompress it fast than those guys. And they think that ANS is a win, if not the greatest thing since sliced bread---essentially the only interesting development in entropy coding per se in a long time, and one of the most useful for practical lossless compression algorithms.

It is unfortunate that most of those people do not (currently) prioritize refereed scientific publications, if that's keeping ANS out of Wikipedia. That is a big problem in lossless data compression---most of what makes practical compression systems work well is hidden in archivers and proprietary code, or is in widely-used free software but with zero publications in the refereed literature. (For example, LZMA was the most scientifically interesting and *useful* general-purpose lossless compression algorithm for a decade, and millions of people use it, but to this day almost nobody really understands why it works so well *except* for the people I mention here and a very few others. There are zero academic publications explaining LZMA, and the best descriptions are on Charles Bloom's blog.) (talk) 18:21, 29 April 2016 (UTC)

There are at least two peer-reviewed papers about ANS now: Additionally, it is currently the only entropy coding in experimental branch of Google VP10 video codec: (talk) 08:58, 1 May 2016 (UTC)

Calbaer, you said that "No idea what's happened since then, but back then one compression expert mentioned that a fatal flaw of the system was the need to decode in reverse order". Did you actually read the old thread you linked to? Nobody said anything about a fatal flaw---just things that it's not appropriate for (direct application to PPM or context mixers). The issue of reverse order was soon resolved in that thread. For the things that ANS is appropriate for, you're going to use frequencies for a reasonably large block, in which case the compressor can go through the block backwards, a little slower, so that the decompressor only has to go forwards, fast.

That means that ANS is good for the pretty much same things that *Huffman* is good for. If that's ANS's "fatal flaw", it seems to be Huffman's too---adaptive Huffman is too slow. (As the present article says.) If that's true, then the article should say so up front: that Huffman is obsolete and is of historical interest only, because you gotta have quick adaptation and arithmetic coding just wins. And maybe the Huffman article should be deleted. :)

But that is NOT true. Blockwise entropy coding is very useful in the most popular compression algorithms in the world: asymmetric LZ77-based compressors, gzip (DEFLATE) and LZMA, like where you entropy code LZ77 match offsets, match lengths, and/or literals between matches, or the output of some model of those.

And that is precisely where ANS is actually replacing Huffman---in high-performance asymmetrical LZ77- and LZMA-like algorithms like ZSTD and LZNIB.

You can't have it both ways---if ANS is useless and uninteresting, so is Huffman. But if Huffman is interesting---and it is--then ANS is *more* interesting, because it has one of the two main advantages of arithmetic coding (efficiency for probabilities not near a power of two or near one), and Huffman has neither.

You seem to argue that the precision of arithmetic coding (or ANS) is relatively unimportant, compared to the fast adaptation thing.

That is exactly the reverse of every description of arithmetic coding I've ever seen. (And I've seen a bunch.) Every time anybody starts talking about arithmetic coding, the very first thing they say is that it's a solution to Huffman's inability to use fractional bits, and is that really cool. And ANS has that over Huffman, too.

ANS will rise[edit]

>As you've been repeatedly told, even if your method were the best ever, that alone would not make it notable in this context. Wikipedia is not the forum for advancing research. You must know this, since you've been told this repeatedly for years. This is not an alternative to the peer review process. It is an encyclopedia.

Calbaer, this is scary, why such brutally restrictive policy, people should know that there is an superb algorithm - that's the main purpose of one encyclopedia, your policy delays the ... advent of one good news.

They say that a good thing never lasts

And then it has to fall

Those are the the people that did not

Amount to much at all

/Madonna 'Give It 2 Me' lyrics/

My point, good things eventually will make their way in this crazy world.

Sanmayce (talk) 08:28, 20 May 2015 (UTC)

Do you really think that current data compression specialists - people who have built their careers on Huffman will acknowledge its obsolescence? Generation needs to change ... (talk) 03:49, 24 May 2015 (UTC)
You might prefer Madonna, but I prefer Carl Sagan: "The fact that some geniuses were laughed at does not imply that all who are laughed at are geniuses. They laughed at Columbus, they laughed at Fulton, they laughed at the Wright brothers. But they also laughed at Bozo the Clown." The terms of notability are laid out at Wikipedia:Notability. Just because your invention isn't notable today doesn't mean it will never be. When and if it is notable, it will be its own article, not part of this one. (Incidentally, if you think people have built careers on Huffman coding, you really don't know much about the field. It's a fertile subject for investigation, but has never been one for the volume of grants or industry interest one would need for anyone having "built their careers on" it.) Calbaer (talk) 04:44, 2 June 2015 (UTC)
So how many compressors need to switch to ANS to make it notable? (current list: ) (talk) 12:07, 2 June 2015 (UTC)
The real question is, how many peer-reviewed articles describing ANS need to appear to make it notable?
Please, please, please: spend your time getting those articles written and accepted, rather than trying to promote your ideas here. Difficult as it may be to believe, Wikipedia is not the place for original research. That's just the way it is, and has to be. —Steve Summit (talk) 12:45, 19 June 2015 (UTC)
Peer-reviewed paper: "The use of asymmetric numeral systems as an accurate replacement for Huffman coding". Apple's new 3x faster compressor LZFSE: Lempel-Ziv + Finite State Entropy clearly points Yann Collet's tANS implementation: . CRAM v3 genetic data compressor of the European Bioinformatics Institute uses rANS ( ) ... (talk) 08:59, 7 August 2015 (UTC)

Another peer-reviewed paper, on synthesizing FPGA's for high-throughput ANS: (talk) 13:51, 2 May 2016 (UTC)

ISIT 2016 ( ): "On the Stationary Distribution of Asymmetric Binary Systems", Google VP10 ANS implementation: , Google WebP ANS implementation: , fast rANS: — Preceding unsigned comment added by (talk) 09:00, 13 June 2016 (UTC)
Apple has just made open source its currently default compressor (LZFSE, since iOS 9 and OS X 10.11) - its entropy coding is indeed tANS: (talk) 08:54, 17 June 2016 (UTC)
First book with chapter about ANS (of Colt McAnlis from Google: ): (talk) 07:24, 19 July 2016 (UTC)

Examples don't match?[edit]

Is it my imagination, or does the image in the Problem definition section (File:HuffmanCodeAlg.png) not match the Example in the Problem definition section? (Yet they use the same symbols a, b, c, d, and e, inviting the reader to try to correlate them.) —Steve Summit (talk) 12:41, 19 June 2015 (UTC)