[Solved]: Is there a generalization of Huffman Coding to Arithmetic coding?

Problem Detail: In trying to understand the relationships between Huffman Coding, Arithmetic Coding, and Range Coding, I began to think of the shortcomings of Huffman coding as being related to the problem of fractional bit packing.

That is, suppose you have 240 possible values for a symbol and need to encode this into bits; you would be stuck with 8 bits per symbol, even though you do not need a "full" 8, since 8 bits can express 256 possible values per symbol. A solution to this problem is something I've seen referred to as "fractional bit packing", where you are able to "bit-shift" by a non-power of two using multiplication. Just as multiplying by a power of two is a shift (x * 2 == x << 1, x * 4 == x << 2, and so on), you can "shift" by a non-power-of-two base by multiplying instead, and thereby pack in fractional-bit-sized symbols.

The problem is similar with Huffman coding: you end up with codes whose lengths must be whole numbers of bits, and therefore it has this packing inefficiency. However, you can't just use the fractional-bit-packing solution, because that solution assumes fixed-size symbols.

The question is: are there any papers or solutions that improve on Huffman coding with an idea similar to fractional bit packing, to achieve something similar to arithmetic coding? (Or any results to the contrary.)
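Here is a tiny Python sketch of the multiplication trick described above (the pack/unpack names and the leading-1 sentinel are illustrative choices, not part of any standard scheme):

```python
# Packing base-240 symbols by multiplication: the non-power-of-two analogue
# of "state = (state << 8) | symbol".

def pack(symbols, base=240):
    state = 1                            # leading 1 so initial zero symbols are preserved
    for s in symbols:
        state = state * base + s         # "shift" by log2(240) ~ 7.91 bits, then add
    return state

def unpack(state, count, base=240):
    out = []
    for _ in range(count):
        state, s = divmod(state, base)   # peel symbols off, last-packed first
        out.append(s)
    return out[::-1]

msg = list(range(240))
packed = pack(msg)
assert unpack(packed, len(msg)) == msg
print(packed.bit_length() / len(msg))    # ~7.91 bits per symbol instead of 8
```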

Asked By : Realz Slaw

Answered By : Pseudonym

Let's look at a slightly different way of thinking about Huffman coding.

Suppose you have an alphabet of three symbols, A, B, and C, with probabilities 0.5, 0.25, and 0.25. Because the probabilities are all inverse powers of two, this has a Huffman code which is optimal (i.e. it's identical to arithmetic coding). We will use the canonical code 0, 10, 11 for this example.

Suppose our state is a large integer, which we will call $s$. You can think of encoding as a function which takes the current state and a symbol to encode, and returns the new state:

$$\operatorname{encode}(s, A) = 2s$$
$$\operatorname{encode}(s, B) = 4s + 2$$
$$\operatorname{encode}(s, C) = 4s + 3$$

So let's start with the state 11 (which is 1011 in binary) and encode the symbol B. The new state is 46, which is 101110 in binary. As you can see, this is the "old" state with the sequence 10 added to the end. We have essentially "output" the bit sequence 10. So far, so good.

Now think for a moment about how arithmetic coding works. If you put the probabilities over a common denominator, the symbol A actually represents the range $[\frac{0}{4},\frac{2}{4})$, the symbol B represents the range $[\frac{2}{4},\frac{3}{4})$ and the symbol C represents the range $[\frac{3}{4},\frac{4}{4})$. Basically what we're doing here is multiplying everything by the common denominator. Imagine that the state was actually in base 4. Encoding a symbol B is really outputting the digit 2 in that base, and encoding a symbol C is outputting the digit 3 in that base.

However, symbol A is a little different, because it isn't a whole digit in base 4. Instead, we can think of the alphabet as the set of symbols $A_0$, $A_1$, B, C, with equal probability. This, again, has an optimal Huffman code 00, 01, 10, 11. Or, again, we can think of this in base 4. To encode a symbol, we just do:

$$\operatorname{encode}(s, A_0) = 4s + 0$$
$$\operatorname{encode}(s, A_1) = 4s + 1$$
$$\operatorname{encode}(s, B) = 4s + 2$$
$$\operatorname{encode}(s, C) = 4s + 3$$

So now it's clear how to encode symbols B and C, but to encode a symbol A, we have a choice. Which of $A_0$ and $A_1$ should we use?

Now here's the clever idea: we steal one bit of information from the state $s$:

$$s' = \left\lfloor \frac{s}{2} \right\rfloor$$
$$i = s \bmod 2$$

and then compute $\operatorname{encode}(s', A_i)$. Using our previous example, $s=11$, we find that $s'=5$ and $i=1$, and then $\operatorname{encode}(5, A_1) = 4\times 5+1 = 21$. The new state is 10101 in binary.

Now this doesn't produce exactly the same bit output as Huffman coding, but it does generate an output which has the same length. And what I hope you can see is that this is also uniquely decodable. To decode a symbol, we take the remainder when divided by 4. If the value is 2 or 3, then the symbol is B or C respectively. If it's 0 or 1, then the symbol is A, and then we can put the bit of information back by multiplying the state by 2 and adding either 0 or 1.

The nice thing about this approach is that it extends naturally to fractional-bit encoding, when the numerator and/or denominator of the probabilities are not powers of two. Suppose we have two symbols, A and B, where the probability of A is $\frac{3}{5}$ and the probability of B is $\frac{2}{5}$. Then we can encode a symbol with:

$$\operatorname{encode}(s, A_0) = 5s + 0$$
$$\operatorname{encode}(s, A_1) = 5s + 1$$
$$\operatorname{encode}(s, A_2) = 5s + 2$$
$$\operatorname{encode}(s, B_0) = 5s + 3$$
$$\operatorname{encode}(s, B_1) = 5s + 4$$

To encode the symbol A, we take $s' = \left\lfloor \frac{s}{3} \right\rfloor$ and $i = s \bmod 3$, and then compute $\operatorname{encode}(s', A_i)$. This is equivalent to arithmetic coding.
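To make this concrete, here is a minimal Python sketch of the digit-stealing construction just described (Python's big integers stand in for the unbounded state; the function names and the frequency-table representation are my own illustrative choices). Each symbol $s$ with frequency $f_s$ out of a total $M$ owns $f_s$ slots of a base-$M$ digit; encoding steals a base-$f_s$ digit from the state and appends a base-$M$ digit.

```python
# Sketch of the digit-stealing scheme above, using Python's big integers as the
# unbounded state. Each symbol owns freq[s] "slots" out of M = sum(freq.values()).

def build_tables(freq):
    M = sum(freq.values())
    cum, c = {}, 0
    for s, f in freq.items():      # cum[s] .. cum[s]+freq[s]-1 are s's slots in base M
        cum[s] = c
        c += f
    return M, cum

def encode(state, sym, freq, M, cum):
    q, i = divmod(state, freq[sym])        # steal a base-freq[sym] digit i from the state
    return q * M + cum[sym] + i            # append the base-M digit cum[sym] + i

def decode(state, freq, M, cum):
    q, slot = divmod(state, M)             # read back the last base-M digit
    sym = next(s for s in freq if cum[s] <= slot < cum[s] + freq[s])
    return q * freq[sym] + (slot - cum[sym]), sym   # give the stolen digit back

# The A/B/C example: probabilities 2/4, 1/4, 1/4.
freq = {'A': 2, 'B': 1, 'C': 1}
M, cum = build_tables(freq)
print(encode(11, 'B', freq, M, cum))       # 46, i.e. 101110 in binary, as in the text
print(encode(11, 'A', freq, M, cum))       # 21, i.e. 10101 in binary, as in the text

# Decoding is LIFO: the last symbol encoded comes out first.
state = 1
for sym in 'ABACCA':
    state = encode(state, sym, freq, M, cum)
decoded = []
for _ in range(6):
    state, sym = decode(state, freq, M, cum)
    decoded.append(sym)
print(''.join(reversed(decoded)), state)   # ABACCA 1

# The fractional example (probabilities 3/5 and 2/5) needs no new machinery:
freq = {'A': 3, 'B': 2}
M, cum = build_tables(freq)
print(encode(11, 'A', freq, M, cum))       # floor(11/3)*5 + (11 mod 3) = 17
```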
It's actually a family of methods known as Asymmetric Numeral Systems (ANS), developed over the last few years by Jarek Duda. The meaning of the name should be obvious: to encode a symbol with probability $\frac{p}{q}$, you conceptually steal a base-$p$ digit from the state, and then add a base-$q$ digit. The asymmetry comes from interpreting the state as a numeral in two different bases.

The reason why it's a family of coding methods is that what we've seen here is impractical by itself; it needs some modifications to deal with the fact that you probably don't have infinite-precision integers to manipulate the state variable efficiently, and there are various ways that you can achieve this. Arithmetic coding, of course, has a similar issue with precision for its state. Practical variants include rANS (the "r" is for "range") and tANS ("table-driven"). ANS has a few interesting advantages over arithmetic coding, both practical and theoretical:

  • Unlike arithmetic coding, the “state” is a single word, rather than a pair of words.
  • Not only that, but an ANS encoder and its corresponding decoder have identical states and their operations are completely symmetric. This raises some interesting possibilities, such as that you can interleave different streams of encoded symbols and everything synchronises perfectly.
  • Practical implementations need, of course, to “output” information as you go, and not just collect it in a big integer to be written at the end. However, the size of the “output” can be configured in return for (usually modest) compression loss. So where arithmetic coders must output a bit at a time, ANS can output a byte or a nybble at a time. This gives you a direct tradeoff between speed and compression (a rough byte-at-a-time sketch follows this list).
  • It appears to be about as fast on current-generation hardware as binary arithmetic coding, and hence competitive with Huffman coding. This makes it much faster than large-alphabet arithmetic coding and its variants (e.g. range coding).
  • It appears to be patent-free.
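To illustrate the byte-at-a-time point above, here is a rough pure-Python sketch of a bounded-state coder in the spirit of rANS: the state is kept in a fixed interval $[L, 256L)$ and low-order bytes are flushed whenever the next step would overflow it. The parameter choice ($L = M \cdot 2^{16}$), the function names, and the renormalisation bound are illustrative choices of mine, not taken from any particular implementation; real implementations use fixed-width machine words rather than Python integers.

```python
# Rough sketch of a byte-at-a-time bounded-state coder in the spirit of rANS.
# The state lives in [L, 256*L); low bytes are flushed out (encoding) or pulled
# back in (decoding) to keep it there. Parameter choices are illustrative only.

def build_tables(freq):
    M = sum(freq.values())
    cum, c = {}, 0
    for s, f in freq.items():
        cum[s] = c
        c += f
    return M, cum

def compress(msg, freq, L_bits=16):
    M, cum = build_tables(freq)
    L = M << L_bits                        # lower bound of the state interval (a multiple of M)
    state, out = L, []
    for sym in reversed(msg):              # encode in reverse so decoding runs forward
        f = freq[sym]
        x_max = (L // M) * 256 * f
        while state >= x_max:              # renormalise: flush low bytes so the next
            out.append(state & 0xFF)       # step cannot push the state past 256*L
            state >>= 8
        state = (state // f) * M + cum[sym] + (state % f)
    return state, out                      # final state plus the flushed bytes

def decompress(state, out, n, freq, L_bits=16):
    M, cum = build_tables(freq)
    L = M << L_bits
    out = list(out)                        # flushed bytes are consumed newest-first
    msg = []
    for _ in range(n):
        slot = state % M                   # the last base-M digit identifies the symbol
        sym = next(s for s in freq if cum[s] <= slot < cum[s] + freq[s])
        state = freq[sym] * (state // M) + slot - cum[sym]
        while state < L:                   # renormalise: pull bytes back into the state
            state = (state << 8) | out.pop()
        msg.append(sym)
    return msg

freq = {'A': 3, 'B': 2}                    # the 3/5, 2/5 distribution from the text
msg = list('ABAABABAABBABAAAB')
state, out = compress(msg, freq)
assert decompress(state, out, len(msg), freq) == msg
print(len(out), 'bytes flushed, final state', state)
```

Encoding the message in reverse is just the usual way of coping with the fact that ANS is last-in-first-out: the decoder then produces symbols in their original order while reading the flushed bytes back from the end.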

I don’t think I’m ever going to do arithmetic coding again.


Question Source : http://cs.stackexchange.com/questions/49243
