[Solved]: What is the algorithm for the Shannon-Fano code? Am I correct?

Problem Detail: I am wondering what the true algorithm for the Shannon-Fano code is. The result I get from the algorithm on the Wikipedia page contradicts the expected length of the produced code. According to the proof of Kraft's inequality, $l_i = \lceil \log_2{\frac{1}{P_i}} \rceil = \lceil -\log_2{P_i} \rceil$.

A Shannon–Fano tree is built according to a specification designed to define an effective code table. The actual algorithm is simple:

  1. For a given list of symbols, develop a corresponding list of probabilities or frequency counts so that each symbol’s relative frequency of occurrence is known.
  2. Sort the lists of symbols according to frequency, with the most frequently occurring symbols at the left and the least common at the right.
  3. Divide the list into two parts, with the total frequency counts of the left part being as close to the total of the right as possible.
  4. The left part of the list is assigned the binary digit 0, and the right part is assigned the digit 1. This means that the codes for the symbols in the first part will all start with 0, and the codes in the second part will all start with 1.
  5. Recursively apply the steps 3 and 4 to each of the two halves, subdividing groups and adding bits to the codes until each symbol has become a corresponding code leaf on the tree.
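
The five steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `shannon_fano` is made up for this example, and the rule for breaking ties between equally balanced cuts (take the leftmost one) is an assumption, since the steps do not specify it.

```python
def shannon_fano(freqs):
    """Return {symbol: codeword} built by recursive balanced splitting.

    `freqs` maps each symbol to a frequency count or probability.
    """
    # Step 2: most frequent symbols first (ties keep their input order).
    items = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    codes = {sym: "" for sym, _ in items}

    def split(group):
        if len(group) < 2:
            return  # a single symbol is a leaf; nothing left to divide
        total = sum(weight for _, weight in group)
        # Step 3: pick the cut whose two totals are as close as possible.
        # Assumption: on a tie, the leftmost cut wins.
        best_cut, best_diff, running = 1, None, 0
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(total - 2 * running)  # |right total - left total|
            if best_diff is None or diff < best_diff:
                best_cut, best_diff = i, diff
        left, right = group[:best_cut], group[best_cut:]
        # Step 4: append 0 to the left part and 1 to the right part.
        for sym, _ in left:
            codes[sym] += "0"
        for sym, _ in right:
            codes[sym] += "1"
        # Step 5: recurse into both parts.
        split(left)
        split(right)

    split(items)
    return codes


# Frequencies 4, 2, 2, 1 (out of 9), as in the example of this question.
print(shannon_fano({"S1": 4, "S2": 2, "S3": 2, "S4": 1}))
# {'S1': '0', 'S2': '10', 'S3': '110', 'S4': '111'}
```

Note that a list with a single symbol gets the empty codeword in this sketch, because no split ever happens.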

For example:

$S_1 \rightarrow P_{S_1} = \frac{4}{9} \rightarrow \text{Code} = 0 \mapsto$ And: $\lceil \log_2{\frac{9}{4}} \rceil = 2 \neq 1$ Bad!

$S_2 \rightarrow P_{S_2} = \frac{2}{9} \rightarrow \text{Code} = 10 \mapsto$ And: $\lceil \log_2{\frac{9}{2}} \rceil = 3 \neq 2$ Bad!

$S_3 \rightarrow P_{S_3} = \frac{2}{9} \rightarrow \text{Code} = 110 \mapsto$ But: $\lceil \log_2{\frac{9}{2}} \rceil = 3 = 3$

$S_4 \rightarrow P_{S_4} = \frac{1}{9} \rightarrow \text{Code} = 111 \mapsto$ And: $\lceil \log_2{\frac{9}{1}} \rceil = 4 \neq 3$ Bad!

We can see that the length of the produced Shannon-Fano code for $S_1$ is $1$, but it is supposed to be $2$, which means this algorithm is not correct. What is the correct algorithm then?

Additional note: If we look at example 1 of this document, we can see that the length of $A4$ is supposed to be $3$, not $4$. The same contradiction. There is another contradiction in example 1 of this other document. I think it is clear what I am talking about.

Further note: Here is page 45 of this textbook: Information and Coding Theory (Springer Undergraduate Mathematics Series), 2000 edition.

[Image: page 45 of the textbook]
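
The comparison in the example can be checked mechanically. The sketch below is only an illustration: the helper name `shannon_length` is hypothetical, and the code table is copied from the example rather than computed. It evaluates $\lceil \log_2 \frac{1}{P_i} \rceil$ with exact fractions to avoid floating-point rounding.

```python
from fractions import Fraction


def shannon_length(p):
    """Smallest integer l with 2**l * p >= 1, i.e. ceil(log2(1/p)), computed exactly."""
    length = 0
    while Fraction(2) ** length * p < 1:
        length += 1
    return length


# Probabilities and codewords taken from the example in the question.
probs = {
    "S1": Fraction(4, 9),
    "S2": Fraction(2, 9),
    "S3": Fraction(2, 9),
    "S4": Fraction(1, 9),
}
example_codes = {"S1": "0", "S2": "10", "S3": "110", "S4": "111"}

for sym, p in probs.items():
    produced = len(example_codes[sym])
    expected = shannon_length(p)
    print(sym, "produced length:", produced, "ceil(log2(1/p)):", expected)
```

For this input the two lengths agree only for $S_3$, which is the mismatch described above.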

Asked By : Node.JS

Answered By : Yuval Filmus

You are confusing “Shannon coding” with “Shannon–Fano coding” (terminology may vary across sources). Per Wikipedia, Shannon–Fano coding is the algorithm you mention, while Shannon coding is any coding that assigns a symbol occurring with probability $p_i$ a codeword of length $\ell_i = \lceil \log_2 \frac{1}{p_i} \rceil$. Per Wikipedia, Shannon–Fano coding always leads to codewords whose lengths are within one bit of $\log_2 \frac{1}{p_i}$. This is also a feature of Shannon coding, but the two need not be the same. In particular, Shannon–Fano coding always saturates the Kraft–McMillan inequality, while Shannon coding need not.
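
As a small check of the last point, the Kraft–McMillan sum $\sum_i 2^{-\ell_i}$ can be computed for both sets of lengths from the question's example. This is a sketch under stated assumptions: the helper name `kraft_sum` is made up here, and the length lists are hard-coded from the question (codewords 0, 10, 110, 111 for Shannon–Fano, and $\lceil \log_2 \frac{1}{p_i} \rceil$ for probabilities $\frac{4}{9}, \frac{2}{9}, \frac{2}{9}, \frac{1}{9}$ for Shannon coding).

```python
from fractions import Fraction


def kraft_sum(lengths):
    """Return the sum of 2**(-l) over all codeword lengths, as an exact fraction."""
    return sum(Fraction(1, 2 ** l) for l in lengths)


# Lengths taken from the example in the question.
shannon_fano_lengths = [1, 2, 3, 3]  # codewords 0, 10, 110, 111
shannon_lengths = [2, 3, 3, 4]       # ceil(log2(1/p)) for p = 4/9, 2/9, 2/9, 1/9

print(kraft_sum(shannon_fano_lengths))  # 1
print(kraft_sum(shannon_lengths))       # 9/16
```

The Shannon–Fano lengths give a sum of exactly 1 (the inequality is tight), while the Shannon lengths leave slack, which is why the two codes differ on this input.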

Question Source : http://cs.stackexchange.com/questions/48465