> To make it unambiguous we must make sure that no code word is a prefix of another code word.
Technically, this is not quite correct. The class of so-called uniquely decodable codes is unambigous, and a superset of the prefix codes. One simple example of a
uniquely decodable code is the reverse of a prefix code. For the example in the article that would be
a 1
b 00
c 10
While the code for a is a prefix of the code of c, one can still unambiguously decode any code sequence by processing it in reverse order. It would be interesting to see a uniquely decodable code that is neither a prefix code nor one in reverse.
> It would be interesting to see a [not gratuitously inefficient] uniquely decodable code that is neither a prefix code nor one in reverse.
This can be done by composing a prefix code with a suffix code:
A 0
B 01
C 11
a A 0
b BA 010
c BB 0101
d BC 0111
e C 11
{a=0,b=010,c=0101,d=0111,e=11}
This is trivially uniquely decodable by uniquely decoding 0->A/etc backward, then uniquely decoding A->a/etc foreward. It's equivalent in lengths to the optimal prefix code {a=0,b=110,c=1110,d=1111,e=10} so it's a (one of several) optimal code for the same probability distributions.
And it's neither prefix nor suffix itself, since a=0 and b=010. In fact, it can't in general be decoded incrementally at all, in either direction, since "cee...ee?" vs "bee...ee?" and "?cc...cca" vs "?cc...ccb" both depend on unbounded lookahead to distinguish a single symbol.
I'm not sure the optimality holds for any composition of a in-isolation-optimal prefix code with a in-isolation-optimal suffix code, but it did work for the most trivial cases (other than the degenerate 1-to-1 code) I could come up with.
> It would be interesting to see a uniquely decodable code that is neither a prefix code nor one in reverse.
More interesting than I thought. First the adversarial answer; sure (edit: ah, I see someone else posted exactly the same!):
a 101
b 1
But it's a bad code, because we'd always be better with a=1 and b=0.
The Kraft inequality gives the sets of code lengths that can be made uniquely decodable, and we can achieve any of those with Huffman coding. So there's never a reason to use a non-prefix code (assuming we are doing symbol coding, and not swapping to something else like ANS or arithmetic coding).
But hmmmm, I don't know if there exists a uniquely-decodable code with the same set of lengths as an optimal Huffman code that is neither a prefix code nor one in reverse (a suffix code).
It is neither prefix-free nor suffix-free. Yet every occurrence of 0 corresponds to an occurrence of b.
However, this is obviously inefficient. So I guess the question is whether there's an optimal code which is neither prefix-free nor suffix-free.
--------------
EDIT
I did some googling and found this webpage https://blog.plover.com/CS/udcodes.html where the author gives the following example of a uniquely decodable code:
a 0011
b 011
c 11
d 1110
I guess this is "almost" prefix-free since the only prefix is c of d. If a message starts wiht 1, you could find the first 0 and then look at whether there's an odd or even number of 1's. So I think I can see how it's uniquely decodable. However, my crypto knowledge is too rusty to remember how to show whether this is an optimal code for some probability distribution.
That code in the EDIT is suboptimal. It doesn't saturate the Kraft inequality. You could make every codeword two bits and still encode 4 symbols, so that would be strictly better.
That’s interesting. I guess this is not usually used because you may have a long string of bits that is ambiguous till you get to a disambiguating bit.
Something like
`100000000000000001`
In this case, where to know whether the first code was an `a` or a `c` you have to read all the way to where the zeroes end.
Technically, this is not quite correct. The class of so-called uniquely decodable codes is unambigous, and a superset of the prefix codes. One simple example of a uniquely decodable code is the reverse of a prefix code. For the example in the article that would be
While the code for a is a prefix of the code of c, one can still unambiguously decode any code sequence by processing it in reverse order. It would be interesting to see a uniquely decodable code that is neither a prefix code nor one in reverse.