> To make it unambiguous we must make sure that no code word is a prefix of anot...

a1369209993 · 2024-07-04T23:12:31 1720134751

> It would be interesting to see a [not gratuitously inefficient] uniquely decodable code that is neither a prefix code nor one in reverse.

This can be done by composing a prefix code with a suffix code:

    A   0
    B  01
    C  11
  a A  0
  b BA 010
  c BB 0101
  d BC 0111
  e C  11
  {a=0,b=010,c=0101,d=0111,e=11}

This is trivially uniquely decodable: decode 0->A/etc backward (the inner code is suffix-free), then decode A->a/etc forward (the outer code is prefix-free). Its codeword lengths match those of the optimal prefix code {a=0,b=110,c=1110,d=1111,e=10}, so it is one of several optimal codes for the same probability distributions.
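
A minimal Python sketch of that two-stage decoding, assuming the tables above. The names `INNER`, `OUTER`, `encode`, and `decode` are mine and purely illustrative:

```python
# Inner suffix code (bits for A/B/C) and outer prefix code (A/B/C strings for a..e),
# copied from the tables above. All names here are illustrative assumptions.
INNER = {"A": "0", "B": "01", "C": "11"}
OUTER = {"a": "A", "b": "BA", "c": "BB", "d": "BC", "e": "C"}

def encode(msg):
    """Map each symbol to A/B/C letters, then each letter to bits."""
    return "".join(INNER[letter] for sym in msg for letter in OUTER[sym])

def decode(bits):
    """Decode bits -> A/B/C scanning backward, then A/B/C -> a..e scanning forward."""
    inner_inv = {v: k for k, v in INNER.items()}
    outer_inv = {v: k for k, v in OUTER.items()}

    # Stage 1: the inner code is suffix-free, so at most one codeword
    # can end at the current position when reading from the right.
    letters, end = [], len(bits)
    while end > 0:
        for n in (1, 2):
            chunk = bits[max(end - n, 0):end]
            if len(chunk) == n and chunk in inner_inv:
                letters.append(inner_inv[chunk])
                end -= n
                break
        else:
            raise ValueError("not a valid encoding")
    letters.reverse()

    # Stage 2: the outer code is prefix-free, so greedy left-to-right matching works.
    out, buf = [], ""
    for letter in letters:
        buf += letter
        if buf in outer_inv:
            out.append(outer_inv[buf])
            buf = ""
    if buf:
        raise ValueError("not a valid encoding")
    return "".join(out)
```

Note that stage 1 cannot start until the last bit has arrived, which is the price paid for the inner code not being prefix-free.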

And it's neither prefix nor suffix itself, since a=0 and b=010. In fact, it can't in general be decoded incrementally at all, in either direction, since "cee...ee?" vs "bee...ee?" and "?cc...cca" vs "?cc...ccb" both depend on unbounded lookahead to distinguish a single symbol.
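
A small sketch to check both claims for the forward direction, assuming the codebook above. The helper names are made up, and the two messages are just one concrete instance of the "cee...ee?" vs "bee...ee?" pattern:

```python
CODE = {"a": "0", "b": "010", "c": "0101", "d": "0111", "e": "11"}

def is_prefix_free(words):
    """True if no word is a proper prefix of a different word."""
    words = list(words)
    return not any(x != y and y.startswith(x) for x in words for y in words)

def is_suffix_free(words):
    """Suffix-freeness is prefix-freeness of the reversed words."""
    return is_prefix_free(w[::-1] for w in words)

def encode(msg):
    return "".join(CODE[s] for s in msg)

def common_prefix_len(x, y):
    """Number of leading characters on which x and y agree."""
    n = 0
    while n < min(len(x), len(y)) and x[n] == y[n]:
        n += 1
    return n

print(is_prefix_free(CODE.values()), is_suffix_free(CODE.values()))

# Two messages whose first symbols differ (c vs b) but whose encodings
# agree until the run of 1s ends, so the shared prefix grows with n.
for n in (1, 10, 100):
    left = encode("c" + "e" * n + "a")
    right = encode("b" + "e" * (n + 1) + "a")
    print(n, common_prefix_len(left, right))
```

The shared prefix is 2n+4 bits long here, so a forward decoder cannot commit to the first symbol until it has seen where the run of 1s stops.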

I'm not sure the optimality holds for any composition of an in-isolation-optimal prefix code with an in-isolation-optimal suffix code, but it did work for the most trivial cases (other than the degenerate 1-to-1 code) I could come up with.

imurray · 2024-07-05T09:21:23 1720171283

Nicely done; thanks.

imurray · 2024-07-04T14:48:02 1720104482

> It would be interesting to see a uniquely decodable code that is neither a prefix code nor one in reverse.

More interesting than I thought. First the adversarial answer; sure (edit: ah, I see someone else posted exactly the same!):

    a 101
    b 1

But it's a bad code, because we'd always be better off with a=1 and b=0.

The Kraft inequality characterizes the sets of code lengths that can be made uniquely decodable, and any such set of lengths can be achieved with a prefix code (Huffman coding finds the optimal one). So there's never a reason to use a non-prefix code (assuming we are doing symbol coding, and not swapping to something else like ANS or arithmetic coding).
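
To make that concrete, here is a hedged Python sketch that computes the Kraft sum for a list of codeword lengths and, when the sum is at most 1, builds a prefix code with exactly those lengths. It uses the standard canonical assignment rather than Huffman's algorithm itself, and the function names are my own:

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Sum of 2^-l over the codeword lengths (binary alphabet, lengths >= 1)."""
    return sum(Fraction(1, 2 ** l) for l in lengths)

def prefix_code_for(lengths):
    """Return prefix-free codewords with the given lengths, shortest first.

    Canonical construction: walk the sorted lengths, treating the codeword
    as a binary counter that is left-shifted whenever the length grows.
    """
    if kraft_sum(lengths) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    words, value, prev = [], 0, 0
    for l in sorted(lengths):
        value <<= (l - prev)
        words.append(format(value, "b").zfill(l))
        value += 1
        prev = l
    return words

print(kraft_sum([3, 1]))        # lengths of a=101, b=1
print(kraft_sum([1, 1]))        # lengths of a=1, b=0
print(prefix_code_for([3, 1]))  # a prefix code with the same lengths as {101, 1}
```

A sum strictly below 1 means some codeword could be shortened without losing unique decodability, which is why {a=101, b=1} loses to {a=1, b=0}.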

But hmmmm, I don't know if there exists a uniquely-decodable code with the same set of lengths as an optimal Huffman code that is neither a prefix code nor one in reverse (a suffix code).

If I was going to spend time on it, I'd look at https://en.wikipedia.org/wiki/Sardinas-Patterson_algorithm -- either to brute force a counter-example, or to see if a proof is inspired by how it works.
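
For anyone wanting to try the brute-force route, a minimal sketch of the Sardinas-Patterson test in Python follows. It assumes the codewords are distinct, non-empty strings; `dangling` and `uniquely_decodable` are hypothetical names of mine:

```python
def dangling(xs, ys):
    """All non-empty w such that x + w == y for some x in xs and y in ys."""
    return {y[len(x):] for x in xs for y in ys
            if len(y) > len(x) and y.startswith(x)}

def uniquely_decodable(codewords):
    """Sardinas-Patterson: track dangling suffixes until one is a codeword."""
    code = set(codewords)
    current = dangling(code, code)   # suffixes left when one codeword prefixes another
    seen = set()
    while current:
        if current & code:
            return False             # a dangling suffix is itself a codeword: ambiguity
        key = frozenset(current)
        if key in seen:
            return True              # the sets have started repeating: no ambiguity
        seen.add(key)
        current = dangling(code, current) | dangling(current, code)
    return True                      # ran out of dangling suffixes

print(uniquely_decodable(["1", "101"]))
print(uniquely_decodable(["0", "01", "10"]))   # "010" parses as 0,10 or 01,0
```

The loop terminates because every dangling suffix is a suffix of some codeword, so only finitely many distinct sets can occur.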

n4r9 · 2024-07-04T14:21:47 1720102907

It's a weird example, but what about

  a 1
  b 101

?

It is neither prefix-free nor suffix-free. Yet every occurrence of 0 corresponds to an occurrence of b.
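
A quick Python sketch of a decoder built on that observation (the function names are just for illustration): peek one bit ahead, and if it is a 0, the current 1 must start a b.

```python
CODE = {"a": "1", "b": "101"}

def encode(msg):
    return "".join(CODE[s] for s in msg)

def decode(bits):
    """Decode {a=1, b=101}: every 0 is the middle bit of exactly one b."""
    out, i = [], 0
    while i < len(bits):
        if bits[i] != "1":
            raise ValueError("every codeword starts with 1")
        if bits[i + 1:i + 2] == "0":
            # The next bit is 0, so this 1 opens a b; the closing 1 must follow.
            if bits[i + 2:i + 3] != "1":
                raise ValueError("0 not followed by 1")
            out.append("b")
            i += 3
        else:
            out.append("a")
            i += 1
    return "".join(out)

print(decode(encode("abba")))
```

Unlike the longer examples in this thread, one bit of lookahead is always enough here.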

However, this is obviously inefficient. So I guess the question is whether there's an optimal code which is neither prefix-free nor suffix-free.

--------------

EDIT

I did some googling and found this webpage https://blog.plover.com/CS/udcodes.html where the author gives the following example of a uniquely decodable code:

I guess this is "almost" prefix-free since the only prefix relation is that c is a prefix of d. If a message starts with 1, you could find the first 0 and then look at whether there's an odd or even number of 1's. So I think I can see how it's uniquely decodable. However, my crypto knowledge is too rusty to remember how to show whether this is an optimal code for some probability distribution.

imurray · 2024-07-04T14:52:07 1720104727

That code in the EDIT is suboptimal. It doesn't saturate the Kraft inequality. You could make every codeword two bits and still encode 4 symbols, so that would be strictly better.

n4r9 · 2024-07-04T14:58:39 1720105119

Ah of course. Thanks for the insight. About 15 years since I studied this stuff!

lazamar · 2024-07-04T08:52:15 1720083135

That’s interesting. I guess this is not usually used because you may have a long string of bits that is ambiguous till you get to a disambiguating bit.

Something like

`100000000000000001`

In this case, to know whether the first code was an `a` or a `c`, you have to read all the way to where the zeroes end.