> Interestingly, this is the best possible approach, whether we know the bias of...

atq2119 · on Sept 15, 2024

It's obviously false. If you know that the bias is 0.5, you can just pass the input stream through unmodified. You can also construct better codes for other biases.

ralegh · on Sept 15, 2024

Depends how you define best, I assume best in the article means ‘output is close to 50/50 and independent’, in which case the 01/10 solution is already optimal given the assumptions.

If you define best as the most efficient at converting inputs to outputs as well as the output being 50/50 then there are better ways such as your example.

Maro · on Sept 16, 2024

OP here. Thanks for the comment. I'm not a mathematician, will double-check my wording/phrasing on this.

Maro · on Sept 18, 2024

OP here.

Re-reading this bit, I agree my wording is very clunky. What I meant by "whether we know the bias of the input bit stream or not" was:

[For p≠1/2] whether you know the value of p or not, this is the best approach. Of course, p=1/2 is the trivial case where we can just pass through..

contravariant · on Sept 15, 2024

It's worded quite badly, but I think it's the best you can do given 2 independent bits, except for the rare case where 00 and 11 are equally likely.

bjornsing · on Sept 16, 2024

That’s a good counter example. :) Thanks.

LegionMammal978 · on Sept 15, 2024

In general, I think you could show that an 'arithmetic coding' approach is optimal, if the distribution of each bit given the previous bits is known. From there, you'd just have to show that the von Neumann approach for i.i.d. bits is no worse on average.

Darmani · on Sept 15, 2024

This is correct, and I came to the comments to say the same thing. It takes some work to implement without arbitrary-precision floating point, but arithmetic coding can make use of the full entropy in the input stream, whereas the approach in the article discards a lot of information.

bjornsing · on Sept 16, 2024

I had this thought too. But can you be sure that there won’t be any statistical patterns / correlations in the output bitstream with that approach?