It's obviously false. If you know that the bias is 0.5, you can just pass the input stream through unmodified. You can also construct better codes for other biases.
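To put rough numbers on that (a quick sketch, assuming i.i.d. bits with known P(1) = p): the 01/10 scheme yields p(1-p) output bits per input bit, while the entropy bound is H(p) bits per input bit, so it's a factor of 4 off even in the best case:

```python
from math import log2

def h(p):
    """Binary entropy: the best achievable output rate in bits per input bit."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.5, 0.7, 0.9):
    vn = p * (1 - p)  # von Neumann: one output bit per surviving pair of inputs
    print(f"p={p}: von Neumann {vn:.3f} vs entropy bound {h(p):.3f}")
```

At p = 0.5 that's 0.25 vs 1.0, which is exactly the gap the pass-through closes.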
Depends how you define 'best'. I assume 'best' in the article means 'the output is close to 50/50 and independent', in which case the 01/10 solution is already optimal given the assumptions.
If you define 'best' as converting input bits to output bits as efficiently as possible, while still producing a 50/50 output, then there are better ways, such as your example.
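For concreteness, the 01/10 scheme under discussion fits in a few lines (a minimal sketch; the function name is mine):

```python
def von_neumann(bits):
    """Map 01 -> 0 and 10 -> 1; discard 00 and 11 pairs."""
    out = []
    for a, b in zip(bits[::2], bits[1::2]):
        if a != b:          # pairs 01 and 10 are equally likely: p(1-p) each
            out.append(a)
    return out

von_neumann([1, 0, 0, 0, 1, 1, 0, 1])  # -> [1, 0]
```

Each surviving pair yields one output bit, which is where the p(1-p) rate comes from.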
In general, I think you could show that an 'arithmetic coding' approach is optimal if the distribution of each bit given the previous bits is known. From there, you'd just have to show that, for i.i.d. bits, it's no worse on average than the von Neumann approach.
This is correct, and I came to the comments to say the same thing. It takes some work to implement without arbitrary-precision floating point, but arithmetic coding can make use of the full entropy in the input stream, whereas the approach in the article discards a lot of information.
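For anyone curious what that looks like, here's a minimal sketch of a fixed-precision binary arithmetic encoder (Witten–Neal–Cleary-style renormalization; the function name, 16-bit fixed-point bias, and 32-bit range are my choices, not from the article). Compressing the biased stream at its known bias is the extraction step, since near-optimally compressed output is close to uniform:

```python
PRECISION = 32
FULL    = (1 << PRECISION) - 1
HALF    = 1 << (PRECISION - 1)
QUARTER = 1 << (PRECISION - 2)

def extract(bits, p_one):
    """Arithmetic-code a Bernoulli(p_one) bit stream with integer math only.

    Output length approaches H(p_one) bits per input bit, and the output
    bits are close to uniform. Assumes 2**-16 < p_one < 1 - 2**-16.
    """
    p0 = int((1 - p_one) * (1 << 16))  # P(0) in 16-bit fixed point
    low, high = 0, FULL                # current interval, high inclusive
    pending = 0                        # unresolved underflow (E3) bits
    out = []

    def emit(bit):
        nonlocal pending
        out.append(bit)
        out.extend([bit ^ 1] * pending)  # pending bits are the opposite
        pending = 0

    for b in bits:
        span = high - low + 1
        mid = low + (span * p0 >> 16) - 1  # top of the '0' sub-interval
        if b == 0:
            high = mid
        else:
            low = mid + 1
        # Renormalize: emit bits that are settled, track near-misses.
        while True:
            if high < HALF:
                emit(0)
            elif low >= HALF:
                emit(1)
                low -= HALF
                high -= HALF
            elif low >= QUARTER and high < HALF + QUARTER:
                pending += 1               # interval straddles the midpoint
                low -= QUARTER
                high -= QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
    pending += 1                           # flush: pin down the final interval
    emit(0 if low < QUARTER else 1)
    return out
```

As a sanity check, feeding it 100,000 simulated bits with p = 0.9 should give roughly 0.47 output bits per input bit (≈ H(0.9)) with output mean ≈ 0.5, versus 0.09 bits per input bit from the 01/10 scheme.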
Would love to see a proof of that. It feels a bit unintuitive to me.
(But on second thought that might be because I’m thinking of cases of correlated bits.)