
Von Neumann's solution is clever, but extremely inefficient in terms of how many flips you need per random bit produced.

More interesting methods produce a stream of unbiased random bits out of a stream of flips; and you can look at the asymptotic efficiency.

A simple method that does better than von Neumann's method:

Produce 1,000 coin flips. Say you produce k heads and 1,000-k tails.

Produce a table of all possible sequences of length 1,000 with k heads and 1,000-k tails.

Look up the position of your actual realized sequence in that table. The index of that position is a uniform random integer in a range from 1 to the size of the table.

Use your favourite method to extract unbiased bits from that unbiased integer.

The only requirement for the above method to work is that the coin flips are i.i.d., i.e. independent and identically distributed.
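
For concreteness, here's a toy Python sketch of the table method for a small number of flips, so the table stays tiny (the helper names and the n = 12 / p = 0.7 choices are mine, just for illustration):

  from itertools import combinations
  import random

  def flips(n=12, p=0.7):
      # n flips of a coin whose bias p is unknown to the extractor
      return tuple(random.random() < p for _ in range(n))

  def table_index(seq):
      # Index of seq within the table of all sequences that have the
      # same length and the same number of heads, plus the table size.
      n, k = len(seq), sum(seq)
      heads_at = tuple(i for i, is_head in enumerate(seq) if is_head)
      table = list(combinations(range(n), k))  # all ways to place k heads
      return table.index(heads_at), len(table)

  index, size = table_index(flips())
  # `index` is uniform on [0, size); extract unbiased bits from it
  # with your favourite integer-to-bits method.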

(In practice you probably don't want to actually construct the table, but instead compute the index directly, as if you had constructed the table.

You might also want to work with an arbitrary number of flips, instead of a fixed 1,000; or even adapt the method to an arbitrary-length stream of coin flips whose length you don't know in advance.

Also in practice, you don't need to store the whole sequence. What you do is keep a count of how many heads and tails you've seen so far, and feed the individual coin flips into an 'entropy pool'. Your head/tail count informs your estimate of how many bits of entropy you have in the pool. (It basically feeds into the formula for how many possible arrangements there are, similar to the fixed-size method suggested above.)

You generate your unbiased bits by draining the entropy pool.

The method of entropy pools described here is pretty much how /dev/random works on Linux.)
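
To make the bookkeeping concrete, here's a toy sketch of the entropy accounting described in the parenthetical above (a simplification of mine; a real pool also mixes the inputs cryptographically and is more involved):

  from math import comb, log2

  class EntropyCounter:
      # Tracks how many unbiased bits the flips seen so far are worth.
      def __init__(self):
          self.heads = 0
          self.tails = 0

      def feed(self, is_heads):
          # (A real pool would also mix the flip itself into the pool.)
          if is_heads:
              self.heads += 1
          else:
              self.tails += 1

      def bits_available(self):
          # log2 of the number of ways the observed heads/tails could be
          # arranged, i.e. log2(C(heads + tails, heads)).
          return log2(comb(self.heads + self.tails, self.heads))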




Interestingly, your method is a generalization of Von Neumann's method to more than two coin flips. With TT or HH the list has length one (i.e., no bits of information), and with TH or HT it has length two (exactly one bit of information).
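
A tiny illustration of that (my own toy enumeration of the length-2 tables, encoding each table entry as the set of head positions):

  from itertools import combinations

  # For 2 flips, one table per possible number of heads k.
  for k in range(3):
      table = list(combinations(range(2), k))
      print(k, table, "->", len(table), "entries")
  # k=0: [()]          -> 1 entry   (TT: no bits)
  # k=1: [(0,), (1,)]  -> 2 entries (HT vs TH: exactly one bit)
  # k=2: [(0, 1)]      -> 1 entry   (HH: no bits)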


That's an interesting way to look at it, yes!

I don't claim originality here either; the method is pretty well known.


There is a generalization that allows asymptotically optimal entropy extraction from Markov chain traces without requiring knowledge of the transition probabilities (just as the von Neumann trick doesn't require knowledge of the bias).

https://ieeexplore.ieee.org/document/1684974

Shameless plug of a paper on related topics: https://link.springer.com/article/10.1007/s00446-017-0311-5


I don't see how this works.

> The index of that position is a uniform random integer in a range from 1 to the size of the table. Use your favourite method to extract unbiased bits from that unbiased integer.

Why is that position unbiased and random? If I'm using a coin that produces heads every single time, then I'm always going to land on table position 0, so therefore the table index is biased.


Well, if your coin always lands heads, your table will only have a single entry. So you won't extract any bits from it.

Von Neumann's method won't extract any randomness from this coin either.

Do keep in mind that, crucially, you only build the table _after_ you figured out how many heads or tails you got in this particular instance of 1000 flips.

The randomness we extract is not in how _many_ heads or tails we got, but how those heads and tails are shuffled in your sequence.

For the math, have a look at https://news.ycombinator.com/item?id=30697559


I misinterpreted your algorithm. So you're creating a table of all permutations given the number of heads and tails you got, and then looking it up in the table.

This does feel like a generalization of Von Neumann's method from 2 tosses to 1000


Yes, you can view it like that. Basically we have 1001 tables, one for each possible number of heads.

If you are sufficiently clever, you can generalize it to an indefinite stream of tosses, and produce output as you go along. See some of the papers mentioned elsewhere in the comments.


> Produce a table of all possible sequences of length 1,000 with k heads and 1,000-k tails.

This is more efficient than Von Neumann?


This method generates approximately

  log_2(sqrt(1/(2*pi*1000*p*(1-p)))) - 1000*log_2(p^p * (1-p)^(1-p))
bits of entropy from 1000 coin flips (that's log_2 of the binomial coefficient C(1000, k) for a typical k ≈ 1000*p, via Stirling's approximation), where "p" is the probability of flipping heads.

Von Neumann's method generates 1000*p*(1-p) bits of entropy from 1000 coin flips (500 pairs, each yielding a bit with probability 2*p*(1-p)). The theoretical maximum is

  -1000*log_2(p^p * (1-p)^(1-p))
but it requires knowing "p" exactly. This method gets close to the theoretical maximum. I haven't analysed it in detail, but it's far better than von Neumann's method.
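
If you want to plug in numbers, here's a quick Python sketch of the comparison (the function names are mine; the first formula is the Stirling approximation above):

  from math import log2, pi

  def table_method_bits(p, n=1000):
      # ~log2(C(n, p*n)): n*H(p) - 0.5*log2(2*pi*n*p*(1-p))
      entropy_per_flip = -p * log2(p) - (1 - p) * log2(1 - p)
      return n * entropy_per_flip - 0.5 * log2(2 * pi * n * p * (1 - p))

  def von_neumann_bits(p, n=1000):
      # n/2 pairs, each yielding a bit with probability 2*p*(1-p)
      return (n / 2) * 2 * p * (1 - p)

  for p in (0.5, 0.7, 0.9):
      print(p, round(table_method_bits(p)), round(von_neumann_bits(p)))
  # roughly 995 vs 250, 876 vs 210, 464 vs 90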

One obvious drawback is that you have to flip the coin 1000 times to produce the first unbiased random bit, while von Neumann's method starts producing bits much earlier.


> One obvious drawback is that you have to flip the coin 1000 times to produce the first unbiased random bit, while von Neumann's method starts producing bits much earlier.

Definitely. Though you could fix that problem relatively easily.

I think you might even be able to run von Neumann's method first, but store the coin flips; and then once you've got enough stored, extract a few more bits from the already used flips.

Perhaps like this:

When you do two flips, you add one of three possible tokens to your list:

'double-heads', 'double-tails' or 'mixed'.

Crucially, you only store 'mixed', and not whether it was 'heads-tails' or 'tails-heads', because that information was already used to produce the von Neumann bit.

After your list has 1000 entries, you run an algorithm a bit like what I originally described to extract bits (see the sketch below). The complication is that the table you construct contains all possible shufflings of a fixed number of 'double-heads', 'double-tails' and 'mixed' tokens.
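
Here's a rough Python sketch of that two-stage idea as I understand it (my own naming; stage two just returns a uniform index, which you'd then turn into bits the same way as before):

  from collections import Counter
  from math import factorial

  def multinomial(counts):
      # Number of distinct arrangements of a multiset with these counts.
      total = factorial(sum(counts.values()))
      for c in counts.values():
          total //= factorial(c)
      return total

  def stage_one(flips):
      # Consume pairs of flips ('H'/'T'): emit von Neumann bits as you
      # go, and keep only the HH / TT / mixed tokens for later.
      bits, tokens = [], []
      for a, b in zip(flips[0::2], flips[1::2]):
          if a == b:
              tokens.append('HH' if a == 'H' else 'TT')
          else:
              bits.append(0 if (a, b) == ('H', 'T') else 1)
              tokens.append('M')  # which of HT/TH it was is already spent
      return bits, tokens

  def stage_two(tokens, order=('HH', 'TT', 'M')):
      # Lexicographic rank of the token sequence among all arrangements
      # with the same token counts, plus the number of such arrangements.
      counts = Counter(tokens)
      size = multinomial(counts)
      rank = 0
      for tok in tokens:
          for smaller in order:
              if smaller == tok:
                  break
              if counts[smaller]:
                  counts[smaller] -= 1
                  rank += multinomial(counts)
                  counts[smaller] += 1
          counts[tok] -= 1
      return rank, size  # rank is uniform on [0, size)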


It feels like an awkward middle ground. Like this method has a limited entropy output for the first 2000 coin flips (first 1000 double-heads/double-tails/mixed entries), and then suddenly it adds back a ton of lost entropy.

Another commenter linked to some papers on asymptotically optimal entropy generation; I wonder if there is more of a streaming method there. It feels like there has to be, maybe after a slow start. My naive intuition is that after 1000000 coin flips you have a good idea of what p is, and then you can basically do arithmetic coding from there. Of course a theoretically correct method can't do exactly this, but it might asymptotically approach it.


Oh, if you want something practical, the approach that Linux's /dev/random takes is probably the one to go with.

/dev/random being unbiased relies on some assumptions about cryptography; but in practice these assumptions are at least as well-founded as our assumption that our coin flips are independent.

Look at some of the papers mentioned in other comments on this submission. There are (near) optimal streaming methods.


More efficient if coin flips are your scarce resource, less efficient if computation is scarce. But nowhere near as inefficient as it sounds - you don't actually have to materialize the table.


As with so many algorithms, he's trading compute time for memory and pre-compute time. Generating that table is costly, but once you have that table it's pretty efficient.


You don’t need to store a big table with the table method, though. As long as there’s a pattern to it, you can figure out your position algorithmically.

For example, if H=1 and T=0 then you do something like “any result above 11010 is a real heads, others are tails.”

Edit: never mind, that’s a different method than OP. (Doesn’t require exactly k heads.) Doesn’t it get the best of both worlds though?

Edit2: Double never-mind, the problem assumes you don’t know the level of bias. But I think the OP threw that out as well.


The method I described is a bit weird and round-about, precisely because it has to work around not knowing the bias.

Essentially, it generates a thousand flips, and then carefully extracts the randomness only out of how those flips are shuffled, not out of how many heads there are.

However, the closer to 500/500 your distribution is, the more entropy the method I described can extract. Conversely, if you randomly hit 1000 heads and 0 tails, the method can't extract anything, and you'll have to sample again.

In contrast, a method that knows the bias and can exploit it, can extract the same number of random bits, no matter what you actually flipped. Eg in the special case of a known unbiased coin, you always get 1000 random bits for 1000 flips. Even if those flips happen to be 1000 heads.


But the point remains, your method shouldn’t need a full table cached in advance, right? You should be able to use a formula to compute its “effective index”.


Yes, definitely. I just didn't mention that, because I thought it was obvious that we agree on that.

And instead of computing the "effective index" as an abstract number and then extracting bits from that, you can work out directly how to generate the unbiased bits.
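
For what it's worth, a small sketch of that direct computation (my wording, and not necessarily the exact ordering OP had in mind): walk the sequence and count, for every head you see, how many same-count sequences with a tail in that spot would have come earlier.

  from math import comb

  def rank_and_size(seq):
      # Rank of seq (a string of 'H'/'T') among all sequences with the
      # same length and number of heads, ordering T before H, plus the
      # total number of such sequences -- no table required.
      n, k = len(seq), seq.count('H')
      rank, heads_left = 0, k
      for i, c in enumerate(seq):
          if c == 'H':
              # every sequence with a T here (and the same prefix) comes
              # earlier: its heads_left heads all sit in later positions
              rank += comb(n - i - 1, heads_left)
              heads_left -= 1
      return rank, comb(n, k)  # rank is uniform on [0, comb(n, k))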


Yes. Though in practice you wouldn't want to generate that specific table for 1000 flips. It's far too big.


More efficient in how many flips you consume to produce an unbiased bit.

Less efficient in terms of runtime (if you implement it as naively as I have described here).


> Produce a table of all possible sequences of length 1,000 with k heads and 1,000-k tails.

> Look up the position of your actual realized sequence in that table. The index of that position is a uniform random integer in a range from 1 to the size of the table.

I don’t see why it would be. The normal way to produce such a table is as hhh…hh, hhh…ht, hhh…th, … ttt…ht, ttt…th, ttt…tt. If heads is more likely than tails, the index is more likely to be in the lower half than the upper half.

Reading https://blog.cloudflare.com/ensuring-randomness-with-linuxs-..., I also don’t think that’s how Linux’ entropy pool works.


The parent comment was saying that once you condition on how many heads you got (the k), every arrangement of those k heads among the 1,000 positions is equally likely. That sounds correct.


> Reading https://blog.cloudflare.com/ensuring-randomness-with-linuxs-..., I also don’t think that’s how Linux’ entropy pool works.

Where does the description in that article differ materially from what I described in the parenthetical remark?

(Of course, it does differ from what I described in the main text of the comment.)


Cool idea! I wonder what you mean concretely by “extract unbiased bits from unbiased integer”?

Let’s say my experiment results in an index of 42, how do you convert that integer into information about a string of unbiased coin flips?



Arithmetic coding is the industrial strength way to do this, yes!

However, you can do much simpler and still be pretty efficient (and unbiased).

Suppose your range is the numbers from 1 to 49 (inclusive).

If you get a number between 1 and 32, you interpret that number minus 1 as 5 bits (32 == 2^5 possibilities). If you get a number between 33 and 48, you interpret that number minus 33 as 4 bits (because you have 16 == 2^4 possibilities between 33 and 48). If you hit 49, you just drop everything and sample again.

So if you got 42, like j7ake suggested, that would be 42 - 33 = 9 = 0b1001.

You can generalize this scheme to other numbers of possibilities.
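
In code, the same idea looks something like this (a small generalization of mine, using a zero-based x uniform on [0, n) rather than 1 to 49):

  def bits_from_uniform(x, n):
      # x is uniform on [0, n). Peel off the largest power-of-two chunk
      # that fits; if x lands in it, read off its bits, otherwise shift
      # x down and retry on the leftover range. An empty result means
      # "no luck, sample a fresh x".
      bits = []
      while n > 1:
          k = n.bit_length() - 1  # largest k with 2**k <= n
          if x < 2 ** k:
              bits.extend((x >> i) & 1 for i in reversed(range(k)))
              return bits
          x -= 2 ** k
          n -= 2 ** k
      return bits

  # bits_from_uniform(41, 49) == [1, 0, 0, 1], matching the example above
  # (the 42 of the 1-based description is x = 41 here).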



