Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

>assuming you have a perfect cryptographic hash function

Except there is no such thing (not in the sense you need it) - it too would be a perfect source of randomness, and you're just pushing the problem down the road.

As an example, NIST publication 800-90B recommends multiple randomness extractors, one of which is SHA. They recommend using twice the amount of entropy in as the entropy out to get random "enough" bits. Thus even SHA is not going to mix bits well enough as you want.

(The things called perfect hash functions in the literature map N items into N slots with no collisions, which is not what is needed here).

As a perhaps surprising counterexample to any simple solution, here [1] is a math paper with a proof that there can be no optimal algorithm that is best for all values of coin bias.

Here's [2] a cool walkthough on some of the ideas in theory that have been investigated.

[1] https://web.eecs.umich.edu/~qstout/abs/AnnProb84.html [2] http://www.eecs.harvard.edu/~michaelm/coinflipext.pdf



> Except there is no such thing (not in the sense you need it)

???

Okay, lets choose the simplest hash function of them all: a 1-bit XOR (resulting in a 1-bit "hash").

Lets say I flip the 75% biased coin 100 times, count heads as 1 and tails as 0. I XOR all the results together. What's the probability of XOR(all) == 1, and what's the probability that XOR(all) == 0?

You'll find that its quite close to 50/50, which is the limit to the amount of entropy you can pull from a 1-bit hash function.

---------

Now lets see for smaller values:

1 flip: 75% chance of 1, 25% of 0. Which is the limit of information so far.

2 flips: ... you know what? Imma write a program and get back to ya a bit later. I probably will edit my post.

------

2 flips: 62% chance of Parity0 3 flips: 43% chance of Parity0 4 flips: 53% chance of Parity0

Etc. etc. etc.

By 16 flips, its 49.9985% chance of Parity0, pretty close to unbiased. It gets pretty hard to distinguish this coin from an unbiased one.

--------

I'm not sure if you need a cryptographic hash actually. Its just that cryptographic hashes are close enough to random that its easier to use.

Now I'm curious if a CRC32 would efficiently extract 16-bits of entropy from biased coins. CRC32 is clearly not a cryptographic hash, but its pretty good as an "avalanche" due to the nature of GF-arithmetic.


>It gets pretty hard to distinguish this coin from an unbiased one.

All you're doing is trying to reinvent the Law of Large Numbers - nothing in this post shows a perfect hash function, and in fact is demonstrating my point - by you putting biased data into a hash you're getting biased data out.

To completely remove the bias you need infinite flips. And thus you have improved nothing.

Next, if you're going to flip 100 times to get 1 bit out, simply use the Neumann method - it does work, uses only a few flips on average.

Here's a proof why your method doesn't work: Let p = prob of getting a 1, q = 1-p is prob of getting a 0. Then XORing n of these together is a 1 iff there are an odd number of 1 bits. This happens Sum_{j=1 to N by odd} nCj p^j (1-q)^(n-j) of the time. This simplifies to (1-(1-2p)^n)/2. For this to be exactly 1/2, which is no bias, p must be exactly 1/2. There is no other value that doesn't fail.

This argument can be extended to whatever hash functions you want. You're not going to get the outcome you desire no matter how you structure it.

>I'm not sure if you need a cryptographic hash actually.

It matters not the hash. None will do what you want.

This is why when crpyto uses weak PRNGs at the front, even after all the powerful crypto in the middle, they are weakened. You're trying to do what decades of cryptographers know is impossible.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: