Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

That works and is perfectly uniform (if the people are i.i.d.), but requires quizzing around 20, 40, 60 or more people for a number before you can deliver one, while the algorithm described only requires one or two.

EDIT: Not quite that many, more like around 9, 18, 27 or so, see below.



I'm not sure I understand.

Humans aren't like weighted dice. For example, it's certainly conceivable that a few pairs of these humans misunderstand the goal such that they have a 0% chance of ever choosing numbers that change the respective ordering. Add to that the higher probability that many of these pairs of humans will just toggle their orders each round.

Edit: clarification-- the humans don't even have to misunderstand the rule, just the upshot of the process.


Maybe you don't even need to ask them to think of a number. The person whose name is sorted before the other person's name is person 1. And they compare their lengths instead of thinking of a number?


That gives you a single bit of entropy from those two people instead of a steady stream, though.


That's a good point, I guess you have to also randomly choose the humans?


I don't think so, if they were all lined up in alphabetical order and you took them in pairs from that ordering, it would be just as good.


But then you'll always get the same output from the same set of humans. It's a PRNG where the set of humans is the seed.


Nonsense. It requires roughly 1+log(10) bits in expectation and has a geometric tail.


Eh? 4 bits minimum to get a uniform 1-16, and each bit requires requires 2 numbers that are not identical. (Ah yeah, I thought that in turn requires a bit more than 4 numbers, but it requires only a bit more than 2 numbers (probability that numbers coincide is not 1/2, as with bits, but 1/10, or a bit more than that due to non-uniformity)).

Thus, I recant "around 20, 40, 60 or more" and replace it by "around 9, 18, 27 or more" numbers. Probability that you have to reject any resulting hex digit is obviously 3/8.


Rejection sampling isn't the best way to go.

With b bits you are subdividing the (0,1) interval into 2^b regions, and only need more bits if one of the 9 multiples of 0.1 land in the interval you've picked. As b increases this probability drops exponentially.

It's a fair point that the number of "numbers" depends on how low the entropy of the source is, but in the link the probability of collision wasn't massive.


But the original post described rejection sampling (once you got your bits from the larger/smaller trick).


Absolutely! I read the von Neumann extractor and presumed too much. For fun, you can reduce the numbers asked by getting a batch of k people, and if all numbers are dissimilar you get a draw from k!

The OP is silly though; if you know the distribution there are way better techniques. They don't seem to worry about how they measure it.


Parents algorithm always works, the one described has a massive failure mode - people slowly learn that they are likely to say 7 and self-censor by picking another number. Or maybe in a different culture the distribution changes (say China and the numbers 4, 8).


> the one described has a massive failure mode

Well, if you're really interested in generating random numbers using a group of people, then yes, this may be a problem.

But you should take this as an illustration of the more general problem of generating uniform random numbers from a distribution that is not.

If you think about it this way, author's solution can be applied in a number of practical scenarios. For example, if you want to generate a hash values for objects using properties that are not uniformly distributed. Or if you want to generate random numbers from a physical artifact that is not perfectly uniform.


I would guess that in China the probabilities of 4 and 8 goes down. Asking people to pick randomly drives them away from special numbers.


In the US 7 is a lucky # and a statistical outlier in that it was more frequently chosen in the article above.


You might be right, but my guess is that 7's reputation for luck isn't famous enough to make a difference, and in fact it's common because other numbers seem too "unrandom".

We could test by asking people to pick numbers less than 100. I bet people would focus on odd numbers, especially those greater than 50, not ending in 5 or not having both digits the same.


I decided to do that.

Data and light analysis is here: https://docs.google.com/spreadsheets/d/1Dh0wiTCRkBhckWGXtjZg...

I definitely overpaid on Mechanical Turk per response, given how lightning quickly the data came in. (I decided to pay $0.10/HIT for 50 responses and got 68 responses in 8m26s.) I suspect that offering a nickel would have gotten the survey filled in under half an hour still...


Maybe 7 cents?


Excellent point. The algorithm in the article is predicated on a specific distribution and iid, GP algorithm is predicated only on iid.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: