That works and is perfectly uniform (if the people are i.i.d.), but requires qui...

jancsika · on June 30, 2019

I'm not sure I understand.

Humans aren't like weighted dice. For example, it's certainly conceivable that a few pairs of these humans misunderstand the goal such that they have a 0% chance of ever choosing numbers that change the respective ordering. Add to that the higher probability that many of these pairs of humans will just toggle their orders each round.

Edit: clarification-- the humans don't even have to misunderstand the rule, just the upshot of the process.

mongol · on June 30, 2019

Maybe you don't even need to ask them to think of a number. The person whose name is sorted before the other person's name is person 1. And they compare their lengths instead of thinking of a number?

pdpi · on June 30, 2019

That gives you a single bit of entropy from those two people instead of a steady stream, though.

hcs · on June 30, 2019

That's a good point, I guess you have to also randomly choose the humans?

dmurray · on June 30, 2019

I don't think so, if they were all lined up in alphabetical order and you took them in pairs from that ordering, it would be just as good.

pteraspidomorph · on June 30, 2019

But then you'll always get the same output from the same set of humans. It's a PRNG where the set of humans is the seed.

frankmcsherry · on June 30, 2019

Nonsense. It requires roughly 1+log(10) bits in expectation and has a geometric tail.

FabHK · on June 30, 2019

Eh? 4 bits minimum to get a uniform 1-16, and each bit requires requires 2 numbers that are not identical. (Ah yeah, I thought that in turn requires a bit more than 4 numbers, but it requires only a bit more than 2 numbers (probability that numbers coincide is not 1/2, as with bits, but 1/10, or a bit more than that due to non-uniformity)).

Thus, I recant "around 20, 40, 60 or more" and replace it by "around 9, 18, 27 or more" numbers. Probability that you have to reject any resulting hex digit is obviously 3/8.

frankmcsherry · on June 30, 2019

Rejection sampling isn't the best way to go.

With b bits you are subdividing the (0,1) interval into 2^b regions, and only need more bits if one of the 9 multiples of 0.1 land in the interval you've picked. As b increases this probability drops exponentially.

It's a fair point that the number of "numbers" depends on how low the entropy of the source is, but in the link the probability of collision wasn't massive.

FabHK · on June 30, 2019

But the original post described rejection sampling (once you got your bits from the larger/smaller trick).

frankmcsherry · on June 30, 2019

Absolutely! I read the von Neumann extractor and presumed too much. For fun, you can reduce the numbers asked by getting a batch of k people, and if all numbers are dissimilar you get a draw from k!

The OP is silly though; if you know the distribution there are way better techniques. They don't seem to worry about how they measure it.

rbmktechik · on June 30, 2019

Parents algorithm always works, the one described has a massive failure mode - people slowly learn that they are likely to say 7 and self-censor by picking another number. Or maybe in a different culture the distribution changes (say China and the numbers 4, 8).

bnegreve · on June 30, 2019

> the one described has a massive failure mode

Well, if you're really interested in generating random numbers using a group of people, then yes, this may be a problem.

But you should take this as an illustration of the more general problem of generating uniform random numbers from a distribution that is not.

If you think about it this way, author's solution can be applied in a number of practical scenarios. For example, if you want to generate a hash values for objects using properties that are not uniformly distributed. Or if you want to generate random numbers from a physical artifact that is not perfectly uniform.

OscarCunningham · on June 30, 2019

I would guess that in China the probabilities of 4 and 8 goes down. Asking people to pick randomly drives them away from special numbers.

Ill_ban_myself · on June 30, 2019

In the US 7 is a lucky # and a statistical outlier in that it was more frequently chosen in the article above.

OscarCunningham · on June 30, 2019

You might be right, but my guess is that 7's reputation for luck isn't famous enough to make a difference, and in fact it's common because other numbers seem too "unrandom".

We could test by asking people to pick numbers less than 100. I bet people would focus on odd numbers, especially those greater than 50, not ending in 5 or not having both digits the same.

sokoloff · on June 30, 2019

I decided to do that.

Data and light analysis is here: https://docs.google.com/spreadsheets/d/1Dh0wiTCRkBhckWGXtjZg...

I definitely overpaid on Mechanical Turk per response, given how lightning quickly the data came in. (I decided to pay $0.10/HIT for 50 responses and got 68 responses in 8m26s.) I suspect that offering a nickel would have gotten the survey filled in under half an hour still...

Ill_ban_myself · on July 6, 2019

Maybe 7 cents?

FabHK · on June 30, 2019

Excellent point. The algorithm in the article is predicated on a specific distribution and iid, GP algorithm is predicated only on iid.