So the standard argument against such a procedure is that if you generate N random bits, each of the 2^N bitstrings in {0,1}^N is equiprobable, and therefore no mapping of {0,1}^N to {0..U-1} can map an equal number of bitstrings to each output unless U divides 2^N.
A priori the method seems to work around this by conditionally sampling more bits, so that your domain is not one of fixed-length bitstrings. But then there is this paragraph:
> More intriguing still, this algorithm can be made unconditional by removing the early out, so that every value computed requires word size + 64 bits from the stream, which breaks the loop-carried dependency for fast generators, unlocking vectorization and parallelization where it was previously impossible. This is an especially powerful advantage when paired with bitstream generators that allow skip-ahead such as counter-based generators or PCG.
But now we are mapping {0,1}^N into {0..U-1}, right? So this mapping cannot be uniform? In fact I'm not sure why the early-termination method even works, because for a fixed U we know the maximum depth of the generated-bitstring tree, so we can just pad it with imaginary random bits and arrive at a contradiction, since U does not divide 2^N.
I imagine I missed something; what is it?
EDIT: thinking about it more, if you sample finitely many bits then each possible output has a probability that is a dyadic rational (a fraction of the form a/2^k), and the dyadic rationals do not include all integer reciprocals.
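(Concretely: any fixed map from, say, 8-bit strings to {0..4} has to be lopsided, because 5 does not divide 256. A quick check of the multiply-shift map floor(x * 5 / 256), just to illustrate the pigeonhole point:)

    from collections import Counter

    # Any fixed map from 8-bit strings to {0..4} must be uneven, since 5
    # does not divide 256; e.g. the multiply-shift map floor(x * 5 / 256):
    counts = Counter((x * 5) >> 8 for x in range(256))
    print(counts)  # Counter({0: 52, 1: 51, 2: 51, 3: 51, 4: 51})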
That's exactly right. It's not uniform, but the deviation from uniform is so small that there's no measurement you can perform that would observe the difference. If you need a formally uniform distribution for an information-theoretic proof or something similar, you wouldn't use this termination condition.
> this new algorithm opens an intriguing possibility: we can compute just 64 extra bits, and have a probability of 1 - 2^{-64} of terminating. This is so close to certainty that we can simply stop unconditionally without introducing any measurable bias (detecting any difference would require about 2^128 samples, which is prohibitive).
Using the "just use 64 extra bits" approach, there is a 1 in 2^64 chance that the result you compute will be wrong, and correspondingly a 99.9999999999999999946% chance that the result you compute will be right. That is the amount of bias in the algorithm.
Not at all! You can simply compute additional product bits (you don’t need to go bit-by-bit) until the “can this possibly carry out” test (which is just a comparison) says “no”. The probability that you need more bits falls off exponentially, so this happens very quickly.
What you lose by doing this is that each value no longer consumes a bounded number of bits from the source; that unbounded consumption is what lets it be “truly” uniform.
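In sketch form the exact version looks something like this (Python, extending a whole word at a time rather than bit-by-bit; again my own sketch, with secrets.randbits standing in for the bit source):

    import secrets

    def bounded_exact(U, word_bits=64):
        # Lazily extend the binary expansion of a uniform real r in [0, 1),
        # one word at a time, and return floor(r * U) as soon as no further
        # bits can change it.
        x, w = 0, 0
        while True:
            x = (x << word_bits) | secrets.randbits(word_bits)
            w += word_bits
            lo = (x * U) >> w             # smallest floor(r * U) still possible
            hi = ((x + 1) * U - 1) >> w   # largest floor(r * U) still possible
            if lo == hi:                  # the "can this possibly carry out?" comparison
                return lo

The probability that lo != hi after each round drops by a factor of 2^word_bits, which is why the loop almost never runs twice.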
It feels as if this is still related to rejection sampling, in spite of not being quite that. If the algorithm samples some bits, and still hasn't made a decision, then it's as if all the previous samples have been "rejected". The difference seems to be that this algorithm has a memory of previously rejected samples, which causes it to be less likely to reject subsequent samples. "Adaptive rejection sampling" appears to be relevant: https://en.wikipedia.org/wiki/Rejection_sampling#Adaptive_re...
Yes, although the algorithm here just accepts a certain small amount of bias rather than rejecting. But the "corrected" version of this says that if I am rolling a 1dN, the first random byte will have N-1 different "rejection" criteria, while the next byte and every byte after will only have 1 "rejection" criterion.
Something that would be more interesting: if you had a random number primitive which could generate 1 with probability 1/2, 2 with probability 1/6, 3 with probability 1/12, and so on (p = 1/(N * (N+1))), that primitive would allow you to construct random irrational numbers uniformly sampled from [0, 1) as continued fractions. Since the rationals ℚ are a measure-zero subset of ℝ, we can ignore them.
Such a primitive would allow you to do a sort of rejection-sampling with only a finite number of rejections before you are guaranteed an answer, as a continued fraction for a rational has a finite expansion.
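As a sanity check on that distribution: for u uniform in (0, 1) we have P(floor(1/u) = N) = 1/N - 1/(N+1) = 1/(N * (N+1)), so the primitive is exactly "the first continued-fraction coefficient of a uniform number". A quick float-based illustration (not a bit-exact construction; the function name is mine):

    import random

    def first_cf_coefficient():
        # Returns N with probability 1/(N*(N+1)):
        # P(floor(1/u) = N) = P(1/(N+1) < u <= 1/N) = 1/N - 1/(N+1).
        u = random.random()
        while u == 0.0:          # skip the measure-zero endpoint
            u = random.random()
        return int(1 / u)        # floor, since 1/u > 0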
So let's try to roll a 1d5. We generate a real number p uniformly in [0, 1). In terms of its binary expansion we should return
1, if p < 0.0011 0011 0011 0011...
2, if p < 0.0110 0110 0110 0110...
3, if p < 0.1001 1001 1001 1001...
4, if p < 0.1100 1100 1100 1100...
5, otherwise
These repetitions are conveniently 4 bits long and could be written in hexadecimal, though of course they don't have to be (a 1d7 repeats after 3 bits, a 1d11 repeats after 10 bits).
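(The length of the repeating block is the multiplicative order of 2 mod N, for odd N > 1; a quick way to check those figures:)

    def binary_period(N):
        # Length of the repeating block in the binary expansion of k/N
        # (N odd, N > 1, N not dividing k): the multiplicative order of 2 mod N.
        k, r = 1, 2 % N
        while r != 1:
            r = (r * 2) % N
            k += 1
        return k

    print(binary_period(5), binary_period(7), binary_period(11))   # 4 3 10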
Well, there's a problem: our computers don't have infinitely precise binary expansions like this; instead we generate a finite stream of bits. But suppose we generate those streams in "chunks". Let's say we generate one byte b at a time; for the first byte, this says:
if b < 0x33, return 1.
if b == 0x33, we'll need another byte b2 to decide:
    if b2 < 0x33, return 1.
    if b2 == 0x33, repeat with a new b2.
    if b2 > 0x33, return 2.
if b < 0x66, return 2.
if b == 0x66, we'll need another byte b2 to decide:
    if b2 < 0x66, return 2.
    if b2 == 0x66, repeat with a new b2.
    if b2 > 0x66, return 3.
if b < 0x99, return 3.
[ if b == 0x99, we'll need another byte ]
if b < 0xCC, return 4.
[ if b == 0xCC, we'll need another byte ]
otherwise, return 5.
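The same procedure written out for a general 1dN, as a Python sketch (exact fractions for clarity rather than speed; secrets.randbits stands in for the byte source, and the function name is mine):

    import math
    import secrets
    from fractions import Fraction

    def roll_die(N):
        # Generate a uniform p in [0, 1) one byte at a time and return k
        # where (k-1)/N <= p < k/N, exactly as in the 1d5 walkthrough above.
        lo, hi = Fraction(0), Fraction(1)      # p is known to lie in [lo, hi)
        while True:
            b = secrets.randbits(8)            # next byte of p's expansion
            step = (hi - lo) / 256
            lo, hi = lo + b * step, lo + (b + 1) * step
            k_lo = math.floor(lo * N)          # result if p is as small as possible
            k_hi = math.floor(hi * N)          # result if p is just below hi
            if (hi * N).denominator == 1:      # hi sits exactly on a threshold,
                k_hi -= 1                      # which [lo, hi) excludes
            if k_lo == k_hi:                   # no threshold left inside [lo, hi)
                return k_lo + 1                # faces are numbered 1..N

For N < 256 the first byte can be left undecided by any of the N-1 thresholds, but after that at most one threshold is still in play, which is where the 1-in-256 repeat chance comes from.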
You can kind of call this rejection sampling, but notice that the post-rejection loop has different bounds than the non-rejection loop: it only has a 1-in-256 chance of repeating, not the 1-in-64 chance you'd get if you rejected the byte outright and started over.
I want to say that the probability of observing something hinky using K samples of a 1dN die built from B random bits is not as simple as sqrt(K) = 2^B, but something more pessimistic, maybe K = 2^(B+1) / N^4 or so? So if you're generating histograms with 2^32 iterations of a million-sided die you actually don't have a huge safety margin here: 2^130 / 2^112 is only one in 2^18 or so, meaning maybe one in 250,000 runs could correctly detect the bias. But it's still presumably "good enough" for all that.
Thanks for this explanation; it made it click for me. The benefit of doing far less additional sampling in the rejection case is very intuitive, especially when the numbers are very large. It also seems this technique could be extended to other scenarios, such as 1/pi and other weird non-uniform bucket sizes.
You have to read that “more intriguing still” paragraph as being in the same scope as the preceding one, which says
> we can compute just 64 extra bits, and have a probability of 1 - 2⁻⁶⁴ of terminating. This is so close to certainty that we can simply stop unconditionally without introducing any measurable bias (detecting any difference would require about 2¹²⁸ samples, which is prohibitive)
So, it’s about a variation on the unbiased algorithm that is slightly biased, but gains some other desirable properties.