So the standard argument against such a procedure is that if you generate N random bits, each of the 2^N bitstrings in {0,1}^N is equiprobable, and therefore no mapping of {0,1}^N to {0..U-1} can map an equal number of bitstrings to each output unless U divides 2^N.
A priori the method seems to work around this by conditionally sampling more bits, so that your domain is not one of fixed-length bitstrings. But then there is this paragraph:
> More intriguing still, this algorithm can be made unconditional by removing the early out, so that every value computed requires word size + 64 bits from the stream, which breaks the loop-carried dependency for fast generators, unlocking vectorization and parallelization where it was previously impossible. This is an especially powerful advantage when paired with bitstream generators that allow skip-ahead such as counter-based generators or PCG.
But now we are mapping {0,1}^N into {0..U-1}, right? So this mapping cannot be uniform? In fact I'm not sure why the early-termination method even works, because for a fixed U we know the maximum depth of the generated-bitstring tree, so we can just pad it with imaginary random bits and arrive at a contradiction, since U does not divide 2^N.
I imagine I missed something; what is it?
EDIT: thinking about it more, if you sample finitely many bits then each possible output has a probability that is a dyadic rational (a fraction of the form a/2^k), and the dyadic rationals do not include all integer reciprocals.
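(Concretely: any fixed map from, say, 8-bit strings to {0..4} has to be lopsided, because 5 does not divide 256. A quick check of the multiply-shift map floor(x * 5 / 256), just to illustrate the pigeonhole point:)

    from collections import Counter

    # Any fixed map from 8-bit strings to {0..4} must be uneven, since 5
    # does not divide 256; e.g. the multiply-shift map floor(x * 5 / 256):
    counts = Counter((x * 5) >> 8 for x in range(256))
    print(counts)  # Counter({0: 52, 1: 51, 2: 51, 3: 51, 4: 51})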
That's exactly right. It's not uniform, but the deviation from uniform is so small that there's no measurement you can perform that would observe the difference. If you need a formally uniform distribution for an information-theoretic proof or something similar, you wouldn't use this termination condition.
> this new algorithm opens an intriguing possibility: we can compute just 64 extra bits, and have a probability of 1 - 2^{-64} of terminating. This is so close to certainty that we can simply stop unconditionally without introducing any measurable bias (detecting any difference would require about 2^128 samples, which is prohibitive).
Using the "just use 64 extra bits" approach, there is a 1 in 2^64 chance that the result you compute will be wrong, and correspondingly a 99.9999999999999999946% chance that the result you compute will be right. That is the amount of bias in the algorithm.
Not at all! You can simply compute additional product bits (you don’t need to go bit-by-bit) until the “can this possibly carry out” test (which is just a comparison) says “no”. The probability that you need more bits falls off exponentially, so this happens very quickly.
What you lose by doing this is that each value no longer consumes a bounded number of bits from the source; that unbounded consumption is what lets it be “truly” uniform.
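In sketch form the exact version looks something like this (Python, extending a whole word at a time rather than bit-by-bit; again my own sketch, with secrets.randbits standing in for the bit source):

    import secrets

    def bounded_exact(U, word_bits=64):
        # Lazily extend the binary expansion of a uniform real r in [0, 1),
        # one word at a time, and return floor(r * U) as soon as no further
        # bits can change it.
        x, w = 0, 0
        while True:
            x = (x << word_bits) | secrets.randbits(word_bits)
            w += word_bits
            lo = (x * U) >> w             # smallest floor(r * U) still possible
            hi = ((x + 1) * U - 1) >> w   # largest floor(r * U) still possible
            if lo == hi:                  # the "can this possibly carry out?" comparison
                return lo

The probability that lo != hi after each round drops by a factor of 2^word_bits, which is why the loop almost never runs twice.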
It feels as if this is still related to rejection sampling, in spite of not being quite that. If the algorithm samples some bits, and still hasn't made a decision, then it's as if all the previous samples have been "rejected". The difference seems to be that this algorithm has a memory of previously rejected samples, which causes it to be less likely to reject subsequent samples. "Adaptive rejection sampling" appears to be relevant: https://en.wikipedia.org/wiki/Rejection_sampling#Adaptive_re...
Yes, although the algorithm here just accepts a certain small amount of bias rather than rejecting. But the "corrected" version of this says that if I am rolling a 1dN, the first random byte will have N-1 different "rejection" criteria, while the next byte and every byte after will only have 1 "rejection" criterion.
Something that would be more interesting: if you had a random number primitive which could generate 1 with probability 1/2, 2 with probability 1/6, 3 with probability 1/12, and so on (p = 1/(N * (N+1))), that primitive would allow you to construct random irrational numbers uniformly sampled from [0, 1) as continued fractions. Since the rationals ℚ are a measure-zero subset of ℝ, we can ignore them.
Such a primitive would allow you to do a sort of rejection-sampling with only a finite number of rejections before you are guaranteed an answer, as a continued fraction for a rational has a finite expansion.
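As a sanity check on that distribution: for u uniform in (0, 1) we have P(floor(1/u) = N) = 1/N - 1/(N+1) = 1/(N * (N+1)), so the primitive is exactly "the first continued-fraction coefficient of a uniform number". A quick float-based illustration (not a bit-exact construction; the function name is mine):

    import random

    def first_cf_coefficient():
        # Returns N with probability 1/(N*(N+1)):
        # P(floor(1/u) = N) = P(1/(N+1) < u <= 1/N) = 1/N - 1/(N+1).
        u = random.random()
        while u == 0.0:          # skip the measure-zero endpoint
            u = random.random()
        return int(1 / u)        # floor, since 1/u > 0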
So let's try to roll a 1d5. We generate a real number p uniformly in [0, 1). In terms of its binary expansion we should return
1, if p < 0.0011 0011 0011 0011...
2, if p < 0.0110 0110 0110 0110...
3, if p < 0.1001 1001 1001 1001...
4, if p < 0.1100 1100 1100 1100...
5, otherwise
These repetitions are conveniently 4 bits long and could be written in hexadecimal, though of course they don't have to be (a 1d7 repeats after 3 bits, a 1d11 repeats after 10 bits).
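(The length of the repeating block is the multiplicative order of 2 mod N, for odd N > 1; a quick way to check those figures:)

    def binary_period(N):
        # Length of the repeating block in the binary expansion of k/N
        # (N odd, N > 1, N not dividing k): the multiplicative order of 2 mod N.
        k, r = 1, 2 % N
        while r != 1:
            r = (r * 2) % N
            k += 1
        return k

    print(binary_period(5), binary_period(7), binary_period(11))   # 4 3 10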
Well, there's a problem: our computers don't have infinitely precise binary expansions like this; instead we generate a finite stream of bits. But suppose we generate those streams in "chunks". Let's say we generate one byte b at a time; for the first byte, this says:
if b < 0x33, return 1.
if b == 0x33, we'll need another byte b2 to decide:
    if b2 < 0x33, return 1.
    if b2 == 0x33, repeat with a new b2.
    if b2 > 0x33, return 2.
if b < 0x66, return 2.
if b == 0x66, we'll need another byte b2 to decide:
    if b2 < 0x66, return 2.
    if b2 == 0x66, repeat with a new b2.
    if b2 > 0x66, return 3.
if b < 0x99, return 3.
[ if b == 0x99, we'll need another byte ]
if b < 0xCC, return 4.
[ if b == 0xCC, we'll need another byte ]
otherwise, return 5.
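The same procedure written out for a general 1dN, as a Python sketch (exact fractions for clarity rather than speed; secrets.randbits stands in for the byte source, and the function name is mine):

    import math
    import secrets
    from fractions import Fraction

    def roll_die(N):
        # Generate a uniform p in [0, 1) one byte at a time and return k
        # where (k-1)/N <= p < k/N, exactly as in the 1d5 walkthrough above.
        lo, hi = Fraction(0), Fraction(1)      # p is known to lie in [lo, hi)
        while True:
            b = secrets.randbits(8)            # next byte of p's expansion
            step = (hi - lo) / 256
            lo, hi = lo + b * step, lo + (b + 1) * step
            k_lo = math.floor(lo * N)          # result if p is as small as possible
            k_hi = math.floor(hi * N)          # result if p is just below hi
            if (hi * N).denominator == 1:      # hi sits exactly on a threshold,
                k_hi -= 1                      # which [lo, hi) excludes
            if k_lo == k_hi:                   # no threshold left inside [lo, hi)
                return k_lo + 1                # faces are numbered 1..N

For N < 256 the first byte can be left undecided by any of the N-1 thresholds, but after that at most one threshold is still in play, which is where the 1-in-256 repeat chance comes from.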
You can kind of call this rejection sampling, but notice that the post-rejection loop has different bounds than the non-rejection loop: it only has a 1-in-256 chance of repeating, not the 1-in-64 chance you'd get if you rejected the byte outright and started over.
I want to say that the probability of observing something hinky using K samples of a 1dN die built from B random bits is not as simple as sqrt(K) = 2^B, but something more pessimistic, maybe K = 2^(B+1) / N^4 or so? So if you're generating histograms with 2^32 iterations of a million-sided die you actually don't have a huge safety margin here: 2^130 / 2^112 is only one in 2^18 or so, meaning maybe one in 250,000 runs could correctly detect the bias. But it's still presumably "good enough" for all that.
Thanks for this explanation; it made it click for me. The benefit of doing far less additional sampling in the rejection case is very intuitive, especially when the numbers are very large. It also seems this technique could be extended to other scenarios, such as 1/pi and other weird non-uniform bucket sizes.
You have to read that “more intriguing still” paragraph as being in the same scope as the preceding one, which says
> we can compute just 64 extra bits, and have a probability of 1 - 2⁻⁶⁴ of terminating. This is so close to certainty that we can simply stop unconditionally without introducing any measurable bias (detecting any difference would require about 2¹²⁸ samples, which is prohibitive)
So, it’s about a variation on the unbiased algorithm that is slightly biased, but gains some other desirable properties.