Notice that nowadays, unlike two years ago, people usually recommend the last technique I presented there, in the last paragraph before the Conclusion: generate a random value that is big enough, with n - log2(p) > 128, so that the bias is too small to be exploitable in practice. It's simpler, and unlike rejection sampling it ensures your code cannot fall into an infinite loop in case your PRNG is broken. (I'd argue you might want your code to fail in that case anyway, but YMMV.)
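A minimal sketch of that approach (class and method names are mine, not from the article): draw p.bitLength() + 128 random bits, then reduce mod p, so the residual bias is on the order of 2^-128.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

class WideSample {
    // Sample an almost-uniform value in [0, p) without a retry loop:
    // the 128 extra random bits push the modulo bias below ~2^-128.
    static BigInteger sampleMod(BigInteger p, SecureRandom rng) {
        int n = p.bitLength() + 128;           // n - log2(p) > 128
        BigInteger r = new BigInteger(n, rng); // uniform in [0, 2^n)
        return r.mod(p);
    }

    public static void main(String[] args) {
        BigInteger p = BigInteger.valueOf(107);
        System.out.println(sampleMod(p, new SecureRandom()));
    }
}
```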
The other virtue of this technique is that some of the more popular fast rejection sampling methods (e.g. Lemire's "nearly divisionless") leak a small amount of information via timing side channel, because the divisionless fast path is _not_ unbiased.
If you have a random stream of bits, and you use rejection sampling to extract a value from that stream, then you don't reveal any information about the value. At most, you reveal information about the stream prior to the value you chose, but each bit of a secure RNG should be unrelated to all prior bits, so this is not an issue.
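A sketch of that bit-level rejection sampling, with names of my choosing: draw just enough bits to cover [0, bound) and retry on out-of-range draws. Each retry only consumes fresh, unrelated bits from the stream.

```java
import java.security.SecureRandom;

class BitRejection {
    // Uniform value in [0, bound) from a stream of random bits.
    static int sample(int bound, SecureRandom rng) {
        if (bound < 2) return 0;
        // ceil(log2(bound)) bits are enough to cover the range
        int bits = 32 - Integer.numberOfLeadingZeros(bound - 1);
        int r;
        do {
            r = rng.nextInt() >>> (32 - bits); // keep 'bits' random bits
        } while (r >= bound);                  // reject and redraw
        return r;
    }

    public static void main(String[] args) {
        System.out.println(sample(107, new SecureRandom()));
    }
}
```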
But you do? You expose some information about the range of the value via the time it took to sample, assuming, for example, that the attacker knows the rejection sampling method in use.
That said, there are few situations where the modulus being used is not a public parameter of a protocol, and it is very difficult to perform operations with a secret modulus in constant-time, as your comment points out.
You'll always be able to get an approximate guess of the size of the modulus too, since larger moduli will need more registers to represent data.
We don't consider "leaks the length of a SHA256 hash" to be a valid timing attack in most protocols for similar reasons (i.e. it's public knowledge).
When developers encounter timing attacks in their code, they often invent really dumb ways to side-step the length "leaking".
This might be understandable if it were a MAC-then-Encrypt protocol with PKCS padding (hello, Lucky 13), but instead this comes up in the context of "validate this HMAC-SHA256 tag for our JWT-like protocol".
Lemire's technique is really nice and in general a good thing to learn about, since it's a bit mind-bending how it plays with intervals. Sadly, the last time I benchmarked it on x86-64 for cryptographic purposes, it wasn't faster than rejection sampling or than just using a large value with a modulo reduction: in all cases, what actually takes most of the time is the call that gets good-quality randomness out of a CSPRNG; the rest is negligible in comparison.
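For reference, here is my own transcription of Lemire's "nearly divisionless" bounded sampling (class and method names are mine). The fast path does no division; the slow path, taken with probability s / 2^32, computes one modulo -- which is exactly the data-dependent branch mentioned above as a timing leak.

```java
import java.util.Random;

class Lemire {
    // Uniform value in [0, s), for 0 < s <= 2^31 - 1.
    static int bounded(Random rng, int s) {
        long sL = Integer.toUnsignedLong(s);
        long m = Integer.toUnsignedLong(rng.nextInt()) * sL; // 64-bit product
        long low = m & 0xFFFFFFFFL;                          // low 32 bits
        if (low < sL) {                       // fast path skipped: maybe biased
            long threshold = (1L << 32) % sL; // 2^32 mod s -- the one division
            while (low < threshold) {         // reject and redraw
                m = Integer.toUnsignedLong(rng.nextInt()) * sL;
                low = m & 0xFFFFFFFFL;
            }
        }
        return (int) (m >>> 32);              // high 32 bits are the result
    }

    public static void main(String[] args) {
        System.out.println(bounded(new Random(), 107));
    }
}
```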
Everything here is focused on cryptography, but there are other common issues that can fall into this trap too. For example, naive implementations of the hash function for a basic array-plus-linked-list hash table: if you generate a hash and then take it modulo the table size to pick a bucket, you can end up with a biased distribution across the buckets, and customers complaining that your performance varies wildly based on the input values.
Isn't that why most tutorials on writing hash tables that use this bucketing method recommend using table sizes that are powers of two? I guess most tutorials don't exactly explain why, though, so someone who doesn't understand could decide to not use a power of two without understanding why that's bad.
There's another (mathematically related) reason to use a power of 2 as well: it makes the modulo a simple and fast mask, instead of a slow division.
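A tiny sketch of that masking trick (names mine). As a bonus, the mask also sidesteps Java's negative remainder: (-7 % 16) is -7, but (-7 & 15) is a valid index.

```java
class Buckets {
    // capacity must be a power of two; then hash % capacity == hash & (capacity - 1)
    // for non-negative hashes, and the mask is a single cheap AND instruction.
    static int indexFor(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(37, 16));  // same as 37 % 16
        System.out.println(indexFor(-7, 16));  // still a valid bucket index
    }
}
```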
If you need to generate multiple such random numbers, an alternative way to resolve modulo bias with minimal entropy waste is to batch the random number generation. For example, you can generate 6 random integers in [0, 100) by generating a random integer in [0, 100^6) and performing modulo reductions on the result to get your 6 integers. 100^6 is slightly less than 256^5, so if your RNG works in units of bytes, then you can use 5 bytes to generate 6 integers in [0, 100) instead of 6 bytes.
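A sketch of that batching idea (names mine): 100^6 = 10^12 is just under 2^40 = 256^5, so five random bytes, after one rejection step, yield six digits in base 100.

```java
import java.security.SecureRandom;

class Batch {
    // Six independent uniform values in [0, 100) from five random bytes.
    static int[] sixDigits(SecureRandom rng) {
        final long BOUND = 1_000_000_000_000L; // 100^6
        byte[] buf = new byte[5];
        long r;
        do {
            rng.nextBytes(buf);
            r = 0;
            for (byte b : buf) r = (r << 8) | (b & 0xFF); // 40 random bits
        } while (r >= BOUND); // rejects ~9% of draws (2^40 / 10^12 - 1)
        int[] out = new int[6];
        for (int i = 0; i < 6; i++) {
            out[i] = (int) (r % 100); // peel off one base-100 digit
            r /= 100;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(sixDigits(new SecureRandom())));
    }
}
```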
> You shouldn’t rely on these claims, because even 1 bit of bias on a 256 bit nonce value can be enough to attack certain cryptographic schemes!
This seems like a very high bar for a random generator to clear. It also raises a question: would using a larger nonce size actually increase risk, if the additional bits were biased?
The claim "even 1 bit of bias on a 256 bit nonce value can be enough to attack certain cryptographic schemes" is true, but there have been no practical attacks against schemes with 256-bit security. And in the mod 107 example, the bias only reduces the entropy from 6.74 bits to 6.71 bits (a reduction of 0.4%), so it's hardly worth worrying about in the real world.
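Those entropy figures check out: reducing a uniform byte mod 107 gives 42 residues with probability 3/256 and 65 with probability 2/256 (since 256 = 2*107 + 42), and the Shannon entropy of that distribution can be computed directly. A quick check (names mine):

```java
class EntropyCheck {
    static double log2(double x) { return Math.log(x) / Math.log(2); }

    // Shannon entropy of (byte value mod 107) for a uniform random byte:
    // residues 0..41 occur with probability 3/256, residues 42..106 with 2/256.
    static double biasedEntropy() {
        double h = 0;
        for (int v = 0; v < 107; v++) {
            double p = (v < 42 ? 3 : 2) / 256.0;
            h -= p * log2(p);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(biasedEntropy()); // ~6.71 bits
        System.out.println(log2(107));       // ~6.74 bits for a perfect draw
    }
}
```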
I remember very old versions of man random [1] warning against that sort of thing and recommending picking the high (or low) bits of the random value. The advice was probably not correct.
This is more likely a memory of the warning about the low quality of randomness in the low bits (e.g. of LCGs, where the lowest bit simply alternates), or of the fact that the high bits of rand(3) were often zero due to a small RAND_MAX.
In the example of reducing a byte-sized random value modulo 107, the bias is that 0 can be generated by three different possible inputs (0, 107 and 214), while 42 can only be generated by two (42 and 149; 256 is just out of range), so 0 ends up being 50% more common than 42 in the long run.
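That bias can be verified exhaustively, since there are only 256 possible inputs. A quick check (names mine) counting how often each residue appears when every byte value 0..255 is reduced mod 107:

```java
class ModHistogram {
    // Histogram of (b % 107) over all 256 byte values: residues 0..41
    // appear 3 times each, residues 42..106 appear 2 times each.
    static int[] counts() {
        int[] c = new int[107];
        for (int b = 0; b < 256; b++) c[b % 107]++;
        return c;
    }

    public static void main(String[] args) {
        int[] c = counts();
        System.out.println(c[0]);  // hit by 0, 107, 214
        System.out.println(c[42]); // hit by 42, 149 only
    }
}
```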
With your proposed scheme, 0 can be generated by three possible inputs again (0, 1 and 2), while 1 can only be generated by two (3 and 4), so 0 ends up being 50% more common than 1 in the long run.
It does but it does not! java.util.Random is not a CSPRNG at all and is terrible, so even though the nextInt() method uses rejection sampling, it still produces biased values and also completely fails to be "unpredictable", because java.util.Random is weak and predictable.
Interesting! How come this wasn't fixed? Nobody noticed, or is it that java.util.Random is not meant for serious cryptographic use?
I know there are other parts of the Java standard lib that are so terrible [1] that people for years have recommended not using them, like anything with dates and timezones...
---
[1] or used to, haven't kept up with the latest Java versions. Maybe they fixed it.
Historically, one of the first uses for computers was Monte Carlo simulation. In such simulations it's often important to be able to deterministically recreate sequences of "random" numbers. Thus, in almost all programming languages, "Random" means a deterministic pseudorandom generator; this convention is followed practically everywhere.
If you want cryptographically secure random numbers, you typically call a function with a different name, one that has "secure" or "crypto" in a name somewhere (e.g., in the function or containing module/package).
This is a convention from the 1950s and has been consistent ever since. That naming-convention ship sailed before many of us were born.
And as a counterpoint, if you use a well-chosen deterministic sequence, it can make some Monte Carlo methods produce better results faster than true randomness would (quasirandom sequences and all that).
So, interestingly, "bad randomness" =/= deterministic.
(not that you were claiming that, just adding this subtlety to the mix)
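To make the "quasirandom" aside concrete, here is a tiny base-2 van der Corput sequence, the classic low-discrepancy sequence (names mine): it reflects the binary digits of n about the radix point, so successive values fill [0, 1) far more evenly than independent uniform draws, which is what speeds up quasi-Monte Carlo.

```java
class VanDerCorput {
    // n-th element of the base-2 van der Corput sequence:
    // reverse the bits of n across the binary point.
    static double value(int n) {
        double x = 0.0, f = 0.5;
        while (n > 0) {
            if ((n & 1) == 1) x += f; // lowest bit of n contributes f
            f /= 2.0;
            n >>= 1;
        }
        return x;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            System.out.println(value(n)); // 0.5, 0.25, 0.75, 0.125
        }
    }
}
```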
> is it that java.util.Random is not meant for serious cryptographic use?
Indeed it never was. SecureRandom (a subclass of Random in a different package, grouped with other cryptography related functionality) was introduced in Java 1.1 in 1997.
> I know there are other parts of the Java standard lib that are so terrible [1] that people for years have recommended not using them, like anything with dates and timezones...
The original Java date classes (also from the 1.1 era) were functional and correct, but badly designed. A modern time API was introduced in Java 8 in 2014.
> The original Java date classes (also from the 1.1 era) were functional and correct, but badly designed. A modern time API was introduced in Java 8 in 2014.
I'm pretty sure they were neither correct nor functional. I've read pretty detailed examples of everything they got fundamentally wrong, which was a lot. Wrong not as in "this is cumbersome to use", but as in "this fundamentally misunderstands what dates are at a conceptual level, and produces wrong results". Unfortunately, I would have to google it now.
All I can find right now is Jon Skeet's summary [1], which leans more to the "avoid using java.util.Date because it's one giant pitfall" side of the argument. Though it does mention some fundamental problems with it. However, this is not the article I found back in the day, which was both more comprehensive and more critical.
> A modern time API was introduced in Java 8 in 2014.
I seem to remember even at that time people were still not recommending Java standard date classes and functions, but I may be misremembering. Or were all the replacements like joda time from a previous era?
> Or were all the replacements like joda time from a previous era?
Joda-Time essentially is the new standard Java date/time implementation. Joda was used as an out-of-stdlib proving ground to try out ideas and get feedback, and the new Java date/time API is based on the lessons learned from Joda-Time.
My understanding from using the new (I shouldn't say "new"; they've been there since Java 8) date/time classes is that they're pretty good at representing dates, times, calendars, etc., but I wouldn't claim to be an expert on such matters.
The Jon Skeet article you're citing mentions only java.util.Date, and it is in fact absolutely fair to say that this class "fundamentally misunderstands what dates are at a conceptual level" in all of its functionality that goes beyond being a thin wrapper around a number of milliseconds.
But note that java.util.Date is from Java 1.0. The mistake was recognized, and in Java 1.1 we got java.util.Calendar and friends, which, as far as I know, does qualify as "functional and correct" but was ugly and annoying to use, as well as missing many useful features (like the concept of a timespan, or even just a date without a time of day).
(I assure you I'm not picking on you or your answer!)
I seem to remember the article which I unfortunately cannot conjure up right now demolished java.util.Calendar as well (again, as in "wrong", not as in "difficult to use"). I'm pretty confident of this because I never used Java 1.x at work, and when I read the article it was relevant to my day job.
The contract of Random requires determinism, so the algorithm can never be changed.
> If two instances of Random are created with the same seed, and the same sequence of method calls is made for each, they will generate and return identical sequences of numbers. In order to guarantee this property, particular algorithms are specified for the class Random. Java implementations must use all the algorithms shown here for the class Random, for the sake of absolute portability of Java code
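That contract is easy to observe; a quick sketch (names mine) comparing two identically seeded instances:

```java
import java.util.Random;

class SameSeed {
    // Per the java.util.Random spec, identical seeds plus identical call
    // sequences must yield identical output streams.
    static boolean identical(long seed, int n) {
        Random a = new Random(seed);
        Random b = new Random(seed);
        for (int i = 0; i < n; i++) {
            if (a.nextInt() != b.nextInt()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(identical(42L, 1000)); // true, by contract
    }
}
```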
It isn't terrible, it is just not suitable for cryptographic applications. I suspect most of the time that people want a random number up to some value, it is not for cryptographic purposes.
Mind you, I didn't call it terrible. However, the commenter I was replying to -- coincidentally the author of the TFA on modulo bias we are discussing -- did:
> java.util.Random is not a CSPRNG at all and is terrible, so even though the nextInt() method uses rejection sampling, it still produces biased values and also completely fails to be "unpredictable", because java.util.Random is weak and predictable.
and
> [...] using fast but bad random generators such as Java's Random has been shown to be an issue multiple times in the past, for things such as Monte Carlo simulations and so on, not just for things related to security.
Of course, "CSPRNG" means "cryptographically secure", but his comment made me think he considers it a terrible implementation regardless.
>his comment made me think he considers it a terrible implementation regardless.
Different people have different standards for what is "terrible". There are much better PRNGs that are just as fast. You should probably avoid java.util.Random if you care about the quality of the randomness it produces.
It's good enough for games that don't involve real money.
Well, the person who claims it's terrible is the author of the very thorough article we are discussing.
> You should probably avoid java.util.Random if you care about the quality of the randomness it produces.
That's pretty much the definition of "bad" ;) If you don't care about quality, I assume you could use a string of numbers without any randomness at all.
As for games, it depends. I'm sure one could make a guess it's bad for some kinds of non-betting games and simulations as well.
As you said: Java's Random is not meant for serious cryptographic usage, it's meant to be super fast.
You have SecureRandom instead for cryptographic usage, and it's noticeably slower.
Interestingly, using fast but bad random generators such as Java's Random has been shown to be an issue multiple times in the past, for things such as Monte Carlo simulations and so on, not just for things related to security.
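Since SecureRandom is a subclass of Random (as noted upthread), swapping it in is mechanical; a small sketch with names of my choosing:

```java
import java.security.SecureRandom;
import java.util.Random;

class SwapRng {
    // Works with either implementation: the same nextInt(bound)
    // rejection-sampling logic runs on top of whichever generator is passed.
    static int roll(Random rng) {
        return rng.nextInt(107); // uniform in [0, 107)
    }

    public static void main(String[] args) {
        System.out.println(roll(new Random()));       // fast, predictable
        System.out.println(roll(new SecureRandom())); // slower, for crypto
    }
}
```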