The bell has tolled for rand() (2014) (frih.net)
71 points by methanic on June 6, 2017 | 48 comments



> For reasons I can’t fathom, the C standard mandates that rand() must always start as if it were seeded with 1. I can’t imagine why anyone thought that was a good idea. What is the sense of making a random number facility that – by default – isn’t random?

It's not a random number facility, it's a pseudorandom number facility. And, unless you've explicitly seeded it from an unpredictable source, the results are consistent across runs with otherwise similar contexts, which is a desirable feature in many of the domains PRNGs are used for (it's undesirable when you need an RNG instead of a PRNG, but then so is using a PRNG like C-standard rand() in that case.)


It goes without saying that the determinism of PRNGs can be a very useful property. But if you need that property, then just as you can explicitly seed from a random source, you can also explicitly seed with 1 if you want.

The PRNG/RNG distinction you make is slightly false. Even if you're doing cryptography, you usually don't want to use a true source of entropy directly; you're better off feeding several sources of entropy into a PRNG. That's what both /dev/random and /dev/urandom give you (despite some common misconceptions).

What most developers really want is a non-deterministic (randomly seeded) PRNG. When you actually want the PRNG to be deterministic, you usually also need to seed it explicitly anyway (e.g. having multiple runs in the same app, or allowing the user to specify a seed to generate a map in a game). The implicit seed of 1 is useful for very few developers (if any), and confusing for the vast majority of junior developers out there.


I'm not sure whether it makes sense for C++ to be deprecating common C standard library functions, before they've been deprecated and replaced in C. `rand(3)` is certainly very outdated and poorly-specified, but not inherently buggy regardless of usage scenario like `gets(3)` or C++ `std::auto_ptr`. And anyone using `rand(3)` for cryptographic purposes is much too far gone to be helped.


Someone converted our codebase into a multi-threaded version and didn't realize that rand() is not thread safe; multiple threads were generating the same random number from time to time.

PS: that someone was me


It's an easy mistake to make. rand() is built around hidden, global mutable state. It's fundamentally a bad design, and it's the same reason I dislike the std::randint proposal.
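For illustration, a minimal C++11 sketch of the alternative: a per-thread engine whose state is explicit rather than hidden (the helper name per_thread_uniform is made up for the example).

    #include <random>

    // One engine per thread, seeded once per thread from std::random_device.
    // (Seeding an engine from a single random_device draw is itself a known
    // weakness; see the mt19937 seeding discussion further down the thread.)
    int per_thread_uniform(int lo, int hi) {
        thread_local std::mt19937 engine{std::random_device{}()};
        std::uniform_int_distribution<int> dist(lo, hi);
        return dist(engine);
    }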


std::randint is supposed to use thread-local storage for its state. At least according to cppreference: http://en.cppreference.com/w/cpp/experimental/randint


It's not the thread-safety itself that's the issue. The thing is that when you can see the state, it's obvious what can influence what. When you hide the state, there's more documentation to read and interactions are less obvious. "Show me your tables..."


Some rand() implementations are really, really bad for pretty much anything.

Obviously, it is not cryptographically secure but these RNGs are a class of their own and are typically much slower than needed for other applications.

The problem with rand() is that it may be unseeded (same result each time you run the program), have a very short period, very short range, and fail even basic randomness tests. Fill an image with rand() results and it may not look random at all.

It may not be good enough for Monte Carlo simulations, fuzz testing, probabilistic algorithms and structures, etc... Some rand() implementations are too poor even for casual games.
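To make the short-range point concrete: the C standard only requires RAND_MAX to be at least 32767, and the usual rand() % n idiom adds modulo bias on top of whatever the generator itself does. A rough sketch:

    #include <cstdio>
    #include <cstdlib>

    int main() {
        // RAND_MAX may be as small as 32767, so some implementations
        // can't even cover a full 16-bit range.
        std::printf("RAND_MAX = %d\n", RAND_MAX);

        // The common idiom below is slightly biased toward smaller values
        // whenever RAND_MAX + 1 is not a multiple of 10.
        std::printf("sample: %d\n", std::rand() % 10);
        return 0;
    }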

A good example of a terrible RNG causing trouble is in the Monster Hunter 3 Ultimate game. There are some randomly generated items in it that cannot be accessed because of the RNG. Players call it "charm tables" but it is just a result of a really poor RNG.


I see no reason not to deprecate things... removing them is another matter entirely.


The point that rand() detractors are trying to make is that rand() is bad for non-cryptographic purposes as well. Its badness varies with the implementation, but with the implicit seeding, thread-safety issues and bad distribution on many implementations, it should be generally avoided for all usage scenarios that want random numbers.

If you don't really want random numbers, but just "some numbers, I don't care if they're random or not", rand() is fine, but using "int myNumber = 42" is faster and safer.


It doesn't make sense at all. std::<c-lib-function> is supposed to just be a way of accessing the C library from C++. C++ shouldn't even be specifying it; just citing it as a normative reference. (That said, C++ does a few things with some of the functions, like useful polymorphism on char versus unsigned char in some of the string ones and such.)


OpenBSD decided to violate the standard and return non-deterministic results:

http://man.openbsd.org/rand.3


Doing the safer and slower thing seems better on the surface than just giving up and not compiling, and the article doesn't talk about this option. Could anyone chime in with some problems with OpenBSD's decision?


There's lots of uses for random but predictable numbers, e.g. adding random fuzz to a test suite run that you'd like to be able to repeat.

The weak random number generator that usually powers rand() is also typically faster and takes less CPU to run.

Both of these things were thought to be more important in more innocent times when C was standardized.

OpenBSD's decision makes sense on balance: it gives some security to otherwise insecure programs by default. The trade-off is breaking programs written against the C standard which make assumptions about the documented rand() behavior.


> There's lots of uses for random but predictable numbers, e.g. adding random fuzz to a test suite run that you'd like to be able to repeat.

One thing to note, though: different platforms don't necessarily implement rand() the same way, since the standard does not define an algorithm for it. So unless you want your tests to run differently on every C library, it might not be the best idea to begin with.


Running the test randomly is fine - indeed, desirable for fuzz tests (some eventual run might uncover a new issue). The requirement here is to have reproducibility - i.e. if the test fails, it should be able to log some kind of state, like a seed, that allows you to reproduce that same failure exactly. Obviously you'd need to also use the same environment, but that's usually not a problem.
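As a sketch of that pattern (hypothetical harness, names made up): pick a fresh seed per run, log it, and accept an override so a failing run can be replayed exactly.

    #include <cstdio>
    #include <cstdlib>
    #include <ctime>

    // Hypothetical fuzz harness: random by default, reproducible on demand.
    int main(int argc, char** argv) {
        unsigned seed = (argc > 1)
            ? (unsigned)std::strtoul(argv[1], nullptr, 10)
            : (unsigned)std::time(nullptr);
        std::printf("fuzz seed: %u (pass as argv[1] to reproduce)\n", seed);
        std::srand(seed);

        for (int i = 0; i < 1000; ++i) {
            int input = std::rand();   // pseudo-random test input
            (void)input;               // run_test(input) would go here
        }
        return 0;
    }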


For portable code you will not know whether rand() will give you good or bad randomness, though. A new name, or no rand() at all, would at least keep you from getting crap.


This is an article from 2014, previous discussion here: https://news.ycombinator.com/item?id=9037151


There are a couple comments from that discussion that mirror my general take.

The standard should be defining something that is good for the general case, without a lot of levers. KISS! That means giving people 4 different PRNG algorithms, plus a hardware interface, etc, is not the right direction. Your average programmer shouldn't have to make a decision more complex than "secure" or "fast" and the API should clearly make the distinction.

In this case, the Linux kernel strikes about the right balance with /dev/random and /dev/urandom. If you want something more complex then you can pick up a third-party library and spend 6 months learning about random numbers, how to test their randomness, seed a CSPRNG, etc. Otherwise leave the implementation choices to the experts and make it easy for the common programmer.


On Linux /dev/random and /dev/urandom are rather oddly misshapen abstractions. Neither device has sensible behaviour. /dev/random blocks unnecessarily after seeding, and /dev/urandom doesn't block even if it hasn't been seeded yet. Contrast that with FreeBSD, which provides exactly the behaviour I'd want: block until the RNG has been seeded, then never block again.
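For what it's worth, newer Linux kernels expose that FreeBSD-style behaviour through the getrandom(2) syscall (added in 3.17, wrapped by glibc 2.25's <sys/random.h>); a rough sketch, assuming one of those systems:

    #include <sys/random.h>
    #include <cstdio>

    int main() {
        unsigned char buf[16];
        // With flags = 0 this blocks until the kernel CSPRNG has been
        // seeded, and never blocks again afterwards.
        ssize_t n = getrandom(buf, sizeof buf, 0);
        if (n < 0) {
            std::perror("getrandom");
            return 1;
        }
        for (ssize_t i = 0; i < n; ++i)
            std::printf("%02x", buf[i]);
        std::printf("\n");
        return 0;
    }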


> /dev/random blocks unnecessarily after seeding

That's because it's supposed to be a random-number-generator, not a pseudo-random-number-generator. It needs real entropy.


I am not a cryptographer, but my impression is that the points at which /dev/random stops are not well-justified. It doesn't directly pass out entropy, but instead feeds it through a cryptographically-secure PRNG first, making it a rather arbitrary line that's drawn to decide when entropy is 'exhausted'.

There was a brief HN discussion on the topic when FreeBSD made their change: https://news.ycombinator.com/item?id=8550457


Right, but the point is that the entropy can be reused.

I mean, maybe you say that my next 4,096 bits of entropy need to be completely independent from the last 4,096, but in the usual case, I doubt it.


Thanks—added above.


"gets()" "sprintf", "strcat", and the other buffer-overflow friendly C library functions should have been deprecated in the 1990s. It's not too late. Force maintainers to write

    #include <unsafe-deprecated-c-library-functions>
if they really want them.


Make a #pragma for deprecated functions, and then nag the people who use them unless they silence it with a -w flag.


> The idea is that there should be a thread local random number engine – possibly hidden, but it might actually be exposed to the user; that’s still being decided. That will most likely be a Mersenne Twister – because that’s just the best all-round pseudo-random number generator, and it generates excellent “random” sequences.

I think this is a nice article, but this part stood out to me. I've heard many times here on HN that it's one of the worst non-cryptographic pseudo-random number generators in common use.

I couldn't find the specific article, but this Q/A had some good information and links: https://cs.stackexchange.com/questions/50059/why-is-the-mers...


10 years ago the Mersenne Twister was good enough, nowadays not anymore. I wrote this: https://cs.stackexchange.com/questions/50059/why-is-the-mers...


> rand() is specified to always be seeded the same way.

> For reasons I can’t fathom, the C standard mandates that rand() must always start as if it were seeded with 1. I can’t imagine why anyone thought that was a good idea. What is the sense of making a random number facility that – by default – isn’t random?

This seems like a good decision to me. There are plenty of use cases for arbitrary values that are consistent across runs (for instance, repeatable unit tests). If the application calls for non-determinism, let the application author do it themselves from an appropriate seed value.

(I may be somewhat biased from my experiences threading random number generator state through Haskell programs.)


I would argue the opposite - while there are many cases where you want values to be consistent across runs, I don't think they're the majority of the casual use of rand(). And especially for something that has "random" right there in the name, it is a very strange and counter-intuitive default.

Furthermore, even for those cases where you do want consistency, I doubt you'd want a hardcoded seed value. Most such cases allow the seed to be parametrized between runs without recompiling (via command line arguments, configs, etc.), and so they'll still need to invoke srand to set the seed.

On the other hand, saying that without srand, the default is random, would actually allow modern implementations to e.g. initialize from /dev/random. Right now there's basically no cross-platform way to set a truly random seed in ISO C.
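The usual strictly-ISO-C workaround illustrates the point: about the best you can do portably is a time-based seed, which is both predictable and identical for two runs started in the same second. A sketch:

    #include <cstdlib>
    #include <ctime>

    int main() {
        // Portable but weak: time() has one-second resolution and is
        // trivially guessable; ISO C offers nothing better without
        // platform-specific entropy sources.
        std::srand((unsigned)std::time(nullptr));
        int r = std::rand();
        (void)r;
        return 0;
    }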


> If the application calls for non-determinism, let the application author do it themselves from an appropriate seed value.

It would be helpful if the C standard included a source for an appropriate seed value, if that's the policy (which I think is wrong; given the naming, it should be on those wanting reproducible behavior to flip a switch or set a fixed seed. A slow default is better than an insecure default).


To my mind this is ludicrous. rand() is useful for many common educational assignments. It's also useful for many applications which want pseudo-random numbers for purposes unrelated to security.

I recently needed some pseudo-randoms, and discovered that I now have to stick a bunch of boilerplate in the code to obtain them. Is there really a difference between me sticking the boilerplate in vs having a function called rand() that does the boilerplate for me?
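For reference, this is roughly the boilerplate in question, assuming the C++11 <random> facilities (and ignoring the seeding caveats discussed elsewhere in this thread):

    #include <random>

    int roll_die() {
        // What used to be rand() % 6 + 1 becomes three separate pieces:
        // a seed source, an engine, and a distribution.
        static std::mt19937 engine{std::random_device{}()};
        static std::uniform_int_distribution<int> dist(1, 6);
        return dist(engine);
    }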


When I was taught C in the early 2010s, these caveats were added as a way to show us the limitations inherent in data type ranges and to make us wary of all the libraries we would otherwise take at face value. I think that lecture was one of the most memorable of my college education. While I understand its limitations, would someone explain why the author is so adamant that classes should not teach it as a lesson on the practical limitations of programming?


That is not the author's claim. The author's claim is that even knowing its limitations, it's still not good to use in even simple situations. This is a different claim from yours, which is that it's useful to demonstrate limitations imposed by API design.


Well then why not allow it to be used in practical exercises? What good is a lecture without practical application?

Edit: The author does state "Responsible teachers and tutors should stop teaching it to new C++ programmers"; I misinterpreted his point.


C++11's random number facility isn't as good as all that.

Here are some of the pain points I'm aware of:

There's no way to use the random device to uniformly seed the internal state of an arbitrary pseudorandom generator. In fact, the usual thing to write will seed with only 32 bits of randomness. (this has been noted by others in this thread)

Some of the floating-point functionality such as std::uniform_real_distribution and generate_canonical have trouble where the text specifies half-open intervals like [0,1) but the (pseudo)code in practice gives closed intervals like [0,1]. Usually this can only be observed when generating values of type 'float'. This makes me generally suspect that the real number distributions in the standard library are probably poorly specified in general. (In fact, the original report that led to issue 2524 was about the exponential distribution, which per the standard must be defined in terms of generate_canonical.) http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#...

Finally, there is support for some frankly weird things in the basic random engines: you can specify a random engine that only returns the numbers 37 and 38, for instance. It turns out that e.g., std::linear_congruential_engine, including std::minstd_rand, can have a minimum possible output value of 1 rather than 0, which in turn makes the random engine's min() method necessary. But why not require engines to generate integral values on the closed interval [0..n], and make it the responsibility of linear_congruential_engine to subtract 1 before returning? A random number generator with (2^31)-2 distinct possible values is going to make for much weirder statistical anomalies than one with period 2^x when someone uses them carelessly (and what other way is there to use random numbers?).
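The min() oddity is easy to demonstrate; a small sketch, assuming a conforming standard library:

    #include <random>

    // minstd_rand is a linear_congruential_engine with increment c == 0,
    // so its smallest possible output is 1, not 0, and its largest is 2^31 - 2.
    static_assert(std::minstd_rand::min() == 1, "LCG with c == 0 never yields 0");
    static_assert(std::minstd_rand::max() == 2147483646, "modulus is 2^31 - 1");

    int main() { return 0; }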


Another flaw with generate_canonical<float> is that if you generate, say, 2^40 of the things, you should expect to get quite a few values smaller than 2^-32; but with minstd_rand or mt19937, I believe you will not ever see values smaller than 2^-32 or so. A similar problem exists with double, though it takes a much longer-running program to prove it (ain't nobody got time to generate 2^61 random numbers).


Normally reusing the standard implementation is the right way to do things. But in the case of C's rand(), it's so awful that PHP's rand() using it under the hood was considered a serious flaw in PHP, and has now been replaced with Mersenne Twister. (Which is itself not a particularly good PRNG, but it's so much better than rand() you'd be forgiven for ignoring that.)


Catching up with 1980's Lisp:

http://clhs.lisp.se/Body/f_mk_rnd.htm


Oh look, yet another person seeding mt19937 poorly. I see them like clockwork. There was a lot of good discussion about this the last time I mentioned it to someone completely different, so here's that before I copy-paste it: https://www.reddit.com/r/programming/comments/6e10b8/when_ra...

---

Your code

    thread_local static std::random_device rd{};
    thread_local static std::mt19937 engine{rd()};
does not properly seed a mt19937.

It is unfortunate how people keep trying to use `std::mt19937` and, unsurprisingly given how hard it is to use, how they (almost)[0] inevitably fail. Do yourself a favour and use a randomness library that doesn't hate you[1], is simple to seed correctly, produces decent quality randomness, isn't horribly bloated and has more features to boot. It's 2017[2]. Let the Mersenne Twister die.

[0]: https://codereview.stackexchange.com/q/114066/40768

[1]: http://www.pcg-random.org/ or http://xoroshiro.di.unimi.it/

[2]: Well, late 2014 in the article, but close enough.


I notice that the PCG paper, while seemingly drafted in 2015 or earlier, still has not appeared in ACM TOMS. I thought the jokey/insider-y style is kind of fun, but VERY different from the usual voice of the journal. I wonder what the cause of the delay is?


For the lazy reader:

The issue is that a single call to rd() returns only an unsigned int (typically 32 bits), whilst mt19937 has 19937 bits of state. This means you are very restricted.

Note that, for example, Linux's /dev/random pool holds 4096 bits of entropy.
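One common way to bridge that gap is to fill a std::seed_seq with many draws from std::random_device; a sketch (seed_seq's mixing function has its own detractors, but this at least seeds far more than 32 bits of the state):

    #include <algorithm>
    #include <array>
    #include <cstdint>
    #include <functional>
    #include <random>

    std::mt19937 make_seeded_mt19937() {
        // mt19937 holds 624 32-bit words (19937 bits) of state; gather
        // roughly that much seed material instead of a single draw.
        std::array<std::uint32_t, std::mt19937::state_size> seed_data;
        std::random_device rd;
        std::generate(seed_data.begin(), seed_data.end(), std::ref(rd));
        std::seed_seq seq(seed_data.begin(), seed_data.end());
        return std::mt19937(seq);
    }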


Thanks for this comment. I learned something!


Are there any good random number libraries for C?


Providing good random numbers should be done by the OS (kernel and libc). Don't try to fake it in userland.

    Please wiggle your mouse for 1 minute so we can gather entropy for your AES key.
REALLY?


I always thought that was supposed to feed /dev/urandom.


/dev/urandom is a perfectly good source for random numbers. Because it's fed from the kernel which is gathering all kinds of entropy, including mouse movement interrupt timings. Bad implementations not considered.

One problem with /dev/urandom is that you always have to open it and create a file descriptor just to get some random numbers!?


Fast? Unpredictable based on previous output if the seed is truly random? Here is a read I enjoyed on the subject: http://www.cs.yale.edu/homes/aspnes/pinewiki/C(2f)Randomizat...



