> We discuss how to configure and use turbid, which is a Hardware Random Number Generator (HRNG), also called a True Random Generator (TRNG). It is suitable for a wide range of applications, from the simplest benign applications to the most demanding high-stakes adversarial applications, including cryptography and gaming. It relies on a combination of physical process and cryptological algorithms, rather than either of those separately. It harvests randomness from physical processes, and uses that randomness efficiently. The hash saturation principle is used to distill the data, so that the output is virtually 100% random for all practical purposes. This is calculated based on physical properties of the inputs, not merely estimated by looking at the statistics of the outputs. In contrast to a Pseudo-Random Generator, it has no internal state to worry about. In particular, we describe a low-cost high-performance implementation, using the computer’s audio I/O system.
If anyone can access them, my university notes from when I looked into LCGs like this one discuss a trick from Numerical Recipes (1992 C edition): a clever LCG can be used to generate random numbers between 0 and 1. You take a 32-bit int generated by a fast LCG modulo 2^32, and the cleverness is that you treat this value as a float. A bit mask keeps only the mantissa bits of a float-32, which is then OR'd with a value that sets the exponent to 127. Subtracting 1 from the result gives a random value between 0 and 1, using only simple bitwise operations!
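For anyone who can't get at those notes, the trick can be sketched in a few lines. This is my own JavaScript rendering using a typed-array view in place of the C union from the book (function and variable names are mine):

```javascript
// Sketch of the NR bit trick: keep 23 random bits as the mantissa, set the
// exponent field to 127 so the value lies in [1, 2), then subtract 1.0 to
// land in [0, 1). A shared ArrayBuffer lets us reinterpret the bits.
const buf = new ArrayBuffer(4);
const u32 = new Uint32Array(buf);
const f32 = new Float32Array(buf);

function bitsToUnitFloat(randomBits) {
  // 0x007FFFFF masks the 23 mantissa bits; 0x3F800000 sets exponent 127.
  u32[0] = (randomBits & 0x007fffff) | 0x3f800000;
  return f32[0] - 1.0;
}
```

Feeding it all-zero bits gives exactly 0, and all-one bits give a value just below 1.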
This is well-known, and almost trivial.
But because I was curious I looked this up in NR 3rd edition. Section 7.1.5 says
> The steps above that convert a 64-bit integer to a double precision floating-point value involves both a non-trivial type conversion and a 64-bit floating multiply. They are performance bottlenecks. One can instead directly move the random bits into the right place in the double word with a union structure, a mask, and some 64-bit logical operations; but in our experience this is not significantly faster
It goes on to describe a lagged Fibonacci generator which generates values directly as floating point.
This is a widely used method to map random integers to floating point numbers, but it has the disadvantage of wasting 1 bit of float mantissa precision.
On modern CPUs, its computational advantage over full-precision mapping methods, such as multiplication by a float, is not always clear [1].
The issue with this approach is that the interval [1,2) has far fewer representable values than [0,1) in IEEE-754 floats, so the range of generatable values is only a fraction of the representable ones.
On modern hardware, you should instead use a count-leading-zeroes or count-trailing-zeroes instruction on a uniform bit pattern to generate the exponent directly. This is the approach taken in the Zig standard library.
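I don't have the Zig source in front of me, but the idea can be sketched in JavaScript with Math.clz32. This is my own illustrative version, not the actual standard-library code:

```javascript
// Sketch: draw the exponent geometrically with count-leading-zeroes, so
// every representable float in [0, 1) can in principle be produced.
function randomFloat01(rng32) {
  // Each leading zero bit halves the value, matching how IEEE-754 floats
  // get denser toward zero.
  let exponent = 126; // biased exponent of [0.5, 1)
  for (;;) {
    const bits = rng32() >>> 0;
    if (bits !== 0) {
      exponent -= Math.clz32(bits);
      break;
    }
    exponent -= 32; // a whole word of zeros: keep halving
    if (exponent < 1) break;
  }
  if (exponent < 1) exponent = 0; // bottomed out in subnormal/zero territory
  const mantissa = (rng32() >>> 0) & 0x007fffff; // 23 random mantissa bits
  const buf = new ArrayBuffer(4);
  new Uint32Array(buf)[0] = (exponent << 23) | mantissa;
  return new Float32Array(buf)[0];
}
```

With a source that always returns 0x80000000 (top bit set, zero mantissa), this yields exactly 0.5.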
You are right of course, but I would contend that generating subnormal floats as well is, in 99.9% of cases, a needless cost. The reason is that, more often than not, all this extra precision will be immediately lost in subsequent operations.
A very common situation, for instance, is to plug the random number into a log, in which case you need to use log(1-r) rather than log(r) to avoid an infinity at r=0. The problem is, by doing this simple subtraction you have already lost all the subnormal precision.
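A quick demonstration of the loss: for any r below about 1.1e-16 (already far above the subnormal range), 1 - r rounds back to exactly 1 in double precision.

```javascript
// The smallest positive subnormal double, subtracted from 1, vanishes:
// 1 - r rounds to exactly 1, so log(1 - r) is exactly 0 and every bit of
// the low-end precision is gone.
const r = Number.MIN_VALUE; // smallest positive subnormal double
const lost = 1 - r;         // rounds back to exactly 1
console.log(lost === 1);
console.log(Math.log(1 - r));
```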
As pointed out in the comments of the post, this is an LCG[1] PRNG. The Wikipedia article has a good discussion on how to choose parameters, with examples of use in the wild from various popular projects over the last many decades.
And its first reference is to what I consider the definitive reference on the subject, at least for the state of the art in 1968: The Art of Computer Programming, Vol. 2: Seminumerical Algorithms.
I also think the only additions that might be needed to that are mentions of newer algorithms that are comparatively simple, but better. The subject itself is pretty well-trodden.
Knuth does mention the extremely simple additive random number generator
X[n] = (X[n-24] + X[n-55]) mod m
(And says that this can be used directly with floating point values to produce floats in the range [0,1))
It uses 55 words of state (initialized to not all be even), needs only about ten simple instructions to implement (in particular, no multiplications or divisions), and has a period of at least 2^55 - 1. Knuth said it's very good in practice and isn't known to be bad, and ends with (in the 1980 second edition of volume 2, the first edition being from 1968):
“The only reason it is difficult to recommend sequence (7) wholeheartedly is that there is very little theory to prove that it does or does not have desirable randomness properties; essentially all we know for sure is that the period is very long, and this is not enough”
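For the curious, here's a minimal sketch of the recurrence (my own code; the LCG used to fill the initial state is an arbitrary illustrative choice, not Knuth's recommended initialization routine):

```javascript
// Knuth's additive (lagged Fibonacci) generator with lags 24 and 55,
// modulo 2^32. The state must not be all even, or every output stays even.
function makeLagFib(seed) {
  const state = new Uint32Array(55);
  // Illustrative seeding only: a throwaway LCG fills the 55-word state.
  let x = seed >>> 0;
  for (let i = 0; i < 55; i++) {
    x = (Math.imul(x, 1664525) + 1013904223) >>> 0;
    state[i] = x;
  }
  state[0] |= 1; // guarantee at least one odd word
  // With state[i] = X_i, the next value is X_55 = X_31 + X_0 (mod 2^32),
  // stored over the oldest entry; both indices then advance circularly.
  let j = 31, k = 0;
  return function next() {
    const v = (state[j] + state[k]) >>> 0; // addition mod 2^32
    state[k] = v;
    j = (j + 1) % 55;
    k = (k + 1) % 55;
    return v;
  };
}
```

As Knuth notes, the same recurrence works directly on floating-point values (add mod 1) to produce floats in [0,1).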
I can’t find anything more recent about this RNG. Has it been forgotten, or was it shown to be bad in some important sense?
In the third edition (1998) there are a few more notes, most importantly:
"Lagged Fibonacci generators have been used successfully in many situations since 1958, so it came as a shock to discover in the 1990s that they actually fail an extremely simple, non-contrived test for randomness" with reference to exercise 3.3.2-31 asking "prove that if we generate 79 consecutive random bits ... starting at a random point in the period, the probability is more than 51% that there are more 1s than 0s".
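The failure is easy to reproduce numerically. Here's a rough Monte Carlo sketch (my own code; I'm reading "a random point in the period" as a uniformly random 55-bit state, which is essentially equivalent since the low-order bits follow b[n] = b[n-24] XOR b[n-55]):

```javascript
// Estimate how often a window of 79 consecutive low-order bits of a
// lag-(24,55) additive generator contains more 1s than 0s. An unbiased
// source would give exactly 50% (79 is odd, so ties are impossible).
function moreOnesFraction(trials) {
  let hits = 0;
  const b = new Uint8Array(79);
  for (let t = 0; t < trials; t++) {
    // Random 55-bit window of the bit sequence (the all-zero state has
    // negligible probability and is ignored here).
    for (let i = 0; i < 55; i++) b[i] = Math.random() < 0.5 ? 1 : 0;
    // The next 24 bits are forced by the XOR recurrence.
    for (let i = 55; i < 79; i++) b[i] = b[i - 24] ^ b[i - 55];
    let ones = 0;
    for (let i = 0; i < 79; i++) ones += b[i];
    if (ones > 39) hits++; // strictly more 1s than 0s
  }
  return hits / trials;
}
```

A run of a couple hundred thousand trials should land around 0.51-0.52 rather than 0.50, consistent with the exercise's claim.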
There's a recommendation to generate numbers in batches of X and only use the first Y of each batch, with Y an order of magnitude smaller, citing Lüscher's RANLUX, but I don't know how much that holds up either.
FYI if you only have the 2nd edition of volume 2, Knuth provides the full text of the changes from the 2nd to 3rd edition electronically: https://www-cs-faculty.stanford.edu/~knuth/taocp.html (search for "Errata for Volume 2 (2nd ed.)")
Since it's too late to edit the earlier post, this is the specifics around discarding (p571):
Lüscher's discarding technique can be used to avoid the bias towards 1s. For example, with lags 55 and 24, no deviation from randomness is observed for random walks of length 1001 when the numbers are generated in batches of 165, if only the first 55 numbers of each batch are used.
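The batching scheme itself is trivial to sketch (my own illustrative code, wrapping any generator step function):

```javascript
// Lüscher-style discarding: emit the first `use` numbers of each batch of
// `batch` and burn the rest. Defaults follow the 55-of-165 example above.
function makeDiscarding(next, use = 55, batch = 165) {
  let emitted = 0;
  return function () {
    if (emitted === use) {
      for (let i = use; i < batch; i++) next(); // discard the rest of the batch
      emitted = 0;
    }
    emitted++;
    return next();
  };
}
```

Wrapping a simple counter shows the effect: the first 55 outputs are 1..55, then the 56th output jumps to 166 because 110 values were thrown away.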
And this is when comments in code are important! Any random numbers without a source are immediately suspect to me, especially in something that needs to be secure. A comment will save your coworkers and peers time trying to figure out why it's there.
It’s an incantation that’s propagated for 50+ years because it’s minimal and effective. Over time, it’s been fully distilled to those properties.
Since comments aren’t essential to being minimal and effective, they don’t survive the distillation.
Think of it like a clever gist that got pasted and shared a hundred times. Even if the original source had explained every step in great detail, with inline comments, deep explanatory discourses, citations to prior art, etc., they'd eventually get trimmed away as fat as people repeatedly prune it down to the "important" bits, paste those into their own copies, and then later share the trimmed copies, ad infinitum.
Since it's an LCG, it does not matter how it was derived, as long as the randomness properties are known: cycle length, the set of initial states that yield identical sequences, and dispersion properties. Perhaps also performance.
These should be documented.
It's a rather weak PRNG with a short cycle, so the suspicion is that it was made for particular dispersion properties, such as for a hash table of particular data and size, or some other bucketing algorithm.
It is interesting -- 5/9 of the books listed there are clearly numerics textbooks, so it isn't surprising they talked about a non-cryptographic PRNG. Curious about the other 4. But maybe this shows up as a "here's why you should be careful what type of RNG you are using" type example.
When you find this algorithm, with or without comments about where the numbers come from, at a point that should be secure, you should be more than suspicious. In any case, this is not a secure PRNG.
For PRNGs like this, constants are often chosen by guess and check. This algorithm (an LCG) has a bit more theory, so these might not have been chosen that way, but the author of the code probably didn't have any insight either.
Seed 23478978945674839578275983475348957348597345984375345834753498 produces 0.4444444444444444, am I insanely lucky in my keyboard spam or is this a common output?
The person in the comments said the period is only ~233k, which is correct, because you are always taking modulo 233280. But it's also easy to verify experimentally:
let state = 6, iters = 0;
do {
  state = (state * 9301 + 49297) % 233280;
  iters++;
} while (state !== 6);
console.log("period is", iters, "iters");
// period is 233280 iters
> so that the multiplication and addition can be done on a signed 32-bit integer without overflow
Except they can't: any seed in the range 230888 through 233279 will produce a signed integer overflow when multiplied by 9301 (e.g. 9301 * 230888 = 0x80001608).
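You can check this from JavaScript, since Math.imul performs a C-style 32-bit multiply:

```javascript
// The exact product exceeds 2^31 - 1, so a signed 32-bit multiply wraps
// around to a negative value.
const exact = 9301 * 230888;             // exact in a double: 2147489288
const wrapped = Math.imul(9301, 230888); // what int32 arithmetic yields
console.log(exact > 0x7fffffff);         // overflows signed 32-bit range
console.log(wrapped);                    // wrapped to a negative value
```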
That depends on the language. For example BASIC used to have only 16-bit or 32-bit signed integers (depending on the processor; plus strings and floating point), so this particular RNG can be useful for its portability.
Such dialects of BASIC also offered defined overflow behavior (it's really only C/C++ and other things targeting MCUs that don't, and even there it's only relatively recently that "undefined overflow" came to mean something more practically dangerous than "implementation defined"), so it was still not a concern.
In case people don't know this story, or worse, believe the conspiracy theories, here's what happened with the development of DES:
> "According to Steven Levy, IBM Watson researchers discovered differential cryptanalytic attacks in 1974 and were asked by the NSA to keep the technique secret. Coppersmith explains IBM's secrecy decision by saying, "that was because [differential cryptanalysis] can be a very powerful tool, used against many schemes, and there was concern that such information in the public domain could adversely affect national security." Levy quotes Walter Tuchman: "[t]hey asked us to stamp all our documents confidential... We actually put a number on each one and locked them up in safes, because they were considered U.S. government classified. They said do it. So I did it".
Bruce Schneier observed that "It took the academic community two decades to figure out that the NSA 'tweaks' actually improved the security of DES."
In case people don't know this story, here's what happened with the development of Dual_EC_DRBG:
> Sometime before its first known publication in 2004, a possible kleptographic backdoor was discovered with the Dual_EC_DRBG's design, with the design of Dual_EC_DRBG having the unusual property that it was theoretically impossible for anyone but Dual_EC_DRBG's designers (NSA) to confirm the backdoor's existence. Bruce Schneier concluded shortly after standardization that the "rather obvious" backdoor (along with other deficiencies) would mean that nobody would use Dual_EC_DRBG.[4] The backdoor would allow NSA to decrypt for example SSL/TLS encryption which used Dual_EC_DRBG as a CSPRNG.[5]
> Members of the ANSI standard group, to which Dual_EC_DRBG was first submitted, were aware of the exact mechanism of the potential backdoor and how to disable it,[6] but did not take sufficient steps to unconditionally disable the backdoor or to widely publicize it. The general cryptographic community was initially not aware of the potential backdoor, until Dan Shumow and Niels Ferguson's publication, or of Certicom's Daniel R. L. Brown and Scott Vanstone's 2005 patent application describing the backdoor mechanism.
> In September 2013, The New York Times reported that internal NSA memos leaked by Edward Snowden indicated that the NSA had worked during the standardization process to eventually become the sole editor of the Dual_EC_DRBG standard,[7] and concluded that the Dual_EC_DRBG standard did indeed contain a backdoor for the NSA.[8] As response, NIST stated that "NIST would not deliberately weaken a cryptographic standard."[9] According to the New York Times story, the NSA spends $250 million per year to insert backdoors in software and hardware as part of the Bullrun program.[10] A Presidential advisory committee subsequently set up to examine NSA's conduct recommended among other things that the US government "fully support and not undermine efforts to create encryption standards".[11]
Ah, nice. I was thinking along the lines of the 1960s, as that's when the seeds in the algorithm above were created. So I assumed the comment was about the NSA messing with it back then. Didn't think about recent events.
This is what bugs me about the Stack Exchange model. Most of the comments are guesses by people who really don't know. The "correct" answer doesn't explain anything; it just points to some sources. The highest-ranked comment, which comes closest, attempts to explain the nature of the numbers (the modulus is the period, and the primes stretch out the cycle) and fires off a link to Wikipedia.
Then it appears on HN, with more people half guessing, but better content in general.
Based on yesterday's post, BYTE magazine would have actually given an answer with both why and how.
EDIT: Granted, the link DOES go to Wikipedia, which has a great explanation, but in general, providing a link and not an answer is a poor model because links decay so quickly.
It took me a while to understand your comment, until I realized there are two interpretations of the question of where these numbers come from:
• One is a historical question: who first used these numbers and when? who copied from where?
• The other is a conceptual question: why might someone choose these numbers, what do they "mean", how do we interpret these constants?
The former is purely a question of facts and there is nothing to "explain the nature of the numbers", and this is what the answerer on Stack Exchange seems to have interpreted it as. Under this interpretation saying things like "the modulus is the period" is not really germane to the question; what's required is to search sources and try to find the chronologically earliest one. (The answer doesn't seem to have done a definitive job, but it's a reasonable start.)
This is like synchronic vs diachronic in linguistics: under the latter (synchronic) interpretation you mainly just want to understand the random-number generator itself.
Reading the question it seems pretty clear to me they're interested primarily in the history: "Who invented this algorithm and tested the distribution? Is there a paper or something to cite?"
Yes that's what I thought too (which is why I found the complaint hard to understand until I thought of the alternative interpretation). But clearly at least the person I was replying to understood it differently, and it was also the top comment on this thread for a while at the time I posted my comment, so others may have thought the same too. Even at the question, on a comment on the answer the asker mentions "why these numbers are special" and comments about them being prime etc, so it's possible they didn't have a clear idea what they wanted either.
One of my favorite answers was about programming languages written by a Jörg W Mittag. It was a comparison between shells like bash and proper programming languages and featured a huge table comparing syntax, showing how languages are supposed to scale up in complexity while shells are supposed to scale down.
Never found it again on the site. Years ago I remember digging up a link to the question but got the 404 page when I followed it.
I think I found the answer you mean! https://stackoverflow.com/questions/3637668/why-are-scriptin... — the topic matches, there's a table comparing syntax (though I wouldn't call it huge), and it includes the idea of scaling up to large programs vs scaling down to 10 character programs.
Yeah, that's the one. It must have been years since I tried looking for it. I didn't know it had been restored... Thanks for looking it up! I can see the programmer jokes question as well.
> This is what bugs me about the stackexchange model.
> providing a link and not answer is a poor model because links decay so quickly.
Providing a link and not an answer is not the Stack Exchange model. On the Stack Exchange network, such answers are often (not always, of course) met with a comment requesting that the answer be expanded.
That might be true in many places, and I agree with the general sentiment that the answer should provide an answer and not just a link to an answer. But wikipedia links are generally pretty stable. It’s not some random blog post.
Oh, good point. OP was looking for original source, papers referring to, etc. Too late to edit, but minus points for me for not comprehending the original question.
I don't think you were far off what the asker was looking for. In a comment, they say "I was kind of hoping to be able to point to some specific research as to why these numbers are special".
What's NR, what's a Necronomicon, and what are klaatu, barada, and nikto?
I googled all these things and I still can't make any sense of your comment, except that it's intentionally confusing with some popular fiction references?
"Klaatu barada nikto" is a code phrase from The Day the Earth Stood Still to stop the homicidal antagonists from killing you.
It's doubly famous because Sam Raimi recycled it in "Army of Darkness" as an incantation to dispel the curse of the Necronomicon (book of the dead, portrayed as the arch book of black magic in any number of stories), which our hapless anti-hero Ash fucks up and thus unleashes the armies of the dead on a roughly medieval and wholly unprepared world.
If you like campy horror movies, there is so much camp in Army of Darkness that there's barely any room for the horror. It is technically a sequel but it works as a standalone movie. I watched it years before I finally sat down and watched The Evil Dead.
NR is apparently Numerical Recipes, a book. I think they mean that the constants are not well understood, like the phrase "Klaatu barada nikto" was not understood, but still had an effect.
In addition to what the other answers have said, the Necronomicon is the book of dark magic in the Lovecraft mythos, from which it has been copied into some other mythoi.
Yeah, I'm pretty sure this isn't the same kind of thing as Carmack's (edit: not Carmack's) legendary inverse square root function [1], which has that 0x5F3759DF constant (edit: magic number):
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;
    x2 = number * 0.5F;
    y = number;
    i = * ( long * ) &y;                        // evil floating point bit level hacking
    i = 0x5f3759df - ( i >> 1 );                // what the fuck?
    y = * ( float * ) &i;
    y = y * ( threehalfs - ( x2 * y * y ) );    // 1st iteration
    // y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
    return y;
}
Not Carmack's, it says so in the link you provided.
> The algorithm was originally attributed to John Carmack, but an investigation showed that the code had deeper roots in the hardware and software side of computer graphics. Adjustments and alterations passed through both Silicon Graphics and 3dfx Interactive, with the original constant being derived in a collaboration between Cleve Moler and Gregory Walsh, while Gregory worked for Ardent Computing in the late 1980s.[3] Walsh and Moler adapted their version from an unpublished paper by William Kahan and K.C. Ng circulated in May 1986.
I think the comment author meant "a magical constant that has been found by someone by empirical method without a nice paper pointing why it's a good value"
(even though I think now the constant above has a paper on why it's actually so good)
This was an example of a seemingly random magic number appearing in code for a very clever hack. I think the OP was assuming there was something similar at work with the PRNG magic numbers.