> We discuss how to configure and use turbid, which is a Hardware Random Number Generator (HRNG), also called a True Random Generator (TRNG). It is suitable for a wide range of applications, from the simplest benign applications to the most demanding high-stakes adversarial applications, including cryptography and gaming. It relies on a combination of physical process and cryptological algorithms, rather than either of those separately. It harvests randomness from physical processes, and uses that randomness efficiently. The hash saturation principle is used to distill the data, so that the output is virtually 100% random for all practical purposes. This is calculated based on physical properties of the inputs, not merely estimated by looking at the statistics of the outputs. In contrast to a Pseudo-Random Generator, it has no internal state to worry about. In particular, we describe a low-cost high-performance implementation, using the computer’s audio I/O system.
If anyone can access them, my university notes from when I looked into LCGs like this one discuss a trick from Numerical Recipes (1992 C edition): a clever LCG can be used to generate random numbers between 0 and 1. You take a 32-bit int generated by a fast LCG modulo 2^32, and the cleverness is that you treat this value as a float. A bit mask keeps only the mantissa bits of a float-32, which is then OR'd with a value that sets the exponent to 127. Subtracting 1 from the result gives a random value between 0 and 1, using only simple bitwise operations!
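For anyone who can't get at those notes, the trick can be sketched in a few lines. This is my own JavaScript rendering using a typed-array view in place of the C union from the book (function and variable names are mine):

```javascript
// Sketch of the NR bit trick: keep 23 random bits as the mantissa, set the
// exponent field to 127 so the value lies in [1, 2), then subtract 1.0 to
// land in [0, 1). A shared ArrayBuffer lets us reinterpret the bits.
const buf = new ArrayBuffer(4);
const u32 = new Uint32Array(buf);
const f32 = new Float32Array(buf);

function bitsToUnitFloat(randomBits) {
  // 0x007FFFFF masks the 23 mantissa bits; 0x3F800000 sets exponent 127.
  u32[0] = (randomBits & 0x007fffff) | 0x3f800000;
  return f32[0] - 1.0;
}
```

Feeding it all-zero bits gives exactly 0, and all-one bits give a value just below 1.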
This is well-known, and almost trivial.
But because I was curious I looked this up in NR 3rd edition. Section 7.1.5 says
> The steps above that convert a 64-bit integer to a double precision floating-point value involves both a non-trivial type conversion and a 64-bit floating multiply. They are performance bottlenecks. One can instead directly move the random bits into the right place in the double word with a union structure, a mask, and some 64-bit logical operations; but in our experience this is not significantly faster
It goes on to describe a lagged Fibonacci generator which generates values directly as floating point.
This is a widely used method to map random integers to floating point numbers, but it has the disadvantage of wasting 1 bit of float mantissa precision.
On modern CPUs, its computational advantage over full-precision mapping methods, such as multiplication by a float, is not always clear [1].
The issue with this approach is that the interval [1,2) has far fewer representable values than [0,1) in IEEE-754 floats, so the range of generatable values is only a fraction of the representable ones.
On modern hardware, you should instead use a count-leading-zeroes or count-trailing-zeroes instruction on a uniform bit pattern to generate the exponent directly. This is the approach taken in the Zig standard library.
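I don't have the Zig source in front of me, but the idea can be sketched in JavaScript with Math.clz32. This is my own illustrative version, not the actual standard-library code:

```javascript
// Sketch: draw the exponent geometrically with count-leading-zeroes, so
// every representable float in [0, 1) can in principle be produced.
function randomFloat01(rng32) {
  // Each leading zero bit halves the value, matching how IEEE-754 floats
  // get denser toward zero.
  let exponent = 126; // biased exponent of [0.5, 1)
  for (;;) {
    const bits = rng32() >>> 0;
    if (bits !== 0) {
      exponent -= Math.clz32(bits);
      break;
    }
    exponent -= 32; // a whole word of zeros: keep halving
    if (exponent < 1) break;
  }
  if (exponent < 1) exponent = 0; // bottomed out in subnormal/zero territory
  const mantissa = (rng32() >>> 0) & 0x007fffff; // 23 random mantissa bits
  const buf = new ArrayBuffer(4);
  new Uint32Array(buf)[0] = (exponent << 23) | mantissa;
  return new Float32Array(buf)[0];
}
```

With a source that always returns 0x80000000 (top bit set, zero mantissa), this yields exactly 0.5.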
You are right of course, but I would contend that generating subnormal floats as well is, in 99.9% of cases, a needless cost. The reason is that, more often than not, all this extra precision will be immediately lost in subsequent operations.
A very common situation, for instance, is to plug the random number into a log, in which case you need to use log(1-r) rather than log(r) to avoid an infinity at r=0. The problem is, by doing this simple subtraction you have already lost all the subnormal precision.
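A quick demonstration of the loss: for any r below about 1.1e-16 (already far above the subnormal range), 1 - r rounds back to exactly 1 in double precision.

```javascript
// The smallest positive subnormal double, subtracted from 1, vanishes:
// 1 - r rounds to exactly 1, so log(1 - r) is exactly 0 and every bit of
// the low-end precision is gone.
const r = Number.MIN_VALUE; // smallest positive subnormal double
const lost = 1 - r;         // rounds back to exactly 1
console.log(lost === 1);
console.log(Math.log(1 - r));
```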
As pointed out in the comments of the post, this is an LCG[1] PRNG. The Wikipedia article has a good discussion on how to choose parameters, with examples of use in the wild from various popular projects over the last many decades.
And its first reference is to what I consider the definitive reference on the subject, at least for the state of the art in 1968: The Art of Computer Programming, Vol. 2: Seminumerical Algorithms.
I also think the only additions that might be needed to that are mentions of newer algorithms that are comparatively simple, but better. The subject itself is pretty well-trodden.
Knuth does mention the extremely simple additive random number generator
X[n] = (X[n-24] + X[n-55]) mod m
(And says that this can be used directly with floating point values to produce floats in the range [0,1))
It uses 55 words of state (initialized to not all be even), needs only about ten simple instructions to implement (in particular, no multiplications or divisions), and has a period of at least 2^55 - 1. Knuth said it's very good in practice and isn't known to be bad, and ends with (in the 1980 second edition of volume 2, the first edition being from 1968):
“The only reason it is difficult to recommend sequence (7) wholeheartedly is that there is very little theory to prove that it does or does not have desirable randomness properties; essentially all we know for sure is that the period is very long, and this is not enough”
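For the curious, here's a minimal sketch of the recurrence (my own code; the LCG used to fill the initial state is an arbitrary illustrative choice, not Knuth's recommended initialization routine):

```javascript
// Knuth's additive (lagged Fibonacci) generator with lags 24 and 55,
// modulo 2^32. The state must not be all even, or every output stays even.
function makeLagFib(seed) {
  const state = new Uint32Array(55);
  // Illustrative seeding only: a throwaway LCG fills the 55-word state.
  let x = seed >>> 0;
  for (let i = 0; i < 55; i++) {
    x = (Math.imul(x, 1664525) + 1013904223) >>> 0;
    state[i] = x;
  }
  state[0] |= 1; // guarantee at least one odd word
  // With state[i] = X_i, the next value is X_55 = X_31 + X_0 (mod 2^32),
  // stored over the oldest entry; both indices then advance circularly.
  let j = 31, k = 0;
  return function next() {
    const v = (state[j] + state[k]) >>> 0; // addition mod 2^32
    state[k] = v;
    j = (j + 1) % 55;
    k = (k + 1) % 55;
    return v;
  };
}
```

As Knuth notes, the same recurrence works directly on floating-point values (add mod 1) to produce floats in [0,1).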
I can’t find anything more recent about this RNG. Has it been forgotten, or was it shown to be bad in some important sense?
In the third edition (1998) there are a few more notes, most importantly:
"Lagged Fibonacci generators have been used successfully in many situations since 1958, so it came as a shock to discover in the 1990s that they actually fail an extremely simple, non-contrived test for randomness" with reference to exercise 3.3.2-31 asking "prove that if we generate 79 consecutive random bits ... starting at a random point in the period, the probability is more than 51% that there are more 1s than 0s".
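The failure is easy to reproduce numerically. Here's a rough Monte Carlo sketch (my own code; I'm reading "a random point in the period" as a uniformly random 55-bit state, which is essentially equivalent since the low-order bits follow b[n] = b[n-24] XOR b[n-55]):

```javascript
// Estimate how often a window of 79 consecutive low-order bits of a
// lag-(24,55) additive generator contains more 1s than 0s. An unbiased
// source would give exactly 50% (79 is odd, so ties are impossible).
function moreOnesFraction(trials) {
  let hits = 0;
  const b = new Uint8Array(79);
  for (let t = 0; t < trials; t++) {
    // Random 55-bit window of the bit sequence (the all-zero state has
    // negligible probability and is ignored here).
    for (let i = 0; i < 55; i++) b[i] = Math.random() < 0.5 ? 1 : 0;
    // The next 24 bits are forced by the XOR recurrence.
    for (let i = 55; i < 79; i++) b[i] = b[i - 24] ^ b[i - 55];
    let ones = 0;
    for (let i = 0; i < 79; i++) ones += b[i];
    if (ones > 39) hits++; // strictly more 1s than 0s
  }
  return hits / trials;
}
```

A run of a couple hundred thousand trials should land around 0.51-0.52 rather than 0.50, consistent with the exercise's claim.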
There's a recommendation to generate numbers in batches of X and only use the first Y of each batch, with Y an order of magnitude smaller, citing Lüscher's RANLUX, but I don't know how much that holds up either.
FYI if you only have the 2nd edition of volume 2, Knuth provides the full text of the changes from the 2nd to 3rd edition electronically: https://www-cs-faculty.stanford.edu/~knuth/taocp.html (search for "Errata for Volume 2 (2nd ed.)")
Since it's too late to edit the earlier post, this is the specifics around discarding (p571):
Lüscher's discarding technique can be used to avoid the bias towards 1s. For example, with lags 55 and 24, no deviation from randomness is observed for random walks of length 1001 when the numbers are generated in batches of 165, if only the first 55 numbers of each batch are used.
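The batching scheme itself is trivial to sketch (my own illustrative code, wrapping any generator step function):

```javascript
// Lüscher-style discarding: emit the first `use` numbers of each batch of
// `batch` and burn the rest. Defaults follow the 55-of-165 example above.
function makeDiscarding(next, use = 55, batch = 165) {
  let emitted = 0;
  return function () {
    if (emitted === use) {
      for (let i = use; i < batch; i++) next(); // discard the rest of the batch
      emitted = 0;
    }
    emitted++;
    return next();
  };
}
```

Wrapping a simple counter shows the effect: the first 55 outputs are 1..55, then the 56th output jumps to 166 because 110 values were thrown away.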
And this is when comments in code are important! Any random numbers without a source are immediately suspect to me, especially in something that needs to be secure. A comment will save your coworkers and peers time trying to figure out why it's there.
It’s an incantation that’s propagated for 50+ years because it’s minimal and effective. Over time, it’s been fully distilled to those properties.
Since comments aren’t essential to being minimal and effective, they don’t survive the distillation.
Think of it like a clever gist that got pasted and shared a hundred times. Even if the original source had explained every step in great detail, with inline comments, deep explanatory discourses, citations to prior art, etc., they'd eventually get trimmed away as fat as people repeatedly prune it down to the "important" bits, paste those into their own copies, and then later share the trimmed copies, ad infinitum.
Since it's an LCG, it does not matter how it was derived, as long as the randomness properties are known: cycle length, the set of initial states that yield identical sequences, and dispersion properties. Perhaps also performance.
These should be documented.
It's a rather weak PRNG with a short cycle, so the suspicion is that it was made for particular dispersion properties, such as for a hash table of particular data and size, or some other bucketing algorithm.
It is interesting -- 5/9 of the books listed there are clearly numerics textbooks, so it isn't surprising they talked about a non-cryptographic PRNG. Curious about the other 4. But maybe this shows up as a "here's why you should be careful what type of RNG you are using" type example.
When you find this algorithm, with or without comments about where the numbers come from, at a point that should be secure, you should be more than suspicious. In any case, this is not a secure PRNG.
For PRNGs like this, constants are often chosen by guess and check. This algorithm (an LCG) has a bit more theory, so these might not have been chosen that way, but the author of the code probably didn't have any insight either.
Seed 23478978945674839578275983475348957348597345984375345834753498 produces 0.4444444444444444, am I insanely lucky in my keyboard spam or is this a common output?
The person in the comments said the period is only ~233k, which is correct, because you are always taking modulo 233280. But it's also easy to verify experimentally:
let state = 6, iters = 0;
do {
  state = (state * 9301 + 49297) % 233280;
  iters++;
} while (state !== 6);
console.log("period is", iters, "iters");
// period is 233280 iters
> so that the multiplication and addition can be done on a signed 32-bit integer without overflow
Except they can't: any seed in the range 230888 through 233279 will produce a signed integer overflow when multiplied by 9301 (e.g. 9301 * 230888 = 0x80001608).
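You can check this from JavaScript, since Math.imul performs a C-style 32-bit multiply:

```javascript
// The exact product exceeds 2^31 - 1, so a signed 32-bit multiply wraps
// around to a negative value.
const exact = 9301 * 230888;             // exact in a double: 2147489288
const wrapped = Math.imul(9301, 230888); // what int32 arithmetic yields
console.log(exact > 0x7fffffff);         // overflows signed 32-bit range
console.log(wrapped);                    // wrapped to a negative value
```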
That depends on the language. For example BASIC used to have only 16-bit or 32-bit signed integers (depending on the processor; plus strings and floating point), so this particular RNG can be useful for its portability.
Such dialects of BASIC also offered defined overflow behavior (it's really only C/C++ and other things targeting MCUs that don't, and even there it's only relatively recently that "undefined overflow" came to mean something more practically dangerous than "implementation defined"), so it was still not a concern.
In case people don't know this story, or worse, believe the conspiracy theories, here's what happened with the development of DES:
> "According to Steven Levy, IBM Watson researchers discovered differential cryptanalytic attacks in 1974 and were asked by the NSA to keep the technique secret. Coppersmith explains IBM's secrecy decision by saying, "that was because [differential cryptanalysis] can be a very powerful tool, used against many schemes, and there was concern that such information in the public domain could adversely affect national security." Levy quotes Walter Tuchman: "[t]hey asked us to stamp all our documents confidential... We actually put a number on each one and locked them up in safes, because they were considered U.S. government classified. They said do it. So I did it".
Bruce Schneier observed that "It took the academic community two decades to figure out that the NSA 'tweaks' actually improved the security of DES."
In case people don't know this story, here's what happened with the development of Dual_EC_DRBG:
> Sometime before its first known publication in 2004, a possible kleptographic backdoor was discovered with the Dual_EC_DRBG's design, with the design of Dual_EC_DRBG having the unusual property that it was theoretically impossible for anyone but Dual_EC_DRBG's designers (NSA) to confirm the backdoor's existence. Bruce Schneier concluded shortly after standardization that the "rather obvious" backdoor (along with other deficiencies) would mean that nobody would use Dual_EC_DRBG.[4] The backdoor would allow NSA to decrypt for example SSL/TLS encryption which used Dual_EC_DRBG as a CSPRNG.[5]
> Members of the ANSI standard group, to which Dual_EC_DRBG was first submitted, were aware of the exact mechanism of the potential backdoor and how to disable it,[6] but did not take sufficient steps to unconditionally disable the backdoor or to widely publicize it. The general cryptographic community was initially not aware of the potential backdoor, until Dan Shumow and Niels Ferguson's publication, or of Certicom's Daniel R. L. Brown and Scott Vanstone's 2005 patent application describing the backdoor mechanism.
> In September 2013, The New York Times reported that internal NSA memos leaked by Edward Snowden indicated that the NSA had worked during the standardization process to eventually become the sole editor of the Dual_EC_DRBG standard,[7] and concluded that the Dual_EC_DRBG standard did indeed contain a backdoor for the NSA.[8] As response, NIST stated that "NIST would not deliberately weaken a cryptographic standard."[9] According to the New York Times story, the NSA spends $250 million per year to insert backdoors in software and hardware as part of the Bullrun program.[10] A Presidential advisory committee subsequently set up to examine NSA's conduct recommended among other things that the US government "fully support and not undermine efforts to create encryption standards".[11]
Ah, nice. I was thinking along the lines of the 1960s, as that's when the seeds in the algorithm above were created. So I assumed the comment was about the NSA messing with it back then. Didn't think about recent events.
This is what bugs me about the Stack Exchange model. Most of the comments are guesses by people who really don't know. The "correct" answer doesn't explain anything; it just points to some sources. The highest-ranked comment, which comes closest, attempts to explain the nature of the numbers (the modulus is the period, and the primes stretch out the cycle) and fires off a link to Wikipedia.
Then it appears on HN, with more people half guessing, but better content in general.
Based on yesterday's post, BYTE magazine would have actually given an answer with both why and how.
EDIT: Granted, the link DOES go to Wikipedia, which has a great explanation, but in general, providing a link and not an answer is a poor model because links decay so quickly.
It took me a while to understand your comment, until I realized there are two interpretations of the question of where these numbers come from:
• One is a historical question: who first used these numbers and when? who copied from where?
• The other is a conceptual question: why might someone choose these numbers, what do they "mean", how do we interpret these constants?
The former is purely a question of facts and there is nothing to "explain the nature of the numbers", and this is what the answerer on Stack Exchange seems to have interpreted it as. Under this interpretation saying things like "the modulus is the period" is not really germane to the question; what's required is to search sources and try to find the chronologically earliest one. (The answer doesn't seem to have done a definitive job, but it's a reasonable start.)
This is like synchronic vs diachronic in linguistics: under the latter (synchronic) interpretation you mainly just want to understand the random-number generator itself.
Reading the question it seems pretty clear to me they're interested primarily in the history: "Who invented this algorithm and tested the distribution? Is there a paper or something to cite?"
Yes that's what I thought too (which is why I found the complaint hard to understand until I thought of the alternative interpretation). But clearly at least the person I was replying to understood it differently, and it was also the top comment on this thread for a while at the time I posted my comment, so others may have thought the same too. Even at the question, on a comment on the answer the asker mentions "why these numbers are special" and comments about them being prime etc, so it's possible they didn't have a clear idea what they wanted either.
One of my favorite answers was about programming languages written by a Jörg W Mittag. It was a comparison between shells like bash and proper programming languages and featured a huge table comparing syntax, showing how languages are supposed to scale up in complexity while shells are supposed to scale down.
Never found it again on the site. Years ago I remember digging up a link to the question but got the 404 page when I followed it.
I think I found the answer you mean! https://stackoverflow.com/questions/3637668/why-are-scriptin... — the topic matches, there's a table comparing syntax (though I wouldn't call it huge), and it includes the idea of scaling up to large programs vs scaling down to 10 character programs.
Yeah, that's the one. It must have been years since I tried looking for it. I didn't know it had been restored... Thanks for looking it up! I can see the programmer jokes question as well.
> This is what bugs me about the stackexchange model.
> providing a link and not answer is a poor model because links decay so quickly.
Providing a link and not an answer is not the Stack Exchange model. On the Stack Exchange network, such answers are often (not always, of course) met with a comment requesting that the answer be expanded.
That might be true in many places, and I agree with the general sentiment that the answer should provide an answer and not just a link to an answer. But wikipedia links are generally pretty stable. It’s not some random blog post.
Oh, good point. OP was looking for original source, papers referring to, etc. Too late to edit, but minus points for me for not comprehending the original question.
I don't think you were far off what the asker was looking for. In a comment, they say "I was kind of hoping to be able to point to some specific research as to why these numbers are special".
What's NR, what's a Necronomicon, and what are klaatu, barada, and nikto?
I googled all these things and I still can't make any sense of your comment, except that it's intentionally confusing with some popular fiction references?
"Klaatu barada nikto" is a code phrase from The Day the Earth Stood Still to stop the homicidal antagonists from killing you.
It's doubly famous because Sam Raimi recycled it in "Army of Darkness" as an incantation to dispel the curse of the Necronomicon (book of the dead, portrayed as the arch book of black magic in any number of stories), which our hapless anti-hero Ash fucks up and thus unleashes the armies of the dead on a roughly medieval and wholly unprepared world.
If you like campy horror movies, there is so much camp in Army of Darkness that there's barely any room for the horror. It is technically a sequel but it works as a standalone movie. I watched it years before I finally sat down and watched The Evil Dead.
NR is apparently Numerical Recipes, a book. I think they mean that the constants are not well understood, like the phrase "Klaatu barada nikto" was not understood, but still had an effect.
In addition to what the other answers have said, the Necronomicon is the book of dark magic in the Lovecraft mythos, from which it has been copied into some other mythoi.
Yeah, I'm pretty sure this isn't the same kind of thing as Carmack's (edit: not Carmack's) legendary inverse square root function [1], which has that 0x5F3759DF constant (edit: magic number):
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;
    x2 = number * 0.5F;
    y = number;
    i = * ( long * ) &y;                        // evil floating point bit level hacking
    i = 0x5f3759df - ( i >> 1 );                // what the fuck?
    y = * ( float * ) &i;
    y = y * ( threehalfs - ( x2 * y * y ) );    // 1st iteration
    // y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
    return y;
}
Not Carmack's, it says so in the link you provided.
> The algorithm was originally attributed to John Carmack, but an investigation showed that the code had deeper roots in the hardware and software side of computer graphics. Adjustments and alterations passed through both Silicon Graphics and 3dfx Interactive, with the original constant being derived in a collaboration between Cleve Moler and Gregory Walsh, while Gregory worked for Ardent Computing in the late 1980s.[3] Walsh and Moler adapted their version from an unpublished paper by William Kahan and K.C. Ng circulated in May 1986.
I think the comment author meant "a magical constant that has been found by someone by empirical method without a nice paper pointing why it's a good value"
(even though I think now the constant above has a paper on why it's actually so good)
This was an example of a seemingly random magic number appearing in code for a very clever hack. I think the OP was assuming there was something similar at work with the PRNG magic numbers.