Hacker News new | past | comments | ask | show | jobs | submit login
Pseudo-Random vs. True Random (boallen.com)
77 points by llambda on June 9, 2013 | hide | past | favorite | 47 comments



It is of course impossible to get true randomness from a deterministic algorithm, but many pseudo-random generators (PRG) are often good enough for practical applications, for example statistical simulations.

Bad PRGs can be ruled out by tests like the one in the article, that is, showing that the exhibit some regularities that would be unlikely to get from a truly random source. An infamous example is RANDU [1], whose output is concentrated on low-dimensional subsets.

Smart people have put together batteries of statistical tests to assess the quality of PRG, the most famous being the Diehard test [2]. The idea is that if a PRG passes all the tests it is indistinguishable from a truly random source for all practical purposes.

Good generators such as the widely used Mersenne Twister pass most of its tests. There are even better generators that pass more tests, such as Marsaglia's Xorshift [3], which is faster and simpler than the Mersenne Twister, and surprisingly not very well known.

For anyone interested in PRGs, I strongly recommend reading Marsaglia's original paper [4], it is a good example of how to design a principled PRG and it just requires some elementary linear algebra on finite fields.

[1] http://en.wikipedia.org/wiki/RANDU

[2] http://en.wikipedia.org/wiki/Diehard_tests

[3] http://en.wikipedia.org/wiki/Xorshift

[4] http://www.jstatsoft.org/v08/i14/paper


The Diehard tests are great, unfortunately Marsaglia is no longer with us. I recommend the Dieharder tests, which reimplement the Diehard tests but are licensed under the GPL and are actively maintained.

http://www.phy.duke.edu/~rgb/General/dieharder.php


There's a paper by Panneton and l'Ecuyer (linked from the Wikipedia article about Xorshift) that seems to indicate that simple Xorshift generators fail some statistical tests very badly. (Not the ones in Marsaglia's Diehard suite -- he tested them with those -- but others.)

I think a better approach is another recommended by Marsaglia: take a few different RNGs working on different principles, and just combine them by (say) adding them up.

Or, of course, just use something like the Mersenne Twister, which is more complicated than (e.g.) Marsaglia's KISS but there are enough good implementations out there that it doesn't really matter.


Thought I read that the Mersenne Twister has a major vulnerability. It's possible from a small number of samples (<1000) to predict the future values. Hmmm, this is the best link I can find http://b10l.com/?p=24


Of course it has this "vulnerability": 624 of those values are MT's internal random seed.

RNGs have two primary functions: to provide statistically random values (for simulations, say) and to provide unpredictable values (mostly for crypto). These are different goals, and many RNGs are good at one but not the other. That's why we have both RNGs and Crypto RNGs.

Like many good RNGs for simulation, Mersenne Twister is not, and never has been, a Crypto RNG. It is explicitly in the first category.


Reminds me of those test for cyphers that test if your cypher gives you something close to a random function. If it does you're good.


I run an online dice roller for pen&paper roleplaying gamers, it does use mt_rand() and the results are from what I can tell random enough (no discernable periodicities, no preferred numbers, averages and median are where you expect them, and so on).

However, every so often I get a bug report professing some bias of the RNG. Turns out, people want the numbers to be not random but uniformly distributed even over small sample sizes of about 10 to 20 dice rolls. If the same number comes up three times in ten D20 rolls, they assume it must be a bug. The same happens with physical dice, by the way, I've seen players discard "unlucky" dice like that.


Didn't Apple have a similar problem with the ipod? People complained that it was not random because it would sometimes pick the same track twice or would pick tracks from the same artist more often than people expected.

I believe they fixed it by changing the odds every time a track was played so it would seem to be making a more random selection, although technically it is now less random.


Of course, people might not actually expect random playback when they hit shuffle. Changing odds sounds like a good approach.

edit: similar to how some players generate a random playlist (every track on listed once, random order). Then on loop, either create an entirely new list, or simply loop.


Yes! IMO they should keep track of songs played in the shuffle mode over the sessions and try to uniform it instead of providing randomness.


> Didn't Apple have a similar problem with the ipod? People complained that it was not random because it would sometimes pick the same track twice or would pick tracks from the same artist more often than people expected.

I always thought this was by design: the "shuffle" algorithm wasn't designed to be random, but would instead occasionally group songs from the same artist or genre together (and/or favor songs w/ higher play counts_ - i.e. maintain a "theme" for a couple of songs before changing to something different)

I can't find any supporting evidence for this, however, and a lot of the discussions around their algorithm are hearsay.


These randomness discussions always remind me of Games by Email's mass dice roller Dice-o-matic: http://gamesbyemail.com/News/DiceOMatic


My guess would be that typical pen&paper dice would have biases in excess of typical software PRNG implementations (although I haven't exactly researched this topic). You're unlikely to detect any biases with a sample size of a few hundred throws.


Starting with Ivy Bridge processors, Intel has included the RdRand instruction which uses random processor noise to generate random numbers.

http://en.wikipedia.org/wiki/RdRand

Pretty cool in my opinion; at least for applications where you don't need to repeat the same set of numbers.


How fast is this instruction? And how to access it with a C program in Linux?


I used RDRAND to seed a SSE Fractal Flame generator (which is a stochastic system which needs 4 random numbers per thread per loop iteration, sometimes more depending on the variations used). RDRAND has a maximum throughput of something like 500MB/s, and it takes approximately 150 clocks per invocation. I was never able to hit that performance wall with my fractal program. My fractal program is also significantly faster than other CPU implementations, though that is probably due to the intrinsic vectorization more than the random number source. I also got significantly better looking fractals then what I would get with most of the PRNG that I tried.

For more check the "performance" section of this article

http://software.intel.com/en-us/articles/intel-digital-rando...

Note: if you need more than 500MB/s you can uses RDRAND (or RDSEED in Broadwell, when it comes out) to seed a PRNG. I was doing this at first, but the performance of the system didn't improve enough for it to be worth the added complexity.

And if you want to access it in linux from C you can either:

* Use inline assembly, just be sure to check the Zero flag after calling RDRAND, because if RDRAND fails (you're exceeding the 500MB/s) the zero flag isn't set, so you have to just keep calling RDRAND until it is.

* Use intrinsics (easier, immintrin.h), here's how I did it in my program (bug reports welcome, I'm only a freshman in college who had lots of free time and a fascination with fractals)

https://github.com/aarongolliver/FractalFlameMicahTaylorEdit...

http://software.intel.com/sites/products/documentation/studi...


Do you have any screenshots or samples of the fractals? Just curious.


RDRAND can do > 500MB/s when invoked by 8 threads running in parallel: http://software.intel.com/en-us/articles/intel-digital-rando... the theoretical maximum is 800MB/s

I wrote an x86-64 asm impl as part of my lightweight Java crypto library (https://github.com/wg/crypto) would be easy to drop into any C program: https://github.com/wg/crypto/blob/master/src/main/asm/rdrand...

Intel released an open source library too, though in tests my asm impl was faster ;-) http://software.intel.com/en-us/tags/20757


In Linux you can use /dev/random and /dev/urandom


Do those use that CPU instruction if it is available? That would make the blocking /dev/random a lot faster, wouldn't it?


Depends on your kernel. But with the right kernel, yes.


I'm wondering how this piece of n00bishness is at the very top in HN.

The only reason there is a perceivable diference (in the presence of patterns) is that rand() is not very solid (it's optimised for speed, they recommend mt_rand() where a stronger generator is needed, and you can still do much better).

PHP's mt_srand is a Mersenne Twister, which is pretty good http://en.wikipedia.org/wiki/Mersenne_twister


> I'm wondering how this piece of n00bishness is at the very top in HN.

Probably because it was interesting enough to stand out against the NSA/PRISM articles that have dominated the front page for the past few days.

Also, was the term 'n00bishness' necessary? I don't see any cause for attacking the author.


That's exactly what the article says.


It does now. It didn't when I posted that. Or I might have missed it before?

Still, tries to sell "pseudo" vs "true" as if this image showed the difference.


If you like this, look at lcamtuf's "Strange Attractors" visualizations of TCP sequence numbers (old, but great use of visualization).

http://lcamtuf.coredump.cx/oldtcp/tcpseq.html


I remember wondering about how could Randomness be a thing on my old calculator when I was something like 10. I talked about it later on to my friends and no one would understand me when I said that a computer couldn't give you true Rand() functions. No one would understand. So, because there was no wikipedia and I knew no one with enough knowledge I just decided that Rand() on calculators and computers was actually taken from a table that was initialized through a uniform distribution of numbers.

I guess I was wrong.


Occasionally you come across tables in very old statistics textbooks for generating random numbers that are just lists. You close you eyes, stand up and drop a pen to select a number.


If you like neat pictures of random bits take a look at page 6 of Playing Hide and Seek with Stored Keys by Shamir et al[1]

   In this paper we consider the problem of efficiently locating
   cryptographic keys hidden in gigabytes of data, such as the
   complete file system of a typical PC. We describe efficient
   algebraic attacks which can locate secret RSA keys in long
   bit strings, and more general statistical attacks which
   can find arbitrary cryptographic keys embedded in large
   programs.  These techniques can be used to apply lunchtime
   attacks on signature keys used by financial institutes, or
   to defeat authenticode type mechanisms in software packages.

   Keywords: Cryptanalysis, lunchtime attacks, RSA, authenticode, key hiding.
[1 http://www.cs.jhu.edu/~astubble/600.412/s-c-papers/keys2.pdf


Could be better titled "PHP's rand() is utterly broken".


* for some definitions of utterly


If you want to learn what's actually going on be sure to follow the link at the end of the post:

http://cod.ifies.com/2008/05/php-rand01-on-windows-openssl-r...


In case anyone's curious (as I was) about Javascript's Math.random(), here's an implementation I wrote and executed in Chrome on a 300x300 div: http://i.imgur.com/63M2Adb.png


For something more in-depth about the types of randomness, Brian Hayes has a superb article:

http://bit-player.org/2011/a-slight-discrepancy


OK so what is random.org doing that they can claim it's a true random number generator (TRNG)? I looked at the page but didn't see any explanation, maybe they don't say, but as others have said there's no way it can be a TRNG. edit: I see the explanation on random.org

I'm not a mathematician or computer programmer but wouldn't a TRNG require infinite memory, infinite storage, infinite time to generate etc.? One number generated may be 2 but the next number may be negative infinity.

I work in a casino as a slot tech and sometimes even though I know it's not true some patrons can almost convince you there are patterns.


They use atmospheric noise to add entropy to their RNG. The definition of "True" in this case is rather loose, but once you get enough bits of entropy you can feed it into a PRNG or conditioners to give the output the qualities you want.

Here is some history of random.org http://www.random.org/history/

and the wiki page http://en.wikipedia.org/wiki/Random.org

the wiki page for atmospheric noise briefly discusses it's applications to "high quality" RNG http://en.wikipedia.org/wiki/Atmospheric_noise

Unfortunately, the source code to the RNG is close-source, and they say they aren't going to release it. But my guess is they use that random noise as a seed to a PRNG. Their FAQ has quite a lot of information though: http://www.random.org/faq/#Q1.2


> wouldn't a TRNG require infinite memory, ... etc.? One number generated may be 2 but the next number may be negative infinity.

I may be misunderstanding you, but true randomness doesn't mean 'a distribution that's non-zero over the whole integers' - those are orthogonal concepts. You can have a TRNG that gives you a random choice from just {0,1} - the size of the distribution isn't what makes it a TRNG, that just determines how much entropy you need.

(And a uniform distribution over the integers is impossibly no matter how good your RNG is, as it's mathematically undefinable).


>wouldn't a TRNG require infinite memory, infinite storage, infinite time to generate etc.? One number generated may be 2 but the next number may be negative infinity

What are you talking about here? Generating random Real numbers or something? First off you'd have to define a distribution of some sort or your goal is meaningless...but throw all that out. It provides random the sensible way, one bit at a time. Interpret that bit however you want. Memory requirements: zero (or maybe O(1) depending on design). Time requirement: O(n) where n is the amount of entropy you desire.


I guess what I'm asking is say I have a device that can generate a stream of numbers, integers, nothing fancy.

I want random integers with no limit to the size returned, a truly random number and by that I mean a true random integer without any limitation in size.

Really I guess I'm trying to grasp how a true random anything could exist or more to the point how a person could make a device or even be able to know if a random number or anything has been returned as a result.


I used this code and just replaced rand() with mt_rand() in PHP to get this: http://i.imgur.com/EvjKnsO.png

If you ask me, it is look pretty random.


sure about that? http://i.imgur.com/gQkgwR3.png

(sorry for this)


It's easy to blame PHP (edit: not that I think the author meant to suggest that PHP or Windows is at fault), but LCGs are quite common in standard libraries and really bad at this particular task:

https://en.wikipedia.org/wiki/Linear_congruential_generator#...


with php in windows, this is apparently the best I can generate: http://i.imgur.com/4nCOI9B.jpg.

using this:https://gist.github.com/kennethrapp/198e419d1b620cddbc7d which should work a lot better in linux, but I haven't tested it with the image thing yet so who knows?


Slightly unrelated but is there a better algorithm that can give me "nth random number after a seed"?

Currently I am using something like

r = (n*very_big_number)+seed


I think PHP's rand() just uses the C rand(). So you would think that would be pretty good - well tested.


Well tested? Yes.

Pretty good? It has been known for decades that it is pretty bad.


I am shocked-- shocked!-- to learn that PHP's random number generator is seriously flawed, like everything else in PHP.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: