Really Fixing Getrandom() (lwn.net)
154 points by Tomte on Oct 18, 2019 | 101 comments



This is a continuation of this discussion "Fixing Getrandom()" (https://news.ycombinator.com/item?id=21114524).

I love this story: a completely unrelated kernel commit that reduced the amount of file I/O caused the kernel to not generate enough entropy at boot, and in consequence made the system unbootable.


My response to Linus:

I'm OK with this as a starting point. If a jitter entropy system allows us to get past this logjam, let's do it. At least for the x86 architecture, it will be security through obscurity. And if the alternative is potentially failing where the adversary can attack the CRNG, it's my preference. It's certainly better than nothing, in that in the very worst case, "security through obscurity" is better than "don't block, because blocking is always worse than a guessable value being returned through getrandom(0)". And so if these are the only two options, I'm certainly going to choose the former.

That being said, I'm still very worried that people with deep access to the implementation details of a CPU might be able to reverse engineer what a jitter entropy scheme produces. For that reason, I'd very much like to see someone do an analysis of how well these jitter schemes work on an open, simple architecture such as a RISC-V implementation (you know, the ones that were so simplistic and didn't have any speculation, so they weren't vulnerable to Spectre/Meltdown). If jitter approaches turn out not to work that well on RISC-V, perhaps that will be a goad for future RISC-V chips to include the crypto extension to their ISA.

In the long term (not in time for the 5.4 merge window), I'm convinced that we should be trying as many ways of getting entropy as possible. If we're using UEFI, we should be trying to get it from UEFI's secure random number generator; if there is a TPM, we should be trying to get random numbers from the TPM, and mix them in, and so on.

After all, the reason why we lived with getrandom(0) blocking for five years was because for the vast majority of x86 platforms, it simply wasn't a problem in practice. We need to get back to that place where, in practice, we've harvested as much uncertainty from hardware as possible, so most folks are comfortable that attacking the CRNG is no longer the simplest way to crack system security.

(The above is a lightly edited amalgamation of https://lore.kernel.org/r/20190930033706.GD4994@mit.edu and https://lore.kernel.org/r/20190930131639.GF4994@mit.edu)


> "don't block, because blocking is always worse than an guessable value being returned through getrandom(0)".

It seems that this fight has continued for so long because both possible answers are valid here, depending on the exact use case of the system as a whole(+)? Have we got an interface which lets you properly specify the paranoia vs. best-effort choice yet?

(+) e.g. the SSH server running on your lightbulb may accept that it's not exposed to the internet and therefore not subject to targeted attack, but it's really inconvenient to sit in the dark, so the security-availability tradeoff is worth it.

> most folks are comfortable that attacking the CRNG is no longer the simplest way to crack system security.

Was it ever? It seems that phishing the admins has always been the easiest way.


I proposed a solution where you could specify this via a boot command-line option, and/or a default specified via a compile-time kernel config option. That got rejected because most users wouldn't be able to find the tuning knob, most distros wouldn't want the reputational/liability risk of "insecure by default", and from Linus's perspective, confused users seeing their system block on boot and then e-mailing Linus == BAD.

As far as whether it might be simpler to attack the CRNG, please see https://factorable.net/ where researchers found that at one point 10% of all publicly reachable internet hosts had insecure SSH keys due to weaknesses in Linux's random number generator. Granted, these were mostly end users' home routers (many of which had open ssh ports in the day), and most servers on x86 wouldn't have been nearly as vulnerable. But the "Mining your p's and q's" paper is one of the key reasons why I remain very conservative about Linux's random driver to this day. I got the pre-publication notification on July 3rd, and I spent all day on July 4th on top of the Boston Science Museum's parking garage with a laptop, frantically trying to come up with a good fix to the problem while my very patient and accommodating girlfriend and I were waiting for the Boston Pops Fireworks show in the evening.... This is not the kind of thing one wants to do again!


I’d definitely prefer a build-level option. Some of us are working with some internal systems that haven’t (yet) been embraced by the loving arms of automation.


If your lightbulb is running ssh rather than telnet, it is assuming some source of entropy, or it may as well be running telnet. There has been some work on hardening handshakes in protocols where randomness might not be available, which it could look at.


If the lightbulb is running SSH it's most likely running on a microcontroller, which almost certainly has a built-in analog-to-digital converter and thus can get entropy from thermal noise. Sampling either a floating pin or an internal band-gap voltage reference should produce some bits of entropy.
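Roughly like this (untested sketch; adc_read_raw() stands in for whatever the vendor HAL actually provides, and the raw bits should only ever be used to seed a proper CSPRNG):

    /* Sketch: harvest entropy from ADC noise on a microcontroller.
     * adc_read_raw() is a hypothetical HAL call returning a raw sample
     * from a floating pin or the internal band-gap reference. */
    #include <stdint.h>

    extern uint16_t adc_read_raw(void);   /* hypothetical vendor HAL */

    /* Collect one byte of (hopefully) unpredictable data by keeping only
     * the least-significant bit of many samples, Von Neumann debiased:
     * take pairs of LSBs, keep 10 -> 1 and 01 -> 0, discard 00 and 11. */
    static uint8_t adc_entropy_byte(void)
    {
        uint8_t b = 0;
        for (int i = 0; i < 8; i++) {
            for (;;) {   /* loops until a usable pair shows up */
                uint8_t a = adc_read_raw() & 1;
                uint8_t c = adc_read_raw() & 1;
                if (a != c) { b = (uint8_t)((b << 1) | a); break; }
            }
        }
        return b;   /* feed into a CSPRNG seed pool, never use raw */
    }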


Does that really need to be true? Protocols with fixed keys can skip Diffie-Hellman, right?

As such, is there an option to configure OpenSSH such that it has a fixed authorized_keys, and does the authentication handshake in the opposite order: first establishing that it can talk to the client by decrypting the messages the client is sending; and then parsing the client’s auth commands from said decrypted stream, where one SSH AUTH message might be “auth me using the very key we’re conversing over”?

If there is, I’m surprised IoT devices don’t go for it. If there isn’t... why not?


Now that I'm in front of an actual PC, let's explain a bit more.

It's absolutely critical in SSH that we end up with a unique session ID which will be the output of a cryptographic hash function (these days maybe SHA-256) but obviously the input to that function must be secret. Everything in the higher level parts of SSH assumes that there is a secret unique session ID. The session ID stays the same for the lifetime of a connection, although you can run key agreement itself again if the connection is long-lived or moves a lot of data so that it's unwise to keep using the same symmetric keys.

If you do any variant of DH obviously this session ID is the end result of the first DH key agreement process and so it'll be different every time because you're using random numbers.

But if you want to add a "fixed asymmetric keys" mode you're going to need to agree on a secret session ID for each new connection... somehow.

At some cost you could pick one at random and send it from one party to the other, but then we're exactly back where we started: assuming we have a good source of entropy.

If we just pick any fixed value, it might as well be 0000 0000 0000 0000 and so on, and then obviously a bad guy can break everything and we should have just used telnet.


> As such, is there an option to configure OpenSSH such that it has a fixed authorized_keys

At least you can override the default functionality with your own by using the AuthorizedKeysCommand option. It basically allows you to create your own authorization scheme instead of the default "key in file means access granted".
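Something like this in sshd_config, if I remember the directives right (the script path and the unprivileged user are placeholders):

    # sshd_config: replace file-based key lookup with a custom program.
    # The command receives the username (%u) and must print zero or more
    # authorized_keys lines on stdout.
    AuthorizedKeysCommand /usr/local/bin/fetch-keys %u
    AuthorizedKeysCommandUser nobody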


No, you can't do this in SSH, it would be a radical re-design.

I'm not completely certain that it's impossible to make this secure, but it's definitely very easy for it to get done in a way that's insecure.


I once had a great bug fixed in libvirt where two machines being deployed in an automated fashion generated the same MAC address for virbr0. And at least one other person had reported the same issue previously on a mailing list.

Details from the bug I filed (https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/17103...):

src/util/virrandom.c:virRandomOnceInit seeds the random number generator using this formula: unsigned int seed = time(NULL) ^ getpid();

This seems to be a popular method after a quick google, but it's easy to see how it can be problematic. The time is only in seconds, and during boot of relatively identical systems both of these numbers are likely to be similar across machines, which is quite likely in cloud-like environments. Secondly, by using a bitwise XOR only a small difference is created, and if the first or second MSB of the pid or time is 0 it would be easy to end up with colliding values.

Though problematic from basic logic, I also tested this with a small test program trying 67,921 unique combinations of time() and pid() which produced only 5,693 random seeds using PID range 6799-6810 and time() range 1502484340 to 1502489999.

During the actual incident in question, the 4 systems were all booted within 1-2 seconds of each other. We can see from dmesg that the two systems that generated the same MAC did in fact boot during the same second, and the other two did not.
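The test was essentially along these lines (a rough sketch, not the exact program):

    /* Sketch: count distinct time(NULL) ^ getpid() seeds over the ranges
     * described above. The XOR stays within a small range because the
     * high bits of the timestamp are constant over the test window. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const unsigned t_lo = 1502484340, t_hi = 1502489999;
        const unsigned p_lo = 6799, p_hi = 6810;
        unsigned char *seen = calloc(1u << 24, 1);   /* low 24 bits suffice here */
        unsigned long combos = 0, distinct = 0;

        for (unsigned t = t_lo; t <= t_hi; t++) {
            for (unsigned p = p_lo; p <= p_hi; p++) {
                unsigned seed = t ^ p;
                combos++;
                if (!seen[seed & 0xFFFFFF]) {
                    seen[seed & 0xFFFFFF] = 1;
                    distinct++;
                }
            }
        }
        printf("%lu combinations -> %lu distinct seeds\n", combos, distinct);
        free(seen);
        return 0;
    }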


Reminds me of mjg's recent attack on Bird: https://mjg59.dreamwidth.org/53258.html

> Digging through the code revealed 8 bytes worth of key fairly quickly, but the other 8 bytes were less obvious. I finally figured out that 4 more bytes were the value of another Bluetooth variable which could be simply read out by a client. The final 4 bytes were more confusing, because all the evidence made no sense. It looked like it came from passing the scooter serial number to atoi(), which converts an ASCII representation of a number to an integer. But this seemed wrong, because atoi() stops at the first non-numeric value and the scooter serial numbers all started with a letter[2]. It turned out that I was overthinking it and for the vast majority of scooters in the fleet, this section of the key was always "0".


A classic PRNG seeding problem from 1996:

https://people.eecs.berkeley.edu/~daw/papers/ddj-netscape.ht...

Netscape was seeding its PRNG with a method related to the one you mention, which still left a small enough space of possible seed values that when the resulting values were used in cryptography, an attacker could brute-force the seed.


Somewhat relevant: CPU Time Jitter Based Non-Physical True Random Number Generator by Stephan Müller

http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html#toc-Ap...

https://pdfs.semanticscholar.org/af73/17c970fd416646b2e46659...


FWIW this is used in OpenWrt.


I am by no means qualified to judge the quality of the patch, but I'm glad Linus is moving away from adding flags to getrandom() (resulting in subtly different behaviour than the Open/FreeBSD implementations). Now if only Linux fixed the "/dev/random provides higher quality random numbers than /dev/urandom and blocks when the entropy pool is empty" nonsense, that would be great.


They should do exactly what’s been done in the Mac/BSD variants and make both of them the same CSPRNG via Fortuna [1]. Then the only thing the system ever needs to worry about is getting a few bytes of random data seeded, and you can read securely all day long.

1. https://en.m.wikipedia.org/wiki/Fortuna_(PRNG)


They already use the same CSPRNG algorithm; the only difference is the entropy accounting.

See 2nd point here: https://www.2uo.de/myths-about-urandom/


My understanding is that /dev/random "entropy depletion" is because of FIPS or other enterprise/government certification requirements. It does not make logical sense, it just has to be that way, because requirements, because security.


That's not the case. FIPS only recently covered random numbers, and it permits one-time initialization.


What's the issue there? As far as I can tell that's intended behavior - for most randomness /dev/urandom _never_ blocks, and for the paranoid /dev/random blocks when there's no entropy.


Classically, on Linux you had /dev/urandom, which always gave you something, even if the system hadn't achieved a seeded state, and /dev/random, which would block in case the system wasn't seeded and also in case random had been used too much without more input into the entropy pool.

Neither one of those is usable for key generation. urandom may give you repeatable data if the system hasn't seeded itself yet. random may block on accounting that most experts find problematic.

getrandom() finally provided the right semantics of only blocking when the system isn't seeded, and without a filesystem/device node dependency; however, changes in startup software, including filesystem improvements, have resulted in shortened boot sequences and less entropy gathered. In some cases, systems were blocked waiting on entropy before any (non-human-interfaced) entropy sources were enabled.


The issue is that entropy isn't actually expended by generating a large amount of random data from a CSPRNG. Therefore, the only time there is a need for blocking is prior to the initial seeding of the generator. Later on, entropy counting is just hoodoo and cargo culting.


So, the big outlier in terms of TSC predictability would be virtual machines, no? Which VM hypervisors pass the host's TSC through to the guest (and then, presumably, add an offset before returning it, such that the VM can start at TSC=0); and which ones just have a “virtual” TSC unrelated to the host’s? Do instances on any production IaaS compute cloud have predictable virtual TSCs?


Using CPU jitter is a clever solution. I hope it stands up to scrutiny.

> but mixing entropy from the hardware with other sources is considered to be safe by most, even if the hardware generator is somehow suspect.

To be clear, it's a property of bitwise xor that it preserves entropy. If you xor a random bit with 1 you still have a random bit.


Assuming they are independent random variables, since e.g. X xor (not X) is constant.


Well, that's not entirely true: even if two random bits are independent, xor-ing them only gives you one random bit. What actually happens in Linux is that each new input of random bits gets pushed through a cryptographic hash before being xor'd.


From replies it seems my two sentences weren't as clear as I thought. I'll try to refine it:

A property of bitwise xoring random variable `x` with a constant (or otherwise non-random but independent) value is that the entropy of `x` is preserved in the result.


However, xoring two arbitrary random variables is NOT guaranteed to preserve entropy. Preserving entropy only happens under certain specific conditions.


Under what circumstances would xoring `x` and `y` not produce a result with entropy at least as great as `x`?


In the very obvious cases where x == y, or x == not y, or x == y + 1, or many other correlations.


Only if y is known to be dependent on x. If y is random but just so happens to be `not x` or `== x` then that won't make the result any less random.

0 is as random a number as 529890873740477 is.


Single numbers don't really have entropy at all, so that is not something it even makes sense to talk about.


CPU jitter randomness is an old idea, and not especially well thought of.


If you have a moment, can you expand on that or provide a link? Especially for the last part. Thanks.

This is well outside of my wheelhouse, so please pardon a naive inquiry.


These timing measurements by nature are going to have a small range of possible values --- the size of the range is likely to be system specific, and for some systems, there may be only a few possible outcomes.


And that's enough. One bit is enough, less than one bit of randomness is enough if you can just repeat the process enough times.


How are we to know how much entropy the process provides our system? The patch in the article counts one bit per timer interrupt, but there may be much less than that. If the tsc clock and the timer clock are linked, and the system is totally blocked on entropy, it seems plausible that the tsc reads will be predictable/repeatable beyond maybe the first value.


If you've ever tried to measure cycle-accurate timings for algorithms, you might've noticed that even relatively simple and short instruction streams have variance.

Whether tsc clock and timer clock are linked or not is completely irrelevant, since the timer clock is only used to 1) credit entropy 2) possibly introduce additional jitter. The value of TSC when the timer interrupt runs is not used as entropy at all so even if that were 100% predictable, it doesn't matter.

It only matters that you do have jitter (between calls to schedule() and random_get_entropy()), and I have a hard time imagining a modern system with a high-res TSC where calling schedule() with active timers in a loop (with branches) would not have any jitter at all. You probably have way more entropy than one bit per jiffy; I think the counter is very conservative.

Notice that while entropy is credited only once per tick, actual TSC values are mixed into the entropy pool on every iteration of the loop that calls schedule().
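A user-space sketch of the general idea (emphatically not the kernel code; it just illustrates that it's the deltas between clock reads, not the clock values themselves, that carry the jitter):

    /* Sketch (user space): harvest timing jitter by repeatedly reading a
     * high-resolution clock and mixing the deltas into a small pool.
     * The pool would still need to go through a real CSPRNG before use. */
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    static uint64_t now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
    }

    int main(void)
    {
        uint64_t pool = 0, prev = now_ns();
        for (int i = 0; i < (1 << 16); i++) {
            uint64_t t = now_ns();
            uint64_t delta = t - prev;        /* the jitter, not the time */
            prev = t;
            pool ^= delta;
            pool *= 0x9E3779B97F4A7C15ull;    /* cheap mixing, illustration only */
            pool ^= pool >> 29;
        }
        printf("pool = %016llx\n", (unsigned long long)pool);
        return 0;
    }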


> "Using CPU jitter is a clever solution. I hope it stands up to scrutiny."

How safe is it to use CPU jitter in a virtual machine?


Safe as in safe from host interference? Not at all. But if your VM is running on a malicious host, you have bigger problems honestly.


This is kind of the ongoing joke here, yeah?

Like HN users will gladly spend hours discussing the merits of various random number generation schemes, why that Intel HWRNG is probably backdoored and we should all be very afraid, all the while running their micro-services on the cloud where their entropy is literally spoon-fed from the hypervisor. SMC call into code-you-can't-see and back comes pure entropy (trust us)!


Yes, although there are two kinds of VM safety people might care about. By using a VM you accept that the bare metal owner can look into it if they want to and you have to trust them not to. However, that doesn't mean accepting attacks from other customers - which was what all the Spectre/Meltdown risk was primarily about.


Maybe random values should not be necessary to boot the system? I remember reading somewhere that systemd needs cryptographically random numbers to start up and I don't understand why.


Well, the proper solution for the future would be for the kernel to save and restore a random seed using UEFI variables, a similar firmware mechanism or free space in a boot sector or swap partition.

But of course this doesn't solve the problem of newer kernels on existing installations that have no such stored seed and that don't pass a command line argument telling the kernel where to store the seed.


> Well, the proper solution for the future would be for the kernel to save and restore a random seed using UEFI variables, a similar firmware mechanism or free space in a boot sector or swap partition.

This needs to be invalidated before numbers dependent on the seed are used, else you risk attacks where you get someone to generate the same random numbers on consecutive boots.

In turn, if you crash during bootup at the wrong time, you're left without a seed.


Yes, you need to update the seed immediately after reading it and before using it.

There is no problem with a crash assuming that the seed write mechanism is crash-safe (which is achievable with block storage, but indeed I'm not sure whether firmware is properly designed and implemented).


On those machines the old way works fine, but not all machines even have a place to store this data; on top of that, people clone VM instances, and you don't want the same number each time.

Read the previous https://news.ycombinator.com/item?id=21114524 story for more on the problem space and why what you wrote is not considered enough.


Speaking of which,...

does anybody here know a portable, easy to use, high-quality, deterministic random number generator? It is a requirement that you get exactly the same sequence of pseudo-random numbers on any architecture (depending on a user-settable seed).

I use a hand-crafted (two-line) linear congruential generator, but always feel a bit uneasy about that.


Any stream cipher?

If you use the same symmetric key and nonce, the output bit stream should be the same every time.

This is how OpenBSD's arc4random(3) started out: using the output of RC4 and stirring the entropy pool regularly to ensure forward security. (They've since switched to ChaCha20.)

* https://man.openbsd.org/arc4random.3

Remove the stirring and you've got deterministic output. Security comes from the entropy of the initial conditions.


Yes, something like that would be great! But does it come packaged nicely? So that I can compile it out of the box on windows, linux, macos, android?


The `randombytes_buf_deterministic()` function from libsodium (and its libhydrogen counterpart) does exactly what the name suggests.
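Rough usage, if I'm remembering the API correctly (build with -lsodium; same 32-byte seed, same output stream, on any platform):

    /* Sketch: deterministic "random" bytes from a fixed seed with libsodium. */
    #include <sodium.h>
    #include <stdio.h>

    int main(void)
    {
        if (sodium_init() < 0)
            return 1;

        unsigned char seed[randombytes_SEEDBYTES] = { 42 };  /* user-settable seed */
        unsigned char out[16];

        randombytes_buf_deterministic(out, sizeof out, seed);
        for (size_t i = 0; i < sizeof out; i++)
            printf("%02x", out[i]);
        printf("\n");
        return 0;
    }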



Monocypher is a decent fit here. Just run its ChaCha20 with (a hash of) your seed as the key. Larger libraries (libsodium, NaCl, etc) are decent too, but bigger.


Alternative opinion: Don't use Monocypher. Much less safe than libsodium.


That depends on what you use the PRNG for.

For cryptographic applications, what you want is, as previously noted, just some arbitrary secure stream function (a block cipher in OFB or CTR mode being a particularly convenient way to obtain such a cipher). Bear in mind that for cryptographic applications it is often important whether the state can be determined by observing the output and whether the state can be rewound, which then necessitates choosing a stream cipher that satisfies these requirements or using a more complex construction. Most notably, block ciphers in CTR/OFB mode do not satisfy this, and ChaCha/Salsa also do not, as they are essentially block ciphers in CTR mode. An interesting modern construction would involve Keccak-style sponge functions, which are a kind of universal cryptographic primitive usable as a hash, MAC and PRNG/stream cipher all at once, depending on the mode of operation.

For Monte Carlo simulations you want something that is reasonably fast and has a large state space. Mersenne Twister is a common choice, although XorShift is significantly faster and usually has a state space that is large enough.

For randomized distributed simulations (read: lock-step realtime multiplayer games) you don't care about the output quality that much, and an LCG tends to be sufficient, although also in this case XorShift would probably be better, as it has comparable performance and significantly better output quality.

All the mentioned algorithms are by definition deterministic and, discounting implementation errors, platform independent.
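For illustration, the whole of xorshift64* is just this (constants per Vigna's paper; not for anything security-relevant):

    /* xorshift64*: tiny, fast, deterministic, identical output on every
     * platform. Not cryptographically secure. State must be non-zero. */
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t xorshift64star(uint64_t *state)
    {
        uint64_t x = *state;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        *state = x;
        return x * 0x2545F4914F6CDD1Dull;
    }

    int main(void)
    {
        uint64_t s = 12345;   /* the user-settable seed */
        for (int i = 0; i < 3; i++)
            printf("%016llx\n", (unsigned long long)xorshift64star(&s));
        return 0;
    }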


Plenty of CSRNGs are built from AES-CTR and ChaCha20. There's probably no good reason to prefer XorShift over a cryptographic random bit generator that you can seed directly.


For Monte Carlo simulations, CSRNGs are too slow. Xorshift is incredibly fast. You wouldn't want to use a CSRNG for path tracing rendering, for example; you're calling the RNG many times per ray and shooting dozens or hundreds of rays per pixel.

We've dealt with this in Rust—people love to write Monte Carlo simulations as language benchmarks, and Rust used to get dinged for preferring CSRNGs.


> For Monte Carlo simulations, CSRNGs are too slow.

What kind of speed needs are we talking about? With AES-NI in most modern CPUs you can get many MB/s or GB/s.

* https://calomel.org/aesni_ssl_performance.html

(I don't do MC, so have no idea.)


Xorshift is like 10 cycles. AES-NI can take hundreds, depending on the number of rounds.

In the case of brute force Monte Carlo path tracing, the speed of the RNG is important, because you have dozens or hundreds of rays per pixel, each with multiple bounces, and every bounce samples from your RNG. When your render times are 6 hours per frame to begin with, you really don't want to mess around.


> When your render times are 6 hours per frame

Ok, let's say we're rendering at 8K (32 megapixels), 200 rays per pixel, 20 bounces per ray. And rendering that frame takes 6 hours. How many (mega)bytes of random data do we need per second?

    ; (32*10^6) * 200 * 20 / (6*60*60) / (1024^2)
        ~5.65140335648148148148
Six whopping megabytes per second, for the entire job, across all cores! Well, that assumed you only need one byte per bounce. Maybe you need a few bytes?

And here I can pull an order of magnitude more and then some (180 megabytes/s) via /dev/urandom on OpenBSD (i.e. Chacha20, no hardware accel), using a single core of a few-years-old fanless mini PC (some Intel Celeron with "U" postfix).

I generally don't buy the argument that your PRNG needs to be super duper fast, because usually the simulation is doing much more than just generating random data, and all that work is so much more expensive than even a relatively expensive CSPRNG. At least my ray tracers are. Generating rays is actually trivial compared to traversing, intersecting and shading the scene.. goodness gracious if I need to do actual texture & normal map lookups at a high resolution, so many cache misses :(

That said, a simple and fast PRNG with little internal state can make life easier if you want reproducibility.. and it might be easier to build a reproducible simulation if your RNG gives exactly one word the size you need, instead of requiring you to generate a big block (and then extract things out of it somehow). This is especially the case with threaded sims. So yeah.

But in this case, I wouldn't be looking at something like XorShift, I'd be looking at a random map that allows me to use (e.g.) the pixel's coordinates and a frame & bounce counter as the input and get matching (constant) output. Now it no longer matters which thread renders which pixel and when, the result will be all the same. It's not hard to build such functions, but they will be slower than XorShift if you want comparable quality. Speed should be no problem though.
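Something along these lines (the mixer here is splitmix64's finalizer; any decent 64-bit mixer would do):

    /* Sketch: stateless "random map" for rendering. The same
     * (pixel, frame, bounce) always yields the same value, no matter
     * which thread computes it or in what order. */
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t mix64(uint64_t z)
    {
        z += 0x9E3779B97F4A7C15ull;
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ull;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBull;
        return z ^ (z >> 31);
    }

    /* Uniform float in [0, 1) for a given sample coordinate. */
    static float sample_rand(uint32_t x, uint32_t y, uint32_t frame, uint32_t bounce)
    {
        uint64_t key = mix64(((uint64_t)y << 32) | x);
        key = mix64(key ^ (((uint64_t)frame << 32) | bounce));
        return (float)(key >> 40) * (1.0f / 16777216.0f);
    }

    int main(void)
    {
        /* same inputs always give the same sample, on any thread */
        printf("%f\n", sample_rand(640, 360, 1, 0));
        return 0;
    }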


It's amazing how much effort people are willing to invest to contradict others' experience on the Internet.

The GP is giving a real-life example (people having complained about something being too slow compared to xorshift) and yet you try to do the math to prove it couldn't happen…


The math was just the prelude to my real world experience. Please try to read and comprehend the rest of my post. If you can't comprehend it, then feel free to ask for clarification or just leave it be without adding any snide remarks.

People having complained is not a real-life example of an actual performance issue. Until otherwise proven, it's a real-life example of someone complaining about some microbenchmark that, based on my real life experience, would be completely irrelevant if they actually developed their real world problem-solving program which totally shifts the bottleneck to something that actually matters.

Alternatively, it's a real-life example of poor implementation or usage, and not necessarily poor performance of the underlying primitive. This also happens a lot. "Foo is slow!" is true for a lot of things if you use them wrong. Maybe it's simply the user's ignorance, or maybe the API isn't well designed to support their case. Maybe it's poor documentation.

Also, this is a technical forum. There's absolutely nothing wrong with people doing back of the envelope calculations to analyze the scale of a problem. If more people did that while pointing at real examples with concrete numbers, maybe we'd actually know things instead of just relying on hearsay and religion. Whatever, it's a cargo cult industry.

Here's the good part: if someone who knows better comes along, they find those numbers and can tell us why we're wrong. Otherwise, let's just take everything on faith?

It's amazing how often people (like you, who haven't contributed anything except noise to the discussion) just don't want to understand a problem and if someone comes along trying to understand it and do the math, they're called a lunatic.

And this is why some of the code I get to work on is so terrible, people are wasting time and complicating code to micro-optimize things that are completely irrelevant (a conclusion one could come to either by a ballpark-estimate or a quick profile), while massive bottlenecks sit unattended.


AES-128-CTR on my laptop is 4.8GB/sec per core. I'm sure something producing random enough output for simulation can be built that is as fast as memcpy, but would anything real be affected by the difference?


https://github.com/dragontamer/AESRand

Yes, it is possible but its difficult to do it right. I ran into at least 3 or 4 "roadblocks" while designing this algorithm, fortunately I managed to get through all of them.

I've got an unrolled dual-stream going at 37.1 GBps on Threadripper 1950x (which has 2x AES pipelines). ~29.2 GBps for more "typical" usage (but still impractical).

The issue with this is that random bits at this speed are almost useless. A division or modulus operation takes 80 cycles (~20 nanoseconds), so as soon as you do "rand() % 6" (assuming you wanted a 6-sided dice roll), you've completely wrecked the bit-generator. A branch misprediction is like 15 cycles, so that singular if-statement in your code that uses the RNG number would be 5x slower than the generator itself.

It's fast, but I don't know how practical it is. There's no practical way to use all of these random bits that I'm aware of.

I did get a bit-compatible POWER9 version running, by the way. So it's portable across systems (at least, systems with an AES encoder: ARM, POWER9, and 2010-era x86). The Raspberry Pi doesn't have a CPU AES engine, however (it's outside of the core), so the Raspberry Pi can't use this methodology.

--------------

If I were to actually push forward with this project more, I'd need to study on whole-program optimization, so that the (rarely used) XMM registers can be pragmatically used in a lot of code. The entirety of this RNG fits inside a single XMM register, so it'd benefit grossly from whole-program optimization.

The XMM-data should also be fed into SIMD somehow, some kind of engine into ispc or OpenMP SIMD code. A bunch of raw-bits coming in through assembly commands is somewhat impractical, no matter how you look at it.

Dan Lemire informed me of an amazing algorithm (that does work with AES XMM registers) that creates uniform random numbers out of a few multiply instructions. So I can get XMM-registers full of arbitrary integers with very low bias. (ex: numbers between 0 and 100) out of the bitstream.

I also managed to convert the numbers into single-precision floats and double-precision floats between 0 and 1.0 uniformly.

Still, this only really seems practical if you have SIMD-code that actually consumes all of these bits. Converting from XMM-registers into RAX registers has quite a bit of slowdown.

I don't think the typical simulation code is actually limited by the speed of RNGs at all.


Cycles per what? Per bit? Per byte? If you generate a gig of random data (and repeat when you reach the end), is AES still too slow?


A lot of machine learning (e.g. NCE) requires fast PRNG also, often running on GPUs that don't have crypto primitives like many CPUs do.


GPUs have bit-reverse.

Counter (0, 1, 2, 3, 4...) -> Multiply Const1 -> Bit-reverse -> XOR Const2 -> Multiply Const3 has been my PRNG for GPUs.

Const1 and Const3 need to be odd (bottom bit == 1). Const2 can be fully random. All three values can come from /dev/urandom with pretty good results in some preliminary testing. I haven't figured out a good methodology for selecting Const1 or Const3 aside from bottom bit == 1 and random (if I had more time, I'd probably work on it...).

---------

The "counter" is 64-bits and allocated per-thread. Assume 2^14 bits for 16384 GPU-threads, each GPU-thread then has 50-bits of values.

64-bit integer multiply is slow however. 32-bit multiply would be faster for most GPUs. But also in my experience, GPUs are almost always memory-constrained, not compute-constrained. So you probably can do fine with 64-bit integer multiply.

If you really need to stick with 32-bit integers, then make Const1 and Const3 per-thread specific. That gives a unique sequence of 32-bit numbers per thread, but a 32-bit cycle is inevitable every 4-billion numbers with only 32-bits of state.
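In plain C the pipeline looks roughly like this (the constants are placeholders here, not vetted values; on a GPU the bit-reverse is a single hardware instruction):

    /* Sketch of the counter -> multiply -> bit-reverse -> xor -> multiply
     * construction described above. CONST1/CONST3 must be odd; all three
     * constants should really come from /dev/urandom. */
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t bitreverse64(uint64_t x)
    {
        x = ((x & 0x5555555555555555ull) << 1)  | ((x >> 1)  & 0x5555555555555555ull);
        x = ((x & 0x3333333333333333ull) << 2)  | ((x >> 2)  & 0x3333333333333333ull);
        x = ((x & 0x0F0F0F0F0F0F0F0Full) << 4)  | ((x >> 4)  & 0x0F0F0F0F0F0F0F0Full);
        x = ((x & 0x00FF00FF00FF00FFull) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFull);
        x = ((x & 0x0000FFFF0000FFFFull) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFull);
        return (x << 32) | (x >> 32);
    }

    #define CONST1 0x9E3779B97F4A7C15ull   /* odd */
    #define CONST2 0xD1B54A32D192ED03ull
    #define CONST3 0xBF58476D1CE4E5B9ull   /* odd */

    static uint64_t gpu_style_rand(uint64_t counter)
    {
        return (bitreverse64(counter * CONST1) ^ CONST2) * CONST3;
    }

    int main(void)
    {
        for (uint64_t i = 0; i < 4; i++)
            printf("%016llx\n", (unsigned long long)gpu_style_rand(i));
        return 0;
    }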


It's not only monte-carlo simulations. You may want to add a bit of random noise to a high-resolution video sequence, for example. It is impractical to do so with a slow PRNG.


I disagree. With the use of SIMD, CSPRNGs are about as fast. Xorshift and friends are usually not optimized for SIMD.


PCG generators are very good, very fast, and decently random, don't suffer (AFAIK) from bad seeds like Mersenne Twister, mix well, have small state, and have selectable periods of about any size. They are vastly better than LCGs.

I use a 64 bit one and a 128 bit one for all sorts of work.

These are not crypto, so don't use them for that. For non-crypto I have seen no better PRNG than these (and I've followed and written on this space for a long time).

Code available on github.

http://www.pcg-random.org/
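For a taste, the core of the 64-bit-state/32-bit-output generator is only a few lines (this follows the shape of the minimal reference code; use the real library for anything serious):

    /* pcg32: 64 bits of state, 32-bit output. Sketch after the minimal
     * reference implementation; seed values here are arbitrary. */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct { uint64_t state; uint64_t inc; } pcg32_t;

    static uint32_t pcg32_next(pcg32_t *rng)
    {
        uint64_t old = rng->state;
        rng->state = old * 6364136223846793005ull + (rng->inc | 1);
        uint32_t xorshifted = (uint32_t)(((old >> 18u) ^ old) >> 27u);
        uint32_t rot = (uint32_t)(old >> 59u);
        return (xorshifted >> rot) | (xorshifted << ((32 - rot) & 31));
    }

    int main(void)
    {
        pcg32_t rng = { 42u, 54u };   /* state, stream selector */
        for (int i = 0; i < 3; i++)
            printf("%08x\n", pcg32_next(&rng));
        return 0;
    }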


When you say "not crypto, don't use them for that", you mean "don't use them for any situation in which the security of the randomness matters", right?


Yep - which is a lot, if not the vast majority, of random numbers generated.

Also, I'm answering a question by someone who looks like they want precisely this type of PRNG. There is no need to make a mess with CSPRNGs.


That's exactly what I was looking for! Is it your code?


Not my code, but I have versions I've written for various platforms/languages that I use.


I don't know what you want this for exactly, so definitely don't use my suggestion for crypto purposes. However...

If what you want is just to add some uniform, seemingly random behavior to something (e.g. a game, generating some noise, whatever), I've used Wolfenstein 3D's FizzleFade effect algorithm [0] in the past.

It's boringly easy and works like a charm.

[0] http://fabiensanglard.net/fizzlefade/index.php
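The gist of it, sketched with a generic maximal-length 16-bit Galois LFSR (taps 0xB400; not necessarily the exact taps Wolfenstein used):

    /* Sketch: visit every value in 1..65535 exactly once, in a scrambled
     * order, with no memory beyond 16 bits of state. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint16_t lfsr = 1;
        do {
            /* use lfsr as a scrambled index here, e.g. to pick a pixel */
            uint16_t lsb = lfsr & 1u;
            lfsr >>= 1;
            if (lsb)
                lfsr ^= 0xB400u;   /* maximal-length tap mask for 16 bits */
        } while (lfsr != 1);
        puts("visited all 65535 non-zero values");
        return 0;
    }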


What's high quality depends on your requirements and usage. But, for example, you can use a Feistel construction to generate a random, non-repeating sequence that is as unpredictable as the mixing function. I've used this technique instead of a Fisher–Yates shuffle, trading higher CPU for lower space, for generating 16-bit DNS QIDs.
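A rough sketch of that construction over a 16-bit domain (the round function and keys here are toy placeholders, not a vetted PRF):

    /* Sketch: a tiny 4-round Feistel network over 16-bit values. For a
     * fixed key this is a permutation of 0..65535, so outputs never
     * repeat until the whole space is exhausted. */
    #include <stdint.h>
    #include <stdio.h>

    static uint8_t round_fn(uint8_t half, uint8_t key)
    {
        uint8_t x = (uint8_t)(half ^ key);
        x = (uint8_t)(x * 167u + 13u);          /* toy mixing only */
        return (uint8_t)((x << 3) | (x >> 5));
    }

    static uint16_t feistel16(uint16_t v, const uint8_t key[4])
    {
        uint8_t l = (uint8_t)(v >> 8), r = (uint8_t)(v & 0xFF);
        for (int i = 0; i < 4; i++) {
            uint8_t tmp = (uint8_t)(l ^ round_fn(r, key[i]));
            l = r;
            r = tmp;
        }
        return (uint16_t)(((uint16_t)l << 8) | r);
    }

    int main(void)
    {
        const uint8_t key[4] = { 0x3a, 0x91, 0x5c, 0xe7 };   /* per-session secret */
        for (uint16_t ctr = 0; ctr < 4; ctr++)
            printf("QID %u -> %u\n", ctr, feistel16(ctr, key));
        return 0;
    }
Feeding a counter 0, 1, 2, ... through it with a per-session key gives a non-repeating QID sequence that is as hard to predict as the round function is strong.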


Yes, I know many nice algorithms with "high quality" properties. My question is about actual implementations: is there a canonical GitHub repo that everybody is using for this need, or do I have to write or adapt the code myself?


You can use a sequence of AES encryptions with a pre-shared seed, the quality of the random numbers will be among the highest you can get. You can find AES implementations all over the place, for basically any platform.

You would do something like rng1 = AES(seed); rng2 = AES(rng1); rng3 = AES(rng2);


To add to my answer, you can also do something like this: randomNumber = AES(secretKey, counter), where counter is just a sequence number that you can share in the open. Anyone without the key would not be able to generate the random number from the counter alone.
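Sketched with OpenSSL's EVP interface (assuming AES-128-CTR, error handling omitted, build with -lcrypto):

    /* Sketch: deterministic "random" bytes via an AES-128-CTR keystream.
     * Same key + same starting counter (IV) -> same byte stream. */
    #include <openssl/evp.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned char key[16] = "0123456789abcde";   /* use a real secret key */
        unsigned char iv[16]  = {0};                 /* the shared counter block */
        unsigned char zeros[32] = {0}, out[32];
        int len = 0;

        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        EVP_EncryptInit_ex(ctx, EVP_aes_128_ctr(), NULL, key, iv);
        /* Encrypting zeros yields the raw keystream, i.e. our random bytes. */
        EVP_EncryptUpdate(ctx, out, &len, zeros, sizeof zeros);
        EVP_CIPHER_CTX_free(ctx);

        for (int i = 0; i < len; i++)
            printf("%02x", out[i]);
        printf("\n");
        return 0;
    }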


Has anyone considered fixing systemd to be able to load the on-disk entropy store sooner? If that were fixed, this would only be an issue when booting machines for the first time (and on diskless machines).


An issue with that is how to deal with things that want random numbers before the disk is writable. If you read the seed while the disk is read-only, and then something uses random numbers based on that seed, and then something causes the boot to abort before the on-disk seed is updated, then the next boot attempt will reuse the same on-disk seed.

Depending on what random numbers are used for between seeding and writing the updated seed, that could range from harmless to disastrous.

With some care in how things are ordered during system startup, it could probably be made safe.


As long as you order it so the network isn't up before the disk is writable, then no persistent effects could happen from reusing numbers, right? If you can't send out a network packet and can't write to the disk, what other recordable events are there? An attacker writing down numbers observed on the console?


That assumes your disk is not provided over network. Which may or may not be true.


Does anybody who is familiar with the BSDs know how those OSes solve this kind of problem? (I don't know enough to say if this even could be a problem with their architecture, just curious).


The boot loaders read entropy off-disk to seed the kernel:

* https://www.freebsd.org/cgi/man.cgi?loader.conf(5) (see "entropy_cache_load")

When the system is brought down cleanly (init 0 or 6) a seed file is saved. Also, when the system is started up, a seed file is created in case there's a crash. OpenBSD's rc(8):

  # Push the old seed into the kernel, create a future seed  and create a seed
  # file for the boot-loader.
  random_seed() {
    dd if=/var/db/host.random of=/dev/random bs=65536 count=1 status=none
    chmod 600 /var/db/host.random
    dd if=/dev/random of=/var/db/host.random bs=65536 count=1 status=none
    dd if=/dev/random of=/etc/random.seed bs=512 count=1 status=none
    chmod 600 /etc/random.seed
  }
* https://cvsweb.openbsd.org/src/etc/rc?rev=1.537

FreeBSD also has a cron job that saves 4KB of entropy every 11 minutes in a rotating series of files:

* https://svnweb.freebsd.org/base/head/libexec/save-entropy/

* https://svnweb.freebsd.org/base/head/libexec/rc/rc.d/random


The installer also leaves a random.seed file so that there's a direct chain from the very first boot.


So the main remaining risk is things like (cloud) images where many people/machines use the same seed. This can be mitigated by stirring in RDRAND, unique CPU information (serial number?), high-res date/time, and other possibly unique information (MAC addresses?).


If you’re running virtual machines you can just use a virtual RNG device exposed by the hypervisor.


The Virtio RNG driver was merged into Linux 2.6.26, released a decade ago. AFAIU it should be built into most Linux kernels. I just confirmed on Alpine 3.10 and Ubuntu 18.10 (Cosmic).

The problem is that not all hypervisors provide the device by default. I use libvirt KVM/QEMU and the default template doesn't include it, but you can add it. AWS EC2 doesn't provide it, though then again it doesn't provide any virtio devices at all. Parallels Desktop 15 doesn't support virtio-rng, either.

OTOH, OpenBSD's VMD not only supports it but I think it's enabled by default. (I see it in the source code. I can't find a way to enable/disable it, but the publicly posted dmesg dumps seem to always show it as found when an OpenBSD guest boots.)

Of course, most VMs are x86_64-based, using hardware extensions and likely using CPUs providing rdrand. But hypervisors really should provide the virtio-rng device by default, perhaps even unconditionally as OpenBSD apparently does.
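For libvirt that's a small addition to the guest's domain XML, something like this (from memory; check the domain XML docs):

    <!-- inside <devices>: a virtio-rng device fed from the host's /dev/urandom -->
    <rng model='virtio'>
      <backend model='random'>/dev/urandom</backend>
    </rng>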


> OTOH, OpenBSD's VMD not only supports it but I think it's enabled by default.

Yes, my VMD virtual machines have it with no configuration necessary.

    pvbus0 at mainbus0: OpenBSD
    pci0 at mainbus0 bus 0
    virtio0 at pci0 dev 1 function 0 "Qumranet Virtio RNG" rev 0x00
    viornd0 at virtio0
Here’s the manpage: https://man.openbsd.org/viornd.4


It's this kind of problem, but it's not the same problem; the Linux problem under discussion happens at a point during the boot when /var is not yet accessible/writable.


You drop an entropy file in /boot and have GRUB seed it into the kernel.

The file is created (a) when the system is shutdown cleanly, but also (b) when the system comes up cleanly in case of a crash later on.


I didn't know that a jiffy was the formal unit for kernel timing. That tickled me quite a bit.


Analogously, a "mickey" is the (informal) unit for timing mouse movements.


How the hell do people get to a skill level where they can do this stuff.


Do what? Fix getrandom() or take advantage of getrandom() vulnerabilities?


hm, working on Linux every day since 1991?


When I want a randomness source, I use the output from `ps auxw`, `df`, and `date`.


Ignoring that those may not be great sources for cryptographic randomness, how does one do something like that when the userland isn't established yet because the system is still booting?

That's why a working getrandom is so important.



