Removing the Linux /dev/random blocking pool

CiPHPerCoder · on Jan 7, 2020

Making /dev/random behave like getrandom(2) will finally put to rest one of the most frustrating arguments in the public understanding of cryptography. Please do it.

jdormit · on Jan 7, 2020

What argument are you referring to?

geofft · on Jan 7, 2020

The question of whether "true random numbers" as defined by this weird government standard exist.

More fundamentally, an important conceptual part of crypto is that you can use a random key of a very small, fixed size, like 16 bytes, to generate an infinite amount of output, such that knowing any part of the output doesn't help you guess at any other part of the output nor the input key. If true, this is an amazingly powerful property because you can securely exchange (or derive) a short key once and then you can send as much encrypted data back and forth as you want. If you needed to re-seed the input with as many bits as you take from the output, disk encryption wouldn't be sound (you'd need a key or passphrase as long as your disk), SSL would be way more expensive (and couldn't be forward-secure) if it even worked at all, Kerberos wouldn't work, Signal wouldn't work, etc.

The claim of /dev/random and this wacky government standard is that in fact disk encryption, SSL, etc. are flawed designs, good enough for securing a single communication but suboptimal because they encrypt more bits of data than the size of the random key, and so when generating random keys, you really ought to use "true random numbers" so that breaking one SSL connection doesn't help you break another. Whether it's a pen-and-paper cipher like Vigenère or a fancy modern algorithm like AES, anything with a fixed-size key can be cryptanalyzed and you shouldn't provide too much output with it, for your secure stuff you must use a one-time pad. The claim of the cryptography community is that, no, in fact, there's nothing flawed about this approach and stretching fixed-size keys is the very miracle of cryptography. We know how to do it securely for any worthwhile definition of "securely" (including definitions where quantum computers are relevant) and we should, because key-based encryption has changed our world for the better.

throw0101a · on Jan 7, 2020

> ... like 16 bytes, to generate an infinite amount of output, such that knowing any part of the output doesn't help you guess at any other part of the output nor the input key.

Isn't that the theory behind every stream cipher? (And stream ciphers are generally just 'simplified' one-time pads.)

That's what OpenBSD's arc4random(4) start as: the output of RC4.

ben_bai · on Jan 7, 2020

Yep. The OpenBSD bootloader reads an entropy file from hard disk, mixes it with RDRAND from CPU (if available) and passes it to the Kernel.

The Kernel starts an ChaCha20 stream cipher with this supplied entropy while constantly mixing in timing entropy from devices.

This chipherstream supplies the Kernel with random data and once userland is up this is good enought and also used for /dev/random and /dev/urandom, which on OpenBSB is the same device(non blocking).

Now the fun part: When a userland process gets created it has a randomdata ELF segment that the Kernel fills and which is used as entropy for a new ChaCha20 stream, just for the application should it decide to call arc4random or use random data in any other way (like calling malloc or free, which on OpenBSD make heavy use of random data).

wahern · on Jan 8, 2020

The .openbsd.randomdata ELF section is used for RETGUARD. arc4random(3) uses the getentropy(2) system call for seeding, and minherit(2)+MAP_INHERIT_ZERO for consistent, automatic reinitialization on fork.

Interestingly, Linux provides 128 bits of random data on exec through the ELF auxiliary vector mechanism. (https://lwn.net/Articles/519085/) Between the disappearance of the sysctl(2) syscall and the addition of getrandom(2), it was the only way to acquire strong seed entropy without opening a file resource.

EDIT: Which makes me curious how Linux filled AT_RANDOM for init(1) and other early boot time processes. But not curious enough to comb through old code...

geofft · on Jan 8, 2020

> EDIT: Which makes me curious how Linux filled AT_RANDOM for init(1) and other early boot time processes. But not curious enough to comb through old code...

It uses get_random_bytes(), which is documented as "equivalent to a read from /dev/urandom."

https://github.com/torvalds/linux/blob/v5.4/fs/binfmt_elf.c#...

throw0101a · on Jan 8, 2020

> When a userland process gets created it has a randomdata ELF ...

This is news to me. When did they add this (neat) functionality?

notaplumber · on Jan 8, 2020

In 2012, first release 5.3 in 2013. Added by Matthew Dempsky. It was used for the per-shared object stack protector cookie extended for the per-function cookies required for RETGUARD.

https://github.com/openbsd/src/blob/master/libexec/ld.so/SPE...

https://www.openbsd.org/innovations.html

No other OS has this.

geofft · on Jan 8, 2020

Linux has an equivalent feature available through the "auxiliary vector," a set of data passed as the secret fourth parameter to main() / on the stack at program startup (after the secret third parameter, envp). http://man7.org/linux/man-pages/man3/getauxval.3.html and https://lwn.net/Articles/519085/ have a description of the auxiliary vector. One key, AT_RANDOM, contains 16 bytes (128 bits) of random data which libc uses for stack protector cookies. glibc uses this to implement stack and pointer protector cookies.

(Unfortunately, glibc uses this data directly as stack and pointer protector cookies, instead of deriving something from it, which means it feels a little risky to use this to initialize a userspace PRNG. I guess you shouldn't be leaking cookies....)

Linux added this in v2.6.29 (2009) in https://github.com/torvalds/linux/commit/f06295b4 , and glibc in 2009 added support for using it if available to set up cookies (it previously read from /dev/urandom). That said, I don't really think the "we're the only OS" game is a game worth playing - if it's a security improvement, it's best if everyone has it, regardless of OS!

cosarara · on Jan 7, 2020

From a more recent OpenBSD man page:

> The original version of this random number generator used the RC4 (also known as ARC4) algorithm. In OpenBSD 5.5 it was replaced with the ChaCha20 cipher, and it may be replaced again in the future as cryptographic techniques advance. A good mnemonic is “A Replacement Call for Random”.

ATsch · on Jan 7, 2020

The idea of "randomness" being "used up", and then "running out of randomness", somehow.

So let's look at how a hypothetical CSPRNG might work. We get our random numbers by repeatedly hashing a pool of bytes, and then feeding the result, and various somewhat random events, back into the pool. Since our hash does not leak any information about the input (if it did, we'd have much bigger problems), this means attackers must guess, bit for bit, what the value of the internal pool of entropy is.

This is essentially how randomness works on Linux (they just use a stream cipher instead for performance)

This clarifies a few things:

1. even if you assume intels randomness instructions are compromised, it still is not an issue to stirr them into the pool. Attackers need to guess every single source of randomness.

2. "Running out of randomness" is nonsensical. If you couldn't guess the exact pool before, you can't suddenly start guessing the pool after pulling out 200 exabytes of randomness either.

a1369209993 · on Jan 7, 2020

There is actually a sense in which you can "use up" or "run out" of randomness; it's just almost the exact opposite of how unix-style /dev/random design thinks about it.

Basically, you[0] should think of /dev/random as having a front buffer and a back buffer. The back buffer has a certain amount of entropy in it, but you can't take part of that entropy out; the only thing you can do with it is add entropy or empty the entire back buffer into the front buffer. The front buffer doesn't have a entropy amount per se, what it has is a security rating[1]; when you empty the back buffer into it, its security rating increases up to (not plus) the number of bits in the back buffer (this is not additive; a 256-bit front buffer combined with a 256-bit back buffer produces a front buffer with 256 bits, not 512) and the back buffer goes to zero. If you keep dumping the back buffer into front buffer whenever it reaches 64 bits, you'll never have a RNG that's more than 64-bit secure.

Reading from /dev/random doesn't deplete the front buffer (because CSPRNG) or the back buffer (because it doesn't interact with the back buffer). A memory-read attack on the other hand basically sets both buffers to zero - you have to start all over again.

So you can "use up" randomness by constantly wasting it to refresh a insufficiently-strong front buffer. And you can "run out" if someone is able to read your buffers (or brute force a weak buffer in, say, 2^64 CSPRNG invocations).

0: As a designer. As a user, you should treat /dev/random (like any cryptographic primitive) as something that will look perfectly secure from the outside even if it's hopelessly broken, and investigate the details of the specific implementation you're using accordingly.

1: Just like a cryptographic algorithm; the lowest rating involved determines how secure your system is. A 512-bit RNG with a 64-bit cypher is only 64-bit secure, and a 512-bit cypher fed by a 64-bit RNG is also only 64-bit secure.

throw0101a · on Jan 7, 2020

> 2. "Running out of randomness" is nonsensical. If you couldn't guess the exact pool before, you can't suddenly start guessing the pool after pulling out 200 exabytes of randomness either.

Not entirely.

/dev/random and arc4random(4) under OpenBSD originally used the output of RC4, which has a finite state size:

* https://en.wikipedia.org/wiki/RC4

Rekeying / mixing up the state semi-regularly would reset things. It's the occasional shuffling that really helps with forward security, especially if a system has been compromised at the kernel level.

tptacek · on Jan 7, 2020

No, Arc4random didn't reveal its internal RC4 state as it ran, in the same sense that actually encrypting with RC4 doesn't deplete RC4's internal state.

throw0101a · on Jan 8, 2020

> No, Arc4random didn't reveal its internal RC4 state as it ran ...

Yes, I know. Where did I say anything about revealing? My comment was about 'running out', which is (IIRC) a limitation of some random number generators because of how they handle internal state. Now, that state may have many, many bits, but it is still finite. An analogy I've seen is like having a (paper) codebook.

Of course, if a system is compromised, and the attacker can read kernel memory, they can probably then recreate the stream--which is why (e.g.) OpenBSD stirred things up every so often.

cperciva · on Jan 7, 2020

Many implementations didn't do enough mixing before generating output, though.

Also, when you look at cache side channel attacks -- RC4 definitely publishes its internal state.

ben_bai · on Jan 7, 2020

That's why OpenBSD cut away the start of the RC4 stream (don't remember how many bytes) to make backtracking harder.

But the point is mood b.c. the stream cipher used switched from RC4 to ChaCha20 like 5 years ago. And there is no side channel attack on ChaCha20, yet.

cperciva · on Jan 8, 2020

why OpenBSD cut away the start of the RC4 stream (don't remember how many bytes) to make backtracking harder

Yes, everybody does that. But how many bytes you drop matters; over the years the recommendations have gone from 256 bytes to 512 bytes to 768 bytes to 1536 bytes to 3072 bytes as attacks have gotten better.

tptacek · on Jan 7, 2020

That's obviously true, but in the most unhelpful way possible, where you introduce a complex additional topic without explaining how it doesn't validate the previous commenter's misapprehension about how "state" works in this context.

cperciva · on Jan 7, 2020

I wasn't entirely sure if the previous commenter was confused or merely saying things in a confusing way. The fact is that with a small entropy pool and a leaky mechanism like RC4, you absolutely can run out of entropy.

jacobush · on Jan 7, 2020

That you must never use urandom for serious stuff?

Rebelgecko · on Jan 8, 2020

that /dev/random must be better than /dev/urandom because it blocks until there's "enough" entropy (where "enough" is a huge misnomer and not really something you need as long as you're using a modern OS)

devit · on Jan 7, 2020

There are two threat models against code using RNGs:

1. The adversary has an amount of computing power that is feasible as currently foreseeable: in this case, all you need are K "truly random" bits where K=128/256/512 and can then use a strong stream cipher or equivalent to generate infinite random bits, so you only need to block at boot to get those K bits, and you can even store them on disk from the previous boot and have them passed from an external source at install time

2. The adversary has unlimited computing power: in this case, you need hardware that can generate truly random bits and can only return randomness at the rate the hardware gives you the bits

Now obviously if you are using the randomness to feed an algorithm that is only strong against feasible computing power (i.e. all crypto except one-time pads) then it doesn't make sense to require resistance against unlimited computing power for the RNG.

So in practice both /dev/urandom, /dev/random, getrandom(), etc. should only resist feasible computing power, and resisting against unlimited computing power should be a special interface that is never used by default except by tool that generate one-time pads or equivalent.

xyzzyz · on Jan 7, 2020

> 2. The adversary has unlimited computing power: in this case, you need hardware that can generate truly random bits and can only return randomness at the rate the hardware gives you the bits

What would you need those bits for in that case? Literally the only things that comes to my mind is generating one time pads, as standard cryptography is useless in such scenario.

firethief · on Jan 7, 2020

Game-theoretically, you want a source of random numbers when you need to make a decision your adversary can't predict. Traditionally some cultures have (accidentally?) used bird augury for this, but obviously that won't do when you're up against Unlimited Computing Power, as birds are basically deterministic.

pdonis · on Jan 7, 2020

> There are two threat models against code using RNGs

Actually, there are three:

3. The adversary has the ability to put backdoors in your hardware so you can't trust it to give you truly random bits at all.

tatersolid · on Jan 8, 2020

> The adversary has the ability to put backdoors in your hardware so you can't trust it to give you truly random bits at all.

This makes zero sense. You trust your processor/chipset/mobo enough to do your bidding and not leak the sensitive plaintext data your app is processing.

But you don’t trust that hardware enough to give you entropy via RDRAND?

If your adversary is someone who can undetectably compromise your hardware you’ve already lost.

pdonis · on Jan 8, 2020

> You trust your processor/chipset/mobo enough to do your bidding and not leak the sensitive plaintext data your app is processing.

But you don’t trust that hardware enough to give you entropy via RDRAND?

Yes, because hardware that "leaks" app data would have to do it in ways that are easily detectable (for example, if an instruction to write a certain byte to one memory location also wrote it to some other location, or to an I/O port or a device driver). Whereas a compromised RDRAND can be undetectable--the bits it yields look random and pass all the tests I can give them for randomness, but are still predictable by the attacker.

tatersolid · on Jan 8, 2020

I think your threat modeling needs some re-prioritization.

The universe of potentially undetectable hardware compromises is a much larger threat than a potential compromise of an RNG which is certain to be under constant scrutiny by security researchers.

You assume the existence of an attacker who is able and willing to compromise RDRAND but unable ore unwilling to implement one of a million other compromises at the same time. This seems unlikely.

pdonis · on Jan 9, 2020

> This seems unlikely.

Not to me, for the reasons I've already given. I guess we'll just have to agree to disagree.

fooker · on Jan 7, 2020

With truly unlimited computing power, you would just brute force it.

Hence, this is not a credible threat model.

pczy · on Jan 7, 2020

This is the best explanation of this issue that i know of: https://www.2uo.de/myths-about-urandom

brohee · on Jan 7, 2020

That this happens right after Thomas Pornin ridiculing the blocking pool (https://research.nccgroup.com/2019/12/19/on-linuxs-random-nu..., HN discussion https://news.ycombinator.com/item?id=21843081) is obviously purely coincidental, right? Especially as it was read and commented upon by Theodore Tso that at last changed his mind...

tytso · on Jan 7, 2020

Hardly; the first version of this patch series was from August 2019 (which is before the brouhaha caused by ext4 getting optimized and causing boot-time hangs for some combinations of hardware plus some versions of systemd/udev), and the second version was from September 2019. In the second version, Andy mentioned he wanted to make further changes, and so I waited for it to be complete. I had also discussed making this change with Linus in Lisbon at the Kernel Summit last year. So this was a very well considered change that had been pending for a while, and it predates the whole getrandom boot hang issue last September. I don't like making changes like this without careful consideration.

The strongest argument in favor of not making this change was there are some (misguided) PCI compliance labs which had interpreted the PCI spec as requiring /dev/random, and changing /dev/random to work like getrandom(2) might cause problems for some enterprise companies that need PCI compliance. However, the counter-argument is that it wasn't clear that the PCI compliance labs somehow thought that /dev/random was better than getrandom(2); it was just as likely they were so clueless that they hadn't even heard about getrandom(2). And if they were that clueless, they probably wouldn't notice that /dev/random had changed.

If they really did want TrueRandom (whatever the hell that means; can you guarantee your equipment wasn't intercepted by the NSA while it was in-transit to the data center?) then the companies probably really should be using some real hardware random number generator, since on some VM's with virtio-rng, /dev/random on the guest was simply getting information piped from /dev/urandom on the host system --- and apparently that was Okey-Dokey with the PCI labs. Derp derp derpity derp....

rrauenza · on Jan 7, 2020

For anyone following along not familiar with all security acronyms, in this context PCI is Payment Card Industry not Peripheral Component Interconnect.

I was confused for a bit since we're talking about the kernel...

ATsch · on Jan 7, 2020

Many people have been ridiculing entropy tracking as pointless pseudeo-security vodoo for a very long time.

https://media.ccc.de/v/32c3-7441-the_plain_simple_reality_of...

This talk from 2015 springs to mind to me as the previous widely discussed one.

But the ncc article might of course have been the straw that healed the camels back.

tytso · on Jan 7, 2020

There was a time when some people thought that it was just fine to use MD5 and SHA1. Yarrow-160, a random number generator devised by Kelsey, Schineier, and Ferguson, used SHA1.

Entropy tracking was used in the original versions of PGP because there were those people who had a very healthy (for that time) lack of confidence in "strong cryptographic algorithms" actually being strong.

As PBS Space Time once said, discussing Pilot Wave Theory and why it's considered unorthodox when compared to the Many Worlds interpretation of Quantuum Theory, "Orthrodoxy == Radicalism plus Time". There was a time when the Many Worlds interpretation was considered out there.

Similarly, there was a time when not trusting crypto algorithms as being Forever Strong was normal, and designing a network protocol like Wireguard without algorithm agility would have been considered highly radical. Today, trusting "strong cryptographic primitives" is considered an axiom. But remember that an axiom is something that you assume to be true and use as the basis of further proofs. Just as a Voodoo practitioner assumes that their belief system is true....

tptacek · on Jan 7, 2020

PGP was designed without message authentication. The people who designed PGP had a lack of understanding of cryptography, full stop. To an extent that is because PGP is a 1990s design, and very few people had a thorough understanding at the time. But to a significant extent it is also because the PGP engineering community consisted largely of stubborn amateurs attempting to (re-)derive cryptographic science from first principles. An appeal to the healthy paranoia of PGP is not a persuasive argument.

azinman2 · on Jan 7, 2020

Has it not evolved at all in any implementation, particularly gnupg?

tptacek · on Jan 7, 2020

No, not really. The Efail attack is a pretty good example of how PGP's flawed design really just sets the system up for researchers to dunk on it; the GnuPG team's belief that they can't make breaking changes without synchronizing with the IETF OpenPGP working group ensures it'll remain like this for a long time.

See also: https://latacora.micro.blog/2019/07/16/the-pgp-problem.html

upofadown · on Jan 7, 2020

The Efail attack was almost, if not entirely, a client issue where those clients were leaking information from html emails. There were no real weaknesses in the OpenPGP standard or the GnuPG implementation of that standard.

>... the GnuPG team's belief that they can't make breaking changes without synchronizing with the IETF OpenPGP working group ...

That does not actually sound like a bad thing to me.

The linked rant against OpenPGP/GnuPG takes the form of a semi-random list of minor issues/annoyances associated with the OpenPGP standard and the GnuPG implementation mixed together in no particular order. It ends with the completely absurd solution of just abandoning email all together. So you have to explain which parts of it support your contention.

The OpenPGP standard is in reality one of the better written and implemented standards in computing (which isn't saying much). There may in the future be something better but it is downright irresponsible to slag it without coming up with any sort of alternative. It is here and it works.

tptacek · on Jan 7, 2020

I think it's interesting that when a pattern of vulnerabilities is discovered that exfiltrates the plaintext of PGP-encrypted emails, a pattern that simply cannot occur with modern secure messaging constructions, the immediate impulse of the PGP community is to say "it's not our fault, it's not OpenPGP's fault, it's not GnuPG's fault". Like, it happened, and it happened to multiple implementations, including the most important implementations, but it's nobody's fault; it was just sort of an act of God. Like I said, interesting. Not reassuring, but interesting.

upofadown · on Jan 7, 2020

It is well known that end-points are the weak parts of any sort of privacy system involving cryptography. So not really all that interesting.

It was literally not GPG's fault or the fault of the standard.

tptacek · on Jan 7, 2020

And when a practical application of SHA-1 collision generation to PGP is found, it won't be their fault either. After all, the OpenPGP standard says they have to support SHA-1! Blame the IETF!

Stuff like this happened to TLS for a decade, and then the TLS working group wised up and redesigned the whole protocol to foreclose on these kinds of attacks. That's never going to happen with PGP, despite it having a tiny fraction of the users TLS has.

upofadown · on Jan 7, 2020

Support is different than utilization. GPG no longer uses SHA1 for message digests and has not done so for a fair time now. This what the preferences in a public key generated with gpg2 recently look like:

     Cipher: AES256, AES192, AES, 3DES
     Digest: SHA512, SHA384, SHA256, SHA224, SHA1
     Compression: ZLIB, BZIP2, ZIP, Uncompressed
     Features: MDC, Keyserver no-modify

So SHA1 is the last choice. Note that 3DES is there at the end of the symmetrical algorithm list. It ain't broken either so they still include it for backward compatibility. This is a good thing. It is essential in a privacy system for a store and forward communications medium.

akerl_ · on Jan 7, 2020

What portion of the implementations have to fall victim to the exact same misbehavior before, in your opinion, it’s plausible to suggest that the issue is a foot-gun on the part of the overall standard/ecosystem?

upofadown · on Jan 7, 2020

The list is at the bottom of this:

* https://efail.de/

Throwing out the discontinued things (Outlook 2007) and the webmail things that PGP can't be even sort of secure on (Roundcube, Horde) we end up with 7 bad clients out of a total of 27 good clients. So 26%. To get that they allegedly had to downgrade GPG.

tptacek · on Jan 8, 2020

Are you serious with this analysis? You counted the number of different client implementations? You don't think it matters that the "26%" includes GPGTools and Thunderbird?

This is like counting how many TLS implementations were vulnerable to Heartbleed. Was WolfSSL? GnuTLS? TLS.py? What, just OpenSSL? I guess things are looking pretty good!

akerl_ · on Jan 7, 2020

Sorry, to clarify, I’m not asking what percentage of clients were vulnerable in this case. I’m asking what the threshold is, beyond which you would consider the possibility that the issue was with the broader spec/ecosystem rather than the individual tools.

upofadown · on Jan 7, 2020

Obviously something higher than 26% or zero depending on what you believe happened here....

... and just to be clear, we are only talking about Efail here... There is no pattern of client information leakage issues... So it is hard to generalize.

akerl_ · on Jan 8, 2020

Regarding the threshold, no, I’m not talking about EFail, I’m speaking generally.

Much of this thread predicates on your claim that an issue which could have been caught in an end-use implementation is only a flaw in that implementation. By contrast, I’m claiming that it’s the responsibility of a secure specification and ecosystem to guard against classes of misuse, such that end-use implementations are not each individually required to mitigate that class of issue.

None of the above is specific to EFail, but the resulting threshold is noteworthy for EFail. Even if, by your numbers, we’re talking about 26% (I’m not sure why it would be “or zero depending on what you believe happened here”, since there’s not really any interpretation of what happened here where 0% of implementations were impacted), that’s a quarter of implementations that were impacted by this class of misuse (the ecosystem/specification not enforcing that a MAC check pass before returning decrypted plaintext).

As tptacek points out in a parallel comment, this is a pretty skewed measurement, because that 26% of implementations accounts for the vast majority of actual users (for example, “Thunderbird” and “GPGTools” account for the same weight in your percentage as “United Internet” and “Mailpile”). But even so, if a quarter of apples in my basket were bad, I’d potentially stop blaming individual apples and start to wonder about quality control at the grocery store.

As is exemplified by newer libraries like Nacl / libsodium, a primary goal of a strong cryptographic library is providing interfaces which make correct usage easy and avoid classes of misuse, so that authors of end-use tools are not each required to replicate the same correct sequence of sanity checking and safeguards, with any misstep being a security fault for their users.

I’m still curious for your threshold. For example, by your measurement methodology, is the EFail attack on S/MIME clients purely a client error? In that case, 23 of 25 tested end-use implementations were vulnerable, or 92%. Is 92% enough widespread impact for ownership to bubble up to the overall S/MIME specification, to guard against this kind of misuse?

upofadown · on Jan 8, 2020

OK, but this is all in response to an attempt to have me come up with an acceptable level of client information leaks. That is an obvious trap and attempt to turn this into a discussion about me. So I had fun with the idea instead.

None of this changes the fact that the GPG people claimed that the current implementation was not vulnerable with any of the clients and that the Efail people had to downgrade GPG so they could even mention GPG at all. If that is true (and there is evidence that it was) then the whole thing was just a hoax, at least as presented.

In other words, at the time of Efail, there was nothing further that the GPG people could do to work around the dodgy client implementations that were leaking data in general. They had already done it.

* https://lists.gnupg.org/pipermail/gnupg-users/2018-May/06032...

Even if the GPG implementation and/or OpenPGP standard had of been entirely broken, this is still mostly the email clients fault. The information leak from URLs loaded in HTML emails was up to that point routinely exploited. Heck, it is still routinely exploited. Efail did not actually result in a fix for all or even most of the email clients affected.

akerl_ · on Jan 8, 2020

I was curious for your thoughts on the threshold for how widespread an end-use implementation issue needs to be before you’d consider it to be exemplary of an issue with the spec/ecosystem, which is why I asked about that.

Given that you’d established that this issue, in your opinion, wasn’t with GPG but instead was with the end-use implementations, I thought that discussing that threshold would help clarify the point under discussion.

I didn’t ask about an “acceptable” level of anything, nor was this intended to make the discussion about you, except insofar as you are a party to the discussion, so I was attempting to get more details on your position.

The new position you’ve given, that EFail was a “hoax” and that GPG wasn’t vulnerable, is pretty readily false given the details already provided as part of the EFail disclosure. The claim from the GnuPG devs (between https://lists.gnupg.org/pipermail/gnupg-users/2018-May/06032... and https://lists.gnupg.org/pipermail/gnupg-users/2018-May/06033... ) is that GnuPG will print a warning if the MDC is stripped or fails to validate. This isn’t disputed by the EFail release, which notes that the issue occurs because decrypted plaintext is returned alongside the eventual warning, and that common clients will utilize the decrypted plaintext despite the warning.

This is the crux of the discussion as to whether this is an end-use implementation problem or a spec/ecosystem problem. My position is that as a security-focused tool, GnuPG should not present foot-gun opportunities like this which require all end-use implementations to handle their own validation for these kind of failure modes. A proper security-focused tool would refuse to provide decrypted plaintext in the case that the MAC check failed, because it would have required the MAC check to pass before ever starting to decrypt anything.

upofadown · on Jan 9, 2020

GPG just happened to have a check that could of prevented this particular attack if the client had done certain things in response to the failure of that check. S/MIME didn't have a check of that type and was as a result was more affected by the attack. S/MIME had one less "footgun" than GPG did. That still didn't help anyone and in this particular case make things worse in practice.

There is a tendency in these sorts of things to get so wrapped up in the details that the root issue gets forgotten about. In this case the root issue is the leakage of information from HTML emails. After that, what really matters here? What point is there in considering each and every thing that could of been different that would of prevented the attack? Sure, if I hadn't of left the house on Thursday I would not of been hit by the bus, but this particular insight is not valuable in any way.

The hoax here is the suggestion that PGP (and S/MIME for that matter) was broken in some way. The original paper was called "Efail: Breaking S/MIME and OpenPGP Email Encryption using Exfiltration Channels" which was not just misleading, it was straight up wrong.

akerl_ · on Jan 9, 2020

S/MIME didn’t have one less footgun: it had roughly the same footgun. The fact that GPG prints a warning isn’t the footgun, the footgun is the fact that decrypted plaintext was returned even if the MDC check failed.

The point of considering each thing that could have prevented an attack is clear, and is a central part of threat modeling and defense in depth. Those concepts aren’t really controversial. Thinking critically about the parts of a system that can contribute to adverse results, and then applying mitigations and avoiding pitfalls, is a pretty core part of basically all engineering (software and otherwise).

The bus analogy (if I hadn’t left the house on the day I got hit by a bus, I’d not have gotten hit) would, in a threat modeling context, be accurately identified as both ‘definitely true’ and ‘low probability’. Yes, leaving your house is dangerous, for a variety of reasons. But the relative danger of leaving your house vs not leaving your house is more questionable (staying inside the house is likewise dangerous), and the probability of leaving-the-house causing hit-by-bus is low. But a security tool returning plaintext despite a MAC fail isn’t like leaving your house, it’s like looking both ways once you’re already crossing the street. GPG warns you there’s a car coming, but you’re already standing in front of the car. A dexterous human could potentially dive out of the way, as an end-use implementation could discard the decrypted plaintext when it sees the MDC warning, but a root-cause-analysis would still rightly suggest that you should be looking both ways before crossing, rather than during.

Taking this thread in aggregate, it’s interesting to me how the goalposts keep shifting. The original thrust was “There were no real weaknesses in the OpenPGP standard or the GnuPG implementation of that standard”. When pressed, your position shifted to include “Even if the GPG implementation and/or OpenPGP standard had of been entirely broken, this is still mostly the email clients fault.” You’ve now further shifted to questioning why we should even worry about whether GPG could be better (“What point is there in considering each and every thing that could of been different that would of prevented the attack?”).

I don’t understand the rigidity with which you refuse to consider the possibility that GPG could have better handled this kind of threat.

Dylan16807 · on Jan 8, 2020

> OK, but this is all in response to an attempt to have me come up with an acceptable level of client information leaks. Since that is an obvious trap I had fun with the idea instead.

Oh, you're commenting badly on purpose because you misinterpreted something as a 'trap'? Great.

It's not a trap question. If something gets misimplemented once, it's maybe probably not an issue with the spec. If it happens over and over again, it's suspicious.

upofadown · on Jan 7, 2020

It has evolved in the standard as well as the implementations. It is a bit silly to claim that the current thing is bad because there was a program once with a similar name.

andrepd · on Jan 7, 2020

Many-worlds is still very much not the orthodox interpretation.

jabl · on Jan 7, 2020

Well, less unorthodox than hidden variable interpretations.

dfc · on Jan 7, 2020

It's version three of this patch.

JdeBP · on Jan 7, 2020

Interestingly, this follows the systemd people back in 2018 changing its seed-at-boot tool, systemd-random-seed, to write the machine-ID as the first 16 bytes of seed data to /dev/random at every seed write.

* https://github.com/systemd/systemd/commit/8ba12aef045ba1a766...

* https://www.freedesktop.org/software/systemd/man/systemd-ran...

* http://jdebp.uk./Softwares/nosh/guide/commands/machine-id.xm...

zaarn · on Jan 7, 2020

It's very amusing that the various kernel developers are bashing on GnuPG, going as far as calling it's behaviour a "misuse. Full stop."

PGP/GPG has certainly falled out of favor.

tytso · on Jan 7, 2020

I actually thought that was a bit unfair. Is it a misuse to use three times as much concrete as is strictly necessary? That would make the Empire State Building a "misuse" of concrete. Even if you aren't doing Empire State Building levels of overkill having engineering margins is a very well accepted thing. Is extracting 4096 bits from /dev/random for a 4096-bit RSA key "misuse" when said key only has between 200-300 bits of cryptographic strength? Meh.... I've got more important things to worry about; public key generation happens so rarely. And I do use a hardware random number generator[1] as a supplement when I generate new GPG keys.

[1] https://altusmetrum.org/ChaosKey/

zaarn · on Jan 7, 2020

The problem is that extracting more than you need is what ruined /dev/random for everyone.

grammarxcore · on Jan 7, 2020

So is GnuPG bad because it reads directly from /dev/random instead of using an interface like getrandom()? I'm naive enough to not know reading directly from /dev/random is bad and would love to know more.

toast0 · on Jan 7, 2020

The getrandom() syscall is relatively new. Before it was available, you had two choices.

Use a non-Linux OS with reasonable /dev/(u)random or use Linux with its Sophie's choice:

/dev/random will give you something that's probably good, but will block for good and bad reasons.

/dev/urandom will never block, including when the random system is totally unseeded.

GnuPG could not use /dev/urandom, since there was no indication of seeding, so it had to use /dev/random which blocks until the system is seeded and also when the entropy count of nebulous value was low. Most (all) BSDs have /dev/urandom the same as /dev/random, where it blocks until seeded and then never blocks again . This behavior is available in Linux with the getrandom() syscall, but perhaps GnuPG hasn't updated to use it? Also, there was some discussion in the last few months of changing the behavior of that syscall, which thankfully didn't happen, in favor of having the kernel generate some hopeful entropy on demand in case there is a caller blocked on random with an unseeded pool.

bscphil · on Jan 8, 2020

> This behavior is available in Linux with the getrandom() syscall, but perhaps GnuPG hasn't updated to use it?

GnuPG has been using getrandom() where available for over a year[1]. Obviously some distros may not yet have updated to a recent enough version, but it (and OpenSSL) are no longer among the offenders that cause /dev/random blocking hangs.

[1] https://lists.gnupg.org/pipermail/gnupg-announce/2018q4/0004...

grammarxcore · on Jan 7, 2020

So the issue is the block? I make a blocking call and another app attempts to make a call during the block and will fail if it's not expecting to wait? Is that (one of) the problem(s)?

Thanks for breaking that down for me!

toast0 · on Jan 7, 2020

So, if the random system hasn't been properly seeded, you do need to block, if you're using the random for security; especially for long term security, ex long lived keys.

The problem is, before this patch, Linux keeps track of an entropy estimate for /dev/random, and if the estimate gets too low, read requests will block. Each read reduces the estimate significantly, so something that does a lot of reads makes it hard for other programs to do any reads in a reasonable amount of time.

If you knew the system was seeded, you could use urandom instead, but there's not a great way to know. Perhaps, you could read from random the first time, and urandom for future requests in the same process... but that only helps in long running processes; also reading once from random and using it as a seed to an in-process secure random generator works almost as well. The getrandom() syscall is really the way forward, but you would need to keep old logic conditionally or accept loss of compatibility with older releases.

In summary, it's not really fair to say GnuPG is doing it wrong, when they didn't have a way to do it right.

grammarxcore · on Jan 7, 2020

Thanks! That makes sense. I appreciate you taking the time to break all that down.

ploxiln · on Jan 7, 2020

It should have read just 16 or 32 bytes from /dev/random in order to seed its own CSPRNG (at most once per process invocation, only when first needed)

tatersolid · on Jan 8, 2020

No! Per-processs CSPRNGs are a terrible idea. Fork-safety is hard. Swap-safety is hard.

ploxiln · on Jan 10, 2020

I guess all programming is kinda hard, it's the nature of expectations of modern computing.

Per-process CSPRNGs are pretty common. Most programs don't fork without exec, no problem for them. Managing a per-process CSPRNG is only hard for libraries that might be used by some programs that fork without exec, and don't want to require the program to do anything right.

No! It's not hard, just don't screw it up. This is true of most things.

tatersolid · on Jan 17, 2020

Why not just use getrandom() or CryptGenRandom() instead and simplify everything avoiding all those classes of bugs?

A user space CSPRNG is just a foot-gun waiting to go off.

tptacek · on Jan 7, 2020

In related news: https://twitter.com/matthew_d_green/status/12145739830871080...

lmm · on Jan 8, 2020

Snowden kept his communications secure using GPG. The papers he leaked told us that the NSA was reading everyone's emails, and also that they weren't able to break GPG - which made sense, GPG was the respected gold standard. For a moment it looked like GPG might finally get its day in the sun.

And then suddenly, as if overnight, the "crypto community" was all about crapping on it. Open source and open standards were suddenly not so important, for reasons that were never really explained. Proprietary "secure" hardware was suddenly fine and not worth worrying about. Automated updates from a single vendor, yeah, why not. And a theoretical cryptographic property whose real-world impact was marginal-to-nonexistent (perfect forward secrecy) was suddenly the most important thing and a reason to write off any existing cryptosystem.

Call me a conspiracy theroist, but something stinks there.

zaarn · on Jan 8, 2020

GPG is fine if properly configured and very carefully used.

The current defaults GPG presents aren't that safe anymore and everyone who wants to develop integration with GPG suffers extreme pain because for GPG therer is only the CLI Interface.

Modern E2EE-capable chat solutions are a good replacement, which are cryptographically stronger and don't have the same chances of blowing up as GPG does.

I don't think it's that much of a conspiracy there is a bit of time between those events, it's simply that in the latest years, people are advocating for security tools that prefer being resistant to misuse (GPG isn't) and safe by default (GPG isn't) over other tools.

lmm · on Jan 8, 2020

> The current defaults GPG presents aren't that safe anymore and everyone who wants to develop integration with GPG suffers extreme pain because for GPG therer is only the CLI Interface.

Entirely true.

> Modern E2EE-capable chat solutions are a good replacement, which are cryptographically stronger and don't have the same chances of blowing up as GPG does.

I'm not convinced. Most or all of these chat solutions seem to involve closed-source code, single-vendor implementations, closed networks, complicated protocols that lead to incomplete analysis, lack of pseudonymity, and an embrace of closed-source operating systems and hardware, and I think those things are still just as worrying as they were 10 years ago. I'm all for improving on the safety and usability of GPG, but I don't think the tradeoff in overall security that we're currently offered is a good one.

zaarn · on Jan 8, 2020

There is Signal (which has open forks), Matrix, XMPP and several others which support E2EE. For E-Mail there isn't a good alternative.

JdeBP · on Jan 7, 2020

It is worth considering how things appear from the perspectives of the application developers.

* https://dev.gnupg.org/T3894

zaarn · on Jan 7, 2020

Well, I got as far as "just have your distro dynamically edit grypt conf to use urandom only after startup" before I considered that the GPG devs are being weird about it. Still took them half a year to replace "read(/dev/random)" with "getrandom()".

latchkey · on Jan 7, 2020

Filed this one in 2011 and it got a lot of heated discussion...

https://bugs.launchpad.net/bugs/706011

jancsika · on Jan 7, 2020

A professional response on a bug report seeks to narrow down the possible source of a bug (if any) so it may be understood, tested, and addressed properly.

The first response to start such a process is taligent in response #22.

A useful addition to that is #23 where Steven Ayre suggests opening that as a separate bug that focuses solely on this issue.

I'm not sure what the purpose is for the other responses you received. They seem to seek to use the breadth of your issue report to widen the discussion to maximally contentious security topics.

JdeBP · on Jan 7, 2020

That's not a fair assessment of the other responses. Steve McIntyre's in #7, for one example.

csours · on Jan 7, 2020

I was looking at Java properties the other day and I thought to myself, "We still need to set /dev/./urandom in 2019?"

ktpsns · on Jan 7, 2020

This is a valid point -- most high level programming languages provide some kind of function to provide random numbers in a given interval, such as [0,1]. See also https://stackoverflow.com/questions/2572366/how-to-use-dev-r...

This is even true for shell scripting, see for instance http://www.tldp.org/LDP/abs/html/randomvar.html

saagarjha · on Jan 7, 2020

Am I correct in my understanding that /dev/random will not block anymore and behave similarly to /dev/urandom after it has been initialized? Or is there still some inherent difference between the two?

hannob · on Jan 7, 2020

The "after it has been initialized" is the inherent difference.

SignalsFromBob · on Jan 7, 2020

Are hardware RNGs, such as the ones that plug into a USB port, of any value when the RNG in Linux is good enough for generating GPG keys? I'm wondering what the use case is for people that buy them.

walterbell · on Jan 7, 2020

On the subject of TRNG, John Denker wrote a 2005 paper for using soundcard data as a source of randomness, http://www.av8n.com/turbid/

> We discuss how to configure and use turbid, which is a Hardware Random Number Generator (HRNG), also called a True Random Generator (TRNG). It is suitable for a wide range of applications, from the simplest benign applications to the most demanding high-stakes adversarial applications, including cryptography and gaming. It relies on a combination of physical process and cryptological algorithms, rather than either of those separately. It harvests randomness from physical processes, and uses that randomness efficiently. The hash saturation principle is used to distill the data, so that the output is virtually 100% random for all practical purposes. This is calculated based on physical properties of the inputs, not merely estimated by looking at the statistics of the outputs. In contrast to a Pseudo-Random Generator, it has no internal state to worry about. In particular, we describe a low-cost high-performance implementation, using the computer’s audio I/O system.

On randomness, http://www.av8n.com/turbid/paper/turbid.htm#sec-raw-randomne...

> Understanding turbid requires some interdisciplinary skills. It requires physics, analog electronics, and cryptography.

hannob · on Jan 7, 2020

This is one of these approaches that are for all practical purposes completely useless.

There's basically only three problems generating good randomness: 1. Very early during boot you don't have a lot of good sources. 2. On very constrained devices you have limited external input. 3. Bugs in the implementation.

Randomness from soundcards doesn't help with any of these. They probably aren't yet initialized at the point in time where it matters most, they don't exist for the most problematic devices and bugs are bugs, no matter what your source of randomness.

bigiain · on Jan 7, 2020

> Randomness from soundcards doesn't help with any of these.

"Randomness from soundcards" is also spectacularly unlikely to help on the cloud. Pretty sure they don't fit SoundBlaster16s to EC2 instances or Digital Ocean Droplets...

simias · on Jan 7, 2020

Given that these instances are typically virtualized I wouldn't be surprised if you could extract a decent amount of entropy just by using the system timings (interrupts, RTC sampling etc...) given that they would be affected by the other running systems. And of course there's always the network card.

In my experience it's not usually super difficult to get a decent amount of entropy on complex desktop or servers, it's only a real issue on simple embedded hardware where you might have no way to sample the environment and all the timings are ultra-deterministic. In this case I've resorted to using temperature measurements and fan speed as a source of entropy during early boot, which isn't ideal.

pjc50 · on Jan 7, 2020

> given that they would be affected by the other running systems

Philosophically, does that make it more or less random? The whole category of side-channels suggests that there are problems with sharing a system like this.

There's also the philosophical question of how cryptographically secure a fully virtual system can be when the host has full inspection and control capability. If you're running in the cloud you need to think carefully about how to delegate this to the platform if at all possible.

tinus_hn · on Jan 7, 2020

It would seem quite helpful if these virtual machines provided a virtual device that gives random numbers provided by the host device.

skrebbel · on Jan 7, 2020

I want a cloud version of Dr. Sbaitso!

simias · on Jan 7, 2020

It could potentially help with 1 if you have some early bootstrap code to configure the sound card and get some samples. I agree with your general point however.

atoav · on Jan 7, 2020

I probably don’t know enough about computers, but wouldn’t there be a way to use e.g. the digital outputs of a DAC directly even in the earliest part of boot? Is there a good reason why CPUs aren’t doing this already?

pjc50 · on Jan 7, 2020

Well, if you have one, and if the thing it's listening to is sufficiently random - both of which are in question here.

This is sort of the problem the CPU random number generator is intended to solve, but see the discussion on trust.

b2ccb2 · on Jan 7, 2020

Reminds me of Cloudflares wall of lava lamps[1], discussed here: https://news.ycombinator.com/item?id=16041295

[1] https://www.cloudflare.com/learning/ssl/lava-lamp-encryption...

brian_herman · on Jan 7, 2020

How long does the lava lamp system take to start from a cold boot?

RL_Quine · on Jan 7, 2020

About half an hour to get a solid glob going for mine, too long and you tend to end up with a large bias towards the top. I looked into this because I wanted to pre warm my lava lamp because it’s depressing to wake up to cold blobs.

amelius · on Jan 7, 2020

CPUs already have a physics-based random number generator.

https://en.wikipedia.org/wiki/RDRAND

CodeArtisan · on Jan 7, 2020

https://sharps.org/wp-content/uploads/BECKER-CHES.pdf

This paper demonstrates that by adding a small amount of dopant[1] to the RDRAND circuitry, you can weaken it enough while it still pass NIST suite. And the modification is undetectable.

In this paper we introduced a new type of sub-transistor level hardware Trojan that only requires modification of the dopant masks. No additional transistors or gates are added and no other layout mask needs to be modified. Since only changes to the metal, polysilicion or active area can be reliably detected with optical inspection, our dopant Trojans are immune to optical inspection, one of the most important Trojan detection mechanism. Also, without the ability to use optical inspection to distinguish Trojan-free from Trojan designs, it is very difficult to find a chip that can serve as a golden chip, which is needed by most post-manufacturing Trojan detection mechanisms. To demonstrate the feasibility of these Trojans in a real world scenario and to show that they can also defeat functional testing, we presented two case studies. The first case study targeted a design based on Intel’s secure RNG design. The Trojan enabled the owner of the Trojan to break any key generated by this RNG. Nevertheless, the Trojan passes the functional testing procedure recommended by Intel for its RNG design as well as the NIST random number test suite.This shows that the dopant Trojan can be used to compromise the security of a meaningful real-world target while avoiding detection by functional testing as well as Trojan detection mechanisms. To demonstrate the versatility of dopant Trojans, we also showed how they can be used to establish a hidden side-channel in an otherwise side-channel resistant design. The introduced Trojan does not change the logic value of any gate, but instead changes only the power profile of two gates. An evaluator who is not aware of the Trojan cannot attack the Trojan design using common side-channel attacks. The owner of the Trojan however can use his knowledge of the Trojan power model to establish a hidden side-channel that reliably leaks out secret keys.

[1] https://en.wikipedia.org/wiki/Doping_(semiconductor)

littlestymaar · on Jan 7, 2020

Amazing!

saagarjha · on Jan 7, 2020

Which has concerns that it may be backdoored by intelligence agencies: https://github.com/torvalds/linux/blob/6398b9fc818eea79dcd6e...

kalleboo · on Jan 7, 2020

I'm a lot less worried about an NSA backdoor than I am that Intel (or mediatek, or whatever cheap ARM license is in my router) just fucked up the implementation

throwaway2048 · on Jan 7, 2020

First generation AMD Zen CPUs returned all 1s for RDRAND instructions for instance

lmm · on Jan 8, 2020

https://dilbert.com/strip/2001-10-25

Jasper_ · on Jan 7, 2020

What's stopping the NSA from inserting a backdoor to recognize it's running kernel randomness code and change the results too? If you don't trust your CPU, you can't trust anything it does. Expecting the backdoor to show up in one solely instruction is hopelessly naive.

cesarb · on Jan 7, 2020

> What's stopping the NSA from inserting a backdoor to recognize it's running kernel randomness code and change the results too?

It's much much harder. They'd have to insert something on the frontend (where the instruction decoder is) or on the L1 instruction cache to recognize when it's running that specific piece of code; both parts are very critical for the processor performance, so every gate of delay counts. And that's before considering that the Linux kernel code changes unpredictably depending on the compiler, kernel configuration options, and kernel release. Oh, and you have to be very precise in detecting that code, to make sure nothing else misbehaves or even gets slower (some people count cycles on parts of their code, so an unexpected slowness would get noticed).

Contrast with RDRAND, an instruction which is defined to return an unpredictable value; it would be simple to make its output depend on a counter mixed with a serial number and a couple of bits of real randomness, instead of being fully random. It's not even on the performance-critical part of the chip; it's isolated on its own block, so adding a backdoor to it would cause no performance problems, and would break no other software.

feanaro · on Jan 7, 2020

Why does anyone even continue to bother arguing this?

There are ways of mixing RDRAND into the entropy pool safely and this can be done easily. Why would you deliberately choose to not mix RDRAND and use it directly instead? You wouldn't. It makes no sense. Therefore, RDRAND should be mixed into the pool, it is being mixed into the pool and there is no more reason to debate this.

tytso · on Jan 7, 2020

Yes, and Linux has done it for years. The problem is whether or not RDRAND should be trusted in the absence of sufficient estimated entropy that it should be used to unblock the CRNG during the boot process. This is what CONFIG_RANDOM_TRUST_CPU or the random.trust_cpu=on on the boot command is all about. Should RDRAND be trusted in isolation? And I'm not going to answer that for you; a cypherpunk and someone working at the NSA might have different answers to that question. And it's fundamentally a social, not a technical question.

throwaway2048 · on Jan 7, 2020

The blocking CRNG (besides the necessary early seeding) is an entirely artificial problem however..

littlestymaar · on Jan 7, 2020

“you can't be safe from all attacks” doesn't means you should not attempt to protect yourself from the obvious ones.

Jasper_ · on Jan 7, 2020

If you expect RDRAND to change its behavior to be nefarious, also expect ADD and MUL to do the same. The RDRAND conspiracy theory is bizarre because underneath lies a core belief that a malicious entity would go so far as to insert a backdoor, but put it strictly in a simple and easily avoidable place? So they're malicious entities, but can only touch certain instructions?

Malicious entities don't play by made up rules. RDRAND being a convenient scapegoat seems like exactly the thing they'd want, too.

tialaramex · on Jan 7, 2020

The concern for RDRAND was really that it might be used to exfiltrate data.

Imagine a parallel universe where everybody happily just uses RDRAND to get random numbers. For example when you connect to an HTTPS server, as part of the TLS protcol it sends you a whole bunch of random bytes (these are crucial to making TLS work, similar approaches happen in other modern protocols). But in that universe those bytes came directly from RDRAND, after all it's random...

Except, if RDRAND is actually an exfiltration route, what you've just done is carve a big hole in your security perimeter to let RDRAND's owners export whatever they want.

XORing RDRAND into an apparently random bitstream is thus safe because the bitstream is now random if either RDRAND works as intended OR your apparently random bitstream is indeed random.

mnw21cam · on Jan 7, 2020

> XORing RDRAND into an apparently random bitstream is thus safe because the bitstream is now random if either RDRAND works as intended OR your apparently random bitstream is indeed random.

That's making assumptions. For instance, it wouldn't be beyond the realms of possibility for the compromised CPU to also track which registers contain a result produced from RDRAND, and make (X XOR RDRAND) produce a predictable result. After all, RDRAND is already an undefined number, so the system can legitimately decide later what it would like it to be. Yes, it would require more silicon in the registers, re-ordering and dispatch system, and ALU, but it would be feasible.

cwzwarich · on Jan 7, 2020

A change that large (probably requiring new custom RAMs and modifications to fairly timing-constrained register renaming logic) doesn't seem feasible for somebody to insert below the RTL level without being noticed. It would be much easier to just make RDRAND somehow less random, while still passing whatever randomness test is used for qualification.

mnw21cam · on Jan 8, 2020

Wouldn't require different RAM, and wouldn't really require any different register renaming logic. It would however require an extra bit on each register to flag the value as originating from RDRAND, and a different ALU that uses that flag and changes the outcome for certain operations.

Obviously, if you store the result of RDRAND to main RAM and read it back in later, this would defeat the system as the flag wouldn't be preserved. But I'm guessing most code won't do that for performance reasons.

The simpler option of just making RDRAND predictable is less powerful, because then the operating system can compensate with randomness obtained from elsewhere. The attack above allows the CPU to actually compromise the operating system's own randomness source.

jlokier · on Jan 7, 2020

I don't see how the RDRAND change would be gotten away with either, if someone else is looking at the silicon design.

To modify RDRAND so that it is less random in a way that's useful for an attacker, yet passes statistical randomness testing by the OS and other software, would require RDRAND to implement something cryptographic, so that only the attacker, knowing a secret, can "undo" the not-really-randomness.

A new crypto block would surely be very noticable at the RTL level.

littlestymaar · on Jan 7, 2020

Another comment[1] gave a link to an existing implementation of such backdoor using only doping. It doesn't implement a cryptographic scheme but weakens the randomness in a way that still pass NIST test suite.

[1]: https://news.ycombinator.com/item?id=21979268

littlestymaar · on Jan 7, 2020

I don't get your point. Should we assume there is no backdoor at all just because we can't predict all kind of backdoors that can be?

Being defensive against RDRAND is just one (relatively easy) way to defend oneself against a (relatively easy to implement) backdoor. Yes, this defense isn't perfect, because there can be other (more difficult to escape) backdoors, but that's not a compelling reason not to avoid the “easy” ones…

imtringued · on Jan 7, 2020

Because obtaining a seed is the only non deterministic part of a RNG. Once you have the seed you can trivially predict the next numbers. Since random numbers are used to generate encryption keys, being able to manipulate random numbers also allows you to defeat encryption. The way the RDRAND backdoor would work is pretty simple. When it is activated (system wide) it simply needs to return numbers based on a deterministic RNG (with a seed known to the owners of the backdoor). To an observer it would still work as intended and there is no way to prove that there is a backdoor.

Verifying an ADD instruction is very easy and if it were to return wrong results then it would be obvious that it is buggy. Programs would cease to work. Alternatively it would have to be incredibly smart and detect the currently executed program and exfiltrate the encryption key during the execution of the encryption function.

The first backdoor scales to millions of potential targets. The second is a carefully targeted attack against a known enemy. Targeted attacks have cheaper alternatives than silicon back doors.

pdpi · on Jan 7, 2020

There's a reasonable argument to be made that limiting the backdoor to very specific instructions that specifically target the domains you care about makes it less likely that your backdoor will be triggered by accident. Escaping detection is just as important here.

CJefferson · on Jan 7, 2020

There have been bugs in RDRAND, AMD processors would keeping returning 0 in some cases.

saagarjha · on Jan 7, 2020

0xffffffff: https://arstechnica.com/gadgets/2019/10/how-a-months-old-amd...

cesarb · on Jan 7, 2020

> If you expect RDRAND to change its behavior to be nefarious, also expect ADD and MUL to do the same.

There's a very important difference: ADD and MUL are deterministic, while the output of RDRAND is random. If ADD or MUL or FDIV return incorrect results, that can be detected (as the Pentium has shown). If RDRAND returns backdoored values, it cannot be detected by only looking at its output; you have to check the circuit.

pbhjpbhj · on Jan 7, 2020

CodeArtisan posted upstream, https://news.ycombinator.com/item?id=21979268, about a method of doping that effectively makes a side-channel (crazy!).

Surely you can check the output and see if it's random? Don't these attacks rely on perturbing the RNG do it's no longer a TRNG, isn't output the only way to tell??

imtringued · on Jan 7, 2020

>Surely you can check the output and see if it's random?

No you can't. That's an inherent property of randomness. If you're lucky you can win the lottery 10 times in a row. The only thing you can verify is that the random number generator is not obviously broken (see AMD's RDRAND bug) but you can't verify that it's truly random.

> isn't output the only way to tell??

Looking at the implementation is the only way to tell.

Let's say I generate a list of one million random numbers through a trustworthy source. Now I build a random number generator that does nothing but just return numbers from the list.

There is an impractical way of verifying a random number generator that is only useful in theory. Remember how you can flip a coin and take 1000 samples and you get roughly 1/2 probability for each side? The number of samples you have to take grows with the number of outcomes. If your RNG returns a 64 bit number you can take 2^64*x samples where x is a very large number (the larger the better). x = 1 is already impractical (50 years @ 3 billion RDRAND/s) but to be really sure you would probably need x > 10000. Nobody on earth has that much time. Especially not CPU manufacturers that release new chips every year.

littlestymaar · on Jan 7, 2020

> but you can't verify that it's truly random.

Even if you cannot be 100% sure that something is not really random, there are plenty of statistical measurements that can be used to assess the quality of the output (in terms of entropy).

> Let's say I generate a list of one million random numbers through a trustworthy source. Now I build a random number generator that does nothing but just return numbers from the list.

In less than a second your generator would be stalled which is a pretty obvious flaw to see.

The real issue is cryptographic algorithms because they are designed to simulate randomness, and they are adversarially improving when statistic methods of studies become more powerful. At every single point in time, state of the art cryptography is going to be able to produce fake random that the current state of the art cryptanalysis cannot prove as not random.

jlokier · on Jan 7, 2020

> At every single point in time, state of the art cryptography is going to be able to produce fake random that the current state of the art cryptanalysis cannot prove as not random.

That may not be true. (As in, I'm not sure it is.)

For many useful crypto algorithms where we give a nominal security measure (in bits), there is a theoretical attack that requires several fewer bits but is still infeasible.

For example, we might say a crypto block takes a 128-bit key and needs an order of magnitude 2^128 attempts to brute force the key.

The crypto block remains useful when academic papers reveal how they could crake it in about 2^123 attempts.

The difference between 2^128 and 2^123 is irrelevant in practice as long as we can't approach that many calculations. But it does represent a significant bias away from "looks random if you don't know the key".

It seems plausible to me that a difference like that would manifest in statistical analysis that state of the art cryptanalysis could prove as not random by practical means, while still unable to crack (as in obtain inputs) by practical means.

littlestymaar · on Jan 8, 2020

That makes sense. I stand corrected.

saagarjha · on Jan 7, 2020

Detecting backdoored random number generators is quite difficult.

littlestymaar · on Jan 7, 2020

Or even impossible: if the not-so-random value comes from an AES stream (or any encryption method actually), it's impossible to prove as long as AES is not broken (proving that an AES stream can be distinguished from random is a sufficient definition of “broken” in the crypto community).

noodlesUK · on Jan 7, 2020

An implementation that is not good by itself. Not least because on some AMD cpus the CPU takes the XKCD approach [1] [2].

[1] https://www.xkcd.com/221/ [2] https://arstechnica.com/gadgets/2019/10/how-a-months-old-amd...

commandersaki · on Jan 9, 2020

I don't see why this is so difficult.

1. Make a kernel config option that makes /dev/urandom block until entropy pool initialises.

2. Make a dependant kernel config option so /dev/random is /dev/urandom.

There done. Everyone can have their own choice on what security they want.

edoceo · on Jan 7, 2020

This looks like it explains why syslog-ng is hanging on boot? It's trying to read a random. At least, hangs until there is some random (have to just mash the keyboard a bit)

beefhash · on Jan 7, 2020

I am now somewhat curious why syslog-ng needs random bytes on boot.

downerending · on Jan 7, 2020

This reminds me of the (possibly apocryphal) story of old VMS (?) systems hanging if their console, which was often a printer, ran out of paper.

lanternslight · on Jan 7, 2020

Ugh! That explains so much! I did not correlate the mashing of keys and mouse clicks with RNG.

emilfihlman · on Jan 7, 2020

Changing getrandom was just idiotic and breaks god damn userspace. The man page was extremely clear in documentation in the first place.

Erlich_Bachman · on Jan 7, 2020

Summarizing the article, `cat /dev/random` will still work but will never block, possibly returning random data based on less entropy than before. They claim that in the modern situation there is already enough entropy in it even for secure key generation. There seemingly will still exist a programmatic way to get a random stream based on predictable amount of entropy, but not through reading this filesystem node.

gioele · on Jan 7, 2020

> Summarizing the article, `cat /dev/random` will still work but will never block

`cat /dev/random` may still block, but only once per reboot. It may block if it is called so early that not enough entropy has been gathered yet. Once there enough entropy has been gathered it will never block again.

simias · on Jan 7, 2020

As mentioned by the article that's already the default behaviour of getrandom() and the BSDs have symlinked /dev/random to /dev/urandom for a long time already.

I think this is a change for the best, in particular this bit sounds completely true to my ears:

> Theodore Y. Ts'o, who is the maintainer of the Linux random-number subsystem, appears to have changed his mind along the way about the need for a blocking pool. He said that removing that pool would effectively get rid of the idea that Linux has a true random-number generator (TRNG), which "is not insane; this is what the *BSD's have always done". He, too, is concerned that providing a TRNG mechanism will just serve as an attractant for application developers. He also thinks that it is not really possible to guarantee a TRNG in the kernel, given all of the different types of hardware supported by Linux. Even making the facility only available to root will not solve the problem: Application programmers would give instructions requiring that their application be installed as root to be more secure, "because that way you can get access the _really_ good random numbers".

The number of time I've had to deal with security-related software and scripts who insisted in sampling /dev/random and stalling for minutes at a time...

JdeBP · on Jan 7, 2020

A minor note:

* Only FreeBSD symbolically links, and it does it in the other direction. urandom is the symbolic link to random.

* OpenBSD has four distinct character device files: random, arandom, srandom, and urandom.

* NetBSD (as of 2019) has two distinct character device files: random and urandom. They have different semantics from each other. https://netbsd.gw.com/cgi-bin/man-cgi?rnd+4+NetBSD-current

aquabeagle · on Jan 7, 2020

On OpenBSD:

    $ ls -l /dev/*random*
    lrwxr-xr-x  1 root  wheel         7 Dec 10 15:05 /dev/random@ -> urandom
    crw-r--r--  1 root  wheel   45,   0 Jan  6 15:30 /dev/urandom

JdeBP · on Jan 7, 2020

That must be a recent change.

    $ ls -F /dev/*random*
    /dev/arandom  /dev/random   /dev/srandom  /dev/urandom
    $

ben_bai · on Jan 7, 2020

Deleted in 2017. https://marc.info/?l=openbsd-cvs&m=151069089605712&w=2 you can delete arandom and srandom

Edit: better link

guenthert · on Jan 8, 2020

And you don't think they did so for a reason? If you don't care for security (and there are good reason why one wouldn't), then you can create the link yourself on those systems (or link to /dev/zero for extra low wait times). Why does now everyone have to suffer the same? What happened to "keep policy out of the kernel"?

ale_jrb · on Jan 7, 2020

I understood it that reading from `/dev/random` will still block just after boot (i.e. before it's initialised) unless you pass the new flag `GRND_INSECURE` to `getrandom`. After it's initialised, however, it will never block because it's now just using the CRNG.

richardwhiuk · on Jan 7, 2020

Please don't confuse `/dev/random` and `getrandom` - they are separate interfaces.

quotemstr · on Jan 7, 2020

Not blocking under insufficient entropy does not suddenly make that entropy available. Punting entropy collection to userspace doesn't magically allow for DoS-free random number generation --- it just transforms, silently, a condition of insufficient entropy into a subtle security vulnerability. It feels like a form of reality denial, a bit like overcommit. The more time goes by, the more I wish there were a unixlike built on robustness, determinacy, and strict resource accounting.

Hendrikto · on Jan 7, 2020

> Not blocking under insufficient entropy does not suddenly make that entropy available.

That’s why it is still blocking until it has been sufficiently initialized. After it has gathered sufficient entropy, the pool’s entropy is not exhausted by reading from it. /dev/random assumes that reading 64 bits from it will decrease the entropy in its pool by 64 bits, which is nonsense.

Linux’s PRNG is based on cryptographically strong primitives, and reading output from /dev/random does not expose its internal state.

Your pointless rant just indicates that you do not really understand what’s going on.

jerf · on Jan 7, 2020

"/dev/random assumes that reading 64 bits from it will decrease the entropy in its pool by 64 bits, which is nonsense."

To amplify Hendrikto's point, /dev/random is implemented to "believe" that if it has 128 bits of randomness, and you get 128 bits from it, it now has 0 bits of randomness in it. 0 bits of randomness means that you ought to now be able to tell me exactly what the internal state of /dev/random is. I don't mean it vaguely implies that in the English sense, I mean, that's what it mathematically means. To have zero bits of randomness is to be fully determined. Yet this is clearly false. There is no known and likely no feasible process to read all the "bits" out of /dev/random and tell me the resulting internal state. Even if there was some process to be demonstrated, it would still not necessarily result in a crack of any particular key, and it would be on the order of a high-priority security bug, but nothing more. It's not an "end of the world" scenario.

throwaway2048 · on Jan 7, 2020

Yes, this depleting entropy argument is like arguing a 128 bit AES key is no longer secure after it has encrypted 128 bits of data, and encrypting more will give up the AES private key, so the ONLY thing to do is block.

Its completely nuts.

jlokier · on Jan 7, 2020

The entropy value was designed to be a known underestimate, not an accurate estimate of the entropy available.

What that in mind, zero is ok as a value. You may not be able to calculate the state of /dev/random given the tools and techniques available to you, but that doesn't make zero an incorrect lower bound on what you could mathematically calculate from the extracted data.

tptacek · on Jan 7, 2020

In reality, the entropy estimate is of no value. See Ferguson and Schneier, who have a chapter on this.

The meaningful states of a CSPRNG are "initialized" or "not". Once initialized, there is never a diminishment of "entropy".

jlokier · on Jan 7, 2020

I agree with that, subject to the assumption that your CSPRNG is built on CS-enough primitives.

(What I disagreed with is the argument made by the GP, not you, that the Linux entropy value was incompatible with their in-principle mathematical description of "true" entropy. Pretty irrelevant to real cryptography.)

quotemstr · on Jan 7, 2020

> There is no known and likely no feasible process to read all the "bits" out of /dev/random and tell me the resulting internal state

That's fine if you trust the PRNG. Linux used to at least attempt to provide a source of true randomness. You and Hendrikto are essentially asserting that everyone ought to accept the PRNG output in lieu of true randomness. Given various compromises in RNG primitives over the years, I'm not so sure it's a good idea to completely close off the true entropy estimation to userspace. I prefer punting that choice to applications, which can use urandom or random today at their choice.

Maybe everyone should be happy with the PRNG output. T'so goes further and argues, however, that if you provide any mechanism to block on entropy (even to root only), applications will block on it (due to a perception of superiority) and so the interface must be removed from the kernel. I see this change as an imposition of policy on userspace.

aidenn0 · on Jan 7, 2020

> That's fine if you trust the PRNG. Linux used to at least attempt to provide a source of true randomness. You and Hendrikto are essentially asserting that everyone ought to accept the PRNG output in lieu of true randomness. Given various compromises in RNG primitives over the years, I'm not so sure it's a good idea to completely close off the true entropy estimation to userspace. I prefer punting that choice to applications, which can use urandom or random today at their choice.

Linux never provided a source of true randomness through /dev/random. The output of both /dev/random and /dev/urandom is from the same PRNG. The difference is that /dev/random would provide an estimate of the entropy that was input to the PRNG, and if the estimate was larger than the number of bits output, it would block.

jerf · on Jan 7, 2020

"You and Hendrikto are essentially asserting that everyone ought to accept the PRNG output in lieu of true randomness."

No, what I am asserting is simply that the idea that you drain one bit of randomness out of a pool per bit you take is not practically true unless you can actually fully determine the state of the randomness generator when you've "drained" it. No less, no more. You can't have "zero bits of entropy" and also "I still can't tell you the internal contents of the random number" generator at the same time, because the latter is "non-zero bits of entropy". Either you've got zero or you don't.

As of right now, nobody can so determine the state of the random number generator from enough bits of output, we have no reason to believe anybody ever will [1], and the real kicker is even if they someday do, it's a bug, not the retroactive destruction of all encryption ever. A SHA-1 pre-image attack is a much more serious practical attack on cryptography than someone finding a way to drain /dev/random today and get the internal state.

It's only true in theory that you've "drained" the entropy when you have access to amounts of computation that do not fit into the universe. Yes, it is still true in theory, but not in a useful way. We do not need to write our kernels as if our adversaries are turning galaxy clusters into computronium to attack our random number generator.

[1]: Please carefully note distinction between "we have no reason to believe anyone ever will" and "it is absolutely 100% certain nobody ever will". We have no choice but to operate on our best understandings of the world now. "But maybe somebody may break" doesn't apply to just the current random generator... it applies to everything including all possible proposed replacements, so it does not provide a way to make a choice.

cesarb · on Jan 7, 2020

> /dev/random assumes that reading 64 bits from it will decrease the entropy in its pool by 64 bits, which is nonsense.

It's not complete nonsense. Suppose for explanation purposes that you had a pool with only 256 bits (2^256 possible states), and you read 64 bits from it. Of these 2^256 possible states, most would not have output that exact 64-bit value you just read; on average only 2^192 of the possible states would have resulted in that output value. Therefore, once you know that 64-bit value, the pool now has only 192 bits of entropy (2^192 possible states). Read three more 64-bit values, and on average only one of the 2^256 originally possible states could have resulted in these four 64-bit values; since there's only one possible state, the pool's entropy is zero.

However, like you said "Linux’s PRNG is based on cryptographically strong primitives": the only known way to find which of the 2^256 possible states could have resulted on these four 64-bit values would be to try all of them. Which is simply not viable, even with all the computing power in the world. That is, once the number of possible states (the pool's entropy) gets above a threshold, even if you later exhaust all the theoretical entropy from it, there's still no way to know the pool's state.

tptacek · on Jan 7, 2020

This is not how the LRNG works, or has ever worked. Think of CSPRNG output as the keystream of a stream cipher, which is what it practically is. Draw 256 bytes of keystream out of a stream cipher, and you have in no meaningful sense depleted that cipher's remaining store of unpredictable keystream bytes. Were it otherwise, no modern cryptography would function.

imtringued · on Jan 7, 2020

Actually you are making exactly the mistake that this change is intended to avoid. The distinction between /dev/random and /dev/urandom makes it appear that /dev/urandom is inferior and /dev/random is the "true" random number generator but that isn't the case. They are equally good since they both wait for an initial amount of entropy and receive additional entropy from the OS even after boot. Additional entropy does make the RNG more secure [0] but reusing existing entropy doesn't make your random numbers less secure because they were created from an unpredictable source.

[0] Let's say you plug in a USB based hardware RNG, after booting. You turn a software entropy source on, after booting. /dev/random would immediately take advantage of the RNG. Already opened /dev/urandom streams wouldn't, until they are closed. (For how many people is this a critical feature?)

ryukafalz · on Jan 7, 2020

> The distinction between /dev/random and /dev/urandom makes it appear that /dev/urandom is inferior and /dev/random is the "true" random number generator but that isn't the case. They are equally good since they both wait for an initial amount of entropy and receive additional entropy from the OS even after boot.

I was under the impression that urandom does not wait for initial entropy on Linux. Am I mistaken/has this changed?

saagarjha · on Jan 7, 2020

Isn’t overcommitment a feature of most modern operating systems?

wahern · on Jan 8, 2020

Not on Windows, where it's optional on a per request basis. It's common on Unix because of fork[1], but AFAIU only Linux and FreeBSD overcommit by default for all anonymous memory allocation (i.e. malloc backing).

[1] Notably, Solaris does not overcommit for fork.

jsmith45 · on Jan 8, 2020

I was not aware of any mechanism that would cause windows to allow overcommits.

The main allocation APIs allow for a "reservation" type of allocation, which reserves some of the virtual address space in the process, without actually allocating it. But those need to be converted to real allocations before use, and doing that adds to the processes commit charge.

I never found any API for actually making memory available to a process without increasing that processes's commit charge. That would obviously be necessary for real overcommit, as to my knowledge it is supposed to be an invariant of Windows that the sumOfAllProcessesCommitCharge < physicalMemory+pageFileSize.

If an API exists for this, I must have missed it.

I suppose a process could simulate overcommit by catching the Access Violation, verifying that the violation was a read/write of a reserved page allocated with some special "virtual overcommit" allocator, and requesting that it be committed, and resuming execution. Needless to say, such user mode page fault handling will be significantly slower than a kernel doing it.

saagarjha · on Jan 8, 2020

macOS overcommits; not sure about some of the other BSDs do by default but I'm fairly sure they have knobs for it.

wahern · on Jan 8, 2020

AFAIU, macOS implements dynamic swap and I assumed it always backed anonymous memory with swap reservations. Do you have a reference?

saagarjha · on Jan 8, 2020

Hmm, maybe I'm misunderstanding? When you make a (single) patently large allocation, macOS will do nothing; it won't show up as used virtual memory and the swap size won't go up. As you write to the memory usage will go up ("Memory" in Activity Monitor) but then it will quickly start swapping out until the limits of your disk, at which point you'll get OOM'd (assuming that your memory doesn't compress well, that is…).

wahern · on Jan 9, 2020

Ah, I see. Thanks!