Side-channel attacks like this are always a little hard to follow, and there's a lot of detail in here, so here's my best synopsis of the technical details behind why this works:
In many modern Intel CPUs (all Sandy Bridge, maybe Nehalem and Core 2 too, but not Haswell) you can get cache bank conflicts, where two instructions try to read the same bank (a slice of the cache that can only serve one access at a time) in quick succession. When that happens, one of the instructions has to wait for the other to complete, so it takes fractionally longer.
If you trigger a whole bunch of these cache bank conflicts yourself, while OpenSSL is working, you'll get tiny delays in your reads from parts of the cache that OpenSSL is using. If you do this an enormous number of times, you can then look at the accumulated average delays in your requests, and work out how OpenSSL's memory accesses are distributed over these banks.
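A toy simulation of the statistics at work here (this is not real cache probing; the bank count, latencies, and conflict probability are all made-up illustration values): a per-bank delay far smaller than the measurement noise still stands out once you average enough probes.

```python
import random

random.seed(1)

# Toy model: 16 cache banks; the "victim" repeatedly hits bank 5, so our
# probes of bank 5 occasionally conflict and take ~1 cycle longer.
BANKS = 16
VICTIM_BANK = 5

def probe(bank):
    latency = 30 + random.gauss(0, 5)          # baseline latency, heavy noise
    if bank == VICTIM_BANK and random.random() < 0.3:
        latency += 1                           # rare, tiny conflict penalty
    return latency

# Any single probe is useless; the bias only emerges in the average.
samples = 50_000
avg = [sum(probe(b) for _ in range(samples)) / samples for b in range(BANKS)]
suspect = max(range(BANKS), key=lambda b: avg[b])
print(suspect)  # the victim's preferred bank stands out
```

The signal here (an average of ~0.3 extra cycles) is over 100x smaller than the noise on any one probe, which is why the attack needs so many observations.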
Depending on the value of one part of the secret that OpenSSL is using to sign or decrypt data (specifically, the multipliers used in the multiplication steps in RSA), its access pattern for the cache is actually slightly biased. This is the bug. Once you know which banks are most active, you know the overall access pattern, and you can use this to drastically reduce the number of possibilities for each multiplier. That gives you enough information about the value of the key that with some subsequent offline processing you can recover the rest (apparently doable in about 3 minutes on a high-end CPU).
The fix is to ensure that the patterns of cache access that OpenSSL makes are totally independent of the secret data being used. The latest patch effectively does this, fixing the bug. You can still use cache bank conflicts to work out which bits of the cache OpenSSL is hitting, but that's no longer useful information.
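The general shape of such a fix can be sketched as a constant-time table lookup (illustrative Python only; OpenSSL's actual fix uses masked scatter/gather access in C): touch every table entry on every lookup and select the wanted one arithmetically, so the memory access pattern is identical for every secret value.

```python
def ct_select(table, secret_index):
    # Touch EVERY entry; a mask that is -1 (all ones) only at the wanted
    # index picks the result. No secret-dependent branches or addresses.
    result = 0
    for i, entry in enumerate(table):
        # -1 if i == secret_index else 0 (valid for indices below 2**31)
        mask = ((i ^ secret_index) - 1) >> 31
        result |= entry & mask
    return result

multipliers = [3, 9, 27, 81]      # stand-in for RSA's precomputed multipliers
print(ct_select(multipliers, 2))  # 27, but every entry was read
```

An observer watching which banks get hit sees the same pattern regardless of `secret_index`, which is exactly the property the patch needs.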
(I think this is broadly right; I'd love to know if there's details I'm missing though!)
Would adding a very small random delay (within a configurable range) be an acceptable fix for some of these timing attacks? I'm thinking along the lines of adding a config param to /usr/lib/ssl/openssl.cnf (Ubuntu).
No; if you add a random delay, your adversary can just repeat the attack multiple times and the timing data will average out to the real value, removing your added noise. It might increase the time/number of observed messages needed to perform the attack, but it won't defeat it.
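A quick illustration of why averaging wins (toy numbers, not real measurements): the added noise is huge per-sample, but its average is stable, so it cancels when you compare enough observations.

```python
import random

random.seed(1)

SECRET_EXTRA = 2.0                       # data-dependent timing difference (µs)

def observe(leaky):
    # Proposed countermeasure: a random delay of up to 50 µs on every reply
    base = 100 + random.uniform(0, 50)
    return base + (SECRET_EXTRA if leaky else 0.0)

# The noise dwarfs the 2 µs secret on any one sample, but averages cancel it.
n = 100_000
fast = sum(observe(False) for _ in range(n)) / n
slow = sum(observe(True) for _ in range(n)) / n
print(slow - fast)  # close to 2.0 -- the random delay averaged away
```

So the countermeasure only multiplies the number of samples the attacker needs, which is the point being made above.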
AngrySkillzz aside, I think there's another reason this won't work. From the sound of pimterry's explanation, the timing effects happen in your own code, not OpenSSL's. It's not "hit the cache to slow down OpenSSL", but "hit the cache and see if OpenSSL slows me down". Adding random delays to OpenSSL wouldn't help guard against that.
This is unlikely to help against a determined adversary. The reason is that the bias introduced by the timing differences can still be detected using statistical methods, provided you have enough samples. (In general, I would expect that you'd need more samples when the randomness is large relative to the timing attack difference)
See this blog post [1] for examples of how differences as small as 20 microseconds over the Internet could be detected, despite the effect of latency. Latency can be considered a random variable, albeit maybe not with the same distribution as a RNG, but the same principles likely apply.
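The same principle in miniature (a toy model; real Internet latency isn't a clean exponential, but the idea carries over): a 20 µs difference buried under milliseconds of jitter is recoverable by comparing low percentiles of many samples, since the fastest responses carry the least noise.

```python
import random

random.seed(1)

def rtt_us(server_extra_us):
    # Base 10 ms round trip plus exponential "network jitter" (~3 ms mean):
    # the jitter dwarfs a 20 µs secret-dependent difference on any one sample.
    return 10_000 + random.expovariate(1 / 3000) + server_extra_us

def low_percentile(samples, p=0.05):
    # Remote timing attacks filter jitter by looking at the fastest samples
    return sorted(samples)[int(len(samples) * p)]

n = 100_000
baseline = [rtt_us(0) for _ in range(n)]
leaky = [rtt_us(20) for _ in range(n)]   # hypothetical 20 µs slower code path
print(low_percentile(leaky) - low_percentile(baseline))  # close to 20 µs
```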
FYI, for those who are just worried about their own servers, the important part is section five, which I have copied below:
-------------------------------------
Q5: Should normal users or system administrators be worried?
OpenSSL has classified this vulnerability as "low severity", which we agree with. In order to mount the attack, the attacker has to be able to run the attack code on the same machine that runs the victim code. CacheBleed is a complex attack and there are much easier-to-exploit vulnerabilities in personal computers, so it is unlikely that anyone would use CacheBleed in such an environment.
Cloud servers that commonly run mutually untrusting workloads concurrently are a more realistic attack scenario. A malicious client of a cloud service could use the CacheBleed attack to spy on other clients. While cloud servers employ techniques to isolate workloads of different clients from one another, these techniques only isolate the software and do not prevent hardware-based attacks, such as CacheBleed. Nevertheless, due to the complexity of the attack and its sensitivity to system noise we believe it is not currently a practical risk to cloud environments.
The attack only works between processes running on two hyper-threads of the same core. Disabling hyper-threading is an effective mitigation against the attack.
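As an aside, on reasonably recent Linux kernels (newer than this article) the hyper-threading mitigation can be applied without a trip to the BIOS; a sketch, assuming the sysfs SMT interface is present:

```shell
# Check whether hyper-threading (SMT) is active (kernels >= 4.19 expose this)
cat /sys/devices/system/cpu/smt/active

# Disable it at runtime, as root ("forceoff" also blocks re-enabling)
echo off > /sys/devices/system/cpu/smt/control

# For a permanent fix, disable hyper-threading in the BIOS/UEFI instead
```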
> The attack only works between processes running on two hyper-threads of the same core. Disabling hyper-threading is an effective mitigation against the attack.
In a virtual machine on a shared cloud server, do threads from two different VMs ever share a core?
Overview direct from the article - "CacheBleed is a side-channel attack that exploits information leaks through cache-bank conflicts in Intel processors. By detecting cache-bank conflicts via minute timing variations, we are able to recover information about victim processes running on the same machine. Our attack is able to recover both 2048-bit and 4096-bit RSA secret keys from OpenSSL 1.0.2f running on Intel Sandy Bridge processors after observing only 16,000 secret-key operations (decryption, signatures). This is despite the fact that OpenSSL's RSA implementation was carefully designed to be constant time in order to protect against cache-based (and other) side-channel attacks.
While the possibility of an attack based on cache-bank conflicts has long been speculated, this is the first practical demonstration of such an attack."
I have to assume that decryption or signatures themselves are more than one operation. I don't think most people have encrypted 16,000 documents in their lifetimes.
Am I missing something? I'm probably missing something.
The attack itself is based on covert channels: inadvertent information leaks that occur when two processes share a resource. They can show up whenever that is true. The way high-assurance systems tried to root them out was using a combination of manual and mathematical means. Kemmerer's Shared Resource Matrix (1983) is the most accessible because it's manual and straightforward. Apply it to each layer & component of the system.
As far as hardware goes, there are a number of ways to mitigate stuff like this. One I only tried once was shutting off the cache or any other risky feature in the CPU. Ouch. Conceptually easiest is the Red-Black model, where you run crypto and untrusted stuff on two different devices with an interface that transfers at a fixed rate and fixed time, with attention to error responses. That was the military/NSA's trick. The next is SMP with separation kernels that keep trusted stuff on one CPU and untrusted on another, again with fixed-size timing. Another is partitioned caches, or caches with hard locks. They have prototypes, maybe production versions, but they're still new. Another I did was asynchronous, randomized execution for system tasks to mask timing. One researcher did the same thing, even making a CPU for it.
So, there are your basic solutions to this problem, and they go way back. Decades, actually, if we're talking red-black or covert channel analysis. Cloud-style deployments will benefit from other techniques... maybe. Hard to tell as risk goes up. Hardware with multi-core, multi-thread, several layers of caching, and so on is not the ideal foundation for INFOSEC. That's just asking for leaks. I have no faith in the long-term security of covert channel mitigations for such a setup. Although one can design such CPUs to protect system integrity pretty well.
I think this is the tl;dr: when operations hit the (Intel) CPU cache they complete detectably faster than when they have to go to system RAM. When OpenSSL performs an RSA private-key operation (repeating five squarings followed by one multiplication), which cache banks it touches depends on the multiplier it looks up. An attacker on the same system who keeps the cache banks busy can detect which banks OpenSSL is hitting and how long accesses take. That information, combined with knowledge of OpenSSL's cache-use scheme, gives the attacker the multiplier numbers used, and from those the key.
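For the curious, the "five squarings then one multiplication" pattern is fixed-window exponentiation. A sketch of its shape (illustrative Python, not OpenSSL's code): the secret exponent is consumed 5 bits at a time, and each 5-bit chunk selects one of 32 precomputed multipliers.

```python
def fixed_window_modexp(base, exponent, modulus, w=5):
    # Precompute base^0 .. base^(2^w - 1): these are "the multipliers".
    table = [1] * (1 << w)
    for i in range(1, 1 << w):
        table[i] = table[i - 1] * base % modulus

    result = 1
    nbits = exponent.bit_length()
    nbits += (-nbits) % w                   # round up to whole windows
    for shift in range(nbits - w, -1, -w):
        for _ in range(w):                  # w squarings per window
            result = result * result % modulus
        window = (exponent >> shift) & ((1 << w) - 1)
        # Secret-indexed lookup: WHICH table entry (and hence which cache
        # banks) gets touched here depends on 5 secret exponent bits --
        # that's the access pattern CacheBleed observes.
        result = result * table[window] % modulus
    return result

print(fixed_window_modexp(7, 123456789, 1000003) == pow(7, 123456789, 1000003))
```

Recovering the window values directly recovers the exponent bits, which is why leaking the lookup index is fatal.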
These side-channel attacks are so cool. In 2012, Zhang et al demonstrated a similar cache-based side channel attack across the Xen hypervisor to extract private keys [1]. That is, one virtual machine could extract the keys of a separate virtual machine living on the same hardware.
As cool as these are, most papers indicate the difficulty of pulling these attacks off -- especially given the easier vectors out there.
Is such an attack really possible on a real system with real workloads running something in addition to an OpenSSL implementation processing keys and the attack code?
Yes, it most likely is. Adding noise in the form of other workloads only increases the time the attack takes. This attack needs some minutes of wall-clock time, so it remains viable even with noise; the extra time should still be manageable.
The important bit, if you have to decide whether to patch as an emergency:
OpenSSL has classified this vulnerability as "low severity", which we agree with. In order to mount the attack, the attacker has to be able to run the attack code on the same machine that runs the victim code. CacheBleed is a complex attack and there are much easier-to-exploit vulnerabilities in personal computers, so it is unlikely that anyone would use CacheBleed in such an environment.