I think what we really need is a constant-time coprocessor. An FPGA would be ideal, since you can reconfigure it to support new/improved algorithms over time. It's much easier to defend against side-channels when you have control over the hardware. Are there any products or projects in this direction? Ideally, you'd have a drop-in software library replacement, which would redirect all relevant operations to the hardware.
> Ideally, you'd have a drop-in software library replacement, which would redirect all relevant operations to the hardware.
Many years ago OpenBSD had the /dev/crypto interface for exposing cryptographic accelerators to userland, including for hashes and ciphers (not just expensive asymmetric primitives).[1][2] And their fork of OpenSSL included built-in support for /dev/crypto, though I'm not sure whether and to what extent the application had to explicitly ask OpenSSL to use it.
IIRC, it was removed from OpenBSD because of the security headaches of exposing accelerators to userland, and because the future seemed to lie in ISA extensions rather than discrete devices.
Offloading all the asymmetric ops would be a good start, since they're generally the most affected by side channels and performed relatively infrequently (compared to the symmetric ops).
I don't know that your topic is their specific aim, but I remember Popcorn Linux coming up on HN before: http://www.popcornlinux.org/
To some extent their goal is to improve the interoperability of heterogeneous hardware.
Security is always about raising the cost of breaking in above the value of what is being secured. If you find the page "over the top", it is not because the page is over the top, but because you don't have anything to secure that is in the cost range of the attacks being outlined. That's great. Personally, I have at best maybe one secret in that value range (and on the low end at that), and the vast majority of what I deal with day to day is not in that range. This is probably true for the vast majority of people reading this comment as well.
However, it is not true for everybody. When you're securing something very valuable, as some people do, those problems become a big deal. Some security researchers need to be working at that level, or people with those problems will just be helpless.
I have read all about this attack. I haven't touched any settings on my computer. It isn't a problem for me. But it is a problem for other people. You're just peering in on a world that you don't live in. It's kind of like asking "Who would ever need a bodyguard? Seriously?" Well, I can neither afford one, nor do I seem to particularly need one... but I'm not the only person or type of person in the world.
> If you find the page "over the top", it is not because the page is over the top, but because you don't have anything to secure that is in the cost range of the attacks being outlined.
The page states that *all users* should disable all dynamic clock states and that all OS creators should do the same by default. That's very different from the fairly obvious point you just made, and in my opinion extremely over the top. What attack scenario do you think *users* would be targeted with?
Edit:
Just to be clear, if you store state secrets, are a cloud provider, or otherwise host services that could be targeted by Hertzbleed, by all means gimp your CPU if your research tells you that it is important.
There's just no reason to do so for the normal user, and if you are in the class of people that need to secure something like that, it's narcissistic to believe you need the rant-ish site the OP posted to realize it.
Most notably, one of the author's first points is that overclocking leads to premature failure of hardware, which is neither correct nor relevant to security. Premature failure of hardware depends on operating temperatures, and any knowledgeable overclocker today can trivially overclock while substantially decreasing operating temperatures.
I'm not versed in timing attacks and don't really understand the implications there, so I'll defer to the author's expertise in that regard.
Overclocking can vary. People chasing the absolute fastest clock rates they can get without the system becoming unstable will be pushing the CPU as close to its limits as possible.
It is also fascinating that stability when overclocking is often not the same sort of consideration it once was. Overclocking used to be limited mostly by whatever the critical path in the processor was, where trying to go too fast meant violating the setup time for a latch or register.
That does technically happen these days too, but the fix is to increase the vcore voltage, which allows for shorter setup times basically by charging the parasitic capacitors to the threshold voltage faster. But this has downsides: the higher voltage can lead to greater silicon degradation over time, especially if you are running above the designed maximum voltage. That said, recent processor designs tend to have a lot more critical-path headroom than old ones, often allowing a pretty significant overclock without even touching the voltage, but there will be a limit.
But increased voltage, or even increased frequency alone, means greater heat production, and there is a practical limit to how much heat you can remove per unit time with normal methods like air or water cooling. Go too high, and the heat can start to damage things, or more likely the processor's internal temperature monitoring circuits will command an immediate power-off from the motherboard. Of course, with many coolers you won't even reach these limits, meaning you can often solve this by upgrading the cooling. And obviously exotic cooling techniques like liquid nitrogen can largely eliminate this as a bottleneck of interest (although even then there is a limit; the input power limit will be hit first).
The last bottleneck is stable power delivery. There is a limited number of power pins, each with a limit on the current it can stably supply. The bottleneck may be the PCB traces (indeed, for AMD, PBO compatibility requires motherboards to use bigger traces than are normally mandated for the processor, to allow for greater current), the pins, or what the VRMs can supply. The VRMs have limits both in how much current they can supply and in how quickly they can respond to changes in current draw. Trying to pull too much can result in voltage instability, which can damage things (especially if a sudden drop in load causes the voltage to swing too high).
So there is plenty of room to potentially degrade a processor if you are trying to eke out the maximum overclock you can get. But in most current designs there is also plenty of room for some overclock without getting near the limits, which is where you really risk degrading things. After all, processors already overclock themselves, which is what boost clock rates really are, and that default-enabled overclocking tends to be pretty conservative about how close to the limits it will go. Even AMD's Precision Boost Overdrive 2 automatic overclocking feature remains more conservative than many people are when manually overclocking. I'd wager that PBO2's limits are chosen such that no meaningful degradation is likely, merely without as much safety margin as AMD wants on a default-enabled feature.
In any actual field of engineering, building everything on systems that fail open this way would be unacceptable. Universally relying on CPUs that leak side channels everywhere because It Go Fast is like building with asbestos and cardboard: completely safe in a controlled environment, often accidentally safe outside of it, but dangerous as a factory default, because humans' factory default is to make questionable decisions without reading the manual.
I can't wait for whatever cock-up finally takes down the internet and kills enough people that, once the smoke clears, "move slowly and don't break anything" becomes a normal career path, rather than a prayer recited by jaded security professionals dual-wielding rolls of duct tape.
Put together a decorator in Python that does time.sleep to get really close to constant-time operations. Am I understanding the problem wrong? I'd think something like that could be implemented in any programming language that needs constant-time operations.
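Roughly this kind of thing (a minimal sketch; `pad_to` and the 10 ms budget are just illustrative):

```python
import functools
import time

def pad_to(budget_s):
    """Pad a function's wall-clock time out to a fixed budget.

    Minimal sketch: if the wrapped call finishes early, sleep for the
    remainder; if it overruns the budget, nothing is padded and the
    timing leaks again.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                remaining = budget_s - (time.perf_counter() - start)
                if remaining > 0:
                    time.sleep(remaining)
        return wrapper
    return decorator

@pad_to(0.010)  # pad every call to at least 10 ms
def check_password(supplied, stored):
    return supplied == stored  # deliberately naive comparison
```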
> Can't the server just delay every password-checking response for exactly 1 millisecond, independently of how long the password checking took?
It's normal for computers to have many operations that they're trying to get done simultaneously. Often operations interrupt other operations, and interruptions can last for many seconds when computers run out of RAM. Building a fixed-response-time mechanism that robustly accounts for this is an open problem.
> More fundamentally, when the password check finishes and switches to idling until the response time, the computer normally switches to other operations, speeding up those operations. This is often visible through the timing of those operations even if the timing of the password-checking response is controlled.
The problem is your "constant time" operation now spends some time doing work and some time not. When it's not doing work, the CPU can perform work for other processes or threads, and this is, at least in principle, measurable. Thus you haven't eliminated the side channel - just changed the way someone can measure it.
If I'm able to influence this computation to take longer than the maximum time allotted for it, this technique will fail open and allow me to measure timings. This is common in the real world, because as the attacker/a user of your application, I generally have a lot of influence on the data being passed to your computation. So I can, for instance, pass giant strings, or conduct many attacks in parallel to increase load on your system.
You can, of course, patch this by sleeping to the nearest interval. But that brings us to a second problem: this technique is quite expensive in terms of wall-clock time.
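Concretely, something like this (again a sketch; `quantize_to` is a made-up name): instead of one fixed budget, round each call's duration up to the next multiple of some interval, so even over-budget calls land on a bucket boundary.

```python
import functools
import math
import time

def quantize_to(interval_s):
    """Round each call's wall-clock duration up to the next multiple of
    interval_s. Sketch only: the attacker still learns which bucket a
    call fell into, just not the exact duration within it.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                padded = math.ceil(elapsed / interval_s) * interval_s
                time.sleep(padded - elapsed)
        return wrapper
    return decorator
```

And the wall-clock cost is real: to hide the worst case you have to pad every call toward it, so the common fast path pays for the rare slow one.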
There are defenses against those attacks. You can put a maximum on the input, for example. The idea here is that you can make the time jitter depend on things other than the thing being calculated. I'd hope that anyone having to employ code like this would think about those side issues.
Random time jitter can be denoised with statistics and a large enough sample size.
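A toy simulation of the point (illustrative numbers only): a timing difference ten times smaller than the jitter's standard deviation still shows up clearly in the sample means once you collect enough observations.

```python
import random
import statistics

SECRET_DELTA = 1e-6   # true timing difference between the two cases (1 us)
JITTER = 1e-4         # std. dev. of the random jitter (100 us)
N = 1_000_000         # samples collected per case

# Model each measurement as a base time plus Gaussian jitter.
fast = [random.gauss(0.010, JITTER) for _ in range(N)]
slow = [random.gauss(0.010 + SECRET_DELTA, JITTER) for _ in range(N)]

# The standard error of each mean is JITTER / sqrt(N) = 100 ns, so the
# 1 us difference stands out at roughly 7 sigma despite the jitter.
observed = statistics.fmean(slow) - statistics.fmean(fast)
print(f"observed difference: {observed:.2e} s (true: {SECRET_DELTA:.0e} s)")
```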
Even if it couldn't be, when you sleep the machine has more resources free to process other requests, and an attacker can observe this to deduce information about the secret.
Ultimately you want the resources consumed by computation handling secret information to always be equivalent. Of course, this means swimming against decades of programming language, compiler, and hardware optimization research. If you really want to get it done, you'll find yourself writing assembly and getting really familiar with CPU behaviors.
You didn't read the code. It tries to standardize a function's execution time to a duration longer than the function actually takes. Of course that can't be perfect, but the remaining jitter will be due to things other than the time it took to execute the workload. Autoscaling will make sure that a lack of resources won't reveal secrets, and if everything running on the same machine is similarly capped, there isn't a side channel where you can measure anything. Throughput shouldn't be affected much, since we are just artificially delaying the response somewhat, not increasing the computational workload of each request.
May I gently submit that it would be both more respectful of other people's time and more availing to yourself personally to dig into the information people are providing rather than debating them about the efficacy of this particular technique? I feel like there's a pattern here of people bringing up subtle CPU effects and you being dismissive of them. But subtle CPU effects are the bread and butter of this research.
This is a really deep topic that resists easy answers.