Constant-Time Mul (bearssl.org)
57 points by lelf on July 9, 2019 | 28 comments



In an ideal world, we should teach compilers how to do constant-time operations and/or intrinsics, and then use those compilers to write cryptographic code. The compiler could then use primitives that work in constant time on the target platform (implementing them without hardware support if necessary for a given target) and produce a compile-time error if it can't produce constant-time code for a given target or for a given piece of code.



This could be implemented as a compiler intrinsic.
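For example (purely hypothetical names; no real compiler offers anything like this today), the interface could look something like:

    #include <stdint.h>

    /* Hypothetical sketch only: the feature macro and builtin below are
     * made-up names standing in for a compiler-provided primitive. */
    #if defined(__HAVE_CT_MUL32)
    static inline uint64_t ct_mul32(uint32_t x, uint32_t y)
    {
        /* the compiler would guarantee a constant-time lowering here */
        return __builtin_ct_mul32(x, y);
    }
    #else
    #error "no constant-time 32x32->64 multiply available for this target"
    #endif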


I'm not an expert at all, but from what I've read about the C standard and how compilers interpret it, it seems dangerous to implement things like this (constant time integer multiplication) as C macros.

It should probably be done in (inline) assembly, to avoid, for instance, the compiler optimizing away the extra "pointless" bits. Just because a favorite compiler doesn't do it today doesn't mean it's safe.

I'm sure the actual authors of code like this have analyzed the resulting code and know more than I do about the safety of the operations involved, but I didn't see that covered in the article.
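For what it's worth, here's a rough sketch of the two approaches (mine, not from the article), assuming GCC syntax and a 32x32->64 multiply. Neither is constant-time by itself, since that still depends on the CPU's multiplier, but the asm version at least pins which instruction gets emitted:

    #include <stdint.h>

    /* Plain C: the compiler picks the instructions, and the optimizer is
     * free to transform them as long as the result is unchanged. */
    static inline uint64_t mul32_c(uint32_t x, uint32_t y)
    {
        return (uint64_t)x * y;
    }

    #if defined(__GNUC__) && defined(__arm__)
    /* Inline asm: forces a single umull, but is ARM/GCC-specific. */
    static inline uint64_t mul32_asm(uint32_t x, uint32_t y)
    {
        uint32_t lo, hi;
        __asm__ ("umull %0, %1, %2, %3"
                 : "=&r" (lo), "=&r" (hi)
                 : "r" (x), "r" (y));
        return ((uint64_t)hi << 32) | lo;
    }
    #endif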


The reason not to implement this in inline assembly is that you would have to implement unique assembly for each architecture supported, which would mean probably dozens of different implementations, especially considering that even on a given architecture there are notable processor-family differences in whether the same multiply instruction is constant time or not.

In any case, I think it's safe to say that 'pornin knows more about writing safe C code (and dealing with strange compiler behavior) than almost anyone.


But that's the core of the problem -- the code has to be architecture-specific because the constant-time nature of integer multiplication depends on the processor architecture. Once you're writing a multiply for an XX-generation Intel processor, you might as well write it as assembly, rather than have a MUL_XX macro that will probably do the right thing.


> I think it's safe to say that 'pornin knows more about writing safe C code (and dealing with strange compiler behavior) than almost anyone.

Yeah, I agree hehe:

https://stackexchange.com/leagues/162/alltime/security


Constant-time operations feel like such a hack.

I mean, sure, if we could ensure that all operations were always constant-time, that'd be a rather simple, stress-free way to get time-invariance. But the low-level implementation assumptions feel flimsy.

Instead, if we want time-invariance, it seems like we should just have dispatchers handle it -- that is, if we want the system to hide timing information, we should explicitly specify this behavior rather than trying to get it as a side effect.

---

Note: We don't actually need strict time-invariance so much as for any time-variance to not leak information about secrets. For example, if a dispatcher takes an hour to resume a process, then sure an attacker might very easily pick up on that time-variance introduced by the dispatcher, but so long as it doesn't reflect on any secret information, who cares?


One problem I see is in determining the amount of time before resuming. You have to be sure that all possible inputs will take less time than what you request, but you don’t want to wait forever. You also have to be robust against disturbances due to things like system load. If you specify a maximum time based on the no-load scenario, and an attacker discovers that they can bog down your system by flooding it with packets while it does the computation, you’re back to leaking info.


Yeah, a dispatcher would have to be designed to guard its secrets. For example, it'd have to ensure that it can do tasks in whatever window it needs to, have logic for simulating CPU load if an attacker might be able to watch the CPU's load, potentially force the CPU to burn extra power if we're trying to protect against power analysis, etc.

Logically, it'd seem like a more complex task, since we'd have to keep track of what secrets we're guarding and plan an execution strategy to meet those criteria. But it feels cleaner to acknowledge these criteria and ensure that they're being addressed.


It would be interesting to know what the ARM SecurCore SC300 processor [1] does: whether it has a constant-time multiply, or whether they just expect you to use the hardware crypto accelerators that are commonly integrated with it, e.g. in the ST33 [2].

[1] https://developer.arm.com/ip-products/processors/securcore/s...

[2] https://www.st.com/en/secure-mcus/st33-arm-sc300.html


I believe the SC300 core is just a Cortex-M3, so timings would be the same as on a Cortex-M3 (variable based on input). ARM even says the programmer's model and all other M3 docs apply on the SC300 site.


Ah, thanks, that explains why I couldn't find the programmer's manual for it directly.


Isn't there some way to just pad up the variable time that some sensitive operation takes, by spinning on a fine-grained clock register until it reaches a predetermined value? This could be calibrated by timing a known worst case.
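Something like this sketch, where budget_ns is assumed to have been calibrated against a measured worst case:

    #include <stdint.h>
    #include <time.h>

    /* Sketch of "pad to a precomputed deadline": budget_ns is an assumption
     * and must exceed the true worst case, including under load. */
    static uint64_t now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
    }

    static void padded_call(void (*op)(void *), void *arg, uint64_t budget_ns)
    {
        uint64_t deadline = now_ns() + budget_ns;
        op(arg);                        /* the variable-time work */
        while (now_ns() < deadline)
            ;                           /* spin out whatever time is left */
    }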


This adds noise to the signal. That makes it harder to recover the original signal, but not impossible. You can overcome it by taking more samples.


“spinning on a fine-grained clock register until it reaches a predetermined value” would require a second thread to do that counting, and isn’t guaranteed to be constant time.

Also, even if it were, one probably could detect that the CPU was just spinning from a side channel, for example by tracking CPU sleep state, cache line pressure or contention for integer units in the CPU.


Can't constant-time multiplication techniques somehow fall victim to that level of visibility?

(If we can track details like contention for integer units in the CPU, why don't we just peek at what bits are churning through the registers?)


No; the core goal of side-channel-free algorithms (of which constant time algorithms are a subset) is that the instructions executed by the CPU should not depend on secret data.

Spinning until a fixed time is reached could conceivably leak information if another thread can time math-unit bandwidth; it would be able to distinguish the spinwait from bona-fide mathematical computation.

However, no (known) method accessible to code running on the machine can distinguish between an adder unit summing zeroes versus key data.
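To illustrate the pattern (a generic sketch, not code from the article): secret-dependent branches get replaced by masking, so the executed instructions are identical for either value of the secret bit.

    #include <stdint.h>

    /* Constant-time select: returns a when bit == 1, b when bit == 0,
     * with no secret-dependent branch or memory access. */
    static inline uint32_t ct_select(uint32_t bit, uint32_t a, uint32_t b)
    {
        uint32_t mask = (uint32_t)-(int32_t)bit;   /* all-ones or all-zeros */
        return (a & mask) | (b & ~mask);
    }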


Can't we throw a few math instructions into the spin loop?


-> Can we throw exactly the same number of math (micro)operations into the spin loop?

-> Can we just make sure the exact same operations occur?


The first idea seems way easier to achieve and maintain. It can be implemented at a higher level, around a complex, lumped operation like calculating a signature, without having to go into all of that operation's subroutines to change everything to constant time. We could conceivably drop it into any language, wrapping it around any crypto while treating that as a black box.


Could also be faster. That is to say, variable-time calculations likely have a worst case time that is better than for the equivalent fixed-time calculations, since we just write the obvious code for the CPU using its normal instructions and optimize as usual. If we are able to box the padded time fairly tightly to the worst case, we are better off performance-wise than with the fixed-time implementations.


The first is a real vulnerability on systems with hyperthreading (https://github.com/bbbrumley/portsmash).

On the other hand, as far as I know, looking at register content requires debugging rights, or would have to depend on a defect in the CPU.


Unrelated: I'm browsing this page right now over a proxy that's part of a project I'm working on in Haskell, using BearSSL for HTTPS. I can't, of course, speak to the correctness of the crypto primitives, but the project is small, self-contained, well written, and easy to navigate, such that writing the Haskell wrapper for it was relatively straightforward. Great project.


Thomas Pornin (BearSSL) is incredibly talented.


Such a great post, full of gems like this:

"… Tanja Lange pointed me to this study by Wouter de Groot, who performed actual measures on some Cortex-M3 devices, from which it appeared that a short cycle count could occur not only in the documented case (operands fit in 16 bits) but also when one or both of the operands is zero or a power of 2. In the case of MUL31(), this means that the underlying umull will complete in 5 cycles, except if one or both of the two operands are zero, in which case the umull will complete in 4 cycles. …"


Why can't we solve the constant-time issue by just deciding, before the operation, how long it should take in the worst case, and then waiting for that time before continuing?


edit: see other discussion below, https://news.ycombinator.com/item?id=20395393

Presumably, that would make it very slow, since for every multiplication you'd have to check a timer. You could maybe do it at a higher level, but then how would you know how long a combination of operations would take on that processor? For a web API, though, adding random jitter seems like it'd force attackers to use a lot more trials to gain any timing insight.
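As a toy sketch of the jitter idea (rand() and the nanosecond budget are placeholders, not a recommendation), something like this after handling each request:

    #include <stdlib.h>
    #include <time.h>

    /* Sleep a random 0..max_ns nanoseconds after handling a request.
     * rand() is a placeholder; a real service would want a better
     * randomness source, and max_ns must stay below one second. */
    static void add_jitter(long max_ns)
    {
        struct timespec ts = { 0, rand() % (max_ns + 1) };
        nanosleep(&ts, NULL);
    }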



