No, my response is attached to the correct comment of yours. I directly responded to the two "problems" you brought up, but I will now attempt to do so in more detail, in case the problem is that my comment was not clearly expressed.
You say that adding sleeps only helps if you know what the side channel is. That's true; for example, adding sleeps won't help at all against power analysis, acoustic analysis, or cache timing attacks. But that's also true of other side-channel countermeasures. For example, using constant-time operations (as Bernstein suggests and you endorse) probably won't help against attacks using those other side channels either. Most side-channel countermeasures are only effective against the side channels they're intended to protect against, or not even those.
But pdkl95 was specifically talking about over-the-network timing side-channel attacks on a protocol. Adding sleeps will defend against that, even if you don't know which part of your code has variable timing. For example, in 2007 Futoransky, Saura, and Waissbein published "ND2DB", a feasible over-the-network timing side-channel attack on database indices. Sleeping so that the time in between network transactions does not depend on any of the data being transmitted will defend against ND2DB as well as any timing attack against your cryptosystems, while reimplementing your cryptosystems to use only constant-time operations will not defend against the ND2DB attack.
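Here is a minimal sketch of what I mean by sleeping, assuming the deadline is longer than the handler's worst-case running time. The name `respond_at_deadline`, its parameters, and the 1.1-second default are made up for illustration; they are not from any library.

```python
import time

def respond_at_deadline(handler, request, deadline_s=1.1,
                        clock=time.monotonic, sleep=time.sleep):
    """Run handler(request), then wait until deadline_s seconds after the
    request arrived before handing the response back to the caller.

    handler, deadline_s, clock and sleep are illustrative parameters.
    """
    start = clock()
    response = handler(request)      # may take a data-dependent amount of time
    remaining = deadline_s - (clock() - start)
    if remaining > 0:
        sleep(remaining)             # pad out to the fixed deadline
    # If remaining <= 0 the handler overran the deadline, and the overrun
    # is still visible to an attacker; the deadline must cover the worst case.
    return response
```

Note that this pads the whole transaction, so it does not matter which part of the handler (crypto, database index lookup, anything else) has the variable timing.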
TheLoneWolfling pointed out correctly that you can only do this imperfectly in the presence of concurrency and caches, because the network response may take longer to send if, for example, your data has been swapped to disk in the interim. That is true, but I think that you can still reduce the information leakage by two to five orders of magnitude with this approach. For example, if you choose 1100ms as the time interval between receiving the request and sending the response, and 0.1% of the time your response is delayed until 1108ms because of paging in a single disk page, you are leaking about 0.01 bits to the attacker per transaction. If your timing varied over about a 50ms range and the attacker can measure latency to within 50μs, you would have been leaking about 10 bits to the attacker per transaction.
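Those two figures can be checked with a rough calculation. This is only a sketch under a simple model I am assuming here: the entropy of what the attacker can observe is treated as an upper bound on what they learn per transaction, the padded case has just two outcomes (on time or late), and the unpadded case is spread evenly over its range. The function names are illustrative.

```python
import math

def binary_entropy_bits(p):
    """Entropy in bits of an event that happens with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def distinguishable_bits(range_s, resolution_s):
    """Bits needed to name one of the timing values the attacker can resolve."""
    return math.log2(range_s / resolution_s)

# Padded: the response is late 0.1% of the time, otherwise exactly on time.
padded_bits = binary_entropy_bits(0.001)

# Unpadded: timing spread over 50 ms, measurable to within 50 microseconds.
unpadded_bits = distinguishable_bits(50e-3, 50e-6)

# Reduction in leakage, in orders of magnitude.
reduction_orders = math.log10(unpadded_bits / padded_bits)
```

Under these assumptions `padded_bits` comes out a little over 0.01, `unpadded_bits` a little under 10, and the reduction is roughly three orders of magnitude, which sits inside the two-to-five range I gave.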
Second, you say that you have to make sure that your noise function itself is not a target of attack, which I take to mean that you must ensure that it doesn't leak information either; for example, the predictable cycling of the low-order bits of old "rand()" implementations could tell an attacker whether the process they connected to had handled transactions from other clients in between transactions to the attacker.
While this is true, I don't think it is a big problem. Here's why. As I said, randomness does not help in this case, so you can simply pick a constant time interval, or one that only depends on data the attacker can already observe, such as the response size; or you can use any CSPRNG to generate a time to add to the constant time interval.
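To make those options concrete, here is a small sketch. The constant `BASE_DELAY_S`, the function names, and the per-KiB and jitter parameters are all assumptions chosen for illustration; the only library call is `secrets.randbelow` from the Python standard library, which draws from the operating system's CSPRNG.

```python
import secrets

BASE_DELAY_S = 1.1  # illustrative constant interval, in seconds

def delay_from_response_size(nbytes, per_kib_s=0.001):
    """Delay that depends only on the response size, which the attacker
    can already observe on the wire."""
    return BASE_DELAY_S + per_kib_s * (nbytes / 1024)

def delay_with_csprng_jitter(max_jitter_ms=50):
    """Constant interval plus jitter drawn from a CSPRNG, so the jitter
    itself reveals nothing about other clients' transactions."""
    return BASE_DELAY_S + secrets.randbelow(max_jitter_ms + 1) / 1000
```

The simplest choice is still to use `BASE_DELAY_S` by itself; the jitter variant is only there to show that if you do want randomness, a CSPRNG avoids the `rand()` state-leak problem.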
I didn't say anything about this before, but I also agree that it's (usually) a non-problem to make a transaction's average-case time the same as its worst-case time. I don't agree, though, that doing so "defeats the purpose of high-speed cryptography", because high-speed cryptography generally reduces the worst-case time to an acceptable time.
Is that better? Please correct me if I've said something incorrect above.
That's much clearer to me. I think we probably agree more than we disagree.
My point, as you've acknowledged, is simple. I think, for a bunch of reasons, some easier to mitigate than others, it's a good rule of thumb to just avoid key-dependent side channels altogether. If you know what you're doing, you can shrink the risks. I'm dubious about whether they can be eliminated.
But even if one is the kind of person who can argue eloquently and accurately about those risks on a mailing list (nb: not alluding to you here), one is unlikely in my experience to be the kind of person who can actually implement countermeasures to these problems reliably. I've reviewed lots of expertly developed cryptosystems and found exploitable problems. My experience suggests that even good implementors do not tend to design systems from the vantage point of attackers, which squares with my experience of every other kind of software security as well.
A caveat to all of this is that it's easy for me to opine about what I think good conservative rules of thumb are because my technical depth doesn't extend to safely implementing variably-timed crypto operations. A subject matter expert might be able to refute a lot of what I think.
Because this is a fluffy comment, a quick shotgun blast of technical responses, none of which beg for rebuttals (but feel free):
* I can think of cases where randomized delays on one code path failed to eliminate timing leaks on other code paths that turned out to be relevant to attackers.
* I'm skeptical about exploitation of microscopic timing leaks; I don't even worry about memcmp timing, really. Other people are less skeptical, and there's a line of research pursuing statistical filtering and heuristics to bring those timing channels into reach. Also, there are some secrets that are attackable essentially indefinitely.
* My thoughts about worst-case timing are probably poisoned by client reactions to the idea of slowing down crypto operations at all.
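For anyone unfamiliar with the memcmp timing issue mentioned above, here is a rough illustration. `early_exit_compare` is a hypothetical stand-in for a memcmp-style loop that also reports how many bytes it looked at; `hmac.compare_digest` is the Python standard library's comparison that is designed to reduce that kind of timing dependence.

```python
import hmac

def early_exit_compare(a: bytes, b: bytes):
    """memcmp-style equality check; returns (equal, bytes_examined)."""
    if len(a) != len(b):
        return False, 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return False, i + 1   # stops at the first mismatch
    return True, len(a)

secret = b"s3cr3t-token"

# Work done depends on how long a prefix the guess shares with the secret.
_, n_bad_first = early_exit_compare(secret, b"X3cr3t-token")
_, n_bad_last = early_exit_compare(secret, b"s3cr3t-tokeX")

# Standard-library comparison intended not to stop at the first mismatch.
constant_time_equal = hmac.compare_digest(secret, b"s3cr3t-tokeX")
```

The difference between `n_bad_first` and `n_bad_last` is the microscopic leak in question; whether it is measurable over a network is exactly the point of disagreement.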