There are defenses against those attacks. You can cap the input size, for example. The idea here is that you make the timing jitter depend on things other than the value being calculated. I'd hope that anyone who has to use code like this would think about those side issues.
Random time jitter can be denoised with statistics and a large enough sample size.
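To make that concrete, here is a rough simulation, not a measurement of any real system. The numbers are made up for illustration: two code paths that differ by 0.1 ms, hidden under up to 5 ms of uniform random jitter. The function names (`simulate_response_ms`, `estimate_gap_ms`) are hypothetical.

```python
import random

def simulate_response_ms(base_ms, max_jitter_ms, rng):
    """One simulated response time: secret-dependent base plus uniform jitter."""
    return base_ms + rng.uniform(0.0, max_jitter_ms)

def estimate_gap_ms(fast_ms, slow_ms, max_jitter_ms, samples, seed=0):
    """Estimate the timing gap between two code paths by averaging many samples.

    All parameters are illustrative assumptions, not values from a real target.
    """
    rng = random.Random(seed)
    fast = [simulate_response_ms(fast_ms, max_jitter_ms, rng) for _ in range(samples)]
    slow = [simulate_response_ms(slow_ms, max_jitter_ms, rng) for _ in range(samples)]
    # The jitter has the same mean on both paths, so it cancels in the difference.
    return sum(slow) / len(slow) - sum(fast) / len(fast)

if __name__ == "__main__":
    # With few samples the estimate is dominated by noise; with many it
    # settles near the true 0.1 ms gap.
    for n in (10, 1_000, 100_000):
        print(n, round(estimate_gap_ms(1.0, 1.1, 5.0, n), 4))
```

The jitter here is 50 times larger than the signal, and averaging still recovers it. The error of the mean shrinks roughly with the square root of the sample count, so more jitter only buys the defender a proportionally larger sample requirement for the attacker.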
Even if it couldn't be, while you sleep the machine has more resources free to process other requests, and an attacker can observe this to deduce information about the secret.
Ultimately you want the resources consumed by any computation that handles secret information to be the same no matter what the secret is. Of course this means swimming upstream against decades of programming language, compiler, and hardware optimization research. If you really want to get it done, you'll find yourself writing assembly and getting very familiar with CPU behavior.
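As a small illustration of what "same work regardless of the secret" looks like at the source level, here is a sketch comparing an early-exit byte comparison with two alternatives. The function names are hypothetical; `hmac.compare_digest` is the standard library call. Note the caveat from the comment above applies in full: the hand-written XOR loop only shows the shape of the idea, and the Python interpreter makes no constant-time guarantees about it.

```python
import hmac

def leaky_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: running time depends on where the first mismatch is."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # returns sooner the earlier the inputs differ
    return True

def xor_accumulate_equal(a: bytes, b: bytes) -> bool:
    """Touches every byte and has no data-dependent branch inside the loop.

    Illustrative only: nothing in Python guarantees this runs in constant time.
    """
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y  # any differing bit sticks in diff
    return diff == 0

def library_equal(a: bytes, b: bytes) -> bool:
    """Delegates to the standard library's timing-attack-resistant comparison."""
    return hmac.compare_digest(a, b)
```

In practice you would reach for `hmac.compare_digest` rather than the hand-rolled loop. Even then, this only covers the comparison itself, not caches, branch predictors, or anything else the hardware does underneath.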
You didn't read the code. What it tries to do is pad a function's execution time out to a fixed duration that is longer than the work itself takes. Of course, doing that exactly isn't possible. But the remaining jitter will be caused by things other than the time it took to execute the workload. Autoscaling will make sure that a lack of resources won't reveal secrets, and if everything running on the same machine is similarly capped, there isn't a side channel where you can measure anything. Throughput shouldn't be affected much, since we are just artificially delaying the response somewhat, not increasing the computational workload of each request.
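For readers who haven't seen the code under discussion, the general technique looks something like the sketch below. This is not the original code, just a minimal version of padding to a fixed deadline; `run_padded` and `budget_s` are hypothetical names.

```python
import time

def run_padded(work, budget_s):
    """Run work() and do not return before budget_s seconds have elapsed.

    Assumption: budget_s is chosen larger than the worst-case run time of work().
    If work() overruns the budget, the call returns late and the overrun is visible.
    """
    start = time.monotonic()
    result = work()
    deadline = start + budget_s
    # Sleep until the deadline; loop in case the sleep wakes up early.
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        time.sleep(remaining)
    return result

if __name__ == "__main__":
    t0 = time.monotonic()
    value = run_padded(lambda: sum(range(1000)), 0.05)
    print(value, time.monotonic() - t0)
```

This only equalizes the wall-clock time seen by the caller. While the process sleeps the CPU is idle, so the amount of actual work done still varies with the input, which is the concern raised elsewhere in this thread.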
May I gently submit that it would be both more respectful of other people's time and more useful to you personally to dig into the information people are providing, rather than debating them about the efficacy of this particular technique? I feel like there's a pattern here of people bringing up subtle CPU effects and you being dismissive of them. But subtle CPU effects are the bread and butter of this research.
This is a really deep topic that resists easy answers.