"The percentile values we observed had quite unusual distribution"
It is pretty normal to have a long tail in the last few percentiles, or even the last few tenths of a percentile. Fortunately, most latency monitoring tools account for this now. Another gotcha is that p99.9 isn't the end of your tail either. The "100th percentile" request (the single slowest one) is sometimes too much of an outlier to be useful, but you should know it exists.
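To make the "p99.9 isn't the end of your tail" point concrete, here is a minimal sketch using made-up latency numbers (the data and the `percentile` helper are my own illustration, not from the article). It uses the nearest-rank definition of a percentile; real monitoring tools may interpolate differently.

```python
import math

def percentile(sorted_samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are at or below it."""
    rank = max(1, math.ceil(p / 100 * len(sorted_samples)))
    return sorted_samples[rank - 1]

# Synthetic latencies in ms (assumed data): 9,980 fast requests,
# 19 slow ones, and a single extreme outlier.
latencies_ms = sorted([10] * 9980 + [200] * 19 + [5000])

for p in (50, 99, 99.9, 100):
    print(f"p{p}: {percentile(latencies_ms, p)} ms")
```

With this data, p50 and p99 are both 10 ms, p99.9 is 200 ms, and the "100th percentile" is 5000 ms, so a dashboard that stops at p99.9 never shows the worst request at all.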
(regarding "Speculative task") This is also called a "Hedged Request" in the article "The Tail at Scale" [1].
Note that overhead that matters for Google doesn't matter as much for most other companies. Don't overcomplicate things just because Google does it like this.
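For anyone who hasn't seen the pattern, here is a rough sketch of a hedged request using Python's asyncio. The names `hedged_request`, `make_request`, and `hedge_delay_s` are hypothetical, and the fake backend only simulates latency with sleeps. The idea: send one request, and only if it hasn't answered within some delay send a second copy and take whichever finishes first. The paper suggests waiting roughly the 95th-percentile expected latency before hedging, which keeps the extra load small.

```python
import asyncio

async def hedged_request(make_request, hedge_delay_s):
    """Start one request; if it is still outstanding after hedge_delay_s,
    start a second copy and return whichever completes first."""
    primary = asyncio.create_task(make_request())
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay_s)
    if done:
        # The primary answered in time, so no hedge was needed.
        return primary.result()

    backup = asyncio.create_task(make_request())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # drop the slower copy
    # Note: if the first task to finish failed, this re-raises its exception.
    return done.pop().result()

async def demo():
    # Hypothetical backend: the first call is slow, the second is fast.
    delays = iter([0.5, 0.01])
    calls = []

    async def fake_request():
        delay = next(delays)
        calls.append(delay)
        await asyncio.sleep(delay)
        return f"done after {delay}s"

    result = await hedged_request(fake_request, hedge_delay_s=0.05)
    return result, len(calls)

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the demo the slow primary triggers a hedge after 50 ms and the fast backup wins. A real version would also need to decide what to do when the first reply is an error, and whether the backend can actually cancel work that is already in flight.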
[1] http://www-inst.eecs.berkeley.edu/~cs252/sp17/papers/TheTail...