1) extra random is not going to help, in long term any "noise" disappears and signal can be seen. 2) such attacks are infeasible most of the time, so we don't need to worry
I agree that actual "signal" can still be learned over time, but if we chose additional random wait time from a distribution unknown to attacker (or even better, randomly select the distribution itself, and shuffle often), we can reduce the efficiency of attack dramatically.
But another possible way is to add a deterministic amount of additional wait time (instead of random), i.e., "lie" about computation time of all inputs consistently. If designed properly, this won't leak any signal (assuming the strategy on how to "lie" is chosen randomly itself and shuffled often, so attacker doesn't learn it).
Strategy of waiting for maximum time across all possible requests (as mentioned in your post) is sort of a special case of this, where one (deterministically) add a wait time of W - C (where W = Worst case time across all inputs, and C = actual time taken for this input). But we can use a less aggressive approach to hide the actual computation time as well.