> With substantial hardware effort, Google was able to avoid interference, but additional isolation features could allow this to be done at higher efficiency with less effort.
This is surprising to me. Running any workload that's not your own will thrash the CPU caches and make your workload slower.
Consider, for example, that your performance-sensitive code has nothing to do for the next 500 microseconds. If the core runs some other best-effort work in the meantime, that work will thrash the CPU caches. So after those 500 microseconds, even if the other work is preempted by the kernel immediately, your performance-sensitive code is now dealing with a cold cache.
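To make the mechanism concrete, here's a toy model, not a measurement of any real CPU. It assumes a fully associative LRU cache of 1,024 lines, a hypothetical 512-line working set for the latency-sensitive code, and a hypothetical 2,048-line working set for the best-effort job. Real caches are set-associative, the last level is shared across cores, and replacement policies aren't pure LRU, so the numbers are illustrative only.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative LRU cache; capacity is counted in cache lines."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # hit: mark as most recently used
        else:
            self.misses += 1
            self.lines[addr] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used line


def misses_for(cache, addrs):
    """Return how many misses a pass over addrs causes."""
    before = cache.misses
    for a in addrs:
        cache.access(a)
    return cache.misses - before


CAPACITY = 1024                                # hypothetical cache size in lines
latency_sensitive = range(0, 512)              # hypothetical working set A
best_effort = range(100_000, 100_000 + 2048)   # hypothetical working set B

# Scenario 1: the core sits idle during the 500 us gap.
cache = LRUCache(CAPACITY)
misses_for(cache, latency_sensitive)           # warm up
idle_misses = misses_for(cache, latency_sensitive)

# Scenario 2: best-effort work runs during the gap, then is preempted.
cache = LRUCache(CAPACITY)
misses_for(cache, latency_sensitive)           # warm up
misses_for(cache, best_effort)                 # evicts every line of A
shared_misses = misses_for(cache, latency_sensitive)

print(idle_misses, shared_misses)
```

With the core left idle, the second pass over A hits every time (0 misses); with the best-effort job in between, every one of A's 512 lines has to be refetched, no matter how quickly the other job was preempted.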
Google's Borg traces contain both cycles per instruction and last-level cache misses for every process in the traced cluster, so if you're interested, you can dig into that data yourself.