The problem is that your experience involves a heavily patched Linux that was far better suited to this than upstream is. The upstream scheduler can't really handle running a box hot with mixed batch and latency-sensitive workloads, plus intentionally abusive ones like yours ;-) That is partly why Kubernetes doesn't even really try.
This. Some Googlers forget there is a whole team of kernel devs in TI maintaining a patched kernel (including a patched CFS) specifically for Borg.
I've used Linux for mixed workloads (as in, the desktop I used for dev work was also running multi-core molecular dynamics jobs in the background). I'm not sure I completely agree that the Google Linux kernel is significantly better at this.
At my new job we run mixed workloads in k8s and I don't really see a problem, though we also don't instrument well enough for me to say that for sure. In our case it usually just makes sense not to oversubscribe machines and to buy more of them instead (Google oversubscribed and then paid a cost in preemptions and random job failures that got masked by retries).
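For what it's worth, the "don't oversubscribe" policy can be expressed directly in the pod spec: when requests equal limits for every container, Kubernetes assigns the pod the Guaranteed QoS class, and the scheduler only places it on a node where the full reservation fits. A minimal sketch (the pod name and image are made up):

```yaml
# Hypothetical pod spec: requests == limits for every container gives
# Guaranteed QoS, so the node is never packed beyond what was reserved.
apiVersion: v1
kind: Pod
metadata:
  name: batch-job            # illustrative name
spec:
  containers:
  - name: worker
    image: example.com/md-sim:latest   # placeholder image
    resources:
      requests:
        cpu: "4"
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 8Gi
```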
I think you've touched on the key issue: the upstream scheduler doesn't expose all the stats you need to have confidence in the solution. You want to know how long threads wait to get onto a CPU after becoming runnable.
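To be fair, Linux does expose exactly that number when schedstats are enabled (CONFIG_SCHEDSTATS, toggled at runtime via the kernel.sched_schedstats sysctl): /proc/&lt;pid&gt;/schedstat reports time spent on-CPU, time spent runnable but waiting on a runqueue, and the number of timeslices run. A minimal sketch of reading it (the dict keys are my own labels, not kernel names):

```python
import os

def parse_schedstat(text):
    # /proc/<pid>/schedstat has three fields (see the kernel's
    # Documentation/scheduler/sched-stats):
    #   time spent on the CPU (ns),
    #   time spent waiting on a runqueue while runnable (ns),
    #   number of timeslices run on this CPU.
    on_cpu_ns, run_delay_ns, timeslices = (int(f) for f in text.split())
    return {"on_cpu_ns": on_cpu_ns,
            "run_delay_ns": run_delay_ns,
            "timeslices": timeslices}

# Parse a sample line in the format the kernel emits.
sample = "123456789 9876543 4242"
stats = parse_schedstat(sample)
print(stats["run_delay_ns"])  # ns spent runnable but off-CPU

# On a real Linux box, read it for a live process:
if os.path.exists("/proc/self/schedstat"):
    with open("/proc/self/schedstat") as f:
        print(parse_schedstat(f.read()))
```

The middle field is the one that matters here: sampled over time, it tells you how much scheduling delay your latency-sensitive threads are actually eating.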