The problem is that your experience involves a heavily patched Linux that was far better suited to this than upstream is. The upstream scheduler can't really handle running a box hot with mixed batch and latency-sensitive workloads, plus intentionally abusive ones like yours ;-) That is partly why Kubernetes doesn't even really try.
This. Some Googlers forget there is a whole team of kernel devs in TI maintaining a patched kernel (including a patched CFS) specifically for Borg.
I've used Linux for mixed workloads (as in, the desktop I used for dev work was also running multi-core molecular dynamics jobs in the background). I'm not sure I completely agree that the Google Linux kernel is significantly better at this.
At my new job we run mixed workloads in k8s and I don't really see a problem, though we also don't instrument well enough for me to say that for sure. In our case it usually just makes sense not to oversubscribe machines and to buy more of them instead (Google oversubscribed and then paid a cost in preemptions and random job failures that got masked by retries).
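For what it's worth, the "don't oversubscribe" policy can be expressed directly in the pod spec: when requests equal limits for every container, Kubernetes assigns the pod the Guaranteed QoS class, and the scheduler only places it on a node where the full reservation fits. A minimal sketch (the pod name and image are made up):

```yaml
# Hypothetical pod spec: requests == limits for every container gives
# Guaranteed QoS, so the node is never packed beyond what was reserved.
apiVersion: v1
kind: Pod
metadata:
  name: batch-job            # illustrative name
spec:
  containers:
  - name: worker
    image: example.com/md-sim:latest   # placeholder image
    resources:
      requests:
        cpu: "4"
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 8Gi
```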
I think you've touched on the key issue: the upstream scheduler doesn't expose all the stats you need to have confidence in the solution. You want to know how long threads wait to get onto a CPU after becoming runnable.
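To be fair, Linux does expose exactly that number when schedstats are enabled (CONFIG_SCHEDSTATS, toggled at runtime via the kernel.sched_schedstats sysctl): /proc/&lt;pid&gt;/schedstat reports time spent on-CPU, time spent runnable but waiting on a runqueue, and the number of timeslices run. A minimal sketch of reading it (the dict keys are my own labels, not kernel names):

```python
import os

def parse_schedstat(text):
    # /proc/<pid>/schedstat has three fields (see the kernel's
    # Documentation/scheduler/sched-stats):
    #   time spent on the CPU (ns),
    #   time spent waiting on a runqueue while runnable (ns),
    #   number of timeslices run on this CPU.
    on_cpu_ns, run_delay_ns, timeslices = (int(f) for f in text.split())
    return {"on_cpu_ns": on_cpu_ns,
            "run_delay_ns": run_delay_ns,
            "timeslices": timeslices}

# Parse a sample line in the format the kernel emits.
sample = "123456789 9876543 4242"
stats = parse_schedstat(sample)
print(stats["run_delay_ns"])  # ns spent runnable but off-CPU

# On a real Linux box, read it for a live process:
if os.path.exists("/proc/self/schedstat"):
    with open("/proc/self/schedstat") as f:
        print(parse_schedstat(f.read()))
```

The middle field is the one that matters here: sampled over time, it tells you how much scheduling delay your latency-sensitive threads are actually eating.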