> Does fully loading 4 cores in one server save power over fully loading 2 cores in 2 servers?
That's the premise, and I have no particular reason to doubt it. There are several levels at which it might be true, from using deeper sleep states (core level? socket level?) to going wild and de-energizing entire PDUs.
> You're likely to get better performance out of the two servers though (which might not be great, because then you have a more variable product).
Yeah, exactly, it's a double-edged sword. The fly.io article says the following...
> With strict bin packing, we end up with Katamari Damacy scheduling, where a couple overworked servers in our fleet suck up all the random jobs they come into contact with. Resource tracking is imperfect and neighbors are noisy, so this is a pretty bad customer experience.
...and I've seen problems along those lines too. State-of-the-art isolation is imperfect. E.g., some workloads gobble up the shared last-level CPU cache and thus cause neighbors' instructions-per-cycle to plummet. (It's not hard to write such an antagonist if you want to see this in action.) Still, ideally you find the limits ahead of time, so you don't think you have more headroom than you really do.
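For the curious, a minimal sketch of such a cache antagonist, assuming a ~32 MiB last-level cache (the constant is a guess; tune it for your CPU): it just streams through a buffer several times larger than the LLC, so each pass evicts whatever lines the neighbors had resident.

```python
import numpy as np

# Assumed LLC size -- 32 MiB is typical for a mid-range server part,
# but check your own CPU (e.g. lscpu or /sys/devices/system/cpu/cpu0/cache).
LLC_SIZE_GUESS = 32 * 1024 * 1024

# Buffer 4x the LLC, as 8-byte words, so no pass can stay cache-resident.
buf = np.zeros(4 * LLC_SIZE_GUESS // 8, dtype=np.int64)

def thrash(passes: int = 10) -> int:
    """Sequentially read the whole buffer `passes` times.

    Every full pass touches 4x the LLC's worth of cache lines, forcing
    eviction of co-tenants' data and tanking their instructions-per-cycle.
    """
    total = 0
    for _ in range(passes):
        total += int(buf.sum())  # streaming read of the entire buffer
    return total
```

Run this pinned to a core sharing an LLC with a victim workload and watch the victim's IPC drop in `perf stat`. It's deliberately crude; real antagonists vary stride and read/write mix, but even this is usually enough to see the effect.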