Yes, I understand this -- basic risk mitigation by reducing the size of your vul...

Yes, I understand this -- basic risk mitigation by reducing the size of your vulnerability.

(I'll archaic brag a bit by mentioning I used to be a heavy user of Minix - my floppy images came in over an X25 network - and saw Andy Tanenbaum give his Minix 3 keynote at FOSDEM about a decade ago. I'm a big fan.)

Anyway, while reducing risk this way is laudable, and will improve your fleet's health, as per TFA it's a poor substitute, with bad economics and worse politics behind it, than simply stumping up for ECC.

I'll also note that, for example, Google's sitting on ~3 million servers so that ~4k LoC just blew out to 12,000,000,000 LoC -- and that's for the hypervisors only.

Multiply that out by ~50 to include VM's microkernels, and the amount of memory you've now got that is highly susceptible to undetected bit-flips is well into the mind-blowing range.