Yes, I understand this -- basic risk mitigation by reducing the size of your vulnerability.
(I'll archaic brag a bit by mentioning I used to be a heavy user of Minix - my floppy images came in over an X25 network - and saw Andy Tanenbaum give his Minix 3 keynote at FOSDEM about a decade ago. I'm a big fan.)
Anyway, while reducing risk this way is laudable, and will improve your fleet's health, as per TFA it's a poor substitute, with bad economics and worse politics behind it, than simply stumping up for ECC.
I'll also note that, for example, Google's sitting on ~3 million servers so that ~4k LoC just blew out to 12,000,000,000 LoC -- and that's for the hypervisors only.
Multiply that out by ~50 to include VM's microkernels, and the amount of memory you've now got that is highly susceptible to undetected bit-flips is well into the mind-blowing range.
Oh i'm not saying it's the single best solution, I guess I got carried away in argument - It's simply a scenario where the concept shines, yet it's entirely artificial scenario and I agree ECC is the correct way.
(I'll archaic brag a bit by mentioning I used to be a heavy user of Minix - my floppy images came in over an X25 network - and saw Andy Tanenbaum give his Minix 3 keynote at FOSDEM about a decade ago. I'm a big fan.)
Anyway, while reducing risk this way is laudable, and will improve your fleet's health, as per TFA it's a poor substitute, with bad economics and worse politics behind it, than simply stumping up for ECC.
I'll also note that, for example, Google's sitting on ~3 million servers so that ~4k LoC just blew out to 12,000,000,000 LoC -- and that's for the hypervisors only.
Multiply that out by ~50 to include VM's microkernels, and the amount of memory you've now got that is highly susceptible to undetected bit-flips is well into the mind-blowing range.