The people screaming about ECC - meh, honestly you can grab some ecc ram and build a couple of small boxes if you want, but there's no reason you can't host a whole lot on these mini-pcs without ECC. Google had plenty of production going without ECC -- eventually they grew up though.
> In the end, we decided the non-ECC RAM risk was acceptable for every tier of service except our databases.
Jeff recognizes that using ECC is critical where criticality matters.
> we burn in every server we build with a complete run of memtestx86 and overnight prime95/mprime
I can't tell whether Jeff does this as a substitute for ECC, or whether he's just saying it's good practice regardless of the type of memory, which it is. But burn-in only catches faults present at build time; relying on it instead of ECC is like fixing all compile-time errors only to ignore run-time errors.
> I find it very, very suspicious that ECC – if it is so critical to preventing these random memory corrupting bit flips – has not already been built into every type of RAM that we ship in the ubiquitous computing devices all around the world as a cost of doing business.
Jeff decided ECC was critical enough for his RDBMS. This statement seems to be at odds with the previous paragraph.
We don't see "ECC everywhere" because Intel used its market muscle to prevent ECC from being everywhere.
As for the studies he's quoted, great, I have no reason to doubt them from a 10,000' view. Even if ECC only corrects (or detects and reports without correcting) 5%, 10%, 20%, or 25% of errors, that's still better than non-ECC RAM, which may let a bit flip silently persist into your Active Directory database, nuking the CEO's account while IT runs around for hours or days struggling to figure out what happened, why, and how to prevent it from happening again.
It's an easy choice to make when the cost of ECC today is low.
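To make the "corrected vs. silently persisted" distinction concrete, here's a rough toy sketch using a Hamming(7,4) code. This is only an illustration: real ECC DIMMs use wider codes over whole memory words, and the `encode`/`decode` names are made up for this example.

```python
# Toy Hamming(7,4) code: 4 data bits protected by 3 parity bits.
# Illustration only; not how any particular ECC DIMM is implemented.

def encode(nibble):
    """Return a 7-bit codeword (list of 0/1) for a 4-bit value."""
    c = [0] * 8  # index 0 unused so indices match bit positions 1..7
    c[3], c[5], c[6], c[7] = [(nibble >> i) & 1 for i in range(4)]
    c[1] = c[3] ^ c[5] ^ c[7]  # parity over positions 1, 3, 5, 7
    c[2] = c[3] ^ c[6] ^ c[7]  # parity over positions 2, 3, 6, 7
    c[4] = c[5] ^ c[6] ^ c[7]  # parity over positions 4, 5, 6, 7
    return c[1:]

def decode(bits):
    """Return (nibble, syndrome). A nonzero syndrome is the position
    (1..7) of the single flipped bit, which is corrected before decoding."""
    c = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos  # XOR of set-bit positions is 0 for a valid codeword
    if syndrome:
        c[syndrome] ^= 1  # flip the bad bit back
    nibble = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return nibble, syndrome

value = 0b1011
word = encode(value)
word[4] ^= 1                      # simulate a bit flip at position 5
fixed, where = decode(word)
print(fixed == value, where)      # the flip is corrected and its position reported

plain = value ^ (1 << 1)          # same kind of flip with no ECC
print(plain == value)             # the value is just silently wrong
```

The point is the second half: without the parity bits there's nothing to tell you the value changed, let alone where.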
Check out these links:
* https://blog.codinghorror.com/the-cloud-is-just-someone-else...
* https://blog.codinghorror.com/the-scooter-computer/
* https://blog.codinghorror.com/to-ecc-or-not-to-ecc/