We store two copies of every piece of data on two different machines. When a single machine goes down, we have code for spinning up a new machine and restoring the data that was on the machine.
Over the course of a month, we usually have about one machine fail.
Over the course of a month, we usually have about one machine fail.