Important to note that S3 does not have any Durability SLA. We promise Durability and take it extremely seriously, but there is no SLA. It's much more of an SLO.
Also, “durability” is not a property you can delegate to another service. Plenty of corruption is caused in-transit, not just at rest.
If your system handles the data in any way, you must compute and validate checksums.
If you do not have end to end checksums for the data, you do not get to claim your service adopts S3’s Durability guarantees.
S3 has that many 9s because your data is checksummed by the SDK. Every service that touches that data in any way recomputes and validates that (or a bracketed) checksum. Soup to nuts. All the way to when the data gets read out again.
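To make the end-to-end idea concrete, here is a minimal sketch of a service that sits in front of an object store. It is not how S3 or any SDK implements this; `VerifyingStore`, `client_read`, and the choice of SHA-256 are all assumptions made up for illustration. The point is only that the checksum is computed by the writer, re-validated at every hop, and checked one last time by the reader.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when data does not match the checksum recorded for it."""


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


class VerifyingStore:
    """Hypothetical in-memory stand-in for a service in front of an object store."""

    def __init__(self) -> None:
        # key -> (data, checksum supplied by the original writer)
        self._objects = {}

    def put(self, key: str, data: bytes, client_checksum: str) -> None:
        # Recompute on receipt: catches corruption between client and service.
        if sha256_hex(data) != client_checksum:
            raise ChecksumMismatch(f"upload of {key!r} was corrupted in transit")
        # Keep the writer's checksum next to the data so later hops can re-check it.
        self._objects[key] = (data, client_checksum)

    def get(self, key: str):
        data, checksum = self._objects[key]
        # Recompute before serving: catches corruption at rest.
        if sha256_hex(data) != checksum:
            raise ChecksumMismatch(f"stored copy of {key!r} is corrupted")
        return data, checksum


def client_read(store: VerifyingStore, key: str) -> bytes:
    """Read an object and validate it against the writer's checksum once more."""
    data, checksum = store.get(key)
    # Final check on the reader's side: catches corruption on the way back out.
    if sha256_hex(data) != checksum:
        raise ChecksumMismatch(f"download of {key!r} was corrupted in transit")
    return data


if __name__ == "__main__":
    store = VerifyingStore()
    payload = b"customer data"
    store.put("report.csv", payload, sha256_hex(payload))
    assert client_read(store, "report.csv") == payload
```

If any layer in the path drops the checksum and trusts the bytes it was handed, the chain is broken at that layer, no matter what the storage backend promises.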
And there is a lot more to Durability than data corruption. Protections against accidental deletions, mutations, or other data loss events come into play too. How good is your durability SLO when you accidentally overwrite one customer’s data with another’s?
Check out some of the talks S3 has on what Durability actually means, then maybe investigate how durable your service is.
ps: I haven’t looked at the code yet, but plan to. Maybe I’m being presumptuous and your service is fully secured. I’ll let you know if I find anything!
pps: I work for Amazon but all my opinions are my own and do not necessarily reflect my employer’s. I don’t speak for Amazon in any way :D
As you allude to in your response, that's usually referred to as durability, not reliability. The home page could probably use an update there to reflect that terminology.
It's an average. Presumably they don't smear files across disks byte by byte, since that would be insane. But with drives randomly breaking, at some point every copy of at least one file will go at once. With, say, a terabyte of files over a thousand years, you'd expect to lose files totaling about 100Kb. So probably not even one file, with some small chance of losing half a drive.
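The arithmetic behind that kind of estimate can be sketched as follows. The comment does not say which durability figure it started from, so the number of nines here is an assumption: working backwards, a loss of roughly 100 KB from a decimal terabyte over a thousand years corresponds to ten nines of annual durability (assuming "Kb" means kilobytes), while eleven nines would give a tenth of that. `expected_loss_bytes` is a hypothetical helper, and the model simply treats the quoted durability as the expected fraction of data lost per year.

```python
from fractions import Fraction


def expected_loss_bytes(stored_bytes: int, years: int, nines: int) -> float:
    """Expected bytes lost, treating durability as an average annual loss fraction.

    nines=10 means 99.99999999% annual durability, i.e. a loss fraction of 1e-10.
    """
    # Fraction keeps the arithmetic exact until the final conversion to float.
    annual_loss_fraction = Fraction(1, 10 ** nines)
    return float(stored_bytes * years * annual_loss_fraction)


TB = 10 ** 12  # decimal terabyte, an assumption about the units in the comment

if __name__ == "__main__":
    print(expected_loss_bytes(TB, 1000, 10))  # 100000.0 bytes, about 100 KB
    print(expected_loss_bytes(TB, 1000, 11))  # 10000.0 bytes, about 10 KB
```

Because this is an expectation, the realistic outcome is lumpy: most of the time nothing is lost at all, and occasionally a much larger chunk goes at once.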
It's unavoidable that too many disk failures in quick succession lead to data loss. For example, if you store two copies, your durability rests on being able to detect a disk failure and create another copy before the sole remaining copy dies as well.
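A rough way to see how much the repair window matters for the two-copy case is the toy model below. It is only a sketch under loud assumptions: disks fail independently at a constant (exponential) rate, the annual failure rate and repair time are made-up inputs, and `expected_loss_events_per_year` is a hypothetical helper, not a description of how any real storage system is modeled.

```python
import math

HOURS_PER_YEAR = 24 * 365


def expected_loss_events_per_year(afr: float, repair_hours: float) -> float:
    """Expected data-loss events per year for one object stored on two disks.

    afr: assumed annual failure rate of a single disk, strictly between 0 and 1.
    repair_hours: assumed time to detect a failure and re-create the lost copy.
    """
    # Constant hourly failure rate that yields the given annual failure rate.
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    # Probability that the surviving disk also fails inside the repair window.
    p_second_failure = -math.expm1(-rate_per_hour * repair_hours)
    # Either of the two disks can be the first to fail.
    first_failures_per_year = 2.0 * rate_per_hour * HOURS_PER_YEAR
    # A loss event is a first failure followed by a second one before repair ends.
    return first_failures_per_year * p_second_failure


if __name__ == "__main__":
    for hours in (1, 24, 24 * 7):
        print(hours, expected_loss_events_per_year(afr=0.02, repair_hours=hours))
```

In this model the loss rate grows roughly linearly with the repair window, which is why fast failure detection and re-replication matter as much as the number of copies.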
JuiceFS uses S3 as the underlying data storage, so S3 provides this durability SLA.