the simple basic reality of statistics, a binomial distribution. 5 independent s...

kqr · on Oct 9, 2024

Except the binomial assumption obviously does not hold because

(a) failures are correlated, not independent, and

(b) many failures happen not at the component level but at the plane where components interact, and regardless of how much redundancy there is at the component level, there is ultimately just one plane at which they finally interact to produce a result.
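To put rough numbers on point (b), here is a minimal sketch of redundant components feeding a single interaction plane. Everything in it is an illustrative assumption rather than something from the thread: the function name, the probabilities, and the modelling choice that the components fail independently (which point (a) already disputes, so this is the optimistic case).

```python
def system_failure_probability(p_component: float, n: int, p_combiner: float) -> float:
    """Failure probability of n redundant components feeding one combiner.

    Assumes (optimistically) that components fail independently, and that
    the system fails if either all components fail or the combiner fails.
    """
    all_components_fail = p_component ** n
    return 1 - (1 - all_components_fail) * (1 - p_combiner)


if __name__ == "__main__":
    # Hypothetical numbers: 1% component failure, 0.1% combiner failure.
    for n in (1, 2, 5, 10):
        print(n, system_failure_probability(0.01, n, 0.001))
```

However large n gets, the result never drops below the combiner's own failure probability, so redundancy at the component level stops paying off quickly.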

lucianbr · on Oct 9, 2024

Actually making 5 completely independent systems would be exceptionally hard. No shared code or team members, no shared hardware... For example, what 5 computing platforms would you use? x86, ARM, RISC-V and...?

Math rarely applies so easily to real life. Talking about "independent" systems is cheap; building them is hard, if it is possible at all.

How would you transport yourself to work using two independent systems?

nostrademons · on Oct 9, 2024

It's relatively simple at the organizational level, just expensive (but linearly expensive, whereas increasing subcomponent reliability is often exponentially expensive!). Just give the same problem statement to two independent teams with two different managers, define a clear output format and success criteria, and let them make all their technical decisions independently.

Your example of "how do you transport yourself to work using two independent systems" is actually very apropos, because I and many other commuters do exactly that. If the highway is backed up, I bypass it with local roads. If everything is gridlock, I take public transportation. If public transportation isn't functioning (and it generally takes a natural disaster to knock out all the roads and public transportation, but natural disasters have happened), I work from home and telecommute. Each of these fallbacks is less favored than the one before it, but it'll still get me to work.

lucianbr · on Oct 9, 2024

While these are reasonable approaches, I do not think they live up to the mathematical meaning of "independent", and so they invalidate the probability calculation.

Your two teams might well both use the same hardware or software component somewhere in the system. The failure probabilities of the two systems are then not completely independent, even though you paid two teams and they worked separately. You spent a lot of money, and the results will not be as expected. If they both use Intel x86, and a Meltdown kind of thing happens, your "independent" systems will both fail from the same cause.
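As a rough illustration of how a shared component breaks the calculation, here is a minimal sketch under a simple common-cause model. The model, the function names, and the numbers are all assumptions made for illustration: each system fails if its own part fails (probability `p_own`, independent across systems) or if the one shared component fails (probability `p_shared`, which takes every system down at once).

```python
def p_all_fail_naive(p_single: float, n: int) -> float:
    """Probability that all n systems fail, wrongly assuming full independence."""
    return p_single ** n


def p_all_fail_common_cause(p_own: float, p_shared: float, n: int) -> float:
    """Probability that all n systems fail when one shared component exists.

    Either the shared component fails (all systems go down together), or it
    survives and every system's own part fails independently.
    """
    return p_shared + (1 - p_shared) * p_own ** n


if __name__ == "__main__":
    p_own, p_shared, n = 0.01, 0.001, 2  # hypothetical values
    # Failure probability of one system observed in isolation.
    p_single = p_shared + (1 - p_shared) * p_own
    print("naive:       ", p_all_fail_naive(p_single, n))
    print("common cause:", p_all_fail_common_cause(p_own, p_shared, n))
```

Under this model the joint failure probability can never be lower than `p_shared`, no matter how many separately built systems you add, while the naive formula keeps shrinking with every extra system.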

The transport analogy works great if you somehow imagine the transportation to be instantaneous, so that only the decision matters. But if you are already on a train and the train is delayed, you are not walking back home and taking the car. You have multiple options for transport, but you do not have a system built of independent components. You are not using the train and the car and the highway and the local roads all simultaneously.

I don't think you understand the requirements for the formula you wrote to be valid. Your examples do not fit, for all that they are reasonable and useful approaches. Your actual reliability with these approaches falls way below the multiple nines you have in mind.