
100% is unachievable, especially at Google scale. Your car, your coffee shop, or whatever else also fails 0.0x% of the time. Heck, the power grid in Western countries doesn't reach 99.999% uptime, and I would call it very dependable.
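To put "five nines" in perspective, here's a quick back-of-the-envelope conversion from an availability target to the downtime it allows per year (my own sketch; it assumes a 365-day year and treats downtime as one continuous budget):

```python
# Convert an availability target ("nines") into an allowed-downtime budget.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Minutes per year a service may be down at the given availability (0..1)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for target in (0.999, 0.9999, 0.99999):
    print(f"{target:.3%} -> {downtime_minutes_per_year(target):.1f} min/year")
# 99.900% -> 525.6 min/year
# 99.990% -> 52.6 min/year
# 99.999% -> 5.3 min/year
```

So 99.999% means roughly five minutes of outage per year, which most grids don't hit.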

I'd say the biggest problem is the non-existent support. It's fine if 0.01% fails, as long as human troubleshooting or help is available when necessary.



The distribution is important.

If 100% of users are experiencing 0.01% random errors it probably isn't worth bothering about.

If 0.01% of users are experiencing 100% errors, then it is much more important.
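A toy illustration of why the aggregate error rate hides this: the two cases below have the exact same overall error rate, but very different worst-case user experience. The traffic shape (10,000 users, 10,000 requests each) is made up purely for illustration.

```python
def error_stats(per_user):
    """per_user maps user id -> (requests, failed requests).
    Returns (aggregate error rate, worst single-user error rate)."""
    total = sum(n for n, _ in per_user.values())
    failed = sum(f for _, f in per_user.values())
    worst = max(f / n for n, f in per_user.values())
    return failed / total, worst


USERS, PER_USER = 10_000, 10_000  # hypothetical traffic shape

# Case 1: every user sees 1 failure in 10,000 requests (0.01% each).
spread = {u: (PER_USER, 1) for u in range(USERS)}

# Case 2: exactly one user (0.01% of users) sees every request fail.
concentrated = {u: (PER_USER, PER_USER if u == 0 else 0) for u in range(USERS)}

print(error_stats(spread))        # (0.0001, 0.0001)
print(error_stats(concentrated))  # (0.0001, 1.0)
```

Any dashboard that only tracks the first number can't tell these apart.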


Enter "windowed user-uptime"[1], which helps differentiate between these conditions.

[1] https://www.usenix.org/conference/nsdi20/presentation/hauer
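Roughly as I understand the paper: user-uptime is good user-minutes divided by total active user-minutes, and the windowed variant reports, for each window size, the *worst* user-uptime over all windows of that size. Below is a minimal sketch under that reading. The input format (per-minute counts of users with a good/bad minute) and the function names are my own assumptions, and how a user-minute gets classified as good or bad is left out; check the paper for the exact definitions.

```python
def user_uptime(up, down):
    """user-uptime = good user-minutes / (good + bad) user-minutes."""
    total = up + down
    return 1.0 if total == 0 else up / total


def windowed_user_uptime(up_per_min, down_per_min, window):
    """Worst user-uptime over every contiguous window of `window` minutes.
    up_per_min[i] / down_per_min[i]: number of users whose minute i was
    good / bad (the classification itself is assumed, not computed here)."""
    n = len(up_per_min)
    if not 1 <= window <= n:
        raise ValueError("window must be between 1 and the series length")
    up = sum(up_per_min[:window])
    down = sum(down_per_min[:window])
    worst = user_uptime(up, down)
    for i in range(window, n):
        # Slide the window one minute: add minute i, drop minute i - window.
        up += up_per_min[i] - up_per_min[i - window]
        down += down_per_min[i] - down_per_min[i - window]
        worst = min(worst, user_uptime(up, down))
    return worst


# Hypothetical hour: 1,000 active users per minute, and a 5-minute incident
# (minutes 30-34) during which half of them are failing.
up = [1000] * 60
down = [0] * 60
for m in range(30, 35):
    up[m], down[m] = 500, 500

for w in (1, 5, 10, 60):
    print(w, round(windowed_user_uptime(up, down, w), 4))
# 1 0.5
# 5 0.5
# 10 0.75
# 60 0.9583
```

The hourly number (~95.8%) looks tolerable, but the short windows expose that for five minutes half the users were down. That's the kind of difference a single aggregate figure washes out.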


Amazon seems to do a pretty good job at it.



