> Seriously, some systems only need to be up when people are actually using them. It doesn't matter if they don't work out of hours or over weekends.
OTOH "out of hours" or "over weekends" is a very good time to run batch processes, so for some business services it might be better to not be up during business hours, but to be reliably up outside of them.
Another issue with that is when the service is used internationally / globally, or even just by 24/7 businesses, at which point even "over the weekend" is not necessarily a thing.
> Another issue with that is when the service is used internationally / globally
This also often happens prematurely. If the team in the US needs live data entered in Asia then there's not much you can do, the system has to be global. But an often overlooked option (today) is running multiple instances of the same software, with each only needing nine fives in its region. Even if you do need live data, it might be better to have another process shuffling it between international instances. That also helps with latency.
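A minimal sketch of the per-region idea, to show how small the "should this instance be up right now?" check can be. The region names, the fixed UTC offsets (no DST handling) and the 9-to-5 weekday window are all assumptions for illustration, not anything from a real deployment:

```python
from datetime import datetime, time, timedelta, timezone

def in_service_window(now_utc, utc_offset_hours, opens=time(9, 0), closes=time(17, 0)):
    """Return True if a regional instance is expected to be up at now_utc."""
    # Convert the UTC timestamp to the region's local wall-clock time.
    local = now_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    is_weekday = local.weekday() < 5  # Monday=0 .. Friday=4
    return is_weekday and opens <= local.time() < closes

# Hypothetical regions with fixed UTC offsets (assumption: DST is ignored).
REGIONS = {"us-east": -5, "tokyo": 9}

now = datetime(2024, 3, 6, 1, 30, tzinfo=timezone.utc)  # a Wednesday, 01:30 UTC
status = {name: in_service_window(now, offset) for name, offset in REGIONS.items()}
print(status)
```

Each instance only has to answer for its own window, so maintenance and batch jobs can be scheduled per region whenever that region's check returns False.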
> I've been trying to convince my boss that 'nine fives' is a perfectly acceptable target.
Unless you can force people to use your crap or your competition is barely working, it's not really acceptable. A typical customer has a pretty reliable internet connection during normal operations and only rare long outages. That means any unavailability or unresponsiveness that lasts more than a few seconds will be pretty visible and annoying to customers. And assuming your competition can do three to four nines, you'd need something like four to five nines to do better, which in practice means setting five to six nines as a target.
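For a sense of scale, here is a rough sketch of what each extra nine means as a yearly downtime budget. It assumes a 365-day year and counts every second of the year as in scope; the function name is made up for this example:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: leap years ignored

def downtime_budget_seconds(nines):
    """Allowed downtime per year at `nines` nines of availability (3 -> 99.9%)."""
    unavailability = 10 ** (-nines)
    return SECONDS_PER_YEAR * unavailability

for n in range(2, 7):
    minutes = downtime_budget_seconds(n) / 60
    print(f"{n} nines: {minutes:.1f} minutes/year")
```

Three nines leaves about 8.76 hours a year, while five nines leaves a little over five minutes, so each step up costs a factor of ten in allowed downtime.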
I think the parent was talking about being reliably on a specific ~50% of the time (and reliably off the rest of the time), not about outages a significant portion of the time when the customer would expect to use it. I'm not sure there are actually that many cases where that's great, but it's a different thing than what you're criticizing.
Edited to add: Wonderfully, when trying to post this I got "Sorry, we're not able to serve your requests this quickly."
Tell them to switch to OpenVMS. It was achieving in the 1980s and 1990s what Linux-based clouds still aren't. Example listing lots of its availability-boosting technologies:
There's a comment along those lines in the Google SRE book, iirc; the author was asking why it should matter for a service to have five nines if the user's own conditions (flaky WiFi, laptop rebooting, etc.) have far worse availability.
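That argument is easy to put numbers on. A small sketch, assuming the service and the user's own setup fail independently and that both have to be up for a request to succeed; the 99% client figure is an invented example, not a measurement:

```python
def end_to_end(service, client):
    """Availability the user perceives when both sides must be up (independent failures)."""
    return service * client

client = 0.99  # assumption: flaky WiFi, laptop reboots, etc.
for service in (0.999, 0.9999, 0.99999):
    perceived = end_to_end(service, client)
    print(f"service {service}: user sees {perceived:.5f}")
```

Under these assumptions, going from three nines to five nines on the service side moves what the user experiences by less than a tenth of a percentage point, because the client side dominates.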
Unexpected downtime is unfortunately not planned to be 'out-of-hours' or over weekends. It's made more complicated by the fact that if the service is global, it's almost never entirely 'out-of-hours' for everyone.
Seriously, some systems only need to be up when people are actually using them. It doesn't matter if they don't work out of hours or over weekends.