Unpopular opinion, but I think many systems would benefit from a regular "downtime window". Not everything needs to be 24/7 high availability.
Maybe not every night, but if you get users accustomed to the idea that you're offline for 12 hours every Sunday morning, they will not be angry when you need to be offline for 12 hours on a Sunday morning to do maintenance.
The stock market closes; more things should close. We are paying too high a price for 99.999% uptime when 99.9% is plenty for most applications.
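To put those numbers in wall-clock terms, here is a small sketch. It assumes a 365-day year and counts only the downtime you plan for, so it is an illustration of the arithmetic rather than a real SLA calculation:

```python
# Sketch: what an availability target means in wall-clock downtime.
# Assumes a 365-day year and counts only planned downtime.

HOURS_PER_WEEK = 7 * 24            # 168 hours
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes


def allowed_downtime_minutes_per_year(availability: float) -> float:
    """Minutes per year a service may be down and still meet `availability`."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def availability_with_weekly_window(window_hours: float) -> float:
    """Availability if the only downtime is a fixed window every week."""
    return 1.0 - window_hours / HOURS_PER_WEEK


if __name__ == "__main__":
    for target in (0.999, 0.99999):
        minutes = allowed_downtime_minutes_per_year(target)
        print(f"{target:.3%} uptime allows {minutes:.1f} min of downtime per year")
    print(f"12 h every Sunday = {availability_with_weekly_window(12):.2%} uptime")
```

So 99.9% leaves about 8.8 hours a year and 99.999% about 5 minutes, while a 12-hour window every Sunday works out to roughly 92.9% availability on its own.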
The maintenance window will morph into a do-big-risky-changes window, which means everybody in engineering will have to be on-call. Many years ago, when I had just joined a FAANG, I asked, "shouldn't I run this migration after hours when load is low?" and the response was firm: "No, you'll run it when people are around to fix things." It may not always be the answer, but in general, I want to do maintenance when people are present and willing to respond, not nights and weekends when they're somewhere else and can't be found.
Make your maintenance window Tuesday morning, then. The principle holds: not every stupid website needs to be up 99.999% of the time. You are not Amazon, and hell, even Amazon probably doesn't need to be online all the time.
People can come back in a few hours to order dropshipped crap.
> Not everything needs to be 24/7 high availability.
If it makes you more money to be available 24/7 then why wouldn't you?
> Maybe not every night, but if you get users accustomed to the idea that you're offline for 12 hours every Sunday morning
Then I would use a competitor that was online, period.
Imagine Sunday morning is the only time you have to complete a school assignment, but Wikipedia is offline. Or you need to send messages to a few folks that they need to see by the evening, but the platform won't come online until 3pm, which means you'll need to interrupt your afternoon family time instead?
Maybe things closing works fine for your needs and your schedule. But it sure won't for everyone else. Having services that are reliable is one of the things that distinguishes developed countries from developing ones.
> If it makes you more money to be available 24/7 then why wouldn't you?
Agreed, but for a government service where you update your license, or tell them about selling a car or something, there's no real 'more' money. Being closed at 3am doesn't lose the opportunity in the way that it would if you were selling widgets. It instead forces the would-be users at 3am to wait until the morning.
> If it makes you more money to be available 24/7 then why wouldn't you?
This is the question, right? Is there anything we won't do to make money?
We used to put 9 year olds down the mines. Those little kids are good in the small tunnels. Why wouldn't you use them most profitably?
> you need to send messages to a few folks that they need to see by the evening, but the platform won't come online until 3pm, which means you'll need to interrupt your afternoon family time instead?
I guess I'm old, but I remember when there were polite hours to call people on the phone and times that you just did not call. The world carried on fine for centuries before 24/7 instant messaging became a thing.
> Having services that are reliable is one of the things that distinguishes developed countries from developing ones.
Do you think that a business that has published hours and adheres to them is "unreliable"?
It only really works where the audience is already limited to one country or timezone, though. Sure, a global service could stagger the downtime around the world, but (unless you've already partitioned the infrastructure equivalently) then you're just running 24/7 with arbitrary geofenced downtime on top.
Basically this happens because the DVLA and the stock market don't have any competition. Customers in a competitive market won't be angry when you need to be offline for 12 hours every Sunday morning; they'll just switch to your competitor some Sunday, because the competitor is providing them something they value that you don't provide.
The stock markets definitely have competition. For instance, Frankfurt, London, Paris and Amsterdam very much compete with each other to offer desirable conditions for investors, and companies will move their trading from one to another if it is in their interest. I think the fact that they close at night is a self-preservation mechanism: traders would go insane if they had to worry about their positions 24/7.
Maybe they should regulate Sunday trading hours, or unionized sysadmins should negotiate the end of on-call hours.
The Red Queen's race that you describe for ever-greater scale and ever-greater availability is an example of the tragedy of the commons. Think how much money and how many human minds have been wasted trying to squeeze out that last .0001% of "zero downtime" when they could have been creating something new.
"Keep doing the same thing, but more of it, harder" is a recipe for a barren world of monoculture.
Bergen County, NJ has blue laws that require non-grocery stores to be closed on Sundays. Maybe there's some value in structuring a time where everybody is off?
Just like at work, the only time I really get off is when all of my customers are off. It's nice when the industry sort of shuts off for a week or so around Christmas.
Something like that might plausibly be correct, though you've exaggerated it to a level where it's clearly false.
If we steelman it to its most defensible essence, I think what you're saying is that the cost of the human effort needed to provide these higher uptimes exceeds the consumer benefit (the value of being able to buy a camera on Saturday), say. You could imagine, for example, that each incremental improvement in uptime wins over a proportion of the customer base providing a value that vastly exceeds its cost — but only until your competitors improve their own offering to match, so all the surplus from all this uptime improvement ultimately goes to the consumers, not the producers.
There are two related holes in this idea.
The first is that producing consumer surplus is what the economy is for, in a moral sense. The reason producing goods and services is a good thing to do is so that someone will benefit from using them! So if all the effort that sysadmins make goes into making services better for users, that's a good thing, not a bad thing.
The second is that nothing is stopping a new entrant from offering a new, low-cost service that isn't as reliable. If the cost of providing all that extra reliability (bundled into the incumbents' pricing scheme) is higher than the actual benefit to users, the users will switch to the lower-cost, less-reliable service. This has happened many times, in fact: less-reliable minicomputers stole business from mainframes, less-reliable VoIP stole business from ATM and SONET and SDH, all kinds of less-reliable plastic goods have stolen business from all-metal versions, and now solar panels are stealing business from coal power plants even though solar panel "uptime" is like 30%.
So the particular market dynamics we're talking about actually sensitively optimize the amount of effort given to uptime to the economic optimum. There do exist lots of market failures, but the particular dynamic we're discussing is the opposite extreme from something like a dollar auction.
Thank you for at least considering the idea seriously. I think the flaw with your argument is the assumption that price-sensitive consumers are paying directly for these services. Advertisers pay google, not people who use Gmail. In many cases, there's no way for a competitor to compete on price offering a cheaper service - because it's already "free" in the mind of consumers.
Also, please note: scheduled downtime is not the same thing as "less-reliable". A service that is always available when it promises to be might be said to be more reliable than a service that offered more availability but failed unpredictably at random.
I think it's important to consider ideas seriously.
These are interesting points, and it's true that I wasn't thinking about things from those angles. I'm not sure if the differences are relevant, though?
One of the interesting things that came out of Google's "SRE" system is that they deliberately add outages if they don't have enough. They learned years ago that if you build a service that promises 99% uptime and deliver 99.99% uptime, other people in the company will come to depend on that 99.99% uptime unintentionally. So they chaos-monkey it to ensure that the inevitable failures aren't catastrophic.
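Roughly, the "deliberately add outages" idea is an error-budget calculation: compare the availability you promised with what you actually delivered, and if you are well ahead, spend some of the difference on a planned outage. The sketch below is a simplification under that assumption, not Google's actual tooling; the function names and the 60-minute threshold are hypothetical.

```python
# Sketch of an error-budget check: how much deliberate (planned) downtime
# would bring observed availability back down to the published SLO.
# Function names and the threshold are hypothetical, for illustration only.

MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes


def unspent_budget_minutes(slo: float, observed: float,
                           period_minutes: float = MINUTES_PER_30_DAYS) -> float:
    """Minutes of error budget left unused in the period (never negative).

    slo:      promised availability, e.g. 0.99
    observed: availability actually delivered, e.g. 0.9999
    """
    return max(0.0, (observed - slo) * period_minutes)


def should_schedule_planned_outage(slo: float, observed: float,
                                   min_minutes: float = 60.0) -> bool:
    """True if enough budget is unspent that users may be relying on the extra uptime.

    min_minutes is a hypothetical threshold below which an outage isn't worth it.
    """
    return unspent_budget_minutes(slo, observed) >= min_minutes
```

With a 99% SLO and 99.99% delivered over 30 days, about 428 minutes of budget go unspent, which is exactly the hidden dependency the planned outage is meant to flush out.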