I don't want to give too much credit to Github, because their uptime is truly horrendous and they need to fix it. But: I've felt like its a little unfair to judge the uptime of company platforms like this; by saying "if any feature at all is down, its all down" and then translating that into 9s for the platform.
I never use Github Copilot; it does go down a lot, if their status page is to be believed; I don't really care when it goes down, because it going down doesn't bring down the rest of Github. I care about Github's uptime ignoring Copilot. Everyone's slice of what they care about is a little different, so the only correct way to speak on Github's uptime is to be precise and probably focus on a lot of the core stuff that tons of people care about and that's been struggling lately: Core git operations, website functionality, api access, actions, etc.
> I've felt like its a little unfair to judge the uptime of company platforms like this; by saying "if any feature at all is down, its all down" and then translating that into 9s for the platform.
This is definitely true.
At the same time, none of the individual services has hit 3x9 uptime in the last 90 days [0], which is their Enterprise SLA [1] ...
> "Uptime" is the percentage of total possible minutes the applicable GitHub service was available in a given calendar quarter. GitHub commits to maintain at least 99.9% Uptime for the applicable GitHub service.
> If GitHub does not meet the SLA, Customer will be entitled to service credit to Customer's account ("Service Credits") based on the calculation below ("Service Credits Calculation").
The linked document in my previous comment has more detail.
It's worth adding that big (BIG!) business clients will usually negotiate the terms for going below the SLA threshold. The goal is less to be compensated if it happens, and more to incentivize the provider to never let it happen.
Right. Basically, they give you a coupon to lower your cost of future consumption. So, you have to keep consuming the service. If you just leave, you get no rebate. Obviously, very large customers get special deals.
You're right that labelling any outage as "Github is down" is an overgeneralisation, & we should focus on bottlenecks that impact teams in a time sensitive matter, but that isn't the case here. Their most stable service (API) has only two 9s (99.69%).
They're not even struggling to get their average to three 9s, they're struggling to get ANY service to three 9s. They're struggling to get many services to two 9s.
Copilot may be the least stable at one 9, but the services I would consider most critical (Git & Actions) are also at one 9.
Most people complaining about uptime aren't free users or open-source developers. It's people whose companies are enterprise GitHub customers. It's a real problem and affects productivity.
The issue is also that those 27 hours don't happen at once, They happen in small chunks of a couple minutes which makes it happen almost everyday and has a ton of downstream build and retry issues. The resulting downtime is probably 2 orders of magnitudes higher at least.
Those 27 hours only seem to happen during the workday when I’m trying to push branches, run CI pipelines or otherwise use GitHub (I don’t use Copilot). Whatever the yearly figure, it’s been a pain in the ass these last few months and it’s unacceptable, free or no (my company pays for GitHub).
Honestly, you're right - 2̶7̵ 87+ (correction from sibling) hours per year is absolutely fine & normal for me & anything I want to run. I personally think it should be fine for everybody.
On the other hand the baseline minimal Github Enterprise plan with no features (no Copilot, GHAS, etc.) runs a medium sized company $1m+ per annum, not including pay-per-use extras like CI minutes. As an individual I'm not the target audience for that invoice, but I can envisage whomever is wanting a couple of 9s to go with it. As a treat.
This company is part of the portfolio of a $trillion+ transnational corporation. The idea that we can't judge them, when they clearly have more resources than 99% of other companies on this planet, doesn't hold up to any scrutiny.
Why defend a company that clearly doesn't care about its customers and see them as a money spigot to suck dry?
The OP clearly never says we can't judge them. He was speaking to how the uptime is measured. I'm not saying I agree or disgree with the OP but at least address the argument he's making.
It doesn't help that almost all of the big tech companies talking about 5 9s are lying about it; "Does it respond to the API at all, even with errors? It's up!" and so on. If you spend a lot of time analyzing browser traces you see errors and failures constantly from everyone, even huge companies that brag a lot about their prowess. But it's "up" even if a shard is completely down.
The five nines tech people usually are talking about is a fiction; the only place where the measure is really real is in networking, specifically service provider networking, otherwise it's often just various ways of cleverly slicing the data to keep the status screen green. A dead giveaway is a gander at the SLAs and all the ways the SLAs are basically worthless for almost everyone in the space.
See also all of the "1 hour response time" SLAs from open source wrapper companies. Yes, in one hour they will create a case and give you case ID. But that's not how they describe it.
Once you dig into the details what does it mean to have 5 9s? Some systems have a huge surface area of calls and views. If the main web page is down but the entire backend API still is responding fine is that a 'down'? Well sorta. Or what if one misc API that some users only call during onboarding is down does that count? Well technically yes.
It depends on your users and what path they use and what is the general path.
Then add in response times to those down items. Those are usually made up too.
I never use Github Copilot; it does go down a lot, if their status page is to be believed; I don't really care when it goes down, because it going down doesn't bring down the rest of Github. I care about Github's uptime ignoring Copilot. Everyone's slice of what they care about is a little different, so the only correct way to speak on Github's uptime is to be precise and probably focus on a lot of the core stuff that tons of people care about and that's been struggling lately: Core git operations, website functionality, api access, actions, etc.