I get the angry unicorn page "No server is currently available to service your request. Sorry about that. Please try refreshing and contact us if the problem persists. Contact Support — GitHub Status — @githubstatus" with that last link going to https://x.com/githubstatus showing "GitHub Status Oct 22, 2018 Everything operating normally."
Making logins required to view twitter was the ultimate bed shitting move. The whole point of twitter was to be a broadcast medium. Tweets were viewable without following or logging in. There is a huge vacuum in that space now.
For most (social media) platforms, really. Management believes it will force users to sign up, but in reality the platform just becomes less relevant because of that limitation. Not to mention search crawlers.
An all around stupid decision. That said, if management is that shitty, the platform probably won't be attractive for long anyway.
Facebook/Instagram were successful despite that to a degree, but this decision probably still did a lot of damage to their relevancy and user numbers.
I don't really agree with that. Facebook was originally about mirroring your real-life social network. In the mid 00's nobody was trying to get likes from strangers on Facebook.
Instagram is closer to broadcast, but it was always closely tied to the mobile app experience and the "follower" mentality. People didn't really share links to Instagram posts in other online venues in the beginning.
Twitter was always unique. It existed before smartphones, and there was a good chunk of years when people without smartphones would read twitter posts on desktops. Its producer/consumer distribution is much more skewed; many twitter users never post. Tweets were always getting posted to places like HN and reddit, discussed in news articles, etc.
I think Twitter's (former) position as a broadcast medium à la TV, radio, and newspapers is unique among social networks. There's a reason why Twitter was the place for journalists, politicians, academics, fire departments, web service status alerts, etc.
> Facebook/Instagram were successful despite that to a degree, but this decision probably still did a lot of damage to their relevancy and user numbers.
FB/IG/Whatsapp have half of humanity logging into their services once per month, so I'm not sure how much better they could be doing if they didn't have a login wall.
Meanwhile, Twitter (with no login wall) never broke 500mn. Like, personally I totally take your point about status updates but I'd have used my Twitter account a lot more if I'd needed to log in to see the content.
Used to work ops at AWS. I don't know if it's still the case but it required VERY HIGH management approval to actually flip any lights on their "status page" (likely it was referenced in some way for SLAs and refunding customers).
That is an excellent illustration of Goodhart's law. We're going to have this awesome status page, but since updating it would let clients notice the system is down, we're going to put a lot of barriers in the way of putting the actual status on that page.
Also probably a class action suit lurking somewhere in there eventually.
It's because of the way most companies build their status dashboards. There are usually at least two dashboards: one internal and one external. The internal dashboard is the actual monitoring dashboard, hooked up to the other monitoring data sources. The external status dashboard is just for customer communication. Only after the outage/degradation is confirmed internally will the external dashboard be updated, to avoid flaky monitors and alerts. It also affects SLAs, so changing the status needs multiple levels of approval; that's why there are some delays.
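In shell-script terms the gating looks roughly like this; a minimal sketch, assuming a hypothetical internal health endpoint, a hypothetical external status API, and a file a human touches to approve the change (none of these are real GitHub endpoints):

```
#!/bin/sh
# Hypothetical sketch: only flip the public status once the incident is
# confirmed internally and a human has signed off on the update.
INTERNAL="https://monitoring.internal.example.com/health"      # assumed internal endpoint
EXTERNAL="https://status.example.com/api/components/git-ops"   # assumed external status API

state=$(curl -fsS "$INTERNAL" | jq -r '.state')                # e.g. "ok" or "degraded"
approved=$(cat /var/run/incident/approved 2>/dev/null || echo "no")

if [ "$state" != "ok" ] && [ "$approved" = "yes" ]; then
  # only after sign-off does the customer-facing page change
  curl -fsS -X PATCH "$EXTERNAL" -d '{"status":"degraded_performance"}'
fi
```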
> The external status dashboard is just for customer communication. Only after the outage/degradation is confirmed internally will the external dashboard be updated, to avoid flaky monitors and alerts. It also affects SLAs, so changing the status needs multiple levels of approval; that's why there are some delays.
This defeats the purpose of a status dashboard and makes it effectively useless in practice most of the time from a consumer's point of view.
From a business perspective, I think given the choice to lie a little bit or be brutally honest with your customers, lying a bit is almost always the correct choice.
My ideal would be regulations requiring downtime metrics to be reported as a "suspected reliability issue" with at most a 10 to 30 minute delay.
If your reliability metrics have lots of false positives, that's on you and you'll have to write down some reason why those false positives exist every time.
Then that company could decide for itself whether to update manually with "not a reliability issue because X".
This lets consumers avoid being gaslit, and businesses don't technically have to call it downtime.
This is intentional. It's mostly a matter of discussing how to communicate it publicly and when to flip the switch to start the SLA timer. Also coordinating incident response during a huge outage is always challenging.
FWIW, our self-hosted Gitea instance has not had a single second of unplanned downtime in the five years we've been running it. And there wasn't much _planned_ downtime either, because it's really easy to upgrade (pull a new image and recreate the container, which takes out the instance for maybe 15 seconds late at night), and full backups are handled live thanks to zfs.
Migration to a new host takes another 15 seconds thanks to both zfs and containers.
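For anyone curious, the routine is roughly the following; a sketch assuming a docker compose setup and a zfs dataset named tank/gitea (all names are illustrative):

```
# upgrade: pull the new image and recreate the container (~15 s of downtime)
docker compose pull gitea
docker compose up -d gitea

# live backup: snapshot the dataset holding Gitea's data while it runs
zfs snapshot tank/gitea@nightly-$(date +%F)

# migration: send a snapshot to the new host, then start the container there
zfs snapshot tank/gitea@migrate
zfs send tank/gitea@migrate | ssh newhost zfs recv tank/gitea
```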
I don't know how many GitHub downtime reports I've seen during that time; we're probably into the high dozens by now.
I've been running Gitea on my homelab for a few months now. It's fantastic. It's like a snapshot of a point in time when GitHub was actually good, before it got enshittified by all of the social and AI nonsense.
I've been moving most of my projects off of GitHub and into Gitea, and will continue to do so.
We are experiencing interruptions in multiple public GitHub services. We suspect the impact is due to a database infrastructure related change that we are working on rolling back.
To be fair - I really couldn't care less whether the homepage is loading or not.
So long as I can fetch/commit to my repos, pretty much everything else is of secondary, tertiary, or no real importance to me.
(At work, I do indeed have systems running that monitor 200 statuses from client project homepages, almost all of which show better than 99.999% uptimes. And are practically useless. Most of them also monitor "canary" API requests, which I strive to keep at 99.99% but don't always manage to keep above 99.9% - which is the very best and most expensive SLA we'll commit to.)
Where on the continent? GitHub is undoubtedly doing blackbox testing internally and has multiple such monitors, but that's not going to capture every customer's route to them, leading to the same problem - customers experience GitHub being down despite monitoring saying it's mostly up. Thus the impasse. Even doing whitebox testing, where you know the internals and can thus place sensors intelligently, even just for ingress, you're still at the mercy of the Internet.
If a sensor that's basically in the same datacenter says you're up, but the route into the datacenter is down, then what? Multiply this by the complexity of the whole site, and monitoring it all with 100% fidelity is impossible. Not that it's not worth trying; there's a team at GitHub that works on monitoring. But beyond the motivation of keeping the SLA up, as a customer, unless you notice it's down, is it really down? In a globally distributed system, downtime, except for catastrophic downtime like this, is hard to define on a whole-site basis for all customers.
I don't think anybody asked for 100% fidelity. We are talking about a complete outage that affected at least North America and Europe. If the status page shows green in such a case, its fidelity is around 50%. People expect better from GitHub.
The amount of moaning that the status page wasn't updated in 0 seconds and had the wrong status for entire minutes is what leads me to believe that no, users do expect 100% fidelity.
Total outages are rare enough, and there's enough other work, that spending time building a system for that just doesn't seem like the best use of their time. Though I'm biased, having faced that exact question from the inside at a different company.
At the moment, all github services seem to be restored, yet the github status page indicates that the problem is still ongoing. I don't think it's related to the SLA, but rather to the monitoring, which is not live; there's a few minutes of delay.
I don't think they were asking for corporate speak. But at least I would find a plain technical error message like "cannot contact file server" much more respectable than something like "unicorns are hugging our servers uwu".
This “ironic” and “humorous” style of errors and UI captions is the actual new corporate speak. I’d prefer dumb error messages rather than some shit someone over the ocean thinks is smart and humorous. And it’s not funny at all when it’s a global outage impacting my business and my $$$.
It's closer to the truth than you usually get. They're having a bad day, and that's completely true. It's the start of my day, but I guess this is the middle of the night for them. There's no such thing as unicorns, but that just highlights the metaphorical nature of the remaining claim - getting the unicorns under control means solving their problems. Normally "professional" corporate speak means avoiding saying anything whose meaning is plain on its face and disconfirmable, while avoiding the implication that the company is run and operated by humans. This message, by contrast, is a model. (Obviously they came up with the message in advance, which just goes to show that someone in the company is well enough rounded to know that if it is displayed, they're having a bad day.)
Looks like we have a full house outage at GitHub with everything down. Much worse than the so-called Twitter / X recent speed-bump that was screeched at and quickly forgotten.
I don't think GitHub has recovered from the monthly incidents that keep occurring. Quite frankly, the expectation that something at GitHub will go down every month shows how unreliable the service is, and this has been happening for years.
I guess this 4-year-old prediction post about self-hosting and not going all in on GitHub really aged well after all [0]
The timing is pretty uncanny. I just deployed a github page and had a DNS issue because I configured it wrong. I hit "check again" and github went down.
Perhaps this is a repeat of the Fastly incident with a customer's Varnish cache configuration causing an issue in their systems (I think this is a rough summary, I don't remember the details).
So, you're both responsible and not responsible at the same time :)
> Hope I don't appear in the incident report.
Appearing in an incident report with your HN username could be pretty funny...
I had a github page that was public, but it was made private and the DNS config was removed. Fast forward to today. I made the private repo public again and forced a deploy of the page without making a new commit. It said the DNS config was incomplete, so I tweaked it and hit "check again" and github went down.
It is kinda amazing how consistently status pages show everything fine during a total outage. It's not that hard to connect a status page to end-to-end monitoring statistics...
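It really isn't; a canary that does what a user does (a real `git ls-remote` against a public repo) plus a post to whatever backs the status page would have caught this one. The status API endpoint below is hypothetical:

```
#!/bin/sh
# end-to-end canary: exercise the same path a user does, from outside
if git ls-remote https://github.com/octocat/Hello-World.git HEAD >/dev/null 2>&1; then
  state="operational"
else
  state="major_outage"
fi

# push the result to the status backend (hypothetical endpoint)
curl -fsS -X POST "https://status.example.com/api/components/git-operations" \
     -d "state=$state"
```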
From my experience this requires a few steps to happen first:
- an incident is declared internally at GitHub
- the support / incident team submits a new status page entry (with details on the service(s) impacted)
- incident is worked on internally
- incident fixed
- page updated
- retro posted
Even aws now seems to have some automation for their various services per region. But it doesn't automatically show issues, because the problem could be at the customer level, or hit only a subset of customers: those in region foo in AZ bar, on service version zed vs zed - 1. So they chose not to display issues for subsets.
I do agree it would be nice to have logins for the status page and then get detailed metrics based on customerid or userid. Someone start a company to compete with statuspage.
Once in the past I did actually have an incident where the site went down so hard that the tool that we used to update the status page didn't work. We did move it to a totally external and independent service after that. The first service we used was more flaky than our actual site was, so it kept showing the site down when it wasn't. So then we moved to another one, etc. Job security. :)
They say you shouldn't host status pages on the same infrastructure they're monitoring, but in a way that makes them much more accurate and responsive during outages!
Most status page products integrate with monitoring tools like Datadog [1]; large teams like GitHub would have it automated.
You ideally do not want to be making a decision on whether to update a status page during the first few minutes of an incident; if there is a manual process, bean counters inevitably tend to get involved to delay or not declare downtime.
It is more likely the threshold is kept a bit higher than a couple of minutes to reduce false positive rates, not because of manual updates.
Nah, _most_ status pages are hand updated to avoid false positives, and to avoid alerting customers when they otherwise would not have noticed. Very, very few organizations go out of their way to _tell_ customers they failed to meet their SLA proactively. GitHub's SLA remedy clause even stipulates that the customer is responsible for tracking availability, which GitHub will then work to confirm.
It's 00:16, just about to go to bed, I ran `git push` and it's not working. Check Github, says it's down, I think it's only me, maybe I'm blocked, Github can't be down. Come here to check and it's down for everyone, such a relief.
Anybody who publishes an app on the Google Play store and hosts their privacy policy on Github pages may have their app taken down because Google's bots won't be able to verify it exists.
That happened to me a while back with an app listing that was almost 10 years old because the server I was hosting the policy on went down. Ironically, I switched it to Github pages so it wouldn't happen again.
I see more and more people using GitHub less and turning to other git solutions.
I'm afraid to think what to do when GitHub is down for hours (do I need to learn mailing lists?).
Another reason is that MS may be entering a phase where it will ask you to pay just to read from GitHub (rate limiting).
I recently looked into using Git in a decentralized way. It's actually pretty easy!
When you would usually create a PR, you use `git format-patch` to create a patch file and send that to whoever is going to merge it.
They create a branch and use `git am` to apply the patch to it, review the changes, and merge it to main.
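Concretely, the round trip looks something like this (branch and directory names are just placeholders):

```
# contributor: turn the commits on your branch into mailable patch files
git format-patch origin/main..my-feature -o patches/

# ...send patches/ by email, chat, or sneakernet...

# maintainer: apply them onto a review branch, then merge as usual
git checkout -b review-my-feature main
git am patches/*.patch
git checkout main
git merge --no-ff review-my-feature
```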
It is nice that git supports multiple remotes, though. It feels good to know that `git push` might not work for my project right now, but I know `git push srht` will get the code off of my laptop.
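Adding that second remote is a one-time, two-line affair (the sr.ht URL is just an example):

```
git remote add srht git@git.sr.ht:~you/project
git push srht main   # still works while origin (GitHub) is down
```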
> I recently looked into using Git in a decentralized way. It's actually pretty easy!
Well, that's how it was designed to work! The whole point of Git is that it's a distributed version control system, and doesn't need to rely on a centralized source of truth.
I used to work at a company with very draconian policies. Whenever I needed to update some code on a public GitHub repository, I would just push to a remote that was a flash drive. Plug it into my machine at home, pull from that remote, push to origin.
I also had to setup a bidirectional mirror back when bandwidth to some countries was restricted. We would push and pull as normal, and a job would keep our mainline in sync.
It is sad that most organizations forget that git is distributed by nature. We often get requests to set up VPNs and all sorts of craziness, when a simple push to a bare mirror would suffice. You don't even need anything running, other than SSH.
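For reference, the whole "mirror" is just a bare repo and an extra remote; nothing has to run on the other end beyond sshd (paths here are illustrative):

```
# one-time: create a bare repo on the flash drive (or any SSH-reachable box)
git init --bare /mnt/usb/project.git

# at work: add it as a remote and push
git remote add usb /mnt/usb/project.git
git push usb main

# at home (same remote added there): pull from the drive, push to the real origin
git pull usb main
git push origin main
```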
The real reason not to use github anyway though is that it's terrible (the basic "github model" for doing code review was basically made up on the back of a napkin IMO)
Services that explicitly needed the API were also down, and it wasn't pretty. For example: Minecraft Mod packs that rely on SerializationIsBad all went kerplunk! I'm sure a lot of people were scratching their heads yesterday wondering why they couldn't do anything for a time.
What made me giggle, though, were the "X is functioning normally" messages immediately followed by "X is degraded, continuing to monitor", then right back to "normal" again, all in the same 30-second timespan.
Unfortunately, outages happen... This situation is a very good reminder of why having backups and a solid Disaster Recovery plan is crucial. Of course, it’s easy to assume that cloud services are always up, but we should never forget about outages. Setting up automated backups for repos and metadata can save a ton of headaches when things go wrong. Plus, having a Disaster Recovery plan means you’re not stuck waiting for the service to come back online—you can keep working with minimal disruption.
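For the git data itself, a scheduled mirror clone plus a bundle already covers a lot; a sketch with placeholder URLs and paths (issues, PRs and other metadata need the API or a dedicated tool on top):

```
# nightly cron job: keep a full mirror and a single-file bundle offsite
git clone --mirror git@github.com:acme/app.git /backups/app.git 2>/dev/null \
  || git -C /backups/app.git remote update --prune
git -C /backups/app.git bundle create "/backups/app-$(date +%F).bundle" --all
```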
Things seem to be ack'ed:
```
Investigating - We are investigating reports of degraded availability for Actions, Pages and Pull Requests
Aug 14, 2024 - 23:11 UTC
```
This was not the first or last GitHub outage, but unfortunately it was a big one. That's why it is now even more important to have backups of your work and restore capabilities in case of a scenario like this outage. This article sheds light on the importance of backups along with the best practices to follow:
They're pulled from our CDN by default. Only if you use experimental flakes is GitHub in the loop. And even if GitHub isn't down, you can't pull nixpkgs more than twice per hour without running into rate limits and getting your IP banned. Don't rely on GitHub for critical infrastructure.
Update - Issues is experiencing degraded availability. We are continuing to investigate.
Aug 14, 2024 - 23:19 UTC
Update - Git Operations is experiencing degraded availability. We are continuing to investigate.
Aug 14, 2024 - 23:19 UTC
Update - Packages is experiencing degraded availability. We are continuing to investigate.
Aug 14, 2024 - 23:18 UTC
Update - Copilot is experiencing degraded availability. We are continuing to investigate.
Aug 14, 2024 - 23:13 UTC
Update - Pages is experiencing degraded availability. We are continuing to investigate.
Aug 14, 2024 - 23:12 UTC
HN has a strange philosophy built into its ranking algorithm that an item with a large number of comments early on should be de-ranked because the conversation is likely to be of poor quality.
And so go all your packages, private repositories, pages, your AI intern Copilot bot, and GitHub Actions; and soon your AI models, once you host them there - all unavailable and going down with GitHub.
Time to consider self-hosting like the old days instead of this weekly chaos at GitHub.
Who's the Bozo Doofus maintainer? https://yhbt.net/unicorn/LATEST. I love that we can still see Unicorn in action. I rarely had problems with it back in the day.
Cause seems to be database related per most recent update (23:29 UTC):
> We are experiencing interruptions in multiple public GitHub services. We suspect the impact is due to a database infrastructure related change that we are working on rolling back.
Aug 14, 2024 - 23:29 UTC
Latest update at 23:29 UTC says: "We are experiencing interruptions in multiple public GitHub services. We suspect the impact is due to a database infrastructure related change that we are working on rolling back."
Aug 14, 2024 - 23:29 UTC
Update - We are experiencing interruptions in multiple public GitHub services. We suspect the impact is due to a database infrastructure related change that we are working on rolling back.
We are experiencing interruptions in multiple public GitHub services. We suspect the impact is due to a database infrastructure related change that we are working on rolling back.
hope it is back up soon
There goes Pages, there goes the CDN for release artifacts, there goes any package manager hosting repositories on GitHub. Is this outage just contained to github or is it an Azure outage?
Yep, angry unicorn.
If the copilot debacle wasn't reason enough to make people migrate or diversify their code repo efforts with, let's say, GitLab, this should be.
This reminds me that for some reason I am logged into my gaming machine's windows store with my GitHub account thanks to the bizarre way that microsoft do auth.
And has a website so anyone could just ask me if something went wrong on github's side and I can send them a complete copy. Decentralised version control is nice!
Update - We are investigating reports of issues with GitHub.com and GitHub API. We will continue to keep users updated on progress towards mitigation.
Aug 14, 2024 - 23:16 UTC
EDIT: The reply link is no longer available.
Update - Packages is experiencing degraded availability. We are continuing to investigate.
Aug 14, 2024 - 23:18 UTC
I love how the same people who try to drag me towards using Git are the only people who seem to have serious problems working on their code when a website goes down.
They probably have a reverse proxy in front of all their http endpoints and that is still up and able to show the unicorn if the backends aren't responsive.
The static content on the error page might also be on the Akamai or Cloudflare side.
Unicorn has a slightly different architecture.
Instead of the nginx => haproxy => mongrel cluster setup
you end up with something like: nginx => shared socket => unicorn worker pools
When the Unicorn master starts, it loads our app into memory. As soon as it’s ready to serve requests it forks 16 workers. Those workers then select() on the socket, only serving requests they’re capable of handling. In this way the kernel handles the load balancing for us.
Love that HN is a better status page for dev services than most companies can manage to provide. Knew I'd find it here but on the front page within 3 minutes is impressive.
Reminds me of a repository I once found when searching for Prometheus exporters.
It did this but with Twitter: it monitors the latest tweets for a custom word combo and raises a server alert when one is found. I found it hilarious. Will post the source once GitHub is back up.
I guess GitHub going down is somehow strangely tolerated, for years even after the acquisition, and it goes down more often than Twitter. When the latter hits a speed-bump, just like the 'interview' with Trump, it's global news because a Mr Elon Musk owns it.
Both seem to be doing too much all at once. But really it is worse with GitHub if this is what Microsoft stewardship looks like: incidents every single week, and guaranteed every month, for years.
What makes you say it’s “somehow strangely tolerated” when GitHub goes down?
What’s the point of bringing up twitter? It is strange to seek victimhood for a petulant billionaire. Of course, it is worse with GitHub because GitHub actually provides useful functionality.
> What makes you say it’s “somehow strangely tolerated” when GitHub goes down?
The folks complaining about GitHub going down are the same people who stay and are willing to tolerate the regular incidents and chaos on the site.
And it's not just that the GitHub incidents have been happening for years; it has gotten worse, to the point of an incident every month.
> Of course, it is worse with GitHub because GitHub actually provides useful functionality.
That isn’t an excuse for tolerating regular downtime for a site with over 100m+ active users, especially with it running under Microsoft stewardship who should know better.
Any other site with that many users and with a horrendous record of downtime like Github would be rightfully branded as unreliable. No excuses.
A reminder of how centralized and dependent the whole industry has become on GH, which is ironic, considering that git itself is designed to be decentralized.
Good opportunity to think about mirroring your repos somewhere else like Gitea or Gitlab.
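A push mirror covers the "GitHub is down" case with two commands (remote name and URL are examples; GitLab and Gitea can also be configured to pull-mirror from their side):

```
# mirror all branches and tags to a second forge; re-run (or cron) to stay in sync
git remote add gitlab git@gitlab.com:you/project.git
git push --mirror gitlab
```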
Github is more than a remote host for git repositories. It's become one of the major CDNs for software distribution. GitHub Pages hosts a majority of the static sites that developers use. You won't be able to use Cargo, Nix, Scoop and other package managers right now because their registries have a critical dependency hosted on GitHub.
This is not to mention all the projects that rely on Github for project management, devops, community and support desk.
GitHub is also very international; I doubt even isolated netizens like those in China are shielded from this outage. I imagine very, very few software shops are unscathed by this. The whole affair is very on brand for 21st century software, which is to say pitiful.
- Champion a hard-to-use VCS which, to its credit, is distributed
- Make everyone dependent on all the centralized features of your software to use Git [1][2]
- Now you have a de facto centralized, hard-to-use VCS with thousands of SO questions like "my code won't commit to the GitHub"
- Every time you go down, a post with hundreds of comments appears on HN
How to get bought for a ton of cash by a tech mega corporation.
[1] Of course an exaggeration. Everyone can use it in a distributed way or mirror. The problem occurs when you’re on a team and everyone else doesn’t know how to.
[2] I’m pretty sure that even the contributors to the Git project rely on the GitHub CI since they can’t run all tests locally.
We installed a private GitLab instance on our own servers exactly out of fear that Github might suddenly alter the deal or just cease operations. Pretty happy with our decision so far.
Actually both. Our internal closed source projects are only in our GitLab. The open-source stuff is both on GitHub and our GitLab. Since our GitLab instance isn't public we only use the issue tracker on GitHub for public stuff.
The key difference is being able to mirror communication channels. While you can continue to work fine with your local repo, the only way to share those changes is via another forge, or by sending patches through some other channel. Having another forge to distribute code is generally the better option.
The odds of all services doing an `rm -rf /` at the same time are pretty small, to be honest. The point is to have your work in multiple places, such that you're not reliant on a single service.
Senior: Ah found it! Let's just rollback one revision on the db.
Newguy: let me fix this! `kubectl rollout undo ... --to-revision=1`
Newguy: Ok, Started rollback to revision one!
Senior: Uh-oh..
The status page says all is well, though: https://www.githubstatus.com/. Hilarious.