GitHub Packages Is Down (githubstatus.com)
120 points by stonekyx on March 1, 2023 | 78 comments



I'm disappointed that this is an issue for some package management systems. 20 years ago I helped run a mirroring service, and it's still running today. Distributions such as Debian have hundreds of mirrors. This is a solved problem, but we just decided to put everything in the hands of one for-profit company.


I can't speak for Homebrew, but cargo/crates.io is fundamentally different from Debian. The rate of churn is significantly higher: packages are published constantly and people expect them to be available. You can't really do that with a system of mirrors like Debian has. You can do edge caching, and crates.io does that, but you want some central authority on which package versions are available. And every cargo run queries that index.

It's acceptable for Debian mirrors to lag a few minutes or hours behind. The same thing is much harder to accept when the rate of change is much higher. Different requirements, different tradeoffs.


I guess I don't really understand the need for something like Cargo to be up to date to the second, or even the minute. My assumption is that you build your code with certain package versions in mind and release that to testing. Unless it's a security update, it won't matter if you're 24 hours behind.

Say that version 1.2.1 of a library is released right as you do your build; that won't go into production within that 24-hour window anyway. If it is a security fix, then, like Debian, you pull that from another repository, which is under tighter control.


This thread is glossing over some important details. A package repo has two distinct storage concerns: the index (a list of which versions of packages have ever been released), and the actual packages themselves. It's convenient to have the index centralized, for maximum consistency. But the packages themselves can be stored however you want, and if you try to access a stale mirror without the most recent version of a package, then the client should have the option of using a different mirror or else accepting the old version.

For crates.io specifically, the packages are stored in S3, whereas the index is currently stored as a bog-standard Github repo (not as a Github Package), and in the near future the crates.io index will also move to crates.io itself (https://blog.rust-lang.org/inside-rust/2023/01/30/cargo-spar...).


Thanks, that wasn't clear to me. Why not just dump the index on the same storage as the packages? If text files are insufficient, then use an SQLite database.


There are sound technical reasons to give the index special treatment.

First, the index is very large, and it only ever gets larger over time. I just cloned and compressed the crates.io index (https://github.com/rust-lang/crates.io-index), which resulted in a 58 MB archive (note that I did remember to delete the .git directory).

Second, the index changes very often. Every time anyone ever publishes a new version of a package, that changes the index. For crates.io, this happens hundreds or thousands of times per day.

Third, the index is append-only.

Fourth, the index is extremely frequently requested. Any time the user manually asks for an update, or any time the user adds a new dependency, the local copy of the index needs to be updated.

Putting it all together, since the index is constantly changing and since users will constantly be asking for the latest version, this means that it would be very inefficient to serve the whole thing each time. Instead, a fine-grained solution is more efficient. In the early days of crates.io, this problem was solved by just storing the index in a git repo and letting git take care of fetching new diffs to the index (and the problem of "who pays for hosting" was solved by using Github). Now that the crates.io index is outgrowing this solution, it's moving to a more involved protocol where clients will not have local copies of the full index, but instead will only lazily fetch individual index entries as necessary, which is much faster (especially for fresh installs (including every CI run!)).
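To make the sparse approach concrete, here's a rough sketch (in Python, and not how cargo itself is implemented) of what a per-crate lookup amounts to, assuming the index keeps the same path layout as the git repo and is served at index.crates.io:

    # Rough sketch of a sparse-index lookup; the path layout mirrors the git
    # index (1/, 2/, 3/<c>/, or <ab>/<cd>/<name>), and index.crates.io is the
    # sparse endpoint. Illustration only, not cargo's actual code.
    import json
    import urllib.request

    def index_path(name: str) -> str:
        """Map a crate name to its file path in the index layout."""
        name = name.lower()
        if len(name) <= 2:
            return f"{len(name)}/{name}"
        if len(name) == 3:
            return f"3/{name[0]}/{name}"
        return f"{name[:2]}/{name[2:4]}/{name}"

    def fetch_versions(name: str) -> list:
        """One small HTTP request per crate: newline-delimited JSON, one object per version."""
        url = f"https://index.crates.io/{index_path(name)}"
        with urllib.request.urlopen(url) as resp:
            return [json.loads(line) for line in resp.read().decode().splitlines()]

    latest = fetch_versions("serde")[-1]
    print(latest["vers"], latest["cksum"])

Compare that with cloning, storing, and incrementally updating the entire index just to resolve a single dependency.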


I think in the case of Debian, packages are vetted and approved by repository maintainers before being hosted (the repository is curated). I think most application dependency repositories let anyone in and the onus is on the author and user to determine the legitimacy.

I imagine it's easier to get people to mirror curated, signed packages than, effectively, random code


I definitely push stuff to npm and then pull it in as a dep on a different project seconds later. Mostly because I'm too lazy to eff around with local package resolution, which has bitten me before and also implies you're linking against live code instead of a specific snapshot.


> cargo/crates.io is fundamentally different from Debian

Given the OP, note that packages on crates.io don't (and can't) reference Github. Crates.io has its own storage, and the only way to upload a crate to crates.io is if 100% of its dependencies are also on crates.io.


Right. Although crates.io links to Github repositories, it doesn't get the code from them. They can be out of sync, which caused me some trouble yesterday.


Indeed, anyone can list whatever URL they want as the "repository" on the crates.io page for any crate they publish. There's not much of an alternative, given that crates.io is designed to be immutable, and the internet in general is not. (At best, crates.io could provide a link to a browser-rendered directory tree of the code that crates.io has on hand for any given version.)


If only there were some way to make git distributed!

/s


You mean something like a git annex enabled branch tracking mirror locations of each release artifact like HTTP URLs, (webseeded) torrents, maybe even something content addressed like IPFS? sigh


uhhh wait can you explain what you mean?


Move fast and break things ... /s


just yesterday I was stuck for an hour because a Debian package mirror went down. it took a long time to talk the user through changing their sources.list so that another mirror was chosen, and the mirror chosen out of that pool was down as well. finally I had to manually check for a good mirror and give them the URLs.

the user's take was "why don't they use GitHub packages?"

"still running today" doesn't mean 100.0% uptime.


This is not how this works anymore. The system behaving this way must be relatively old at this point, since almost all modern Debian-based distros now use the "mirror://" URI syntax, which automatically falls back to another mirror if one fails.


I don't think a clean Debian stable install uses that today.

But even so, at least the mirrorlist.txt file that appears in the mirror:// URI must be available for it to work, right?


You are correct. While it's supported and part of the APT version in Debian, they don't make much use of it themselves, whereas most downstream distros (e.g. Ubuntu) are making use of it.

https://manpages.debian.org/bullseye/apt/apt-transport-mirro...

You can still use it in vanilla Debian, but they don't make their mirror list available easily in the correct format, so you would have to basically curl + awk the URLs into a text file and use that.
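For reference, the end state looks roughly like this, where the mirror-list URL is a placeholder rather than anything Debian publishes:

    # /etc/apt/sources.list (illustrative)
    deb mirror://mirrors.example.org/debian.mirrorlist.txt bullseye main

The debian.mirrorlist.txt file is just plain text with one mirror base URL per line (e.g. http://deb.debian.org/debian/), which apt tries in order, falling back automatically when one fails.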

My guess is that Debian itself probably sees less than 1% of the traffic on their mirrors compared to Ubuntu and they haven't been as motivated to make this change.


What I find funny and unexplainable is that this class of problem was solved decades ago with distribution mirrors. It's not really clear to me why, within the last decade or so, we collectively decided to centralize hosting on one specific cloud service whose downtime now affects builds across nearly every company.

What's perhaps even more surprising to me is that, after a track record of severe and frequent Microsoft/GitHub outages over the last three years, it is still a hard dependency for so much of the modern software stack.


Those take more time and money to set up, host, maintain, and use, than a single provider. Maybe more than it's worth to avoid occasional downtime.


I think it's exactly this. We often see posts lamenting the lack of financial support for open source projects. How much more or less likely would it be for a mirror of a for-profit corporation's servers to receive financial support? How would they even reach out to potential sponsors without annoying users (ala donation requests in npm install output)?


It's not even close to the same thing. Universities hosting mirrors piggybacked off academic networks; not just the computer kind but the social kind, where professors would regularly meet professors from other institutions at academic conferences. It was in the collective interests of the universities to set up mirrors to solve the pre-eminent issue of slow WAN networks.

Today most companies need private package registries. Legacy networks are a resource drain. Nobody else uses your private packages nor do you want anybody else to host a mirror and authentication is required anyway.

Plus the idea that GitHub is hosting everything in a single datacenter is laughable on its face.


> the idea that GitHub is hosting everything in a single datacenter is laughable on its face.

Personally I find the notion that GitHub is somehow magically superior to the rest of the entire internet a bit silly.

I've worked on distributed systems my entire career, and I have yet to find a single one that is completely immune to a datacenter outage. There is always some single point of failure that wasn't considered; often it is even known, and everyone has the "special" datacenter.

It's also true that "market forces" push for better cost optimisation, which can, in some cases, lead to not being sufficiently sized to cope with the outage of a whole DC. This is made worse by people who think the cloud will solve it, because every customer will be doing the same thing as you during a zonal outage.

Regardless of that: you are basically suggesting that GitHub, as a centralised system, is better equipped to deal with the distribution of packages than a literal distribution of package repositories?

That’s odd to me, maybe not “laughable”, but certainly odd.


> you are basically suggesting that github, as a centralised system, is better equipped to deal with the distribution of packages than a literal distribution of package repositories?

No, that's not what I'm saying. I'm explaining why "inferior"-quality alternatives sometimes win: the market prefers a different metric. In this case, ease of operation, ease of setup, and price are more important than sheer uptime.


Mostly agreed, but I'd hazard a guess that the scale of github is far larger than distribution mirrors of old.


There are many distribution mirrors that are financed by universities that are on the Internet backbone in the US.

Heck, even in Asia I did not have trouble with finding a good mirror.


I suspect this is the kind of advice that works for anyone, but would fail for everyone. That is, for most, it is a valid cost/benefit tradeoff to use the central option. Specifically, not just for them, but for everyone. If everyone was following this advice, it would likely start hitting scale/cost problems that would make running the mirrors of dubious value.


If you install packages on your linux infrastructure or docker images to provision anything, and those things are based on the “default” install, you are relying on the mirrors. That infrastructure is already “web scale”. It’s just a matter whether you make one image once and copy it thousands of times or if you actually spawn thousands of instances that talk to the mirrors.

Setting up your own mirrors for internal use isn’t overly difficult either, and it is definitely a trade-off as you pointed out.

However, it basically works for everyone, whether or not they are fully aware of it.

I have also run my own mirrors with minimal fuss. I haven’t had a business need to use GitHub packages, but I am glad it exists, as it is another tool to do a thing that needs doing in the right circumstances.
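For anyone curious, "minimal fuss" really is minimal; a rough apt-mirror sketch for an internal Debian mirror looks something like this (paths and suites are just examples, adjust to taste):

    # /etc/apt/mirror.list (illustrative)
    set base_path   /var/spool/apt-mirror
    set nthreads    20

    deb http://deb.debian.org/debian bullseye main contrib
    deb http://security.debian.org/debian-security bullseye-security main

    clean http://deb.debian.org/debian

Run apt-mirror periodically, serve the resulting tree over HTTP, and point your machines' sources.list at it.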


I meant for the sheer scale of how many are publishing to the mirrors more than the numbers that are pulling from them. But, fair that they are probably capable of more than I would expect, all told.


IMO we all realized that it doesn't actually matter that much, most of the time. Here we are, indeed, after three years of severe and frequent outages! But everything is... basically fine? Life is full of tradeoffs.


People who really care make their own dependency/build caches; e.g., we had Docker containers we could fall back to. If you really needed to patch, build on top of an existing artifact image — and then rebuild when the vendor service comes back. In practice, I just waited a few hours.

Problem solved(ish).


It's a bit harder to "mirror" an active web app like a container registry than it is a directory of RPM or Debian packages + distro metadata.


FYI: this breaks homebrew


Man, this disappoints me. I was a tech lead on the packages project about 3 years ago, specifically on the redesign for OCI container support and making anonymous downloads of public packages as reliable as possible was a top priority.

If that flow was broken there's only a handful of things it could be, specifically Azure Blob Storage or Azure MySQL, but both of those should have layers of redundancy. Public anonymous download bypasses most everything else: no auth services, Rails monolith, or metered billing. It does emit some events to the message bus for metrics, but that'd affect much more than Packages if there was an issue with it.

As far as I’m aware this is the first time anonymous public package download broke since I left a few years back.


Does it break all of Homebrew or just some packages? I never knew homebrew started using GitHub Packages.


Homebrew uses it to store bottles (the built assets).


Fun fact: Linux distributions (and some older open source programming language package managers) use hundreds of mirrors distributed around the world to distribute their assets. If any mirror goes down, you just pick a different one. Even when they could just use SourceForge as a mirror (formerly the largest repository for open source software), they still used hundreds more mirrors. Distribution was made easy with rsync, and mirrors could choose what files they mirrored (just the latest release, or all releases, or just binaries and not source code)


GitHub uses pools of mirrors, too; they're just transparent.

twice this year I've had to spend an hour or more with a user because a mirror was down. that's one more time than I've had to deal with a GitHub packages outage this year.

the latest was yesterday. they chose another pool of mirrors and the mirror continually chosen from that pool was down as well. finally I manually checked a mirror, made sure it was up and that signatures matched, then gave them that specific hostname.

the Linux package distribution system is not better. it's just different.


So this is why hb started asking for access to my keychain all the time? (macos) Does it need to log in to gh to install open source software now?


you don't have to, but hb will ask you to if you have a saved credential because your quota on GitHub for downloading packages is much higher if you are logged in.

anonymous stuff on GitHub is usually limited to 60 requests per hour per IP address. if you're authenticated, it's several hundred if not several thousand.
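if you want to check which bucket you're in, the REST API's rate_limit endpoint will tell you; rough sketch below (GITHUB_TOKEN is just whatever env var you keep a token in, and note that package/bottle downloads from ghcr.io have their own limits separate from the REST API):

    # Query your current GitHub REST API rate limit. Anonymous requests get a
    # small per-IP budget; authenticated requests get a much larger one.
    import json
    import os
    import urllib.request

    req = urllib.request.Request("https://api.github.com/rate_limit")
    token = os.environ.get("GITHUB_TOKEN")  # wherever you keep a personal access token
    if token:
        req.add_header("Authorization", f"Bearer {token}")

    with urllib.request.urlopen(req) as resp:
        core = json.load(resp)["resources"]["core"]

    print(f"{core['remaining']}/{core['limit']} core REST requests remaining this hour")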


That sounds... problematic for people on big NATs? (e.g. universities?)


NATs are problematic already.

Every office I've ever worked at has that one guy who is really good at tickling Google with scripts until it puts you all behind a CAPTCHA.


anything you don't install via `--build-from-source`


I think it will break any of the open source package managers that rely on GitHub's proprietary hosting and distribution. Cargo, etc.


no, this is GitHub Packages, not GitHub repositories. Cargo doesn't use this. It doesn't use GitHub repositories either; they store the crates internally.

EDIT: I was wrong, the crates index does use a GitHub repo


The crates.io index actually is a GitHub repository, so I think a GitHub outage that affected repositories (not just packages) could break Cargo. Only metadata is stored there, though, not the actual crates. I'm not 100% sure why it works like this; there seems to be a plan to change things soon so that Cargo running on users' machines doesn't talk directly to GitHub by default (https://blog.rust-lang.org/inside-rust/2023/01/30/cargo-spar...), though the GitHub repo would still be the source of truth.

crates.io also uses GitHub as an OAuth provider (and it's currently the only one offered), so if that broke then people wouldn't be able to publish crates, though downloading existing ones would presumably still work since you don't have to log in to do that.


Cargo is hard-coded to use GitHub for the crates.io index [1]

[1] https://github.com/rust-lang/cargo/blob/master/src/cargo/sou...


That's a big yikes...


That's being addressed with a new index protocol, specified a while ago[1], available for testing since the middle of the last year[2], and slated for release in a week's time[3].

[1] https://rust-lang.github.io/rfcs/2789-sparse-index.html

[2] https://blog.rust-lang.org/2022/06/22/sparse-registry-testin...

[3] https://blog.rust-lang.org/inside-rust/2023/01/30/cargo-spar...
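Once it's stable, opting in (until it eventually becomes the default) is reportedly a one-line change in .cargo/config.toml; something like the following, per the stabilization post, though double-check the exact key against the release notes:

    [registries.crates-io]
    protocol = "sparse"   # fetch individual index files over HTTPS instead of cloning the git index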


I stand corrected—I was thinking of the crates themselves.


I did not know package managers relied on GitHub; this is a most unwise thing to do from a package manager's perspective.

Anyone could just change a username/organization and break thousands or millions of builds.


Hello and welcome to PEP-508!

In Python, we don't say "we don't host packages on a proprietary platform", we say "we have absolutely no clue where they are hosted and nobody audits them anyway, and we don't enforce package signing, and we'll just build from source with no build isolation whatsoever, unless you remember to specify an obscure command-line option when installing... and have a nice day!"
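For the unfamiliar, all of the following is perfectly legal in a requirements file under PEP 508 (names and URLs are made up):

    # requirements.txt (illustrative)
    somepkg @ https://downloads.example.com/somepkg-1.0.tar.gz
    otherpkg @ git+https://github.com/someuser/otherpkg@v2.3.1
    # and, for contrast, a "normal" requirement resolved via the configured index:
    requests>=2.28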


The entire python package management situation can be summed up with "we have no clue".


it's amazing how we never learn. Things that Perl and CPAN and Linux distros figured out decades ago are constant issues today. It wasn't that long ago that NPM didn't even have checksums. CPAN runs unit tests on install. I can't imagine how slow that would be with npm.

Package signing is, well... I suppose that's another lesson from the '90s people will learn about soon enough. With a web of trust as broad as python or npm you'll just have everyone running around with signing keys and "trusting" any key they come across because none of it is built on personal relationships. When Archlinux asks me to confirm adding package keys, what am I going to do? Say no? I don't know these people, but I want my shit to work.


When it comes to my personal laptop, I also typically blindly trust the keys coming from developers, because I don't have time for that. Not so much if I have to deploy a system into an environment that is several orders of magnitude more expensive than my laptop...

With systems like Python, I'd imagine that a solution to web of trust would be that some group of developers would organize a curated set of packages. So, for the cases where you need better security assurances, you'd use that. I mean, of course there's no guaranteed solution for the web of trust, but, in practical terms, something like that would be good enough for regulators.

There's already stuff like NumFOCUS. They don't particularly focus on the technical side of things, or endorsing more secure practices, but, in principle, they could. Maybe there will also be others once we have been bitten more times by some security breaches.


> Anyone could just change username/organization and break thousands/millions of build.

GitHub redirects you to the new name in the event of a rename, if you look up the old one.


until someone claims the old name as a new org/repo


For now, hopefully


yeah. if GitHub is even real.


Well, we know GitHub is real. But we should also remember that they can change their API at any time, and basing a package manager on their priorities is not the best situation for the long-term success of that package manager, unless it is owned by Microsoft.


GitHub redirects when this happens. if you move/rename within GitHub, anyway. I don't know how long those redirects last but they probably last until the name is used again.

so it's not quite as bad as you're imagining but still not great.

fortunately GitHub is starting to require 2FA for very popular projects (starting with NPM) because of supply chain attacks like what you describe.


Nix too I presume


You presume incorrectly. GitHub Packages, the package registry and binary hosting service, does not even support Nix. As for GitHub in general—

The binary hosting at https://cache.nixos.org/ is independent of GitHub, and so are the old-style channels at https://channels.nixos.org/. The new-style flake registry used to be fetched from GitHub but has now been moved to https://channels.nixos.org/flake-registry.json. Admittedly in a new-style situation you’re likely to be using unlocked flake references that refer to GitHub (e.g. Nixpkgs), but it’s on you to lock them and pull them into your Nix store in that case.

Of course, you also get GitHub references for upstreams that host their code there, but that applies to almost any distro(’s build system) except the oldest of the old-timers, which host the source for the whole distro on their own infrastructure, like Debian. (I happen to think the old-timers are right here, but that's beside the point.)


Nope.

This is actually the second (maybe third; I didn't even know about GitHub's outage a week and a half ago so idk how Nix was or was not affected) time in three months or less that a partial GitHub outage or GitHub change has taken down Homebrew while leaving Nix unaffected.


Only if you pull from master


how so? It's working for me.


affirmative


I’m also having issues pulling images from ghcr.io


Oh dear. The last time this happened was 11 days ago, when more than just Packages went down. [0] Perhaps relying on and going all in on GitHub isn't good in the long run, especially GitHub Actions.

This is also where OpenAI is now feeling the effects of instability on Azure [1] since its recent outage. I expect them to have issues every month, like GitHub does.

[0] https://news.ycombinator.com/item?id=34843748

[1] https://news.ycombinator.com/item?id=34958375


Packages had an incident two days ago, also: https://www.githubstatus.com/incidents/sn4m3hkqr4vz. I noticed it when a Terraform provider download was failing, citing a 404 from objects.githubusercontent.com.


Microsoft, historically, has not had much competitive pressure to be reliable. For decades, they've had a lock on the PC OS market, and there is basically no large business that doesn't rely on MS. Even world governments rely on MS.

Now even in the few areas where MS actually does have this pressure (e.g. Azure) they're struggling to make it part of the culture.


It's never been up in the first place, for those of us who are waiting for python package support.

Poke, poke!


Annoyingly, PyPI has a pretty simple API, and you can host it directly from a static webserver like Apache or Nginx if you skip some of the optional features in the spec.
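A minimal sketch of what that looks like: the "simple" repository API (PEP 503) is just directories of anchor tags, so a project page can literally be a static HTML file (hostname, package name, and hash here are placeholders):

    <!-- https://pypi.internal.example/simple/somepkg/ -->
    <html><body>
      <a href="../../packages/somepkg-1.0.0-py3-none-any.whl#sha256=0123abcd...">
        somepkg-1.0.0-py3-none-any.whl
      </a>
    </body></html>

Point pip at it with --index-url (or --extra-index-url) and you have a working index.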


Did this include the container registry service?


[flagged]


The idea of the instability of a Microsoft platform moving you back to.. Microsoft, is funny.

Notwithstanding that services on third-party systems are a weird reason to buck a first-party platform.


The "is" seems to be used confusingly here. This reads better as either:

* GitHub's Packages Service is down

* GitHub Packages are down


You are getting downvoted. I'll explain.

The name is GitHub Packages. It's singular. The use of "is" is correct here. GitHub uses "is" in similar circumstances as well.



