What I find funny and inexplicable is that this class of problem was solved decades ago with distribution mirrors. It's not really clear to me why, within the last decade or so, we collectively decided to centralize hosting on one specific cloud service whose downtime now affects builds across nearly every company.
What's perhaps even more surprising to me is that, after a track record of severe and frequent Microsoft/GitHub outages over the last three years, it is still a hard dependency for so much of the modern software stack.
I think it's exactly this. We often see posts lamenting the lack of financial support for open source projects. How likely would it be for a mirror of a for-profit corporation's servers to receive financial support? How would mirror operators even reach potential sponsors without annoying users (à la donation requests in npm install output)?
It's not even close to the same thing. Universities hosting mirrors piggybacked on academic networks: not just the computer kind but the social kind, where professors regularly met colleagues from other institutions at academic conferences. It was in the universities' collective interest to set up mirrors to solve the pre-eminent issue of the day, slow WAN links.
Today most companies need private package registries, and legacy mirror networks are a resource drain. Nobody else uses your private packages, you don't want anybody else hosting a mirror of them, and authentication is required anyway.
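(For a concrete sketch of what that usually looks like in practice, a package manager pointed at an internal, authenticated index; the hostname here is made up:)

```
# Hypothetical private registry setup; hostname is illustrative only.
npm config set registry https://npm.internal.example.com/

# Auth is required anyway, e.g. a token in ~/.npmrc:
#   //npm.internal.example.com/:_authToken=${NPM_TOKEN}
```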
Plus the idea that GitHub is hosting everything in a single datacenter is laughable on its face.
> the idea that GitHub is hosting everything in a single datacenter is laughable on its face.
Personally I find the notion that GitHub is somehow magically superior to the rest of the entire internet a bit silly.
I’ve worked on distributed systems my entire career, and I have yet to find a single one that is completely immune to a datacenter outage. There is always some single point of failure that wasn't considered; often it is even known. Everyone has the “special” datacenter.
It's also true that “market forces” push for better cost optimisation, which can in some cases leave you insufficiently sized to cope with the outage of a whole DC. This is made worse by people who think the cloud will solve it, because every other customer will be doing the same thing as you during a zonal outage.
Regardless of that: you are basically suggesting that GitHub, as a centralised system, is better equipped to deal with the distribution of packages than a literal distribution of package repositories?
That’s odd to me, maybe not “laughable”, but certainly odd.
> you are basically suggesting that GitHub, as a centralised system, is better equipped to deal with the distribution of packages than a literal distribution of package repositories?
No, that's not what I'm saying. I'm explaining why "inferior"-quality alternatives sometimes win: the market prefers a different metric. In this case, ease of operation, ease of setup, and price are more important than sheer uptime.
I suspect this is the kind of advice that works for any one company but would fail for everyone. That is, for most, using the central option is a valid cost/benefit tradeoff, and not just for them individually but for everyone collectively. If everyone were following this advice and running their own mirrors, it would likely start hitting scale/cost problems that would make the mirrors of dubious value.
If you install packages on your Linux infrastructure or Docker images to provision anything, and those things are based on the “default” install, you are already relying on the mirrors. That infrastructure is already “web scale”. It's just a matter of whether you build one image once and copy it thousands of times, or actually spawn thousands of instances that talk to the mirrors.
Setting up your own mirrors for internal use isn’t overly difficult either, and it is definitely a trade-off as you pointed out.
However, it basically works for everyone, whether or not they are fully aware of it.
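(To make “isn't overly difficult” concrete, here's a minimal sketch of a pull-through Docker registry cache using the stock registry image; the mirror hostname is made up:)

```
# Run the stock registry image as a pull-through cache for Docker Hub
docker run -d -p 5000:5000 \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
  registry:2

# Then point each Docker daemon at it via /etc/docker/daemon.json:
#   { "registry-mirrors": ["http://mirror.internal.example.com:5000"] }
```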
I have also run my own mirrors with minimal fuss. I haven't had a business need to use GitHub Packages, but I am glad it exists, as it is another tool for doing a thing that needs doing in the right circumstances.
I meant more the sheer scale of how many would be publishing to the mirrors than the numbers pulling from them. But fair enough, they are probably capable of more than I would expect, all told.
IMO we all realized that it doesn't actually matter that much, most of the time. Here we are, indeed, after three years of severe and frequent outages! But everything is... basically fine? Life is full of tradeoffs.
People who really care maintain their own dependency/build caches; e.g., we had Docker containers we could fall back to. If you really needed to ship a patch, you built on top of an existing artifact image and rebuilt properly once the vendor service came back. In practice, I just waited a few hours.
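(A rough sketch of that fallback pattern; every name here is hypothetical, not what we actually ran:)

```
# Hypothetical fallback: patch on top of the last known-good image
# instead of rebuilding from the (currently unreachable) upstream.
FROM registry.internal.example.com/app:last-good
COPY hotfix/settings.py /opt/app/settings.py
# Once the vendor service is back, do a proper rebuild from source.
```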