Meaning the cloud may go down more frequently than small-scale self-hosted deployments, but downtimes are on average much shorter on the cloud. A lot of money is at stake for cloud providers, so GitHub et al. have far more resources to throw at fixing a problem than you or I do when self-hosting.
On the other hand, when things go down self-hosted, it is far more difficult or expensive to have on-call engineers who can actually restore services quickly.
The skill needed to understand and fix a problem is in short supply, so it takes semi-skilled talent longer to do so, even though the failure modes are simpler (but not simple).
The skill required to set up something locally that works is vastly different from the skill required to make it work reliably. Talent with the latter is scarce to find or retain.
Well, just a few weeks ago we weren't able to connect to RDS for several hours. That's way more downtime than we ever had at the company I worked for 10 years ago, where the DB was just running on a computer in the basement.
Most software doesn’t need to be distributed. But the growth paradigm has us building everything on principles that can scale to world-wide, low-latency accessibility.
A UNIX pipe gets replaced with a $1200/mo. maximum IOPS RDS channel, bandwidth not included in price. Vendor lock-in guaranteed.
“Your own solution” should be that CI isn’t doing anything you can’t do on developer machines. CI is a convenience that runs your Make or Bazel or Just (or whatever you prefer) builds, and your production systems work fine without it.
I’ve seen that approach keep critical stuff deployable through several CI outages first-hand, and it also has the upside of making “CI issues” trivial to debug, since you can run the same target locally.
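To make that concrete, here's a minimal sketch of what such a shared entry point could look like: a hypothetical `build.py` at the repository root that both developers and the CI job invoke with the same arguments. The task names and commands are illustrative placeholders, not any particular project's setup.

```python
#!/usr/bin/env python3
"""Hypothetical shared build entry point, run identically by CI and by developers."""
import subprocess
import sys

# Illustrative task table; a real project would list its own commands here.
TASKS = {
    "lint": [["ruff", "check", "."]],
    "test": [["pytest", "-q"]],
    "build": [["python", "-m", "build"]],
}

def run(task: str) -> None:
    """Run every command registered for a task, stopping on the first failure."""
    for cmd in TASKS[task]:
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    requested = sys.argv[1:] or list(TASKS)
    unknown = [t for t in requested if t not in TASKS]
    if unknown:
        sys.exit(f"unknown task(s): {', '.join(unknown)}")
    for task in requested:
        run(task)
```

The CI configuration then reduces to "check out the repo and run `python build.py`", which is exactly what a developer types locally when the CI service is unavailable.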
> should be that CI isn’t doing anything you can’t do on developer machines
You should aim for this, but there are some things CI can do that you can't do on your own machine, for example running jobs on multiple operating systems and architectures. You also need CI to block PRs from merging until it passes, and for merge queues/trains to prevent races.
Every Linux desktop system has a keychain implementation. You can of course use your own system if you don't like that. You can use separate keys, and your developers don't need access to the real key unless all the CI servers are down.
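For illustration, here's a sketch of what reading such a key could look like on a developer machine, assuming the third-party Python `keyring` package (which talks to the desktop's keychain via the Secret Service API) and made-up service/key names:

```python
# Minimal sketch: fetch a signing secret from the desktop keychain using the
# third-party `keyring` package. The service and key names below are made-up
# placeholders, not a real project's names.
import keyring

SERVICE = "example-ci"        # hypothetical keychain "service" entry
KEY_NAME = "release-signing"  # hypothetical key identifier

def get_signing_secret() -> str:
    secret = keyring.get_password(SERVICE, KEY_NAME)
    if secret is None:
        raise RuntimeError(
            f"No secret stored for {SERVICE}/{KEY_NAME}; "
            "store one with keyring.set_password() or the keychain UI."
        )
    return secret

if __name__ == "__main__":
    # Developers would normally keep only a throwaway key here; the real
    # release key stays on the CI hosts until an outage forces a local release.
    print("secret found:", bool(get_signing_secret()))
```

`keyring` picks up whatever backend the desktop provides (GNOME Keyring, KWallet, etc.), so the same code runs unchanged across distributions.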
Yes. I've quite literally run a self-hosted CI/CD solution, and yes, in terms of total availability, I believe we outperformed GHA when we did so.
We moved to GHA b/c nobody ever got fired ^W^W^W^W leadership thought eng running CI was not a good use of eng time. (Without much questioning of how much time was actually spent on it… which was pretty close to none. Self-hosted stuff has a high initial setup cost… and then just kinda runs.)
Ironically, one of our self-hosted CI outages was caused by Azure — we have to get VMs from somewhere, and Azure … simply ran out. We had to swap to a different AZ to merely get compute.
The big upside to a self-hosted solution is that when stuff breaks, you can hold someone's feet to the fire. (Above, that would be me, unfortunately.) With GitHub? Nobody really cares unless it is so big, and so severe, that they're more or less forced to, and even then the response is usually lackluster.
It's fairly straightforward to build resilient, affordable, and scalable pipelines with DAG orchestrators like Tekton running in Kubernetes. Tekton in particular has the benefit of being low-level enough that it can just be plugged into the CI tool above it (Jenkins, Argo, GitHub Actions, whatever) and is relatively portable.
I mean, yes. We've hosted internal apps with four nines of reliability for over a decade without much trouble. It depends on your scale of course, but for a small team it's pretty easy. I'd argue it's easier than it has ever been, because now you have open-source software that is containerized and trivial to spin up and maintain.
The downtime we do have each year is typically also on our terms, not in the middle of a work day or at a critical moment.
With a build system that can run on any Linux machine and is only invoked by the CI configuration? Even if all your servers go down, you just run it on any developer's machine.
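As a rough sketch of that fallback, assuming a hypothetical `make release` target and a couple of illustrative tool names, an "outage drill" a developer could run on any Linux box might look like this:

```python
#!/usr/bin/env python3
"""Hypothetical outage drill: check that a developer machine can reproduce the CI build.

`make` and the `release` target stand in for whatever build system the project
actually uses; nothing here depends on the CI service being reachable.
"""
import shutil
import subprocess
import sys

REQUIRED_TOOLS = ["make", "git"]  # illustrative; list your real toolchain here

def main() -> int:
    missing = [tool for tool in REQUIRED_TOOLS if shutil.which(tool) is None]
    if missing:
        print(f"missing tools: {', '.join(missing)}", file=sys.stderr)
        return 1
    # The exact command the CI configuration runs -- and nothing more.
    return subprocess.run(["make", "release"]).returncode

if __name__ == "__main__":
    sys.exit(main())
```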