My IRC client runs on Kubernetes (xeiaso.net)
137 points by xena 4 months ago | 127 comments



Well I think it's neat. The bit I find most provoking is the "if you already have Kubernetes..." premise. I find myself having a hard time not wanting to shove everything into the Kubernetes framework simply to avoid having to document what solutions I've chosen. `kubectl get all` gives me an overview of a project in a way that is impossible if every single project uses a different or bespoke management system.

"simple/complex" is not the right paradigm. The real SRE controversy is "unique/standard". Yes, the standard approach is more complex. But it is better _in practice_ to have a single approach, rather than many individually-simpler, but in-aggregate-more-complex approaches.

Kubernetes is never the perfect solution to an engineering problem, but it is almost always the most pragmatic solution to a business problem for a business with many such problems.


Yeah k8s is great. It gives you an infinite rope generator to let you hang yourself with ever increasing complexity, but with a bit of restraint you can orchestrate pretty much anything in a simple or at least standard way.

I'd take a stack of yaml over a stack of bespoke aws managed service scripts any day.


Speaking of rope. Right at this moment GKE clusters can't provision large volumes (~4TiB) because their CSI driver gets OOMKilled when formatting the volume. The problem was reported back in April and still isn't fixed.


This is a great response and pretty much sums it up.

With k8s you can easily install a bunch of helm charts and get a bespoke platform that becomes a full time role for multiple people.

There are pros and cons to this approach, but if you're worried k8s is complex, just use the cloud native integrations.


Not a day goes by where there isn’t some greybeard having a huge sulk about how nobody wants to use his collection of esoteric bash scripts that nobody else will ever understand, but HE does.


Erulabs' top comment nailed it so hard. There's been such a vocal force in the comments for so long, of revulsion toward "complexity", believing "simpler" is better.

> Yes, the standard approach is more complex. But it is better _in practice_ to have a single approach, rather than many individually-simpler, but in-aggregate-more-complex approaches.

There still remains such a strong reflexive "you don't need Kubernetes for that" response that shows up.

But we really are seeing a lot more people who get the value of there being a standard way of doing things, of having common platform & expectations underneath. The cost of starting up may be significant, but it results in a more legible understandable result, and we can keep reusing that base of capabilities again and again and again.

It's such an exciting change. Open source has been around for a long time, but running software has been the Wild West, until very very recently. We now have this awesome, scalable, extensible way to manage containers, volumes, load balancers, databases, and whatever else we have.

Love to see it. Folks aren't wrong about it having complexity, but so much of the complexity already existed; it just wasn't clearly visible or well harnessed (a different example is how systemd makes it so easy to restrict services, or give them temporary/unprivileged users). We are closer to a real shared understanding, to having so many particular decisions offloaded & so much more we can work together atop. Love it so much.


Complex solutions exist for complex needs. If you just need to crop an image, you don't install Photoshop to do so. If you only have a simple web app for a few thousand users, you don't go and set up k8s. Yes, having Photoshop means you may do what all the cool kids are doing, but if you're only cropping images, Preview.app is good enough. Eschewing simpler solutions because "standard" is how you start to see everything as a nail for your hammer.

There’s another thing that no one who advocates for these systems wants to mention: the cost of maintenance. I’m ok with systemd as 98% is outsourced to the maintainers. But I’d be more comfortable if k8s was a more monolithic system a la BSD. At least Linux has distros.


> for a few thousand users

It makes absolutely no sense to base this decision on the number of users. We have some applications that don't even have 10 users but still use k8s.

Try to understand the point that was made in the original comment: Kubernetes is a way to actually make infrastructure simpler to understand for a team which maintains lots of different applications, as it can scale from a deployment with just one pod to hundreds of nodes and a silly microservices architecture.

The point is not that every application might need this scalability, no the point is that for a team needing to maintain lots of different applications, some internal, some for customers, some in private datacenters, some in the cloud, Kubernetes can be the single common denominator.

Hell, I'm a hardcore NixOS fan, but for most services I still prefer to run them in k8s just as it is more portable. Today I might be okay having some service sitting on some box, running via systemd. But tomorrow I might want to run this service highly available on some cluster. Using k8s that is simple, so why not do it from the start by just treating k8s as a slightly more powerful docker-compose.


K8s has distros, most of them certified to be compatible, and some of them are monolithic-ish (a single binary), such as k3s/rke2.


> But I’d be more comfortable if k8s was a more monolithic system a la BSD. At least Linux has distros.

k3s is essentially a single binary, and it is installed, up, and running with a single command.

With a handful of command line flags you can have a full cluster up and running with some master nodes and some agents and high availability through etcd and load balancing and network policies and ingress and distributed clouds via WireGuard/Tailscale… They also have “blessed” options (that you can generally install with a single command) for things like persistent storage, etc.

I would not want to build a k8s cluster from scratch. Setting up and maintaining a k3s cluster is pretty trivial.
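
For a rough idea of what "a single command" means here (the server flag and agent env vars are from the k3s docs; hostname and token are placeholders):

    # on the first server node (--cluster-init enables embedded etcd for HA)
    curl -sfL https://get.k3s.io | sh -s - server --cluster-init

    # on each agent, using the token from /var/lib/rancher/k3s/server/node-token
    curl -sfL https://get.k3s.io | K3S_URL=https://my-server:6443 K3S_TOKEN=<token> sh -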


> Complex solutions exist for complex needs. If you just need to...

Personally, to me, this sounds like madness.

Deciding you need to custom tailor and brew up a specific solution that exactly matches each use case begets endless complexity. Constantly negotiating a position & changing approaches begets endless complexity.

And the complexity needle moves over time, such that your initial assumptions often don't stand. Getting something running usually isn't too hard. But creeping demands of observability, monitoring, rotating certs, backups, and other concerns are going to start getting tacked on. If you only ever do one thing, and there aren't many of y'all & never will be, and very few newcomers will ever join, maybe maybe maybe you find a win today by doing less, by hand-picking your options & getting in there & cobbling your stack of tools together. But convincing yourself to start small, to avoid "complex" Kubernetes and do less, is in most cases going to backfire even for a single piece of software, as situations evolve.

There is such a fear that Kubernetes is complex, but it just feels like so many are missing the point, are resisting the standards & practices that erulabs spoke to.

But man, it is so not complex to prop up a big k3s node & push some containers at it. And anyone who's spent more than an hour with kube will be able to look around, see what's running on the node, see how everything works; you won't have to explain what you did, it'll be obvious. And they'll be able to keep going. And you'll be able to keep incorporating concerns & platform & new apps in. And those skills - mastering that complexity - will transfer to every other thing you want to do, will scale way up and way down.

(I do think systemd offers a good number of similar advantages of repeatability, consistency, paved-roadness, and is a pretty solid option, but involves more finding-your-practice than kube)


> There still remains such a strong reflexive "you don't need Kubernetes for that" response that shows up.

If you're already using docker-compose, the jump from there to Docker Swarm is arguably tiny in comparison to the jump from there to Kubernetes, while still bringing most of its value.
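
Roughly, for anyone who hasn't tried it (stack name is arbitrary; the compose file needs to be the v3 format):

    docker swarm init                                  # turn the host into a single-node swarm
    docker stack deploy -c docker-compose.yml mystack  # reuse the existing compose file
    docker service ls                                  # see what's running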

So yes, I will keep evaluating the tradeoffs every time and choose accordingly.


Look up how much bash is in the k8s codebase. :)


To be fair, the problem isn't the greybeard. It is that nobody understands bash.


Is the mountain of k8s code on top of your cloud of choice not strictly more complex than the cloud services it's orchestrating? Like I think k8s is good software but to hear that you're not choosing it because it's good engineering is surprising. To me that's the only reason you choose it, because you want or need the control-loop based infrastructure management that you can't get on say AWS alone.


That's the same with... the Linux kernel? I'd wager that most services people write have less complexity than the kernel they run on. We choose to do that not because we need its full feature set, but because it gives us useful abstractions.


I don’t know that that’s the right analogy. End users don’t have to write a mountain of configuration in order to effectively use the Linux kernel.

If k8s were as complex as it is, but didn’t force that complexity onto the user, I’d agree.


Exactly, I think I could have stated it better. Using k8s is strictly more complex from the perspective of the operator, and the mountain I'm referring to isn't actually from k8s itself but from all the 3rd party stuff you have to layer on to actually make it useful-- things like cluster-autoscaler, vpa, kube-prometheus, aws-load-balancer-controller, misc. operators, that fall on my side of the wall when it comes to fixing it when it's broken.

I can understand that it doesn't come off that way but I actually like k8s and the surrounding ecosystem. But deploying k8s makes your infrastructure as complicated as k8s itself, which can be a massive win because it works in both directions.


If it can’t be monolithic like *BSD, at least offer distributions like the Linux sphere does. Like a ready-to-go LAMP stack. Or a collection of services (database, storage, logs, …)


There have been distributions of k8s for years now, starting from the simple side (k3s), through more complex ones (k0s, Rancher, etc.), to complete OS images (Talos, Metropolis).


I’m running Talos locally and run into basically none of these issues — use one of the provided storage providers, dump a container into it, bing bang boom you’re done.


And similarly, you don't need any configuration at all to just run k8s itself. You do need to configure your application and pay attention when exposing things, but the same holds true for an OS on a single machine.

I believe the OS comparison is the most apt, both in the sense of the layer it represents, i.e. abstracting away physical machines and managing resources for applications, but also from what it means: one used to have to rewrite a program for each new computer as they were all unique (incl. CPU), just like each corp had its own way of managing infrastructure, clusters of nodes and automation.

Any new application needed to be integrated from the ground up, which might be simpler in that you only touch what you actually need, but you also have to do it every single time, and it's different for each project/corp - with the vast majority missing out on the more complex bits that don't fit into an nginx config and some shell scripts.


You can deploy quite a bit to k8s with just a few lines of helm installs, and it will work with very little configuration.
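
For example, an ingress controller is usually just (chart location as currently published upstream; release name arbitrary):

    helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
    helm install ingress-nginx ingress-nginx/ingress-nginx -n ingress-nginx --create-namespace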


There’s always a mountain of code on top of the cloud provider.


Yes but that code isn't typically running in production the way k8s is. Having a lot of terraform is just a fancy method of statically provisioning infrastructure. The code is gone once it applies. Bringing k8s into the mix is a whole other application layer that's providing services to your code. Things like sgs for pods, cni plugins, coredns, is code that wouldn't be there in a non k8s world.

Your problems now include k8s networking as well as VPC networking.


> The code is gone once it applies

Until you need to make a change, upgrade something, or patch a security issue. The most dangerous code is code that doesn't run often. Like backup restores. Ticking time bombs unless you're really careful.

> Bringing k8s into the mix is a whole other application layer that's providing services to your code. Things like sgs for pods, cni plugins, coredns, is code that wouldn't be there in a non k8s world.

The same applies to everything running around your tech stack that you didn't write. That includes the language runtime, the compiler, the operating system, all OS libraries you use, and the libraries your app imports. The ship has sailed and K8S isn't much of an addition.

I haven't hit a CNI bug in probably 5 years and that was only on Amazon's horrible CNI code. I've hit a ton of bugs in my software stack. And the libraries I depend on. And my deployment stack. And my cross-service networking stack at the application layer.


Furthermore, there's always a mountain of code aggregating heterogeneous systems into a uniform interface. Look at the linux kernel, or LLVM, or pretty much any other (successful) project that provides abstractions to many different backends.


> But it is better _in practice_ to have a single approach, rather than many individually-simpler, but in-aggregate-more-complex approaches.

Very much depends on the point of view. It's great from an SRE point of view but not necessarily for applications/developers, who are being constrained to a platform's idiosyncrasies and expressions of platform egos.

The individually simpler solutions are only complex from a high level, ivory tower, or middle management perspective, not from the perspective of people who have to use and manage the application itself.


I think just running a process and exposing a port is fine, but the second you get into running a bunch of services together, or caring about environments, the k8s abstraction is simpler.

In the last 6 months my job has been to get production ready vault instances on azure. There's a bunch of complex, unreliable and not very fun APIs here.

Much like AWS, there isn't really a StatefulSet (PetSet) abstraction. So you need to write a bunch of bespoke logic to figure out which IP addresses, names, IDs, and disks you need to attach to a new VM.

Whilst iterating, the Azure APIs being eventually consistent causes all sorts of niggly problems. Resources that get created don't get added to TF when there's a failure.

I create a new vault on a PR and it takes 20 minutes to deploy.

The problem took a couple of months and I created a bunch of code.

On k8s I can just deploy the helm chart in 2 minutes. The abstractions are cleaner, it's more reliable, and way more fun.
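
For reference, the k8s version of that is roughly (repo location per HashiCorp's docs; the HA values are just an example, not our exact config):

    helm repo add hashicorp https://helm.releases.hashicorp.com
    helm install vault hashicorp/vault \
        --set server.ha.enabled=true \
        --set server.ha.raft.enabled=true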

At the end of the Azure project, the team agreed to do an AKS PoC, where we gave the task to the junior of the team, and the entire thing was done in a week.

If you're not doing this type of work, maybe you don't need k8s. But if you're not doing ephemeral test environments, do you actually not see them as a positive, or is it an effort thing? Because it takes me no real effort.


If you only have a hammer, everything looks like a nail.

Then, it mostly works, so management goes for "economies of scale" and removes all screwdrivers, since they need more skills anyway.

Now, you are only using nails.

Until someone else rediscovers screws and starts another hype cycle.


Except in manufacturing, this actually works. Economies of scale are real and trump pretty much every other consideration, and they're what enable pretty much everything we have around us that isn't biological.


The bit I find the most provoking is calling k8s "standard."

And not a collection of strung-together controllers.


I've been self-hosting a lot of things on a home kubernetes cluster lately, though via gitops using flux (apparently this genre is now home-ops?). I was kind of expecting this article to be along those lines, using the fairly popular gitops starting template cluster-template: https://github.com/onedr0p/cluster-template

I set one of these up on a few cheap odroid-h4's, and have quite enjoyed having a fairly automated (though quite complex of course) setup, that has centralized logging, metrics, dashboards, backup, etc. by copying/adapting other people's setups.

Instead of weechat, I went with a nice web based irc client (the lounge) to replace my irccloud subscription. kubesearch makes it easy to find other people's configs to learn from (https://kubesearch.dev/hr/ghcr.io-bjw-s-helm-app-template-th...).


I really wish The Lounge supported something like a PostgreSQL/MySQL backend. Having to keep state in files on a persistent volume is a pain for any app, it's so much nicer when I can just connect to a DB _elsewhere_. The *arr media apps recently added support for PostgreSQL


Definitely, while I have volsync backing it up, and the PV is replicated for local availability.... still annoying.


TIL about Talos (https://github.com/siderolabs/talos, via your github/onedr0p/cluster-template link). I'd been previously running k3s cluster on a mixture of x86 and ARM (RPi) nodes, and frankly it was a bit of a PiTA to maintain.


Talos is great. I'd recommend using Omni (from the same people) to manage Talos. I was surprised how easy it was to add new machines with full disk encryption managed by remote keys.


cannot praise talos highly enough, it makes so much annoying crap easy


No shell in the OS? Seems insane but interesting at the same time..


Practically, it's not a problem as you can always create a privileged container and mount the root filesystem into it. I have an alias I use for exactly such things.
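
kubectl's built-in debug subcommand does roughly the same thing if you don't want to maintain your own alias (image choice is up to you):

    kubectl debug node/<node-name> -it --image=busybox
    # the node's root filesystem is mounted at /host inside the debug pod
    ls /host/etc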


kubectl-node-shell is also pretty good to do that automatically.

https://github.com/kvaps/kubectl-node-shell


Kubevirt is great, but, I'm not sure why you wouldn't just run weechat inside a container.

There's nothing so special about weechat that it wouldn't work and you can just exec into the container and attach to tmux.

Running tmux/screen inside a container is definitely cursed, but, it works surprisingly well.


Author of the article here. One of the main reasons is that I want to update weechat without rebuilding the container, taking advantage of weechat's soft upgrade method: https://weechat.org/files/doc/weechat/stable/weechat_user.en...

And at that point, it may as well just be a normal Ubuntu server.


Ah I didn't know weechat could do that, but, I remember that from my irssi days.

I would personally consider that a bit of an anti-pattern (I would always want to have software upgrades tied to the container) but that makes sense!


As a long-time Docker-slonker I agree; however, that requires me to remember to upgrade the container images, which makes it harder to ensure that it's up to date. If you set up a cronjob to update the VM's packages and then a script that runs `/upgrade` on the weechat server every time the weechat package version changes, then you don't need to think.
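
Something like this is the shape of that cron job (a sketch, not exactly what I run; the fifo plugin being loaded and the pipe path are assumptions):

    #!/bin/sh
    # nightly: upgrade packages, and if weechat's version changed, hot-upgrade it in place
    old=$(dpkg-query -W -f='${Version}' weechat)
    apt-get update -qq && apt-get upgrade -y -qq
    new=$(dpkg-query -W -f='${Version}' weechat)
    if [ "$old" != "$new" ]; then
        # send /upgrade to the running instance via weechat's fifo plugin
        echo '*/upgrade' > "$HOME/.weechat/weechat_fifo"
    fi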


Consider having Renovate handle this for you - using a git repo to apply your manifests will make it pull updated container images


I have this very easily setup to a set of simple commands which I can run to deploy new versions of packages.

Basically 3 paths for:

- things with well-maintained upstream helm charts (checks the latest version of the helm chart and updates helmfile)

- things where I wrote a helm chart (checks for the latest container image and updates the values.yaml)

- things where there is no (reasonable) public docker image, or my own software, where I have full CI/CD to build and deploy a new container image.

Probably takes me 15 minutes a week to keep my two personal clusters fully up-to-date.


So why not just make it a VM and skip k8s altogether?


Because I already have a kubernetes cluster with distributed storage. This way makes it somewhat standard with the rest of my setup. In SRE things the real enemy is when parts are unique in ways that conflict with the normal way of doing things. Look up at the "unique/standard" comment above and then take a moment to think about it. It sounds absurd on first read; but when you realize that it allows me to reuse the hardware I have and take advantage of the second-order properties of what I've already built, then it makes more sense.


This is also why I design all my config files to use YAML. I've already got powerful tools to deal with its quirks (JSONSchema, yq, yamllint)
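
e.g. (assuming the Go yq, v4 syntax):

    yamllint config.yaml
    yq '.spec.template.spec.containers[0].image' deploy.yaml
    yq -i '.spec.replicas = 3' deploy.yaml   # edit in place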


Mostly because VM tooling is stuck in the past.


Why not use Persistent Volumes and Nix container?


You can do that in a container, no need for a VM.

    docker run -itd ubuntu:24.04
    ...
    docker exec -it df36 /bin/bash
    ...
    root@df365a3d2257:/# apt update
    ...
    root@df365a3d2257:/# apt upgrade
    ...


Then when the container gets rescheduled I'm suddenly running the old version of weechat


Wouldn’t an init container in the pod definition help here? Nice article btw!


I can't imagine a worse combination than Kubernetes and stateful connections.


It only hurts when you actually have meaningful load and then suddenly need to switch. Especially if the "servlets" that those stateful connections are connected to require some heavy-ish work on startup, so you're vulnerable to the "thundering herd" scenario.

But the author only uses it to keep alive a couple of IRC connections (which don't send you history or anything on re-connects) and to automatically backup their "huge" chat logs (seriously, 5 GiB is not huge, and if it's text then it can be compressed down to about 2 GiB — unless it's already compressed?).


You don't have to roll all the pods at the same time - there are built-in controls to avoid doing that and it’s the default. You will have to DIY this if you're using something else, so, in fact, the top post is wrong that k8s is somehow a bad fit for this use case.


> You don't have to roll all the pods at the same time

That's not really the problem — if, say, one of your nodes drops dead (or just drops off the network), the clients' connections also drop, and they all try to reconnect. That just happens and there is not much you can do to prepare for it except by having some idle capacity already available.

Unless you're talking about rollout strategies for deployment updates, and to be fair I don't remember the controls for that being all that useful, but that was 2 years ago, so perhaps things are better now.


Having idling capacity is standard industry practice though. A secondary node in a primary/secondary setup of a typical monolith design is basically idle capacity except more expensive because it’s 100% over-provisioning which is not required with k8s


It's only a problem if your nodes go up/down often, or you have other things causing pods to be pre-empted/etc.

If you have a static number of nodes and don't have to worry too much about things autoscaling, I don't see why it couldn't be really stable?


You don’t?

Check out how services, load balancers and the majority of CNI actually work then.

Kubernetes was designed for stateless connections and it shows in many places.

If you want it to do stateful connections you could use something like Agones which intentionally bypasses a huge amount of kubernetes to use it only as a scheduler essentially.


> You don’t?

No, why do yours? :D

If you're using cluster autoscaling with very small (or perfectly sized) nodes, I could see it being more of an issue on a busy cluster.

But even then, I wouldn't set up a database to auto-scale. A new node could get created, but it doesn't mean the db pods will be moved to it. They'd ideally stay in the same location. And on a really busy cluster, I'd prefer a separate node pool for stateful apps.

Using something like Stackgres makes it relatively painless to run postgres in k8s too, it handles setting up replicas and can do automatic failover.


A lot of the CNI/load balancer stuff was added as band aid for applications that don't cooperate nicely with k8s.

Applications that act "native" and don't need a lot of the extras...

Well, they arguably mostly use just the scheduler then :D


Wait, you can run kubernetes with no CNI? My clusters have never even been able to register nodes as healthy without one.

Maybe I’m doing it wrong?


TL;DR - today the CNI itself is the interface to the network implementation, so you'd need at least a minimal one.

But you do not need a "complex" CNI. Originally k8s pretty much worked on the assumption that you could route a few subnets to the cluster in the good old static way and that's it, and it still works with that kind of approach - each node gets a /24, there's a separate shared /24 (or more) for services, etc.

The complexities came from the fact that a lot of places that wanted to deploy kubernetes couldn't provide such a simple network infrastructure to hosts, and then later what was a workaround got equipped with various extra bells & whistles.
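
Concretely, the "good old static way" is little more than a couple of routes per node (subnets and addresses made up):

    # node A hosts pod subnet 10.244.0.0/24, node B hosts 10.244.1.0/24
    ip route add 10.244.1.0/24 via 192.168.1.12   # on node A, towards node B
    ip route add 10.244.0.0/24 via 192.168.1.11   # on node B, towards node A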


I looked at Agones - the docs on architecture are non-existent, but from their ops docs it looks like a CRD extension on top of vanilla kubernetes to automate/simplify scheduling. What specifically in CNI or its most popular implementations prevents long-running connections, in your opinion?


First: they force kubernetes into a position where pods can’t be evicted.

Second: they use a version of node ports that bypasses CNI, so you directly connect to the process living on the node. This means there’s no hiccups with CNI if another node (or pod) gets unscheduled that had nothing to do with your process.

In most cases, web services will be fine with the kinds of hiccups I’m talking about (even websockets); however UDP streams will definitely lose data - and raw TCP ones may fail depending on the implementation.


What you're describing sounds like implementation bugs in the specific CNIs you've used, not anything to do with the k8s networking design in general. At a former gig I ran a geo-distributed edge with long, persistent connections over Cilium and we had no issues sustaining 12h+ RTMP connections while scaling/downscaling and rolling pods on the same nodes. I've consulted for folks who did RTP (for WebRTC), which is UDP-based, also with no issues. In fact, where we actually had issues was cloud load-balancing infra, which in a lot of cases is not designed for long-running streams...


Not a single mention of Quassel in the article or in the comments, which is honestly surprising. It's a client-server architecture IRC client specifically made to make it easy to save logs and persist IRC sessions, since you can host the server part on an actual server/VPS and connect to it from all of your different devices.


Weechat can also be used in a client/server architecture. It can run headless and expose a relay protocol (full weechat control and state) and/or an irc server (traditional bouncer).
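
Roughly (port and password are made up):

    weechat-headless --daemon
    # then, from a connected client:
    #   /set relay.network.password "s3cret"
    #   /relay add weechat 9001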

Though, ironically, there are no CLI clients for its relay protocol, only for desktop/web/android.


I'd use Quassel, but I have custom weechat scripts. I can't easily implement those in Quassel.


Fair enough, it's just that Quassel immediately came to mind when talking about persistent IRC logs/sessions :)


  curl 'https://news.ycombinator.com/item?id=41332427' | grep -i just
(16)


This is funny, but six of them are in one comment, and another four are in two replies :P

Need tables or something



Is IRC still a thing? I mean seriously, do communities hang around there? I stopped using in 2016.


Yes and yes.


Older social media tends to rot as they age, and there's always a few people who don't leave. Nearly all the IRC communities I used to hang around in have gone completely rotten. But they're still there.


> Is IRC still a thing?

Yes.

> I mean seriously, do communities hang around there?

Yes. They've been around for years.


Is your homelab geographically distributed? Because if it is not then you aren’t going to get much better durability than a single host. I bet this was an interesting project but just backing up your files to S3 or some other offsite storage is a lot simpler and much more durable to real failures.


Yep, mine is and I'm sure some others' are as well. Truly overkill but it's a fun hobby project.


I’m trying to resist the urge to move all my homelab setup to kubernetes too, mainly also because I don’t want to have to remember every dumb thing I did to customize my server and want everything to be deployable from a git repo, in case I need to rebuild it, etc.

But I’ve had the same Linux box for over 15 years now (through various hardware changes, I’ve kept the same Ubuntu install since circa 2008, using do-release-upgrade for major updates and making sure to keep up with security updates) and I’m not sure it’s really worth it to optimize for easy recovery from a fresh install. I back up my homedir and etc and some other important data, and if I had to rebuild my OS I’m sure I’d do it far cleaner this time and wouldn’t want to fully do things the way I had them anyway.

Even as a hobby it just doesn’t seem worth it to move away from the “pets, not cattle” model of servers when there’s just one or two of them (technically my router is separate but it’s a very simple router so there’s not much to do there)


I moved all my stuff to k3s for exactly the reasons it seems you’re describing. I’m not saying “do it” but… it was worth it for me.

I had a ton of stuff that was important to me running in VMs on a Proxmox server. I had backups, but if anything had happened to that it would still be a nightmare. Most of it had been there long enough that my technical knowledge and knowledge of how I had set it up was rusty to nonexistent. Which meant that I was afraid to touch it. Which made me get rustier… It was starting to be a constant source of stress and anxiety.

Now I’ve got single repository that has everything that isn’t “actual user data”. I can (and have tested) taking a couple machines, PXE booting them, and within 15 minutes with no interaction getting to a fresh cluster. Then redeploying all my stuff takes me a couple hours (mostly due to slow internet) with, again, no interaction besides hitting “go” on a bunch of things. And god help me if I forget how Kubernetes works because then I’m probably out of a job too.

I switched to running on four Lenovo ThinkCentre Tiny PCs (like $40 each I think) as opposed to a single large box to provide some redundancy as well. Makes things like hardware failures not as much of an issue, upgrades less risky, etc.

I’m not afraid to do things with it anymore. I never worry that a piece of hardware failing is going to suddenly and unexpectedly mean a lost weekend. It’s made it all fun again instead of stressful and like work.


Absolutely, I think pets are great at home.

Personally, I find docker compose to be the sweet spot for reproducibility and ease of use. There's some bespoke setup, but I can go from backups and fresh hardware to running apps in a couple of hours.

I switched to k8s entirely because I wanted to learn to use it, and I don't think I'd recommend it for anything else.


For what it's worth, my homelab machines are pets.


What do you use for distributed storage (if anything)? Storage has always been my biggest headache when trying to make geographically-redundant clusters of any kind.


Rook/Ceph is the standard solution
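
Getting it going is mostly (operator chart as published by the Rook project; the CephCluster spec is where the real decisions live):

    helm repo add rook-release https://charts.rook.io/release
    helm install rook-ceph rook-release/rook-ceph -n rook-ceph --create-namespace
    # then apply a CephCluster resource describing your nodes/disks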


I like ceph in some cases, but ceph hasn't worked very well IME with multi-region clusters or just higher-than-lan-speed latency.

I never tried the actual multi-site support, but for my own homelab use-case - I don't have enough servers in each region to make their own ceph cluster.


I don’t think any strongly consistent posix data store will work well in a multi region setup.

Each write needs to be propagated before the application is unblocked. When you have tens of milliseconds of site to site latency this means every IO request is going to have worse performance than spinning rust.

Object storage can work better in this regard because the IO chunks tend to be larger and therefore the round trip time is traversed less per bit written.

Another alternative is that you run your regions in an active/passive setup, so there is only one active region at a time, and when the active region is changed, all in-process state in the previous active region can be synced to the new active region. Writes only need to be synchronously written within the active region.

Honestly, if you can do that you can make the big bucks at FAANG.

I personally just run a single home NAS which backs up to S3. Very strong durability in that S3 is never going to lose my data at the same time my NAS goes bad, but not great availability in that I will need to rebuild the NAS and pull down files from S3. When it comes to my home tech, strong durability is a must but availability, not so much.
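
The whole "backup strategy" can be a single scheduled command (bucket and path are placeholders):

    aws s3 sync /volume1/data s3://my-nas-backup --storage-class STANDARD_IA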


Awesome


Yaml is a language that protects daemons of your application from running away in the wild. It doesn’t matter that much if you draw your pentagrams on k8s or cloudformation or other dialects. Dark magic of modern times.


For those days when you really fancy writing YAML...


That's when you grab ansible and suffer writing bash in yaml.

Signed, someone happily not writing YAML for k8s despite using k8s extensively


Add Ansible for flavor


VMs on K8s using KubeVirt are a great way to lift and shift legacy software - I suspect this doesn't need kubevirt!


Why not proxmox?


I already have a Kubernetes cluster with distributed storage set up. Adding Proxmox makes that setup unique. I don't want unique. I want to reuse the infrastructure I already have.


Would be funny to do a new version on proxmox to satisfy the proxmoxers. Then at the end reveal that proxmox was actually running on kubevirt with nested virt the entire time.


overdone in a perfect nerdy way


[flagged]


Can you please edit out swipes from your HN comments? That's in the site guidelines: https://news.ycombinator.com/newsguidelines.html.

... as is this:

"'That is idiotic; 1 + 1 is 2, not 3' can be shortened to '1 + 1 is 2, not 3."

https://news.ycombinator.com/newsguidelines.html


I am aware of the rules re: swipes. I just don't believe it is a swipe to describe running VMs inside kubernetes for an IRC client just to back up logs as "silly" (not "idiotic" as your example implies). It's accurate. This is a silly/fun article. As for cargo-culting, that's when you implement some complex sequence of operations you've become familiar with, hoping to achieve benefits from that complexity even if it has nothing to do with the process implemented. Here it's done for seemingly self-aware recreation, but it is still cargo cult.

It is not a serious post or something that anyone would recommend. You know this to be true (if you've read the article). And you'll note the tone of my post was accepting re: different tastes for recreational software stacks.

Instead of, "'That is idiotic; 1 + 1 is 2, not 3' can be shortened to '1 + 1 is 2, not 3."

It is more like, "That is silly; 100502340243 + 1 is 100502340244 and 100502340244 is a big awkward number. 2 is smaller."

....whereas in your version 1+1 is a normal situation one would actually encounter and recommend, and the 100502340243 + 1 is the intrinsic weirdness needing pointing out. Your claimed swipe is actually the argument itself. It's not extraneous.


"Silly cargo culting" is for sure a swipe in the sense that the HN guidelines use the word. This is not a borderline call.

These things land at least 10x harder on the receiving end than the sender thinks they will.

Everyone underestimates their own provocations and overestimates the ones coming from others. That's probably the biggest problem with this genre of communication. https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...


Got it. No pointing at the emperor's new clothes even if the emperor does so themselves in the linked article; context doesn't matter. I will refrain from doing so in the future.


> also want it to be a bit more redundant so that if one physical computer dies then it'll just be rescheduled to another machine and I'll be back up and chatting within minutes without human intervention.

It seems an explicit requirement was not having to manually restore backups. You may or may not be wrong, but I find most arguments against tools like Kubernetes boil down to "if you ignore the requirements, you could have done it simpler with bash!", which, yea.


I think it's fine if you are happy with your crontab/systemd based system. But the author already has a k8s cluster with storage and VM automatically provisioned in a declarative way, so what's wrong with that? crontab/systemd would be more complicated for them.


I wouldn't say it's cargo-culting, but, it's definitely silly (and intended to be).

Not having to manage systems anymore, and fully relying on kubernetes to configure and manage everything is great, especially leveraging the redundancy and reliability of it.


People so very often are quick to look down upon anything using k8s these days, often citing complexity and some hand-wavy statement about "you're not web scale" or similar. These statements are usually accompanied by complex suggestions as "more simple" alternatives.

On one hand you can manually provision a VM, configure the VM's firewall, update the OS, install several packages, write a bash script which can be used with cron, configure cron, set up backups, and more, followed by the routine maintenance that is required to keep this system running and secure. This dance has to be done every time you deploy the app... oh, and don't forget that hack you did one late night to get your failing app back online but forgot to document!

Or, on the other hand, you can write 1-3 relatively simple yaml files, just once, that explicitly describe what your app should look like within the cluster, and then it just does its thing forever.
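
For a trivial app, those files look something like this (image, ports, and names are placeholders), applied once:

    kubectl apply -f - <<'EOF'
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp
    spec:
      replicas: 2
      selector:
        matchLabels: { app: myapp }
      template:
        metadata:
          labels: { app: myapp }
        spec:
          containers:
            - name: myapp
              image: ghcr.io/example/myapp:1.0.0
              ports: [{ containerPort: 8080 }]
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: myapp
    spec:
      selector: { app: myapp }
      ports: [{ port: 80, targetPort: 8080 }]
    EOF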

k8s is very flexible and very useful, even for simple things. Why would you not use it, should be the default question.


Setting up and maintaining along the happy path is not an issue, it’s well-documented and has existing automations if desirable so you can set up a cluster in just a few minutes (if you have the experience already).

And YAML files are nearly trivial in most cases. Boilerplate is not pretty but it’s easy.

The only Problem (with a capital P) with K8s is when it suddenly and unexpectedly breaks down - something fails and you need to figure out what is going on. That’s where its complexity bites you, hard - because it’s a whole new OS on top of another OS - so, a whole new giant behemoth to learn and debug. If you’re lucky it’s already documented by someone so you can just follow the recipe, but if luck runs out (it always eventually does) and it’s not a common case, you’re going to have a very unpleasant time.

I’m a lazy ass, I hate having to debug extremely complex systems when I don’t really need to. All those DIY alternatives are there not because people are struggling with NIH syndrome but because it is orders of magnitude simpler. YMMV.

Or if you want to make K8s do something it isn’t designed or not presently capable of, so you need to hack on it. Grokking that codebase requires a lot of brainpower and time (I tried and I gave up, deciding I don’t want to, until the day finally comes and I must.)


This is the other common pushback on using k8s, and it's usually unfounded. Very rarely will you ever need to troubleshoot k8s itself, if ever.

I highly doubt most situations, especially ones like this article, will ever encounter an actual k8s issue, let alone discover a problem that requires digging into the k8s codebase.

If you are doing complex stuff, then ya, it will be complex. If you are not doing complex stuff and find yourself thinking you need to troubleshoot k8s and/or dig into the codebase - then you are very clearly doing something wrong.

A homebrewed alternative will also be complex in those cases, and most likely more complex than k8s because of all the non-standard duct tape you need to apply.


Even using managed k8s such as GKE, at $DAYJOB, we have run into issues where the default DNS configuration with kube-dns just falls over, and DNS resolution within workloads starts hanging/timing out. We were doing absolutely nothing special. Debugging it was challenging, and GCP support was not helpful.


Hah. Okay, here's an anecdote.

The first time I had to look under the hood (and discover it's not exactly easy to get through the wiring) was when I set up my first homelab cluster. I wanted to run it over an IPv6-only network I had, and while the control plane worked, the data plane did not - it turned out that some places were AF_INET only. That was quite a while ago, and it's all totally okay. But it was documented exactly nowhere except for some issue tracker, and it wasn't easy to get through the CNI codebase to find enough information to find that issue.

So, it wasn't doing something wrong, but I had to look into it.

So I ran that cluster over IPv4. But it was a homelab, not some production-grade environment - so, commodity hardware, flaky networks and chaos monkeys. And I had it failing in various ways quite frequently. Problems of all kind, mostly network-related - some easy and well-documented like nf_conntrack capacity issues, some were just weird (spontaneous intermittent unidirectional connectivity losses that puzzled me, and the underlying network was totally okay, it was something in the CNI - I had to migrate off Weave to, IIRC, Calico, as Weave was nicer to configure but too buggy and harder to debug).

I also ran a production GKE cluster, and - indeed - it failed much less often, but I still had to scrap and rebuild it once, because something had happened to it, and by then I knew that I simply wasn't capable of proper diagnostics (but creating new nodes and discarding old ones, so that all failing payloads migrate, is easy - except, of course, that it solves nothing in the long term).

In the end, for the homelab, I realized I don't need the flexibility of K8s. I briefly experimented with Nomad, and realized I don't even need Nomad for what I'm running. All I essentially wanted was failover, some replicated storage, and a private network underneath. And there are plenty of already robust, age-proven tools for doing all of those. E.g. I needed a load balancer, but not specifically a K8s load balancer. I know nginx and traefik a little bit, including some knowledge of their codebases (I debugged both, successfully), so I just picked one. Same for the networking and storage: I just built what I needed from the well-known pieces with a bit of Nix glue, much simpler and direct/hardcoded rather than managed by a CNI stack. Most importantly, I didn't need fancy dynamic scaling capacities at all, as all my loads were and still are fairly static.

As for that GKE cluster - it worked, until the company failed (for entirely non-technical reasons with upper management).

If you know what I did wrong, in general - please tell. I'm not entirely dismissing K8s (it's nice to have its capabilities), but I currently have a strong opinion that it's way too complex to be used when you don't have a team or person who can actually understand it. Just like you wouldn't run GNU/Linux or FreeBSD on a server back in the '00s without having an on-call sysadmin who knows how to deal with it (I was one, freezing my ass off in the server room, debugging a NIC driver crashing the kernel at 2am - fun times (not really)! - and maybe I grew dumber, but a lot of Linux source code is somewhat comprehensible to me, while K8s is much harder to get through).

This is all just my personal experience and my personal attitude to how I do things (I just hate not knowing how things in my care/responsibility work, because I feel helpless). As always, YMMV.


1) If you're dipping into Calico and friends then I'd argue your setup is not simple, so it is not surprising your config/experience were also not simple. Configuring a VPC and setting up routes etc for any cloud provider by hand will also be quite complicated. In my opinion, this is not a k8s issue, but rather just a complexity issue with your setup. Sometimes complexity is necessary...

2) IPv6 is an issue even today with many systems. There's also not a large need to run IPv6 inside your cluster, but I have not actually tried and cannot comment on how well it works. It's possible it was a "round hole, square peg" issue at the time.

3) Regarding your GKE Cluster, I find it improbable that k8s itself was borked, especially so if it was "managed" k8s from the cloud provider (meaning they provide the control plane, etc). It seems to me it's much more likely something inside the cluster broke (ie. one of your apps/services) and in the heat of the moment it was easier to just throw everything out. Cycling nodes has the effect of redeploying your apps/service (they are not migrated the way something like xen or vmware does), which probably indicates they were the issue if you did not also modify the manifests at the same time. Were your services configured with the relevant health endpoints k8s uses to determine if your app is still alive & ready to receive requests? Without it, a failing app will stay failed, and k8s won't necessarily know to cycle it.


> If you're dipping into Calico and friends

That was my homelab (all bare metal, except for one node that was an VPS), so full DIY setup without any clouds, meant to try things out and learn how they work. I tried a bunch of CNI options, I remember at least Weave, Calico and Flannel. I just opened K8s page about networking and looked at the presented options, trying them out.

The idea was to learn stuff. I’ve yet to see a complex system that never fails, so I wanted to see how it works (and what are the limitations, thus the v6 experiment, heterogeneous node architecture with one aarch64 node, etc.), how it can fail (accelerated through less homogeneous and less reliable conditions) and what does it take to debug and fix it. On my terms, when I can tolerate downtime and have time to research, not when a real production system decides it wants to satisfy Murphy laws at the worst possible moment. And - no surprise - it had its failures, and I learned that debugging those aren’t easy at all. Was nice when it worked, of course (or I wouldn’t have bothered at all).

> Regarding your GKE Cluster, I find it improbable that k8s itself was borked

Well, what can I say? It did, and it certainly wasn’t an application-level error.

Actually, now I remember (it was five years ago or about so - quite a while so my memory is blurry) it did that not once but twice. One time I diagnosed the issue - it was a simple conntrack table overflow, so I had to bump it up. Another time, I have no idea what was wrong - I just lost database connectivity, but I’m certain it wasn’t the application or the database but something in the infra.


> Actually, now I remember (it was five years ago or about so - quite a while so my memory is blurry) it did that not once but twice. One time I diagnosed the issue - it was a simple conntrack table overflow, so I had to bump it up. Another time, I have no idea what was wrong - I just lost database connectivity, but I’m certain it wasn’t the application or the database but something in the infra.

Neither of these are k8s issues though. Where were you playing with `conntrack`? On the backplane?

The issues you describe here are issues you created for the most part. They are not issues people run into in production with k8s, I can assure you of that.

> I just lost database connectivity, but I’m certain it wasn’t the application or the database but something in the infra

Most likely something with your cloud provider that you did not understand fully and therefore blamed k8s, the thing you understood the least at the time.


> Neither of these are k8s issues though. Where were you playing with `conntrack`? On the backplane?

Yes, on the host (GKE-provisioned VPS) where the application container ran.

While it's certain this is not in K8s itself, I'm not really sure where to draw the line. I mean, IIRC, K8s relies on kernel's networking code quite a lot (e.g. kube-proxy is all about that), so... I guess it's not precisely clear if it's in or out of scope.

But either way, they're still certainly GKE issues, because the whole thing was provisioned as GKE K8s cluster, where I think I wasn't really supposed to actually SSH to individual nodes and do something there.

> The issues you describe here are issues you created for the most part. They are not issues people run into in production with k8s, I can assure you of that.

Entirely irrespective of K8s or anything else... people don't create weird issues for themselves in production? ;-) I honestly suspect making sub-optimal decisions and reaping their unintended consequences is one thing that makes us human :-) And I'm sure someone out there right now tries some weird stuff in production because they thought it would be a good idea. Maybe even with K8s (but not exactly likely - people hack on complex systems less than on simple systems).

By the way, if you say connectivity hiccups aren't a thing in production-grade K8s, I really wonder what kind of issues people run into?

> Most likely something with your cloud provider

I remember that node-to-node host communications had worked and database was responsive, but the container had connection timeouts, which is why I suspect it was something with K8s.

But, yes, of course, it's possible it wasn't exactly a K8s issue but something with the cloud itself - given that I don't know what was the problem back then, I can't really confirm or refute this.


I have had similar experiences on GKE. If a managed service can’t even get k8s right…


> Or if you want to make K8s do something it isn’t designed or not presently capable of, so you need to hack on it.

I ask totally non-argumentatively: can you think of any examples?

I’ve twisted and contorted Kubernetes into some frankly horrifying shapes over the years and never run into a situation where it wouldn’t do something I needed it to do.

It’s, essentially, just a whole big pile of primitives and most of the functionality is implemented within the platform anyway (not as part of the platform).

The furthest I’ve had to go is some custom resources, setting up some webhooks, writing a controller, and integrating against the API.

Wondering what gaps I have here and what I might try or run in to that’s gonna run me up against that wall.


Running everything over an IPv6-only network, I believe, was not possible five years ago. IIRC I had some issues with data plane, but it was a while ago so I don't really remember the details. Maybe it is now, maybe there are still some issues - I honestly haven't looked since then.


Thanks for replying--that's actually really good to know right now. Was just embarking on a project to finally try and get IPv6 running on my network... I'll maybe plan to do a PoC with IPv6 in the cluster before getting myself committed.


If you stood up k8s just for an IRC client that would definitely be silly, but honestly if you already have it (in this context presumably as the way you self-host stuff at home) then I don't really think there's anything silly about it.

If anything it would be silly to have all your 'serious' stuff in k8s and then a little systemd unit on one of the nodes (or a separate instance/machine) or something for your IRC client!

(Before I'm accused of cargo-culting: I don't run it at home, and just finished moving off it at work last week.)


> if it works for him

Don't think they're a him


Thanks, default assumption on my part having only read the article itself. Changed.


[flagged]


just a bit of fun


Just use irccloud... It's worth the $5/mo


Where's the fun in that though?


I love irccloud! Only way I could use IRC and not go insane



