I recently rebuilt my Kubernetes cluster running across three dedicated servers hosted by Hetzner and decided to document the process. It turned into an eight-part (so far) series covering everything from bootstrapping and firewalls to setting up persistent storage with Ceph.
While the documentation was initially intended as a future reference for myself and a log of the decisions I made and why, I've already received some really good feedback and ideas, and figured it might be interesting to the hacker community :)
Great write-up. What I especially enjoyed was that you kept the bits where you ran into the classic sort of issues, diagnosed them, and fixed them. The flow felt very familiar from whenever I do anything dev-opsy.
I'd be interested to read about how you might configure cluster autoscaling with bare-metal machines. I noticed that the IP addresses of each node are kinda hard-coded into firewall and network policy rules, so that would have to be automated somehow. Similarly with automatically spawning a load balancer from declaring a k8s Service. I realise these things are very cloud-provider specific, but I would be interested to see if any folks are doing this with bare metal. For me, the ease of autoscaling is one of the primary benefits of k8s for my specific workload.
I also just read about Sidero Omni [1] from the makers of Talos, which looks like a SaaS for installing Talos/Kubernetes across any kind of hardware sourced from pretty much any provider (cloud VM, bare metal, etc.). Perhaps it could make the initial bootstrap phase and future upgrades to these parts a little easier?
When it comes to load balancing, I think the hcloud-cloud-controller-manager[1] is probably your best bet, and although I haven't tested it, I'm sure it can be coerced into some kind of working configuration with the vSwitch/Cloud Network coupling, even if none of the cluster nodes are actually Cloud-based.
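As a rough, untested sketch of what the consuming side might look like: a Service annotated so the hcloud controller places the load balancer in a specific location and reaches the nodes over the private network. The annotation names are from memory and the values are illustrative, so double-check them against the hcloud-cloud-controller-manager docs.

```yaml
# Untested sketch: a Service that should prompt hcloud-cloud-controller-manager
# to provision a Hetzner Cloud Load Balancer. Annotation names are from memory;
# the location, selector and ports are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  annotations:
    load-balancer.hetzner.cloud/location: "fsn1"
    # Reach the nodes over the vSwitch/Cloud Network rather than public IPs.
    load-balancer.hetzner.cloud/use-private-ip: "true"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: https
      port: 443
      targetPort: 443
```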
I haven't used Sidero Omni yet, but if it's as well architected as Talos is, I'm sure it's an excellent solution. It still leaves open the question of ordering and provisioning the servers themselves. For simpler use-cases it wouldn't be too difficult to hack together a script to interact with the Hetzner Robot API to achieve this goal, but if I wanted any level of robustness, and if you'll excuse the shameless plug, I think I'd write a custom operator in Rust using my hrobot-rs[2] library :)
As far as the hard-coded IP addresses go, I think I would simply move that one rule into a separate ClusterWideNetworkPolicy which is created per node during onboarding and deleted again afterwards. The hard-coded IP addresses are only used before the node is joined to the cluster, so technically the rule is obsoleted by the generic "remote-node" one immediately after the node joins the cluster.[3]
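Roughly something like the following, assuming Cilium's host firewall; the resource name, IP and ports are purely illustrative, and the policy would be created when onboarding a node and deleted once it has joined:

```yaml
# Illustrative sketch of a temporary per-node policy created during onboarding
# and deleted once the node has joined. IP and ports are placeholders.
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: onboarding-node-4
spec:
  description: "Allow the not-yet-joined node to reach the existing nodes"
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/control-plane: ""
  ingress:
    - fromCIDR:
        - 203.0.113.4/32   # public IP of the node being onboarded
      toPorts:
        - ports:
            - port: "6443"   # kube-apiserver
              protocol: TCP
            - port: "50000"  # Talos apid
              protocol: TCP
```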
Have you tried KubeOne? It also comes with the benefits of MachineDeployments. Works like a charm. I didn't go through your blog posts, but KubeOne on Hetzner [0] seems easier than your deployment. And yes, it's also open source, with German support available.
Hetzner Cloud is officially supported, but that means setting up VPSs in Hetzner's Cloud offering, whereas this project was intended as a more or less independent pure bare-metal cluster. I see they offer Bare Metal support as well, but I haven't dived too deep into it.
I haven't used KubeOne, but I have previously used Syself's https://github.com/syself/cluster-api-provider-hetzner which I believe works in a similar fashion. I think the approach is very interesting and plays right into the Kubernetes Operator playbook and its self-healing ambitions.
That being said, the complexity of the approach, probably from trying to span and resolve inconsistencies across such a wide landscape of providers, caused me quite a bit of grief. I eventually abandoned it after some operator somewhere consistently attempted and failed to spin up a secondary control-plane VPS against my wishes. After poring over loads of documentation and half a dozen CRDs in an attempt to resolve it, I threw in the towel.
Of course, Kubermatic is not Syself, and this was about a year ago, so it is entirely possible that both projects are absolutely superb solutions to the problem at this point.
If you ever want to have fun setting up your own k8s, I recommend starting small. The author is already knowledgeable, so they probably knew from the start what they wanted, but a lot of this complexity is not essential.
When I deployed my first Kubernetes "cluster", I just spun up a single-node "cluster" using kubeadm (today k3s is an option too) and started deploying services, with no distributed storage: everything was stored using hostPath. You only need to know Kubernetes basics to do this. Then you probably want to configure a CNI (I recommend flannel when starting, later Cilium), spin up an ingress controller (I recommend nginx or Traefik), and deploy cert-manager (this was hard for me when I started), and you can go a long way. With time I scaled up, decided to use GitOps, and deployed many more services, including my own registry: I started with Docker's own, then migrated to Gitea; Harbor is too heavy for me. And of course over time you add monitoring, alerting, etc. The fun never ends, but it's all optional, and you get to decide when the right time is.
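To give an idea of how simple the storage side is at that stage, a hostPath volume is just a directory on the node; a minimal, illustrative example (names and paths are made up), perfectly fine on a single-node cluster but not something you'd keep once you scale out:

```yaml
# Minimal single-node example: a Deployment keeping its data in a directory
# on the host. Image, names and paths are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitea
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gitea
  template:
    metadata:
      labels:
        app: gitea
    spec:
      containers:
        - name: gitea
          image: gitea/gitea:latest
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          hostPath:
            path: /srv/gitea
            type: DirectoryOrCreate
```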
Interesting read. I have just set up a very similar cluster this week: a 3-node bare-metal cluster in a 10G mesh network. I decided on Debian, RKE2, Calico and Longhorn. Encryption is done using LUKS FDE. For load balancing I am using the HCloud Load Balancer (in TCP mode).
At first I had some problems with the mesh network, as the CNI would only bind to a single interface. I finally solved it using a bridge, veth pairs and isolated ports.
Initially Ubuntu 20.04, but I upgraded to 22.04. Finally got it working -- turns out a lot of things that reference `--cgroup-driver="systemd"` write it as if it were run in a shell, which would strip the quotes around "systemd"; used anywhere that isn't shell-interpreted, the literal quotes lead to an error and ignored options (the config-file workaround sketched below avoids this entirely).
Nothing was showing whatsoever when using 20.04, so I wonder if there were some missing dependencies somewhere there...
I'll probably write up everything I discovered at some point; there are a lot of pieces that you have to cobble together from pretty disparate sources (network plugins, config files (which one!?), etc.).
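For what it's worth, setting the cgroup driver in the kubelet config file instead of via the flag sidesteps the quoting problem entirely; a minimal sketch:

```yaml
# Sketch: configure the cgroup driver via the kubelet config file rather than
# the --cgroup-driver flag, so no shell quoting is involved at all.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
```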
Thankfully we've never had the need for such complexity and are happy with the current GitHub Actions > Docker Compose > GCR > SSH solution [1] we're using to deploy 50+ Docker containers.
It requires no infrastructure dependencies: stateless deployment scripts are checked into the same repo as the project, and after the GitHub organization is set up (4 secrets) and the deployment server has Docker Compose + nginx-proxy installed, deploying an app only requires 1 GitHub Actions secret. It doesn't get any simpler for us, and we'll look to continue using this approach for as long as we can.
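Not our actual scripts, but the general shape is roughly this; the image name, host and secret names are all placeholders:

```yaml
# Illustrative sketch of the general shape: build, push to GCR, then deploy
# over SSH with Docker Compose. Names, hosts and secrets are placeholders,
# not the actual scripts referenced above.
name: deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push image
        run: |
          echo '${{ secrets.GCR_JSON_KEY }}' | docker login gcr.io -u _json_key --password-stdin
          docker build -t gcr.io/my-project/my-app:${{ github.sha }} .
          docker push gcr.io/my-project/my-app:${{ github.sha }}
      - name: Deploy over SSH
        run: |
          echo "${{ secrets.DEPLOY_SSH_KEY }}" > key && chmod 600 key
          ssh -i key -o StrictHostKeyChecking=accept-new deploy@my-server \
            "cd /srv/my-app && docker compose pull && docker compose up -d"
```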
I used to do something similar at a previous company, and this works well if you don't have to worry about scaling. YAGNI principle and all that. When you run hundreds of containers for different workloads, k8s bin packing and autoscaling (both at the pod and node level) tips the balance, in my experience.
Yeah, if we ever need to autoscale then I can see Kubernetes being useful, but I'd be surprised if this is a problem most companies face.
Not even when working at StackOverflow (serving 1B+ pages, 55 TB/mo [1]) did we need any autoscaling solution; it ran great on a handful of fixed servers. Although they were fairly beefy bare-metal servers, which I'd suspect would require significantly more VMs if it were to run in the Cloud.
I've been a k8s contributor since 2015, version 1.1. I even worked at Rancher and Google Cloud. If you don't need absolutely granular control over a PaaS/SaaS (complex networking with circuit breaking yadda yadda, deep stack tracing, VMs controlled by k8s (KubeVirt etc.), multi-tenancy in CPU or GPU), you don't need k8s and will absolutely flourish using a container solution like ECS. Use Fargate and arm64 containers and you will save an absolute fortune. I dropped our AWS bill from $350k/mo to around $250k converting our largest apps to ARM from x86.
GKE is IMO the best k8s PaaS that exists, but quite frankly few companies need that much control and granularity in their infrastructure.
My entire infrastructure now is AWS ECS and it autoscales and I literally never, ever, ever have had to troubleshoot it outside of my own configuration mishaps. I NEVER get on call alerts. I'm the Staff SRE at my corp.
> Ceph is designed to host truly massive amounts of data, and generally becomes safer and more performant the more nodes and disks you have to spread your data across.
I'm very pessimistic about Ceph usage in the scenario you have. Maybe I've missed it, but I've seen nothing about upgrading the networking; by default you're going to have 1 Gbit on a single interface used for both the public network and the internal vSwitch.
Even by your own benchmarks, the write test shows 19 IOPS (though the block size is huge):
Max bandwidth (MB/sec): 92
Min bandwidth (MB/sec): 40
Average IOPS: 19
Stddev IOPS: 2.62722
Max IOPS: 23
Min IOPS: 10
while a single HDD would give ~120 IOPS, and a single three-year-old datacenter NVMe drive gives ~33,000 IOPS with a 4k block size + fdatasync=1.
Ceph over 1 Gbit networking would be a very limiting factor, I believe; I'd put a clear disclaimer on that for fellow sysadmins.
P.S. The amount of work you've done is huge and appreciated.
Here's what I don't really get. So, let's say you have three hosts and create your cluster.
But now you still need a reverse proxy or load balancer in front, right? I mean not inside the cluster, but something to route requests to the nodes of the cluster that are not currently down.
So you could set up something like HAProxy on another host. But now you once again have a single point of failure. So do you replicate that part also and use DNS to make sure one of the reverse proxies is used?
Maybe I'm just misunderstanding how it works, but multiple nodes in a cluster still need some sort of central entry point, right? So what is the correct way to do this?
My solution for this setup is having ingress controllers on all three nodes, and then specifying all three IPs in all DNS records. That way the end user will "load balance" based on the DNS randomization.
Of course, if a node goes down, a third of the traffic will be lost, but with low TTLs and some planning, you can minimize the impact of this.
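Concretely, the ingress side of that is just the controller running as a DaemonSet binding ports 80/443 on every node, so each node's public IP can be listed as an A record. A trimmed, illustrative sketch (the real controller needs more args and RBAC than shown):

```yaml
# Illustrative: run the ingress controller on every node and bind 80/443
# directly on the host, so every node's public IP can serve traffic and be
# listed in DNS. Image tag is a placeholder; controller args, ServiceAccount
# and RBAC are omitted for brevity.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx
    spec:
      containers:
        - name: controller
          image: registry.k8s.io/ingress-nginx/controller:v1.9.4
          ports:
            - name: http
              containerPort: 80
              hostPort: 80
            - name: https
              containerPort: 443
              hostPort: 443
```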
It's an interesting approach.
I did it a bit differently. I set up three Proxmox nodes on three hetzner servers. Then I deployed virtual routers. I then set up HAProxy and k3s nodes as LXC containers.
What's nice about the whole setup is that a Proxmox node can go down and it all still works. I will now set up keepalived as mentioned in the other reply, so the HAProxies will also be fully HA. Proxmox also works well with ZFS and backups.
I set up the Proxmox nodes manually and did the rest with Terraform + Ansible. One `terraform destroy` cleans up everything nicely.
I wonder what the performance difference is between bare metal and a k8s node in LXC.
You almost answered your own question. One common solution is to have 2 nodes with HAProxy (or similar) sharing a virtual IP via keepalived, which load-balances the traffic to the control-plane nodes and to the nodes where your ingress controller runs.
There are other options, like running HAProxy on the control-plane nodes themselves.
I've come to the conclusion (after trying kops, kubespray, kubeadm, KubeOne, GKE, EKS) that if you're looking for a < 100 node cluster, Docker Swarm should suffice. Easier to set up, maintain and upgrade.
Docker swarm is to Kubernetes what SQLite is to PostgreSQL. To some extent.
The Docker Swarm ecosystem is very poor as far as tooling goes. You're better off using docker-compose (or maybe Docker Swarm?) and then migrating to k3s if you need a cluster.
My docker swarm config files are nearly the same craziness as my k3s config files so I figured I might as well benefit from the tooling in Kubernetes.
Edit for more random thoughts: being able to use helm to deploy services helped me switch to k3s from swarm.
This is almost exactly my experience with Docker Compose, which is lionized by commenters in nearly every Kubernetes thread I read on HN. It's great and super simple and easy ... until you want to wire multiple applications together, you want to preserve state across workload lifecycles for stateful applications, and/or you need to stand up multiple configurations of the same application. The more you want to run applications that are part of a distributed system, the uglier your compose files get. Indeed, the original elegant Docker Compose syntax just couldn't do a bunch of things and had to be extended.
IMO a sufficiently advanced Docker Compose stack is not appreciably simpler than the Kubernetes manifests would be, and you don't get the benefits of Kubernetes' objects and their controllers because Docker Compose is basically just stringing low-level concepts together with light automation.
Helm and Kustomize are low-budget custom resource definitions. They serve their purpose well and they have few limitations considering how much they can achieve before you write your own controllers.
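To make that concrete, the "multiple configurations of the same application" case mentioned above is only a few lines in a Kustomize overlay; a minimal sketch with made-up paths and names:

```yaml
# Minimal illustrative overlay: reuse a shared base and vary only what differs
# per environment. Paths, names and tags are placeholders.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: myapp-staging
resources:
  - ../../base
namePrefix: staging-
replicas:
  - name: myapp
    count: 1
images:
  - name: myapp
    newTag: v1.2.3-rc1
```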
In my opinion, the complexity is symptomatic of success: once you make a piece of some kind of seemingly narrowly focused software that people actually use, you wind up also creating a platform, if not a platform-of-platforms, in order to satisfy growth. Kubernetes can scale for that business case in ways Docker Swarm, ELB, etc. do not.
Is system configuration avoidable? In order to use AWS, you have to know how a VPC works. That is the worst kind of configuration. I suppose you can ignore that stuff for a very long time, but you'll be paying ridiculous amounts of money for the privilege: almost as much in bandwidth costs (transiting NAT gateways, all your load balancers, whatever mistakes you made) as you pay for compute. Once you learn that bullshit, you know, Kubernetes isn't so tedious after all.
Any sufficiently complicated Docker Swarm, Heroku, Elastic Beanstalk, Nomad or other program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of vanilla Kubernetes.
A pithy response to be sure, but is it true? Every Kubernetes object type exists within a well-specified hierarchy, has a formal spec, an API version, and documentation. Most of the object families' evolution is managed by a formal SIG. I'm not sure how any of that qualifies as ad hoc or informal.
I'm not sure what to say here. The kubernetes docs and code speak for themselves. If you actually think that it's clean, simple, well designed, and easy to operate, with smooth interop between the parts, I can't change your mind. But in practice, I have found it very unpleasant. It seems this is common, and the usual suggestion is to pay someone else to operate it.
I agree in part - the features and simplicity of Docker Swarm are very appealing over k8s, but it also feels so neglected that I'd be waiting every day for the EOL announcement.
It's built on a separate project called SwarmKit, so if it ever comes to the point where it's abandoned, forks would be out in the wild soon enough.
I see more risk of Docker Engine as a whole pulling a Terraform/Elasticsearch-style licensing move someday as investors get desperate to cash out.
Docker is largely irrelevant in modern container orchestration platforms. Kubernetes dropped Docker support (the dockershim) as of 1.24 in favor of CRI runtimes such as containerd and CRI-O.
Docker is just one of many implementations of the Open Container Initiative (OCI) specifications. It's not even fully open source at this point.
Under the hood, Docker leverages containerd, which in turn leverages runc, which leverages libcontainer for spawning processes.
Linux containers at this point will exist perfectly fine even if Docker as a corporate entity disappears. The biggest impact would be Docker Hub being shut down.
They also already sort of pulled a HashiCorp with their Docker Desktop product for macOS.
That’s a little different than if Docker disappeared completely, but one could easily switch to Podman (which has a superset of the docker syntax).
> I've come to the conclusion (after trying kops, kubespray, kubeadm, KubeOne, GKE, EKS) that if you're looking for a < 100 node cluster, Docker Swarm should suffice. Easier to set up, maintain and upgrade.
Personally, I'd also consider throwing Portainer in there, which gives you both a nice way to interact with the cluster, as well as things like webhooks: https://www.portainer.io/
With something like Apache, Nginx, Caddy or something else acting as your "ingress" (taking care of TLS, reverse proxy, headers, rate limits, sometimes mTLS etc.) it's a surprisingly simple setup, at least for simple architectures.
If/when you need to look past that, K3s is probably worth a look, as some other comments pointed out. Maybe some of Rancher's other offerings as well, depending on how you like to interact with clusters (the K9s tool is nice too).
When I was deploying Swarm clusters I would have a default stack.yml file with Portainer for admin, Traefik for reverse proxying, and Prometheus, Grafana, Alertmanager, unsee and cAdvisor for monitoring and metrics gathering. All were running on their own Docker network, completely separated from the app, and were only accessible by ops (and dev if requested, but not end users). It was quite easy to deploy with Heat + Ansible or Terraform + Ansible; the hard part was the CI/CD for every app, each in its own tenant, but it worked really, really well.
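Boiled down, the skeleton of that default stack looked something like this; heavily trimmed and illustrative, since the real file had the full monitoring stack, labels and configuration:

```yaml
# Illustrative skeleton of such a default stack: ops services on their own
# overlay network, separate from the application network. Images are real,
# but command-line config, labels and volumes are mostly omitted.
version: "3.8"
networks:
  ops:
    driver: overlay
services:
  traefik:
    image: traefik:v2.10
    networks: [ops]
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
  portainer:
    image: portainer/portainer-ce:latest
    networks: [ops]
  prometheus:
    image: prom/prometheus:latest
    networks: [ops]
  grafana:
    image: grafana/grafana:latest
    networks: [ops]
```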
I've been at a company running Swarm in prod for a few years. There have been several nasty bugs that were fun to debug, and we've accumulated several layers of slapped-on bandaids trying to handle Swarm's deficiencies. I can't say I'd pick it again, nor would I recommend it to anyone else.
Node count driven infrastructure decisions make little sense.
A better approach is to translate business requirements to systems capabilities and evaluate which tool best satisfies those requirements given the other constraints within your organization.
Managed Kubernetes solutions like GKE require pretty minimal operational overhead at this point.
PostgreSQL is more complex to use and operate and requires more setup than SQLite. If you don’t need the capabilities of PostgreSQL then you can avoid paying the setup and maintenance costs by using the simpler SQLite.
I have never had a Postgres install go that easily. There’s still initialization and setup of the server and users. And you’ll have to do something about upgrades as well. Postgres isn’t difficult to set up but SQLite is just a file. It’s much simpler.
And that’s only the installation. Interaction with SQLite as a database is also simpler.
They both have uses but it’s strange to me to assert that they’re equally complex.
I was using Docker Swarm because of the simplicity and easy setup, but the one feature that I really, really needed was the ability to specify which runtime to use per workload: either I used runsc everywhere (and Docker plugins don't work with runsc) or runc as the default, and it was too inefficient to keep groups of nodes with a certain runtime. I really do like Swarm, but it's missing too many features that are important.
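(For comparison, this is the gap Kubernetes fills with RuntimeClass: register the runsc handler once and opt individual workloads into it. A rough sketch, assuming containerd is configured with a handler named runsc:)

```yaml
# Rough sketch of the Kubernetes equivalent: a RuntimeClass pointing at a
# runsc handler (assumes containerd is configured with that handler name),
# which individual pods can then opt into via runtimeClassName.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
---
apiVersion: v1
kind: Pod
metadata:
  name: untrusted-workload
spec:
  runtimeClassName: gvisor
  containers:
    - name: app
      image: nginx:alpine
```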
I haven't had much opportunity to work with Docker Swarm, but the one time I did, we hit certificate expiration and other issues constantly, and it was not always obvious what was going on. It soured my perception of it a bit, but like I said I hadn't had much prior experience with it, so it might have been on me.
Besides my local VirtualBox cluster, I have tried Kubernetes on three clouds with at least a dozen different installers/distributions, and my gut feeling has always been that operational pain would be a factor going forward.
That's where the author also has the following to say:
>My conclusion at this point is that if you can afford it, both in terms of privacy/GDPR and dollarinos then managed is the way to go.
And I agree. Managed Kubernetes is also really hard for those offering it, who have to manage it for you behind the scenes.[0]
This looks really nice, but the main feature of Docker Swarm over Docker Compose is the ability to run on a cluster of servers, not just a single node.
Speaking of k8s, does anyone here know of ready-made solutions for getting Xcode (i.e. xcodebuild) running in pods? As far as I'm aware, there are no good solutions for getting Xcode running on Linux, so at the moment I'm just futzing about with a virtual-kubelet[0] implementation that spawns macOS VMs. This works just fine, but the problem seems like such an obvious one that I expect there to be some existing solution(s) I just missed.
Someone has submitted patches to containerd and authored “rund” (d for darwin) to run HostProcess containers on macOS.
The underlying problem is poor familiarity with Kubernetes on Windows among Kubernetes maintainers and users. Windows is where all the similar problems have been solved, but the journey is long.
I ran rados benchmarks and it seems writes are about 74 MB/s, whereas both random and sequential reads run at about 130 MB/s, which is about wire speed given the 1 Gbit/s NICs.
I haven't had an excuse to test it yet, but since it's only 6 OSDs across 3 nodes and all of them are spinning rust, I'd be surprised if performance was amazing.
I'm definitely curious to find out though, so I'll run some tests and get back to you!
I believe the recommended[1] way to deploy Talos to Hetzner Cloud (not bare metal) is to use the rescue system and HashiCorp Packer to upload the Talos ISO, deploy your VPS using this image, and then configure Talos using the standard bootstrapping procedure.
This post series is specifically aimed at deploying a pure-metal cluster.
Great post. We (Koor) have been going through something similar to create a demo environment for Rook-Ceph. In our case, we want to show different types of data storage (block, object, file) in a production-like system, albeit at the smaller end of scale.
Our system is hosted at Hetzner on Ubuntu. KubeOne does the provisioning, backed by Terraform. We are using Calico for networking, and we have our own Rook operator.
What would have made the Rook-Ceph experience better for you?
Just finished reading part one and wow, what an excellently written and presented post. This is exactly the series I needed to get started with Kubernetes in earnest. It’s like it was written for me personally. Thanks for the submission MathiasPius!
Part I: Talos on Hetzner https://datavirke.dk/posts/bare-metal-kubernetes-part-1-talo...
Part II: Cilium CNI & Firewalls https://datavirke.dk/posts/bare-metal-kubernetes-part-2-cili...
Part III: Encrypted GitOps with FluxCD https://datavirke.dk/posts/bare-metal-kubernetes-part-3-encr...
Part IV: Ingress, DNS and Certificates https://datavirke.dk/posts/bare-metal-kubernetes-part-4-ingr...
Part V: Scaling Out https://datavirke.dk/posts/bare-metal-kubernetes-part-5-scal...
Part VI: Persistent Storage with Rook Ceph https://datavirke.dk/posts/bare-metal-kubernetes-part-6-pers...
Part VII: Private Registry with Harbor https://datavirke.dk/posts/bare-metal-kubernetes-part-7-priv...
Part VIII: Containerizing our Work Environment https://datavirke.dk/posts/bare-metal-kubernetes-part-8-cont...
And of course, when it all falls apart: Bare-metal Kubernetes: First Incident https://datavirke.dk/posts/bare-metal-kubernetes-first-incid...
Source code repository (set up in Part III) for node configuration and deployed services is available at https://github.com/MathiasPius/kronform