Great write-up. What I especially enjoyed was that you kept the bits where you ran into the classic sort of issues, diagnosed them, and fixed them. The flow felt very familiar from whenever I do anything dev-opsy.
I'd be interested to read about how you might configure cluster autoscaling with bare-metal machines. I noticed that the IP addresses of each node are kinda hard-coded into the firewall and network policy rules, so that would have to be automated somehow. The same goes for automatically spawning a load balancer just by declaring a k8s Service. I realise these things are very cloud-provider specific, but I'd be interested to see if any folks are doing this with bare metal. For me, the ease of autoscaling is one of the primary benefits of k8s for my specific workload.
I also just read about Sidero Omni [1] from the makers of Talos, which looks like a SaaS to install Talos/Kubernetes across any kind of hardware sourced from pretty much any provider: cloud VM, bare metal, etc. Perhaps it could make the initial bootstrap phase and future upgrades to these parts a little easier?
[1]: https://www.siderolabs.com/platform/saas-for-kubernetes/
When it comes to load balancing, I think the hcloud-cloud-controller-manager[1] is probably your best bet, and although I haven't tested it, I'm sure it can be coerced into some kind of working configuration with the vSwitch/Cloud Network coupling, even if none of the cluster's nodes are actually Cloud-based.
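Untested, but roughly what I imagine that looking like, assuming the controller is deployed with HCLOUD_TOKEN and HCLOUD_NETWORK set so it knows about the Cloud Network (the name, location and selector below are placeholders):

    apiVersion: v1
    kind: Service
    metadata:
      name: ingress                              # placeholder name
      annotations:
        # Pick the Hetzner location for the load balancer (example value).
        load-balancer.hetzner.cloud/location: fsn1
        # Have the LB target nodes via the Cloud Network / vSwitch
        # instead of their public addresses.
        load-balancer.hetzner.cloud/use-private-ip: "true"
    spec:
      type: LoadBalancer
      selector:
        app: ingress                             # placeholder selector
      ports:
        - name: https
          port: 443
          targetPort: 8443

The idea being that the provisioned load balancer reaches the bare-metal nodes over their vSwitch-attached private IPs, so nothing about the nodes themselves has to be Cloud-hosted.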
I haven't used Sidero Omni yet, but if it's as well architected as Talos is, I'm sure it's an excellent solution. It still leaves open the question of ordering and provisioning the servers themselves. For simpler use-cases it wouldn't be too difficult to hack together a script to interact with the Hetzner Robot API to achieve this goal, but if I wanted any level of robustness, and if you'll excuse the shameless plug, I think I'd write a custom operator in Rust using my hrobot-rs[2] library :)
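To give a feel for the Robot side of it, here's a rough sketch of the kind of glue such a script (or an operator's reconcile loop) would need, hitting the Robot webservice directly with reqwest; hrobot-rs wraps these same endpoints with proper types. The env var names are my own choice, and the struct only captures a few of the returned fields:

    // Cargo.toml (sketch):
    //   tokio   = { version = "1", features = ["macros", "rt-multi-thread"] }
    //   reqwest = { version = "0.12", features = ["json"] }
    //   serde   = { version = "1", features = ["derive"] }
    use reqwest::Client;
    use serde::Deserialize;

    // Subset of the fields returned by GET https://robot-ws.your-server.de/server
    #[derive(Deserialize)]
    struct ServerEntry {
        server: Server,
    }

    #[derive(Deserialize)]
    struct Server {
        server_number: u32,
        server_name: String,
        server_ip: String,
    }

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Robot webservice credentials, not the regular Hetzner account login.
        let user = std::env::var("HROBOT_USERNAME")?;
        let pass = std::env::var("HROBOT_PASSWORD")?;

        // List every dedicated server on the account.
        let servers: Vec<ServerEntry> = Client::new()
            .get("https://robot-ws.your-server.de/server")
            .basic_auth(&user, Some(&pass))
            .send()
            .await?
            .error_for_status()?
            .json()
            .await?;

        for entry in servers {
            println!(
                "#{} {} ({})",
                entry.server.server_number, entry.server.server_name, entry.server.server_ip
            );
        }

        // A real operator would go further: enable rescue mode and trigger a
        // reset to reinstall a machine with a Talos image, e.g.
        //   POST /boot/{server-number}/rescue  (os=linux)
        //   POST /reset/{server-number}        (type=hw)
        // before handing the node its machine config.
        Ok(())
    }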
As far as the hard-coded IP addresses go, I think I would simply move that one rule into a separate CiliumClusterwideNetworkPolicy which is created per-node during onboarding and deleted again afterwards. The hard-coded IP addresses are only needed before the node has joined the cluster, so the rule is made obsolete by the generic "remote-node" one the moment the node joins.[3]
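Something like this, as a Cilium host policy (the node name, source IP and port list are illustrative; the real rule would mirror whatever the static firewall rule allows today):

    apiVersion: cilium.io/v2
    kind: CiliumClusterwideNetworkPolicy
    metadata:
      # One short-lived policy per joining node, deleted once the node
      # shows up under the generic "remote-node" identity.
      name: onboarding-node-04                 # illustrative name
    spec:
      # Host policy: applies to the nodes the newcomer has to reach,
      # here the control plane nodes.
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/control-plane: ""
      ingress:
        - fromCIDR:
            - 203.0.113.44/32                  # the joining node's public IP (example)
          toPorts:
            - ports:
                - port: "6443"                 # kube-apiserver
                  protocol: TCP
                - port: "50000"                # Talos apid
                  protocol: TCP
                - port: "50001"                # Talos trustd
                  protocol: TCP

The onboarding automation would create this just before the machine boots and delete it once the node registers.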