Unless I’ve missed something, this isn’t a big deal in an AWS-style cloud where extra storage volumes (EBS, etc) have essentially no incremental cost, and maybe it’s okay on bare metal if the bare metal is explicitly designed with a completely separate boot disk (this includes Raspberry Pi using SD for boot and some other device for actual storage), but it seemed like a mostly showstopping issue for an average server that was specced with the intent to boot off a partition.
I suppose one could fudge it with NVMe namespaces if the hardware cooperates. (I’ve never personally tried setting up a nontrivial namespace setup.)
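For what it's worth, the namespace-splitting idea can be sketched with nvme-cli. Everything below (device names, sizes, controller IDs) is an assumption, and the drive has to actually support multiple namespaces, which most consumer drives don't:

```shell
# Check how many namespaces the controller supports (the "nn" field);
# consumer drives usually report 1, i.e. no splitting possible.
nvme id-ctrl /dev/nvme0 | grep '^nn'

# Delete the existing namespace, then carve out two new ones.
# Sizes (--nsze/--ncap) are in logical blocks; the values here are made up.
nvme delete-ns /dev/nvme0 --namespace-id=1
nvme create-ns /dev/nvme0 --nsze=62500000   --ncap=62500000   --flbas=0  # small boot ns
nvme create-ns /dev/nvme0 --nsze=1800000000 --ncap=1800000000 --flbas=0  # the rest

# Attach both namespaces to controller 0 and rescan.
nvme attach-ns /dev/nvme0 --namespace-id=1 --controllers=0
nvme attach-ns /dev/nvme0 --namespace-id=2 --controllers=0
nvme ns-rescan /dev/nvme0
```

After the rescan you'd see /dev/nvme0n1 and /dev/nvme0n2 as separate block devices, and Talos could be handed one of them wholesale.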
Has anyone set up Talos in a useful way on a server with a single disk or a single RAID array?
This is my main issue with it right now, I'm wasting a whole physical disk in one of my home lab machines because Talos demands control over a full disk for its uses.
This is one of the more annoying problems with any "immutable" distro (or at least the ones I've tried). They demand a very specific partition layout that basically forces you to devote a whole disk to them, which as you said in a cloud environment (or any other environment where VMs are used in place of physical machines) doesn't matter, but ends up mattering a lot when using physical machines.
I’m referring to a server without a dedicated boot disk. This is not especially rare.
On an edge-style machine, you generally have a very limited number of M.2 slots, and dedicating one to a boot device can eat up a large share (potentially all) of your storage capacity, not to mention that an extra drive is a non-negligible fraction of total cost.
On modern servers, there may not be a usable SATA controller, and NVMe usually costs 4 PCIe lanes. Those aren’t free. You also end up paying the absurd OEM premium for an extra disk if you go with a big-name OEM. Sadly, there is no industry-standard cheap, reliable boot device, at least as far as I’ve ever seen. Maybe someone should push USB3 for this use case — the price is certainly right, and performance is likely just fine.
Server machines often come with an SD-card slot on the board (and the ability to boot from it). This was commonly used for ESXi-like environments, which also required a dedicated device for themselves.
There are also SATA-DOMs or USB-DOMs. You could use one of these modules with your machine.
Is there a reason you can't just boot from USB? It sounds like a perfect match. I've built immutable boot sticks in the past that run in RAM only (but with other distros). Hetzner, for instance, lets you rent USB sticks for their dedicated servers.
FWIW, it’s been a little while since I’ve messed with the low-level details of the BIOS/UEFI to kernel+initrd to USB root disk handoff, but I’m always slightly concerned that the system will mess up and hand off to the wrong device. For a system that uses USB for anything else, this opens up an attack/screwup vector in which the wrong disk gets used, leading to all manner of problems.
Also, an external dangly thing is asking for trouble (getting dislodged). An internal device solves this.
In any case, the Talos people seem to recognize this as a problem and are working on it.
I've leaned into it: all application state is on ephemeral storage and it's constantly replicated "off-site" to a NAS running MinIO.
To accomplish this, I have restricted myself to SQLite as storage and use Litestream for replication. On start, Litestream reconstructs the last known state before the application starts. [Source](https://github.com/LukasKnuth/homeserver/blob/912cbc0111e44d...)
It works very well for my workloads (user-interaction-driven web apps) but there are theoretical situations in which data loss can occur.
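The restore-then-replicate pattern is roughly this; the paths, bucket, and endpoint below are placeholders, not the values from my repo:

```yaml
# litestream.yml – replicate a local SQLite DB to a MinIO bucket (names assumed)
dbs:
  - path: /data/app.db
    replicas:
      - type: s3
        endpoint: http://nas.local:9000   # MinIO on the NAS
        bucket: litestream-backups
        path: app.db
```

On start, something like `litestream restore -if-replica-exists -if-db-not-exists /data/app.db` pulls down the last known state, and `litestream replicate -exec "/usr/local/bin/app"` runs the application as a child process so replication starts and stops with it.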
You need to store data somewhere. If you’re willing to have that location be off-site and to pay for remote storage and possibly egress, and you’re okay with your local installation not coming up if the off-site storage is inaccessible and with the latency involved in bringing your on-site data back, fine.
Of course, if you really lean in to a complete lack of local persistent state and you configure your network or some other critical service like this, good luck recovering from an upgrade and complete loss of ephemeral state.
I'm not sure I understand your second point. Yes, if the external replication target is unavailable, I can't bring my local service back online. Same goes for if the replication target becomes unavailable and I don't realize it, there is potential for a lot of data to be lost if the application restarts.
For my personal use case this is fine. I also have monitoring set up to look for just this case. It's a tradeoff between resilience and simplicity that works for some use cases - mine included.
My second point is that, in a setup of any complexity, a black start is nontrivial. You have a network, with routes, DNS config, maybe VLANs, maybe a whole SDN. If that’s down because the machines running it are trying to pull their own configuration over the network, it won’t come back up. You can get pretty far into the weeds with situations like this.
Facebook supposedly got locked out of their own datacenter due to a network outage preventing the access control system from accessing whatever service it needed to allow anyone to open the door.
I see what you mean. For my case, the network is much simpler. I'm also fine if an unavailable replication target means I can't start an application.
The upside of my solution is that there is no scheduling requirement on which node the PVC was initially created. There is also a certain guarantee that I have a working, recent backup of the application data. Starting from scratch every time is also basically a backup recovery operation. It gives me confidence that there is a recent backup which is restorable.
This is worrying because I am just in the process of migrating old CentOS clusters to Talos, and we need one additional disk besides the system disk. It's used for the host-based Ceph cluster.
But if I read this correctly, there shouldn't be an issue adding one blank disk to the Talos VM; the issue is only with more granular disk and partition management.
I'd suggest running it on top of a virtualization environment, like Proxmox. That solves a lot of problems, and not just ones related to Talos. Basically, you split k8s from resource management, networking, disks, etc., and you get backup, migration, and so on as well.
With NetBoot.xyz, it’s just a DHCP server setting plus a TFTP server, and it's very easy to set up. Much easier than dealing with a bunch of USB boot drives and keeping them up to date.
Machines with UEFI have supported HTTP boot since around 2017. Then you can forget the special, vulnerable TFTP server and just use a plain web server, together with DNS (and put it in a different subnet if necessary).
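For reference, the TFTP variant really is only a few lines of dnsmasq config; the paths and filenames below are assumptions:

```
# dnsmasq snippet: serve netboot.xyz over TFTP (paths/filenames assumed)
enable-tftp
tftp-root=/srv/tftp

# Tag UEFI x86-64 clients and hand them the EFI build;
# everyone else falls through to the BIOS/PXE build.
dhcp-match=set:efi64,option:client-arch,7
dhcp-boot=tag:efi64,netboot.xyz.efi
dhcp-boot=netboot.xyz.kpxe
```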
Thanks for the interest in Talos Linux! I work at Sidero (creators of Talos) and there are lots of “secure, immutable, and minimal” Linux distros out there.
Something that Talos does differently is everything is an API. Machine configuration, upgrades, debugging…it’s all APIs. This helps with maintaining systems way beyond the usual cloud-init and systemd wrappers in other “minimal” distros.
The second big difference is that Talos Linux is designed only for Kubernetes. It’s not a generic Linux kernel plus container runtime. The init system was designed to run the kubelet and publish an API that feels like a Kubernetes-native component.
This drastically reduces the Linux knowledge required to run, scale, and maintain a complex system like Kubernetes.
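To give a concrete taste of the API-first workflow, day-to-day operations are all talosctl calls; the node IP, file names, and image tag below are placeholders:

```shell
# Apply a machine configuration to a node (placeholder IP and file name)
talosctl apply-config --nodes 10.0.0.5 --file controlplane.yaml

# Upgrade the OS in place; the node reboots into the new image
talosctl upgrade --nodes 10.0.0.5 --image ghcr.io/siderolabs/installer:v1.7.0

# Debugging without SSH: kernel log, services, and cluster membership over the API
talosctl dmesg --nodes 10.0.0.5
talosctl services --nodes 10.0.0.5
talosctl get members
```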
I’ve been doing a set of live streams called Talos Linux Install Fest, walking new users through setting up their first cluster on Talos. Each install is in a new environment, so please check it out.
We use Talos really extensively in production. It’s been an amazing solution for our Kubernetes clusters. Highly recommended for a really smart, really directed Linux distro.
We've been using it for a while, and I'm absolutely happy with the project.
Before that, we had a Kubespray-based setup. It's a bunch of Ansible scripts, and it allows any custom setup, absolutely anything, since you're in control of the machines. But the flip side is that it's extremely easy to break everything. Which we did, a couple of times. So any upgrade risks losing the whole cluster, and we decided it must run in VMs with a full backup before each upgrade. Another problem is that it takes about an hour to apply a change, because Ansible has to apply all the scripts each time.
Then we migrated to Talos, and it's night and day. The initial setup took about an hour, including reading the docs and a tutorial. Easy to set up, easy to maintain, easy to upgrade (and it takes minutes). Note that we run the nodes as VMs in Proxmox, so the disk and network setup are outside Talos' scope, as are backups, and that actually simplifies everything. So it "just works" and we can focus on our app, not the cluster setup.
The documentation seems to be lacking. I am specifically interested in gVisor and Kata support, but cannot find information on installing additional runtimes.
Depends on what you want to do with it. Alpine Linux is pretty popular for size-optimized containers. I played around with Tiny Core Linux (TCL, which used to just be Tiny Linux) back in the 5.0 days; it's size-optimized and supports desktop use in some forms.
It's signed with their own keys, so you can use it with Secure Boot, but you need to enroll their keys on the first boot. They didn't sign it with MS keys.
Oh, I see what you mean! I think I'll stick with either CentOS or Red Hat then, due to the ability to receive updates on vulnerabilities. FedRAMP does require CVEs to be handled in a timely fashion.
Compliance and security are extremely important. Because of Talos’ single-purpose nature and extremely small size, it hasn’t needed patches for the recent “big” CVEs (xz-utils, SSH, etc.) because we don’t even have that software present.
FWIW Talos has DoD users from multiple countries. The areas that need a lot of security have repeatedly chosen Talos when they compare it to traditional Linux distros.
If you can't login to it then it is not good for development. If it is not good for development it is not good for production because ideally your dev and production environment should be the same.
I’ve been writing and managing development platforms on Kubernetes for 6-7 years.
In this time, I remember having to SSH into a host node exactly once. This was me, the platform engineer - not an application developer. Even then, "having to" is a strong word. I could just as well have done it with a privileged container with host access.
Application developers have nothing to do on the host. As in, they gain nothing from it, and could potentially make everything worse for themselves and the other applications and teams on the platform.
You should not be logging in to production unless something has gone seriously wrong. I've not seen a company where developers (minus a handful of "blessed" staff) even know how to access prod, let alone log in.
During development you will need to login to the pod, to review settings, directory contents and so on. If the OS running in the pod does not allow you to do that - during development - then that's severely limiting.
Talos is the OS running on the host. You can run whatever you need in the pod.
Also, you shouldn't really use the pod's OS to debug it. Kubernetes supports debug containers: you launch a separate container (presumably with a convenient debug environment) and mount the target container's rootfs inside, so you can inspect it as needed. It also helps when the target container doesn't work and you can't just exec into it.
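For example (the pod and container names are placeholders):

```shell
# Attach an ephemeral debug container to a running pod, sharing its
# process namespace so you can see the target's processes and files.
kubectl debug -it mypod --image=busybox:1.36 --target=app

# From inside the debug container, the target container's root filesystem
# is reachable through the shared PID namespace, e.g.:
#   ls /proc/1/root/
```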
There's a recommendation to remove everything from the container that's not necessary for running a given program; that reduces the attack surface.
You can still exec yourself into the pods. No one said you cannot.
There is no shell or SSH on the hosts for you to log into, but if you absolutely must, you can create a privileged container and mount /. The whole point is that you shouldn't.
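The escape hatch looks something like this sketch (names are placeholders, and this is exactly the kind of pod an admission policy should normally block):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: host-shell          # hypothetical name
spec:
  nodeName: worker-1        # pin to the node you want to inspect
  hostNetwork: true
  hostPID: true
  containers:
    - name: shell
      image: busybox:1.36
      command: ["sleep", "infinity"]
      securityContext:
        privileged: true
      volumeMounts:
        - name: host-root
          mountPath: /host
  volumes:
    - name: host-root
      hostPath:
        path: /
```

Then `kubectl exec -it host-shell -- sh` and poke around under /host. On Talos there's no shell on the host to chroot into, but the host filesystem is all there.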
> If it is not good for development it is not good for production because ideally your dev and production environment should be the same.
Correct: your dev environment should also not let you do stuff on the host machines. In a k8s environment, you run everything in pods. Don't compromise on security and operational concerns just because it's a dev environment.
> If you can't login to it then it is not good for development.
You develop inside pods, and you are more than welcome to install any shell and other programs you want inside containers. (Or for working at the k8s level it doesn't matter; you `kubectl apply` or run helm against the k8s API, it doesn't matter what's happening on the host.)
Some customers of an internal Kubernetes platform complained that their pods kept getting evicted because their nodes kept running out of disk space. The platform maintainers' first instinct was that the customers' pods were writing to ephemeral storage, e.g. writing log files to the filesystem instead of to externally mounted storage. But that turned out not to be the case: the customers' pods didn't write anything to disk at all.

So why were the nodes running out of disk space? Prometheus metrics show which partitions use how much disk space but can't go into more detail. The team wanted to inspect the node's filesystem to figure out what exactly was using so much space. The first thing they tried was running a management pod that contains standard tools such as 'df' and that mounts the host's filesystem. Unfortunately, the act of scheduling a pod on that node caused it to experience disk pressure, and so the management pod got evicted.
So, being dogmatic about "the host should not have any tools installed" is good and all, but how do you debug this scenario without tools on the host?
We eventually figured it out. By logging into the host OS and using the shell tools there.
> So, being dogmatic about "the host should not have any tools installed" is good and all
Less dogma, more the lived experience that letting people log into hosts ends badly. Though I grant there's a cost/benefit both ways and perhaps there could be edge cases.
> but how do you debug this scenario without tools on the host?
Cordon the node, evict any one pod to free up just enough room, and then schedule your debug pod with a toleration so it ignores the error condition? I confess I've never had to do this but it seems workable.
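Concretely, the debug pod just needs to tolerate the disk-pressure taint the kubelet applies; the taint key below is the standard one, the rest of the pod spec is elided:

```yaml
# Fragment of a debug pod spec: schedule despite the node's disk-pressure taint
spec:
  nodeName: worker-1            # placeholder node name
  tolerations:
    - key: node.kubernetes.io/disk-pressure
      operator: Exists
      effect: NoSchedule
```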
Actually, no, the host OS was Amazon Bottlerocket, a specifically container-focused OS.
The cause was indeed images being too big. Images — not only the raw images, but also their extracted contents on the filesystem — count towards ephemeral storage too. In their case they can't even control the size of the images because those are supplied by a vendor.
The solution was to increase the node's disk space.
Interesting, I use Bottlerocket on my work clusters too. I think we had issues like this using some ridiculous data tool images that take up gigabytes, so we just upped the EBS size. Easily done.
It’s designed for running Kubernetes. You would log into the containers running on it if you need to, there’s no need to log into the underlying host. Managed Kubernetes clusters already work like this.
https://github.com/siderolabs/talos/issues/8367