Unless I’ve missed something, this isn’t a big deal in an AWS-style cloud where extra storage volumes (EBS, etc) have essentially no incremental cost, and maybe it’s okay on bare metal if the bare metal is explicitly designed with a completely separate boot disk (this includes Raspberry Pi using SD for boot and some other device for actual storage), but it seemed like a mostly showstopping issue for an average server that was specced with the intent to boot off a partition.
I suppose one could fudge it with NVMe namespaces if the hardware cooperates. (I’ve never personally tried setting up a nontrivial namespace setup.)
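For what it's worth, the namespace-splitting idea can be sketched with nvme-cli. Everything below (device names, sizes, controller IDs) is an assumption, and the drive has to actually support multiple namespaces, which most consumer drives don't:

```shell
# Check how many namespaces the controller supports (the "nn" field);
# consumer drives usually report 1, i.e. no splitting possible.
nvme id-ctrl /dev/nvme0 | grep '^nn'

# Delete the existing namespace, then carve out two new ones.
# Sizes (--nsze/--ncap) are in logical blocks; the values here are made up.
nvme delete-ns /dev/nvme0 --namespace-id=1
nvme create-ns /dev/nvme0 --nsze=62500000   --ncap=62500000   --flbas=0  # small boot ns
nvme create-ns /dev/nvme0 --nsze=1800000000 --ncap=1800000000 --flbas=0  # the rest

# Attach both namespaces to controller 0 and rescan.
nvme attach-ns /dev/nvme0 --namespace-id=1 --controllers=0
nvme attach-ns /dev/nvme0 --namespace-id=2 --controllers=0
nvme ns-rescan /dev/nvme0
```

After the rescan you'd see /dev/nvme0n1 and /dev/nvme0n2 as separate block devices, and Talos could be handed one of them wholesale.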
Has anyone set up Talos in a useful way on a server with a single disk or a single RAID array?
This is my main issue with it right now, I'm wasting a whole physical disk in one of my home lab machines because Talos demands control over a full disk for its uses.
This is one of the more annoying problems with any "immutable" distro (or at least the ones I've tried). They demand a very specific partition layout that basically forces you to devote a whole disk to them, which as you said in a cloud environment (or any other environment where VMs are used in place of physical machines) doesn't matter, but ends up mattering a lot when using physical machines.
I’m referring to a server without a dedicated boot disk. This is not especially rare.
On an edge-style machine, you generally have a very limited number of M.2 slots, and dedicating one to a boot device can eat up a large share (potentially all) of your storage capacity, not to mention that an extra drive is a non-negligible fraction of total cost.
On modern servers, there may not be a usable SATA controller, and NVMe usually costs 4 PCIe lanes. Those aren’t free. You also end up paying the absurd OEM premium for an extra disk if you go with a big-name OEM. Sadly, there is no industry-standard cheap, reliable boot device, at least as far as I’ve ever seen. Maybe someone should push USB3 for this use case — the price is certainly right, and performance is likely just fine.
Server machines often come with an SD-card slot on the board (and the ability to boot from it). This was commonly used for ESXi-like environments, which also required a dedicated device for themselves.
There are also SATA-DOMs or USB-DOMs. You could use one of these modules with your machine.
Is there a reason you can't just boot from USB? It sounds like a perfect match. I've built immutable boot sticks in the past that run in RAM only (but with other distros). Hetzner, for instance, lets you rent USB sticks for their dedicated servers.
FWIW, it’s been a little while since I’ve messed with the low-level details of the BIOS/UEFI to kernel+initrd to USB root disk handoff, but I’m always slightly concerned that the system will mess up and hand off to the wrong device. For a system that uses USB for anything else, this opens up an attack/screwup vector in which the wrong disk gets used, leading to all manner of problems.
Also, an external dangly thing is asking for trouble (getting dislodged). An internal device solves this.
In any case, the Talos people seem to recognize this as a problem and are working on it.
I've leaned into it: all application state is on ephemeral storage and it's constantly replicated "off-site" to a NAS running MinIO.
To accomplish this, I have restricted myself to SQLite as storage and use Litestream for replication. On start, Litestream reconstructs the last known state before the application starts. [Source](https://github.com/LukasKnuth/homeserver/blob/912cbc0111e44d...)
It works very well for my workloads (user-interaction-driven web apps) but there are theoretical situations in which data loss can occur.
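The restore-then-replicate pattern is roughly this; the paths, bucket, and endpoint below are placeholders, not the values from my repo:

```yaml
# litestream.yml – replicate a local SQLite DB to a MinIO bucket (names assumed)
dbs:
  - path: /data/app.db
    replicas:
      - type: s3
        endpoint: http://nas.local:9000   # MinIO on the NAS
        bucket: litestream-backups
        path: app.db
```

On start, something like `litestream restore -if-replica-exists -if-db-not-exists /data/app.db` pulls down the last known state, and `litestream replicate -exec "/usr/local/bin/app"` runs the application as a child process so replication starts and stops with it.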
You need to store data somewhere. If you’re willing to have that location be off-site and to pay for remote storage and possibly egress, and you’re okay with your local installation not coming up if the off-site storage is inaccessible and with the latency involved in bringing your on-site data back, fine.
Of course, if you really lean in to a complete lack of local persistent state and you configure your network or some other critical service like this, good luck recovering from an upgrade and complete loss of ephemeral state.
I'm not sure I understand your second point. Yes, if the external replication target is unavailable, I can't bring my local service back online. Same goes for if the replication target becomes unavailable and I don't realize it, there is potential for a lot of data to be lost if the application restarts.
For my personal use case this is fine. I also have monitoring set up to look for just this case. It's a tradeoff between resilience and simplicity that works for some use cases - mine included.
My second point is that, in a setup of any complexity, a black start is nontrivial. You have a network, with routes, DNS config, maybe VLANs, maybe a whole SDN. If that’s down because the machines running it are trying to pull their own configuration over the network, it won’t come back up. You can get pretty far into the weeds with situations like this.
Facebook supposedly got locked out of their own datacenter due to a network outage preventing the access control system from accessing whatever service it needed to allow anyone to open the door.
I see what you mean. For my case, the network is much simpler. I'm also fine if an unavailable replication target means I can't start an application.
The upside of my solution is that there is no scheduling requirement on which node the PVC was initially created. There is also a certain guarantee that I have a working, recent backup of the application data. Starting from scratch every time is also basically a backup recovery operation. It gives me confidence that there is a recent backup which is restorable.
This is worrying because I am just in the process of migrating old CentOS clusters to Talos, and we need one additional disk besides the system disk. It's used for the host-based Ceph cluster.
But if I read this correctly, there shouldn't be an issue adding one blank disk to the Talos VM; the issue is only with more granular disk and partition management.
I'd suggest running it on top of a virtualization environment, like Proxmox. That solves a lot of problems, and not just ones related to Talos. Basically, you split k8s from resource management, networking, disks, etc., and you get backup, migration, and so on as well.
With NetBoot.xyz, it’s just a DHCP server setting plus a TFTP server, and it's very easy to set up. Much easier than dealing with a bunch of USB boot drives and keeping them up to date.
Machines with UEFI have supported HTTP boot since around 2017. Then you can forget the special, vulnerable TFTP server and just use a plain web server, together with DNS (and put it in a different subnet if necessary).
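For reference, the TFTP variant really is only a few lines of dnsmasq config; the paths and filenames below are assumptions:

```
# dnsmasq snippet: serve netboot.xyz over TFTP (paths/filenames assumed)
enable-tftp
tftp-root=/srv/tftp

# Tag UEFI x86-64 clients and hand them the EFI build;
# everyone else falls through to the BIOS/PXE build.
dhcp-match=set:efi64,option:client-arch,7
dhcp-boot=tag:efi64,netboot.xyz.efi
dhcp-boot=netboot.xyz.kpxe
```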
Thanks for the interest in Talos Linux! I work at Sidero (creators of Talos) and there are lots of “secure, immutable, and minimal” Linux distros out there.
Something that Talos does differently is everything is an API. Machine configuration, upgrades, debugging…it’s all APIs. This helps with maintaining systems way beyond the usual cloud-init and systemd wrappers in other “minimal” distros.
The second big difference is that Talos Linux is designed only for Kubernetes. It’s not a generic Linux kernel plus container runtime. The init system was designed to run the kubelet and publish an API that feels like a Kubernetes-native component.
This drastically reduces the Linux knowledge required to run, scale, and maintain a complex system like Kubernetes.
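To give a concrete taste of the API-first workflow, day-to-day operations are all talosctl calls; the node IP, file names, and image tag below are placeholders:

```shell
# Apply a machine configuration to a node (placeholder IP and file name)
talosctl apply-config --nodes 10.0.0.5 --file controlplane.yaml

# Upgrade the OS in place; the node reboots into the new image
talosctl upgrade --nodes 10.0.0.5 --image ghcr.io/siderolabs/installer:v1.7.0

# Debugging without SSH: kernel log, services, and cluster membership over the API
talosctl dmesg --nodes 10.0.0.5
talosctl services --nodes 10.0.0.5
talosctl get members
```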
I’ve been doing a set of live streams called Talos Linux Install Fest, walking new users through setting up their first cluster on Talos. Each install is in a new environment, so please check it out.
We use Talos really extensively in production. It’s been an amazing solution for our Kubernetes clusters. Highly recommended for a really smart, really directed Linux distro.
We've been using it for a while, and I'm absolutely happy with the project.
Before that, we had a Kubespray-based setup. It's a bunch of Ansible scripts, and it allows any custom setup, absolutely anything, since you're in control of the machines. But the flip side is that it's extremely easy to break everything. Which we did, a couple of times. So any upgrade risks losing the whole cluster, and we decided it must run in VMs with a full backup before each upgrade. Another problem is that it takes about an hour to apply a change, because Ansible has to apply all the scripts each time.
Then we migrated to Talos, and it's night and day. The initial setup took about an hour, including reading the docs and a tutorial. Easy to set up, easy to maintain, easy to upgrade (and it takes minutes). Note that we run the nodes as VMs in Proxmox, so the disk and network setup are outside Talos' scope, as are backups, and that actually simplifies everything. So it "just works" and we can focus on our app, not the cluster setup.
The documentation seems to be lacking. I am specifically interested in gVisor and Kata support, but cannot find information on installing additional runtimes.
Depends on what you want to do with it. Alpine Linux is pretty popular for size-optimized containers. I played around with Tiny Core Linux (TCL, which used to just be Tiny Linux) back in the 5.0 days; it's size-optimized and supports desktop use in some forms.
It's signed with their own keys, so you can use it with Secure Boot, but you need to enroll their keys on the first boot. They didn't sign it with MS keys.
Oh, I see what you mean! I think I'll stick with either CentOS or Red Hat then, due to the ability to receive updates on vulnerabilities. FedRAMP does require CVEs to be handled in a timely fashion.
Compliance and security are extremely important. Because of Talos’ single-purpose nature and extremely small size, it hasn’t needed patches for the recent “big” CVEs (xz-utils, SSH, etc.) because we don’t even have that software present.
FWIW Talos has DoD users from multiple countries. The areas that need a lot of security have repeatedly chosen Talos when they compare it to traditional Linux distros.
If you can't login to it then it is not good for development. If it is not good for development it is not good for production because ideally your dev and production environment should be the same.
I’ve been writing and managing development platforms on Kubernetes for 6-7 years.
In this time, I remember having to SSH into a host node exactly once. This was me, the platform engineer - not an application developer. Even then, "having to" is a strong word. I could just as well have done it with a privileged container with host access.
Application developers have nothing to do on the host. As in, they gain nothing from it, and could potentially make everything worse for themselves and the other applications and teams on the platform.
You should not be logging in to production unless something has gone seriously wrong. I've not seen a company where developers (minus a handful of "blessed" staff) even know how to access prod, let alone log in.
During development you will need to login to the pod, to review settings, directory contents and so on. If the OS running in the pod does not allow you to do that - during development - then that's severely limiting.
Talos is the OS running on the host. You can run whatever you need in the pod.
Also, you shouldn't really use the pod's OS to debug it. Kubernetes supports debug containers: you launch a separate container (presumably with a convenient debug environment) and mount the target container's rootfs inside, so you can inspect it as needed. It also helps when the target container doesn't work and you can't just exec into it.
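For example (the pod and container names are placeholders):

```shell
# Attach an ephemeral debug container to a running pod, sharing its
# process namespace so you can see the target's processes and files.
kubectl debug -it mypod --image=busybox:1.36 --target=app

# From inside the debug container, the target container's root filesystem
# is reachable through the shared PID namespace, e.g.:
#   ls /proc/1/root/
```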
There's a recommendation to remove everything from the container that's not necessary for running a given program; that reduces the attack surface.
You can still exec yourself into the pods. No one said you cannot.
There is no shell or SSH on the hosts for you to log into, but if you absolutely must, you can create a privileged container and mount /. The whole point is that you shouldn't.
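The escape hatch looks something like this sketch (names are placeholders, and this is exactly the kind of pod an admission policy should normally block):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: host-shell          # hypothetical name
spec:
  nodeName: worker-1        # pin to the node you want to inspect
  hostNetwork: true
  hostPID: true
  containers:
    - name: shell
      image: busybox:1.36
      command: ["sleep", "infinity"]
      securityContext:
        privileged: true
      volumeMounts:
        - name: host-root
          mountPath: /host
  volumes:
    - name: host-root
      hostPath:
        path: /
```

Then `kubectl exec -it host-shell -- sh` and poke around under /host. On Talos there's no shell on the host to chroot into, but the host filesystem is all there.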
> If it is not good for development it is not good for production because ideally your dev and production environment should be the same.
Correct: your dev environment should also not let you do stuff on the host machines. In a k8s environment, you run everything in pods. Don't compromise on security and operational concerns just because it's a dev environment.
> If you can't login to it then it is not good for development.
You develop inside pods, and you are more than welcome to install any shell and other programs you want inside containers. (Or for working at the k8s level it doesn't matter; you `kubectl apply` or run helm against the k8s API, it doesn't matter what's happening on the host.)
Some customers of an internal Kubernetes platform complained that their pods kept getting evicted because their nodes kept running out of disk space. The platform maintainers' first instinct was that the customers' pods were writing to ephemeral storage, e.g. writing log files to the filesystem instead of to externally mounted storage. But that turned out not to be the case: the customers' pods didn't write anything to disk at all.

So why were the nodes running out of disk space? Prometheus metrics show which partitions use how much disk space but can't go into more detail. The team wanted to inspect the node's filesystem to figure out what exactly was using so much space. The first thing they tried was running a management pod that contains standard tools such as 'df' and that mounts the host's filesystem. Unfortunately, the act of scheduling a pod on that node caused it to experience disk pressure, and so the management pod got evicted.
So, being dogmatic about "the host should not have any tools installed" is good and all, but how do you debug this scenario without tools on the host?
We eventually figured it out. By logging into the host OS and using the shell tools there.
> So, being dogmatic about "the host should not have any tools installed" is good and all
Less dogma, more the lived experience that letting people log into hosts ends badly. Though I grant there's a cost/benefit both ways and perhaps there could be edge cases.
> but how do you debug this scenario without tools on the host?
Cordon the node, evict any one pod to free up just enough room, and then schedule your debug pod with a toleration so it ignores the error condition? I confess I've never had to do this but it seems workable.
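Concretely, the debug pod just needs to tolerate the disk-pressure taint the kubelet applies; the taint key below is the standard one, the rest of the pod spec is elided:

```yaml
# Fragment of a debug pod spec: schedule despite the node's disk-pressure taint
spec:
  nodeName: worker-1            # placeholder node name
  tolerations:
    - key: node.kubernetes.io/disk-pressure
      operator: Exists
      effect: NoSchedule
```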
Actually, no, the host OS was Amazon Bottlerocket, a specifically container-focused OS.
The cause was indeed images being too big. Images — not only the raw images, but also their extracted contents on the filesystem — count towards ephemeral storage too. In their case they can't even control the size of the images because those are supplied by a vendor.
The solution was to increase the node's disk space.
Interesting, I use Bottlerocket on my work clusters too. I think we had issues like this using some ridiculous data tool images that take up gigabytes, so we just upped the EBS size. Easily done.
It’s designed for running Kubernetes. You would log into the containers running on it if you need to, there’s no need to log into the underlying host. Managed Kubernetes clusters already work like this.
https://github.com/siderolabs/talos/issues/8367