"systemd-nspawn is like the chroot command, but it is a chroot on steroids.
systemd-nspawn may be used to run a command or OS in a light-weight namespace container. It is more powerful than chroot since it fully virtualizes the file system hierarchy, as well as the process tree, the various IPC subsystems and the host and domain name. "
I actually disagree that this is the reason for the aversion. Instead, I think it comes down to a few things:
1. Containers are a commodity. For commodities, price wins first, then marketing. Docker is equally free (as in beer) and is far better marketed.
2. Very few people work on systemd-nspawn, while a great many work on Docker (and its ecosystem).
3. Dockerfiles and images are ubiquitous. They're easy to support from other runtimes, but if you're already in the ecosystem, what's the incentive to change?
There is also the fact that some people have a knee-jerk reaction of hating anything associated with systemd. Still, I think that is less significant than the reasons you listed.
For those of us not in this space: does this replace Docker or Kubernetes, or is it just one piece of the puzzle that could be bolted into both to replace some component?
Think of a car: the engine is the OCI runtime (crun, containerd, etc.). The whole car is Docker or Podman. Kubernetes is the city where the cars run.
---
EDIT: For more details, OCI (Open Container Initiative) defines a few things:
- an image format (how to create them)
- a runtime specification (how to run them)
- ...
An OCI container runtime (runc, crun, etc.) is an implementation of the runtime specification.
Docker is a set of tools hidden under a unified CLI which gives you the ability to:
- create images
- upload/download images to/from a registry
- run images with volumes, networking configuration, environment variables, ...
But Docker only manages a single host.
Kubernetes is an interface to abstract a cluster of hosts. Everything is described as a "resource" which goes through the "control loop":
1. the user uses the k8s REST API to create/read/update/delete the resource
2. the k8s API server calls admission controllers (some via webhooks) to validate (and/or mutate) the request
3. the action is persisted to a distributed database (usually, etcd)
4. then, controllers are notified of the change and will run the side effects
This is the simplified version. But what is stored in the distributed database is called the "desired" state, and controllers have the duty of observing the real state (the "observed" state) and making it converge towards the "desired" state.
So on each node, the kubelet plays the role of the "Pod" controller: it observes the local container runtime (Docker, in this example), checks which containers are running, and starts/stops containers based on which "Pod" resources exist in k8s's database.
A "Deployment" controller's job is to observe the "Pod" resources in the k8s database and create/update/delete them (in practice through an intermediate "ReplicaSet" resource) based on which "Deployment" resources exist in k8s's database.
etc...
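To make the control loop concrete, here is a minimal C sketch of one reconcile pass. It assumes a toy model in which both the desired state (from the database) and the observed state (what is actually running) are plain sets of container names; `name_set`, `set_contains`, `set_add` and `reconcile` are hypothetical names, and "start"/"stop" are simulated by editing the observed set rather than calling a real runtime.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model (hypothetical): a small fixed-capacity set of container names. */
#define MAX_NAMES 16

typedef struct {
    const char *names[MAX_NAMES];
    size_t count;
} name_set;

static bool set_contains(const name_set *s, const char *name) {
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return true;
    return false;
}

/* Adds `name` if it is absent and there is room; returns true if it was added. */
static bool set_add(name_set *s, const char *name) {
    if (s->count >= MAX_NAMES || set_contains(s, name))
        return false;
    s->names[s->count++] = name;
    return true;
}

/* One pass of the control loop: move `observed` toward `desired`.
   Returns the number of simulated actions (stops + starts) taken. */
static size_t reconcile(const name_set *desired, name_set *observed) {
    size_t actions = 0;

    /* "Stop" anything that is running but no longer desired. */
    for (size_t i = 0; i < observed->count;) {
        if (!set_contains(desired, observed->names[i])) {
            /* Swap-remove: move the last entry into slot i, then re-check slot i. */
            observed->names[i] = observed->names[--observed->count];
            actions++;
        } else {
            i++;
        }
    }

    /* "Start" anything that is desired but not running yet. */
    for (size_t i = 0; i < desired->count; i++)
        if (set_add(observed, desired->names[i]))
            actions++;

    return actions;
}
```

Running `reconcile` a second time on the same inputs takes no actions. That idempotence is what lets a controller simply re-run the loop whenever anything changes, instead of tracking exactly which event happened.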
In theory, Kubernetes does not need Docker. You could have a "Proxmox" controller which would start/stop virtual machines instead.
Kubernetes provides a lot of tooling for storage management, secret management, networking, workload management, etc... so that you can manage it all with a unified REST API.
The very nature of the "control loop" makes it very extensible, allowing you to build layers of abstraction on top of layers of abstraction on top of layers of abstraction ... A real "onion cloud" if I dare say it.
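Kubernetes can use anything that implements the CRI (Container Runtime Interface), which in practice means either CRI-O (Red Hat) or containerd (originally from Docker, Inc., now a CNCF project). Docker itself is built on containerd, while Podman drives OCI runtimes such as runc or crun directly rather than going through either engine.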
It replaces `runc`, which most container engines (including Docker, via containerd) use to actually start the container. Hence the punny name.
When using Kubernetes, the hierarchy is as follows:
1. The Kubernetes control plane tells the kubelet what to do (sort of, not important here)
2. The kubelet uses a CRI-compatible runtime to start containers
3. containerd or CRI-O handles container management and starts containers using runc or crun
4. runc/crun are the programs that set up the final environment for the application to run in the container, using resources (mounts, devices, etc.) provided to them by the upper layers. Together with a small per-container monitor process (containerd-shim or conmon), they also handle things like sending stdout/stderr to logs or setting up a pseudoterminal to talk to a program in the container.
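The hierarchy above can be sketched as a chain of delegating calls. This is a toy model under loudly stated assumptions: the struct and function names are hypothetical, the real kubelet talks to containerd/CRI-O over the CRI gRPC API, and the OCI runtime is normally invoked as a separate program (e.g. `runc create` / `runc start`). Here each layer only records itself in a trace string so the call path is visible.

```c
#include <stddef.h>
#include <string.h>

/* Appends `part` to the NUL-terminated string in `buf` (capacity `cap`).
   Returns 0 on success, -1 if it would not fit (buf is left unchanged). */
static int trace_append(char *buf, size_t cap, const char *part) {
    size_t used = strlen(buf), n = strlen(part);
    if (used + n + 1 > cap)
        return -1;
    memcpy(buf + used, part, n + 1); /* also copies the terminating NUL */
    return 0;
}

/* Hypothetical stand-ins for each layer. */
typedef struct { const char *name; } oci_runtime;                         /* runc, crun */
typedef struct { const char *name; const oci_runtime *oci; } cri_runtime; /* containerd, CRI-O */
typedef struct { const cri_runtime *cri; } kubelet;

/* Layer 4: a real OCI runtime would set up namespaces, cgroups and mounts here. */
static int oci_start(const oci_runtime *rt, const char *ctr, char *trace, size_t cap) {
    if (trace_append(trace, cap, rt->name) != 0 ||
        trace_append(trace, cap, ": start ") != 0 ||
        trace_append(trace, cap, ctr) != 0)
        return -1;
    return 0;
}

/* Layer 3: containerd / CRI-O manage the container and delegate the actual start. */
static int cri_run(const cri_runtime *rt, const char *ctr, char *trace, size_t cap) {
    if (trace_append(trace, cap, rt->name) != 0 ||
        trace_append(trace, cap, " -> ") != 0)
        return -1;
    return oci_start(rt->oci, ctr, trace, cap);
}

/* Layers 1-2: the kubelet, told what to run by the control plane, calls the CRI runtime. */
static int kubelet_run(const kubelet *k, const char *ctr, char *trace, size_t cap) {
    if (trace_append(trace, cap, "kubelet -> ") != 0)
        return -1;
    return cri_run(k->cri, ctr, trace, cap);
}
```

In this toy model, swapping crun for runc only changes which `oci_runtime` the CRI layer points to; nothing above it needs to know, which is the sense in which crun is a drop-in replacement.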
I can’t go into too much detail, but IME the benefits are somewhat doubtful, and at scale it can run into weird issues that are hard to debug (because C). I wouldn’t use it unless I had a maintainer or a C expert on my team. Most other OCI runtimes, runc included, are written in Go and are (generally) easier to debug.
Why is Go easier to debug? It kinda sounds like you're more familiar with one language than another and are basing your assessment of the tool on that.
In C, this surprisingly changes the value of x. (Well, it's undefined, so it could do anything!) In Go, the program crashes with the error that index 124 is out of bounds.
C is the absolute best programming language for programmers who don't make mistakes. I'm not one of them, and I've never met one of them. If it works for you, that's impressive!
You know sanitizers and static analysis tools exist, and have existed for decades, and have been the basis for the work done in Rust and other "safe" languages?
Also, a disciplined C programmer will always keep the size of the buffer near the buffer itself.
For example, using something like this is perfectly safe (though you can see the performance hit of those if-statements if it's used intensively):
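The original snippet isn't preserved here, so this is only a minimal sketch of the idea, assuming a heap-allocated `int` buffer: the length travels in the same struct as the data, and every access goes through a checked accessor. The `int_buf` type and its functions are hypothetical names for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* The length is stored right next to the data it describes. */
typedef struct {
    size_t len;
    int *data;
} int_buf;

/* Allocates `len` zero-initialized ints; returns false on allocation failure. */
static bool int_buf_init(int_buf *b, size_t len) {
    b->data = calloc(len, sizeof *b->data);
    b->len = b->data != NULL ? len : 0;
    return b->data != NULL;
}

/* Each accessor pays for one bounds check: this is the per-access cost. */
static bool int_buf_get(const int_buf *b, size_t i, int *out) {
    if (i >= b->len)
        return false; /* out of bounds: report an error instead of reading */
    *out = b->data[i];
    return true;
}

static bool int_buf_set(int_buf *b, size_t i, int value) {
    if (i >= b->len)
        return false; /* out of bounds: refuse instead of corrupting memory */
    b->data[i] = value;
    return true;
}

static void int_buf_free(int_buf *b) {
    free(b->data);
    b->data = NULL;
    b->len = 0;
}
```

With this wrapper, an out-of-range index such as 124 fails with a `false` return instead of silently overwriting a neighboring variable. The catch is that nothing in the language forces every access to go through these functions; a single raw `b->data[i]` bypasses the check.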
This comes up from time to time. Surely there's some caveat here, either for performance or other reasons (I'm not a solid enough C programmer to know whether this hypothesis is viable). If it were so simple, this approach would be ubiquitous and C would be safe. What am I missing?
> If it was so simple this approach would be ubiquitous and C would be safe.
There are some unavoidable footguns (notably around unintended integer promotions and overflow), but for four decades life-critical machinery like rockets, munitions, airplanes, heavy industrial equipment, automotive control systems ... and more have been controlled by C code, and the number of lives lost due to the bugs you are complaining about is statistical noise.
It's been used extensively in products that could never be patched or updated after release, and could only be recalled, and yet I recall only a few instances where bugs led to lives lost, and in at least one of those cases the culprit was identified as something other than the language (i.e. those same errors or worse would have occurred even if a different language had been used, due to the dev process and architecture).
These bugs are not even a rounding error! So it would seem that writing safe C is ubiquitous. You're seeing the statistical noise and concluding it is representative of all software written in C, when you should be looking at all that noise and saying "is this all there is?"
A chainsaw is not safe. But you can learn to use it safely.
Replace chainsaw with:
- knife
- guns
- C programming language
There are tools and methods (and a lot of discipline) to ensure safety in C:
- compiler sanitizers[0] (enabled with `-fsanitize=<name>` in GCC and Clang):
- address: to detect out-of-bounds and use-after-free bugs
- pointer-compare, pointer-subtract: to detect comparisons or subtractions between pointers to different objects
- shadow-call-stack: to protect return addresses from being overwritten (e.g. by stack buffer overflows) by keeping them on a separate shadow stack
- thread: to detect data races
- leak: to detect memory leaks
- undefined: to detect undefined behaviors
- ...
- static analysis tools, like Splint[1]
Would you ride a motorbike without the proper protections (helmet, heavy jacket, ...) ?
Unless the unsafe package is used (which is rare), a data race hits a multiword value such as a slice or interface, or there’s a bug in the compiler (rarer still), you will not get silent memory corruption bugs, which are stupid common in C and hard to debug to boot.
So your whole argument isn’t that crun itself is unstable but that C can be hard for non-experts to write programs in. That could be said of tons of stuff, including the OS kernel you’re using to run containers, so I’d be curious to hear what you’re using to replace Linux in your work.
The principal issue with Go applications in my work continues to be the massive size of the executables.
I can't speak for crun specifically since I haven't used it personally, but almost every single C application I've had to deal with had these hard-to-debug memory corruption bugs, which took an expert (me) days or weeks to fix. And that included the Linux kernel itself btw, the difference being it was almost always fixed upstream by the time I found it.
> The principal issue with Go applications in my work continues to be the massive size of the executables.
That is certainly true in the general case; however, out of curiosity I checked the size of the containerd runtimes on my machine and:
Some systems I’ve worked in have entire root filesystems that are merely twice the size of that one binary. And in those cases sending an extra 40 megabytes over the network connection is a big imposition. So there are still places where executable size matters and that’s why we’d want crun and not runc. If someone wrote an alternative in Rust I’d be interested but golang is just too piggy.
Sure, but I’m having a hard time imagining a scenario where a full-blown OCI runtime (I assume you’ll run it with kubelet/Nomad) is required on such constrained hardware and a simple nspawn or systemd container won’t do.
Technically, if you use jemalloc (which most everyone should do anyway), it comes with built-in profiling instrumentation, but you need to enable it at compile time (`--enable-prof`), and generally not many people are aware of this.
I think tcmalloc can now output protos too, which Google's pprof tool understands. If you’re using the standard glibc malloc, you’re probably leaving a lot of performance on the table.
I think what the crun author is positing is that the container runtime is closer to the kernel (cgroups) than to the orchestrator (Kubernetes). I tend to agree.
Of course kernel / cgroups / container runtime need to be rock solid. There should be no need to debug these in most use cases.
I'd be very cautious about running such a core piece of infrastructure in an insecure language. Especially one without the level of scrutiny other projects like linux have.
https://wiki.archlinux.org/title/systemd-nspawn