Minikube now supports rootless podman driver for running Kubernetes (github.com/kubernetes)
225 points by encryptluks2 on June 23, 2022 | 81 comments



I'm getting lost in this orchestrator world. Can somebody explain the use case for Minikube? Or MicroK8s? All claim to be certified as "perfectly like Kubernetes" and show they can be used in production, but are people actually using them? Why?


Kubernetes is considered too complex to deploy for mere mortals.

Here's a rough list of things I had to do, because I'm doing exactly that right now: trying to deploy Kubernetes.

1. Prepare the server by installing and configuring containerd. A few simple steps.

2. Load one kernel module. Configure the IP forwarding sysctl.

3. Install the Kubernetes binaries from the apt repository.

4. Run kubeadm init

5. Install helm (optional, but it makes things simpler): download the binary or install it from snap.

6. Install a network plugin like Flannel or Calico.

7. Install nginx ingress plugin.

8. Install a storage provider plugin. You should also have some storage service, like an NFS server, that Kubernetes can ask for storage.

If you want single-node Kubernetes, I think that should be enough (steps 1-4 are sketched below). Maybe #6 isn't even necessary. If you want a real cluster, you'll also need to tinker with a load balancer, for which I don't have a clear picture right now; I'm using an external load balancer.
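Roughly, steps 1-4 as commands (a sketch assuming Debian/Ubuntu with the Kubernetes apt repo already configured; package names vary by distro):

sudo apt-get install -y containerd                   # step 1: container runtime
sudo modprobe br_netfilter                           # step 2: kernel module kubeadm expects
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/k8s.conf
sudo sysctl --system
sudo apt-get install -y kubelet kubeadm kubectl      # step 3
sudo kubeadm init --pod-network-cidr=10.244.0.0/16   # step 4 (CIDR matches flannel's default)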

If you're using docker, my understanding is that containerd is already installed.

I actually spent a few weeks trying to understand those parts, and I have only a shallow understanding so far.

With that in mind, simple Kubernetes solutions probably have their place among those who can't or don't want to use managed Kubernetes from the popular clouds.

I have no idea about those simple kuberneteses though.

My opinion is that vanilla Kubernetes is not that hard, and you should have some understanding of its moving parts anyway. But if you want the easy path, I guess these are worth considering.


Flannel and Calico are responsible for assigning pod IPs, so you need them even on a single node.
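Installing one is usually a single manifest apply; for example flannel's documented one (the URL may have moved since):

kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml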

One main reason you'd want to run minikube or kind is that these clusters are easy to reproduce and don't pollute your system's network namespace and sysctls.


For the load balancer in your case you would probably provision MetalLB in place of the cloud-specific LB solutions that cloud providers deploy. It's somewhat straightforward, though I believe the steps are specific to each network provider (Flannel, Calico, etc.)
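A minimal sketch, assuming MetalLB v0.13.x in L2 mode (the pinned version and the address range are placeholders to adapt):

kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.7/config/manifests/metallb-native.yaml
# give MetalLB a pool of addresses it may hand to LoadBalancer services
cat <<EOF | kubectl apply -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.240-192.168.1.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system
spec:
  ipAddressPools:
  - default-pool
EOF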


Maybe a bit too hacky, but if you only plan to use nginx-ingress + HTTPS (and don't have spare /24s of IPs around), you can set up nginx on each node and run a script that regenerates the nginx config every few minutes (using the stream module to forward ports 80 and 443, TCP/UDP, to the ingress nginx).

Then add the IP addresses of the nodes as a wildcard DNS.
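A hand-written sketch of what such a generated config might look like (node IPs and ingress NodePorts are placeholders; note the stream block has to sit at the top level of nginx.conf, not inside http):

cat > /etc/nginx/k8s-stream.conf <<'EOF'
stream {
    upstream ingress_http  { server 10.0.0.11:30080; server 10.0.0.12:30080; }
    upstream ingress_https { server 10.0.0.11:30443; server 10.0.0.12:30443; }
    server { listen 80;  proxy_pass ingress_http; }
    server { listen 443; proxy_pass ingress_https; }
}
EOF
nginx -s reload   # after adding: include /etc/nginx/k8s-stream.conf; to nginx.conf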


Or you could just set up the ingress as a DaemonSet with a NodePort service that has externalTrafficPolicy set to Local.
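A minimal sketch of the Service half of that, assuming ingress-nginx's usual labels (names and ports are placeholders):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  type: NodePort
  externalTrafficPolicy: Local   # only route to pods on the receiving node, preserving client IPs
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
  - name: http
    port: 80
    nodePort: 30080
  - name: https
    port: 443
    nodePort: 30443
EOF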


Docker’s containerd is different in many ways from the normal version


If you've used docker compose and gotten frustrated with the lack of some important features, these lightweight kubernetes distributions are actually great. Blue/green deployments, support for a whole bunch of storage volumes, load balancing with automatic letsencrypt support, and great secret management (the ability to mount secrets as files/directories inside a pod is a killer feature) are the reasons I use kubernetes instead of docker compose for my side projects, even though I ignore the rest of kubernetes' features.
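For example, the secrets-as-files feature looks roughly like this (a sketch; all names are made up):

kubectl create secret generic app-creds --from-literal=db-password=hunter2
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
  - name: app
    image: alpine
    command: ["sh", "-c", "cat /secrets/db-password && sleep 3600"]
    volumeMounts:
    - name: creds
      mountPath: /secrets   # each key of the secret shows up as a file here
      readOnly: true
  volumes:
  - name: creds
    secret:
      secretName: app-creds
EOF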


> All claim to be certified as "perfectly like kubernetes"

because they are. they are directly built from the go sources; all are wrappers around the meat of k8s (which is the various control loops packaged into services like api-server, kubelet, scheduler, controller-manager, etc ... and etcd itself)

minikube does a big monolithic build for convenience. (it can do this because all involved components are in pure go.)

microk8s is also a distribution of k8s.

almost all distributions have convenience features to help you with installation/setup. but all they do is what "the hard way" setups do. fetch/copy binaries, generate/sync keys, setup storage (devicemapper, btrfs volumes, whatever), setup wrappers that then start the binaries with the long list of correct arguments, and set them up to start when the node starts (usually by adding systemd services or something).


So what are they missing that K8s has? I don't understand where I'd use one vs the other.


In a sense they are distributions. Ubuntu and Fedora can also both do it all, and it's not clear where you'd use one vs the other. You're in the hands of different people.


Oh hmm, I see, thanks!


Well also, minikube is meant to facilitate development and testing. We use minikube for local dev, tests, etc. and everything transfers over well to the production k8s cluster


um, they aren't missing anything (but see below). they are k8s, just as you rarely run the Linux kernel without userspace.

so if you want the genuine original mainline experience you go to the project's github repo, they have releases, and the detailed changelog has links to the binaries. yeey. (https://github.com/kubernetes/kubernetes/blob/master/CHANGEL... .. the client is the kubectl binary, the server has the control plane components, the node binaries have the worker node stuff), you then have the option to set those up according to the documentation (generate TLS certs, specify the IP address range for pods (containers), install dependencies like etcd, and a CNI-compatible container network layer provider -- if you have set up overlay networking, eg. VXLAN or geneve or something fancy with openvswitch's OVN, then the reference CNI plugin is probably sufficient)

at the end of this process you'll have the REST API (kube-apiserver) up and running and you can start submitting jobs (that will be persisted into etcd, eventually picked up by the scheduler control loop that calculates what should run where and persists it back to etcd, then a control loop on a particular worker will notice that something new is assigned to it, and it'll do the thing, allocate a pod, call CNI to allocate IP, etc.)
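for instance, the visible end of that loop (resource names are made up):

kubectl create deployment web --image=nginx --replicas=2
kubectl get pods -o wide --watch   # watch pods go Pending -> assigned to a node -> Running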

of course if you don't want to do all this by hand you can use a distribution that helps you with setup.

microk8s is a low-memory low-IO k8s distro by Canonical (Ubuntu folks) and they run dqlite (distributed sqlite) instead of etcd (to lower I/O and memory requirements), many people don't like it because it uses snaps

k3s was started by the Rancher folks (and is mostly still developed by them?),

there's k0s (for bare metal ... I have no idea what that means though), kind (kubernetes in docker), there's also k3d (k3s in docker)

these distributions work by consuming/wrapping the k8s components as go libraries - https://github.com/kubernetes/kubernetes/blob/master/staging...

...

then there's the whole zoo of various k8s plugins/addons/tools for networking (CNI - https://github.com/containernetworking/cni#3rd-party-plugins), storage (CSI - https://kubernetes-csi.github.io/docs/drivers.html), helm for package management, a ton of security-related things that try to spot errors in all this circus ... and so on.


Worth mentioning there's a middle path, namely kubeadm. That's the "sanctioned" way to bootstrap clusters without going fully from scratch, and many other distributions actually use it internally.


Ahh that clarifies things a lot, thank you!


I don't see why you would use Minikube in production (nor have I ever heard of anyone doing this), but Minikube is exceptionally helpful for local development, when you want to test against a real Kubernetes API server (as well as test any of your desired orchestration for your component).


This is a great use case I've found as well. If you have a product that is deployed to K8s, the ability to create clusters on demand for testing, whether local or otherwise, is awesome.


Local development of k8s apps without having to deploy to a k8s cluster that you may or may not need to set up.


Also edge of the network deployments where you want consistency with datacenter deployments but don't have a lot of local compute resources or are otherwise limited.


I am adding another executor to a workflow engine. Minikube is a huge help for my dev work (I always test against a Local Real Instance, which is what minikube is). It's helped on more than one occasion to show that a prod k8s instance lacked a feature or was misconfigured.


I've only used Minikube, kind, and k0s as sandboxes for production kubernetes deployments in the cloud (i.e. EKS). Given I'm already using Docker Desktop on my mac laptop though, the easiest thing to do is just use its built-in kubernetes. It works pretty well, and obviates the need for any of these micro-kubernetes distros.


Recent comparison of Minikube, K3s and MicroK8s: https://goalz.online/pros-cons-of-minikube-k3s-microk8s-ligh...


Minikube is in a different category, alongside kind. These clusters are meant to be disposable for development, and, at least for kind, can't be updated easily.


I wish distros would stop making "docker" an alias for "podman"; they are not the same thing, and it breaks all light-k8s implementations (looking at you, Red Hat).


I've encountered cases where the podman CLI does not match Docker's, specifically for network creation with IPv6: the commands are different. What are you experiencing?


The CLI is not my issue: k3s and kind won't work with podman (or any rootless container, for that matter) out of the box. In both you need to do some non-trivial cgroups configuration on the OS to make it work (in k3s this mode is experimental).


It's an optional package...just don't install it.


Please excuse my ignorance. I am aware of how Docker generally operates (at a noob level, i.e. containerizing an application and uploading it to a public container registry for general availability).

Given that understanding, could someone please explain what "rootless" would mean? I want to understand these in simpler terms:)

(Thank you in advance)


Docker runs a daemon with root privileges to start all containers. So if you start a container with "docker run -d ...." you talk to a privileged process. That in turn means all spawned containers can have root privileges (docker run -v /etc/shadow ... to change the root password of your host). "Rootless" means running the container process as a normal user (less attack surface because of fewer permissions). So if you ran "podman run -v /etc/shadow" as a normal user, you wouldn't have the permissions needed to open the file.

As simple as possible:

Docker ("normally"): run every command inside the container with full root permissions on the host: $root -> Docker -> container

Docker/Podman ("rootless"): run every command as the current user: $user -> container
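Illustrating the difference (don't try the docker line on a machine you care about):

docker run --rm -v /etc/shadow:/x alpine cat /x   # works: the daemon is root
podman run --rm -v /etc/shadow:/x alpine cat /x   # rootless: Permission denied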

Maybe take a look here for a better explanation: https://docs.docker.com/engine/security/#docker-daemon-attac...


The other big piece is capabilities (specifically CAP_SYS_ADMIN), which as I understand it is related to but kind of orthogonal to the question of root/rootless.

For example, buildah (the container-building part of podman) is daemonless and can use the fuse-overlayfs storage driver to build containers rootlessly: you appear as root inside the container, but from the outside, those processes and any files created are owned by the original invoking user, or by some shim UID/GID based on a mapping table.
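You can poke at that mapping directly; buildah unshare drops you into the rootless user namespace:

buildah unshare id   # uid=0(root) inside the namespace
id                   # outside, still your normal uid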

But critically, this doesn't mean it's possible to just run buildah inside any Kubernetes pod and build a container there, because buildah needs to be able to start a user namespace, and must have the /dev/fuse device mapped in. I believe there continues to be ongoing work in this area (for example Linux 5.11 allows overlayfs in unprivileged containers), but the issue tracking [1] it is closed without really being IMO fully resolved, since the linked article [2] from July 2021 is still describing the different scenarios as distinct special cases that each require their own special sets of flags/settings/mounts/whatever.

[1]: https://github.com/containers/buildah/issues/2554

[2]: https://www.redhat.com/sysadmin/podman-inside-kubernetes


Yup, and based on that mapping table, the process inside the container is not allowed to create another namespace and/or use fuse-overlayfs. That's why you need to mount /dev/fuse into the container (you might also need CAP_SYS_ADMIN and CAP_MKNOD). There is another link from Red Hat which also explains it:

https://www.redhat.com/sysadmin/podman-inside-container

You can run "capsh --print" to see your current capabilities. And to run a container without any capabilities:

podman run --cap-drop ALL -it fedora capsh --print


Typically, the way a normal Docker installation works is that dockerd (the Docker daemon) is an always-on background service running as root that exposes a socket file, group-writable and owned by the 'docker' group, allowing non-root users to send commands; effectively it acts as a privilege-escalation mechanism. There were at least three reasons the daemon needed to run as root: it needed to modify the host routing table to set up an overlay network, only root can create overlay filesystems, and at least some containers themselves had to run as root because they contained files that had to be manipulated in some way by uid 0 in the container.

podman in rootless mode gets around these by using slirp4netns to create pure-userspace overlay networks, fuse-overlayfs to create pure-userspace overlay filesystems (or a driver that can't deduplicate storage on older kernels), and uid/gid mapping in user namespaces to create the illusion inside of a container that an application is running as root when it isn't really root on the host.

Additionally, podman gets rid of the daemon and just uses normal fork/exec of the ephemeral podman process.
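You can see that mapping from inside podman's user namespace; the numbers below are a typical /etc/subuid setup:

podman unshare cat /proc/self/uid_map
#        0       1000          1    <- uid 0 inside is your uid (here 1000)
#        1     100000      65536    <- the rest come from your /etc/subuid range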

The upsides are:

- podman can run entirely in home directories and doesn't need to globally install config files or the container filesystems, making it easier for many users to share the same server.

- Running a malicious or compromised container won't compromise your host (big caveat here is unless it can exploit a vulnerability in user namespaces).

- Users who don't have root at all can still run containers. Note that while this appeared to be true using Docker because you could just be part of the 'docker' group to write to dockerd's socket, effectively this was giving you root.

The biggest downside is the userspace networks and filesystems are slow compared to their in-kernel counterparts, which is why you typically won't see it in any kind of production setting, but minikube is meant to be used as a small-scale mock of production kubernetes run by developers, so it can be a good fit there.

Note that rootless minikube was actually already possible, but way more convoluted than just using rootless podman as the container runtime.
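Per the minikube docs for this feature, it now comes down to roughly this (flags may change between releases):

minikube config set rootless true
minikube start --driver=podman --container-runtime=containerd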


I've seen netavark described as a much faster rootless networking stack. Do you know if that is the case? I know that Podman supports it. Does anything like that exist for storage?


Not an expert at all, but here's how I would simplify it. All corrections are welcome!

Docker has two main components. The daemon (you can think of it somewhat like a server) and the client (application you use to run commands).

When you install docker on your machine, it generally installs both. The daemon is a process that runs on your local machine and runs as root.

Rootless refers to the alternative method (used by podman for instance) to run the daemon as a standard user, and delegate root-level tasks to something else, like systemd for instance.


> Docker has two main components. The daemon (you can think of it somewhat like a server) and the client (application you use to run commands).

Is the daemon what they call the docker-engine? Is this what's available on Linux natively? Rootless makes sense here because you wouldn't want one docker container able to interfere with another, or with the Linux system that is running the docker runtime/engine.

For Windows/Mac docker solutions, where does the daemon live/exist/run? Inside a virtualized Linux instance?

As I understand it, most of these alternatives to docker-desktop are all just wrappers around a virtualized Linux image running the docker engine/runtime. That's why many of them require a virtualization engine like VirtualBox. So are these non-commercial solutions just wrappers around one or more virtualized Linux runtimes where the docker engine/runtime is running natively?

If all the above is (approx) correct, then "what" is rootless with this announcement? The docker runtime/engine in the virtualized Linux instance?

I thought the docker engine/runtime on Linux was always able to run rootless docker images. So what is the news here if all these non-commercial solutions are just wrappers around the docker engine/runtime running in a virtualized Linux?


Yes, for Windows and Mac it runs a Linux VM. On Windows it can also use WSL2 as the Linux VM.

Docker-engine is the daemon built by Docker. Podman is an open-source work-alike. Docker-engine doesn't support running as a user other than root; Podman does. This announcement says minikube will work with Podman running as non-root.


Thanks for clearing that up.

I remember hearing that development of docker-engine was ceasing, but could obviously live on as it was forked. I guess rootless is some of the work that Docker (company) wanted to keep proprietary and out of this open-source project.

Really quite a shame, although understandable from a commercial perspective.

Assuming that these improvements are finding their way back into an open-source project, I'm glad to hear about this work from minikube and Podman.


> I guess rootless is some of the work that Docker (company) wanted to keep proprietary and out of this open-source project.

Rootless mode for Docker is completely FLOSS, and its main contributor (me) has never even worked for Docker (the company).

https://github.com/moby/moby/blob/master/contrib/dockerd-roo...

https://github.com/rootless-containers


> Docker-engine doesn't support running as a user other than root. Podman does.

Docker engine does.


It means it uses user namespaces to map a non-root user in the top-level user namespace (where e.g. init runs) to the root user inside the container. This allows the container process to run as root inside its user namespace, retaining the full set of capabilities required to call privileged syscalls or access files owned by root there.
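You can demo the same mapping with plain util-linux:

unshare --user --map-root-user id   # uid=0(root) inside the new namespace, no extra privileges outside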


Docker does all its work in a central daemon running as root. Any docker command you run is just sending messages to that central daemon.

You can see some downsides to this with the classic developer setup of having a docker image with your tools and mounting your source tree as a volume into the container for building. When you build, the build products in your filesystem are owned by root, because the code was actually running under the daemon. This can cause all sorts of pain.

When you run something like podman, there's no daemon - it's all just processes running as your user (like any other script) so files created end up on your filesystem owned by you.
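The ownership difference in practice (illustrative):

docker run --rm -v "$PWD":/work alpine touch /work/out-docker
ls -l out-docker   # owned by root: the daemon did the work
podman run --rm -v "$PWD":/work alpine touch /work/out-podman
ls -l out-podman   # owned by you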


It means you don't need to be root to run it.


You can also call docker commands by being part of the docker group IIRC.

Doesn't this have more to do with the daemon than with the user executing commands?


> You can also call docker commands by being part of the docker group IIRC.

Which effectively gives you root on the host.


Which is a horrible practice and has roughly the same attack surface as logging in as root all the time.


With podman there is no daemon, everything is running as you. The standard setup for docker has a daemon running as root, which means when you start a container it has root privileges.


Are there security issues with user namespaces? For instance, Arch disables them in their hardened Linux kernel: https://wiki.archlinux.org/title/Linux_Containers#Unprivileg...



Just to make sure: "rootless" is really misleading. As far as I've researched, podman either relies on suid binaries or privileged capabilities or both to do its magic. You might as well call it the "capabilitiesful podman driver".


You do need an suid binary to e.g. set a new user id map, since this requires comparing the user id range owned by you to what you're mapping, but you only do it once and it's a simple, secure operation.
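Those helpers are newuidmap/newgidmap from shadow-utils, and the ranges they'll agree to map live in /etc/subuid and /etc/subgid (paths vary by distro):

ls -l /usr/bin/newuidmap /usr/bin/newgidmap   # setuid-root helpers
grep "^$USER:" /etc/subuid                    # e.g. youruser:100000:65536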


I don't think it is misleading. Just because you need root privileges to enable "rootless" doesn't mean it isn't rootless once configured.


It's somewhere in between. You definitely need to enable features that are normally out-of-reach of regular users (i.e. user namespaces, network namespace, unprivileged ping, etc.) However it's still a far cry from full root access, and arguably a smaller surface area than regular run-everything-as-root mode.


> podman either relies on suid binaries

I'm fairly sure this is only the case on older systems, if your system is up to date then podman should not rely on suid binaries.


Maybe you should include this in your "research":

- https://opensource.com/article/19/2/how-does-rootless-podman...

- https://github.com/containers/podman/blob/main/docs/tutorial...

TL;DR:

- cgroup v2 support

- install Podman

- install slirp4netns

- ensure fuse-overlayfs is installed


you can run containers without root, suid bits or special capabilities with podman.

of course, without any of that your containers will be able to do very little (eg: no networking).


Slirp networking does not need any suid bit or special capability.


A ton of failed tests in that PR and yet it was merged anyway?


This is awesome, was looking around trying to figure out if this was available yet just last night... and here it is!


While rootless is a curious technical trick, I don't understand why the implementation ever left someone's laptop: both file and networking performance are utterly abysmal, which is completely at odds with one of the primary benefits of containers (near-zero overhead).


On servers, yes, rootless doesn't make much sense. But on my dev laptop, "sudo docker" is tiring, and adding yourself to the docker group is a big security hole (why does everyone seem to think that "docker run" giving root privileges is ok?!).


This indeed. The Docker team should not include the "add your user to the docker group" section in the install documentation. It is very unsafe, and even though they link to a document on the security implications, I don't think all users will truly grasp them.

Better to hide this feature and promote the rootless docker mode for local use. On servers you won't be adding any unprivileged user to the docker group in any case.


You should add yourself to the docker group...


Which has the same effect: the docker group effectively has root access.


sudo usermod -aG docker $USER


this is not safer.


this is the same as:

%wheel ALL=(ALL) NOPASSWD: ALL

effectively disabling sudo completely.


This is the first I've heard about serious performance overhead from going rootless. Do you have any links with more info about it?

I haven't encountered any issues like this personally with rootless podman (although I'm not doing any large scale deployments).


What causes the file/networking performance degradation when running unprivileged containers?


The filesystem performance degradation was resolved in kernel 5.11, which added support for rootless overlayfs.

The network performance hit is caused by slirp (usermode TCP/IP), but it is being resolved too: https://github.com/rootless-containers/bypass4netns
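A rough way to check which path you're on (a sketch; field names vary by podman version):

uname -r                                         # native rootless overlayfs needs >= 5.11
podman info --format '{{.Store.GraphOptions}}'   # a fuse-overlayfs mount_program here means the FUSE path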


overlay2 or fuse-overlayfs?


Tangential, but are there any easy ways to run server applications on bare metal in a way that removes the need for an underlying OS, in order to decrease the overall attack surface an attacker can look for exploits in? (Mainly talking about applications written in Go (TinyGo), Rust, and C++ that can be easily compiled to run on bare metal.)


A unikernel is what you're interested in, but it's not as easy as taking some Linux-based server software and spitting out a bootable image for bare metal. If you strip out the kernel and OS, you lose the network stack and all kinds of system services that most software depends on directly.

I think Google's distroless container images are worth checking out as a quasi-alternative: https://github.com/GoogleContainerTools/distroless You use them as a base for a docker image and copy in your server code. These images are tailor-made to strip out _everything_ that's not necessary to run the software; there's no shell, for example. So you're still running a Linux kernel, libc, etc., but there's nothing there for an attacker to use other than your app code. You yourself can't even get a shell to debug or examine the state of your app (which can actually be kind of aggravating in development).
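The usage pattern is a multi-stage build; a sketch with a static Go binary (image tags are illustrative):

cat > Dockerfile <<'EOF'
FROM golang:1.18 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /server .
FROM gcr.io/distroless/static
COPY --from=build /server /server
ENTRYPOINT ["/server"]
EOF
docker build -t myapp .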


"Distroless" containers are pretty cool for making deployment images. I feel like a better name could have been chosen, because ultimately you are relying on a distribution and how they operate unless you're building an image from scratch and copying in your self-compiled dependencies.

I build my own distroless-like images for personal use using Fedora and RHEL, though I do follow the ubi-micro[0] build steps and include a tiny bit of user space components to enable debugging.

[0] https://catalog.redhat.com/software/containers/ubi9-micro/61...


As an alternative to unikernels, which the other replies are talking about and which require special builds and might not work the same, you can also do something pretty simple:

Just run your program as the only process.

On a Linux host with no other software: no /bin/sh, nothing else in the filesystem.

Simple demo: https://github.com/tv42/alone


From what I gather, a unikernel is what you are searching for. Many exist; https://github.com/unikraft/unikraft and https://github.com/hermitcore/rusty-hermit are the ones that came up in my quick search.


IncludeOS was one such approach. Sadly the company behind it perished and it seems unmaintained.

https://includeos.org/


Another one: https://mirage.io/


Does this mean it’ll be faster?


No, I don't think so; in fact rootless containers can be slower due to user-level networking and overlay storage. The goal is more isolation and security.


Please please please let rke follow suit :)



