Userspace networking to dodge the kernel/userspace ping-pong RTT overhead? This looks like it has the potential to be awesome. At the very least, it's a great quick talk on how "kernelspace/userspace RTTs are not fast mkaaaay?", and it's exciting that some folks are trying to think outside the box on this.
So, questions: Am I correct in thinking this approach would be limited to advisory-only sorts of networking help, and cannot be used as a security boundary? It seems like anything running within a container's network namespace would still need a host interface exposed for its own virtualized packets, and obviously LD_PRELOAD hijinks remain freely ignorable by any program that decides not to go with the libc flow. Maybe it's possible to pick up the host interface first, then jump into the container's limited network namespace, so the other contained processes are left with no other option?
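To make the "freely ignorable" point concrete, here's a minimal sketch (assuming x86_64 Linux, purely for illustration) of a program that skips the libc wrapper entirely, so any socket() symbol interposed via LD_PRELOAD never gets a look in:

    /* LD_PRELOAD hooks libc symbols; a direct syscall never passes through
       them, which is why preloading alone isn't a security boundary */
    #include <sys/socket.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void) {
        /* bypasses any interposed socket() wrapper entirely */
        long fd = syscall(SYS_socket, AF_INET, SOCK_STREAM, 0);
        return fd < 0;
    }

A statically linked binary gets the same effect without even trying, since the dynamic linker is never involved at all.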
Alternatively, I wonder if this approach could also be plugged in via seccomp filters as well as LD_PRELOAD hooks? The docs at https://github.com/torvalds/linux/blob/5634347dee31373a8faf0... seem to suggest syscall capture and reroute might be capable of this. It would still incur the kernel/userspace bounce we were trying to avoid, but it would cut out the unnecessary trips through the host networking stack that we're going to ignore anyway... and more importantly, it would actually be strong enough to be relied upon as a security constraint.
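For the seccomp route, the general shape would be a BPF filter that traps the network syscalls so a userspace handler (via SIGSYS) can take over. A minimal sketch of that idea, not ZeroTier's code; a real filter would also validate seccomp_data.arch and cover the rest of the socket-related syscalls:

    /* seccomp-BPF sketch: trap connect() so a SIGSYS handler can reroute it
       into a userspace network stack; everything else passes through */
    #include <linux/filter.h>
    #include <linux/seccomp.h>
    #include <stddef.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>

    static int install_filter(void) {
        struct sock_filter filter[] = {
            /* load the syscall number */
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
            /* connect() -> deliver SIGSYS to the calling thread */
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_connect, 0, 1),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
            /* anything else runs normally */
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = sizeof(filter) / sizeof(filter[0]),
            .filter = filter,
        };
        /* lets an unprivileged process install the filter */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
            return -1;
        return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
    }

And unlike LD_PRELOAD, once a filter like this is installed the process can't shed it, which is what would make it usable as an actual security constraint.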
Good luck, ZeroTier folks! Looking forward to watching this continue to develop.
Depends on what kind of security boundary. You could allow only ZeroTier traffic, in which case the container lives only on the virtual net and no "real" traffic flows. You could also have the preloaded intercept library forbid any other traffic, isolating the process network-wise. It's a total socket API takeover.
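As a sketch of what that kind of takeover policy could look like inside a preloaded library, assuming a made-up 10.9.0.0/16 virtual subnet (illustrative only, not ZeroTier's actual intercept code, which reroutes calls into its userspace stack rather than just filtering them):

    /* hypothetical connect() interposer: refuse anything that isn't on the
       (made-up) 10.9.0.0/16 virtual net
       build: gcc -shared -fPIC -o intercept.so intercept.c -ldl
       use:   LD_PRELOAD=./intercept.so some_program */
    #define _GNU_SOURCE
    #include <arpa/inet.h>
    #include <dlfcn.h>
    #include <errno.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int connect(int fd, const struct sockaddr *addr, socklen_t len) {
        static int (*real_connect)(int, const struct sockaddr *, socklen_t);
        if (!real_connect)
            real_connect = (int (*)(int, const struct sockaddr *, socklen_t))
                               dlsym(RTLD_NEXT, "connect");

        if (addr->sa_family == AF_INET) {
            const struct sockaddr_in *sin = (const struct sockaddr_in *)addr;
            /* anything outside the virtual subnet is simply unreachable */
            if ((ntohl(sin->sin_addr.s_addr) & 0xffff0000u) != 0x0a090000u) {
                errno = ENETUNREACH;
                return -1;
            }
        }
        return real_connect(fd, addr, len);
    }

Same story for bind(), sendto() and friends if you want the isolation to be airtight, at least for processes that stay on the libc path.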
And yes, there are other mechanisms besides LD_PRELOAD. It can also be explicitly linked into a binary if you're willing to rebuild, or linked into libc, or linked into everything via the dynamic linker configuration files. The latter is how an entire container, as opposed to a single process, can be placed onto a virtual net.
This looks very cool indeed. I have a quick question unrelated to containers — is ZeroTier something you can run on servers to create a private, transparent cloud VPN?
For example, say I have a bunch of boxes on some cloud providers like Digital Ocean and Linode. I'd like for them to communicate securely — across data centers and providers — without having to set up SSL for _every_ individual app in the stack (Memcached, PostgreSQL, etc.). At the same time I'd like the boxes to talk to any open port among their peers, and not worry about having to configure iptables separately for every service the box is running. In effect, I want a private network layered on top of what the hosting provider has.
Is this what ZeroTier can do for me? If yes, are a lot of people using it this way? How's the performance? (I notice ZeroTier runs over UDP.) If not, what's the appropriate software?
The simplest solution of all, as far as I can tell, is to add a new virtual interface to the host (all hypervisors provide this functionality; or use macvlan on bare metal) and assign it to the container after obtaining layer 3 information about it (DHCP, static addressing, whatever). Then you don't have to worry about the complexity of overlay networks or NAT.
So what is, exactly, the current state of the art in container networking? To the best of my understanding all current solutions (including this one) create one big LAN where all containers can see each other, but I'm certainly no expert.
How would one go about creating a network of containers more in line with traditional physical networks, with virtual switches, routers, etc.?
The only good thing about OpenStack is that if you find yourself thinking it's the solution, you know with complete confidence that your problem lies elsewhere.
I find the way SmartOS does it pretty nice. Each container gets its own private, virtual NIC, which sits on top of one of the physical NICs or an overlay network, and has its own networking stack. You can even enable layer 2 or layer 3 spoofing protection if you're in a multi-tenant situation, or just want to be more secure.
The best way to think of the ZeroTier network is that it is a virtual switch. Each host on the network is connected to a port on the switch.
Since ZeroTier supports hosts connecting to multiple networks, you can set up a pretty sophisticated multi-tiered network architecture, no router needed.
One issue is that veth is slow, slower than VM networking.
In fact, since VMs now boot nearly instantly, there's an argument for taking Docker images and booting them as VMs.