LXC 1.0 Released

nl · on Feb 23, 2014

Can someone talk about LXC security in practice?

I've read [1], and it is excellent to have options, but I'd like guidance on what is actually usable and useful, especially via Docker.

Background: I want to be able to execute arbitrary code submitted from the web. At the moment I'm sandboxing it inside a ZeroVM[2] container, inside a Docker (LXC) container.

I'd like to lock down the Docker container more than the default (eg, disable outgoing network connections), but I don't really know where to start.

(And yes, I do find it strange that it was easier to get an entirely new, experimental and undocumented container running inside LXC than it was to work out how to configure LXC. But layers are good, right?)

[1] https://www.stgraber.org/2014/01/01/lxc-1-0-security-feature...

[2] http://zerovm.org/

justincormack · on Feb 23, 2014

ZeroVM is a perfectly correct answer; that is exactly what it is designed for. The security model is excellent, with far better guarantees than anything you will get from containers.

If you want another answer that works, well what kind of code is it? That does make some difference. I am assuming it is some scripting language (as I presume you are not compiling it for zerovm). And I assume it is not one with a trustable sandboxing model (I only really trust Lua, and that is with caveats).

You can lock down networking by not having any (assuming you use e.g. a pipe to communicate), or with iptables, or by using seccomp mode 2 to filter system calls. The third is the most general extra filtering, but you do need to know exactly what your container needs to do - the more minimal it is the better.
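The first option (no network at all, talk over a pipe) might look roughly like the sketch below. It assumes a Docker release that supports `--network none`, which is newer than the Docker version discussed in this thread, and it assumes the image has a plain `python` on its path rather than the ZeroVM setup described above. The image name in the usage note and both function names are made up for illustration.

```python
import subprocess


def docker_no_network_argv(image, command):
    """Build a `docker run` command line with networking disabled.

    --rm            remove the container when it exits
    -i              keep stdin open so code can be piped in
    --network none  give the container no network interfaces except loopback
    """
    return ["docker", "run", "--rm", "-i", "--network", "none", image] + list(command)


def run_untrusted(image, source, timeout=10):
    """Pipe `source` to `python -` in the container and capture its output.

    `python -` reads the program from stdin, so the only channel between
    host and container is the stdin/stdout pipe pair.
    """
    argv = docker_no_network_argv(image, ["python", "-"])
    return subprocess.run(
        argv, input=source.encode(), capture_output=True, timeout=timeout
    )
```

Something like `run_untrusted("sandbox-image", "print(1 + 1)")` would then run the snippet with no way to open outgoing connections, assuming an image of that (hypothetical) name exists.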

nl · on Feb 23, 2014

It's Python on ZeroVM.

I'm interested in the seccomp option, but it looks pretty intimidating to set up. I suspect I could do much of what I need with AppArmor, but I have no idea how to apply that to a single Docker container.

Edit: I also tried the MBox sandbox, but it doesn't work on Ubuntu.

viraptor · on Feb 23, 2014

You can always use Tomoyo instead of AppArmor (also available in ubuntu). It allows you to configure specific rules for each domain, where a domain can be "this app" just as well as "this app started by that script", so you can differentiate between them hopefully.

nl · on Feb 23, 2014

Tomoyo looks interesting (although it is yet-another-thing-doing-almost-the-same-thing).

justincormack · on Feb 23, 2014

seccomp is pretty intimidating, and the lxc config makes it even harder (numbers not names for the syscalls!). Conceptually it is fairly simple, and it is quite fun to play with, but you really need tests with very good code coverage (including error handling) to know which syscalls you need, and it will vary if you change any software potentially. There are audit tools though, you could give it a go.
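To make the "numbers not names" complaint concrete, here is a minimal sketch that turns syscall names into the version 1 policy file format that LXC 1.0 reads through `lxc.seccomp`: a first line `1`, a second line `whitelist`, then one syscall number per line. The name-to-number table is a tiny hand-written subset for x86_64 Linux and is only an assumption for illustration; a real policy needs the full table for your architecture and a much longer whitelist.

```python
# Hypothetical, deliberately tiny name-to-number table for x86_64 Linux.
# Numbers differ on other architectures.
X86_64_SYSCALLS = {
    "read": 0,
    "write": 1,
    "close": 3,
    "mmap": 9,
    "brk": 12,
    "exit": 60,
    "exit_group": 231,
}


def lxc_seccomp_v1(names, table=X86_64_SYSCALLS):
    """Return the text of a version 1 LXC seccomp whitelist for `names`."""
    unknown = sorted(set(names) - set(table))
    if unknown:
        # Refuse to guess: a silently skipped name would break the container.
        raise KeyError("no syscall number known for: " + ", ".join(unknown))
    numbers = sorted({table[name] for name in names})
    return "\n".join(["1", "whitelist"] + [str(n) for n in numbers]) + "\n"
```

Anything not listed is denied, which is why the coverage point matters so much: one missing number from an error path is enough to kill the process.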

AppArmor has a rather simple "deny network" rule, which might be a good starting point... but I haven't spent much time with it and am not sure how to apply it to one container either. Maybe apply it to the python in the container not the container itself? Might be easier.
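As a rough starting point, a profile built around that rule could be generated like this. It is a sketch, not a tested policy: the profile name is hypothetical, the bare `file,` rule assumes an AppArmor version that accepts it, and allowing all file access is far too permissive for real use. Loading it with `apparmor_parser -r` and attaching it with something like `aa-exec -p <name>` is also an assumption about the tooling available on your system.

```python
def apparmor_deny_network_profile(name):
    """Return the text of a minimal AppArmor profile that denies networking."""
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("profile name must be non-empty and contain no whitespace")
    lines = [
        "profile %s {" % name,
        "  file,          # allow all file access (too broad, tighten later)",
        "  deny network,  # refuse all network access",
        "}",
    ]
    return "\n".join(lines) + "\n"
```

The point of starting this way is that only the network rule is restrictive, so a failure is easy to attribute; file rules can then be narrowed one at a time.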

mjn · on Feb 23, 2014

This is one place I think Solaris (and Illumos) is still ahead. Not only do they have a full privilege system, but there is an easy to use CLI tool, ppriv(1), to control privileges on a per-process basis. You can start a process but drop its network privileges, or its file-write privileges (with some files possibly whitelisted), or its ability to spawn other processes, etc. There's also a "privilege debug" mode so if the process crashes as a result, you can figure out what prohibited stuff it was trying to do. That allows an approach of just dropping all privileges to start, and then whitelisting a few things it needs.

FreeBSD's 'capsicum' and Linux's 'seccomp' look like they can conceptually do the same thing, but afaict there isn't yet a good command-line interface to them that lets you drop privileges of unmodified binaries.

justincormack · on Feb 23, 2014

Capsicum is much more intuitive so I think it would be a lot less work to set up without tooling.

mjn · on Feb 23, 2014

For a simple case I could write a C wrapper that just drops privileges, but it'd be nice to have a more versatile CLI tool. Doing that in the general case, e.g. letting me specify options like "no network, no writing files except A and B, no reading files except files in this directory, no spawning processes", requires more or less porting something like ppriv(1) and its privilege-specification syntax to FreeBSD, or writing a workalike.
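For the simple case, a wrapper can be sketched in a few lines of Python instead of C. Note this is not the privilege model described above, and it is not seccomp or Capsicum either: it swaps in a plain POSIX resource limit (`RLIMIT_FSIZE`) as a crude stand-in for "no writing files". It cannot whitelist individual files, and it does nothing about the network or about reading. The function names are made up for illustration.

```python
import resource
import subprocess


def _forbid_file_writes():
    # Runs in the child after fork and before exec.
    # A file size limit of 0 makes any write that would grow a regular file
    # fail; pipes and sockets are unaffected, and empty files can still be
    # created.
    resource.setrlimit(resource.RLIMIT_FSIZE, (0, 0))


def run_limited(argv, timeout=10):
    """Run an unmodified program with file writes disabled via rlimits."""
    return subprocess.run(
        argv,
        preexec_fn=_forbid_file_writes,
        capture_output=True,
        timeout=timeout,
    )
```

Because stdout is a pipe, the program can still report results to the caller; it just cannot put data into files.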

nl · on Feb 23, 2014

"you really need tests with very good code coverage (including error handling) to know which syscalls you need, and it will vary if you change any software potentially"

This is where I run into trouble - given that I want to sandbox arbitrary code in theory I should be able to define what I want to allow, and then set it up. But the practice seems.. esoteric.
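One way to make the "which syscalls do I need" step less esoteric is to record a test run with strace and collect the names mechanically. The sketch below assumes the common strace text output, where each call starts a line as `name(`, optionally prefixed by `[pid N]` or a bare pid when following children. It only sees what the traced run actually exercised, so the coverage caveat still applies.

```python
import re

# Matches "name(" at the start of a line, allowing a "[pid 123] " or
# "123 " prefix. Signal lines ("--- SIG... ---"), exit lines ("+++ ... +++")
# and "<... name resumed>" continuations do not match.
_SYSCALL_LINE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def syscalls_in_trace(trace_text):
    """Return the sorted, de-duplicated syscall names seen in strace output."""
    names = set()
    for line in trace_text.splitlines():
        match = _SYSCALL_LINE.match(line)
        if match:
            names.add(match.group(1))
    return sorted(names)
```

The resulting list is a lower bound on the whitelist, not a definition of it: calls made only on error paths or by arbitrary submitted code will not show up.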

"you could give it a go."

I seem to end up doing that a lot, for every single thing I try in this area.

"Maybe apply it to the python in the container not the container itself?"

That would actually be Python-in-ZeroVM. But yeah, that's an interesting idea.

The other thing is that ZeroVM currently has no networking available via Python. So there is some protection there, too.

tinco · on Feb 23, 2014

ZeroVM is very interesting, but my intuition would be to put the Docker container inside the ZeroVM, instead of the ZeroVM inside the Docker container. Is my intuition weird? Perhaps I misunderstand ZeroVM.

My train of thought would be that you'd put the strongest isolation on the outside.

justincormack · on Feb 23, 2014

You can't do that. ZeroVM does not give you access to the system calls to create containers.

shykes · on Feb 23, 2014

Docker is evolving towards a generic "container engine" with swappable execution backends. In that architecture, lxc becomes one possible backend. ZeroVM, OpenVZ, kvm/qemu or plain chroot could be used as backends under the same management API.

nl · on Feb 23, 2014

In the case of ZeroVM it's hardly going to be a transparent change. You'll have to recompile all your code for one thing, and for another ZeroVM doesn't (currently) supply anything like the APIs a "normal" VM does.

jimmcslim · on Feb 23, 2014

Interesting, but at that point doesn't Docker just become libvirt almost?

cjbprime · on Feb 23, 2014

My understanding is that recent (13.10+) Ubuntu ships with an AppArmor profile that blocks known ways of escalating out of the container. So, LXC under Ubuntu is in better shape than LXC on other distros, where you can use the uevent_helper trick[1] to easily escape the container.

It sounds like your use case could also use the new unprivileged containers[2] feature, which is going to be a dramatic increase in security (since even if you somehow get access to run a command on the host container, it would not be as root).

[1]: http://blog.bofh.it/debian/id_413

[2]: https://www.stgraber.org/2014/01/17/lxc-1-0-unprivileged-con...

willvarfar · on Feb 23, 2014

Defence is best in depth.

Put all your login etc on a separate box. Have that box send code to the dedicated runner box to be run.

Boot your runner's host OS from CD and reboot each midnight too.

Plus the zerovm in docker and everything you already have. And look at firewalling and rescinding capabilities to further isolate the sandbox.

persepolist123 · on Feb 23, 2014

This is a nice blog post about isolating your container network stack using LXC:

http://l3net.wordpress.com/2013/08/25/debian-virtualization-...

zobzu · on Feb 23, 2014

layers of containers won't give you much. the main issue is the shared kernel: any bug there, any shared resource from there, and all containers have the same risk, no matter how many levels of nesting you have.

layers of vm's, lxc containers, zerovms, etc, will probably make the attacks harder, but management and speed will also suffer, and the security/usability trade off will probably not be adequate (given how fast vm's start anyway, if you want more isolation, you could just use a VM then)

ZeroVM is nice, but it requires porting, and of course, is not a silver bullet either.

pekk · on Feb 23, 2014

If you want security, just use a VM designed for isolation rather than something obscure or experimental or primarily aimed at other use cases.

nl · on Feb 23, 2014

Isn't isolation the primary use case of ZeroVM? Although I'll concede the "experimental" bit.

falconfunction · on Feb 23, 2014

Libvirt and virtsandbox might help?

nl · on Feb 23, 2014

Could you be more specific?

There is plenty of general advice around, but a complete lack of actionable steps to take.

groks · on Feb 23, 2014

http://sandbox.libvirt.org/quickstart/

Or even just:

http://danwalsh.livejournal.com/28545.html

...an selinux sandbox without the containers stuff.

jevinskie · on Feb 23, 2014

Can anybody give me pointers on setting up this scenario?

I would like to run my IRC bouncer inside a container. I want to make sure that this bouncer only makes outbound connections using an OpenVPN instance running outside the container. I think I need to wire up a tun/tap device to the container. The container also needs to somehow accept connections from my IRC client (directly, not through the VPN) while still concealing the connecting IP. Basically, I don't want my ZNC container or the IRC servers that it connects to to have access to sensitive IP addresses.

SEJeff · on Feb 23, 2014

Great news for docker and the greater container loving communities!

ucarion · on Feb 23, 2014

Hopefully a Docker 1.0 will follow soon too!

magnusgraviti · on Feb 23, 2014

The same thoughts here.

Docker 0.8 added a lot of new features and fixed a lot of bugs but it is still a work in progress. For example, you can't rename containers and have to recreate (remove and create) them.

and thank you guys for the release!

jvermillard · on Feb 23, 2014

and added some bugs ;) like I'm having a hard time "rm"-ing a stopped container since 0.8

shykes · on Feb 23, 2014

Hi, Docker 0.8.1 was released last week, it fixes several old bugs as well as all known regressions spotted in 0.8. Would you mind trying it out, and letting us know in an issue if we missed your bug?

Thanks!

jvermillard · on Feb 23, 2014

Thanks will try next week. I just need the time to patch the docker code for bigger /dev/shm

jacksoncage · on Feb 23, 2014

At last! This release is a significant milestone for us as it is the first release we consider to be production ready.

nkvoll · on Feb 23, 2014

Anyone know if it will make it into Ubuntu 14.04 LTS?

magnusgraviti · on Feb 23, 2014

Is flynn.io using LXC or something Docker-like?

xnxn · on Feb 23, 2014

Flynn uses Docker.