Docker 0.9: introducing execution drivers and libcontainer

WestCoastJustin · on March 10, 2014

This article highlights an interesting distinction between "containers" the idea and linux userspace tools, like LXC and lmctfy. I think there has been some confusion about what "linux containers" are, because LXC is actually an acronym for LinuX Containers, but they are not mutually exclusive. Containers, the idea, is the concept of isolation and resource control (on linux, using chroot, namespaces, cgroups, selinux, etc. [1]), whereas LXC is a set of userspace tools that implement the idea. Take, for example, databases the idea, and userspace software like mysql or postgres that implements the idea.

  idea                implementation tools/libs (userspace)
  -----------         -----------
  database            mysql
                      postgres

  linux containers    LXC (LinuX Containers)
                      libvert
                      libcontainer
                      lmctfy

ps. personally, I think Docker is headed in the right direction by developing libcontainer. Docker will have much more control over their own destiny, by using their own library, which implements the linux containers idea.

[1] http://en.wikipedia.org/wiki/Operating_system%E2%80%93level_...

Updated: s/theory/idea/

chubot · on March 10, 2014

This doesn't really make sense... the "theory" (or I would say "idea") is OS virtualization. Then you have various kernel features (namespace, cgroups, etc. -- the page mentions them). And then you have user space tools.

There is no "theory" of Linux containers. It's bound to the implementation of Linux. FreeBSD jails are another, earlier, completely separate implementation of the same idea.

Your analogy conflates the separate issues of abstraction vs implementation and kernel vs user space, which just confuses matters.

WestCoastJustin · on March 10, 2014

Maybe theory isn't the best word to describe what I mean. I've updated the posting s/theory/idea/. Basically, I want to illustrate the idea of something vs the implementation.

> There is no "theory" of Linux containers. It's bound to the implementation of Linux. FreeBSD jails are another, earlier, completely separate implementation of the same idea.

I think you have just hit upon my point re: FreeBSD jails are another, earlier, completely separate implementation of the same idea.

SEJeff · on March 10, 2014

Actually you're still a bit off. This is in reference to libcontainer vs lxc only.

libcontainer is a native golang library for accessing all of the linux namespace and cgroup features of the linux kernel.

lxc is a project that bundles the namespace and cgroup isolation features of the linux kernel into a simple set of command line utilities.

Neither is an "implementation of an idea". They are abstractions on top of the kernel features.
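To make "kernel features" concrete, here is a minimal Go sketch that asks the kernel for new UTS, PID and mount namespaces when starting a child process. It uses only the standard library's `syscall.SysProcAttr.Cloneflags`; it is not libcontainer's API, and the helper name `isolatedCommand` is made up for illustration. It is Linux-only and usually needs root.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// isolationFlags names the namespaces the child gets its own copy of.
// The choice of UTS + PID + mount is an assumption for this sketch.
const isolationFlags = syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS

// isolatedCommand (hypothetical helper) prepares, but does not start, a
// command that will be cloned into the namespaces listed above.
func isolatedCommand(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{Cloneflags: isolationFlags}
	return cmd
}

func main() {
	// Inside a new PID namespace the shell should see itself as pid 1.
	cmd := isolatedCommand("/bin/sh", "-c", "echo pid inside namespace: $$")
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "run failed (root is usually required):", err)
		os.Exit(1)
	}
}
```

Tools like lxc and libraries like libcontainer layer cgroups, networking, capabilities and filesystem setup on top of calls of this kind; none of that is shown here.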

Edit:

For the parent, s/Libvert/libvirt/. It is a virtualization abstraction library written by Dan Berrange and the Red Hat crew.

emmelaich · on March 11, 2014

I think you're on the right track and I've had the same thought.

The word I used is 'goal' or 'requirement'. Virtualisation is not an end in itself.

The point of virtualisation is isolation. It is not totally successful in that and it is certainly not the only way to get some isolation.

I've been meaning to put together a list of various approaches in a hierarchy of levels.

It would look something like

    * threads

    * processes

    * users

    * 'virtual' ips

    * chroot

    * jails

    * namespaces (same kernel)

    * kernels  (same machine)

    * hardware (same data centre)

These are not linear or orthogonal of course.

burke · on March 10, 2014

I find it really odd how they emphasize how this release is about stability and refinement, then go on to talk about how they're abstracting over LXC and developing their own bindings as -- not a replacement -- a configurable default.

This reeks of bad choices. If they really think libcontainer is the way to go, IMO they should commit to it fully and make it a 1.0. I don't see a reason that docker needs to support multiple paths to a handful of syscalls, especially at this fairly-immature point in its evolution.

shykes · on March 10, 2014

> I don't see a reason that docker needs to support multiple paths to a handful of syscalls, especially at this fairly-immature point in its evolution.

Adoption.

The priorities of Docker 1.0 are production quality and first class support of all major operating systems. Today, that means first-class support of lxc (because lots of Ubuntu users depend on specific lxc configurations) and first-class support of either systemd or libvirt (because lots of Red Hat users depend on specific systemd or libvirt configurations).

From the strict point of view of surface area and reliability, you're right, it would be better if we exposed a tiny core with no possible customization. But then nobody would use Docker in the first place and there would be no Docker ecosystem, making its quality irrelevant.

The way we resolve this is by 1) exposing lots of APIs to allow better customization, while 2) shrinking the core and making it more reliable. Over time this allows us to converge towards higher quality, without sacrificing adoption. It's a delicate balance, but a necessary one and ultimately I think everybody is better off.
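As a rough illustration of "a small core plus pluggable backends", here is a hedged Go sketch of a driver registry. The `Driver` interface, the `registry` type and the driver names are all hypothetical and are not taken from Docker's source; the point is only that the core talks to one interface while backends are swapped by name.

```go
package main

import (
	"fmt"
	"sort"
)

// Driver is a hypothetical execution-driver interface, not Docker's own.
type Driver interface {
	Name() string
	Run(argv []string) (string, error)
}

// registry maps a driver name to its implementation.
type registry map[string]Driver

// Register adds a driver under its own name, replacing any previous one.
func (r registry) Register(d Driver) { r[d.Name()] = d }

// Get looks a driver up by name and reports unknown names as errors.
func (r registry) Get(name string) (Driver, error) {
	d, ok := r[name]
	if !ok {
		return nil, fmt.Errorf("unknown execution driver %q", name)
	}
	return d, nil
}

// Names returns the registered driver names in sorted order.
func (r registry) Names() []string {
	names := make([]string, 0, len(r))
	for n := range r {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// fakeDriver stands in for a real backend; it only describes what it would do.
type fakeDriver struct{ name string }

func (f fakeDriver) Name() string { return f.name }
func (f fakeDriver) Run(argv []string) (string, error) {
	return fmt.Sprintf("[%s] would run %v", f.name, argv), nil
}

func main() {
	r := registry{}
	r.Register(fakeDriver{"native"}) // illustrative names only
	r.Register(fakeDriver{"lxc"})

	d, err := r.Get("lxc")
	if err != nil {
		fmt.Println(err)
		return
	}
	out, _ := d.Run([]string{"echo", "hello"})
	fmt.Println(out, "| available:", r.Names())
}
```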

quaunaut · on March 10, 2014

Docker has had a clearly outlined vision of 1.0 for quite some time, and the next major version (which is using libcontainer), 0.10, is going to be the first release candidate for v1.0. So you're not far off: they are committing to it, but there are other pieces required for it to hit their version of 1.0.

FooBarWidget · on March 10, 2014

Several weeks ago when baseimage-docker was released (from the article "Your Docker image might be broken without you knowing it", http://phusion.github.io/baseimage-docker/), many people pointed out that it shouldn't use SSH, but lxc-attach instead. I pointed out that Docker might not use LXC in the future, but people insisted on lxc-attach.

Now, Docker is no longer LXC only. I guess the use of SSH - as a means to login to the container for one-off debugging, inspection and administrative work - is still justified.

shykes · on March 10, 2014

There is a pull request underway which allows injecting new processes into an existing container using the public Docker API. That could be implemented by lxc-attach in the lxc driver, and by calling setns(2) natively in libcontainer.

That should offer a robust alternative to ssh.

FooBarWidget · on March 10, 2014

That is great to hear. Will it also be available as a 'docker' subcommand? The unavailability of such a subcommand is a minor pain right now.

shykes · on March 10, 2014

Yes, probably as docker exec.

While we're at it we will also fix the "pid 1" problem described in the phusion blog post (I would argue it's the most useful and constructive contribution in that blog post).

Specifically, when a process runs as pid=1, it can't be killed by a SIGKILL sent from inside its own PID namespace (a SIGKILL sent from an ancestor namespace, such as the host, is still delivered). Any other signal only affects it if it has explicitly installed a handler for that signal. This is enforced by the kernel to prevent /sbin/init from being accidentally killed. And it is enforced in all namespaces, so that /sbin/init can be run inside a container and behave the usual way. Unfortunately this means that the same rules apply to a regular app (say, a python script) when it runs as pid 1, even though it is not programmed to handle these rules.

In short, regular applications don't expect to be pid 1, and generally speaking they shouldn't. The future version of the libcontainer and lxc drivers should both fix that.
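One common way around the pid 1 rules is to run a tiny wrapper as pid 1 and let the real application be its child. The Go sketch below is an assumption about what such a wrapper could look like, not a description of how the Docker drivers will fix it: it installs handlers for SIGTERM and SIGINT (so they are actually delivered), forwards them to the child, reaps every exited child, and returns the main child's exit code. The function name `runAsInit` is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// runAsInit starts argv as a child, forwards SIGTERM/SIGINT to it, reaps
// every child that exits (including orphans re-parented to this process),
// and returns the main child's exit code.
func runAsInit(argv []string) (int, error) {
	if len(argv) == 0 {
		return 0, fmt.Errorf("no command given")
	}
	// Installing handlers is what makes these signals deliverable to pid 1.
	sigs := make(chan os.Signal, 8)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)
	defer signal.Stop(sigs)

	cmd := exec.Command(argv[0], argv[1:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Start(); err != nil {
		return 0, err
	}
	child := cmd.Process.Pid

	// Forward signals until the main child has been reaped.
	done := make(chan struct{})
	defer close(done)
	go func() {
		for {
			select {
			case s := <-sigs:
				_ = cmd.Process.Signal(s)
			case <-done:
				return
			}
		}
	}()

	// Wait for any child; keep going until the main one is collected.
	for {
		var ws syscall.WaitStatus
		pid, err := syscall.Wait4(-1, &ws, 0, nil)
		if err == syscall.EINTR {
			continue
		}
		if err != nil {
			return 0, err
		}
		if pid != child {
			continue // an inherited orphan; reaping it was all that was needed
		}
		if ws.Signaled() {
			return 128 + int(ws.Signal()), nil // shell-style code for death by signal
		}
		return ws.ExitStatus(), nil
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: miniinit COMMAND [ARGS...]")
		os.Exit(2)
	}
	code, err := runAsInit(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "miniinit:", err)
		os.Exit(1)
	}
	os.Exit(code)
}
```

With a wrapper like this as the container's entry point, the application runs as an ordinary pid and keeps the default signal behaviour it was written for.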

philips · on March 10, 2014

There is a tool called nsenter which is part of util-linux which can be used to enter the namespace of another PID. You can find the manpage here: http://man7.org/linux/man-pages/man1/nsenter.1.html

This can be used for docker, lxc, libvirt-lxc or nspawn.

FooBarWidget · on March 10, 2014

That only works for the Docker backends that are based on Linux namespaces. What about backends like OpenVZ or FreeBSD jails? Even if I limit the scope to Linux namespaces, for arbitrary shell commands to work properly, I have to know exactly which namespaces Docker uses, which may be version-specific or configuration-specific.
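On Linux, which namespaces a process lives in can be inspected rather than guessed: each entry under /proc/PID/ns is a symlink whose target looks like `pid:[4026531836]`, and two processes share a namespace when the inode numbers match. The Go sketch below, with hypothetical helper names, lists the namespaces in which another process differs from the current one. Reading another user's /proc entries typically requires root.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strconv"
	"strings"
)

// parseNsLink splits a link target such as "pid:[4026531836]" into the
// namespace type and its inode number.
func parseNsLink(target string) (string, uint64, error) {
	open := strings.Index(target, ":[")
	if open < 0 || !strings.HasSuffix(target, "]") {
		return "", 0, fmt.Errorf("unexpected namespace link %q", target)
	}
	inode, err := strconv.ParseUint(target[open+2:len(target)-1], 10, 64)
	if err != nil {
		return "", 0, err
	}
	return target[:open], inode, nil
}

// namespacesOf maps each entry name in /proc/<pid>/ns to its inode number.
func namespacesOf(pid int) (map[string]uint64, error) {
	dir := filepath.Join("/proc", strconv.Itoa(pid), "ns")
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	out := make(map[string]uint64, len(entries))
	for _, e := range entries {
		target, err := os.Readlink(filepath.Join(dir, e.Name()))
		if err != nil {
			return nil, err
		}
		_, inode, err := parseNsLink(target)
		if err != nil {
			return nil, err
		}
		out[e.Name()] = inode
	}
	return out, nil
}

// differing returns, sorted, the entries present in both maps whose inodes differ.
func differing(a, b map[string]uint64) []string {
	var names []string
	for name, inode := range a {
		if other, ok := b[name]; ok && other != inode {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: nsdiff PID")
		os.Exit(2)
	}
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "nsdiff: bad pid:", err)
		os.Exit(2)
	}
	self, err := namespacesOf(os.Getpid())
	if err == nil {
		var other map[string]uint64
		if other, err = namespacesOf(pid); err == nil {
			fmt.Println("namespaces that differ from ours:", differing(self, other))
			return
		}
	}
	fmt.Fprintln(os.Stderr, "nsdiff:", err)
	os.Exit(1)
}
```

This only answers the question for namespace-based backends; it says nothing about OpenVZ or jails.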

crosbymichael · on March 10, 2014

There is also a little tool in libcontainer that I use for testing and prototyping called nsinit. You can build and use it to spawn processes inside existing containers.

https://asciinema.org/a/8090

Like shykes said, this functionality should make it into docker soon.

songgao · on March 10, 2014

> We have developed libcontainer in the hope that other projects will reuse it.

Any plans on making that into a separate repo? I thought that would be more community friendly and easier to evolve.

In addition, docker/pkg/netlink [0] seems to be a package that others can use as well. Also, any plans on adding IPv6 support to that [1], or are pull requests for that welcome?

[0] https://github.com/dotcloud/docker/tree/master/pkg/netlink

[1] https://github.com/dotcloud/docker/blob/master/pkg/netlink/n...

jamtur01 · on March 10, 2014

Currently we don't have any plans to make it a separate repository. This is largely because of packaging issues - most of the distros currently struggle with packaging Go components. We're working closely with them to make this better but in the meantime libcontainer lives in the pkg/ directory to make it easier for us to ship Docker.

You can most certainly use netlink in the same way and we'd love a PR for IPv6 support for it!

voidlogic · on March 10, 2014

Can you explain this? How is it hard to package a statically linked binary into an rpm or deb? How can it be harder than a C program? Are you talking about creating -dev packages?

crosbymichael · on March 10, 2014

You can still use these packages with the slightly longer import path until we find a solution that works for everyone.

    import "github.com/dotcloud/docker/pkg/libcontainer"
    import "github.com/dotcloud/docker/pkg/netlink"

smcleod · on March 10, 2014

Yikes, I can't believe this release didn't include a fix for #4068, it's quite serious. https://github.com/dotcloud/docker/issues/4068

shykes · on March 10, 2014

We're on it. Releases are time-based so whatever is merged at the beginning of the month gets released. For bugfixes we maintain a backport branch and can release 0.9.1, 0.9.2 etc. as needed.

smcleod · on March 10, 2014

Excellent, thanks for the update!

vlowther · on March 10, 2014

Reporter here. I just have everyone use the devicemapper backend for everything.

vikrantrathore · on March 11, 2014

Actually, I was expecting docker to re-use lxc-go to build the driver for LXC instead of using the LXC userspace tools; this would have helped the linux containers project as well. But I never thought they should write a new library from the ground up.

My feeling was that since LXC is already 1.0 and almost production ready, it would have been better to integrate its features into docker to make it production ready, with unprivileged containers as one of the major enhancements.

Now libcontainer, as it is, re-invents the wheel: it redoes what LXC already does and will need a long path towards stability, similar to LXC's path to 1.0.

Moreover, the docker team should have spent more time building other drivers similar to the LXC one, for OpenVZ or Solaris and BSD zones. Anyway, I am not contributing any code, so I do not know the priorities of the docker team. But this seems more sensible to me. Just my two cents.

Aqueous · on March 10, 2014

The docker driver system is an abstraction that could make docker truly cross-platform, correct? Are we going to start seeing a Docker that can deploy containers on MacOS X/Windows soon, on bare metal, without a virtualization environment?

shykes · on March 10, 2014

Yes, it's definitely a first step in that direction :)

sturadnidge · on March 10, 2014

But please, don't make us wait for 1.0 until you have first class Windows support (your 1.0 goals do say 'all major operating systems')...

jamtur01 · on March 10, 2014

The first step there is boot2docker - http://boot2docker.io/ - for OSX and Windows. I don't think Docker 1.0 will have native server support for either operating system but I'd be very surprised if it wasn't possible in a near term subsequent release.

voltagex_ · on March 10, 2014

I'm not sure what tech you'd use on Windows for that - there are no container-like concepts available on Windows as far as I know.

derefr · on March 11, 2014

Windows is kinda-sorta halfway there through a combination of window stations, integrity levels, and file and registry virtualization. I'm guessing any result in this space would inevitably look pretty similar to ThinApp (http://www.vmware.com/products/thinapp/features.html), which is probably as close as you can get.

nl · on March 11, 2014

There's virtualization, of course. Client Hyper-V apparently comes with Windows 8 now[1].

Obviously it's a lot heavier than a container-like system though.

[1] http://www.techrepublic.com/blog/windows-and-office/get-star...

mooism2 · on March 11, 2014

Docker newbie here. (Actually, not even a newbie: Docker's on my to-play-with list but I've not got round to it yet.)

> Docker out of the box can now manipulate namespaces, control groups, capabilities, apparmor profiles, network interfaces and firewalling rules – all in a consistent and predictable way, and without depending on LXC or any other userland package. This drastically reduces the number of moving parts, and insulates Docker from the side-effects introduced across versions and distributions of LXC.

Does this mean Docker is no longer tied to Ubuntu, merely to any sufficiently modern Linux kernel? The install instructions still recommend Ubuntu.

nl · on March 11, 2014

Docker has been usable on non-Ubuntu Linux for a while now (eg [1]). Non-Ubuntu platforms do seem to get exercised less than Ubuntu though.

[1] http://docs.docker.io/en/latest/installation/rhel/

ChuckMcM · on March 10, 2014

Assuming the docker team is listening in, consider making your RC release 0.98 or 0.99 rather than 0.10. I realize that you're going for "dot ten" which comes after "dot nine" but waaaaaay too many years of seeing .1 as being .8 less than .9 strangely biases my opinion of .10 as being less good than 0.9.

Nice job on the container work though, that will be really useful.

ominous_prime · on March 10, 2014

After waaaaay too many years using software, you regularly don't see 0.9, 0.10, 0.11, and so on? Semantic versioning has never been decimal based.
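The ordering being argued about can be stated in a few lines of Go: compare dotted versions one numeric component at a time rather than as decimal fractions. This is a minimal sketch with a hypothetical `compareVersions` helper; it ignores pre-release tags and other details of the Semantic Versioning spec.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// component returns the i-th numeric component, treating missing ones as 0.
func component(parts []string, i int) (int, error) {
	if i >= len(parts) {
		return 0, nil
	}
	return strconv.Atoi(parts[i])
}

// compareVersions returns -1, 0 or 1 when a sorts before, equal to, or
// after b, comparing each dot-separated component as an integer.
func compareVersions(a, b string) (int, error) {
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	n := len(as)
	if len(bs) > n {
		n = len(bs)
	}
	for i := 0; i < n; i++ {
		x, err := component(as, i)
		if err != nil {
			return 0, fmt.Errorf("bad version %q: %v", a, err)
		}
		y, err := component(bs, i)
		if err != nil {
			return 0, fmt.Errorf("bad version %q: %v", b, err)
		}
		if x < y {
			return -1, nil
		}
		if x > y {
			return 1, nil
		}
	}
	return 0, nil
}

func main() {
	c, err := compareVersions("0.10", "0.9")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("0.10 vs 0.9:", c) // prints "0.10 vs 0.9: 1", i.e. 0.10 is newer
}
```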

shykes · on March 10, 2014

Hi - we are listening :)

I agree 0.10 might be confusing to people... But we had a discussion about it on #docker-dev and the conclusion was that, when in doubt, following the model set by Linux couldn't be that bad. The most important thing, I think, is to be consistent and stay the course over many releases. That should do more to reassure people than any particular choice of versioning scheme.

staunch · on March 10, 2014

Nice work! It's very cool that you're breaking out components into nice standalone Go libraries.

shmerl · on March 11, 2014

Can Docker be a solution for GOG to offer long term Linux support? Their concern was about the difficulty of offering long term support when distros are moving at different paces. Maybe Docker can help them isolate the environment while having little performance overhead?

georgebarnett · on March 10, 2014

Congrats on getting closer to 1.0!

Am I correct in thinking that libcontainer is not root safe (where LXC 1.0 was)? If so, could you tell me when you think root safety will be re-introduced?

I realise that I can use LXC, however I'm interested in libcontainer.

craigyk · on March 10, 2014

How does one prevent docker from insisting on doing its own DHCP? I'd like to just have it add an interface to a bridge and have the container get an address from the network DHCP server.

mattparlane · on March 10, 2014

So, the current Debian/Ubuntu package is called docker-lxc -- is that name going to change now that Docker doesn't (necessarily) have anything to do with LXC?

jamtur01 · on March 10, 2014

Yes. That's a change we've got planned and we're working with the Ubuntu/Debian packagers.

zobzu · on March 10, 2014

the important part here is that they no longer really use LXC.

vikrantrathore · on March 11, 2014

That's the main issue: they don't use LXC but want LXC-like functionality, and they are re-inventing the wheel with libcontainer. It would have been better to keep the docker core as small as possible and move the LXC driver code into a module. Instead of using the LXC userspace tools, they could have focused on rebuilding the driver using lxc-go. This would have helped the LXC project as well, and the docker team could then focus on incorporating production quality changes from LXC into Docker, like unprivileged containers.

The real issue is abandoning production quality LXC 1.0 code and re-writing it from scratch. This is NIH syndrome.