> docker is just slightly further along the spectrum than Xen: instead of using a special guest kernel, you use the host kernel. Instead of paravirtualisation ops, you use a combination of cgroups and lxc containers. Without the direct virtualisation of hardware devices, you don’t need the various special drivers to get performance, but there are also fewer security guarantees.
No. People seem to be really confused about this. Docker is a container standard, not a virtualization system. The thing you can download on the Docker website, besides being a toolchain for creating containers, is a reference target for containers to deploy to, which just happens to (currently!) use cgroups and lxc and aufs overlays.
The point of Docker is to create one thing (basically a container file format + some metadata stored in a "container registry") that you can deploy to all sorts of different places, without changing what's inside it. In fact, one of the goals of Docker is precisely that the reference target (cgroups+lxc+aufs) could be entirely swapped out for something else in the next version of the Docker toolchain, and none of your containers would have to change.
Docker containers will be deployable as Vagrant boxes, Xen VMs, AWS instances, whatever. Targets that have an overlay filesystem will use overlays; targets that don't will build a flattened filesystem image from the container stack. Targets that have paravirtualization will rely on their host kernel; targets that don't will rely on the kernel in the container's base image. And so forth. It isn't possible yet, but that's the entire goal here.
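To make the "flattened filesystem image" case concrete, here is a rough sketch using the current toolchain (the image and file names are just placeholders): docker export writes out a container's merged filesystem as one plain tarball, with no overlay filesystem needed at the target.

    # create a throwaway container from a layered image, then flatten it
    CID=$(docker run -d ubuntu /bin/true)
    docker export "$CID" > flat-rootfs.tar   # one plain tarball, no aufs layers required
    # other tooling can then turn flat-rootfs.tar into a VM disk image, an AMI, and so on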
(Disclaimer: I wrote the blog post a little while ago, and my thoughts on this have changed a little bit since then. Surprised to see it pop up here...)
I think the confusion here is two-fold: sure, many people don't really grasp what the point of the project is in its entirety, but I also think the aim of the project itself is a bit confused.
The comment I made that you're referring to was specifically about the runtime: that's the bit of the paragraph at the start you chopped out. Yes, the runtime could change in the future, but the comment was about the existing runtime.
The 'container standard' thing I think is potentially interesting, but actually, I don't think it buys much. As a set of tools, it is substantially weaker than the existing system development/spinning tools. And sure, getting rid of overlays might be possible - but then, what's the point? If you're going to flatten out trees, you may as well build the image properly in the first place.
The point of the layered containers has nothing to do with how they're run; it has to do with how they're developed. By splitting the OS from the runtime from the application, each one can be updated when the layer above it needs it to be, and is otherwise fixed to a binary-identical image. Then, new releases of higher-level things (apps) can target the old versions of lower-level things (runtimes, OSes), knowing they will be literally, bit-for-bit, the same thing their other releases are using.
This is the guarantee Heroku makes, for example: updates to their servers will never break your app, because although the packages making up their infrastructure might update, the packages making up your container's base image are frozen in time unless you switch out your container's "stack" (base image) for a new one.
Having a frozen base OS image, and then a frozen runtime on top of that, allows for perfect repeatability in your deploy process. Once you've got a tested-and-working runtime image, that references a tested-and-working OS image, you just stop touching them altogether; you keep the container-ID of that runtime image fixed in production, and deploy your new app releases on top of it.
One neat side-effect of this: if containers have parent-images in common, those parent-images can be cached at the target. If all the containers running on some server use the same common base-image, that base image only needs to be downloaded to the server once. The second-through-Nth time, the container only grabs what's different--the tiny "app" part of the image--and then composes it with everything into a running container.
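To make the layering concrete, here is a rough sketch of the split being described, with the runtime and the app as separate images (all names are made up); only the thin app layer changes between releases:

    mkdir -p runtime app

    # runtime image: built once, tested, then frozen and referenced by tag/id
    cat > runtime/Dockerfile <<'EOF'
    FROM ubuntu
    RUN apt-get update && apt-get install -y python
    EOF
    docker build -t myorg/runtime runtime/

    # app image: the only layer that changes per release, built on the frozen runtime
    cat > app/Dockerfile <<'EOF'
    FROM myorg/runtime
    ADD . /app
    CMD python /app/server.py
    EOF
    docker build -t myorg/app app/

A host that already has myorg/runtime cached only needs to pull the small app layer.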
I disagree entirely, tbh; the only advantage of layering I see is precisely in the runtime: it makes the layers that change most often very slim and easy to deploy.
Repeatability is great, but who doesn't have that already? Are people really building new OS images for every deploy? I don't believe that for a second, and I can't think of many tools off the top of my head that don't have that baked in right from the start. That's the whole point of package management.
> Repeatability is great, but who doesn't have that already?
In my experience: Pretty much nobody does fully repeatable builds. Most people have a hodgepodge of "mostly similar" OS images that sometimes get updated in production, possibly in contradictory ways.
You'll find the odd exception, but outside of large-ish companies they are few and far between, largely because while it certainly can be done, the toolchains people work with are cumbersome enough that for small teams it gets terribly tempting to e.g. "apt-get install" something on a live box rather than update their images and deploy.
> That's the whole point of package management.
Only if you disable it in production and only ever use the package management tools to update OS images.
"Repeatability is great, but who doesn't have that already?"
Almost everyone. I think this is the heart of the misunderstanding right there.
99% of developers don't have access to a repeatable deployment system. When they do, it severely limits what they can deploy, and where.
You mention in your article that the "secret sauce" of docker is "not that clever". The point is not to be clever but to solve a problem. The fundamental technology for solving repeatable deployment has been available for years, and what do those in the know have to show for it? We're all still reinventing the wheel for every single damn project, wasting huge amounts of energy in the process.
Package management is entirely non-repeatable in almost every incarnation. DEB, RPM, even Slackware tarballs, all run scripts on install. There are shunts of existing files and directories, interactive prompts, EULAs to agree to (try installing mscorefonts on Ubuntu), packages that immediately start services that generate files on first run that the service expects to be there from then on, packages that expect other packages to not have been installed (samba3 doesn't like samba4 much)--and so forth. Installing a package can fail, or produce widely-varying results from machine to machine.
"Configuration management", ala Chef/Puppet, is no better--it tries to manipulate a system from an unknown state into a known state, without having an awareness of the full set of interactions that the unknown state can have on the known state. (For example, deploy Apache with default setup via Chef, on a server that already had Nginx installed manually. What do you get? Who knows!)
You'd think that, say, running the OS install media from scratch on each "container-up", and then running a script to install a preset list of packages with hardcoded versions, might be enough--but nope, OS install media is absolutely non-deterministic, just like everything else. The installer could decide from the container's IP address that it should talk to a different package server (oh hey I'm in Canada let's use ca.archive.ubuntu.com!) and then find itself unable to get past the deploy-infrastructure's firewall-whitelist.
In short--anything that relies upon, or can make a decision based upon, information that could be different from container to container (like the container's IP address, for example) isn't guaranteed to produce binary-identical results at the target. You only get that by running through whatever imperative process spits out all these files once--and then freezing the results into a declarative container.
So, what does work? Xen snapshots. Amazon AMIs. Vagrant images. All of these are declarative. And all of these are target formats for Docker. Docker is a vendor-neutral thing which you will turn into these, along with turning it into its current LXC+aufs form.
And note, by design, the running images of the "final product" will be leaf-nodes; you won't touch them or modify them or SSH into them, you won't base new containers on them; you'll just spin them up and then down again, like on-demand EC2 instances. Docker is not for doing fancy things with the running final products in their target format. Docker, by itself, is not for production at all! Once you've deployed a Docker container as a Xen snapshot or an AMI or whatever, it's done; some other infrastructure takes care of running the target-format containers.
So what's all that junk in the Docker toolchain? Docker, as a standard, is an intermediary format that makes it easy for developers to build these vendor-neutral images. The reason you can start containers, stop them, freeze them, and then fork new containers from them, is entirely to do with developing new container images, and not at all to do with deploying containers in production. It's about re-using common parent images, by reference, as easily as we currently stick a Github repo in our Gemfiles.
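For what it's worth, that development loop looks roughly like this with the current tools (the image names are made up): start a container from a parent image, make a change, and commit the result as a new image that later containers can be built on.

    # run a command in a fresh container based on a parent image
    CID=$(docker run -d ubuntu /bin/sh -c "apt-get update && apt-get install -y curl")
    docker wait "$CID"                        # block until the install has finished
    docker commit "$CID" myorg/base-tools     # freeze the result as a new, reusable image
    docker run myorg/base-tools curl --version   # new containers can now start from it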
> Docker is not for doing fancy things with the running final products in their target format. Docker, by itself, is not for production at all! Once you've deployed a Docker container as a Xen snapshot or an AMI or whatever, it's done; some other infrastructure takes care of running the target-format containers.
This is something very interesting and under-communicated. I've always assumed Docker is run on the production server and is used to pull updated images and spawn containers. But you're suggesting one uses a custom toolchain to make an image out of the container filesystem and the LXC template, and then deploys that image?
Both are possible. And since docker itself is not yet production-ready, exporting docker containers to "inert" server images (for example AMIs) is a good stopgap which allows you to use docker for dev and test.
But that is not the ideal workflow. The ideal workflow is to run docker on all your machines, from development to production, and move containers across them. If you don't do that, you will miss out on a big part of the value of docker. To name a few: identical images in dev and prod; lightweight deployment; and a toolchain that is less dependent on a particular infrastructure.
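Concretely, that workflow is just build, push, pull and run (the repository names are made up):

    # on the build machine
    docker build -t myorg/app .
    docker push myorg/app      # only layers the registry doesn't already have get uploaded

    # on each production host
    docker pull myorg/app
    docker run -d myorg/app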
Take a look at the latest blog post, which should address the confusion you mention: http://blog.docker.io/2013/08/getting-to-docker-1-0/. The blog addresses the direction that Docker is headed in and who the core audience of the tool should be.
Ever since I first read about Docker I've wondered: how do you deal with application data and updates of containers? Let's say you package everything you need for your web application together as one container: web server, database server and application server. Where would the data that the users of that web application generate go?
In the container? That would mean the data goes away when the container gets updated (if you just replace the old container with a new one). Or do you just never replace the whole container, but only update what's inside? Or would you mount a device of the host? Does the container get access to device nodes of the parent? I would assume not. Or can you provide a container with something it can mount? Or are you stuck with some network-based storage solution? That would rule out running databases in containers once the load rises.
Yes. I could just read up on all of this, but I have a feeling that other people have the same question, so by asking here, I might help them too.
I think this issue somewhat compromises Dockerfiles as a means of distributing applications. With a VM, you can supply something in a minimally configured, plug-and-play state, and then leave it to the user to set it up. With Docker, it's a little bit too painful to configure things after running an image. I'm not sure what the answer to this is.
Which is it: Minimally configured, or plug-and-play?
Either you'll be setting up your containers so that you can run something like Chef or Puppet against them, or you'll be setting up your containers to be fully configured apart from the connection details to other components.
That last bit is down to service orchestration, which is outside the scope of Docker as far as I understand it. You can "roll your own" easily enough: Write a "/usr/local/bin/add-relation" script that takes a function and a list of arguments to configure that function.
Write a script that reads a config file that defines the relationships between container types (e.g. your MySQL container and your Web container), and triggers execution of those add-relation scripts. E.g. your Web container gets called with "add-relation Mysql --ip [foo] --username [bar] --password [baz]" and the version of the script in that container knows how to add users to MySQL.
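A minimal sketch of what one version of such a script could look like inside the Web container (the paths and options are purely illustrative; the MySQL container's version would instead create the user, and so on):

    #!/bin/sh
    # /usr/local/bin/add-relation -- wire this container up to another one.
    # Example: add-relation Mysql --ip 10.0.0.5 --username app --password secret
    RELATION="$1"; shift
    case "$RELATION" in
      Mysql)
        while [ $# -gt 0 ]; do
          case "$1" in
            --ip)       DB_HOST="$2"; shift 2 ;;
            --username) DB_USER="$2"; shift 2 ;;
            --password) DB_PASS="$2"; shift 2 ;;
            *)          shift ;;
          esac
        done
        # write the connection details where this container's app expects to find them
        printf 'host=%s\nuser=%s\npassword=%s\n' \
          "$DB_HOST" "$DB_USER" "$DB_PASS" > /etc/myapp/database.conf
        ;;
      *)
        echo "unknown relation: $RELATION" >&2
        exit 1
        ;;
    esac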
That is very roughly the approach that Juju (Ubuntu's orchestration system) takes.
The point is to split all configuration details into three classes: those that can be set statically at build time; those that can be decided on first boot (e.g. regenerating an SSH host key, getting an IP address, etc.); and finally those that need to be set dynamically based on what other containers are spun up.
The latter have no place in the Dockerfiles or images, but scripts or tools that set these config details based on input from some external source do.
It may sound painful, but only if you do it to a handful of VMs that rarely change. The moment you need to manage dozens, or need to spin VMs up and down regularly, taking the effort to set up proper orchestration, whichever method you prefer, wins hands down over having a user set things up (I can guarantee with near 100% certainty that said user will not set up the different VMs consistently).
Think of it like an EC2 instance with ephemeral storage. You need an external storage service provided by the host (like EBS and S3 are for EC2), that the instances either talk to over the network (like S3), or which the host mounts into the container (like EBS.) Both are possible options with the current Docker runtime.
Yes. And we all know how well EBS works for consistent database performance. That's why I said "... would rule out running databases in containers ..."
I don't think so, but bind mounts make that irrelevant for most uses. If you need bare-metal device access, then you have very specialised needs that apply to a very tiny fringe set of users (as an example, we can reach 1GB/s reads from our SSD RAID arrays on some of our containerised database servers without resorting to raw device access).
I don't know if they meant GB or Gb, but for the record, I didn't see any difference in disk performance between native and containerized apps. In that case, that was pulling ~900 MB/s sequential reads from a RAID10 array of 8 old 7k RPM HDDs. This is not surprising, as the code path for block I/O is exactly the same for native and containerized processes.
No, I don't think so. This is prevented by kernel namespaces and default privileges of LXC in Docker. That doesn't mean you can't bind mount a directory from your host into your container though; there's also the concept of volumes.
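A quick sketch of both options, assuming a runtime recent enough to support host-directory bind mounts via -v (the paths and image names are made up):

    # a volume: a directory kept outside the image's layer stack
    docker run -d -v /var/lib/mysql myorg/mysql
    # a bind mount: a directory from the host exposed inside the container
    docker run -d -v /srv/appdata:/data myorg/app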
I'm quite excited seeing the open source community putting so much work into Linux containers. For the most part they're a lot better than true virtualization. We've seen in the past what you guys can do with Puppet as a configuration system.
I just wish lxc was as secure as Solaris zones. Since containers are not secure at all, they definitely won't be used for shared hosting. The team seems to be working on it, but it will probably take a few years to get it secure enough:
I feel like you're just spreading FUD here. This was definitely true a few years ago, but "Citation Needed" applies. The worst thing I found in the article you linked is that guests use the same kernel as the host, so if the host kernel is vulnerable, it will still be vulnerable from the guest...
Then it goes on to say "we have seccomp2 to lower the dimensions of attack surface." It does not sound to me like your citation agrees with what you said at all.
I just hope you've read the top-rated comment, where it's explained that containers are not virtualization, and they solve different problems.
IF you have root on the container, you have root on the host. This is a HUGE difference and probably one of the main reasons LXC isn't in huge use for VPS.
This is, btw, different than Solaris Zones, which give you a complete new user management for each container. They're very isolated. Zones have had some exploits to get out of the Zone, but they're pretty secure. LXC has started moving towards a more secure design but it will take years (IMO) to get LXC actually in production for _shared_ hosting.
As of Linux kernel 3.1.5, LXC is usable for isolating your own private workloads from one another. It is not yet ready to isolate potentially malicious users from one another or the host system. For a more mature containers solution that is appropriate for hosting environments, see OpenVZ.
"Containers are not for security", he said, because root inside the container can always escape, so the container gets wrapped in SELinux to restrict it ... A number of steps have been taken to try to prevent root from breaking out of the container, but there is more to be done. Both mount and mknod will fail inside the container for example. These containers are not as secure as full virtualization, Walsh said, but they are much easier to manage than handling the multiple full operating systems that virtualization requires. For many use cases, secure containers may be the right fit.
> IF you have root on the container, you have root on the host.
This is only true on badly configured systems. If you run some kind of public shared hosting (like Heroku, dotCloud, etc.) you probably slap some extra security on top of it. For instance, dotCloud uses GRSEC, limits root access, and uses kernel capabilities.
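As a sketch of what the "kernel capabilities" part can mean in practice, assuming a plain LXC setup (the container name and the exact capability set here are illustrative, not dotCloud's actual configuration):

    # append capability-dropping directives to the container's LXC config
    cat >> /var/lib/lxc/mycontainer/config <<'EOF'
    lxc.cap.drop = sys_module sys_rawio sys_time mknod audit_control audit_write
    lxc.cgroup.devices.deny = a
    EOF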
It won't take years to get LXC in production for shared hosting: it has been in production for shared hosting for years -- but by people who (more or less) knew what they were doing.
Agreed, "out-of-the-box LXC" is probably not that secure; which is probably why many people won't deploy it. And I can't blame them. Any technology generally starts being usable (or usable safely) only for expert users, then progressively gets more industrialized and ready to use for a broader audience. It doesn't mean that the technology is not mature.
Also, the user separation that you mention has been implemented in the Linux kernel for a while[1]; it's called "user namespace", and even if the default LXC userland tools do not make use of it at this point, it's here.
I do know those things. It doesn't change the fact that you can _easily_ break out of a container and compromise neighboring containers, no matter how much hardening you implement on the system.
Who is using LXC in a _shared_ hosting environment?
You've cited kernel 3.1.5 which was literally a couple of years ago. As for the rest of your resources, they are LXC without Docker. Docker is the special sauce that makes MongoDB WebScale.
No, really... if containers are not for security, namespaces and cgroups are. I won't pretend to know what SELinux does, or exactly how all of this works, and Docker does not purport to be production-ready, but I would think that sharing the box with someone else would be the fastest way to find out whether it can be broken.
This is exactly what I'm trying to tell people here on HN. A few weeks back there were discussions about SmartOS, then Zones, Containers, Virtualization, Para-Virt. etc. etc..
Most people here on HN had very little knowledge about Zones/Containers a few months ago. Right now people seem to think Containers are just as good as Virtualization (and can replace it) but with better IO performance due to sharing one kernel. The problem is: it is _not_ going to replace Virtualization any time soon, since the security is missing. It's very very easy to break out of a Container, heck even a Zone.
This is exactly what I'm trying to preach. You can put different users on Containers (which is what we do with ~1000 users) but you can't give them root, or you've compromised the entire host.
The first link to suse.com should be valid enough. If you want to know the details about security and if it can be done, I'd suggest mailing the lxc user list.
I'm just trying to debunk this myth that Containers are just super fast and easy Virtual Machines. They are not (at least not yet).
What makes me excited about Docker’s diff-based filesystem stack is that it potentially allows a build of code to go through a CI chain untouched, all the way into production really quickly. I’m not a virtual machine expert, but I believe this is slower in traditional virtualization like Vagrant for example. Am I far off? At work we’re using layered AMI build tools to roll new services and app builds into production, and the process is not quick.
You hit the nail on the head. That is a major promise of Docker, and the reason we are emphasizing Dockerfiles so much. It makes the act of building your code 100% automated and discoverable, and the resulting build should be usable, unchanged, all the way from development to production.
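Roughly, the hope is that the CI pipeline only ever does something like the following (the names are made up), so the image that passed the tests is bit-for-bit the image that ships:

    docker build -t myorg/app .                # one automated, repeatable build from the Dockerfile
    docker run myorg/app /app/run-tests.sh     # test the exact image that was just built
    docker push myorg/app                      # ship that same image, untouched, towards production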
> given the claims on the website I assumed there was something slightly more clever going on, but the only “special sauce” is the use of aufs to layer one file system upon another.
There's also the network stuff? From docker's website, a simple "docker run ubuntu /bin/echo hello world" does the following:
* It downloaded the base image from the Docker index
* It created a new LXC container
* It allocated a filesystem for it
* It mounted a read-write layer
* It allocated a network interface
* It set up an IP for it, with network address translation
* It executed a process in there
* It captured its output and printed it to you
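You can see most of that for yourself (the container id comes from the docker ps output):

    docker run ubuntu /bin/echo hello world   # prints "hello world", then the container exits
    docker ps -a                              # the stopped container is still listed, with its id
    docker inspect <container-id>             # shows the allocated IP, the NAT'd ports, and so on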
What is the recommended workflow for using docker?
Say I have:
* a base Ubuntu container with a few utilities installed like htop and my user account
* a base Python web container with Nginx and libxml, libjpeg, etc
* a specific Python web app
If I update my app should I save the container state and push it to all my web servers or use something like fabric to update the app on all my app servers?
If Nginx has a security release and I update my base Python web container, do I now need to rebuild all my Python web containers or is there some way to merge?
When the next LTS release of Ubuntu comes out and I upgrade, do I have to manually apply that to every container or just the base?
Serious question. I've been looking at Docker today and I can't seem to grasp why I would use Docker for dev and deployment over something like Ansible, also used both for dev (on a VM) and deployment. I must be missing something. I now understand Vagrant somewhat, in that it alleviates the manual hurdle of starting and running VMs, but not enough for me to use it... yet I don't see where and how Docker fits in. Would I be wrong to think of it as more of a virtualenv(wrapper), but not only for Python?
Thanks for the article. While I disagree with your conclusion, getting a clear view of the misconceptions that exist allows for more pointed documentation and focus.
Anyway, is there any option to grant external FTP access to Docker instances? I was wondering if I can move my multiplayer game instances to Docker and still allow my users to connect over FTP.