
Package management is entirely non-repeatable in almost every incarnation. DEB, RPM, even Slackware tarballs, all run scripts on install. There are shunts of existing files and directories, interactive prompts, EULAs to agree to (try installing mscorefonts on Ubuntu), packages that immediately start services that generate files on first run that the service expects to be there from then on, packages that expect other packages to not have been installed (samba3 doesn't like samba4 much)--and so forth. Installing a package can fail, or produce widely-varying results from machine to machine.
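(Just to make the mscorefonts example concrete--this is roughly what it takes to get that one package installed without a human at the keyboard; the debconf question name is from memory, so treat it as a sketch:)

    # pre-answer the EULA prompt and force a non-interactive frontend
    echo "ttf-mscorefonts-installer msttcorefonts/accepted-mscorefonts-eula select true" \
      | sudo debconf-set-selections
    sudo DEBIAN_FRONTEND=noninteractive apt-get install -y ttf-mscorefonts-installer
    # and even then, the postinst downloads the actual fonts from the
    # network at install time, so the result still depends on what the
    # mirror serves you that day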

"Configuration management", ala Chef/Puppet, is no better--it tries to manipulate a system from an unknown state into a known state, without having an awareness of the full set of interactions that the unknown state can have on the known state. (For example, deploy Apache with default setup via Chef, on a server that already had Nginx installed manually. What do you get? Who knows!)

You'd think that, say, running the OS install media from scratch on each "container-up", and then running a script to install a preset list of packages with hardcoded versions, might be enough--but nope, OS install media is absolutely non-deterministic, just like everything else. The installer could decide from the container's IP address that it should talk to a different package server (oh hey I'm in Canada let's use ca.archive.ubuntu.com!) and then find itself unable to get past the deploy-infrastructure's firewall-whitelist.
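(The only way around that is to pin every such decision by hand, e.g. in a d-i preseed instead of letting the installer geolocate a mirror--key names from memory:)

    # freeze the installer's mirror choice so it can't pick one by geography
    cat > preseed.cfg <<'EOF'
    d-i mirror/country string manual
    d-i mirror/http/hostname string archive.ubuntu.com
    d-i mirror/http/directory string /ubuntu
    d-i mirror/http/proxy string
    EOF

...and even then you've only pinned the decisions you happened to know about.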

In short--anything that relies upon, or can make a decision based upon, information that could be different from container to container (like the container's IP address, for example) isn't guaranteed to produce binary-identical results at the target. You only get that by running through whatever imperative process spits out all these files once--and then freezing the results into a declarative container.
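(With docker, that "run it once, then freeze it" step is about two commands--the repo name here is a placeholder:)

    # run the imperative, possibly-flaky process exactly once...
    docker run ubuntu sh -c 'apt-get update && apt-get install -y nginx'
    # ...then freeze whatever actually came out as an immutable image
    docker commit $(docker ps -l -q) myteam/nginx-frozen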

So, what does work? Xen snapshots. Amazon AMIs. Vagrant images. All of these are declarative. And all of these are target formats for Docker. A Docker image is a vendor-neutral thing that you'll turn into these, in addition to turning it into its current LXC+aufs form.

And note, by design, the running images of the "final product" will be leaf-nodes; you won't touch them or modify them or SSH into them, you won't base new containers on them; you'll just spin them up and then down again, like on-demand EC2 instances. Docker is not for doing fancy things with the running final products in their target format. Docker, by itself, is not for production at all! Once you've deployed a Docker container as a Xen snapshot or an AMI or whatever, it's done; some other infrastructure takes care of actually running the target-format containers.

So what's all that junk in the Docker toolchain? Docker, as a standard, is an intermediary format that makes it easy for developers to build these vendor-neutral images. The reason you can start containers, stop them, freeze them, and then fork new containers from them, is entirely to do with developing new container images, and not at all to do with deploying containers in production. It's about re-using common parent images, by reference, as easily as we currently stick a GitHub repo in our Gemfiles.
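(That "by reference" bit is literally the first line of a Dockerfile--the names here are made up:)

    # a child image names its parent the way a Gemfile names a git repo
    cat > Dockerfile <<'EOF'
    FROM myteam/ruby-base
    RUN gem install sinatra
    EOF
    docker build -t myteam/myapp .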




> Docker is not for doing fancy things with the running final products in their target format. Docker, by itself, is not for production at all! Once you've deployed a Docker container as a Xen snapshot or an AMI or whatever, it's done; some other infrastructure takes care of actually running the target-format containers.

This is something very interesting and under-communicated. I've always assumed docker is run on the production server and used to pull updated images and spawn containers. But you're suggesting one uses a custom toolchain to make an image out of the container filesystem and the LXC template, and then deploys that image?


Both are possible. And since docker itself is not yet production-ready, exporting docker containers to "inert" server images (for example AMIs) is a good stopgap which allows you to use docker for dev and test.
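The crude version of that export today is just a tarball (the AMI half is left to whatever tooling you already have; the container id is a placeholder):

    # flatten a container's filesystem into a plain tarball...
    docker export $CONTAINER_ID > rootfs.tar
    # ...then feed it to whatever your infrastructure eats:
    # an AMI, a Vagrant box, a Xen image, etc.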

But that is not the ideal workflow. The ideal workflow is to run docker on all your machines, from development to production, and move containers across them. If you don't do that, you will miss out on a big part of the value of docker. To name a few: identical images in dev and prod; lightweight deployment; and a toolchain that is less dependent on a particular infrastructure.
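Roughly, that workflow looks like this (image and container names are just placeholders):

    # on a dev box: freeze a container into an image and push it to a registry
    docker commit $CONTAINER_ID myteam/api
    docker push myteam/api
    # on each prod box: pull exactly those bits and run them
    docker pull myteam/api
    docker run -d myteam/api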


This exchange has really helped me understand more about Docker and why and when to use it. Thanks!



