Docker 1.10.0 is out (github.com/docker)
194 points by raimille1 on Feb 4, 2016 | 91 comments



Disclaimer: I work for Docker

For the security enthusiasts out there, Docker 1.10 comes with some really cool Security focused additions. In particular:

- Seccomp filtering: you can now use BPF to filter exactly which system calls the processes inside your containers can use (a rough example of the CLI flags follows this list).

- Default Seccomp Profile: Using the newly added Seccomp filtering capabilities, we added a default Seccomp profile that helps reduce the surface exposed by your kernel. For example, last month's use-after-free vuln in join_session_keyring was blocked by our current default profile.

- User Namespaces: root inside of the container isn't root outside of the container (opt-in, for now).

- Authorization Plugins: you can now write plugins for allowing or denying API requests to the daemon. For example, you could block anyone from using --privileged.

- Content Addressed Images: The new manifest format in Docker 1.10 is a full Merkle DAG, and all the downloaded content is finally content addressable.

- Support for TUF Delegations: Docker now has support for read/write TUF delegations, and as soon as notary 0.2 comes out, you will be able to use delegations to provide signing capabilities to a team of developers with no shared keys.
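To make the first few items concrete, here's a rough sketch of how the opt-in pieces are driven from the CLI. The paths and image are placeholders, and the exact --security-opt separator has changed between releases, so check the docs for your version:

    # enable user namespace remapping on the daemon (opt-in)
    docker daemon --userns-remap=default

    # run a container under a custom seccomp profile instead of the default
    docker run --security-opt seccomp:/etc/docker/my-profile.json ubuntu sh

    # or opt out of seccomp filtering entirely for one container
    docker run --security-opt seccomp:unconfined ubuntu sh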

These are just a few of the things we've been working on, and we think these are super cool.

Check out more details here: http://blog.docker.com/2016/02/docker-engine-1-10-security/ or let me know if you have any questions.


It's "funny" yesterday RKT made the announcement of their version 1.0 (with emphasis on security) and today we have 2 news about Docker at the top of HN with your comment about security.


By the way, you can use DockerSlim [1] to auto-generate custom seccomp profiles (in addition to shrinking your image). They are already usable, but they can be improved. Any enhancements or ideas are appreciated.

[1] http://dockersl.im


Any idea on the priority of getting a container with working systemd?

https://github.com/docker/docker/pull/5773 and https://github.com/docker/docker/issues/3629


Disclaimer: I work for SUSE, specifically on Docker and other container technologies.

Docker containers /in principle/ do work with systemd. They are implemented as transient units when you use --exec-opt native.cgroupdriver=systemd (on your daemon's command line). I've been working on making this support much better (in runC, and therefore in Docker); however, systemd just has poor support for many of the newer cgroups when creating transient units.

So really, Docker has systemd support. Systemd doesn't have decent support for all of the cgroup knobs that libcontainer needs (not to mention that systemd has no support for namespaces). I'd recommend pushing systemd to improve their transient unit knobs.

But I'd rather like to know why the standard cgroupfs driver doesn't fulfil your needs? The main issues we've had with systemd were that it seems to have a mind of its own (it randomly swaps cgroups and has its own ideas about how things should be run).


I'm not sure we're talking about the same thing here. I'm talking about systemd inside a container (as pid 1). I think that's the part that's not working.

Every few days someone comes up with a new run script for docker (baseimage "my_init", etc). I personally use supervisord. Since systemd is already universal, might as well use that.

Somebody posted this yesterday - https://news.ycombinator.com/item?id=11019143

I'm already running my containers on a Debian host with systemd - so that is OK. Overlayfs is still causing some problems, though.


> I'm talking about systemd inside a container (as pid 1). I think that's the part that's not working.

Ah sorry, I misunderstood. I'm not sure why you'd want to use systemd as your process manager in a container. systemd has a very "monolithic" view of the system and I'm not sure you gain much by using systemd over supervisord (I'd argue you lose simplicity for useless complexity).

> Overlayfs is still causing some problems though.

I've been looking into overlayfs and I really encourage you not to use it. There has been an endless stream of bugs that the Docker community has discovered in overlayfs, and as far as I can see the maintainer is not particularly responsive. There are also some other issues (it's not fully POSIX-compliant) which are pretty much unresolvable without redesigning it.


Whoa... Thank you so much for pointing out the issue with overlays. There seems to be no real consensus on what should be used. Could you talk about what should be used?

Just FYI - we use Debian on Linode.


Devicemapper works okay (make sure you don't use loop devices). Unfortunately it's slow to warm up, but it's probably the most tested storage driver (it's the default on a bunch of systems).

btrfs works pretty well and is quite a bit faster. It's the default for SLE and openSUSE (as well as other distros which use btrfs by default). I'd recommend it (but I can't remember if it requires you to use btrfs on your / partition, which might be an issue for you).

ZFS, while an awesome filesystem, probably hasn't had much testing under Docker, so I'd be wary about using it.

And I've already told you what I think about overlay. I'd like to point out that it's a good filesystem for what it was designed for (persistent or transient live CDs), but the hacks Docker uses to make it work for layering keep me up at night.
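For reference, the storage driver is picked when the daemon starts; roughly like this (the thin-pool device name is just an example for a direct-lvm setup, not something from this thread):

    # use btrfs explicitly (needs /var/lib/docker on a btrfs filesystem)
    docker daemon --storage-driver=btrfs

    # devicemapper in direct-lvm mode, avoiding the loopback default
    docker daemon --storage-driver=devicemapper \
        --storage-opt dm.thinpooldev=/dev/mapper/docker-thinpool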


Yeah - all of which means that I have to move away from Linode. They can't back up anything other than a vanilla ext4 volume. I set up a direct-lvm Docker VM on Linode (it was surprisingly easy) - but Linode is refusing to back it up.

Oh well, should have done this long ago.


If Linode supports ZFS, you could use ZFS send/receive to make backups to your local machine. But like I said, the ZFS storage driver probably hasn't been well tested.


Nope, no backup for anything other than ext4. It's really weird and limiting. I mean, LVM volumes have to be pretty standard, right? (I would argue more standard than ZFS.)


What's the advantage of having a "supervisor" inside the container, rather than just "supervising" the container itself?


Because "only one process in a container" is a dangerous rule (because it has so many exceptions). In certain cases, that idea makes sense, but you shouldn't contort your app such that you only have one process in every container. Not to mention that there are other issues (PID 1 has unique problems that databases and your app aren't used to having to deal with).


Maybe you should read this [1]. We have always run all processes under a supervisor.

[1] https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zomb...


I think this was the problem [1] was targeting.

[1] http://engineeringblog.yelp.com/2016/01/dumb-init-an-init-fo...


One of many.

Which is why I think we might as well get systemd working on Docker...


I see, so it looks like if your process spawns other processes and doesn't reap them when they die, you end up in trouble with zombie processes in docker.


> The new manifest format in Docker 1.10 is a full Merkle DAG, and all the downloaded content is finally content addressable.

Can someone elaborate on this a bit more? From a CS point-of-view, sounds like a problem where a data structure came in handy but I'm not sure what it solves. Thanks!


A simple immutable data structure can be implemented with a Merkle DAG: nodes reference their children by cryptographic hash, and DAGs are directed graphs that don't loop around. Simple blockchains are one example. These structures provide immutable, versioned control of information. Containers are immutable, or like to think they are at least, so blockchains are an obvious thing to use in conjunction with deployments of said containers. At least that's what I keep telling everyone.


Do you literally mean a proof-of-work backed blockchain (like Bitcoin), or something more like git, which has a similar structure to a blockchain without the consensus mechanism?

I don't see how the former would be useful to someone deploying containers, but interested to hear your thoughts in either case.


From the blogpost:

> Image IDs now represent the content that is inside an image, in a similar way to how Git commit hashes represent the content inside commits.


I was referring to the second part of the comment:

"Containers are immutable, or like to think they are at least, so blockchains are an obvious thing to use in conjunction with deployments of said containers. At least that's what I keep telling everyone."


I'm pretty sure they mean that they are using Merkle DAGs. A blockchain is a Merkle DAG. The proof-of-work algorithm in Bitcoin is an algorithm for deciding how a node gets added to the blockchain. Depending on how you look at it, that algorithm is not part of what makes it a "blockchain".

Admittedly people are sloppy about how they use the term "blockchain". I would prefer that people use the term Merkle DAG and forget the term "blockchain" altogether, but I think we are stuck with "blockchain" ;-)


"People" are also sloppy about how they use the term "cloud", yet the world goes on with that concept in hand applying it to everything in site, often times in irritating ways. "Blockchain" is now a thing people can hold in their hand as a way to visualize the concept of a nearly immutable data store. That idea of storing something in an immutable way represents a shift in the way we can think about system's design. Calling it a "Merkle DAG" isn't going to kick off that insight any better than using "blockchain", but remembering what it really is and drawing the distinction with the right people can be immensely useful when trying to implement the insight.


Blockchains don't have to include proof-of-work as long as the values in the chain aren't themselves valuable over longer time periods. In Bitcoin, a cryptocurrency built on a blockchain, the values represent debt owed to someone in exchange for a real-world item, and that debt stays active for the life of the entry. There are a slew of proof-of-somethings that allow blockchains to become cryptocurrencies. I don't exactly subscribe to these ideas of a value store as it relates to compute provisioning, but I suppose there could be some actions that might benefit from it, such as certain types of licensing.

That said, triggering provisioning using cryptocurrencies is likely to be a thing at some point.


Is a git repository one implementation of a Merkle DAG?


> Docker 1.10 uses a new content-addressable storage for images and layers.

This is really interesting.

Sounds like the up/download manager has improved too. I did some early work adding parallel stuff to that (which was then very helpfully refactored into actually decent go code :), thanks docker team) and it's great to see it improved. I remember some people looking at adding torrenting for shunting around layers, I guess this should help along that path too.


IIRC, Docker has used content-addressable storage for layers for a very long time (in the form of filesystem directories whose names looked like md5 hashes). I'm not sure what's changed. Maybe just the hash function?


Layer IDs were random UUIDs before.


Correct, layers were hashed for verification at upload and download, then stored on a "regular" uuid-addressed storage. Now they are stored in a true content-addressed store end-to-end.
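A quick way to see this on a 1.10 daemon (the digest below is shortened and made up for illustration):

    # image IDs are now sha256 digests of the image content
    docker inspect --format '{{.Id}}' ubuntu
    # sha256:3a4b...   (example only)

    # repository digests, usable for pull-by-digest
    docker images --digests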


Network-scoped aliases are really handy when dealing with a multi-container setup, so I'm really happy that they implemented this!

In previous versions, only the name of a container would be aliased to its IP address. That can make it hard to deploy a setup with multiple containers in a network group that address each other by name (e.g. the "api" host connects to "postgres") and then run multiple instances of those groups on the same server, since container names need to be unique.
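For example, something along these lines should work in 1.10 (image and network names here are made up):

    # aliases are scoped to the network, so container names no longer need
    # to be globally unique across deployments
    docker network create app1
    docker run -d --net=app1 --net-alias=postgres postgres:9.4
    docker run -d --net=app1 --net-alias=api my-api-image

    # a second, independent copy of the same stack on the same host
    docker network create app2
    docker run -d --net=app2 --net-alias=postgres postgres:9.4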


For those interested in the user namespace support, the best post I found was https://integratedcode.us/2015/10/13/user-namespaces-have-ar... (there are also some docs here https://github.com/HewlettPackard/docker-machine-oneview/blo...)


I had an issue with files in a volume being created by users in the container that don't exist on the host. I was trying to figure out if this fixes that issue, and if so, how? I played around with it for an afternoon but left confused.

Use case:

Using compass within a container for development

Compass creates sass files as whatever user it runs under within the container (likely root)

The host must chmod them to do stuff with them

As a workaround, I've been building images and creating a user:group that matches my host. Obviously this destroys portability for other developers.


Yes, I think this will solve it.

In Unix, a uid is just an integer (chown <random_int> file will work). In your case, the container created a file in a volume with a uid. This uid makes no sense on the host, but it leaks out to the host anyway since it's a 'volume'.

I think with the userns map, you can map container uid to a host uid. The files created in the volume will then be visible on the host as the mapped uid.

This is my understanding, I have to play with it :-)
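Roughly, assuming the default remap user and typical subordinate ID ranges (the exact numbers come from /etc/subuid and /etc/subgid on your host, so treat these as placeholders):

    # start the daemon with remapping enabled
    docker daemon --userns-remap=default

    # /etc/subuid and /etc/subgid then contain something like:
    #   dockremap:100000:65536
    # so uid 0 (root) inside the container maps to uid 100000 on the host,
    # and that's the owner you'll see on files the container writes to a volume.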


I think the canonical way (at least a while ago) was to only manipulate volumes using containers. There might be other solutions available now, but this one seems to be the easiest, as it does not require changing the permissions or ownership of the files/volumes being manipulated.


Don't run compass as root.

Don't run anything as root unnecessarily.

(Your problem is likely because you refer to the root namespace as a "host". It is common practice, but leads people to bad conclusions.)


I think I might have caused a misunderstanding of the issue (I was trying to be brief). It really doesn't matter who I'm running compass as (root or otherwise); the issue is that the container is writing files to my host with a different UID:GID than the one I'm using on my host machine.

I wouldn't normally run compass as root, it was incidental to the actual issue.


It's not clear why that should be an issue.

Are there applications communicating through the file system? You would need to take care of that in more ways than standardize uid.


Wow, user namespaces! That was quick!

EDIT: And a default seccomp profile! Did I miss the memo about containerisation suddenly becoming a competitive industry?


From what I can see on an upgrade, anyway, they don't seem to be enabled by default, but they're not hard to set up (I did a quick note for Ubuntu: https://raesene.github.io/blog/2016/02/04/Docker-User-Namesp...)


Yeah, there is a pretty sweet blog post and demo videos (just posted) @ https://blog.docker.com/2016/02/docker-engine-1-10-security/


> Wow, user namespaces! That was quick!

Heh. Yeah, it took quite a while. Lots of kernel bugs. :P


Items of particular interest to monitoring and diagnostics:

1. docker stats --all

Built-in alternative to 'docker ps -q | xargs docker stats' which takes care of dynamic additions to the list.

For consistency, it would be nice to have a similar option in the API stats call to fetch statistics for all running containers.

2. 'docker update' command, although I would have preferred 'docker limit'.

Ability to change container limits at runtime:

- CPUQuota
- CpusetCpus
- CpusetMems
- Memory
- MemorySwap
- MemoryReservation
- KernelMemory

With this feature in place, there is no reason to run containers without limits, at least memory limits (a quick example follows at the end of this comment).

3. Logging driver for Splunk

A better approach would be to enhance the generic drivers to be flexible enough to send logs to any logging consumer.
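As a small sketch of point 2 (the container name and values are placeholders):

    # bump the memory limits and pin CPUs on a running container
    docker update --memory 1g --memory-swap 2g --cpuset-cpus 0,1 mycontainer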


> 2. 'docker update' command, although I would have preferred 'docker limit'. Ability to change container limits at runtime: - CPUQuota - CpusetCpus - CpusetMems - Memory - MemorySwap - MemoryReservation - KernelMemory

This is not correct. You cannot change kernel memory limits on Linux after processes have been started (disclaimer: I've done a bunch of work with the runC code that deals with this in order to add support for the pids cgroup). You can update everything else though.


Thanks. Indeed, an attempt to set kernel memory on a running container will return an error.


Great, I was just forced to change from Splunk to F ArcSight.

So happy right now.


I love the ability to specify IPs, but I just want to give my containers static IPs from my private network, and attaching to my already existing bridge does not work. I started the daemon as follows, but no luck:

> ./docker-1.10.0 daemon -b br0 --default-gateway 172.16.0.1

> ./docker-1.10.0 run --ip 172.16.0.130 -ti ubuntu bash

docker: Error response from daemon: User specified IP address is supported on user defined networks only.

But my KVM VMs work fine with that bridged network. I know I could just port forward, but I don't want to. Yes, it seems I am treating my containers as VMs, but this worked fine in plain LXC; we could even use an Open vSwitch bridge for advanced topologies.


You can create a bridge user-defined network and take advantage of this awesome new feature. https://docs.docker.com/engine/userguide/networking/dockerne...


But don't user-defined networks create new bridges? I want to use my already existing network. Over SSH, I executed the following and my connection was lost, because my eth1 and probably the newly created bridge were in conflict over the routing table.

> docker network create --gateway 172.16.0.1 --subnet 172.16.0.0/21 mynet


Yes. `--ip` is supported only on user-defined networks. That is because the subnet for the default bridge (docker0) can be changed (via --bip), or the user can change the user-specified default bridge (via -b) across daemon restarts. If the subnet backing these default bridges changes, then a container with an assigned `--ip` would fail as well.

Having said that, with Docker 1.9 and above, IP address management and network plumbing are separate concerns and both are pluggable. One could implement a network plugin with any logic and continue to enjoy all the built-in IPAM features (including --ip). Hence, if you have a specific network plumbing requirement, you could easily spin up a plugin (or use one of the many network plugins that are available out there).
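For reference, the supported path looks roughly like this, though as noted above it creates a new bridge rather than reusing an existing one (addresses copied from the example earlier in the thread):

    docker network create -d bridge --subnet 172.16.0.0/21 --gateway 172.16.0.1 mynet
    docker run --net=mynet --ip 172.16.0.130 -ti ubuntu bash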


Sadly, https://github.com/docker/docker/issues/3043 is still open, so no multicast support since 1.6...


As mentioned in https://github.com/docker/docker/issues/3043#issuecomment-51..., multicast works in bridge driver. It works for the default docker0 bridge network and also other user-defined bridge networks.

Docker 1.9 brought in Native multi-host networking support using overlay driver. Proper multicast support for the overlay driver would require proper multicast control-plane (L2 & L3). Contributions welcome.


http://github.com/weaveworks/weave enables multicast between containers (and many other things besides).


Yeah, but I needed this for scenarios where I managed the Docker bridge directly - i.e., running a set of streaming servers that are an insane hassle to set up and required frequent upgrades. Docker was perfect for building, upgrading and deploying them.

Weave would just get in the way in this scenario (and has a tendency to over-complicate simple stuff like running an ElasticSearch cluster with auto discovery)


Weaveworker here:

I'd love to know what you mean by "over-complicate". It can be as simple as `docker run --net=weave ...`.

We have a PR open that would let you connect Docker's bridge to Weave Net - https://github.com/weaveworks/weave/pull/1955


Thanks for letting me know, but it's still open, right?

Besides, can't we all just get along and accept that I might want to do things in a specific way because I don't need the extra overhead? :)


> Weave ... has a tendency to over-complicate simple stuff like running an ElasticSearch cluster with auto discovery

Elasticsearch autodiscovery relies on multicast, AFAIK the only way to get it working with Docker is to use Weave (or another overlay network that gives you multicast). Is that not correct?


That is precisely my point. I want to do it without weave. As long as I can control the Docker bridge (and be responsible about it), I should be able to just do it.


For an overview of what's new in this release, check out the blog post: https://blog.docker.com/2016/02/docker-1-10/

The highlights are networks/volumes in Compose files, a bunch of security updates, and lots of new networking features.


It's the danger of running against "latest" all the time... but it's been a day of chasing my own tail when creating a new cluster (Mesos, but that really isn't the issue) and using some tools built against the prior version (volume manager plugin, etc.) that break with updates to Docker.

It seems like if one piece gets an upgrade, every moving component relying on some APIs may need to be looked at as well.

Did a PR on one issue.

Currently chasing my tail to see if a third party lib is out of whack with the new version or it's something I did.

The whole area is evolving and the cross pollination of frameworks, solutions (weave, etc), make for a complicated ecosystem. Most people don't stay "Docker only". I'm curious to see the warts that pop up.


I'm also running Mesos and Docker as a containerizer, and experienced the same problems (i.e. API change on the Docker volumes leads to broken volume driver implementations).

Even within the Mesos environment, there are so many nuts and bolts which have to fit together that sometimes I'm just fed up with the complexity. Furthermore, releases of Mesos and Marathon are not necessarily synched... Stateful apps? No go... Persistent volumes in Marathon? Maybe in v0.16... Graceful shutdown of Docker containers? No go...


Which volume driver? Wasn't aware mesos was providing one.


Sorry for the misunderstanding... I was talking about third-party Docker volume drivers, for example for Ceph.


The --tmpfs flag is a huge win for applications that use containers as unit of work processors.

In these use cases, I want to start a container, have it process a unit of work, clear any state, and start over again. Previously, you could orchestrate this by (as an example, there are other ways) mounting a tmpfs filesystem into any directories needed at runtime, starting the container, stopping it once the work is done, cleaning up the tmpfs, and then starting the container again.

Now, you can create everything once with the --tmpfs flag and simply use "restart" to clear any state. Super simple. Awesome!
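Something like this, as a sketch (the path, size and image name are placeholders):

    # mount a fresh tmpfs at the scratch directory
    docker run -d --tmpfs /scratch:rw,size=64m --name worker my-worker-image

    # restarting the container gives the next unit of work an empty /scratch
    docker restart worker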


I'd really, really need DNS for non-running containers, somehow. Nginx can't start if an upstream container is down, as its name won't resolve.


I've had that problem too, and found that you can implement a "wait" command.

If you're using Docker Compose, add this to your service's environment:

    environment:
      - WAIT_COMMAND=[ $(curl --write-out %{http_code} --silent --output /dev/null http://elastic:9200/_cat/health?h=st) = 200 ]
      - WAIT_START_CMD=python /code/lytten/main.py
      - WAIT_SLEEP=2
      - WAIT_LOOPS=10

Then, create a 'wait' bash script in your app's source code that looks like this:

    #!/bin/bash
    echo $WAIT_COMMAND
    echo $WAIT_START_CMD

    is_ready() {
        eval "$WAIT_COMMAND"
    }

    # wait until is ready
    i=0
    while ! is_ready; do
        i=`expr $i + 1`
        if [ $i -ge $WAIT_LOOPS ]; then
            echo "$(date) - still not ready, giving up"
            exit 1
        fi  
        echo "$(date) - waiting to be ready"
        sleep "$WAIT_SLEEP"
    done

    #start the script
    exec $WAIT_START_CMD
Then, finally, for your nginx container, add:

    command: sh /code/wait_to_start.sh

to its specification.


Just in case a shorter version appeals to you, I think you could rewrite it like this:

  timeout 60 bash -c 'until is_ready; do sleep 10; done'
You'll need to `export -f is_ready` if it's a shell function. That's assuming you can use a timeout instead of number-of-retries.

http://man7.org/linux/man-pages/man1/timeout.1.html


I was trying to find the source of where I got that bash script from, but couldn't, and then forgot about it within the editing window.

I'll try to find it and provide that feedback. Thanks!


We've got Consul set up in a way that points names at a known 503 service if their containers are not up.


Please, I just want fewer bugs after creating/destroying a few hundred containers on a host.


Nice to see building from stdin working again.

https://github.com/docker/docker/issues/15785


If you're on OS X and using docker-machine, the command to upgrade is `docker-machine upgrade default` where default is the name of your VM.



Apparently no one else has been paying an ounce of attention... and you get downvoted for it. The HN way! https://github.com/docker/docker/issues/19474

Not least, you're forced to go through their DNS server, which doesn't support TCP. Boy, this is absolutely going to fuck people, because I bet a bunch of people are going to run Go containers on the 1.10 engine. And guess what happens when you send a Go app a DNS response, in UDP format, that is larger than 4096 bytes? You get a panic and crash! Woohoo! And yes, there are DNS servers that incorrectly return UDP DNS responses larger than 4096 bytes. Can't wait for my containers to fail because of fucking Docker putting a DNS service in Engine. Unacceptable.

Docker should've realized they needed to think about this stuff, all the while shykes was too busy picking fights with people as Kubernetes encroached on what he saw as "his" territory. There's a reason that everyone is very excited about the rkt announcement today, particularly amongst some Kubernetes users... (In the interest of not tainting the waters, I do NOT work for Google.)


FYI. TCP support for the embedded DNS was added via https://github.com/docker/docker/pull/19680.


Do you have a source on the Go UDP/DNS crashing?


How's the reliability story going, nowadays? Especially around layered filesystems.


>Use DNS-based discovery instead of /etc/hosts #19198

This will end well...


It uses DNS only for discovering the other containers within the same custom network; if a name isn't found, the embedded DNS forwards the query to your own DNS server. I don't see how it can be worse than the /etc/hosts solution.


Exactly, obviously /etc/hosts will still take precedence, but instead of munging your /etc/hosts when starting a container you can just use their DNS server. I don't see a problem with this.
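For the curious, on a user-defined network the container's resolv.conf points at Docker's embedded resolver (127.0.0.11, if I recall correctly), which answers for container names and forwards everything else upstream. A quick way to poke at it (names are placeholders):

    docker network create mynet
    docker run -d --net=mynet --name db postgres:9.4
    docker run --net=mynet -ti alpine sh -c 'cat /etc/resolv.conf; nslookup db'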


Could you explain why?

Simply from a learning perspective. I just don't know why and would like to know.


Because /etc/nsswitch.conf exists to do exactly what's needed here. Now there's an extra layer of need-to-know that adds confusion. I can almost guarantee that there's going to be a major outage somewhere, someplace because of this change.


I tend to agree with you. Between time to live, caching, and various namespace conflicts, this could make for a very large troubleshooting headache.


Docker assumes it cannot trust the container's DNS resolver to respect TTL and cache in a compliant way. So it guarantees stable name->ip mapping for each container. As a result, when you point a service endpoint at another container, it's the IP/container mapping that is changed, which is a much more reliable and atomic change. I would definitely never rely on changing DNS records to orchestrate changes to my production stack, that would be way too brittle.


The real problem is that you can't trust the program's resolver, either. Java will behave differently than Go, which behaves differently from Python, and so on.


This is when you specify you want a user defined network and want to rely on docker networking.

I'd give it some time before using that in production and use your own DNS/Service Discovery mix.


What does it mean that the LXC backend has been deprecated?


Nothing, since no one has used that backend in years. Note that the LXC command-line utility is not the same thing as Linux containers, which Docker still uses.


It's not deprecated anymore. It's been removed. People have been using the native driver (libcontainer) for 2 years. LXC was deprecated in 1.8 and the code was completely removed in 1.10.


How does that work in terms of semver? Removing something means bumping the major release number.


Well ... it's an alternate backend which has been known to be "a bad idea to use" since 0.11. LXC stopped being the default a long time ago, and anybody using it right now REALLY shouldn't be.

Not to mention that I'm not really sure that Docker Inc has strong feelings about semantic versioning (I don't work for Docker).



