Don't expose the docker socket, even to a container (lvh.io)
133 points by lvh on Sept 24, 2015 | 101 comments



This assumes that Docker containers are being used like VMs. They're not designed to allow running isolated arbitrary code in some sort of multi-tenancy setup, they're designed to isolate dependencies and configuration between your own deployed services. This is a "security vulnerability" in the same way that putting up a white picket fence around a jail is - a gross mis-application of a tool built for a completely different purpose.


Listening to what the Docker people write about security, they sure do sound like they are designed to be secure, even though they admit to a couple of potential shortcomings versus VMs.

https://blog.docker.com/2013/08/containers-docker-how-secure... concludes,

"Docker containers are, by default, quite secure; especially if you take care of running your processes inside the containers as non-privileged users (i.e. non root)."

They also recommend using SELinux with Docker to beef up security (see https://blog.docker.com/2014/07/new-dockercon-video-docker-s...)

As long as people are using containers as a security boundary it makes sense to pay attention to things like this one about the control socket.


There is a section in that link titled "Specific Attack Surface of the Docker Daemon". Meanwhile they are ignoring the attack surface of the Linux kernel. There is no mention of seccomp in the whole post.

People find privilege escalation bugs in the Linux kernel often enough that you can't really claim that running arbitrary native code is "secure" unless you're doing something to mitigate these. Things like:

http://www.openwall.com/lists/oss-security/2015/07/22/7

Note that SELinux won't do anything about this kind of bug. You have to use seccomp and similar to disable large swaths of the Linux kernel API, particularly exotic parts that aren't well-tested or reviewed.


there is no mention of seccomp because virtually no software can utilize it.


Sandstorm.io (disclosure: my project) applies mandatory seccomp and other attack surface reduction yet manages to run a lot of apps with only minimal modification:

https://docs.sandstorm.io/en/latest/developing/security-prac...

Admittedly, though, "only minimal modification" is probably too much modification for Docker's goals. That's fair. But I don't think that gives them a pass to pretend the problem doesn't exist.


As a Chrome, systemd, and Sandstorm user, I'm a little curious what you mean.


qemu uses seccomp. If you run your VMs with libvirt, then seccomp can be used to protect the host from qemu easily and routinely.


OpenSSH and BIND use seccomp. :)


I second the idea of using SELinux to secure docker because most people really are not able to fully deal with the complexity of SELinux. If they understood security they wouldn't be jumping onto the docker hype bandwagon in the first place.

To paraphrase Theo de Raadt:

‟You are absolutely deluded, if not stupid, if you think that a worldwide collection of software engineers who can’t write operating systems or applications without security holes, can then turn around and suddenly write virtualization layers without security holes.”

[1] http://blog.valbonne-consulting.com/2015/04/14/as-a-goat-im-...


> They're not designed to allow running isolated arbitrary code in some sort of multi-tenancy setup

You mean Linux containers are not designed for this, nor are they designed to be secure. What a sad design failure.

Some [1] container technology was developed with security as a first principle.

[1]: http://us-east.manta.joyent.com/jmc/public/opensolaris/ARChi...


Linux containers are certainly intended to be secure. The problem is that the Linux kernel API is huge and privilege escalation bugs are found all the time. In order to make a container secure, you need to disable 95% of this API, e.g. using seccomp, not mounting /proc or /sys, drastically limiting /dev, etc. Sandstorm.io hasn't seen a single working breakout since we started keeping track over a year ago, despite numerous Linux kernel exploits going by in that time, because all the bugs have been in features that we turned off.

https://docs.sandstorm.io/en/latest/developing/security-prac...

That said, there is a cost in compatibility. Most apps can be made to run just fine in this constrained environment, but it does sometimes require tweaks. Docker is more interested in compatibility than in security, so naturally they don't do this kind of attack surface reduction by default. (You can configure it manually, but realistically if it isn't mandatory then few people will bother.)
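
To give a sense of what "turning features off" looks like as configuration, here is a minimal Python sketch that builds a default-deny seccomp profile in the JSON layout accepted by `docker run --security-opt seccomp=...` (a feature of later Docker releases). The allowlist is a hypothetical toy, far too small for a real app, and none of this is Sandstorm's actual policy.

```python
import json

# Hypothetical toy allowlist; a real application needs many more syscalls.
ALLOWED_SYSCALLS = ["read", "write", "exit", "exit_group", "brk", "mmap", "munmap", "close"]


def build_seccomp_profile(allowed):
    """Return a default-deny profile: any syscall not listed fails with an errno."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            # De-duplicate and sort so the generated file is stable across runs.
            {"names": sorted(set(allowed)), "action": "SCMP_ACT_ALLOW"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_seccomp_profile(ALLOWED_SYSCALLS), indent=2))
```

The point of the default-deny shape is that a kernel bug in a syscall you never listed is simply unreachable from inside the container.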


It is also important to note that Microsoft is not offering shared kernel on Azure using their new container technology because they believe it is too insecure. Microsoft might be a special case because their kernel has an especially large surface area. But I think the current cloud providers that are providing shared kernels are walking a very fine line when it comes to security. Seccomp obviously mitigates this problem and probably puts you in a better position than using a hypervisor if you lock things down very aggressively.


And you can run Docker containers on SmartOS, using "LX-branded zones", which do Linux syscall emulation:

https://www.joyent.com/blog/triton-docker-and-the-best-of-al...

I'd be curious if the Joyent folks believe Triton to have the security properties that everyone seems to want out of Docker. It doesn't involve the standard Docker daemon, so it might.


@geofft that is a great link, our CTO Bryan Cantrill continued to beat the drum at the Docker SF meetup on May 6, 2015. Video and transcript at https://www.joyent.com/developers/videos/bryan-cantrill-virt... . My favorite line is, "And, so in terms of the security of containers, from our perspective they have to be secure because if you're able to break out of a container into the global zone of SmartOS, you could delete the company."

Yesterday at http://containersummit.io/, Bryan's talk "Going Container Native" (video coming soon) was in the same vein: SmartOS (illumos) directly benefits from its Solaris lineage, and Solaris Zones were engineered for security. We, Joyent, absolutely do believe Triton has the security properties that everyone seems to want out of Docker, because Joyent has been running OS containers in multi-tenant production since ~2006.

Open source SmartDataCenter [1], rebranded Triton, is the exact same code that is run in Joyent's Public Cloud. This truth in code, product and service contributed to me leaving an OpenStack company to join Joyent two years ago.

1. https://github.com/joyent/sdc


Bryan's talk "Going Container Native" from earlier this week is now online: http://containersummit.io/events/sf-2015/videos/going-contai...


Isn't this specifically a Docker failing? My understanding (from reading rather than use) is that LXC (the original Linux container project) provides a lot more in the way of security than Docker does?


As far as I can tell, Solaris Zones are no more secure than Linux containers; both use kernel-level isolation mechanisms, while sharing the same kernel between zones/containers. And thus they both have the same security property: secure as long as you can't successfully exploit a syscall, but if you can, you get kernel-level access.


The security property of "secure unless there's an exploit" applies to basically everything (maybe excluding formally verified software), including Xen and KVM, so I don't see how that's a specific knock against Zones. Yes, if there's an exploit that lets you do something you shouldn't be able to do, you can break out of the container/VM. But, like Xen and KVM, Zones are so far fairly successfully used as an isolation mechanism for the public cloud, for some years now. And all three make the claim that you ought to be able to rely on their isolation mechanisms, and any hole in them will be considered a serious bug. Whereas the way Docker is using Linux containers doesn't seem secure enough to make that claim or be used for that purpose, at least not yet.


> The security property of "secure unless there's an exploit" applies to basically everything (maybe excluding formally verified software), including Xen and KVM, so I don't see how that's a specific knock against Zones.

Defense in depth. With KVM, you'd have to exploit the guest kernel, then exploit KVM's hardware emulation to get to host userspace, then escalate to host root/kernel. (And KVM can use seccomp to limit itself, making that last step harder.) With Linux containers or Solaris Zones, you run directly on top of the host kernel, so you just need a single kernel exploit.

In any case, Zones and containers have the same security properties (running directly on top of the kernel), so the original comment about Zones being more secure than containers makes no sense.


> Whereas the way Docker is using Linux containers doesn't seem secure enough...

Which is totally expected, because docker tries to be useful to the largest user-base possible. It would be much harder to use if it didn't support directory mounting (via -v). And it would be a total nightmare for almost every user if you had to specify a list of allowed syscalls for every container.

This reminds me a lot of the situation with SELinux. It has improved a lot but I still see "disable SELinux" in almost every tutorial I read on CentOS, Fedora or RHEL.


> This reminds me a lot of the situation with SELinux. It has improved a lot but I still see "disable SELinux" in almost every tutorial I read on CentOS, Fedora or RHEL.

Because security is hard.

Just look at the Mac OS X users running as root and disabling Gatekeeper.

Or the developers that stay away from the sandbox model.

One of the nice things about mobile OSes is that there isn't a way around the container model. Although the history with permissions kind of messes it up.


I know of only one Solaris Zone escape CVE in its entire existence.

http://www.cvedetails.com/cve/CVE-2008-5689/

I also know of only one FreeBSD Jail escape CVE in its entire existence.

https://www.freebsd.org/security/advisories/FreeBSD-SA-04:03...

These were years ago. Jails and zones are everywhere. You're kidding yourself if you think people haven't tried to attack them.


Oracle doesn't tend to disclose security vulnerabilities; they just silently fix them; the absence of CVEs is not the absence of vulnerabilities. On top of that, a vulnerability wouldn't need to be Zones-specific; almost any kernel exploit would work, unless it uses a syscall that only works in the "global" zone.


You forget that Sun was in charge of Solaris for many years and was pretty open about such things.


Any exploit against the kernel gaining code execution is a Solaris Zone escape or FreeBSD jail escape.

Solaris got hit by the 2012 non-canonical RIP bug, which hit a bunch of operating systems.

FreeBSD has had numerous kernel exploits which could be used for jail escapes.


Really? Which ones? I'm pretty sure local privilege exploits that give you root don't magically jump you into JID 0 (prison0), escaping the jail. You'd just have "root" in the jail.

edit: well I guess if your attacker knows you're on FreeBSD they could override to prison0 in the exploit. But it's a good thing we don't have a ton of local priv exploits, eh?

edit2: and that was a pretty ignorant statement of mine about root exploits anyway :)


Are Linux containers really not securable? That's how Google runs all their stuff in production.

http://research.google.com/pubs/pub43438.html


Google generally doesn't have to worry about mutually-untrusted containers.

The meaning of "Linux containers are not secure" is that untrusted code should not be given root privileges within a container. Google is generally not doing that. They have e.g. trusted Gmail code running on the same machine as trusted YouTube code, handling untrusted emails and untrusted videos. But the Gmail team is not worried about the YouTube team hacking them, or vice versa. The security mechanisms just need to keep honest people honest.

And when they do have untrusted, third-party code to run, for Google Cloud Platform, they use VMs or actual sandboxes: see section 6.1 of the paper you linked.


No, but the Gmail team might be worried about some Russian guy finding a buffer overflow in YouTube's application. Then what? They can escalate privileges and read your email?


Google's deployment of Docker is less susceptible to this. They do additional partitioning of applications.


Yeah... so it seems like Linux containers can be sufficiently hardened?


Not really. You physically separate Gmail and YouTube.


Source? How do you get 80% CPU utilization if you have to physically separate all your job types?

http://csl.stanford.edu/~christos/publications/2015.heracles...


Google has way more applications than just Gmail and YouTube.

Basically, if someone breaks into Gmail, you're already in deep trouble. You're not significantly better off because they didn't break into Hangouts. Same in the other direction. So you can run those on the same hardware.

If you think YouTube is less sensitive than Gmail (I don't know if Google actually does), you can separate those, sure. But there are enough applications that are no worse to break into than YouTube, like Photos. You can run those on the same hardware.

At Google's scale it's not too hard to say, this is the stuff we're ridiculously paranoid about and this is everything else, and get 80% CPU utilization on both infrastructures.


Google can afford to keep separate server fleets for each app. They have a lot of money and a lot of dedicated computer power for actual revenue generators.


I should clarify I mean Docker containers specifically, good point.


Especially since HP-UX and Tru64 were doing it already in the late 90's.


Absolutely; as I mention in the article. There's nothing new or exciting here; there's simply a big discrepancy between reality and how a lot of users understand it. This is partially true for the my-dev-user-is-part-of-the-docker-group case, but even more so for the container-with-docker-sock-access case.


First let me state that this article is interesting and well written and taught me something new, thank you for that.

However my main frustration is I'm not really sure what the article is advocating for here. Instead of not giving access to the docker daemon to containers (which is legitimately needed for complex deployments where one container needs to dynamically start up "sibling" containers, e.g. a CI service), wouldn't it make more sense to talk about not viewing Docker container security the same as VM security in the first place?

Sure if you're going to do that anyway it makes sense to disable access to the socket, but then there's a million other things you'll have to do because docker containers are currently not primarily intended as a replacement for the isolation security of VMs. Their security is more like a useful extra layer, rather than a full blown replacement.
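
If you do decide to keep the socket away from containers, one low-tech check is to scan your `-v` volume arguments for anything that exposes it. The following is a minimal Python sketch, not part of Docker: the function name is made up for illustration, it assumes the default socket path `/var/run/docker.sock`, and it does not follow symlinks (such as `/run`) or catch other routes to the daemon like a TCP listener.

```python
import posixpath

DOCKER_SOCKET = "/var/run/docker.sock"  # assumed default location


def find_socket_exposure(volume_specs):
    """Return the "host:container[:mode]" specs whose host side exposes the socket.

    A spec is flagged if its host path is the socket itself or a directory
    containing it (for example /var/run or /).
    """
    flagged = []
    for spec in volume_specs:
        if ":" not in spec:
            continue  # anonymous volume: a container path only, nothing from the host
        host_path = posixpath.normpath(spec.split(":", 1)[0])
        parent_prefix = host_path.rstrip("/") + "/"
        if host_path == DOCKER_SOCKET or DOCKER_SOCKET.startswith(parent_prefix):
            flagged.append(spec)
    return flagged


if __name__ == "__main__":
    specs = ["/var/run/docker.sock:/var/run/docker.sock", "/srv/data:/data:ro"]
    for spec in find_socket_exposure(specs):
        print("exposes the Docker socket:", spec)
```

A check like this only tells you where the socket is handed out; whether a CI-style container legitimately needs it is still a judgment call.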


Check www.hyper.sh and https://github.com/hyperhq/runv

They can boot a new VM with Docker images in 200ms, which is very close to LXC, and perfectly isolated by a hypervisor.

The problem of "Virtual Machine" is not "Virtual"/Virtualization, the problem is the full blown guest OS, aka "Machine".


Running docker inside VMs loses all the IO benefits. That's not to say I use docker at all: I don't for this reason. Someday it will be solved and docker will actually be as production ready as its proponents think it is.


Since folks seem to be new to docker and Xen: Hypervisors like Xen have very poor IO performance: one of the major benefits of docker is that it provides containerisation without the IO overhead of a hypervisor. Neither of those have been disputed by anyone, ever.


Docker is not a sandbox and was never intended to be. A comprehensive SELinux profile for untrusted Docker containers could be developed, but I've yet to see one. If you want to run untrusted Docker containers, then this is what you want.


I don't understand the point of Docker. It seems like a great product. For any serious production-grade containerization, I'd use a real virtualization solution like KVM, or VMWare.


It's containerization between trustworthy apps; it's not security containerization. What it gets you is, if you have one application that's designed to run well on RHEL 5 with /usr/bin/python pointing to Python 2.4, and another one that's designed to run well on Debian testing with a manual /usr/bin/python symlink to Python 3, you can give both of them what they want. This has nothing to do with security.

If you want Docker + security isolation, I'm intrigued by Clear Containers, which is a lightweight KVM-based virtualization thing:

https://lists.clearlinux.org/pipermail/dev/2015-September/00...

https://lwn.net/Articles/644675/


If you want to try out Clear Containers we merged it into the rkt container runtime a few weeks ago. And since rkt can run both docker and appc container images you can run any existing app container from this runtime. This blog post details Clear Containers in rkt:

https://coreos.com/blog/rkt-0.8-with-new-vm-support/


> It's containerization between trustworthy apps; it's not security containerization.

Isn't that what a process is? The containers in Linux are based on Jails from FreeBSD and zones from Solaris. They are absolutely there for security.

Regarding the remaining part of your post, I understand what you are trying to show but python is a really bad example. You absolutely can have python 2 and 3 side by side, or even different minor versions. And with virtualenv or pyvenv (that came with 3.4) you can even have multiple installations of the same version. If you add setuptools to your application you can easily generate a single-file package (I personally like wheel); then deployment is as simple as writing pip install myawesomeapp-1.0-py2.py3-none-any.whl, which downloads all dependencies. There is not much that Docker would help with; it only makes things more complex.
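
As a rough illustration of the "multiple installations side by side" point, here is a minimal Python sketch using the standard library `venv` module (the module behind `pyvenv`). The application names are hypothetical, and `with_pip=False` is chosen only to keep the sketch quick and offline; real use would normally want pip inside each environment.

```python
import os
import tempfile
import venv


def make_env(parent_dir, name):
    """Create an isolated Python environment under parent_dir and return its path."""
    env_dir = os.path.join(parent_dir, name)
    # with_pip=False skips bootstrapping pip; each env still gets its own
    # interpreter entry point and its own site-packages directory.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for app in ("app1", "app2"):  # hypothetical application names
            print(make_env(tmp, app))
```

Each environment can then have its own, possibly conflicting, set of packages installed without touching the system interpreter.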


> Isn't that what a process is?

It's the level of isolation of a process, yes. Just as two processes can use their address spaces as they see fit without bothering each other, even loading different versions of the same library, under a Linux container, two applications can use their filesystem as they see fit without bothering each other, even using different versions of the same binary applications.

But the security isolation between two processes running as the same user account is extremely weak. While it's true that one process can't write to another one's memory directly, it's not a fundamental breach of the security policy if it can do so indirectly. There may be things to increase defense-in-depth (like Yama) but fundamentally if you're the same UID there is no security boundary. The same rule applies to containers.

> python is a really bad example

Yeah, agreed. I was just trying to come up with something quick. If your app works with v(irtual)env, by all means just use that and stop messing with containers. However, if you've got some large closed-source app with a portion in Python, and it expects /usr/bin/python to both work and be some exact version, you need to virtualize the filesystem.


> It's the level of isolation of a process, yes. Just as two processes can use their address spaces as they see fit without bothering each other, even loading different versions of the same library, under a Linux container, two applications can use their filesystem as they see fit without bothering each other, even using different versions of the same binary applications.

I'm sorry for disagreeing, but that's what chroot() or even chdir() is supposed to do. It's not for security (a process can fool it), but they do provide isolation assuming there are no malicious actors.

Containers were created to provide security; a perfect example is the FreeBSD Jail, which precedes Solaris zones. It is supposed to be a secure version of chroot() which should not be escapable. It was successfully used in the early 2000s, before VMs, to provide shared hosting.

> But the security isolation between two processes running as the same user account is extremely weak. While it's true that one process can't write to another one's memory directly, it's not a fundamental breach of the security policy if it can do so indirectly. There may be things to increase defense-in-depth (like Yama) but fundamentally if you're the same UID there is no security boundary. The same rule applies to containers.

Agreed, except the last sentence. With processes the isolation is weak because the same UID represents the same user; if you use a different UID the isolation is enforced. The containers (assuming they are correctly set up) allow you to actually have two root accounts that can't interfere with each other.

> Yeah, agreed. I was just trying to come up with something quick. If your app works with v(irtual)env, by all means just use that and stop messing with containers. However, if you've got some large closed-source app with a portion in Python, and it expects /usr/bin/python to both work and be some exact version, you need to virtualize the filesystem.

Assuming these are not malicious, you can just do:

  chroot /app1_root python myapp1.py
  chroot /app2_root python myapp2.py

And as long as they don't use any tricks to get out, they won't step on each other's toes.


> The containers (assuming they are correctly set up) allow you to actually have two root accounts that can't interfere with each other.

To the best of my knowledge, Docker (the official implementation) does not do that. rkt does, as noted at the bottom of this blog post linked elsethread:

https://coreos.com/blog/rkt-0.8-with-new-vm-support/

(The Linux implementation of this is somewhat poor, in that you need to have a separate UID reserved in the global namespace, and you can only do 1:1 maps in containers. A nicer implementation would treat the user principal as a (container, UID) tuple. I recall that Linux tried that, but gave up for backwards-compatibility reasons.)
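
To make the mapping concrete: the kernel exposes a user namespace's mapping in `/proc/<pid>/uid_map` as lines of three numbers (first UID inside the namespace, first UID outside, length of the range). The following minimal Python sketch translates a container UID to the host UID under such a map; the host range starting at 100000 is a hypothetical example, not something Docker or rkt is known to use.

```python
def parse_uid_map(text):
    """Parse uid_map text into a list of (inside_start, outside_start, count) tuples."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            ranges.append(tuple(int(field) for field in fields))
    return ranges


def to_host_uid(ranges, container_uid):
    """Translate a UID seen inside the namespace to the UID the host sees."""
    for inside, outside, count in ranges:
        if inside <= container_uid < inside + count:
            return outside + (container_uid - inside)
    return None  # this UID is not mapped in the namespace


if __name__ == "__main__":
    # Hypothetical mapping: container UIDs 0..65535 -> host UIDs 100000..165535.
    example = parse_uid_map("0 100000 65536\n")
    print(to_host_uid(example, 0))  # container root as seen by the host
```

This is why the global namespace needs UIDs reserved per container: "root" in two containers only stays distinct if their maps point at disjoint host ranges.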

> chroot /app1_root python myapp1.py

Yeah, I think 80% of what Docker actually gets people in practice is a system for managing and running things in chroots. Containers also let you give them separate networking setups, track PIDs properly, and apply resource controls. But I've seen homegrown approximations that preceded Docker, based on stuff like schroot.


Pretty much. It is marketing multiple existing open source Linux technologies (overlay file systems, namespacing for processes, network sockets, chroots) under a set of tools and bam! -- hundreds of millions of dollars in valuation.


Well, more than that, it's packaging them into user-friendly tools and promoting the shit out of it.

Still Docker the company's core value proposition is a hosted registry, something many savvy corporations will never go for. Docker the product could probably do just fine if the company were to fold.


but the hosted registry is what your average distribution already provides in the form of packages.

and with everything moving to services, I see the utility of actually using components diminishing fast (well except for those providing those services)

but for everyone else, docker solves no actual problem that can't already be solved now.


Writing a Dockerfile and typing "docker run whatever" is a hell of a lot easier than configuring a chroot and LXC container.


virtualenv only works if you want an isolated Python environment. What if you have two things that each want a set of .so libraries with incompatible versions, and one wants node.js? In theory this could be managed with something like NixOS, but containerization is the much more mature and flexible solution.


In that situation you can use LD_LIBRARY_PATH, that's what it is for.
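
A minimal sketch of that approach in Python, assuming each application ships its own library directory (the `/opt/...` paths and function names are hypothetical): launch each program with its own directory placed first on `LD_LIBRARY_PATH`, so the dynamic linker searches it before the system defaults.

```python
import os
import subprocess


def env_with_libs(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir first on LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH")
    # Prepend so this application's libraries win over any inherited entries.
    env["LD_LIBRARY_PATH"] = lib_dir if not existing else lib_dir + ":" + existing
    return env


def run_with_libs(argv, lib_dir):
    """Run argv with lib_dir searched first by the dynamic linker."""
    return subprocess.run(argv, env=env_with_libs(lib_dir), check=True)


if __name__ == "__main__":
    # Hypothetical layout: each app keeps its own incompatible .so versions.
    print(env_with_libs("/opt/app1/lib")["LD_LIBRARY_PATH"])
```

Because the variable is set per process, two programs started this way never see each other's library directories.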

But what you really want in that case is to link the application statically, if you don't want the benefits of shared objects:

- smaller binary
- memory savings (if multiple programs are using the same library, it is loaded once)
- fewer files to patch to fix a security vulnerability

Shared objects have these features, but they come at the price of lower performance, so by putting all .so files into a single docker file instead of statically compiling your application you're getting the worst of both worlds.


Loading .so files doesn't mean the language can be statically compiled.


How is that a problem? That's basically the same thing as installing a newer set of tools under your home directory, and using them for an application. This has been done since long before Linux.

(Containers are still good, for isolating concerns and management. Multiple versions of the same library is just not it.)


so it's basically a complete set of workaround for enabling people to live with brain-damaged applications?

and we are encouraging this now, instead of fixing the damned apps?

sheesh this makes me feel old.


Yeah, the UNIX-Haters(tm) view of Docker is something like, this is what you get when you give up on static linking, give up on a UNIX spec, and make the mistake of telling people that chroots exist. We could have avoided all of this, and normal application deployment would have just worked with all the benefits Docker gives.


Containerization and virtualization serve different purposes. VMs run actual operating systems within them. A single operating system runs many different containers, that each act something like processes running on that same OS, in a way where they're highly sandboxed and segmented from each other.

If your goal is strong isolation, then VMs are definitely better today. The purpose of Docker and similar container technologies is not that kind of isolation. It's to package up and distribute applications in a way that's more decoupled than simply installing them all on the same system.


I'm not hiding that after I learned about Docker I became a skeptic. It seems like yet another case of people observing what Google was doing and then implementing it wrong.

Google is using containers instead of VMs. This still provides security isolation and allows them to use resources more efficiently (VM has overhead where you need a whole OS for every instance).

This approach does not make much sense in a public cloud, where you already run inside a VM and the overhead is really Amazon's, not yours. So I see Docker is now pivoting to be a package manager, but there are already tools that do that. You can argue that Docker is simpler, but so was rpm when it started. As Docker grows it will become more complex in order to support all the functionality package formats already provide. There might be an argument that you can run multiple Docker containers on a single host, but that's what processes are for.

There is change happening, and it looks like cloud companies want to create a "cloud OS". I guess Docker is a step in that direction, but in its current state I don't see it offering anything valuable to the organization that uses it.


Docker as a package manager somewhat matches my use, and I think it does its job well. As the packager, I can just create one "package" that any Linux distribution running Docker can install, instead of creating packages for multiple versions of many different distributions.


I think that container images will replace traditional package management in many cases. Package management in Linux has given us many great things:

1) Easy global mirroring of software using "boring" protocols like http and ftp

2) Cryptographic signing of the software so we can trust mirrors and systems that put users in control of who to trust.

3) Human significant package names that are easy to pull onto a host e.g. `apt-get install $name`

Where package management broke down:

1) Package collisions e.g. If I want to install a new custom build of python it replaces the host version and everything may break. The python3 v python2.6 problem.

2) Dependency namespacing. e.g. If I want to rely on a non-official mirror to ship me whizbang project X they could also replace my libc by adding it to the repo because the names and versions collide.

Making sure that we hold onto the good three properties of package management while fixing the two problems is important for Linux moving forward. The last 15 years of Linux was dominated by the centralized package management system and a ton of hacks have developed to work around it when you need a new package or want to install custom software. This is why I spend so much time working on container image specs like appc and oci; I hope we can arrive at a good container image format for the next 15 years that everyone can rely on.

For examples see slide 6 forward here: https://speakerdeck.com/philips/rkt-and-the-need-for-the-app...


> 1) Package collisions e.g. If I want to install a new custom build of python it replaces the host version and everything may break. The python3 v python2.6 problem.

I'm sorry, but you hit my pet peeve when you used python as example. Python was written in a way that you can have multiple versions installed side by side and they all can work without problem. If you have a python3 package that is uninstalling python2.6 then its author screwed something up and I would be afraid to use it at all.

> 2) Dependency namespacing. e.g. If I want to rely on a non-official mirror to ship me whizbang project X they could also replace my libc by adding it to the repo because the names and versions collide.

Is that still an issue? Zypper in openSUSE is quite smart, and for each package it remembers which repo it came from. It tries to satisfy dependencies without changing vendor, and prompts if the only way to satisfy dependencies is to change vendor.

That said, if an application requires a different version of glibc, it would be much better to compile it statically, but even then glibc is supposed to match your kernel version, so you're still risking some incompatibilities.


Static glibc doesn't work anymore. If you need the internet then you'll be pulling in the nss plugins anyway.


GNU Guix is a package manager that solves those 2 problems while not sacrificing the 3 good properties. It also offers additional nice features like unprivileged package management, transactional upgrades/rollbacks, full system configuration management, and a tool that can be used like a language-agnostic virtualenv. Built-in Linux container integration is also on the way.

https://gnu.org/s/guix


What language is it in? Why not use packaging from the language itself? It would be more likely to work across many different OSes, not just Linux.

If it's a binary, why not compile it statically? That way it's a single file and it also will perform faster (using shared objects has overhead)


Packaging in the language itself ranges from difficult to impossible for most non-trivial apps. Popular runtimes like python, node, and ruby have many extensions and packages that rely on C and shared libraries. Because of that, a compiled binary package will either not be portable between machines or require that the machine you are deploying to has a working compiler to build the C package from source. I have seen vast amounts of engineering effort invested into porting C code to slower "native" code just to make packaging work. With containers you can avoid the mess of trying to share the `/` filesystem namespace with the host and concentrate on getting your application working on a chroot that is portable between Linux kernel systems.

I 100% agree with the compile-it-statically thing. This is what many major internet properties like Google do, and I would argue it is part of why Go is so popular. But it is hard for the vast majority of applications, which expect to open files for assets and lack build systems for static compilation.


KVM & VMware are not containerization, they're full virtualization.

There are a lot of benefits to containers and they don't have to be insecure. More efficient resource utilization and orders of magnitude faster allocation and launching to name two.

Google runs a significant portion of its internal operations in a container infrastructure and has for quite a while.[1]

They're perfectly capable of deployment into production environments.

I won't comment on docker as I haven't spent the time to fully grok all its warts.

1. http://research.google.com/pubs/pub43438.html


> Google runs a significant portion of its internal operations in a container infrastructure and has for quite a while.[1]

They use containers inside virtual machines. Virtualization for security, containers for deployment.


I know from Joe Beda's talk [1] that they run VMs inside containers for scenarios where they need a managed OS, and that those containers run on bare metal. But I can't speak to the reverse, not being an employee or an authority on Google's internals.

1. https://speakerdeck.com/jbeda/containers-at-scale


Think of it like a tool for packaging and deploying applications with everything they need. Its purpose is closer to package managers than being a secure sandbox for running untrusted users' VMs.


Funny, an article got posted here just an hour ago advocating against treating Docker like a package manager: http://vilkeliskis.com/blog/2015/09/23/docker_is_not_a_packa...

I'd be inclined to agree. The reductionist sum of mechanisms that make up a Linux container has always been about detaching, multiplexing and partitioning kernel resource subsystems. Docker was the first program that really hyped containers as being about application deployment, but I fear this gives people the wrong impression and makes the mistake of treating an emergent property as if it were a fundamental one.


The first part of your post made sense.

I have no idea what point you're trying to convey with the second part. That applications are not business logic over kernel resources is a curious argument to make.


That was the initial impression I got from reading some posts on the Docker site, and perhaps there is no consistent definition of "package manager"--to me, though, the most difficult tasks a package manager must do are reproducible builds, dependency management, conflict resolution among transitive dependencies, and the like. But Docker does none of these things, as I understand it.


We[1] believe that the point of Docker is to provide "application packages" (called containers), which is a big step forward in delivering applications (using their words: build, ship, run).

However, we also believe the isolation containers provide isn't sufficient for multi-tenant use. This is the main motivation behind Hyper, which runs groups of container images (Pods) as virtual machines.

[1] https://hyper.sh


Any virtualization solution is going to require you to manage an operating system. One of the goals of containerization is for developers to only work with the application.


Unfortunately, most Docker containers seem to bring along 200+ MB of operating system with them.


This. Only 2G, not 200M. I try to get people to package application plus dependencies yet this is what they do every time. Every single time.

Plus they always base it on images from the Internet, so basically we trust some stranger with root privileges to all our data. Not always on the same image, of course.


Yes, this is a fundamental issue with Docker and other container systems that work with raw disk images as their basic unit of information. I have implemented a container system for the GNU Guix package manager that doesn't have this image bloat problem because it doesn't use opaque disk images. We store packages in a content-addressable storage system, which allows us to know the precise dependency graph for any piece of software in the system. Since we know the full set of software needed for any container, we are able to share the same binaries amongst all containers on a single host via a bind mount.
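
To make the sharing argument concrete, here is a rough Python sketch of a content-addressed store. It is not how Guix is actually implemented; the `Store` class, the SHA-256 keying, and the `closure` helper are all made up for illustration. The point is only that when every item is keyed by a hash and records its dependencies, two containers that need the same library resolve to the same store entry.

```python
import hashlib


class Store:
    """Toy content-addressed store (hypothetical, for illustration only)."""

    def __init__(self):
        self.items = {}  # key -> (name, data, tuple of dependency keys)

    def add(self, name, data, deps=()):
        # The key covers the payload and the keys of its dependencies, so
        # adding the same name, bytes and deps twice yields the same entry.
        h = hashlib.sha256()
        h.update(data)
        for dep in sorted(deps):
            h.update(dep.encode())
        key = h.hexdigest()[:16] + "-" + name
        self.items[key] = (name, data, tuple(sorted(deps)))
        return key

    def closure(self, key):
        """Return every store key reachable from `key` (its dependency graph)."""
        seen = set()
        stack = [key]
        while stack:
            k = stack.pop()
            if k in seen:
                continue
            seen.add(k)
            stack.extend(self.items[k][2])
        return seen


store = Store()
libc = store.add("libc", b"libc bytes")
web = store.add("webapp", b"webapp bytes", deps=[libc])
worker = store.add("worker", b"worker bytes", deps=[libc])

# Both containers need libc, but the host only has to hold one copy of it.
shared = store.closure(web) & store.closure(worker)
print(shared == {libc})
```

With opaque disk images there is no equivalent of `closure`, which is why each image ends up carrying its own copy of everything.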


The binaries have to run somewhere. Container evangelists love to espouse the purity of running any container on any dock OS, and this is as true as being able to migrate VMs between any host - it still comes down to what the application/VM needs from the underlying OS/hardware.


Mostly it comes down to lack of experience with containers so far, and lack of tools.

Most apps need very little from the underlying OS if you actually take the time to e.g. set up a toolchain with a build container that you then move the build artefacts out of to install into the final container. Instead you see a lot of containers that in effect include all the build dependencies and a nearly full OS pulled in by that.

It is improving, though slowly.


If your VMs are single-purpose, you don't necessarily need VMs. Containers are single-process things - they're not running syslog or cron or any of that overhead, for example. Docker is also big on ensuring you are using the literal same artifact in dev as on prod (assuming you change your team's workflow, of course).

Which is the right thing to use is entirely dependent on your use case.


> Containers are single-process things - they're not running syslog or cron or any of that overhead, for example.

You're either being entirely too prescriptive or interchanging containers and Docker freely when they are not equivalent, though one is a particular implementation of the other.

People have been running containers that emulate a full system for quite a while (see FreeBSD jails, illumos/Solaris zones, Linux OpenVZ, LXC, etc.).


Can someone describe a real attack scenario, using Docker's default settings as of the latest version? I see a lot of people claiming that it's insecure but no concrete examples of exploits. I am not saying that it's as secure as VMs, or that it's inherently secure - I am genuinely* interested and really do care about this.

*We're in a limited beta of cloudmonkey.io, and we want to run unrelated/untrusted containers side by side securely.


I assumed that's what the video in the article was about [0], but I haven't watched it to be sure. Is there anything in particular that isn't clear from the article? Or are you asking about docker security concerns beyond what this article is about?

It seems the main point is that if there is any way to exploit code running within a container that has unfettered root access to the host system via the docker socket, an attacker would then have complete control over the host system.

Exploitation is often mitigated in layers, where if Service A is exploited, an attacker can only rwx what and where Service A has been granted privileges to rwx. That should be as little as possible: the bare minimum access that service needs to operate. There's no reason your web server or database should be able to install new programs, create users, etc.

If Service B is running in a container and is given access to write to the docker socket, suddenly any exploitation of that service opens a door to immediately have full and unfettered root access to the host system.
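
One way to guard against this in a deploy pipeline is to refuse container specs that bind-mount the socket. A minimal sketch in Python, assuming volume specs in the `host_path:container_path[:mode]` form that `docker run -v` accepts; the `risky_binds` helper and the list of dangerous paths are my own illustration, not anything Docker ships.

```python
import posixpath

# Host paths that hand over control of the host when mounted into a container.
# This list is illustrative, not exhaustive.
DANGEROUS_HOST_PATHS = ("/var/run/docker.sock", "/run/docker.sock", "/")


def risky_binds(volume_specs):
    """Return the specs whose host side is one of DANGEROUS_HOST_PATHS."""
    flagged = []
    for spec in volume_specs:
        host_path = spec.split(":", 1)[0]
        # Named volumes (no leading slash) are not host paths; skip them.
        if not host_path.startswith("/"):
            continue
        # normpath collapses "..", but it does not resolve symlinks.
        if posixpath.normpath(host_path) in DANGEROUS_HOST_PATHS:
            flagged.append(spec)
    return flagged


specs = [
    "/var/run/docker.sock:/var/run/docker.sock",
    "/srv/app/config:/etc/app:ro",
    "pgdata:/var/lib/postgresql/data",
    "/var/run/../run/docker.sock:/docker.sock",
]
for spec in risky_binds(specs):
    print("refusing to start container with bind mount:", spec)
```

A string check like this is only a first line of defence, since it knows nothing about symlinks or other routes to the daemon.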

> [0] FTA "... ended up making a screencast to unambiguously demonstrate the flaw in their setup..."


A container is just a process or chroot running in its own namespaces. If you run it as is, you have a single-process container like Docker uses; if you run an init in the namespaced process, you have LXC or OS containers that can support multiple processes, like a lightweight VM. [1]

With LXC containers you start them as root and there is no lingering background LXC process running. Docker also starts containers as root but also has dockerd hanging around, presumably so non-root users can interface with it. But the container process is still running as root, so dockerd seems a bit redundant and unnecessary.

This is because until recently you couldn't run chroot as a non-root user and needed to run containers as root. But user namespaces (kernel 3.8 and later) change this and allow users to run processes in namespaces as a non-root user. LXC has supported unprivileged containers for some time now [2], so you can run LXC containers as non-root users, as in the entire container process is unprivileged. Docker and Rkt are working on this, but it's not simple to implement for container managers, as non-privileged users cannot access networking and mounts. But when it arrives, presumably dockerd can run as an unprivileged process.
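
For anyone wondering what "the entire container process is unprivileged" means in practice: a user namespace carries a uid mapping, readable in `/proc/<pid>/uid_map` as lines of `inside-start outside-start length`. Here is a small Python sketch that applies such a mapping. The sample range (container uids 0-65535 mapped to host uids starting at 100000) is an assumed example rather than a documented LXC or Docker default, and the helper names are mine.

```python
def parse_uid_map(text):
    """Parse uid_map text into (inside_start, outside_start, length) tuples."""
    ranges = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            ranges.append((inside, outside, length))
    return ranges


def host_uid(ranges, container_uid):
    """Translate a uid inside the namespace to the uid the host sees.

    Returns None if the uid is not covered by any mapped range.
    """
    for inside, outside, length in ranges:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None


# Assumed example mapping: uids 0..65535 in the container
# become 100000..165535 on the host.
sample = "         0     100000      65536\n"
ranges = parse_uid_map(sample)
print(host_uid(ranges, 0))      # root inside the container
print(host_uid(ranges, 1000))   # an ordinary user inside the container
print(host_uid(ranges, 70000))  # outside the mapped range
```

So "root" in such a container is just an ordinary high-numbered uid as far as the host is concerned, which is the whole appeal.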

But Linux kernel namespaces have not been designed for multi-tenancy. For instance, cgroups are not namespace-aware, and until this changes in the kernel, containers will not provide the level of isolation or security required for multi-tenant workloads.

And container managers like LXC or Docker that take these capabilities and merge them with networking and layered filesystems like aufs or overlayfs cannot work around this. Parallels OVZ is designed for multi-tenancy, but the kernel patch appears to be too large and invasive, and it doesn't look like it will be merged.

So user namespaces are one level of security and isolation; you can also use seccomp, AppArmor, SELinux or even grsec. But you have to find the middle ground between security and usability, and given the relative confusion about containers, namespaces, and container managers, it will take time to mature.

[1] https://www.flockport.com/how-linux-containers-work/

[2] https://www.flockport.com/lxc-using-unprivileged-containers/


To be honest, exposing the host's /var/run/libvirtd.sock to a guest VM will have exactly the same consequences.


Do... people really offer docker sockets to running containers without thoroughly vetting them first? Are people really that good at avoiding the warnings?

I mean, I know it's popular to pass the socket in for automatically re-configuring proxies... but I haven't seen any serious use outside of that.


In other news, mounting / gives you access to the root filesystem.


dockerd -- nope. It's `docker -d`


'daemonisedthingd' is standard UNIX terminology.


Docker is a rootkit over HTTP.


>A rootkit is a collection of computer software, typically malicious, designed to enable access to a computer or areas of its software that would not otherwise be allowed (for example, to an unauthorized user) while at the same time masking its existence or the existence of other software.

Doesn't really describe Docker, does it?


It does if you're talking about `docker -d` and --privileged.


Rootkits hide themselves. Docker makes no effort to do so.

I know you're trying to make a joke, but it's not funny.


That's the joke, literally.

This is a blog post about how docker is insecure when you can get root perms over a socket the docker daemon exports when you explicitly map it back into a docker container... as if it's unexpected when it's explicitly docker's design and the command line you provided.

Oh well, I did think it was funny.


it hides in plain sight


I don't really get this. The implication is that the container becomes more secure without access to the socket, yet it still has access to the hundreds of local kernel APIs with which, in an average month, it can easily gain higher privileges than root, especially on contemporary machines where half the admins around these days don't even know what a security update looks like.


Why knowingly give a trivial breakout vector to code you don't trust?



