What I found wrong in Docker 1.12 (linux-toys.com)
390 points by rusher81572 on Aug 26, 2016 | 228 comments



Disclaimer: I work at AWS, but on a product which does not compete with Docker or its orchestration tools in any way shape or form. My opinions are my own.

I wouldn't even limit this to just the swarm feature. We've been running Docker in production for a year, using it in dev environments a year before that, and we've had major problems nearly every release. We had to upgrade directly from Docker 1.7 to 1.11 because every release in between was too unstable or had glaring performance regressions. We ended up running a custom build and backporting features which were worth the risk.

Speaking of 1.12, my heart sank when I saw the announcement. Native swarm adds a huge level of complexity to an already unstable piece of software. Dockercon this year was just a spectacle to shove these new tools down everyone's throats and really made it feel like they saw the container parts of Docker as "complete." One of the keynote slides literally read "No one cares about containers." I get the feeling we'll be running 1.11 for quite some time...


To provide some weight the other way, we've been using Docker in production for about 3 years now, and have not had any big issues. Obviously, you guys probably have a bit more extreme use cases at AWS. Things that bug us are generally missing features, but those gradually get added in over the course of the years, though some get less love than others.

For example, for some reason it's still not possible to ADD something and change its permissions/ownership in one layer, which basically doubles the size of such layers.

I wouldn't go as far as saying it's in any kind of a 'sad' state though. It's a neat wrapper over some cool Linux kernel features, and it's been that way since before 1.0.

I'm curious how you even get performance issues from Docker. Which feature caused performance issues for you?


Always fun to hear experiences from other production veterans. Glad to hear things are working well for you guys.

Our use case involves rapid creation and destruction of containers. Granted, this use case was pretty unheard of when we first adopted Docker, but it is becoming much more common.

Before Docker moved over to containerd, the docker daemon was riddled with locks which resulted in frequent dead-locking scenarios under load as well as poor performance. Thankfully, containerd now uses an event loop and is lock-free. This was a huge motivating factor for us to move forward to Docker 1.11.

To me, the sad state has more to do with Docker the company pushing new features out as quickly as possible and leaving stabilization to contributors. There are some days where it really feels like Docker is open-source so that Docker Inc can get free QA. To most users things may not feel in a sad state, but it can really suck for contributors.


> the docker daemon was riddled with locks which resulted in frequent dead-locking scenarios under load as well as poor performance

I second this. We use Docker in a similar scenario for a distributed CI, so we spawn between 70k and 90k containers every day. Until very recently we were running 1.9 and saw a staggering 9% of failures due to various Docker bugs.

It's getting better though: since we upgraded to 1.12 a few days ago we're down to a more manageable 4%, but I'd still consider this very unreliable for an infrastructure tool.

edit: my metrics were slightly flawed, we're down to 4% not 0.5%


You were likely seeing the bug that kept us from deploying 1.9, which was related to corruption of the bit mask that managed IP address allocation. We saw failure rates very similar to yours with that issue.


how is this acceptable?


You have to design for those failures. In our case we spawn 200 containers for one build; if 9% of those crash, we still have a satisfactory experience.

In the end, at this scale, even with three or four nines of reliability you'd still have to deal with 80 or 8 failures every day. So we would have to be resilient to those crashes anyway.
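As a rough check of that arithmetic, here is a minimal sketch. It assumes about 80k containers per day (the middle of the 70k to 90k range quoted earlier) and treats "N nines" as a per-container success rate; the function name is hypothetical.

```python
def expected_daily_failures(containers_per_day: int, nines: int) -> float:
    """Expected number of failed containers per day at a given number of nines.

    "N nines" means a success rate of 1 - 10**-N (3 nines = 99.9%), so the
    failure count is simply the daily volume divided by 10**N.
    """
    return containers_per_day / 10 ** nines


if __name__ == "__main__":
    # Assumed volume: 80k containers per day.
    for nines in (3, 4, 5):
        failures = expected_daily_failures(80_000, nines)
        print(f"{nines} nines -> {failures} failures per day")
```

At this volume, three nines leaves 80 failures a day, four nines leaves 8, and five nines still leaves roughly one failure a day, so retry logic is needed either way.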

However, it's a lot of wasted computing and performance that we'd love to get back. But even with those drawbacks, our Docker-based CI still runs 2 to 3 times faster than our previous one, because containers make heavy CI parallelism quite trivial.

Now maybe another container technology is more reliable, but at this point our entire infrastructure works with Docker because, besides those warts, it gives us other advantages that make the overall thing worth it. So we stick with the devil we know ¯\_(ツ)_/¯.


> In our case we spawn 200 containers for one build, if 9% of those crashes, we still have a satisfactory experience.

You spawn 200 containers for one build‽ Egad, we really are at the end of days.

> But even with those drawbacks our Docker based CI still run 2 to 3 times faster than our previous one because containers make heavy CI parallelism quite trivial.

Since containers are just isolated processes, wouldn't just running processes be just as fast (if not slightly faster), without requiring 200 containers for a single build?


> wouldn't just running processes be just as fast

The applications we test with this system have dependencies, both system packages and datastores. Containers allow us to isolate the test process with all the dependent datastores (MySQL, Redis, ElasticSearch, etc.).

If we were to use regular processes, we'd both have to ensure the environment is properly set up before running the tests, and also fiddle with tons of port configurations so we can run 16 MySQLs and 16 Redises on the same host.

See my other comment for more details https://news.ycombinator.com/item?id=12366824


CI can just recover from these errors by retrying/restarting containers.


Not Dead containers (which failed their post-shutdown cleanup).


"Move fast and break shit" philosophy.


Where do you run the CI containers? AWS?


Yes, on a pool of c4.8xlarge EC2 instances with up to 16 containers per instance.

But very few of our failures are attributable to AWS; restarting the Docker daemon "fixes" most of them.


For a newbie, what is the reason you didn't use hosted CI, like Travis CI?


Initially we were using a hosted CI (which I won't name), but it had tons of problems we couldn't fix, and we were up against the wall in terms of performance.

To put it simply when you run a distributed CI your performance is:

    setup_time + (test_run_time / parallelism)
So when you have a very large test suite, you can speed up the `test_run_time` part by increasing the parallelism, but the `setup_time` is a fixed cost you can't parallelize.

By setup_time I mean installing dependencies, preparing the DB schema and similar things. On our old hosted CI, we would easily end up with jobs spending 6 or 7 minutes setting up, and then 8 or 9 minutes actually running tests.
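To make the formula concrete, here is a small sketch of the model. The function name and the example numbers (a 160-minute serial test suite, 20 or 40 workers) are hypothetical and chosen only to illustrate the shape of the curve, not taken from the real system.

```python
def ci_wall_time(setup_time: float, test_run_time: float, parallelism: int) -> float:
    """Wall-clock time of a distributed CI run, in the same unit as the inputs.

    setup_time is paid once by every worker and cannot be parallelized;
    test_run_time is the total serial test time, split evenly across workers.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return setup_time + test_run_time / parallelism


if __name__ == "__main__":
    # Hypothetical numbers, in minutes.
    print(ci_wall_time(6, 160, 20))    # 6 min setup + 8 min of tests
    print(ci_wall_time(6, 160, 40))    # doubling workers only removes 4 min
    print(ci_wall_time(0.5, 160, 20))  # shrinking setup removes 5.5 min
```

With a fixed setup cost, adding workers gives diminishing returns, which is why cutting `setup_time` matters as much as raising parallelism.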

Now with our own system, we are able to build and push a docker image with the entirety of the CI environment in under 2 minutes, then all the jobs can pull and boot the docker image in 10-30 seconds and start running tests. So we were both able to make the setup faster, and to centralize it, so that our workers can actually spend their time running tests and not pointlessly installing the same packages over and over again.

In the end for pretty much the same price we made our CI 2 to 3 times faster (there is a lot of variance) than the hosted one we were using before.

But all this is for our biggest applications. Our small ones still use a hosted CI for now, as it's much lower on maintenance for us, and I wouldn't recommend anyone go through this unless CI speed becomes a bottleneck for your organization.


You didn't include the maintenance cost of managing your infrastructure and container platform, which you don't need to worry about with a hosted service.


Even with those it was still worth it. A couple people maintaining the CI is nothing if you can make the build of the 350 other developers twice as fast.

Also it's not like hosted CI is without maintenance, if you want it to not be totally sluggish, you have to use some quite complex scripts and caching strategies that need to be maintained.


> "To me, the sad state has more to do with Docker the company pushing new features out as quickly as possible and leaving stabilization to contributors."

Side note: I'm a production AWS user, with no plans to change, but I feel like AWS does this exact same thing with each re:Invent. The announced products actually become available 6-12 months later, and actually "usable" and reliable 2 years later...


You can spin this as they release "MVP" software and let early users drive direction. I mean, that's what I've heard.


Yeah.. Except that's not how they spin it.

In practice, they wrap it in marketing speak to paint it as something to revolutionize your stack.

Then you jump in spending several days of engineering time diving into it, only to find late in the game that the one (or several) critical details you can't find in the documentation that are essential to making an end-to-end production ready pipeline, are not actually implemented yet...

And won't be for many months


Alright that makes sense, most of our containers are long running, usually months. Only the containers that have our apps that are under active development will see multiple rollovers per day.

Now that I think about it we did have one semi-serious bug in Docker, though that was also our own fault. Our containers would log a lot, and we hadn't configured our rsyslog very well so under some circumstances its buffers would fill up and the log writes would become blocking and be real slow. When this would happen some commands like `docker ps` would totally lock up, which messed with our (hand rolled) orchestration system. It wasn't until one of us noticed the logs would be minutes behind that we discovered killing rsyslog would make docker responsive again and thus found out what was happening.

Since it didn't actually affect our service I didn't remember it as particularly bad, but I can imagine that if our service depended on having fast interactions with Docker, that would have hurt badly. IIRC they did recognize the severity of the issue and quickly had a fix ready.

I bet Docker Inc. has a tough mission, building out Docker services far enough to compete with the dozens of platforms that integrate Docker such as AWS or OpenStack so they can actually make money off the enterprise.


If you don't mind me asking, what is your use case? The company that I work for is also spinning up and destroying containers constantly, and we've had to develop a "spinup daemon" in order to deal with docker's slow spinup time (1-2 seconds is unacceptable to me).

I'm curious if it'd be worth it to create some shim layer over runC (or add the functionality) in order to have a copy-on-write filesystem that could be used to discard all changes when you're done with the container. Similar to how you can do a "docker run --rm -v /output:/output/34 mycontainer myapp" and all changes except those within the mounted volume get thrown away.

The use-case at my job needs the security of SELinux + CGroup/filesystem/network isolation. At a first glance, it looks like runC may handle most of the containerization bits, but not the copy-on-write filesystem stuff that I currently need. :s


I can't go into details on our use case, but if it can work for you, I highly recommend the new --tmpfs flag. If you know exactly where your application writes data and are okay with it being in memory, you can reuse your containers with a simple stop and start rather than waiting for the full setup of a new container.
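A rough sketch of that reuse pattern follows. The helper names, the container name, and the `/scratch` path are hypothetical; the only Docker CLI pieces relied on are `docker run -d --name`, `--tmpfs`, `docker stop`, and `docker start`. The functions only build argument lists, so nothing is executed until you pass them to something like `subprocess.run(argv, check=True)`.

```python
from typing import List


def run_with_tmpfs(name: str, image: str, tmpfs_paths: List[str]) -> List[str]:
    """Build the one-time `docker run` command for a reusable container.

    Every path the application writes to is mounted as tmpfs, so its contents
    live in memory and are dropped when the container stops.
    """
    argv = ["docker", "run", "-d", "--name", name]
    for path in tmpfs_paths:
        argv += ["--tmpfs", path]
    argv.append(image)
    return argv


def recycle(name: str) -> List[List[str]]:
    """Build the stop/start pair that resets the tmpfs state between jobs."""
    return [["docker", "stop", name], ["docker", "start", name]]


if __name__ == "__main__":
    # Hypothetical container and image names, printed rather than executed.
    print(" ".join(run_with_tmpfs("worker-1", "mycontainer", ["/scratch"])))
    for argv in recycle("worker-1"):
        print(" ".join(argv))
```

This only helps if the application writes nowhere outside the tmpfs paths; anything written to the container's own filesystem layer survives the stop/start cycle.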

With runC you can mount whatever filesystem you want, but it is up to you to set up that filesystem. So yes, you would need some kind of shim to set up your filesystem.


I've been using Docker in production for the past 2 and a half years (in two different companies) and even with no extreme use cases we've had problems with: performance of volumes/devicemapper, random breaking bugs: daemon would restart without warning or errors in 1.4, randomly killing containers in 1.9, having to restart the daemon in 1.8 when it hung pulling images (consequently killing the containers in the process).

I still like Docker and can see myself, my team and my company using it for a long time if nothing MUCH better shows up (rkt is promising to take some of the complexity pain away, but we are not diving into it yet). Still, I've been bitten enough to avoid upgrading Docker when it isn't needed: we follow a rule to only upgrade to ".1" releases, as most of our problems have been with ".0" ones.


My favorite was that `docker exec -it $container bash` would cause a nil pointer dereference in docker 1.6.0 and kill the docker daemon. We've seen gobs of bugs since, but that was the most wtf gnarly one.


I'd recommend looking at using runC (which is the underlying runtime underneath Docker). Currently we're heading for a 1.0, and the Open Containers Initiative is working on specifications that will make container runtimes interoperable and eventually provide tooling that works with all OCI runtimes. If you have anything to contribute, I would hope that you can give us a hand. :D


I'm a huge fan of the work being done on runC and would love to give you guys a hand! You'll probably see me around soon :)


We had the same experiences, and we switched to using rkt, supervised by upstart (and now systemd).

We have an "application" state template in our salt config, and with every docker update something would cause all of them to fail. Thankfully, the "application" state template abstracted running containers well enough that we switched from docker -> rkt under the covers without anybody noticing, except that we no longer fear container software updates.


An example of changing behavior that broke us not too long ago: https://github.com/docker/distribution/issues/1662 . By the time this happened we were already working on the transition, so it was just more motivation.


Hi mtanski,

How did you replace docker with rkt? Do you have a howto that you can share?


I haven't replaced Docker with rkt on a big scale (or run Docker on a big scale), but I recently changed over some Docker containers to rkt.

First off, this and the rest of the rkt docs is a good starting point https://coreos.com/rkt/docs/latest/rkt-vs-other-projects.htm...

Second, rkt runs Docker images without modifications, so you can swap over really easily https://coreos.com/rkt/docs/latest/running-docker-images.htm...

rkt uses acbuild (which is part of the application container specification, see https://github.com/appc/spec) to build images, and I had a very tiny Docker image just running a single Go process.

I just created a shell script that ran the required acbuild commands to get a similar image.

A good place to get started is the getting started guide https://coreos.com/rkt/docs/latest/getting-started-guide.htm...

Docker runs as a daemon, and rkt doesn't (which is one of the benefits). I just start my rkt container using systemd, so I have a systemd file with 'ExecStart=/usr/bin/rkt run myimage:1.23.4', but you can start the containers with whatever you want.

It's also possible to use rkt with Kubernetes, but I have not tried that yet. http://kubernetes.io/docs/getting-started-guides/rkt/


Not to mention that 1.11's restart timer was never reset to 0, even after the container had run well for more than 10 seconds (i.e., after a few restarts, your container would be waiting hours to start!).
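For illustration, this is roughly the behaviour one would expect from a restart backoff, sketched under assumptions: the function name, the 100 ms base delay, and the one-minute cap are illustrative values, not Docker's actual code, and only the 10-second threshold comes from the comment above.

```python
def next_restart_delay(
    previous_delay: float,
    ran_for: float,
    base: float = 0.1,         # assumed initial delay, in seconds
    reset_after: float = 10.0, # healthy-run threshold mentioned above
    max_delay: float = 60.0,   # assumed cap on the delay
) -> float:
    """Return the delay in seconds to wait before the next restart.

    The delay doubles after each quick failure, but drops back to `base`
    once the container has stayed up for at least `reset_after` seconds.
    """
    if previous_delay <= 0 or ran_for >= reset_after:
        return base
    return min(previous_delay * 2, max_delay)


if __name__ == "__main__":
    delay = 0.0
    # Four quick crashes, then one healthy 30-second run, then another crash.
    for ran_for in (1.0, 1.0, 1.0, 1.0, 30.0, 1.0):
        delay = next_restart_delay(delay, ran_for)
        print(f"ran {ran_for:>4}s -> wait {delay}s before restarting")
```

The bug described above corresponds to the `ran_for >= reset_after` branch never firing, so the delay only ever grows.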

This, and I can attest to the 1.12 problems listed in this article.

I can't remember the specifics with 1.10, but basically nothing ever really works as promised. First, that makes people waste a lot of time trying to make something work when it can't, and second, it doesn't inspire much trust in the product's stability.

I really wish they would collaborate a lot more and split their solutions into smaller modules while keeping everything simple. I think they have a great product, but too many growing pains.


If I were you I would add a disclaimer when criticizing Docker, mentioning that you work on AWS, since the products are competitive in some ways like EC2 Container Registry vs Docker Hub. It would be great for AWS if Docker simply focused on open source bug-fixing and let AWS provide the profitable services....


Added a disclaimer, however, the product I work on does not compete with Docker in any way. We actually rely on Docker quite heavily. No conspiracy here.


"It'd be great if Docker wasn't profitable."

I sort of agree, but that's not entirely realistic :)


Why the anti-capitalistic sentiment? We programmers want to get paid for our work, right?


No. Programmers want huge paychecks but everyone ELSE should be FOSS and code for us for free


ahem Free software does not need to be gratis. There are several examples of companies which charge money for free software.


ahem That's obviously not what is being discussed here.

> It would be great for AWS if Docker simply focused on open source bug-fixing and let AWS provide the profitable services....


I was responding to the specific, sarcastic, wording of this line "Programmers want huge paychecks but everyone ELSE should be FOSS and code for us for free".


And I said FOSS not gratis with intent. Perhaps could have made it FOSS and Gratis.

Devs are mad iTunes is closed source. Mad Windows is closed source. But happy to get a big paycheck if they work at Microsoft or Apple.


> But happy to get a big paycheck if they work at Microsoft or Apple.

I wouldn't ever want to work for a proprietary software company. But I admit that I'm on the extreme end on this debate.


Would rkt be an alternative worth trying?


I really haven't looked at rkt as much as I should, but we're more likely to invest in looking at lower level tools like runC moving forward.


Amazing, thank you! I need to choose a containerization tech in the next month and I am pretty worried about going with Docker, because I hear many stories about how it is not really production ready. Thanks for mentioning runC, I will check it out.


All depends on your use case. RunC could be way too low level for what you need and Docker may be production ready for your specific use case.


No, it is perfectly covering my use case. I need a _reliable_ containerization app that does not run any additional service on my boxes. I am working on an extremely low overhead orchestration for our cluster so we can avoid Swarm entirely.


Disclaimer: I work at Mesosphere.

There are alternative runtime implementations (such as Mesos/Mesosphere DC/OS) that let you have the best of both worlds: developers can still use Docker and produce Docker images, but you use production-grade container orchestration (and those same Docker images) without using the Docker daemon for your actual service deployment.


runC isn't an orchestration solution. It's a low-level component that can be (or is already) used by higher-level orchestration technologies.


We're actually working on getting OCI support into Kubernetes. It's a long way away, but we're very determined to get large orchestration engines to provide support for OCI runtimes (runC being the canonical example of such a runtime).


Great, I do not need any orchestration solution at all. I need a container-running thing that can encapsulate any software that we are developing (Java, C#, Node.js, etc.). And now we are approaching the question of what my problem with Docker is. I believe it is a misconception to compete with already existing tools like systemd. I especially do not want any mediocre orchestration solutions in my infrastructure that introduce big overhead and complexity that I do not need at all. One thing I learned along the way of managing large clusters (5K+ nodes) is that Swarm-like frameworks are extremely error-prone. If you flip the problem and build a startup script that pulls down the container configuration from S3, for example, and the container itself has code that attaches the instance to the right service (ELB, HAProxy, etc.), you can achieve the same without introducing services whose sole purpose is to maintain state that you do not need.


If you want a container-like technology that already has the large cluster management, scaling built in and is ideal for software whose source code you control think about trying kubernetes (and/or any similar competitors).


Sounds like you want a PaaS.

Cloud Foundry is currently running real applications with 10k+ containers per installation. We are on track to test it with 250k app instances.

Plus it's been around for, in internet terms, eternity. Garden predates Docker, Diego predates Kubernetes, BOSH predates Terraform or CloudFormation and so on. Used by boring F1000 companies, which is why it's not talked about much on HN.

Disclosure: I work for Pivotal, we are the majority donors of engineering to Cloud Foundry.


I really do wonder what it would take to get the ecosystem to get behind rkt or something else. The present situation feels to me like it's held together by a shared desire to keep the Docker brand going, and that conflicts between Docker Inc. and basically everybody else just won't stop, because there is a lot of money involved for all sides.


For me the question is more like: why should we bundle containerization together with anything else? Why couldn't we follow the Unix philosophy and have the containers work together with the orchestration software without being tied to it? CoreOS seems to keep it fairly independent. Our biggest blocker is the lack of rkt RPMs for CentOS/Red Hat.


> I need to chose a containerization tech

You are more likely to run into issues with containerization itself (cgroups, namespaces, etc.) than with abstractions on top of it.

Unless you are doing some sort of orchestration on top of containers, you can't go wrong with any of the container abstractions.


Hm, I don’t think so. It just doesn’t have enough momentum and as such there aren’t many containers available. Of course, if you want to roll your own, that may not be relevant.

I tried it with the nginx and php-fpm Docker containers, but it wouldn’t work – because those containers assume specific process hierarchies (to log to the console where you issued `docker run`) that just aren’t present when using rkt. The advertised Docker compatibility only goes so far.

I still think rkt is a great idea, but I’m too lazy to develop my own containers. The documentation isn’t that good either.


To be fair, you really have to roll your own containers for applications, and it's not hard at all if you already know how the applications are hosted on a Linux server.

I've found the single-service Docker containers from the Hub are useful for development (MySQL, Redis, etc.), but the "official" language run-time Docker containers that I've looked at are basically demoware. They are built to give you something that runs with the minimum of effort, rather than being efficient or anything else.


They need to implement support for the Dockerfile format if they want to win. People value inertia. The switching costs have to be low if you expect anyone to switch; this only stops being true when the incumbent becomes intolerably useless to the general userbase.


rkt can run any docker image built by a Dockerfile: https://coreos.com/rkt/docs/latest/running-docker-images.htm...

I agree that we need a better ecosystem of build tools, and that is something we are looking to help build out. But with rkt, what we are trying to do is build an excellent runtime, and we think the build side is an important and orthogonal problem.


There is no proper ecosystem for rkt. All I have seen is marketing hype and I don't know anyone who uses it. Just go to any meetup.


We're in the process of converting our 100+ cloud nodes to Docker+k8s and I have a lot of the same reservations -- the space is very immature and the tooling has a lot of kinks to work out, not only functionally but also aesthetically. It's already been a nightmare and we're not even deployed to prod yet.


If "cloud" is AWS, you should join the kubernetes slack sig-aws channel. Lots of community people figuring out those kinks together.


It's obligatory for me to recommend Cloud Foundry here. We've already built the platform, there's no need to build and maintain your own.

It just works. Really well, actually.

Disclosure: I work for Pivotal, we donate the majority of engineering to Cloud Foundry.


If you haven't heard of Giant Swarm, I encourage you to contact them. They have a scalable microservices provisioning solution that can use either Docker or Kubernetes. German company. Disclaimer: I worked for them last year. Holler if you need an intro.


There are a couple of things here. 1) Right now everyone is afraid that Docker will emulate VMware and crowd them out of the container space, much like VMware killed most of their competitors. 2) To this end, I have heard that Google and Red Hat have massive marketing budgets, and that the marching orders have been, over and over: don't say docker, say k8s. 3) The real battle is where the money is: large scale distributed systems. Companies want to freeze Docker out, because Docker controls the lowest point of access, the container runtime itself. 4) Hence Google is trying to push "docker compatible" ideas that are just the OCI standard, nothing to do with Docker itself.

AWS doesn't want to support Swarm, because it gives people portability off of their cloud. Google doesn't want to support swarm, because K8s is a trojan for Google Cloud. No one else wants to support swarm because it competes with their products.

That said, what's happening right now, if we are not careful, will fragment the container ecosystem and make it impossible for single containers to target multiple runtimes.

Docker is the only one who can deliver a universal set of functionality that is leveraged by all. From a technology point of view, Docker is going in the right direction. We got burned with Redhat in Openshift 1 & 2 land, and that's left us with a point of view that the only thing we can depend on is a container runtime itself, and 12fa applications.

K8s does not really work that way. It's huge and it's heavy, and it expects every app to be written its way.

The technical direction here for Docker is good. But the implementation and early release is ridiculous. I was impressed by the first RC release, and then terrified that they released a RC as production.


> Docker is the only one who can deliver a universal set of functionality that is leveraged by all.

Why do you say that? I have quite a bit more faith in the design chops of the folks behind Kubernetes (Google, Redhat, CoreOS, and many others) than Docker Inc.

Swarm really only touches the surface of the requirements for large scale distributed container orchestration.

Kubernetes is complex because the problem it attempts to solve is complex.

I'd also add that Kubernetes is dead simple to use. The difficulty is in setting it up - but even that is getting much better.


Good question. K8s has a network mode that is incompatible with swarm, mesos and nomad. Swarm only touches the very top of the requirements for complex deployments, but going into K8s, the way they do things pretty much prevents separate container orchestration systems from working in parallel.

For it to be universal, it has to live in the container runtime.


> K8s does not really work that way. It's huge and it's heavy, and it expects every app to be written it's way.

I disagree. Kubernetes is quite lightweight, and its architecture is nicely modular. The core of Kubernetes is just four daemons. You can also deploy most of its processes on Kubernetes, which greatly eases the operational side.

> and it expects every app to be written it's way.

Kubernetes makes zero assumptions about how an app is written, as long as it can run as a Docker (or rkt) image.

It imposes certain requirements, such as that pods are allocated unique IP addresses and share networking between containers, but that doesn't really impact how apps are written.

> K8s is a trojan for Google Cloud

Doubt it very much. For one, the Kubernetes experience on GCloud (GKE) isn't particularly good at all — the "one click" setup uses the same Salt ball of spaghetti that kube-up.sh uses, the upgrade story isn't great, alpha/beta features are disabled, you can't disable privileged containers, ABAC disabled, the only dashboard is the Kubernetes Dashboard app (which is still a toy), and GCloud still doesn't have internal load balancers. Setting it up from scratch is preferable, even on GCE.

Additionally:

* Kubernetes has excellent support for AWS as well as technologies such as Flannel for running on providers with less flexible networking.

* Google makes a lot of effort to help you to set it up on other providers (also see kube-up).

* Projects like Minikube let you run it locally.

If Kubernetes is a "trojan" of anything, it's to improve the containerization situation generally, because this is an application deployment model where they can compete with AWS, which doesn't have a good container story at all (ECS is pretty awful).


Arguably the whole reason Google is sponsoring K8s is to promote GCE and GKE. It's their main long-term play vs. AWS (moving the world to containers instead of VMs).


Disclaimer: I work for Red Hat on OpenShift.

I apologize for your experience with Red Hat OpenShift 1 & 2. OpenShift 3, which has been out for more than a year now, is natively built around both docker and kubernetes. Red Hat developers are among the top contributors to docker, kubernetes, and OCI. With OpenShift we seek to provide an enterprise-ready container platform, built on standard open source technologies, available as both software and public cloud service. I hope you will give us another look!


I work for what is a Red Hat competitor in this space, Pivotal.

Like this fellow says, OpenShift 3 is lightyears ahead of 1 & 2.

(Obviously, my horse in this race is Cloud Foundry)


I work for Google Cloud (though my opinions are my own).

If people want to run Swarm or Nomad or Rancher on Compute Engine, then more power to them!

In fact, I even open sourced deployment templates to run Swarm on GCE and hopefully will add autoscaling and load balancing soon: https://github.com/thesandlord/google-cloud-swarm


I agree with you that lock-in is a big motivator here. It's always been king in the software space. As you point out, k8s exists as a public project specifically to diminish AWS's lock-in and make it simple to deploy out to other cloud providers (Google Cloud specifically).


Disclaimer: I work at Google and was a founder of the Kubernetes project.

In a nutshell, yes. We recognized pretty early on that fear of lock-in was a major influencing factor in cloud buying decisions. We saw it mostly as holding us back in cloud: customers were reluctant to bet on GCE (our first product here at Google) in the early days because they were worried about betting on a proprietary system that wasn't easily portable. This was compounded by the fact that people were worried about our commitment to cloud (we are all in for the record, in case people are still wondering :) ). On the positive side, we also saw lots of other people who were worried about how locked in they were getting to Amazon, and many at the very least wanted to have two providers so they could play one off against the other for pricing.

Our hypothesis was pretty simple: create a 'logical computing' platform that works everywhere, and maybe, if customers liked what we had built, they would try our version. And if they didn't, they could go somewhere else without significant effort. We figured at the end of the day we would be able to provide a high quality service without doing weird things in the community, since our infrastructure is legitimately good and we are good at operations. We also didn't have to agonize about extracting lots of money out of the orchestration system, since we could just rely on monetization of the basic infrastructure. This has actually worked out pretty well. GKE (Google Container Engine) has grown far faster than GCE (actually faster than any product I have seen) and the message around zero lock-in plays well with customers.


Not speaking in an official capacity, but the analogy I've seen used is that big companies don't want to relive the RDBMS vendor lock-in experience.

I'm speaking about something other than k8s (Cloud Foundry), but the industry mood is the same. Folk want portability amongst IaaSes. Google are an underdog in that market, so it behooves them to support that effort -- to the point that there are Google teams helping with Cloud Foundry on GCP.

Disclosure: I work for Pivotal, we donate the majority of engineering to Cloud Foundry.


k8s is essentially "AWS in a box", and it's a product that locks you in. As soon as a k8s cluster is running in GKE, it is not that portable at all, due to operational complexity as well as being tied to the Google infra.


> That said, what's happening right now, if we are not careful, will fragment the container ecosystem, and it makes it impossible for single containers to target multiple runtimes.

Not a chance. There is Packer [0] to get rid of all potential lock-in and monopoly. It's a universal image/container creation tool.

- It re-uses your ansible/chef/puppet/shell/whatever scripts for setting up the image.

- It outputs Docker containers, Amazon AMIs, Google images, VMware images, VirtualBox images. Whichever you like, with the same configuration.

[0] https://www.packer.io/


I wish that docker would adopt more of a Unix philosophy and focus on doing one thing well. Why does everyone have to compete with everyone rather than create a set of tools that work well together?

I see docker-machine and docker-swarm as distractions. Reasons why doing all those other things, instead of focusing on containerisation and packaging, may be harmful for docker itself:

- Bundling the orchestration in with docker makes k8s or Mesos more inclined to fork docker and pull out unnecessary cruft.

- Churning out half-ready features causes Docker to be known as unreliable and leads to posts with titles like this. Such reputation tends to stay long after bugs are fixed. SV-esque launch and iterate works for web apps, but IMO not for back-end software.


"In infantry battles, he told us, there is only one strategy: Fire and Motion. You move towards the enemy while firing your weapon. The firing forces him to keep his head down so he can't fire at you. (That's what the soldiers mean when they shout "cover me." It means, "fire at our enemy so he has to duck and can't fire at me while I run across this street, here." It works.) The motion allows you to conquer territory and get closer to your enemy, where your shots are much more likely to hit their target. If you're not moving, the enemy gets to decide what happens, which is not a good thing. If you're not firing, the enemy will fire at you, pinning you down."

From one of Spolsky's finest, Fire and Motion: http://www.joelonsoftware.com/articles/fog0000000339.html


"If only the Generals had not been content to fight machine-gun bullets with the breasts of gallant young men, and think that that was waging war."

- Churchill 1931


I want you to think very seriously over this question of poison gas. I would not use it unless it could be shown either that (a) it was life or death for us, or (b) that it would shorten the war by a year.

It is absurd to consider morality on this topic when everybody used it in the last war without a word of complaint from the moralists or the Church. On the other hand, in the last war bombing of open cities was regarded as forbidden. Now everybody does it as a matter of course. It is simply a question of fashion changing as she does between long and short skirts for women.

- Churchill 1944


"War,” writes von Clausewitz, “is an act of violence intended to compel our opponent to fulfil our will…This is the way in which the matter must be viewed, and it is to no purpose, it is even against one’s own interest, to turn away from the consideration of the real nature of the affair because the horror of its elements excites repugnance.”


Supreme excellence consists of breaking the enemy's will, without fighting -- Sun Tzu


This!

This analogy perfectly captures why in software, the second best thing always wins :-) [1]

The utopian 'ideal' systems that can only be built slowly and methodically get crowded out by the systems that start with some scruffy code and just keep moving.

[1] UNIX vs Multics, Windows vs OS/2, MongoDB vs ?


In the infantry, we learn "Shoot, move, communicate".


"- Bundling-in the orchestration with docker make k8s or Mesos more inclined to fork docker and pull out unnecessary cruft."

Mesos 1.0 already introduced the universal containerizer, allowing it to run many docker images natively without the docker daemon:

https://www.youtube.com/watch?v=rHUngcGgzVM&index=14&list=PL...

http://mesos.apache.org/documentation/container-image/


Because shipping features is king. When a single vendor ships many products, those products are more likely to work well together than with products from other vendors because they were built under the same roof. Better integration = saving time while reducing risk = more time to build and ship features.

The downside, of course, is vendor lock-in. But that's only a problem if a) the vendor stops updating their products, which is unlikely if those products are popular, like Docker, or b) the vendor raises prices beyond what can be justified to remain that vendor's customer. But that's a problem for whoever takes over your project next year, not for you.


I would say that the worst problem is the vendor changing the product in some incompatible way as part of a monetization strategy, so that Product X still exists, but the install-base splits.

Docker Inc. themselves make a big play of the facts that containers have to be standardized to be portable, and the portability is the key value. We could have done a lot of this stuff years ago if the virtualization vendors had a totally portable format for transferring VMs between different systems.

Swarm etc. is part of the monetization strategy - other vendors in the ecosystem have already backed Kubernetes, or MesoSphere, or whatever, and do not want or need this stuff tied to the Docker run-time itself. Fortunately, Docker Inc. can add these without breaking compatibility of images or damaging the core features enough that a fork becomes necessary, but it does create market confusion.


This attitude is destroying their ecosystem. Many integrating authors (not going to name names, but plugins, et al) are feeling the heat and focusing their efforts on Kube and Mesos because Docker will just replace your shit with a half-baked thing in 6 months and everyone will flock to that.

Conversely, it's not been a surprise for those of us embedded in this community for a long time to see Kube and rkt join forces. There are a ton of both technical and political decisions behind this and unfortunately most of the political barriers end in the name Hykes.


Everyone tries to get 100% of the orchestration pie. The pie gets smaller because fragmentation raises the barrier to entry.

If those companies focused on securing a position in the pie, rather than owning the whole pie, the pie would grow quicker and thus the absolute returns of each player could be better.


As a very well-funded startup, they can't just build tools that do one job. They have to build a complete platform.


Yes maybe the Docker team should just have joined forces with Kubernetes (K8s) instead of going out on their own and building Swarm from scratch.

K8s is far ahead of Swarm - K8s has practically built its own language using YAML files - Swarm is still at a stage that all the configs for a service have to fit into a single command (and the options are much more restrictive than K8s).

To be fair, I do like some things about Swarm better than K8s (based on the docs), but in practice, Swarm is behind and they should tell you that up front. When I was just starting out, I literally had to install all of them; Swarm, Mesos and K8s to be able to make an informed decision because, in the case of Docker, the docs are like 6 months ahead of reality. I didn't realize that the Docker 'service' command didn't even exist until v1.12 and I couldn't install v1.12 on my machine (last time I tried, installation was failing - Obviously not yet stable).

I think Swarm has potential but they need to accept that they're just not going to be the first to market.


To be fair, I don't think they built Swarm from scratch, I think it is/was a rebranding of an acquired product. That being said, Docker and swarm in particular move too fast for their docs to keep up (let's change the syntax for some important commands between the final RC and the release?) and it feels like the only way to be well informed is to scour the github issues, which seems wrong for something that's touted as a stable, commercially-supported product.


> move too fast for their docs to keep up

That's not a matter of moving too fast, it's a matter of broken processes. A user-visible change that does not update the docs ought to be rejected during review. If they don't have these basic development processes nailed down, that does not instill confidence in me regarding the quality of their shipped code. And that, of course, fits nicely with the reports of buggy .0 releases.


Swarmkit was built from scratch, based on lessons learned in the previous non integrated Swarm.


This is true. The team packed a lot into this new codebase. Many miles to go, however.


Swarm works (more or less) if one has just 2-3 hosts, uses Docker for packaging, and wants a semi-unified view of those machines.

Kubernetes seems to be quite highly opinionated toward "clouds" and "microservices". I just wasn't able to wrap my head around how its concepts apply to my "I just have one server that uses Docker for packaging, and now want to throw in another, for resiliency" case.


Kubernetes isn't particularly opinionated at all. It runs containers, and doesn't care what those containers are or how they behave. Microservices and clouds not required.

Its core data model, simplified, is that of pods. A pod specifies one or more named containers that should run together as a unit. A pod's config can specify many things, such as dependencies (volumes, secrets, configs), resource limits and ports (including how to perform health checks). You can deploy single-container pods, and this is the norm, but it's entirely feasible to run a whole bunch of containers that conceptually belong together.

To expose a pod's ports to the world or to other pods, you define services. These simply specify which ports should go to which pods, and Kubernetes will assign a persistent, internal IP address to the service. Kubernetes will (typically) configure iptables so that the service is round-robin-balanced at the network level across all containers that it serves; the idea is that the pod should be reachable from any other pod in the cluster. Together with KubeDNS, which resolves service names, you can do things like call http://mylittlepod/ to reach a pod.
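
As a rough illustration of the round-robin idea, here is a toy Python sketch. It only models the behaviour described above; it is not how Kubernetes or iptables implement it, and the Service class and pod addresses are made up for the example.

```python
from itertools import cycle


class Service:
    """Toy model of a service: a stable name fronting a set of pod endpoints."""

    def __init__(self, name, endpoints):
        if not endpoints:
            raise ValueError("a service needs at least one endpoint")
        self.name = name
        self._endpoints = cycle(endpoints)  # endless round-robin iterator

    def route(self):
        """Return the endpoint that should receive the next connection."""
        return next(self._endpoints)


# Hypothetical pod addresses; in a real cluster these are assigned for you.
svc = Service("mylittlepod", ["10.0.1.5:8080", "10.0.2.7:8080", "10.0.3.9:8080"])
picks = [svc.route() for _ in range(4)]
print(picks)
```

The caller only ever talks to the service name; which pod answers is the service's business.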

To achieve resilience, Kubernetes lets you define replica sets, which are rules that say "this pod should run with N replicas". K8s will use the scheduler to enforce this rule, ensuring that a pod is restarted if it dies and always has N replicas running, and it can automatically ensure that pods are spread out evenly across the cluster. Replica sets can be scaled up and down, automatically or manually.
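
The "N replicas" rule boils down to comparing desired state with observed state. A minimal Python sketch of that comparison, assuming a made-up pod record with just a name and a health flag (the real controller is far more involved):

```python
def reconcile(desired, pods):
    """Compare the desired replica count with the observed pods.

    Returns (to_start, to_remove): how many new pods to start, and the names
    of pods to remove (dead ones, plus healthy extras when scaling down).
    """
    healthy = [p["name"] for p in pods if p["healthy"]]
    dead = [p["name"] for p in pods if not p["healthy"]]
    to_start = max(desired - len(healthy), 0)
    surplus = healthy[desired:]  # empty unless there are more healthy pods than desired
    return to_start, dead + surplus


# Hypothetical observed state: three pods, one of which has died.
pods = [
    {"name": "web-a", "healthy": True},
    {"name": "web-b", "healthy": False},
    {"name": "web-c", "healthy": True},
]
plan = reconcile(3, pods)         # replace the dead pod
scaled_down = reconcile(1, pods)  # scale down to a single replica
print(plan, scaled_down)
```

Run in a loop, this kind of check is what keeps the replica count where you asked it to be.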

There are other objects, such as deployments (handle rolling upgrades between one version of a pod and another), ingresses (configures load-balancers to expose HTTP paths/hosts on public IPs), secrets (encrypted data that pods can mount as files or envvars), persistent volumes (e.g. AWS EBS volumes that can be mounted into a pod), and so on, but you can get by with just pods and services, at least to start.

Kubernetes is a bit pointless with a single server, but adds convenience even if you have just two or three.


> "I just have one server that uses Docker for packaging, and now want to throw in another, for resiliency"

Yes, when you only have a couple of servers, that is not the sweet spot for K8s.

But few people stop at 2 servers. A few months in, someone asks for a staging environment, and/or QA environment. Someone eventually realizes that they need to regularly test their fail-over and backups. Someone hires a contractor, and wants to give them a copy of the setup that won't block anyone else. Someone realizes we can centralize the logs from all these environments... And so it goes.

Even with one server, sometimes you go to do an upgrade, and find your "one server" is actually a tightly coupled bunch of services. (Made-up example: I want to upgrade Varnish, but it requires a newer library that is incompatible with my WordPress version.) That one server could be a server for the Database, one to run the cron jobs, a few for the cache layer, etc. If you break up those into different boxes, you can scale them better -- Instead of one big beefy server, you can have each layer at its own scale (one or more wimpy boxes, dynamically adjusted).

You don't do this to save money directly. But by simplifying things, you make it easier to maintain. That saves labor, plus prevents problems (and makes it easier to hire and train ops.)

When you have just a few servers, it looks manageable. As you grow, it gets a lot harder to manage. K8s helps.


The new swarm mode is great when it works, but it is a monster to debug. It takes care of so many things that it is nearly impossible to pinpoint which component isn't working.

The issues I have encountered with swarm mode:

* Some containers could not use hostnames to connect to other containers.

* Sometimes, in a 3 node swarm, containers on A could be reached from B, but not from C.

* After every reboot these issues could be fixed or start occurring.

* It automatically adds firewall rules for every service you port map to be exposed to the internet, without warning

In the end I switched to nomad, it isn't perfect either but at least it is consistent.


Do you have the links to the issue tracker for each of these issues? For me/us to subscribe/follow.


If you look at the design of Kubernetes you'll find a very strong opinion on how networking is done. Kubernetes does not allow the use of Network Address Translation (NAT) for container-to-container or for container-to-node traffic. The internal container IP address must match the IP address that is used to communicate with it. Swarm plays faster and looser than this. I think Google's age and experience shows that Docker went the wrong way.


If by google's age and experience, you mean the requirements of the Google Cloud, then i agree.

Docker is doing exactly what they should be, but in a manner that is destructive. Getting a built in consistent p2p routing mesh under every container is brilliant, and fixes one of the biggest problems with k8s and swarm (it's not really possible for these technologies to interoperate because of incompatibilities with the network model).

The big problem is the stability hit. 1.12 had no business losing the rc label.


disclosure: Kubernetes engineer

> If by google's age and experience, you mean the requirements of the Google Cloud

Quite on the contrary. Kubernetes flies in the face of Google's cloud APIs, and has to take advantage of every dirty trick it can. But it does that because the result is better. I can say that without hesitation, having worked on the logical conclusion of port-mapping (Borg).

> Getting a built in consistent p2p routing mesh under every container is brilliant, and fixes one of the biggest problems with k8s

That's hilarious to me, because what Docker calls "routing mesh" is a feature that Kubernetes has had since 1.0. It's different in some subtle ways, but again, for really important reasons.


> If by google's age and experience, you mean the requirements of the Google Cloud, then i agree.

Why do you think this is a requirement of Google Cloud?


Kubernetes and Google Cloud were both informed by the design and implementation of Borg. Kubernetes is basically for all intents and purposes Google Container Engine. That's fine, but it's highly tuned for how Google sees the world.


Huh? Kubernetes can run just fine outside of Google's Cloud. It'll work on any TCP/IP IaaS offering out there. If you mean it demands a clear end-to-end connection model for the important moving parts, then, yeah; because of hard-won experience of what works. They found out that you want to spend your time on bugs in your app, not bugs in your networking infra.


Swarm Mode is "great". Assuming you've never heard of or used Kubernetes. In which case, Docker Swarm is too little, and a year+ too late.

As for marketing, it does seem a bit funny that a product would announce "deep integration with underlying infrastructure" for a cloud provider when they haven't written a single line of (public) code to actually support that cloud provider.

The fun thing is this arguably critical blog post praises features in "swarm mode" that have long been present in Kubernetes/Mesos/Nomad: [labels/constraints].

There could be a lot written about the fact that Docker ships "Swarm Mode" as stable in 1.12 despite virtually everyone's actual first-hand experiences. I would argue that if "Swarm Mode" were not shipped inside of Docker 1.12 and didn't benefit from riding along with the normal `docker` package, few would be talking about it.


Yeah, Docker has a lot of catching up to do. They bragged a lot lately about how Swarm is faster than K8s but you really can not compare the two when you look at features and stability of k8s.


It's amazing (and not in a good way) that this stuff is being shipped in a point release.


I also assumed the new swarm mode would be easier, but multi-host networking only works with the newly introduced service command. It does not work with the regular docker run command, which is disappointing because you still need a third-party key-value store for that, but not for docker service. It literally took me hours to realize that was missing. I think 1.12 was just rushed to show it at DockerCon; normally major releases were mostly bug-free and worked as intended.


I feel the pain for I was in the same boat.


Docker team, please focus on one thing - packing apps to containers. Leave everything else to other projects, don't try to do everything wrong, just do one thing good. Thank you.


The problem is that they literally can't do that - Docker Inc. took a huge pile of VC money, and containerization is a commodity feature. To succeed they have to build higher-level products and services, which put them in direct competition with the other vendors in the ecosystem.


>To succeed they have to build higher-level products and services

I wish they didn't have to cram it all into single product docker though.


Great point - it would be nice if they separated out their monetization products (Swarm, Docker Datacenter, etc) from the core Docker engine. There is plenty of money to be made from enterprise customers who will want the security features of Docker Datacenter, or support for Swarm, without bundling your orchestration layer with the container runtime engine.

This type of bundling reminds me of Microsoft bundling IE with Windows. Initially, IE was much worse than Netscape, and this seems like roughly the same thing - monopoly in one market bundling an inferior product to try and achieve a monopoly in another somewhat related market.


If they didn't at least try to have a complete solution, someone else would package docker as one part of a turnkey solution and they would get stuck with all the maintenance costs and none of the consulting fees.

Complicit in their own destruction.


> someone else would package docker as one part of a turnkey solution

Indeed, Red Hat have done exactly that with OpenShift 3.

Disclosure: I work on Cloud Foundry for Pivotal, ostensibly a competing project/product.


Sounds like a high-risk ploy. Then again, VC money.


What annoys me about it is that it requires everyone else to lose for Docker Inc. to win. It's rather like if Linus invented Git, then formed a VC-backed company on the back of it, and then tried to figure out how to extract enough money from the ecosystem so that Git, Inc. could be flipped or IPO'ed for a really big pile of cash.


Agree strongly. I think they will lose this fight which is why I've stayed out of the docker race and am currently pinning my hopes on k8s + rkt.


I haven't played enough with Docker Swarm to run into any of these issues, but Rancher (www.rancher.com) does all of this, and more, very well. I am not affiliated with them - just a happy user and contributor of github issues.


Went with them last week, as it seemed to be one of the most sane options out there (I've tried a lot). Still, has its share of issues. Few ones I've encountered so far:

- Their built-in load balancing (HAProxy-based) is nearly impossible to debug. Literally no logging there.

- No locality awareness. DNS queries always return all addresses they know about (including those that don't even work - https://github.com/rancher/rancher/issues/5792) and I haven't yet found any good way to prioritize containers co-running on the local host over the more distant ones (https://github.com/rancher/rancher/issues/5798 - if someone has been in this situation and has some ideas, I would appreciate any suggestions!).

- Storage management was advertised, but can't find anything besides NFS (which is SPOF) and Amazon EFS (which I don't use). There was GlusterFS support, but it seems it was too broken so they had removed it or something like that. If one wants persistent storage, they'd better pin containers to hosts.


> I haven't yet found any good way to prioritize containers co-running on the local host

You may find that RFC3484 helps; it prefers the address with the longest prefix in common with your own address so will tend to pick your own address. And you are probably getting this behaviour already.
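
A minimal Python sketch of that longest-common-prefix ordering, for IPv4 only. The function names and addresses are made up for illustration, and this covers just the prefix-matching idea, not the full set of RFC 3484 rules:

```python
import ipaddress


def common_prefix_len(a, b):
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def sort_by_prefix(source, candidates):
    """Order candidates so those sharing the longest prefix with source come first."""
    return sorted(candidates, key=lambda c: common_prefix_len(source, c), reverse=True)


# Hypothetical addresses: our own address and three answers from DNS.
ordered = sort_by_prefix("10.42.16.5", ["10.42.200.9", "10.42.16.77", "10.42.31.2"])
print(ordered)
```

This only helps if nearby containers actually have numerically close addresses, which depends on how the network hands them out.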


Thanks for the pointers! Unfortunately, I don't think this applies. I'm certainly not getting this behavior - I wouldn't have even thought of it if I hadn't observed higher latencies resulting from (sub)request chains jumping from node to node back-and-forth, instead of staying within the node's boundaries.

The problem is, the network space there is flat, not hierarchical - while I haven't looked at the actual implementation code, I believe container addresses are just randomly chosen from a single big 10.42/16 subnet and I'm unaware if there's a way that I can assign hosts, say, a /20 out of that space (yes, this would've solved things nicely).
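
To make the /20-per-host idea concrete, here is a small Python sketch using the standard ipaddress module. It only shows the address arithmetic; whether the overlay network can be configured this way is exactly the open question, and host_of is a made-up helper:

```python
import ipaddress

# The single flat pool mentioned above.
pool = ipaddress.ip_network("10.42.0.0/16")

# Carve it into /20 blocks, one per host: 16 blocks of 4096 addresses each.
host_blocks = list(pool.subnets(new_prefix=20))


def host_of(address):
    """Return the index of the host block containing address, or None."""
    ip = ipaddress.ip_address(address)
    for index, block in enumerate(host_blocks):
        if ip in block:
            return index
    return None


print(len(host_blocks), host_blocks[1], host_of("10.42.31.2"))
```

With a layout like this, two containers on the same host always share at least a 20-bit prefix, so prefix-based address ordering would favour the local one.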


Oh. Right. I happen to work on a different Docker network, which 'chunks' the address space so containers on the same machine are very likely to have contiguous addresses. Hadn't occurred to me theirs doesn't do that.


Just curious - what networking/clustering solution are you using?

(I'm asking, so the next time I'll have to make a choice between stacks I would be more aware about the finer details.)

Thanks!



Gluster is a disaster.


Haven't used it in non-toy environments, so won't argue with that.

My actual issue is, there's effectively no distributed storage support in Rancher/Cattle at this moment, be it GlusterFS or anything else (for all I know, MooseFS worked quite well for us on one project).

Just pointing it out, because for some reason I got quite a different impression from the website/docs.

Every point of advertisement statements like "Rancher provides a full set of infrastructure services for containers, including networking, storage services, host management, load balancing and more." is to be taken with a huge bag of salt.

(And that's by no means unique to Rancher.)


I agree on the general comments on storage. The Docker volume system seems to be the ideal place to do this.


I agree, Rancher just ties everything together really nicely and it works awesome with Kubernetes. I'm building a hosted Rancher cluster manager for my OSS project http://baasil.io/.


Rancher is very cool but I wish that RancherOS was easier to lock down. If you run it outside of AWS or Exoscale you don't get a firewall so you have to do that on the hosts.


Quality is something long term, but because the world becomes faster and faster people care less about it. Hype can generate a lot of money in the short term. And that is what is valued the most in the current economy. Being a quality guy myself I also feel that this is painful, but I think it's hard to blame anybody for that. I don't know anybody who goes for the short term success because they want to live in that kind of world. It's just about the only thing that gets rewarded.


# Market-Driven-Development

Docker promises too much and delivers too little. Story of every software project in the last 50 years.


I wish Docker had some more open source competitors, particularly some non-profit competitors. Yes I know Docker is open source but I have this terrible gut feeling about becoming too reliant on their technology. I feel like I'm going to be screwed some day.

I guess I just know some day Docker's investors are going to want their money back.

Yes sure there are other quasi opensource products that are super critical that I use but they all have alternatives (for example Java has plenty of alternatives).

Hopefully I don't get downvoted to oblivion for this comment. I am sure my trust issues are illogical and I would really like to remove the inhibition to use docker but articles like this do not help.


I don't think that you are being completely illogical. One thing about the Docker stack, though, is that it is supported by every major vendor, and you may well be getting your server installations through a vendor (Red Hat, AWS etc.), so in that case you are insulated from problems. As other commenters have discussed, the vendors test and patch their Docker distributions themselves already, as well as contributing to development.

We also already have a functional replacement with rkt. It can use Docker images, and Kubernetes can use rkt as a run-time in place of Docker, so Docker is not irreplaceable.

I think that the most valuable bits of Docker today are the developer tooling - easy Windows and Mac installers, Docker Compose, the online documentation, and the Hub for grabbing ready-made images. None of which, AFAIK, makes much money for Docker Inc.


It seems that using Kubernetes is definitely more mature and usable than Swarm. But how would you rate the other Docker projects, like docker-machine and docker-compose. Does Kubernetes also subsume those projects?

These seem to be way more mature than Swarm.


Kubernetes is definitely the technically correct solution. The only really hard part is getting started, but if you have a cloud service provider that runs a Kubernetes cluster for you then you don't need to worry about that. :P


I don't know if Kubernetes is more usable - it's a MONSTER to host yourself.

Other docker projects suffer from various levels of similar issues. Docker-machine is nice, but has a ton of rough edges, especially when spinning up hosts on AWS. It feels like the programmer(s) of machine never used AWS beyond the simplest use case.



Kubernetes isn't particularly hard to host.

Bootstrap etcd

start kubelet

place api, proxy, scheduler and controller definitions in the manifests folder.

If you're running infrastructure that deals with problems such as maintaining the health of applications then it's going to generally be a whole lot more complex than that.


We've experimented with docker in a few places, and the deployment workflow is just painful:

  1. sudo docker build -t quay.io/foo/bar .
  2. sudo docker push quay.io/foo/bar
  3. <login to production>
  4. sudo docker pull quay.io/foo/bar
  5. sudo docker kill foobar
  6. sudo docker rm foobar
  7. sudo docker run -p 80:80 -p 443:443 -e FOO=bar --name foobar --net=host -d quay.io/foo/bar
I can never understand how people talk about docker making deployments somehow easier.
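
For what it's worth, steps 4 through 7 are mechanical enough to script. A minimal Python sketch that only builds and prints the commands (the redeploy_commands helper is made up; it leaves out --net=host, since published ports are not used with host networking, and it does not run anything):

```python
import shlex


def redeploy_commands(image, name, ports=(), env=None):
    """Build the docker commands for steps 4-7: pull, kill, rm, run."""
    run = ["docker", "run", "-d", "--name", name]
    for mapping in ports:
        run += ["-p", mapping]
    for key, value in (env or {}).items():
        run += ["-e", f"{key}={value}"]
    run.append(image)
    return [
        ["docker", "pull", image],
        ["docker", "kill", name],
        ["docker", "rm", name],
        run,
    ]


commands = redeploy_commands(
    "quay.io/foo/bar", "foobar", ports=["80:80", "443:443"], env={"FOO": "bar"}
)
for command in commands:
    print(shlex.join(command))  # shlex.join needs Python 3.8+
```

To actually execute them you could pass each list to subprocess.run, ignoring failures from kill and rm when the container does not exist yet.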


How is that painful?

I run a service using docker and have your steps 4 through 7 as part of a systemd unit file. Updating the application requires a single systemd restart command.


Could you talk about your deployment scripts? I am trying to deploy a single Flask app which uses Redis. I'm not sure how to set up logging, etc., or whether Redis and Flask should go in the same VM.


Not much in the way of scripts, but the systemd file I use is something like this:

  [Unit]
  Description=App
  After=docker.service
  Requires=docker.service

  [Service]
  TimeoutStartSec=0
  ExecStartPre=/usr/bin/docker pull app
  ExecStartPre=-/usr/bin/docker kill app
  ExecStartPre=-/usr/bin/docker rm app
  ExecStart=/usr/bin/docker run --name app --rm=true -p 80:80 app

  [Install]
  WantedBy=multi-user.target
With something like dokku I could just push the git repo containing the Dockerfile and it would accomplish the same thing


You probably also want to add: PartOf=docker.service

This will ensure that a restart of the docker service will trigger this service to be restarted.


ElasticBeanstalk streamlines things greatly, and when it works everything is pretty nice. When it fails, though... for instance, just a few days ago, one of the containers failed with an OOM error. For some reason--still unclear--the ECS and/or Docker daemons weren't able to start new containers to replace them, leaving the instance broken for hours. Auto-scaling groups will mitigate this, but it's still unnerving.

Still, I'm liking many aspects of our tools. Using Docker with Rocker (https://github.com/grammarly/rocker/) has greatly sped up CI builds by caching results when the source hasn't changed (especially important in multi-language shops; the Python guys don't want to wait on the Java code to build every time.) Just upload a tagged image to ECR, generate an "application version" referencing those images, and deploy via the Slack bot ("@bula deploy develop develop-XXX-e83fc3bd").


If you're using Kubernetes, steps 3-7 are replaced with a single "kubectl" line, and you can even eliminate that by baking it into your Quay setup. (Why would you ever do steps 1-2, though? Quay supports Github hooks.)

We've started using a self-hosted Drone [1] install (not to be confused with the hosted drone.io service, which is not good) to build containers. Unlike Quay, it doesn't launch build VMs, but rather uses Docker containers, so it's very fast. It also supports the notion of build containers, so you can do things like compile C code or run NPM without ending up with any compilers or build tools in any of your image layers; it completely removes the need for a custom "base image" shared among apps. It also lets us add the Kubernetes deploy as a final step after publishing.

[1] http://readme.drone.io/


Bring back the 10,000 line bash scripts, Puppet, configuration drift, inconsistent environments. All is forgiven!


For us: Jenkins does steps 1 and 2. Steps 3-7 are simply "push button" via ansible.


Replace 4 to 7 with a single "kubectl rolling-update".


If your Unix user is in the docker group, no need to sudo.


I share the pain of this post - except my run-in with 1.12 occurred with Docker for Windows. The shared volumes and host networking were totally nondeterministic. I don't think I had really ever experienced software which seemed to fail so randomly.

On the other hand I think Docker can probably be forgiven for my particular frustration. For one, the software was in beta. And two, working with Windows and all the different flavors must be a nightmare.


(I work at Docker)

If you haven't already and can spare the time, could you file a pair of issues: perhaps one for the shared volume problem and another for the networking on https://github.com/docker/for-win ?

We've been fixing bugs in both areas and the fixes should arrive in the beta channel over the next few updates. Thanks for your patience!


Am I the only person around here who is skeptical of all this added complexity?

Some of it is clearly very useful, but some of it strikes me as a tower of babel built upon workarounds to problems that have simpler solutions.

What I see here are several different vendors vying to make sure they remain relevant. While I understand this and even the need for it, I also understand that it can drive poor engineering decisions in the long run.


Disclaimer: I work for Mesosphere

This is the reason the latest Mesos and DCOS have the universal containerizer. Docker is great for development but currently doesn't make sense for production. The latest DCOS uses Docker images without the Docker process and provides the high-scale production quality needed for a large datacenter.


> please take it slow and make Docker great again!

Perhaps they should create some sort of firewall? I mean, the network packets coming through... they bring malware... they're spam... and some, I assume, are good traffic.


Guys... this was a Trump joke.


I agree very much. Trying to figure out how Docker hacks your host's iptables and how to deal with it in a production network is a pain.


We never let Docker set its own iptables rules. It is a pain at first, but it forces you to understand how the rules work, what is going on, and has the added bonus of keeping those rules consistent across Docker releases and in your own version control.
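
The comment does not say how Docker is kept away from the rules. One way this could be done, assuming the daemon's "iptables" setting (the --iptables=false flag, or the "iptables" key in the daemon configuration file, usually /etc/docker/daemon.json on Linux), is sketched below in Python. The helper name with_iptables_disabled and the sample "log-driver" entry are illustrative only, not taken from the thread.

```python
import json


def with_iptables_disabled(existing):
    """Return a copy of a daemon config dict with iptables management off.

    Assumption: the daemon reads an "iptables" boolean from its JSON
    config file; every other key is passed through untouched.
    """
    config = dict(existing)
    config["iptables"] = False
    return config


if __name__ == "__main__":
    # Hypothetical existing settings; a real file may hold other keys.
    current = {"log-driver": "json-file"}
    # Print the JSON that would go into the daemon configuration file.
    print(json.dumps(with_iptables_disabled(current), indent=2, sort_keys=True))
```

With that setting the daemon stops adding its own rules, so forwarding and NAT for the Docker bridge have to come from rules you maintain yourself, which is the trade-off the comment describes.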


So, as I understand it, you add and remove the port mapping and other FW entries with your own scripts whenever you spin up or stop a container?


Port mapping isn't handled by iptables, only the wiring of the docker virtual interface. We take care of that piece, but let docker do whatever it wants within that interface.


Those packets should definitely be deported back to where they came from.

And we should make them pay for the firewall.


The good traffic should already be permitted in the firewall. If you're still having issues, the firewall rules are probably not being enforced. I've seen this happen when an "any any allow" rule is left in from debugging ("just to get things working").


But when your entire service is built on the notion of accepting all types of traffic, it's insulting to block it based on nothing more than opinion (I would point out for the avoidance of doubt, OP was making a Trump reference...)


This should be renamed to "the sad state of swarm mode"


I am looking more and more at using the Apcera Platform. They have everything I need in a container management platform for free: https://www.apcera.com/community-edition


Do you work for Apcera?

Your submission page is heavy on the linux-toys.com domain, implying that it's yours. That same site identifies the author as working at Apcera.

There is also a submission in your account that points directly to an Apcera blogpost, with the same author name.

Disclosure: I work for Pivotal, we donate the majority of engineering to Cloud Foundry. Apcera has beef with us.


Yes, I work for Apcera and linux-toys.com is my personal blog where I talk about open source technologies (mainly Docker lately) and software that I develop. Like the article states, I am a Docker fanboy and want them to be great again. The blog website is actually running on a four node Raspberry pi cluster that I referenced in the blog as a link. The reason I joined Apcera is because they are also developing cool container technology. Until recently, Apcera was only available to enterprises and didn’t have a solution for “homeduction” users like me to host and run applications like my blog and a few other services. With the community edition of the software, I can gather a few spare x86 boxes and make the switch. I will still be running the same Docker images but using Apcera’s orchestration software instead of swarm.


Thanks. I'm glad to hear Apcera is branching out and that you're excited to work there.


Trying to run Docker on your own to do anything meaningful can be a very painful exercise.

I gave up after trying to spin up a cluster on DigitalOcean using v1.12. Like the author, I couldn't get my containers to see each other, something that worked before v1.12.


I'm personally just getting into Docker and am looking at the swarm feature for deploying containerized compute nodes to our cluster. People here seem to be complaining specifically about Swarm mode; is there anything I should watch out for? What does Kubernetes provide that Docker swarm doesn't?

I've tested Docker Swarm a bit and it seems to work as advertised.

Is it just about node selection not being sophisticated enough? In that case I don't need to worry since all my nodes are the same, but if there are any words of warning I'm all ears. Thanks.


I've been using docker since version 0.6 and followed almost every version upgrade since and it's an absolute mess. This blog post is spot on.

My old team had a production environment running in docker containers for about a year (this was pre-swarm, pre-kubernetes) and then transitioned to just using ansible for application deployment in a more "traditional" manner because we spent more time trying to fix broken things with docker than it was worth.


Question: we are looking to deploy a flask + redis based application to a single server on AWS using docker. No load balancing, no multiple servers.

what is the current best practice to do this (with logging, etc.)? should I even be considering something like marathon/k8s, etc.?

currently I have a fat docker VM with supervisord and all services running in a single VM with highly fragile logging. I don't think this can last.

getting started seems very intimidating in the Docker world.


Dokku should be able to handle all of your concerns. We do load-balancing on the server via nginx, use Docker's plumbing (the docker command/api) to build nice porcelain (our own cli tool) for stuff like restart policies etc. It's targeted specifically at single-server solutions, and migrating once you are large enough to another platform is easy as we tend to not build in platform lock-in.

Feel free to jump on our slack or irc: http://dokku.viewdocs.io/dokku/getting-started/where-to-get-...

Disclaimer: I am a maintainer of Dokku


I recently moved my django app running on docker to google container engine. Basically by following this tutorial you can be up in 15 minutes: https://cloud.google.com/python/django/container-engine . k8s picks up your stdout logs and sends them to stackdriver, and you don't need to do anything to set it up. I was running nginx+gunicorn inside my docker image, but the nginx part has been taken care of by k8s.


I changed the baity, over-general title "The Sad State of Docker" to what the first paragraph of the article says it's actually about.

Generally we moderate HN stories/threads less when a YC startup or YC itself is at issue. But we do still moderate them some, because the standards of the site still apply.

(We haven't done anything here besides this title edit, though, in case anybody is wondering.)


Negative comments here seem to be a gross overreaction. This is primarily a code release. Code is always going to keep moving. I have used swarm and it's nowhere near as unstable as people say it is. Is it production ready? Probably not. But which software is production ready in every big release?

Unnecessary fear mongering by people who have an outside agenda.


Which release is going to be production ready, and which features are production ready in each release? If Docker Inc documented which features are production ready, I'd have more sympathy for that point of view.

What we've had to do at Red Hat is always stay a couple releases behind (we're shipping 1.10+patches right now) and backport all the fixes from upstream releases to make it stable (ie production ready). Docker keeps shipping new versions, with fixes for old issues but then a whole new set of issues added in.


OK downvoters, I get it. Everyone hates docker but uses it anyway :-)


Is swarm mode the same as docker swarm [1]? Curious why they chose the same name if that's not the case.

Why would one use docker swarm [1] vs just docker swarm mode?

[1] https://github.com/docker/swarm


It's not the same. Strangely the docker-swarm tutorial doesn't mention swarm mode. I wasn't able to follow it successfully with Docker 1.12. That was when I realized docker-swarm != swarm mode, so I tried the swarm mode tutorial instead, and it worked as expected.

Maybe docker-swarm is not supported anymore with 1.12? It was all pretty confusing.


>Maybe docker-swarm is not supported anymore with 1.12?

I am using docker-swarm with 1.12 so it's definitely supported. Yea I am really confused too :D.


Originally, there was the stand alone version of docker swarm, invoked using "docker-swarm". This was more difficult to set up than version 1.12's new swarm mode, invoked using "docker swarm", which does almost the same thing, but more succinctly. Presumably, they chose the same name because they serve the same purpose and, if the latter is set to replace the former, why come up with a new name for basically the same concept?


>if the latter is set to replace the former

Do you know if there was any guidance by docker team on this? Seems like there is active development going on for both docker-swarm and swarm-mode.


Development is mainly in swarm mode, but yes the older swarm is still supported, it is just an application that uses docker so will always work.


I guess my main confusion is when would someone choose to use docker-swarm over swarm-mode.


https://github.com/docker/swarm is going away.

https://github.com/docker/swarmkit is what gets integrated into Docker engine as of docker-1.12.0.


>https://github.com/docker/swarm is going away.

Their readme says

"Docker does not currently have a plan to deprecate Docker Swarm."

Do you know if they officially say otherwise elsewhere?


If you need a distributed PaaS alternative for RPi and others, you can look at https://github.com/tinspin/rupy.

It's simple and stable!


One reason for releasing open source software is testing.


Yes, but that's why "alpha", "beta" and "RC" labels exist.


rekt.. I mean, rkt


IMHO, nobody should care or worry about the underlying orchestration. The industry is moving toward the public service model, and all these management problems are solved by the cloud platform, not app developers. Check out hyper.sh; that's what a container service would be.


Or I could manage my own network, which isn't that hard and for a lot of (if not most) cases, less expensive.


Every time I hear "The industry is moving toward X", I can't help but imagine a car racing off a cliff at full speed.


So you are saying that a new major software release that is less than a month old still has some bugs? I am shocked. Shocked! Well, not that shocked.

If you can't take the bleeding part of bleeding edge, wait for things to mature before using them. Bitching and whining that they didn't create a perfect product out the gate only belittles the hard work it took to get a new product out the door.

This is how software releases work in the real world!


It's one thing to knowingly release beta software as "beta".

It's another thing to bundle beta software into an existing package and label it as stable in order to stay relevant in the orchestration game.


Some bugs are ok and expected like you said but major functionality issues that make the new feature useless should be addressed prior to a release and huge marketing efforts.



Don't know what all the fuss is about when essentially you're getting Docker for free?


It is not about cost. It is about marketing and generating all the hype over something that clearly does not function properly. They act like they are laying golden eggs when in fact they are broken eggs.


There is a reason why they call it the bleeding edge. It's up to you to evaluate the technology to see if it fits within your infrastructure. Calling Docker out and saying it's broken when it's a free product seems misguided.


Docker isn't free for everyone. The Docker project is maintained by a for-profit company which sells enterprise support and services. They're getting more or less the same broken product we get for free. They just get to wait three months for a fix instead of six...


Free and broken are not mutually exclusive.


It's nothing about that. The main point is the whole community bitching and moaning about software that for all intents and purposes is given away free without any contribution by the users. The source code of Docker is open source so why doesn't the guy go in there and contribute his time and/or his own money to support the project that to him is so vital?

No doubt the guys working on Docker are super smart guys, though they too have bills to pay and lives to live. So if you want to see Docker improve, how about contributing back and/or supporting their efforts with extra funding? Maybe they will have the funds to bring on more full time developers to hunt those obscure bugs or add those features that everyone wants.

Discussing and putting down the developer with article titles such as 'Sad state of docker' to me shows how self entitled the community around these projects are.


> given away free

Except for, you know, the people who pay for it... https://www.docker.com/pricing

> without any contribution by the users

Then who are the over 1,000 contributors to the Docker repository?

> supporting their efforts with extra funding

I think they're doing just fine. https://www.crunchbase.com/organization/docker/funding-round...

> how self entitled the community around these projects are.

As a member of the "self entitled" community, I see a lot of ignorance in this comment. The developers of Docker are not just the >100 employees of Docker Inc, but also the over 1,000 community members who have contributed to the Docker project.

Frustration with Docker isn't about entitlement, it is about frustration with a governing body run by a for-profit company with $180 million of funding who has shown a pattern of favoring shiny new features over stability and left their own community to clean up the mess of bugs left in their wake.

So no, I don't think the community is self-entitled, I think the community just wants to see the project they've worked so hard to help build be competently maintained.


Opinions here are my own.

> It's nothing about that. The main point is the whole community bitching and moaning about software that for all intents and purposes is given away free without any contribution by the users.

I remember that Jessie did a talk where she said that something like 4/5 contributions to Docker are from people who don't work for Docker Inc. I was one of them (I spend most of my time contributing to runC now because there's less politics there). So your point is categorically invalid.

> The source code of Docker is open source so why doesn't the guy go in there and contribute his time and/or his own money to support the project that to him is so vital?

Docker Inc does not operate Docker like a community project. Maintainers are elected behind closed doors, most of the maintainers work for Docker Inc, Docker Inc has a roadmap that the community does not have access to, maintainers of Docker will not merge code if it is not strictly in line with Docker Inc's roadmap, and they openly attack distributions like RedHat and openSUSE for applying out-of-tree patches to Docker.

> No doubt the guys working on Docker are super smart guys

The community is fairly clever too, especially considering that we outnumber the people working for Docker Inc.


> Docker Inc does not operate Docker like a community project. Maintainers are elected behind closed doors, most of the maintainers work for Docker Inc, Docker Inc has a roadmap that the community does not have access to, maintainers of Docker will not merge code if it is not strictly in line with Docker Inc's roadmap, and they openly attack distributions like RedHat and openSUSE for applying out-of-tree patches to Docker.

Thank you, thank you, thank you. This here is the truly sad state of Docker. Glad to see someone else in this thread sick of closed door governance bullshit and flat out bullying.


Which is super ironic, as Red Hat has some very nice "enterprise" patches for Docker. I use and prefer the Red Hat fork of Docker because of them:

https://github.com/projectatomic/docker/blob/rhel7-1.10.3/RE...


Extra funding? Because $100 million isn't enough?


$100 million isn't that much money for large teams.


The cost or otherwise of Docker is totally orthogonal to whether or not it is a 'good', non-broken piece of software.


Kubernetes is also free software. So is Mesos. And Nomad. On the runtime side, rkt and runC are also free software. Just because something is free (and gratis) software doesn't mean that people cannot complain and suggest improvements. That's how all free software development has worked since day 0.


That's terrible logic. Imagine if Google just decided to delete people's email randomly in Gmail ... because free.


Exactly. If I make a claim, you don't have to pay me to earn the right to dispute that claim. "Here's a lamp. It's magic." "Uhh...It doesn't appear to actually be magic. It's not even a decent lamp." "How dare you say it's not magic? I didn't charge you for it!"


Are you paying for the service?


Actually we do pay for "free".

Why do you think "free" is used as a marketing strategy? Obviously there is a monetary benefit somewhere, or anyone attempting a "free" or even just a "low(er) price" strategy must be doing it wrong. An example is Uber, which has just generated yet another billion dollar loss headline.

Apart from the data collection model, what it does is it removes the incentive of others to enter that market and possibly create something better.

And on an individual level, something can have a cost merely by existing, because I have to choose. And if I choose wrong I end up worse off than I could be.

Imagine I was an evil guy from a Disney movie. My next project to annoy people might be to create an app that seems to work - but has a lot of very annoying flaws. However, I never mention the flaws (of course not), instead I go all out on my marketing and make a lot of claims to get people to use my evil software. I then sit back and with an evil laugh enjoy the frustration that I created a million times - the (Disney-level) evil scheme succeeded. I wasted lots of people's time; I may also have sabotaged other (non-evil) projects by taking users away from them, and I managed to raise the world-wide frustration level.


Unless you write a check out to them, then I classify that as being free. You're feeling that you're entitled to a service that you don't pay anything for.


That is so random - what does your reply have to do with what I wrote?


You don't get Gmail ads when you're using Gmail?


It's an issue of credibility.

I'd much rather pay for something that works than get something for free that doesn't.


Somebody got it right.


So because docker is free it is worthless?


Nobody even hinted at this.

The point is, if something doesn't work as claimed, it adds real 'cost' which is often much more than the relatively cheap cost of buying something that is more reliable.

That it is free is almost irrelevant - the cost/benefit of implementation, training, reliability, support, and possibly 'savings' using it vis-a-vis other solutions probably far outweigh any price tag it might have.

Apparently they launched a rev. with a bunch of problems - this is bad, and costly.



