Docker for Beginners (prakhar.me)
326 points by vegasbrianc on Jan 12, 2016 | 62 comments



Hi everyone,

Author here. There are a bunch of awesome[0] tutorials on Docker so why another one? Well, my motivation was to have a guide (for myself and for others) on how to deploy dockerized apps on the cloud. So in this tutorial, apart from giving an intro to docker, I demonstrate how to use Elastic Beanstalk for single-container and ECS for multi-container deployments.

Here are the two apps we deploy on AWS as examples:

1. Catnip - A simple flask app (single-container): http://catnip.elasticbeanstalk.com/

2. Foodtrucks - A simple app to discover foodtrucks in SF (Flask + Elasticsearch in multi-containers): http://sf-foodtrucks.xyz/

I'm new to Docker myself so I'm sure I've made mistakes. Let me know if you have ideas on how to improve this!

[0] - http://docker.atbaker.me/


Awesome! I will try ECS :D Two questions about ECS: I have a few scripts I'd like to run on ECS. I created many images with the same dependencies; only the script code changes. I was thinking of attaching the script with a volume. Is that a good way to do it? Does ECS handle volumes well?


Thanks, this is really great!

I learned a lot about how to use Elastic Beanstalk and the ECS cli from this guide.


ECS is really the only way I'd go with Docker - So far in my experience, Docker in production is really a lot harder than the casual developer / devops person realizes based on the "getting started with docker" tutorials.

My conclusion with Docker is that, in general™, you really need to have a justifiable reason to go whole-hog into Docker, especially if you're not on AWS / considering ECS.

I'm glad the article covers ECS, as it makes a lot of the scheduling / config issues simpler!


> ECS is really the only way I'd go with Docker

ECS is AWS-specific, which is perfectly fine for some. But Kubernetes has been amazing for us. It abstracts many of the differences between AWS/Google Cloud, it's open source, and is far more powerful and flexible.

The only issue right now is that setting the cluster up involves running some shell scripts (yuck). We use Google Container Engine (hosted Kubernetes on Google Cloud), so we don't have to deal with that, but the option is there should we ever need to go multi-cloud.

Figured I'd toss that out there for anyone struggling with ECS (it can be a bit rigid) or keeping an eye on things beyond AWS. Kubernetes is still young and rough in areas, but it is a nice, opinionated way to orchestrate containers.


Wow, so instead of running my configuration management stuff I now need to either: be on AWS, be on GCloud, or be on any other cloud that has Docker support out of the box.

OR: I need Docker + a configuration management tool to set up my environment, or I need to somehow manage my CoreOS configs. Oh, and finally, don't forget to run a network over a network because it's Docker-ish, so for most clouds this means we run a network on a network which runs a network.

Docker adds so much complexity. People just don't see this upfront and sink most of their time into this stuff, but there are easier ways to deploy.

YES, if you are really big and your servers need to scale way beyond the norm, then you probably need it, since configuration management won't help you and setting up servers even in the cloud takes some time, judging by some Netflix articles. However, I just don't get why people spend their time on Docker when there are other things to do in their programs.


But it doesn't really; it is just another tool to learn, like your configuration management software. Honestly, learning Docker to a high level was WAY easier than getting my Chef knowledge to a mediocre level. Further, it has other benefits, like it being MUCH easier to set up well-isolated dev environments with Docker than with Chef. Since we switched we can usually have someone running the dev environment and tests within an hour or so of unboxing their laptop. With Chef, running the whole thing from scratch took that long by itself (after you installed all the software) and usually failed.

As for the idea that it is too complicated for something simple: I built a sample blogging app that you can deploy to a DigitalOcean or EC2 machine with 3 or 4 commands (https://github.com/pbecotte/devblog). The system we have built at work has 8 separate services between a bunch of data stores, background workers, and a couple different pieces of our app. Docker allows you to run the entire environment on your local machine and then deploy that same setup to our cluster, without any differences to account for when making all those apps run on one VM for your local environment (which can get bad when, for example, one service requires a different version of a Ruby gem than another).
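A rough sketch of that kind of deploy (not necessarily the exact commands from the devblog repo; the token and machine name are placeholders):

  docker-machine create --driver digitalocean --digitalocean-access-token $DO_TOKEN blog
  eval "$(docker-machine env blog)"   # point the local docker client at the new droplet
  docker-compose up -d                # pull/build and start every service there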


How do you do the same thing without a cloud environment?

Do you install the software manually? Do you configure your network manually? Do you configure your os manually? Do you install the docker daemon manually?

What about installations behind firewalls? Or what about code that doesn't belong in the Docker registry because it shouldn't be pushed over the internet?

etc...

Why do you have 8 separate services anyway? How many people does your company have? For 8 services you should have at least 8 * 3 people.


I second that. Docker makes it very easy to separate the whole application into a bunch of isolated services and then reproduce the whole production environment on a laptop for very convenient development.


> Wow, so instead of running my configuration management stuff I now need to either: be on AWS, be on GCloud, or be on any other cloud that has Docker support out of the box.

No. You need a networked computer with a quasi-recent kernel. There are no requirements to use a cloud provider or to use AWS/GCP.

> OR: I need Docker + a configuration management tool to set up my environment, or I need to somehow manage my CoreOS configs. Oh, and finally, don't forget to run a network over a network because it's Docker-ish, so for most clouds this means we run a network on a network which runs a network.

This sounds like aimless rambling. You don't have to use any of this. You can, of course, but I can do some nasty stuff without Docker as well. Like I mentioned with Container Engine, setting up a Kubernetes cluster is like two mouse clicks. You can get as simple as ECS, or you can roll your own from the ground up. You have the option to pick a point on a spectrum. With ECS, it's Amazon's way or the highway.

> However, I just don't get why people spend their time on Docker when there are other things to do in their programs.

Because it saves us loads of time developing, testing, and deploying our systems. There is more initial setup work, but after that we are deploying images seamlessly, have a great rolling upgrade and rollback story out of the box, and get a lot of other bonuses like service discovery, auto-scaling (vertically and horizontally), and much better (higher) resource usage levels. And we're far from a mega-corp.

To me, it sounds like you may have skimmed some, perhaps even played a bit. But you ran into a snag, threw up your hands, and summarily dismissed an entire ecosystem after your experience(s). I see enough uninformed or 100% incorrect things above to think you're missing some details.


> Because it saves us loads of time developing, testing, and deploying our systems. There is more initial setup work, but after that we are deploying images seamlessly, have a great rolling upgrade and rollback story out of the box, and get a lot of other bonuses like service discovery, auto-scaling (vertically and horizontally), and much better (higher) resource usage levels. And we're far from a mega-corp.

Question: what benefits do you get with Docker versus without? Without it I have cgroups, my software is versioned, i.e. program-X.X.X.jar / program-X.X.X.rpm / whatever. Why do you get better/higher resource usage? I mean, you use an additional layer; you could do the same with just cgroups. How do you do service discovery? etcd can be installed without Docker?! How do you do auto-scaling without a cloud? I mean, even without Docker auto-scaling is trivial.


> Question: what benefits do you get with Docker versus without? Without it I have cgroups, my software is versioned, i.e. program-X.X.X.jar / program-X.X.X.rpm / whatever. Why do you get better/higher resource usage? I mean, you use an additional layer; you could do the same with just cgroups. How do you do service discovery? etcd can be installed without Docker?! How do you do auto-scaling without a cloud? I mean, even without Docker auto-scaling is trivial.

Not to be dismissive, but why would we want to use cgroups? Docker works great for us. There are indeed tons of alternatives to everything. Why cgroups? Why not BSD and jails? Why not build on unikernels and VM images?

> Why do you get better/higher resource usage? I mean, you use an additional layer; you could do the same with just cgroups.

Kubernetes (and most other Docker orchestration systems) comes with a scheduler. Containers are allocated, based on their desired characteristics, to the machines with the most spare capacity. You can cut down on idle capacity by intelligently spreading the load without having to think very much about it.
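For example, here's a minimal sketch of handing the scheduler a container with explicit resource requests (image and numbers are made up); Kubernetes then places it on a node with enough spare capacity:

  cat <<'EOF' | kubectl create -f -
  apiVersion: v1
  kind: Pod
  metadata:
    name: web
  spec:
    containers:
    - name: web
      image: nginx
      resources:
        requests:
          cpu: 250m        # a quarter of a core
          memory: 128Mi
  EOF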

You could do the same with cgroups, but you're going to need to write an orchestration system, a scheduler, and you probably won't have the massive communities that some of the alternatives have already built.

> How do you do service discovery? etcd can be installed without Docker?!

Yes, it can be installed without Docker if you'd like. We run some things in Docker, and others outside ourselves. It's no silver bullet.

> How do you do auto-scaling without a cloud? I mean, even without Docker auto-scaling is trivial.

Docker is just one piece of the puzzle, which is something you seem to be missing. In our case, Kubernetes brings it all together into an easy-to-use package.

Life has few absolutes. It's great if your cgroup deploy is working for you; it's just not a great fit for everyone. The comparison between Docker and cgroups is a bit apples-to-oranges, too. In reality, you need to compare a lot more than just the tech; the Docker ecosystem is the biggest appeal to us.


Why are shell scripts yuckworthy?


I can't speak for the commenter, but for me the ... concern ... with using shell scripts is the complete lack of understanding of what they are going to do. It's not exactly a lack of trust for the Kubernetes team as much as when it goes toes-up (not if), then one needs to be able to understand what it did in order to understand what it didn't.

I haven't used their cluster provisioning in a while, but back when I did it made a lot of modifications to AWS before falling over, at which point I need to either undo those changes or the scripts need to be smart enough to resume where they left off. Shell scripts are not well suited to that purpose, in my experience.

Yes, I am aware of their salt stack procedure, but it doesn't hold a candle to the simplicity of Kubernetes on CoreOS, which unlike the 23+ directories worth of salt things to read is something you would probably fit on 3 A4 sheets of paper: https://github.com/coreos/coreos-kubernetes/tree/master/mult... I provisioned a Vagrant cluster using the Kubernetes salt mechanism just for this comment, and the "Cockpit" username and password didn't work after it was finished. Which salt config contains that information? Beats me. I'm thankful that `vagrant ssh master` did as expected.

I recognize that the previous paragraph is just _my_ experience and _my_ preferences, but the anecdote is the reason why (in my experience) one must understand what the "magic" deployment process is doing, and only after that voluntarily cede control to the scripts merely as a labor-saving tactic. It does me no good to have a cluster brought to life that I then have zero idea how to maintain.


Totally agree! To be honest I'm still a bit wary of using ECS via the web console. Too many (new) terms to understand! Thankfully the folks at AWS were smart enough to make a CLI that understands docker-compose. So that's how I ended up using it.
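For anyone curious, the flow with that CLI looks roughly like this (cluster name, keypair, and sizes are placeholders, and the exact flags may differ by version):

  ecs-cli configure --cluster foodtrucks --region us-east-1
  ecs-cli up --keypair my-key --capability-iam --size 1 --instance-type t2.micro
  ecs-cli compose --file docker-compose.yml up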


We're working on a whole ECS toolkit at http://convox.com/.

There are similarities to the ECS CLI, but we are going for the whole "batteries included" approach.

Builds, Logs, cluster and app scaling, encrypted environment, and more.


> ECS is really the only way I'd go with Docker

I have not tried it personally, but I have seen a blog post comparing it to Kubernetes: https://railsadventures.wordpress.com/2015/12/06/why-we-chos... It sounds like ECS is not really there either.


The weak part of Docker is volume/data management, and it is interesting that the article almost never mentions it. There are solutions that attach network storage to containers, but that just adds latency to the application. If one wants access to fast local storage, then Docker requires significant extra effort to set things up.


I think you can do exactly what you are asking for with the host machine's local storage with docker today. Where do these docs fall short for you?

https://docs.docker.com/engine/userguide/dockervolumes/

I would first start with the use case, check out what came out in Docker 1.9, and then read up on what the various volume plugins provide.
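For example, a plain bind mount of a fast local directory looks like this (paths are hypothetical):

  # mount a directory on the host's local SSD into the container
  docker run -d -v /mnt/ssd/esdata:/usr/share/elasticsearch/data elasticsearch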

(I work for ClusterHQ, and we read and hear a lot of feedback around this topic.)


My point is that most Docker tutorials and documentation underestimate the complexity of persistent data management. Docker is not going to solve that. However, once one takes care of the data, Docker is really nice for service isolation and deployment.


The guys over at the great http://www.thecloudcast.net have been regularly following this topic recently. It's a treasure trove of devops information, anyone interested should subscribe IMHO.


I still don't know how to give docker an IP address or "physical" network card on my network.

I was able to do this very easily with solaris zones, and even BSD jails, but every installation of docker that is any way integrated into packages seems to be unable to do this.

perhaps I'm simply not using the right google search terms.


I've found pipework (https://github.com/jpetazzo/pipework) still to be the easiest way to manage networking in containers when you know what you want. Start up your container with --net=none and then something like:

  pipework mybr01 mycontainername 1.1.1.3/24@1.1.1.1
Docker 1.9's enhanced networking functionality is supposed to make this easier, although I've had a hard time understanding how to use it on my preferred RHEL/CentOS platform (the doco I've read either glosses over details or assumes you use docker-machine & swarm, neither of which I want to use in a simple deployment).

Weave is supposed to make this easy too (and offers some simplicity over Docker 1.9's requirement for a clustered key-value store), but I got frustrated at the implicit iptables NAT rule it creates for networks by default and haven't worked out how to stop that.


Is your intention to expose the ports of a Docker container? If you want to do this, search for "exposing" ports. The docker run options for this are -p and -P.

Or do you want to create a subnet with a custom IP range for the Docker daemon's bridge network? Use the -b option to the Docker daemon.

https://docs.docker.com/engine/userguide/networking/ is a good place to start. If you have a specific question, feel free to ask and I can try to answer.
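For reference, hedged examples of the -p / -P options mentioned above (image name and ports are made up):

  docker run -d -p 8080:5000 myapp   # publish container port 5000 on host port 8080
  docker run -d -P myapp             # publish every EXPOSEd port on a random host port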


No, that's not my intention. My intention is to not have to deal with iptables mangling packets and adding netfilter tags to everything.

not to mention port collisions with things that must run on predefined ports (think SMTP or pesky applications that keep redirecting you back to port 80)

I'm looking to expose 'an IP' similar to a bridged/open network in KVM.


Would doing --net=host do this? It makes the container use the host's networking stack so there's no funny business.


Yes it would help here but would also expose new security loopholes.


I've had the same struggle. I'm moving my network to use a vxlan overlay for VMs so I ended up building a vxlan driver for docker networking which lets me do the same thing for containers. That project is here: https://github.com/clinta/docker-vxlan-plugin

Docker seems really intent on NATing containers behind the host, which IMO is not acceptable from a security perspective when I want to firewall outbound access based on a container's role.

Kubernetes is another option which creates a unique IP per container pod on the same network as the host. Not as flexible as a vxlan approach where containers can be micro-segmented into specific networks, but more like BSD jails that you're used to.


I did this now: https://github.com/pjperez/docker-wormhole

Example Docker images for iPerf client and server:

https://github.com/pjperez/docker-iperfserver

https://github.com/pjperez/docker-iperfclient

Edit to clarify: You can deploy each of the above images on different hosts located on different providers. There's no need for the hosts to have visibility between each other at all.


I haven't tried it yet with Docker, but I'm starting a networking service (pretty much SoftEther as a service) that might work on your case.

The idea is that your Docker containers will connect to a central server (443/TCP outgoing traffic) and have a virtual private network among them through this "hub", so they've got full access to each other. This is a layer 2 network and in my offering I have DHCP running by default to simplify things (100.64.0.0/24... naughty, I know :)). The communications are encrypted, so effectively you've got a sort of Virtual Private Network between your containers.

As I said, I haven't tried yet with Docker, but it's worth a shot. My service simplifies the process of getting up and running.

https://wormhole.network


Take a look at Calico - https://github.com/projectcalico/calico-containers

It's a networking solution that uses BGP to connect containers across multiple hosts, and so can very easily integrate with existing infrastructure.



This doesn't really help; exposing network ports is not the same as giving a container a separate IP address.


Is this what you're looking for? https://github.com/docker/docker/pull/19001


Yep. Glad to see it was merged. I wonder when it will trickle down into my distro of choice. :)


You should check out get.docker.com. Most of the distro versions are pretty old/broken.
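i.e. something like this (it pipes a script from the internet into a shell, so read it first):

  curl -fsSL https://get.docker.com/ | sh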


While the technical tutorial is good, I can't say anyone should choose Docker for the reasons outlined in the "Why should I use it?" section. "Because it's popular" will only give you a headache when you are trying to deploy your simple Rails app.


Great point! While the popular aspect is true to some extent, I've found that for my side projects (I'm still a student) Docker has vastly decreased the pain in deploying my projects. I used to have "but it works on my machine" moments but now using the same container in production is super awesome!

I'm sure Docker is not a panacea to all your infrastructure problems but it surely is a worthwhile tool to learn :)


>I've found that for my side projects (I'm still a student) Docker has vastly decreased the pain in deploying my projects. I used to have "but it works on my machine" moments but now using the same container in production is super awesome!

That's a far stronger reason to use Docker - as I'm sure people have wrestled with this issue when trying to use Capistrano/Grunt/Ansible to deploy as well.


But "It's popular" seems to be one of the main reasons.

Otherwise more people would use Nix.


How does Nix offer any of the advantages that Docker does? Isn't it just another way to do config management? Plus the added complexity of using a very niche OS. It seems like it's not popular for a reason...


Nix is a package manager with isolation, not a config manager. And it runs on any Linux distro or even Mac OS X, you don't need to use NixOS.

https://nixos.org/nix/


Prakhar, this is phenomenal. I'm a Columbia student as well and actually just came across your blog with this article. Hope to meet up at some point!

This is definitely a great start to Docker and I like how you provided an application to allow the reader to just work through the process of deploying something. Will definitely recommend this, as Docker is something easier shown than explained. Great stuff man!


Thanks Uday! I'm planning to hold a workshop in the coming months on campus where I'll be going over this stuff in greater detail. Feel free to join in!

PS: If you do come to the workshop, drop by and say Hi!


I'm always surprised to see a comparison to Virtual Machines. The containers seem more like an enhanced chroot jail than a CPU emulator, and I've always used those for the similar purpose of isolating a tricky build environment.

They even have some of the same restrictions (Docker needs root, as does chroot; they both work by making system calls lie to the process).

Whenever I hear a comparison with VMs, I wonder for a second, "Wait, is there some clever way to invoke the virtualization instructions without evicting references to the OS from the CPU's context to provide isolation without a separate guest OS?"


As a Docker noob, it would be so useful if there were a list of Docker use cases that someone could link to.


At the risk of oversimplifying, you can use Docker wherever you might want to use a full-fledged VM. You can use it to run a job, host a webserver, or any other scenario you can think of. What makes Docker (rather, containers) different from a VM is the way it packages all of your application's dependencies into an isolated sandbox.

So let's say you have a Python app that relies on a dependency that needs C bindings (e.g. ImageMagick). Instead of running `./app.py` freshly downloaded from some Git <repo>, you would run `docker run <repo> ./app.py`. In the former case, you would need to take care of, say, the C dependencies yourself. In the second case, they are packaged in the image that Docker will download from <repo> prior to running the ./app.py process in it. (Note that the two <repo>s are not the same thing. One is a Git repo; the other is a Docker repo - called an image.)

Think of the process of building a container as taking a snapshot of the entire OS (like a VM image) but w/o the high overhead of running those images.
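A minimal sketch of what baking that C dependency into an image could look like (file names and base image are hypothetical):

  cat > Dockerfile <<'EOF'
  FROM python:2.7
  # install the C-level dependency the Python package needs
  RUN apt-get update && apt-get install -y imagemagick
  COPY . /app
  WORKDIR /app
  RUN pip install -r requirements.txt
  CMD ["python", "app.py"]
  EOF
  docker build -t myapp .
  docker run myapp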

Feel free to reach out to me if you need more clarification!


If you're working on multiple projects simultaneously (e.g. multiple microservices within a company, or multiple client projects if you are a freelancer), Docker is a godsend for quickly getting your development environment set up. All you need is to declare your "infrastructure" (e.g. PostgreSQL, Redis, etc.) in a docker-compose.yml, and `docker-compose up` will get you up and running in no time.
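A minimal sketch of such a docker-compose.yml (service names and image tags are made up):

  cat > docker-compose.yml <<'EOF'
  db:
    image: postgres:9.4
  cache:
    image: redis:3
  EOF
  docker-compose up -d   # both services are now running, isolated from the host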



Docker does great marketing but to my knowledge I can't use the toolbox to effortlessly (without Vagrant) set up a non-trivial microservice development flow on my Mac yet.

On the deployment side there are also still lots of inconsistencies between the different tools but I can see it becoming the go-to pragmatic way to bootstrap your private "cloud" very soon.

Again deployment is only part of the solution and I wish Docker would hire more good people to work on different development "best-practices" for Linux as well as OSX.

If Docker eventually wants to become a major service provider, they should complement that kind of tooling with the same vigorous level of documentation and blogging that Heroku, Digital Ocean, Codeship and the like come up with consistently.


On my Mac I run Docker under a CoreOS VM, the same OS I use in production, and use lsyncd to transparently synchronize all the files that I touch in the editor into the VM. Works like a charm.


I bet you would help not only me but lots of people if you'd share that workflow.

Which hypervisor are you using?

Is Vagrant still part of that setup (a tool that should become obsolete with a pure docker development approach IMHO)?

I didn't find anything like that in either the Docker or the CoreOS official docs, unfortunately. Both of those companies are fighting for territory in this super lucrative space; one of them should provide it without relying on the greater community (like one of you or me doing what's arguably their job for free).

Edit: I'm sorry if this comes off as the typical abrasive comment but I've been working with both those technologies for more than a year now and mind you not alone. A couple of good developers and sysadmins I work with are also still trying to figure out how to go about all of these issues and what I see is a big disconnect between what gets advertised vs. where we are at this point. As somebody who did FreeBSD jails and OpenVZ container system administration as well as general "distributed systems" development myself for many years I also admit to painfully miss the amazing simplicity of creating a monolith and effortlessly deploying it to Heroku. It's something I've become used to over the last couple of years and I miss it, even though I myself find the promise of the current "microservice" trend very interesting as well.


The setup is straightforward.

I use VirtualBox to run CoreOS, but the VM can run anything as long as it comes with a recent Docker and does not require a lot of maintenance. Then I run lsyncd to synchronize files from the host transparently into that VM and edit whatever files I need in Emacs on the host. When I need to run a docker command, I do it either from Emacs, by prefixing the command with ssh local-vm-name, or from an ssh session in a terminal.

To test things I use, for example, another VM whose /etc/hosts points my production domains to the VM with Docker. Another useful thing during development is to expose, say, the PHP/JS code that I edit directly into the container for quick testing feedback. For that I can run the container with an extra host volume mount that overrides the software tree in the image with one that comes from the Docker VM, where lsyncd copies all the changes I make in the editor.
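Concretely, the override is just an extra -v on the run (paths and image name are hypothetical):

  # shadow the code baked into the image with the tree lsyncd keeps fresh in the VM
  docker run -d -v /home/core/src/myapp:/var/www/myapp myapp-image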


Thanks for answering, do I understand correctly that you develop directly on CoreOS with Emacs? If so I think your workflow might be more of an outlier as I reckon most people tend to work on OSX (possibly even running some IDE of some sorts).

I think a lot of confusion comes from unclear definitions so I try my best here, it would be great if you could chip in once again...

# Premise

1. our base operating system is Mac OSX which we will simply call osx

2. on top of osx we run a hypervisor (e.g. xhyve, vmware, virtualbox)

3. on top of our hypervisor we are running a virtual machine with CoreOS as our docker_engine (with the help of or without vagrant)

4. on top of our docker_engine we run an arbitrary set of docker_container instances, most of which are built from a "bespoke" docker_image of our own: a certain "microservice" in development

# Questions

How do we:

a) map a ("microservice") project's source code directory located on our osx file system into the currently running development docker_container (instantiated from our project's development docker_image)

osx => hypervisor => docker_engine => docker_container

b) and pass any file changes from osx down to the docker_container level as well (e.g. inotify, lsyncd, etc.)

BTW, if we wanted to make this an even more helpful effort, why not make it a proper gist?

https://gist.github.com/musha68k/399c66374ca54c665fd5


I do not run Emacs in the VM; I edit files on the host! lsyncd just transparently copies all my changes into the VM; see the details in the gist. In another setup this also worked with Eclipse, where lsyncd synchronized compiled classes into the VM.


So... you ensure that you are using the same OS in dev and production... doesn't this defeat one of the main reasons for using docker in the first place?


It's my understanding that containerization's major benefit is not for development but for system maintenance. It allows sysadmins the luxury of no longer spawning VMs left and right, since you can safely avoid the overhead of reserving full VMs' worth of hardware resources for applications that will ultimately only need them in short bursts.

By itself, containerization is great already, without involving the devs at all.

What Docker attempts to provide is a framework that lets devs and sysadmins use the same tools. Ideally, if you can get your devs to use this framework, you unlock the next step: you no longer deliver sources or packages to production, but rather entire Docker images, ready to use and with clearly identified interfaces to other systems. It's a dream come true for IT sysadmins, who can focus on their own concerns: monitoring, logging, security, resource management, network architecture, etc.

And that's what's great about the Docker effort: it's the devs trying to be the best wingmen in the world for their sysadmin pals.


Docker has a lot more than just the containers - they acquired Tutum for a cloud-based deployment workflow and Universal Control Plane for on-prem. Docker Hub and DTR also help with building and deploying containers.

It's a set of tooling and an ecosystem and not just the container technology.


Thank you, I'm familiar with the various "merges and acquisitions" (fig, tutum cloud etc) but my point was that there are still a lot of unfulfilled promises and that Docker's admittedly great marketing can be misleading.


Neat article. I didn't know you can now run Docker on EB!

I recommend starting with Docker Machine (https://docs.docker.com/machine/)
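e.g. something like this (machine name is arbitrary):

  docker-machine create --driver virtualbox default   # boots a small VM running the Docker daemon
  eval "$(docker-machine env default)"                # point the docker client at it
  docker ps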

Another well-written beginner tutorial (but with a few hardcoded urls): http://stackengine.com/docker-101-01-docker-development-envi...


awesome - very well written m8


I still prefer using automation like Ansible or SaltStack over Docker. You can't run it on the very popular OpenVZ VPSes, and it's another useless layer of abstraction with security holes in it.




