In a classic "apps run on a known instance" model, when the instance starts having performance issues, I can ssh to it and use the usual tools (iostat, top, atop, netstat, etc.). With Docker, how do you correlate the instance with its containers, and do perf analysis?
Install the tools you need via the Dockerfile (iostat, top, atop, etc.), and an sshd. Then, instead of running the single web process, run supervisord via CMD, which will subsequently launch both your web app process, AND sshd. From there, EXPOSE 80 22, and you can SSH into the container to run any perf analysis tools as usual.
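A minimal sketch of that approach, assuming a Debian/Ubuntu base image (package names, paths, and the app binary are illustrative, not from the comment above):

    # Dockerfile sketch: ships perf tools, sshd, and supervisord alongside the app
    FROM ubuntu:14.04
    RUN apt-get update && apt-get install -y \
            sysstat procps atop net-tools \
            openssh-server supervisor
    RUN mkdir -p /var/run/sshd
    COPY supervisord.conf /etc/supervisor/conf.d/app.conf
    EXPOSE 80 22
    CMD ["/usr/bin/supervisord", "-n"]

The supervisord.conf would then hold one [program:web] entry for your app process and one [program:sshd] entry running /usr/sbin/sshd -D.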
Couldn't you just connect to the master host? I'm not sure how namespacing is done inside the kernel, but I would assume that the host has access to the containers' info.
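If so, something along these lines from the host should work (the container name is hypothetical):

    docker stats web-1           # live CPU / memory / network / IO for the container
    docker top web-1             # the container's processes, as seen from the host
    docker exec -it web-1 top    # run a tool inside the container's namespaces

    # Or grab the container's init PID and point host tools at it directly:
    PID=$(docker inspect --format '{{.State.Pid}}' web-1)
    sudo nsenter --target "$PID" --net --pid --mount netstat -tlnp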
It seems like there is a high level of trust between micro-services, but it's not clear what the basis of that trust is. For example, is any service allowed full permissions on any other service? Is there authentication and authorization in the system?
The other Docker setups I've seen just go by port/host. This probably isn't enough, in many cases. Certainly you don't want to do that on a shared host.
I meant to add that, but essentially we map /var/log in the container to /var/log on the host, and then use rsyslogd to push that into a centralised logstash.
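Roughly, the moving parts look like this (exact paths and the Logstash endpoint are assumptions on my part):

    # Bind-mount the container's log directory onto the host
    docker run -d -v /var/log/myapp:/var/log/myapp my-web-image

    # Then a stock rsyslog rule on the host tails those files and forwards them,
    # e.g. something like this in /etc/rsyslog.d/30-myapp.conf:
    #   module(load="imfile")
    #   input(type="imfile" File="/var/log/myapp/*.log" Tag="myapp:")
    #   *.* @@logstash.internal:5514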
> Do you have to take pains to not accidentally log user and secure information to a third party when you use Papertrail?
I only use Papertrail for personal projects that don't have any real security requirements. $7/month is a lot less hassle than the time it takes to set up Logstash+Kibana+ES.
However, for anything with security requirements I'd run Logstash+Kibana+ES over a VPN.
Typically, you want to supply the container with a mount point that is outside the container. This way, if the container is replaced, your data isn't impacted.
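For example, with a hypothetical Postgres container (the image and host path are illustrative):

    # Keep the data directory on the host, outside the container's filesystem
    docker run -d --name db \
        -v /srv/pgdata:/var/lib/postgresql/data \
        postgres:9.4

    # Replacing the container later leaves /srv/pgdata untouched
    docker rm -f db
    docker run -d --name db \
        -v /srv/pgdata:/var/lib/postgresql/data \
        postgres:9.4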
The traditional high-availability method is to run the database servers in pairs, and redeploy using the failover-failback method. You have DB servers A and B, with A as the primary and B mirroring A.
1. Promote B to primary and switch the clients over so that they write to B.
2. Redeploy A, and wait for A's replication to catch up to B.
3. Promote A back to primary and switch the client writes back to A.
4. Redeploy B and wait for B's replication to catch up to A.
5. Have a drink.
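To make the steps concrete, here's a rough sketch assuming PostgreSQL streaming replication; the hostnames are made up, and how clients discover the primary (HAProxy, DNS, a config push) varies a lot between shops:

    # 1. Promote B and point clients at it
    ssh db-b 'pg_ctl promote -D /var/lib/postgresql/data'
    # ...flip whatever the clients use to find the primary

    # 2. Redeploy A as a standby of B, then watch the replication lag drop
    ssh db-b "psql -c 'SELECT * FROM pg_stat_replication;'"

    # 3-4. Promote A back, flip clients again, redeploy B as a standby of A
    # 5. Drink.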
Responsible ops practice is to follow this procedure on every deploy, because the failover process has presumably been designed, engineered, rehearsed, and tested in production – as it has to be, because it might happen at any moment during an emergency – whereas the redeployment you're about to do has never been tried in production before and you can never be certain that it isn't going to take down your database server processes for a millisecond or an hour.
Docker doesn't really help or harm this process, though it does subtly encourage it, because the adoption of Docker and the adoption of an immutable-build philosophy often go hand in hand.
If you don't have firm confidence in your database failover procedure, you don't want to host your database in a Docker container.
Thanks for the updated post. Could you give us a little more information / git gist ;) on how you achieved this:
> Docker registry doesn’t inherently support the concept of versioning, so we have to manually add it using the Jenkins version numbers.
> shell scripting to ensure that only 3 were kept in place on each deployment,
Did this involve appending the version numbers to the image tag? We have a pretty similar setup, and something you might want to look at is the registry being a SPOF: if it is down, then none of the new nodes created by the ELB can be provisioned. Create an AWS autoscaling group of 1 and assign it an Elastic IP, so that if it goes down the kind bots at Amazon will bring another up for you. (This will require some cloud-init scripting.)
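On the tagging question, I'd guess it's something along these lines (BUILD_NUMBER is Jenkins' built-in variable; the registry host and repo name are made up):

    # Tag and push the image using the Jenkins build number
    docker build -t registry.internal:5000/myapp:${BUILD_NUMBER} .
    docker push registry.internal:5000/myapp:${BUILD_NUMBER}

    # Keep only the three most recent local tags, prune the rest
    docker images registry.internal:5000/myapp --format '{{.Tag}}' \
        | sort -rn | tail -n +4 \
        | xargs -r -I{} docker rmi registry.internal:5000/myapp:{}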
> Every time we checkin to GitHub, Jenkins is called via a post commit hook and builds, unit tests, and integration tests our code as is typical in a continuous integration setup.
Are the tests run against the Docker image that is going to be pushed on a passing build?
> As images are pushed into the Docker registry they are versioned using the Jenkins build number.
Because you want your version numbers to make semantic sense. Your entire team intuitively understands that version 4137 is more recent than version 4134, that version 3527 is in the distant past, and that going from version 4138 to version 4137 is a sensible rollback, whereas going from version 4138 to version 4136 is either a mistake or a response to a major failure of QA.
Similarly, resist the urge to name servers generically. "There's something wrong with web-347!" is a sentence that you can shout across a crowded ops war room, whereas "web-129.22.8.44" or "web-a781bc23" or "instance i347bd944" are much harder to pronounce and much easier to typo.
Again, not sure how Contino do it, but if you're using autoscaling groups the best way to do this is to pass a bootstrap script (shell or Ansible) into the group when creating it, so that it will pull down the correct images if more instances are needed. To give yourself some control, you should probably do what we do: pass in a simple wget which pulls an Ansible playbook stored in S3, so we can change version numbers etc. without taking down the whole group. I've found there are many ways to do this, but keeping things simple helps a lot.
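By way of illustration, the user-data we hand the launch configuration looks roughly like this (the bucket, playbook, and paths are made up):

    #!/bin/bash
    # cloud-init user-data for the autoscaling group's launch configuration
    set -e

    # Fetch the current playbook from S3, so version bumps don't need a new launch config
    wget -q https://s3.amazonaws.com/my-deploy-bucket/site.yml -O /tmp/site.yml

    # The playbook pulls the pinned image versions and (re)starts the containers
    ansible-playbook -i 'localhost,' -c local /tmp/site.yml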