Source is a basic understanding of containers and a basic understanding of databases.
a) Docker and other container systems are still very young. They're missing a lot of badly-needed features and the community is still coalescing on best practices and safe approaches. It's certain there are complex and significant bugs. This is triply true if you're using an orchestration framework over the top like Kubernetes. This isn't a reason why DBs are a bad fit in itself, but it's a reason why running production data in Docker is a bad idea at present.
b) Docker is meant for stuff that's portable and can be isolated from its hardware. It's meant to make it easy to run many applications on one machine without the resource overhead of virtual machines. DBs are distinctly not well-suited to this environment. DBs require a good deal of tuning and hardware awareness to run properly. DBs want to run forever and keep frequently-used data in memory; often, restarting a DB server is a big deal because it makes the caches go cold and the server is slow until they warm back up. DBs don't take kindly to sharing a box with 30 other applications. On a VM at least you know you have a dedicated chunk of memory. Not the case on a Docker host. As I discuss in (c), Docker is basically the antithesis of long-running; there are many ways to make your container suddenly disappear or stop.
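You can partially mitigate the resource-sharing problem by pinning a container to explicit cgroup limits. A minimal sketch, assuming a cgroup-v1-era host; the container name, image tag, and sizes are made up for illustration:

```
# By default a container competes for all of the host's RAM and CPU with
# everything else on the box. Explicit limits carve out a slice, though
# this is still a far cry from a VM's dedicated memory.
# (name, image, and sizes below are illustrative only)
docker run -d --name db-example \
  --memory=8g \
  --memory-swap=8g \
  --cpuset-cpus=0-3 \
  postgres:9.5
```

Even with those flags you're still sharing the page cache and disk bandwidth with every other container on the host.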
c) There are many non-obvious gotchas involved in Docker usage; it's a very non-intuitive interface. For example, using `docker attach` to connect to a running container will often place the user in control of the process. A simple ^C, which most sysadmins would interpret as "OK, I'm done with this log", stops the container's process and cleans up your container. That's just one example of a risk among many brought to you by the counter-intuitive and unfriendly Docker UX. Not the kind of fragility we want for something as mission-critical as a database system. Again, Docker is meant for processes that are consequence-free if they're cleaned up. Its UX is obviously designed that way too. You're supposed to run 8 containers from the same image and if one goes away the LB detects it and it's nbd. Databases don't work like that.
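To make the gotcha concrete, here's a rough sketch (the container name `mydb` is made up) of the risky path versus the safer ones:

```
# Risky: `docker attach` wires your terminal straight to the container's
# main process, so ^C sends SIGINT to the database itself.
docker attach mydb                 # ^C here can stop the whole container

# Safer ways to look at a running container:
docker logs --tail=100 -f mydb     # follow output without attaching
docker exec -it mydb /bin/bash     # open a separate shell inside it

# If you do attach, detach with Ctrl-p Ctrl-q instead of ^C.
```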
d) Docker sometimes becomes a zombie and fails to respond to commands. You can't connect to any of your running containers. You can't issue start or stop commands. You can't get output from `docker ps`. This has happened to me on multiple occasions. Do we really want a production DB running in that context?
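For what it's worth, when that happens the troubleshooting usually starts with the daemon rather than the containers. A sketch of the usual first checks, assuming a systemd-based host:

```
# The client just hangs when the daemon is wedged, so bound the wait
timeout 10 docker ps || echo "docker daemon is not responding"

# Check the daemon itself, not the containers
systemctl status docker
journalctl -u docker --since "1 hour ago"

# Often the only fix is restarting the daemon, which in this era of
# Docker typically takes every supervised container down with it
systemctl restart docker
```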
e) You mentioned it, but storage. Even persistent volumes can be a PITA to configure properly with Docker, and if you fail to do so, you are looking down a black hole backed by the slow AUFS virtual filesystem. By default, all data written to a Docker container goes into its own differenced image, and when you stop the container, it "goes away" (though it can usually be recovered by calling up the specific container ID instead of restarting from the image). These AUFS volumes have a habit of consuming a lot of disk space on the host, which sometimes causes point (d) to occur if you hit zero free space. Volumes cannot be mapped at build time and must be defined at runtime. There are many bugs with data volumes, including data loss bugs and filesystem feature bugs.
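Getting the data directory out of the copy-on-write layer is the one part that's reasonably well-trodden. A minimal sketch; the paths, container names, and image tag are illustrative, and the two commands are alternatives, not meant to run together:

```
# Option 1: bind-mount a host directory so the data lives on the host
# filesystem instead of in the container's AUFS layer
docker run -d --name pg-bind \
  -v /srv/pgdata:/var/lib/postgresql/data \
  postgres:9.5

# Option 2: a named volume, managed by Docker but stored outside the
# container's differenced image (created automatically if it doesn't exist)
docker run -d --name pg-vol \
  -v pgdata:/var/lib/postgresql/data \
  postgres:9.5
```

Option 1 at least keeps the data somewhere your normal host-level backup tooling can see it.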
The long and short of it is that Docker is oriented toward applications that aren't close to the metal and can have their log and write output easily redirected to more durable systems. This fits the bill for many web apps (the application part, not the database parts). It absolutely does not fit the bill for long-running, close-to-the-metal, mandatory-firm-and-reliable-storage systems like databases.
I use Docker to run local development databases (as well as applications). I wouldn't use it for any DB more important than that. Docker is a cool set of abstractions around jails, but it is not a universal solution, just as cloud isn't.
Yeah, I understand that the concept of process isolation has existed for a long time. We used to know it as "chroot jails". cgroups are obviously a feature that lay dormant in the Linux kernel for a long time, until Docker came along and convinced everyone they had to use them (which isn't really a point in favor of containerization either: "we depend on this lightly tested feature of the kernel!"). Those things on their own are substantially different from what are today known as containers: a mix of concepts that involves custom daemons, complex network routing layers, bolt-on orchestration components like Swarm/Kube, and all sorts of other voodoo that makes today's understanding of "containers" distinct from historical uses of either the term or the concepts implemented via jails/zones/whatever.
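Just to illustrate the distinction, the kernel primitives can be driven by hand without any of that machinery. A rough sketch, assuming a pre-populated rootfs at a hypothetical path and the cgroup-v1 layout of that era:

```
# Namespaces + chroot: isolation without any container daemon
# (/srv/rootfs is a hypothetical, already-prepared root filesystem)
unshare --pid --mount --uts --fork chroot /srv/rootfs /bin/sh

# cgroups by hand: cap a process tree at 512 MB via the cgroup-v1
# memory controller
mkdir /sys/fs/cgroup/memory/demo
echo $((512 * 1024 * 1024)) > /sys/fs/cgroup/memory/demo/memory.limit_in_bytes
echo $$ > /sys/fs/cgroup/memory/demo/cgroup.procs
```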
Regarding (d), Docker's decision to have your processes controlled by a daemon that is not init continues to baffle me. On the surface it seems like a really naive way to schedule your cgroups; you could just as easily do it with a wrapper that terminates. Or just swallow that pill and go with systemd already.
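For what that alternative might look like, here's a hedged sketch of letting systemd own the container's lifecycle instead of dockerd's restart machinery. The unit name, volume, mount path, and image are all made up:

```
# Illustrative only: a unit file that makes systemd the supervisor of
# record for the container's lifecycle
cat > /etc/systemd/system/mydb.service <<'EOF'
[Unit]
Description=Example database container (illustrative)
After=docker.service
Requires=docker.service

[Service]
# --rm avoids piling up stopped containers; systemd decides on restarts
ExecStartPre=-/usr/bin/docker rm -f mydb
ExecStart=/usr/bin/docker run --rm --name mydb -v dbdata:/var/lib/db example/db-image
ExecStop=/usr/bin/docker stop mydb
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable mydb.service
systemctl start mydb.service
```

Even then, the container process is still a child of dockerd rather than of systemd, which is rather the point of the complaint.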
I haven't seen any other container-like system on Linux with that design. And I haven't seen it go zombie on me, but I have had it die on me for reasons I have never fully understood. There are bugs in there, just like with any young project, of course. It's possible to recover from that state with an intimate understanding of how Docker works, but it wreaks havoc on higher-level tools.
a) This is not true. There are mature, stable and, unlike the Linux attempts, actually secure "containers". See Jails and Zones.
b) See a.
c) This is a Docker issue; other platforms do not suffer these UX flaws.
d) See a.
e) See ZFS on FreeBSD and Illumos.
You can of course continue to use ill-designed software, such as Docker, and wait for someone to "make it right", or you can use alternatives that are designed well and work now.
> This is not true. There are mature, stable and, unlike the Linux attempts, actually secure "containers".
The comment above was about Docker specifically. I challenge you to run Docker on anything other than Linux (in production, that is).
> You can of course continue to use ill-designed software, such as Docker
Yeah, you don't use Docker because it's well designed (or even reliable). You use it because it is the strongest industry-wide attempt at a standard there has been for the past ten years. There is a semi-standardized image format and API.
When the time comes to move all software to The Cloud, and some believe that to be inevitable (which you may not agree with), odds are that Docker is one of the accepted formats your provider understands.
Jails or zones have nothing to do with this. They are an implementation detail. I find it likely that the cloud will still run on Linux ten years from now.
The comment above, which you're referring to, starts with this:
> Source is a basic understanding of containers and a basic understanding of databases.
>
> a) Docker and other container systems are still very young. ...
So it seems to be meant to be about container systems in general, although it then talks about Docker as if it were a representative example (which it isn't).
FWIW I see nothing wrong with using containers (even Docker) for databases. Of course, it depends on what you expect from that - it may for example make automated deployment much easier, etc.
I'm discussing the modern implementations of containers, like Docker, rkt, and LXC. Things like jails have been time-tested, and most of the complaints in the post aren't applicable to them. "Container" has taken on a new meaning, referring to these new-fangled container frameworks. IMO, calling something like a jail a container is now obtuse and incorrect.
My post was primarily referring to Docker, as you can tell from the grandparent that says "Docker and database do not mix", and then the rest of the post body that speaks to Docker-specific issues.