Interesting. I've been working on something related: a non-intrusive Docker 'extension' called kliko. Kliko is a specification that formalizes file-based (no network yet) input/output flow for containers. It makes it possible to have a generic API for containers, so you can automatically create user interfaces for an application or chain applications together in a pipeline.
There is documentation available on the website, but I'm not sure if I explain the concept properly there.
The assumption is that you have containers that operate on input files and generate output files. The behavior of the container depends on the given parameters, which are defined in the kliko file. The user (or runner) supplies these parameters at runtime.
To illustrate: we use it for creating pipelines in radio astronomy, where we operate on datasets of gigabytes or more. Most of these tools are file based: they read files in and write files out. It is all quite complex and old software, so Docker is ideal for encapsulating that complexity. A scientist can easily recombine the various containers and play with the arguments. Because input and output are split, the container effectively becomes 'functional': no side effects, and results can be cached when the parameters are the same. The intermediate temporary volumes can be memory based to speed things up. We use stdout for logging.
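To make that flow concrete, here is a minimal sketch of how a runner could invoke such a container - this is not the actual kliko API; the image name, mount points and parameter file name are just illustrative assumptions:

    # Hypothetical runner for a kliko-style container: parameters go in as a JSON
    # file, data flows through mounted /input and /output directories. All names
    # here are placeholders, not the real kliko spec.
    import json
    import subprocess
    import tempfile

    def run_step(image, parameters, input_dir, output_dir):
        with tempfile.TemporaryDirectory() as param_dir:
            # Write the user-supplied parameters where the container can read them.
            with open(f"{param_dir}/parameters.json", "w") as f:
                json.dump(parameters, f)
            subprocess.run(
                ["docker", "run", "--rm",
                 "-v", f"{param_dir}:/param:ro",
                 "-v", f"{input_dir}:/input:ro",
                 "-v", f"{output_dir}:/output",
                 image],
                check=True,
            )

    # Because a step depends only on (image, parameters, input), its output can be
    # cached, and steps chain by feeding one step's output directory to the next.
    run_step("example/imager", {"niter": 1000}, "/data/obs1", "/data/images")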
Pachyderm looks quite cool, but I think it's lacking a quick-start and a way of running things locally in a simple way.
I grabbed the repo, clicked through a few links in the docs, and hit a 404. I searched on Google and found a link to a simple way of running it locally, but that doesn't work with the new version. Then I followed the instructions and hit a problem installing something to do with Kubernetes (k8s) around mapped paths, and the fix printed in the console doesn't work.
I understand that this is a personal complaint, and others might not care at all about having it set up locally because it solves the big problems so well, but I just want to try it locally at least.
I appreciate the basic idea, but I think the identification of containers with "function calls" seems more confusing than helpful. It's strange to think of, say, a running Redis instance as a "function call."
I think this article nails some of the basic issues people have with Docker. That is, your app is usually not entirely self-contained: one generally needs a db, a cache db, some networking, something else, and a dash of CI.
To run a containerized app ecosystem, you either type a load of crap into the `docker run` statement or you annotate the heck out of the compose file/Dockerfile. Either way, you pray it works... I know that I have to wade into logs and `docker exec` into a shell because of some kind of silliness a lot more than I would like.
The article is correctly, imo, pointing out that there should be a better way to wire up basic containerized ecosystems.
Note that my only complaint is terminological, because I think "function call" signifies something completely different from "launching a long-lasting service".
Yes. If starting a Docker instance is a "function call", then what is the analog for making an HTTP request to a Docker-hosted server? The analogy was fine for describing the idea, and perhaps it even holds if the Docker image is a processing step instead of a service.
But if the Docker image is a service... We have a set of terminology (and technology!) already for services -- let's use those, rather than continue to force the analogy all the way down.
Noted. I think the intent is there: treat/use/consume containers like parts of your app. I.e., call the DB container, do work, release the DB container, and so on. Not a completely separate "thing" that is a PITA to debug, prone to errors, and horrifically documented.
Why can't my app's ORM function call the DB container and do stuff? Why do I have to wire them up twice (code and OS layer) or more?
Wouldn't a better mental model be an object-as-a-process ala Smalltalk or Erlang?
I personally think the object analogy is far more useful. Rather than reinventing your own container composition system, you could simply adopt a dependency injection approach. It would be nice to see container-objects publish interfaces in a language/transport-independent way that layers interfaces, endpoints and transport mechanisms.
I agree, the actor model is a significantly more usable metaphor for containers than functions. When you start thinking about supervisor trees, you start heading towards Kubernetes, which is interesting.
I think we're getting hung up on 'function' here: the article is really about the benefits of type systems (either runtime or compile time) as applied to the argument and result types of functions. The actor implementations I've used (mostly just Akka) lack strong types, but I think treating containerized apps as actors that can only send or receive specific message types would fix the problems the article brings up.
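Roughly what I have in mind, as a rough Python sketch (the message types and the cache actor are invented purely for illustration):

    # A containerized service modelled as an actor that only accepts a fixed set of
    # message types; a runtime isinstance check stands in for what a typed mailbox
    # would enforce at compile time.
    from dataclasses import dataclass
    from typing import Union

    @dataclass
    class Get:
        key: str

    @dataclass
    class Put:
        key: str
        value: bytes

    CacheMessage = Union[Get, Put]   # the only messages the "cache container" accepts

    class CacheActor:
        def __init__(self):
            self.store = {}

        def receive(self, msg: CacheMessage):
            if isinstance(msg, Get):
                return self.store.get(msg.key)
            if isinstance(msg, Put):
                self.store[msg.key] = msg.value
                return None
            raise TypeError(f"cache actor cannot handle {type(msg).__name__}")

    cache = CacheActor()
    cache.receive(Put("greeting", b"hello"))
    assert cache.receive(Get("greeting")) == b"hello"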
The format the author proposes for a new Dockerfile is semantically similar to how Juju charms describe services.
name: vanilla
summary: Vanilla is an open-source, pluggable, themeable, multi-lingual forum.
maintainer: Your Name <your@email.tld>
description: |
  Vanilla is designed to deploy and grow small communities to scale.
  This charm deploys Vanilla Forums as outlined by the Vanilla Forums installation guide.
tags:
  - social
provides:
  website:
    interface: http
requires:
  database:
    interface: mysql
From having tried a few times on a mac, it's really hard to get started. The local provider is LXC-based, so it can only run on Linux. I'd love it if it ran on docker or virtualbox.
I didn't know Juju existed until now, but I've been working on something kind of similar that is a lot less hassle to get up and running (vagrant up, done). I'm just one guy who doesn't have a hell of a lot of free time, so it will be intended purely for creating local dev environments, at least initially (and will come with a "do not run production services on this or you will die" style warning).
It takes some steps towards resolving some of the issues in the article as well as a number of other headaches I've encountered when trying to bend docker to my will and build a development environment I can use every day for everything without having to mess around with basic plumbing.
I can post a Show HN about it in the days ahead if there's any interest. It's not anywhere near where I want it to be (least of all in terms of code quality), but the amount of time I can devote to coding is about to drop from "near zero" to "really REALLY near zero" for a while, and it's very usable and handy, so it might be worth just tossing it out there as-is and coming back to it at a later date.
I read this as a specific proposal for a future Docker, or some overlay description format for it. Also, the author is talking about runtime checks, while nix/guix provides an install/configuration description.
Nix/guix solves some cool things, but doesn't do isolation (as in process isolation). It can do it (namespaces live in the kernel after all), but it's not a first-class thing.
That's cool! I see there are a lot of fun things in there already. But it also seems like something the author didn't want: NixOS containers are explicitly whole systems according to the documentation ("This command will return as soon as the container has booted and has reached multi-user.target.") rather than a single app.
'guix environment --container' allows one or more applications to be put into a container and you can optionally share files/directories from the host system with it. The distro built on Guix, GuixSD, has preliminary support for full-system containers using the same code that 'guix environment' uses. So you can have it whichever way you'd like.
It's init + your app, which is exactly what you want. PID 1 has some important responsibilities that aren't handled anywhere else. I'd argue that most init systems go far beyond what you actually want PID 1 to do, but that is a different discussion.
I definitely don't want any init in the container. I want the app running in there and nothing else. For a single-application system, pid 1 has only one responsibility: collect zombies. I treat that as an emergency solution - apps should clean up their children themselves under normal circumstances.
Everything else can be handled outside of the container.
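For what it's worth, that one emergency responsibility is tiny - a rough sketch, not a real init (tini or dumb-init also forward signals, and "my-app" is a placeholder):

    # Minimal pid-1 sketch: start the app, then sit in wait() collecting the app
    # and any orphaned children that get reparented to pid 1.
    import os
    import sys

    app_pid = os.fork()
    if app_pid == 0:
        os.execvp("my-app", ["my-app"])    # placeholder for the actual application

    while True:
        pid, status = os.wait()            # reaps the app and any reparented orphans
        if pid == app_pid:
            # pass the app's exit code through (ignoring death-by-signal for brevity)
            sys.exit(os.WEXITSTATUS(status))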
Common Workflow Language http://commonwl.org is a spec (with multiple implementations) for wrapping command line tools (which may run inside a Docker container) as functional units.
The idea is a generalization of try/catch in which all effects are accomplished through handler blocks that receive a continuation. Just like when you make a kernel call, your program is a continuation given to the kernel. Usually the kernel calls you back once, but some syscalls return twice (fork) or not at all (abort) by manipulating processes.
Coupled with reusable handler block bundles, you can wrap a "container" around any expression in your entire program.
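A toy sketch of that shape in Python, with a generator standing in for the continuation (the effect names and the sandbox handler are made up for illustration):

    # The computation yields effect requests instead of performing them; the
    # surrounding handler decides how each request is satisfied and then resumes
    # the computation - effectively wrapping a "container" around the expression.
    def program():
        text = yield ("read", "config.txt")      # ask the handler to read a file
        yield ("log", f"config was {len(text)} bytes")
        return text.upper()

    def run_sandboxed(program_fn, files):
        # Reads come from an in-memory dict, logs go to a list: nothing touches
        # the real filesystem.
        logs = []
        gen = program_fn()
        try:
            request = next(gen)
            while True:
                op, arg = request
                if op == "read":
                    request = gen.send(files[arg])   # resume the continuation with a value
                elif op == "log":
                    logs.append(arg)
                    request = gen.send(None)
                else:
                    raise ValueError(f"unhandled effect {op!r}")
        except StopIteration as stop:
            return stop.value, logs

    result, logs = run_sandboxed(program, {"config.txt": "debug=true"})
    assert result == "DEBUG=TRUE"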
I usually think of containers as an expanded chroot jail.
They even serve a similar purpose: During the 32 bit to 64 bit switchover, I'd occasionally install 32-bit libs into a chroot jail to more easily compile software that depended on being 32-bit, without wanting to pollute the global space with 32-bit packages.
Could you point me to a succinct description of exactly the semantics of containers as an expanded chroot jail? I've been looking, but have so far not found anything.
The term "container" (on Linux) refers to the high-level combination of two concrete low-level interfaces: namespaces and control groups (cgroups).
I like to think of namespaces as converting certain global variables in the kernel to local variables for each process. These local variables are then inherited by child processes. chroots are the simplest example, although they predate namespaces. Ordinarily you'd think of a system as having a single root directory; somewhere in the kernel is a global variable DIR root. But in fact, each process has its own root directory pointer in the process structure. Most of those pointers have the same value in every process, but if you run chroot, you change that pointer for the current process and all its children.
The list of possible namespaces is in the clone(2) and unshare(2) manpages (`man 2 clone` and `man 2 unshare`); look for the options starting with CLONE_NEW. They all change some pointer in the process structure, either to a substructure of the original pointer (like chroot does), a deep copy of the structure, or to a new, empty structure.
CLONE_NEWIPC changes the pointer for routing System V IPCs. CLONE_NEWNET changes the pointer for the list of network devices to a new structure with just a new loopback interface. CLONE_NEWNS copies the mount table instead of keeping a pointer to it, so your process can unmount filesystems without affecting the rest of the system, or vice versa. CLONE_NEWPID changes the pointer to pid 1 / the process ID table to point to yourself (effectively a chroot for process IDs). CLONE_NEWUSER changes the interpretation of user IDs, so you can have UID 0 in your process be a non-zero UID in the outside system. CLONE_NEWUTS creates a copy of the structure containing the machine's hostname, instead of keeping a pointer to the global structure.
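If you want to poke at this directly, here is a tiny example - it calls unshare(2) through ctypes with the CLONE_NEWUTS value from <linux/sched.h>, and needs root (or an unprivileged user namespace) to succeed:

    # Give this process its own copy of the hostname structure; changing it here
    # does not affect the rest of the system.
    import ctypes
    import socket

    CLONE_NEWUTS = 0x04000000          # from <linux/sched.h>

    libc = ctypes.CDLL("libc.so.6", use_errno=True)
    if libc.unshare(CLONE_NEWUTS) != 0:
        raise OSError(ctypes.get_errno(), "unshare failed (try running as root)")

    socket.sethostname("inside-the-namespace")
    print("hostname in this namespace:", socket.gethostname())
    # In another terminal, `hostname` still prints the original value.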
cgroups are resource control. They let you say that a certain process tree has some maximum amount of RAM, or CPU shares, or so forth. This is useful for making containers perform the way you want, but doesn't really affect their semantics.
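For completeness, a rough sketch of the cgroup side, assuming the cgroup v2 layout mounted at /sys/fs/cgroup, the memory controller enabled for child groups, and root privileges:

    # Create a cgroup, cap its memory, then move this process into it. Namespaces
    # change what a process can see; this only changes how much it may use.
    import os
    import pathlib

    cg = pathlib.Path("/sys/fs/cgroup/demo")
    cg.mkdir(exist_ok=True)
    (cg / "memory.max").write_text("268435456")          # 256 MiB ceiling
    (cg / "cgroup.procs").write_text(str(os.getpid()))   # put ourselves in the group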
I'd guess that it's just a matter of how you view it - putting a minimum number of CPU shares on the root cgroup is the same as putting a maximum on the rest of the cgroups, right? But maybe one or the other documentation is wrong.
Both manpage and documentation are probably right, though I'd expect CPU throttling to be significant enough at least to be mentioned.
I'm not so sure about your guess, because that apparently works only when CPU is fully utilized. I'm not so sure about my reading comprehension either.
How do the copies of the variables that become local relate to the originals? What happens at read/write, what happens if the host OS changes them? In other words, what is the semantics of sharing/copying these variables?
For a chroot, you get a pointer to a subdirectory of the host root directory. Changes within that directory are visible in both directions. CLONE_NEWPID and CLONE_NEWUSER work similarly; every process has a PID and a UID outside of the container (that is, in the root namespace), but a subset of PIDs and UIDs are visible in the container, with their own values. Creating a process in a PID namespace gives it a PID counting from 1 in that namespace, as well as an ordinary PID in the parent namespace. A user account in a user namespace has a value (which could be 0) in the namespace, as well as a mapped value in the parent namespace.
CLONE_NEWIPC and CLONE_NEWNET create new, empty structures for the IPC namespace and network stack. Changes in one namespace aren't visible to another. You can move network devices between namespaces by using `ip link set dev eth1 netns 1234`, which will move eth1 out of the current process's network namespace and into process 1234's namespace. (This is occasionally useful with physical devices, but more useful with virtual devices like veth and macvlan.)
CLONE_NEWNS and CLONE_NEWUTS create a deep copy of the current namespace's mount table and hostname/domainname strings, respectively. Further changes in one namespace do not affect the other.
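An easy way to see the PID behaviour for yourself is util-linux's unshare(1) (run as root); a small wrapper, just for illustration:

    # Inside a fresh PID namespace the forked shell believes it is pid 1, while
    # `ps` on the host still lists it under an ordinary PID.
    import subprocess

    subprocess.run(
        ["unshare", "--fork", "--pid", "--mount-proc",
         "sh", "-c", "echo inside, my pid is $$; ps -o pid,comm"],
        check=True,
    )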
Has that been written up somewhere in a suitably abstract form?
Is there a list of variables that are affected, and how they are affected? In particular, I wonder about network interfaces. Say your hardware has a network interface that's got MAC address 0b:21:b5:e2:11:22 and IPv4 address 123.234.34.45, how are these addresses affected by cloning?
You get a completely separate network stack. None of the network devices are copied/cloned. You can move a network device into the container, but it's no longer accessible in the host.
Since most people don't have spare physical devices, there are a couple of approaches using virtual devices. You can create a "veth" device pair, which is basically a two-ended loopback connection. Move one end of the veth into the container, configure them as 192.168.1.1 and 2 (or whatever), and set up NAT. Or you can create a "macvlan" device, which hangs off an existing device and generates a new MAC address. Any traffic destined for the macvlan's MAC address goes to the macvlan device; any other traffic goes to the parent device. So I can move the macvlan into the container and assign it the address of 123.234.34.46, and it will ARP for that address using its own MAC address.
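If you want the veth recipe spelled out, it's roughly the following (run as root; the names and addresses are just examples, and you'd still add NAT/forwarding rules on top):

    # Create a veth pair, push one end into a named network namespace, address both
    # ends, and check connectivity across the pair.
    import subprocess

    def sh(*cmd):
        subprocess.run(cmd, check=True)

    sh("ip", "netns", "add", "demo")
    sh("ip", "link", "add", "veth-host", "type", "veth", "peer", "name", "veth-demo")
    sh("ip", "link", "set", "veth-demo", "netns", "demo")
    sh("ip", "addr", "add", "192.168.1.1/24", "dev", "veth-host")
    sh("ip", "link", "set", "veth-host", "up")
    sh("ip", "netns", "exec", "demo", "ip", "addr", "add", "192.168.1.2/24", "dev", "veth-demo")
    sh("ip", "netns", "exec", "demo", "ip", "link", "set", "veth-demo", "up")
    sh("ip", "netns", "exec", "demo", "ping", "-c", "1", "192.168.1.1")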
The container also has its own routing table, iptables (firewall) rule set, etc. And anyone listening on a broadcast interface in the host won't get packets destined to the container, or vice versa. It's basically like a virtual machine.
Thanks for the description. I guess container networking like Weave works by 'hijacking' veth and executing NAT-like address translation.
> like a virtual machine.
That's surprising. Containers were advertised to me as being much more lightweight than conventional virtualisation (e.g. VMware, Xen), because the former, unlike the latter, share the ambient operating system.
> completely separate network stack
What does that mean exactly? Does the container copy the actual code, or is the network stack's code shared, just run in a separate address space?
We use a swagger file as the "function signature" for each dockerized module, and yes, the function analogy is quite appropriate if you modularize them well - a sloppy component is like a subroutine grabbing at global variables all over the place, even if it's in a Docker VM, but a nice component does work like a function that just happens to sit somewhere on the other side of a network.
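To give a feel for it, here's the kind of signature such a file declares, sketched as a Python dict in Swagger 2.0 style (the endpoint and fields are invented, not our actual spec):

    # A dockerized "thumbnailer" module exposes exactly one declared operation;
    # everything it consumes and produces is spelled out in the interface.
    signature = {
        "swagger": "2.0",
        "info": {"title": "thumbnailer", "version": "1.0"},
        "paths": {
            "/thumbnail": {
                "post": {
                    "consumes": ["image/png"],
                    "produces": ["image/png"],
                    "parameters": [
                        {"name": "width", "in": "query", "type": "integer", "required": True}
                    ],
                    "responses": {"200": {"description": "the resized image"}},
                }
            }
        },
    }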
I've got a feeling that what the author describes already exists, just not at the same level. And those solutions all have one problem in common - not everybody wants to use them.
Specifically: puppet, chef, salt, juju, heat and others will happily deploy your application (container or not) and provide you with a generic interface for it. Docker is just one of the tiny building blocks here. They'll either check the model before doing anything or will check if it works by trying. Types or not, the system for implementing those interfaces exists.
But every month there's another configuration / orchestration / deployment system coming out. Some will use it, but it will only worsen the fragmentation. I feel like it would either have to be built into Docker and enforced at that level, or it would be yet another system that 99% of operators don't use.
In my current project, I have my own mini-infrastructure, where each machine has a "watchdog" process that picks up items from a global queue and calls `docker run` with the item it picked up. Docker knows which program to start via ENTRYPOINT, and arguments to `docker run` are passed to this program. Each docker image gets its own queue and autoscaling instance group.
It works well for data processing tasks - my docker images are crawlers, indexers or analytics code in Python or R. Deployment is quite simple - just push the docker image, and it will be picked up on the next docker run. Images can add items to global queues, and for any bigger data they write things to a shared database.
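The watchdog itself is conceptually just this loop (sketched with a placeholder queue client; the real one talks to a cloud queue and the autoscaler watches queue depth):

    # Pull work items from a queue and hand each one to `docker run`; ENTRYPOINT in
    # the image decides which program runs, the item becomes its argument.
    import subprocess
    import time

    def watchdog(queue, image):
        while True:
            item = queue.pop()          # placeholder client: next work item, or None
            if item is None:
                time.sleep(5)           # queue empty, poll again later
                continue
            subprocess.run(["docker", "run", "--rm", image, item], check=True)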
Where each container is described by what types of things it takes as inputs and what it outputs -- i.e., as functions.
In practice we've built hundreds of these and composed them in endless variety for a large range of genomics tasks, processing petabytes of data at Seven Bridges Genomics.
At a low level, isn't this about how you deal with foreign functions? You have the same issues of typing, validation and concurrency that you do with something like ctypes in Python.
Yeah, seems like the question of data marshaling except in a more distributed and fault-tolerant sense. Maybe I'm misunderstanding this whole discussion?
Most examples seem to be concerned with I/O boundary crossing ("why ports, why volumes, why envs"). Is that the reason a type system is mentioned - to provide safety, e.g. a firewall-like DSL, through documentation?
More emphasis could be put on at least two other values: discoverability and compatibility.
This is how I've been describing Docker to people and also how I've been using it. It's why, when I need a specific version of Elasticsearch or Postgres, I just run the tagged service.
It turns any binary into a more portable executable.
Oops, I meant tagged images. If you look at the common images for like postgres, redis, and elasticsearch, they tag the images with the version of the service that's installed.
> large programs written in assembler in the 1960s included exactly this sort of documentation convention: huge front-matter comments in English prose.
That is the current state of the container ecosystem. We are at the “late ’60s assembly language” stage of orchestration development. It would be a huge technological leap forward to be able to communicate our intent structurally.
Upon reading this, my first thought is: Please don't say XML!
https://github.com/gijzelaerr/kliko
Here is an example kliko file for a container defining the input and output:
https://github.com/kernsuite-docker/lwimager/blob/master/kli...