A Container Is a Function Call (twistedmatrix.com)
152 points by ingve on Aug 15, 2016 | 67 comments



Interesting. I've been working on something related, a non-intrusive Docker 'extension', kliko. Kliko is a specification to formalize file-based (no network yet) input/output flow for containers. It makes it possible to have a generic API for containers, so you can automatically create user interfaces for an application or chain them together in a pipeline.

https://github.com/gijzelaerr/kliko

Here is an example kliko file for a container defining the input and output:

https://github.com/kernsuite-docker/lwimager/blob/master/kli...


So one process writes to a flat file, then another process reads the flat file? Wouldn't named pipes be more useful here?

Also, it feels like you are just making a serialization format on top of stdin/stdout.


There is documentation available on the website, but I'm not sure if I explained the concept properly.

The assumption is that you have containers that operate on input files and generate output files. The behavior of the container depends on the given parameters, which are defined in the kliko file. The user (or runner) supplies these parameters at runtime.

To illustrate, we use it for creating pipelines in radio astronomy, where we operate on datasets of gigabytes or bigger. Most of these tools are file-based: they read files in and write files out. It is all quite complex and old software, so Docker is ideal for encapsulating this complexity. A scientist can easily recombine the various containers and play with the arguments. By splitting input and output, the containers effectively become 'functional': no side effects, and the results can be cached if the parameters are the same. The intermediate temporary volumes can be memory-based to speed things up. We use stdout for logging.
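The shape of a two-step pipeline is roughly this (the image names and parameters below are made up for illustration, not real kliko containers):

    docker run --rm -v /data/raw:/input:ro -v /scratch/step1:/output calibrate-step --param gain=0.3
    docker run --rm -v /scratch/step1:/input:ro -v /data/images:/output imaging-step --param size=4096

with the kliko file in each image declaring which parameters and input/output files that step expects.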


If I'm understanding correctly that sounds similar to Pachyderm (http://www.pachyderm.io)


Pachyderm looks quite cool, but I think it's lacking a quick-start and a way of running things locally in a simple way.

I grabbed the repo, clicked through a few links in the docs, and hit a 404. I searched on Google and found a link to a simple way of running it locally, but that doesn't work with the new version. Then I followed the instructions and hit a problem installing something to do with Kubernetes and mapped paths, and the fix printed in the console doesn't work.

I understand that this is a personal complaint, and others might not care at all about having it set up locally because it solves the big problems so well, but I just want to try it locally at least.


This would likely benefit other scientific fields -- bioinformatics for example has the same sort of software tooling.


I appreciate the basic idea, but I think the identification of containers with "function calls" seems more confusing than helpful. It's strange to think of, say, a running Redis instance as a "function call."


I disagree.

I think this article nails some of the basic issues people have with Docker. That is, your app is usually not entirely self-contained. One generally needs to have a db, a cache db, some networking, something else, and a dash of CI.

To run a containerized app ecosystem, you either type a load of crap in the `docker run` statement or you annotate the heck out of the compose file/Dockerfile. Either way, you pray it works... I know that I have to wade into the logs and `docker exec` a `/bin/sh` because of some kind of silliness a lot more often than I would like.
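For a concrete (made-up) example, wiring up even a small app by hand ends up looking something like this, with every port, volume, and environment variable repeated by convention rather than checked by anything (`myorg/myapp` is illustrative):

    docker network create appnet
    docker run -d --name db --net appnet \
        -e POSTGRES_PASSWORD=secret \
        -v pgdata:/var/lib/postgresql/data postgres:9.5
    docker run -d --name cache --net appnet redis:3.2
    docker run -d --name app --net appnet -p 8080:8080 \
        -e DATABASE_URL=postgres://postgres:secret@db/app \
        -e REDIS_URL=redis://cache:6379 \
        myorg/myapp:latest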

The article is, imo, correctly pointing out that there should be a better way to wire up basic containerized ecosystems.


Note that my only complaint is terminological, because I think "function call" signifies something completely different from "launching a long-lasting service".


Yes. If starting a Docker instance is a "function call", then what is the analog for making an HTTP request to a Docker-hosted server? The analogy was fine for describing the idea, and perhaps it even holds if the Docker image is a processing step instead of a service.

But if the Docker image is a service... We have a set of terminology (and technology!) already for services -- let's use those, rather than continue to force the analogy all the way down.


Noted. I think the intent is there... treat/use/consume containers like parts of your app. I.e. call the DB container, do work, release the DB container, and so on. Not a completely separate "thing" that is a PITA to debug, prone to errors, and horrifically documented.

Why can't my app's ORM function call the DB container and do stuff? Why do I have to wire them up twice (code and OS layer) or more?


After a `function call` returns its result, it no longer exists.


Until the function is called again, no?


Sure. However, you can't (reasonably) record your customer information in a function, and then retrieve it using the same function.

Maybe containers are like objects. Objects have state and multiple methods.


Wouldn't a better mental model be an object-as-a-process a la Smalltalk or Erlang?

I personally think the object analogy is far more useful. Rather than reinventing your own container composition system, you could simply adopt a dependency injection approach. It would be nice to see container-objects publish interfaces in a language/transport-independent way that layers interfaces, endpoints, and transport mechanisms.


I agree, the actor model is a significantly more usable metaphor for containers than functions. When you start thinking about supervisor trees, you start heading towards Kubernetes, which is interesting.


I think we're getting hung up on 'function' here: the article is really about the benefits of type systems (either runtime or compile time) as applied to the argument and result types of functions. The actor implementations I've used (mostly just Akka) lack strong types, but I think treating containerized apps as actors that can only send or receive specific message types would fix the problems the article brings up.


Wasn't this the idea behind Java "Servlets" back in the day? Also, I'd say that it was process-as-an-object in Smalltalk.


The format the author proposes for a new Dockerfile is semantically similar to how Juju charms describe services.

    name: vanilla
    summary: Vanilla is an open-source, pluggable, themeable, multi-lingual forum.
    maintainer: Your Name <your@email.tld>
    description: |
      Vanilla is designed to deploy and grow small communities to scale.
      This charm deploys Vanilla Forums as outlined by the Vanilla Forums installation guide.
    tags:
      - social
    provides:
      website:
        interface: http
    requires:
      database:
        interface: mysql
https://jujucharms.com


    $ juju deploy haproxy
    $ juju deploy mediawiki
    $ juju add-relation haproxy mediawiki
    $ juju deploy mariadb
    $ juju add-relation mediawiki mariadb
    $ juju deploy memcached
    $ juju add-relation mediawiki memcached
    $ juju expose haproxy

That's really cool! Why isn't this talked about more? Bias against Canonical?


From having tried a few times on a Mac, it's really hard to get started. The local provider is LXC-based, so it can only run on Linux. I'd love it if it ran on Docker or VirtualBox.


I didn't know Juju existed until now, but I've been working on something kind of similar that is a lot less hassle to get up and running (`vagrant up`, done). I'm just one guy without a hell of a lot of free time, though, so at least initially it will be intended purely for creating local dev environments (and will come with a "do not run production services on this or you will die" style warning).

It takes some steps towards resolving some of the issues in the article as well as a number of other headaches I've encountered when trying to bend docker to my will and build a development environment I can use every day for everything without having to mess around with basic plumbing.

I can post a Show HN about it in the days ahead if there's any interest. It's not anywhere near where I want it to be (least of all in terms of code quality), but the amount of time I can devote to coding is about to drop from "near zero" to "really REALLY near zero" for a while, and it's very usable and handy, so it might be worth just tossing it out there as-is and coming back to it at a later date.


Wonder why he never mentions Nix/NixOps or Guix.

If we talk about purely functional configuration and runtime changes, those seem like the projects to focus on.


I read this as a specific proposal for future Docker, or some overlay description format for it. Also, the author is talking about runtime checks, while Nix/Guix provide install/configuration descriptions.

Nix/Guix solve some cool things, but don't do isolation (as in process isolation). They can do it (namespaces live in the kernel, after all), but it's not a first-class thing.


We can start Nix expressions in isolated containers with a single command! https://nixos.org/releases/nixos/14.12/nixos-14.12.374.61adf...


That's cool! I see there are a lot of fun things already in there. But it also seems like something the author didn't want. NixOS containers are explicitly whole-system according to the documentation ("This command will return as soon as the container has booted and has reached multi-user.target.") rather than a single app.


Doesn't Nix have nix-shell, which is like this?


That's good stuff! Thanks for sharing that.


    guix environment --container


Same as the response to arianvanp. Looks good. But isn't this mostly about a whole system in a container rather than one app (binary + deps)?


'guix environment --container' allows one or more applications to be put into a container and you can optionally share files/directories from the host system with it. The distro built on Guix, GuixSD, has preliminary support for full-system containers using the same code that 'guix environment' uses. So you can have it whichever way you'd like.

https://www.gnu.org/software/guix/manual/html_node/Invoking-...


What I meant is - what's the top level of the container? The target app, or an init system?


It's init + your app, which is exactly what you want. PID 1 has some important responsibilities that aren't handled anywhere else. I'd argue that most init systems go far beyond what you actually want PID 1 to do, but that is a different discussion.


I definitely don't want any init in the container. I want the app running in there and nothing else. For a single-application system, PID 1 has only one responsibility: collecting zombies. I treat that as an emergency solution; apps should clean up their children themselves under normal circumstances.

Everything else can be handled outside of the container.


I'm not sure what xj9 was referring to, but it's not how the tool I'm describing (that I'm the primary author of) works.


The target application, which the user can define to be whatever they want. Here's launching a Ruby REPL as PID 1:

   guix environment --ad-hoc ruby --container -- irb


Common Workflow Language http://commonwl.org is a spec (with multiple implementations) for wrapping command line tools (which may run inside a Docker container) as functional units.


Taken to its logical extreme, you get Algebraic Effect Handlers: http://math.andrej.com/2012/03/08/programming-with-algebraic...

The idea is a generalization of try/catch in which all effects are accomplished through handler blocks that receive a continuation. Just like when you make a kernel call, your program is a continuation given to the kernel. Usually the kernel calls you back once, but some syscalls return twice (fork) or not at all (abort) by manipulating processes.

Coupled with reusable handler block bundles, you can wrap a "container" around any expression in your entire program.


I usually think of containers as an expanded chroot jail.

They even serve a similar purpose: during the 32-bit to 64-bit switchover, I'd occasionally install 32-bit libs into a chroot jail to more easily compile software that depended on being 32-bit, without polluting the global space with 32-bit packages.
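On Debian-ish systems that was roughly this (the suite, path, and mirror are just examples):

    debootstrap --arch=i386 jessie /srv/chroot32 http://deb.debian.org/debian
    chroot /srv/chroot32 /bin/bash
    # inside: install the 32-bit toolchain and build there,
    # without touching the host's package set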


   containers as an expanded chroot jail.
Could you point me to a succinct description of the exact semantics of containers as an expanded chroot jail? I've been looking, but have so far not found anything.


The term "container" (on Linux) refers to the high-level combination of two concrete low-level interfaces: namespaces and control groups (cgroups).

I like to think of namespaces as converting certain global variables in the kernel to local variables for each process. These local variables are then inherited by child processes. Chroots are the simplest example, although they predate namespaces. Ordinarily you'd think of a system as having a single root directory; somewhere in the kernel is a global variable DIR root. But in fact, each process has its own root directory pointer in the process structure. Most of those pointers have the same value in every process, but if you run chroot, you change that pointer for the current process and all its children.

The list of possible namespaces is in the clone(2) and unshare(2) manpages (`man 2 clone` and `man 2 unshare`); look for the options starting with CLONE_NEW. They all change some pointer in the process structure, either to a substructure of the original pointer (like chroot does), a deep copy of the structure, or to a new, empty structure.

CLONE_NEWIPC changes the pointer for routing System V IPCs. CLONE_NEWNET changes the pointer for the list of network devices to a new structure with just a new loopback interface. CLONE_NEWNS copies the mount table instead of keeping a pointer to it, so your process can unmount filesystems without affecting the rest of the system, or vice versa. CLONE_NEWPID changes the pointer to pid 1 / the process ID table to point to yourself (effectively a chroot for process IDs). CLONE_NEWUSER changes the interpretation of user IDs, so you can have UID 0 in your process be a non-zero UID in the outside system. CLONE_NEWUTS creates a copy of the structure containing the machine's hostname, instead of keeping a pointer to the global structure.
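You can poke at most of these from a shell with unshare(1); roughly (exact flags depend on your util-linux version, and you need root or user namespaces):

    # new UTS, mount, PID and network namespaces; --fork and --mount-proc
    # make the new shell PID 1 with its own /proc
    sudo unshare --uts --mount --pid --net --fork --mount-proc bash
    hostname container-test   # only changes the hostname inside this namespace
    ps aux                    # only shows processes in the new PID namespace
    ip link                   # nothing but a down loopback interface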

cgroups are resource control. They let you say that a certain process tree has some maximum amount of RAM, or CPU shares, or so forth. This is useful for making containers perform the way you want, but doesn't really affect their semantics.


But do cgroups really allow setting maximum CPU shares? man 7 cgroups says this about the cpu subsystem:

> Cgroups can be guaranteed a minimum number of "CPU shares" when a system is busy. This does not limit a cgroup's CPU usage if the CPUs are not busy.

Granted, my English isn't the best, but this doesn't seem to indicate any throttling.


I'm not very familiar with cgroups, but the documentation mentioned in that manpage ( https://www.kernel.org/doc/Documentation/scheduler/sched-bwc... ) talks about throttling and maximum CPU usage.

I'd guess that it's just a matter of how you view it - putting a minimum number of CPU shares on the root cgroup is the same as putting a maximum on the rest of the cgroups, right? But maybe one or the other documentation is wrong.
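If I'm reading that documentation right, the hard cap comes from the CFS bandwidth files rather than cpu.shares; something like this should cap a group at half a CPU (assuming the cpu controller is mounted at the usual place):

    # 50ms of runtime per 100ms period
    mkdir /sys/fs/cgroup/cpu/halfcpu
    echo 100000 > /sys/fs/cgroup/cpu/halfcpu/cpu.cfs_period_us
    echo 50000 > /sys/fs/cgroup/cpu/halfcpu/cpu.cfs_quota_us
    echo $$ > /sys/fs/cgroup/cpu/halfcpu/tasks   # move this shell into the group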


Both the manpage and the documentation are probably right, though I'd expect CPU throttling to be significant enough to at least be mentioned.

I'm not so sure about your guess, because that apparently works only when the CPU is fully utilized. I'm not so sure about my reading comprehension either.


Thanks "geofft" and "cesnja".

   converting certain global variables
How do the copies of the variables that become local relate to the originals? What happens at read/write, and what happens if the host OS changes them? In other words, what are the semantics of sharing/copying these variables?


It depends on the thing you're unsharing.

For a chroot, you get a pointer to a subdirectory of the host root directory. Changes within that directory are visible in both directions. CLONE_NEWPID and CLONE_NEWUSER work similarly; every process has a PID and a UID outside of the container (that is, in the root namespace), but a subset of PIDs and UIDs are visible in the container, with their own values. Creating a process in a PID namespace causes it to get a PID counting from 1 in that namespace, as well as a PID counting from 1 in the parent namespace. A user account in a user namespace has a value (which could be 0) in the namespace, as well as a mapped value in the parent namespace.

CLONE_NEWIPC and CLONE_NEWNET create new, empty structures for the IPC namespace and network stack. Changes in one namespace aren't visible to another. You can move network devices between namespaces by using `ip link set dev eth1 netns 1234`, which will move eth1 out of the current process's network namespace and into process 1234's namespace. (This is occasionally useful with physical devices, but more useful with virtual devices like veth and macvlan.)

CLONE_NEWNS and CLONE_NEWUTS create a deep copy of the current namespace's mount table and hostname/domainname strings, respectively. Further changes in one namespace do not affect the other.


Thanks, this is very useful.

Has that been written up somewhere in a suitably abstract form?

Is there a list of variables that are affected, and how they are affected?

In particular, I wonder about network interfaces. Say your hardware has a network interface that's got MAC address 0b:21:b5:e2:11:22 and IPv4 address 123.234.34.45, how are these addresses affected by cloning?


You get a completely separate network stack. None of the network devices are copied/cloned. You can move a network device into the container, but it's no longer accessible in the host.

Since most people don't have spare physical devices, there are a couple of approaches using virtual devices. You can create a "veth" device pair, which is basically a two-ended loopback connection. Move one end of the veth into the container, configure them as 192.168.1.1 and 2 (or whatever), and set up NAT. Or you can create a "macvlan" device, which hangs off an existing device and generates a new MAC address. Any traffic destined for the macvlan's MAC address goes to the macvlan device; any other traffic goes to the parent device. So I can move the macvlan into the container and assign it the address of 123.234.34.46, and it will ARP for that address using its own MAC address.

The container also has its own routing table, iptables (firewall) rule set, etc. And anyone listening on a broadcast interface in the host won't get packets destined to the container, or vice versa. It's basically like a virtual machine.
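A rough sketch of the veth-plus-NAT variant with plain ip(8), outside of Docker (addresses and names are arbitrary):

    ip netns add demo
    ip link add veth-host type veth peer name veth-ctr
    ip link set veth-ctr netns demo
    ip addr add 192.168.1.1/24 dev veth-host
    ip link set veth-host up
    ip netns exec demo ip addr add 192.168.1.2/24 dev veth-ctr
    ip netns exec demo ip link set veth-ctr up
    ip netns exec demo ip link set lo up
    ip netns exec demo ip route add default via 192.168.1.1
    sysctl -w net.ipv4.ip_forward=1
    iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -j MASQUERADE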


Thanks for the description. I guess container networking like Weave works by 'hijacking' veth and executing NAT-like address translation.

   like a virtual machine.
That's surprising. Containers were advertised to me as being much more lightweight than conventional virtualisation (e.g. VMware, Xen), because the former, unlike the latter, share the ambient operating system.

    completely separate network stack
What does that mean exactly? Does the container copy the actual code, or is the network stack's code shared and just run in a separate address space?


We use a Swagger file as the "function signature" for each dockerized module, and yes, the function analogy is quite appropriate if you modularize them well: a sloppy component is like a subroutine grabbing at global variables all over the place, even if it's in a Docker VM, but a nice component does work like a function that just happens to sit somewhere across a network.


I've got a feeling that what the author describes already exists, just not on the same level. And those solutions all have one problem in common: not everybody wants to use them.

Specifically: puppet, chef, salt, juju, heat and others will happily deploy your application (container or not) and provide you with a generic interface for it. Docker is just one of the tiny building blocks here. They'll either check the model before doing anything or will check if it works by trying. Types or not, the system for implementing those interfaces exists.

But every month there's another configuration / orchestration / deployment system coming out. Some will use it, but it will only worsen the fragmentation. I feel like it would either have to be built into Docker and enforced at that level, or it would be yet another system that 99% of operators don't use.


In my current project, I have my own mini-infrastructure, where each machine has a "watchdog" process that picks up items from a global queue and calls "docker run" with the item it picked up. Docker knows which program to start via ENTRYPOINT, and the arguments to docker run are passed to this program. Each Docker image gets its own queue and autoscaling instance group.

It works well for data processing tasks - my Docker images are crawlers, indexers, or analytics code in Python or R. Deployment is quite simple: just push a Docker image, and it will be picked up on the next docker run. Images can add items to the global queues, and for any bigger data they write things to a shared database.
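Stripped down, it's essentially a loop like this (the `queue-pop` client and the image name here are hypothetical, just to show the shape):

    while true; do
        item=$(queue-pop crawl-jobs)       # hypothetical: pop one work item, empty if none
        if [ -z "$item" ]; then
            sleep 10
            continue
        fi
        # ENTRYPOINT decides what runs; the queue item becomes its argument
        docker run --rm crawler:latest "$item"
    done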


We've been doing this for a while in the genomics space, check out "common workflow language": https://github.com/common-workflow-language/common-workflow-...

Where each container is described by what types of things it takes as inputs and what it outputs -- i.e., as functions.

In practice we've built hundreds of these and composed them in endless variety for a large range of genomics tasks, processing petabytes of data at Seven Bridges Genomics.


For thinking in terms of parametrizing containers, I find the libcontainer spec [1] and runc to be way more useful than what Docker has to offer.

[1] https://github.com/opencontainers/runc/tree/master/libcontai...


Type-checked container interfaces are what protocol buffers are for.


At a low level, isn't this about how you deal with foreign functions? You have the same issues of typing, validation, and concurrency that you do with something like ctypes in Python.


Yeah, seems like the question of data marshaling except in a more distributed and fault-tolerant sense. Maybe I'm misunderstanding this whole discussion?


I think you're looking for Swagger. It's been around for a while now, works well, and has a good ecosystem of tools around it.


Most examples seem to be concerned with I/O boundary crossing ("why ports, why volumes, why envs"). Is that the reason a type system is mentioned, to provide safety (e.g. a firewall-like DSL) through documentation? More emphasis could be put on at least two other values: discoverability and compatibility.


This is how I've been describing Docker to people and also how I've been using it. It's why, when I need a specific version of Elasticsearch or Postgres, I just run the tagged service.

It turns any binary into a more portable executable.

Great post!


What is a "tagged service"? Could you elaborate? I'm assuming this has nothing to do with docker tags?


Oops, I meant tagged images. If you look at the common images like postgres, redis, and elasticsearch, they tag the images with the version of the service that's installed.

https://hub.docker.com/_/postgres/
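e.g. (version tags roughly as they were at the time of writing):

    docker run -d --name pg postgres:9.5
    docker run -d --name es elasticsearch:2.4
    docker run -d --name redis redis:3.2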


Sounds like zerovm? http://www.zerovm.org/


Not at all; that is more to do with deterministic execution than useful abstraction.


The crux of the matter:

> large programs written in assembler in the 1960s included exactly this sort of documentation convention: huge front-matter comments in English prose.

That is the current state of the container ecosystem. We are at the “late ’60s assembly language” stage of orchestration development. It would be a huge technological leap forward to be able to communicate our intent structurally.

Upon reading this, my first thought is: Please don't say XML!


Or in other words, how to "parameterise" a container. An interesting idea.


I always thought Docker leaned more towards "process as a platform".



