In addition to the "minimalist" aspect, this image seems to offer better practices on a security level than official Debian images. From their README: "The images are built daily and have the security release enabled, so will contain any security updates released more than 24 hours ago."
> In addition to the "minimalist" aspect, this image seems to offer better practices on a security level than official Debian images
I'm skeptical about this claim. Almost every image built from the Debian official image begins with `apt-get update` before you can actually install anything, which means you will almost always have the latest packages at the time of building.
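As a sketch of that usual pattern (the image and package names here are only illustrative, not from the thread): the `apt-get update` at build time pulls the current package lists, so whatever you install is fresh as of the build.

```dockerfile
# Typical Dockerfile built on an official Debian image. Running apt-get
# update at build time means the installed packages reflect the archive
# state at the moment of building. (Package choice is illustrative.)
FROM debian:stable-slim
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates \
 && rm -rf /var/lib/apt/lists/*
```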
While not as small, it is trivial to make an up-to-date Debian base image (or Fedora/Arch) any time you want. If you care about security you probably don't want to use random unverified images anyway.
$ sudo debootstrap stretch mydebian http://mirrors.kernel.org/debian/
$ cd mydebian
$ sudo tar -c . | docker import - mydebian
Plus you can add files to the system before tarring, etc.
If you have significant work to do on an image a Dockerfile can often be far more complex than this method.
No offence intended, and these terms are not tightly defined, but I would call your image a 'baked' image.
My feeling is that I don't know how long a base-image will stick around. If ca-certificates is installed in my base image it may end up trusting revoked certificates.
IMHO it is better to know you need to install/bake in ca-certs from a trusted source than to have a built-in, potentially compromised CA cert installed.
Baked images, which I use to reduce instantiation time, or 'golden' images for immutable infrastructure, tend to have shorter lifespans; the CA package is carried in the application dependencies and is more likely to be up to date.
It is not intended that users will download this baseimage (although it is a supported configuration; you can use FROM phusion/baseimage), but that this will be an image definition that users can easily rebuild and build off of.
Step one in Docker competency is "do you know exactly where your image comes from, and can you rebuild it from scratch without trusting that some rando on the internet didn't put bad stuff in there?"
Step two is "ok, do you really actually build them, though"
This image has traditionally been based on LTS ubuntu, and if you look at the CentOS derived version that hasn't been updated since 2014 (pokle/centos-baseimage), they chose not to include ca-certificates or hardly anything else.
(I'm assuming that tianon/centos:6.5 does not install ca-certificates by default...)
I'm sure many people use FROM phusion/baseimage but personally, even as a maintainer, I don't. I'd change the image source to whatever upstream of Ubuntu I'm preferring today, and probably build that from scratch too. The value in this image is not that it comes pre-built, it's that the build is tested and supported. /side tangent
You can go ahead and explain what you think base image means then, because it's not obvious to me how this is different than that (and I'm a domain expert.)
A base image is an image that you're meant to build off of, it is not meant to be deployed as an application but as a base for your image. What part of what I said was disingenuous? I gave a link to some source code that I didn't write and provided a counter example, identifying myself as a contributor. What did you contribute?
Great misunderstanding of what I said, this is two in a row when considering the above, bravo.
What is wrong with your first comment is:
1. zenlikethat's comment obviously talks about base images in general, not about a specific base image ("...most base images...")
2. Your reply which disagrees with that is based on a very basic fallacy (there is one base image which contains SSL, so zenlikethat's claim is false!?)
3. Your example is a base image which you're a maintainer of
Considering (1) is obvious and (2) is a very basic fallacy nobody here should fall for, your comment seems like an intentional and unwarranted plug for "phusion/baseimage" instead of a valid and honest disagreement to zenlikethat's comment. (extra points for unwarranted plug of your domain expertise in immediate parent comment)
I'm supposed to provide examples that I am unfamiliar with?
Forgive me for misunderstanding your comment, but `"base image" != "baseimage"` did not have a great deal of substance to it, nor did it provide me with any insight about the topic.
I provided an example, to facilitate the discussion. So back to my last question, what did you contribute?
I couldn't find anything in the post that correlates the Debian updates with security notices (which is your main point). If a security advisory comes out every 2-5 weeks and Debian updates on the same schedule, then I don't see a problem. But the data is just not there.
(These have to be advisories actually affecting the image, not all of them)
I don't really work in this domain so maybe I'm missing something. If the goal is to essentially get the bare minimum needed to run a program into a Docker image why not develop your program in your desired environment and then use something like CDE [1] to copy (or obtain a list of) all the files touched in the desired invocation of the program. That copy or list can then be put into a tarball and imported with "docker import". Philip Guo even writes about this possible use [2].
Here's a silly example:
cde python -c "import numpy as np; print(np.random.randn(3, 3).tolist())"
pushd cde-package/cde-root/; tar cavf ../../cde-image.tar *; popd
docker import cde-image.tar $USER:python-randn33
docker run $USER:python-randn33 python -c "import numpy as np; print(np.random.randn(3, 3).tolist())"
docker run -t -i $USER:python-randn33 python
If you look at the resulting "cde-image.tar" you'll find it to be quite bare. Mine had only 387 entries (files and folders).
Probably because syscall interception is not sufficient to create robust Linux program images. It will be an awkward moment if a stat, open, etc. that the program attempts in production doesn't work as expected because that path was never exercised during development / image bundling. You'd have to execute every possible code path in the CDE bundling step for it to work properly.
So it becomes a matter of whether or not you can achieve good coverage of your execution paths to account for all possible filesystem touches? Further invocations of "cde" with respect to the same "cde-package" folder will actually append to the "cde-root" file system copy so if you could manage to canvas your program's execution paths then the resulting file tree copy should be sufficient?
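A tiny hypothetical illustration of the coverage problem (the config paths and function are invented, not from CDE itself): a trace-based bundler only records files the traced run actually opens, so the branch not taken leaves its file out of the bundle.

```python
# Hypothetical illustration: with syscall tracing, only files touched by
# the traced invocation get captured. A branch not exercised during the
# cde run leaves its file out of the resulting bundle.
def load_config(profile):
    # Paths are invented for illustration.
    if profile == "prod":
        return "/etc/myapp/prod.conf"
    else:
        return "/etc/myapp/dev.conf"

# Tracing only the "dev" invocation never touches the prod config,
# so a bundle built from that trace would lack /etc/myapp/prod.conf.
print(load_config("dev"))
```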
[tl;dr it instruments execution while using a genetic algorithm to mutate inputs, optimising for code coverage]
Statically determining dependencies is a lot easier and a lot more reliable! Particularly as you only need the base image once, and any extras on top are another layer on the Docker FS.
I'm also a fan of minimal images. CDE is an interesting solution, but for dynamic languages like Python, packing everything into a virtualenv and shipping that is a reasonable solution. To automatically grab linked libraries you can use something like smith[1]
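As a rough sketch of grabbing linked libraries statically (the binary and output directory here are just example choices, and this is far less thorough than a dedicated tool like smith): `ldd` lists the resolved shared-library paths, which can be copied into a minimal tree for `docker import`.

```shell
# Sketch: collect the shared libraries a binary links against and copy
# them into a minimal root filesystem. Binary and directory names are
# illustrative.
BIN=/bin/ls
ROOT=minimal-root
mkdir -p "$ROOT/bin"
cp "$BIN" "$ROOT/bin/"
# ldd prints each resolved library path; copy them preserving directories
ldd "$BIN" | grep -o '/[^ ]*' | while read -r lib; do
    mkdir -p "$ROOT$(dirname "$lib")"
    cp "$lib" "$ROOT$lib"
done
# The tree can then be tarred and fed to "docker import".
```

This only catches dynamically linked libraries, not dlopen'd plugins or data files, which is where the dedicated tools earn their keep.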
for anyone else interested in the actual stats, I lifted this from their PR on the image:
The minideb image currently weighs in at around 50MB uncompressed. For comparison, the debian library image is 123MB, the alpine image is 5MB, and the newly released amazonlinux image is 328MB.
I can list some pros (comparing to debian:stable-slim) for Minideb:
It is easy to build, has good documentation, has no blobs committed in the repo, is automatically built and tested on a daily basis.
Worth also mentioning that it is the base image for ALL the Bitnami containers, which are likewise automatically built, updated and tested.
You can take a look at all of them on GitHub. There are a ton of them.
I guess it's a bit more specific than you meant it, but our standard Python image is ~20MB (alpine + python3, basically); that's still under half of minideb.
Does look interesting for things that need glibc compatibility though. There are some packages to help with that in Alpine but they only go so far.
From looking at what I think is the Dockerfile for that image (https://github.com/docker-library/python/blob/b1512ead24c6b1...), that image is complex; it's downloading & building Python in it and adding & removing a dev toolchain, in a few different layers.
I'm not surprised that I got something a lot smaller from just running `apk add python3.6`, although as a result they are not comparing apples to apples; their minideb example does pretty much exactly the equivalent (i.e. downloading the distro-provided package, not compiling it within their image).
Yep. I like alpine and use it for my images for preference, but some things like getting Ruby on Rails working with therubyracer can basically hit a wall in alpine, so this could be pretty handy.
On the other hand, if you don't need Python because the container runs a service like Redis or Nginx, you can keep the entire Alpine-based image at around 25-50% of the size of a bare minideb image.
Your numbers are wrong. Here's the actual values for each of those images uncompressed:
debian 101M
alpine 4.1M
amazonlinux 160M
These were determined with the following:
for image in debian:latest alpine:latest amazonlinux:latest; do
  docker pull $image
  size=$(docker save $image | wc -c)
  printf '\n%s is %s\n' "$image" "$(numfmt --to=iec "$size")"
done
Please note that it's important to only do this test with either a completely clean image store OR to save an exact sha256 image digest tag.
If you try to do a `docker save` on a store where you have multiple copies of an image, it's easy to mess up and save old layers too. I suspect that's what happened with your amazonlinux test above.
This seems smarter than some of the container OS approaches that start from distros that have weak or no package management, and rely entirely on the "container" model to provide updates and some combination of spit and duct tape to build them. There's a smallish Fedora for containers (exists in the Docker registry) as well; it's about 70MB, which is still a little beefy.
Anybody know how big minideb is?
Edit: zwerdids posted that it's ~50MB, so a wee bit smaller than the commonly used Fedora container image, and an order of magnitude bigger than an Alpine image.
Neat. Some overlap with https://github.com/GoogleCloudPlatform/distroless ; I think one big difference is that Google's approach uses bazel to download and unpack .debs, and this one uses standard Debian tools (debootstrap). But the end result sounds similar.
There's a bunch of them, especially around networking use cases. Alpine itself was a fork of another embedded Linux distro. One appeal of Minideb, and a big reason why we developed it, is that at the end of the day it is still Debian and you have access to all the DEB packages out there, which tend to cover more ground and be more actively maintained than more niche Linux distros.
wiremine's question upthread as to how this compares to debian:stable-slim seems like it would benefit from an answer from you or another maintainer (assuming I correctly read the "we" in your comment)
As some other comment pointed out, we update it daily. When we started, debian-slim was not an option for us, but it has caught up in terms of size and features, so we will definitely take another look.
A little OT, but how does Bitnami make money? They don't seem to charge for AMIs, at least. So I guess they charge AWS/GCE for providing 1-click images? Or are they a consulting company (if so, why choose them over the original app authors)? Or both?
Bitnami co-founder here. Most of what we produce is open source and we aim to make money in ways that are useful to companies but not limit or handicap our offering and alienate people. For example, we offer optional backup and monitoring services through Bitnami Cloud Hosting (https://bitnami.com/cloud/hosting) and we also have commercial services for ISVs that want to package their commercial apps through our platform. We also provide support for infrastructure providers (i.e. cloud vendors) that want the applications integrated with their platforms in specific ways
If I run dozens of containers based on the same Debian image, would Minideb or even Alpine bring a big change, considering that Docker caches the layers?
The point of smaller images to me isn't about disk savings as much as minimizing dependencies and surface area of attacks such as for glibc, bash, and OpenSSL in the past several years. Updating container images quickly is absolutely essential given the myriad of possible problems if they were to become stale.
I suppose it wouldn't hurt to have smaller image layers when updating these containers more frequently to save on bandwidth at least.
The author said 3 different things in their comment. I was answering this:
"The point [..] isn't about disk savings as much as minimizing dependencies and surface area of attacks such as for glibc, bash, and OpenSSL in the past several years."
Technically they didn't say specifically how they would minimize surface area of attacks, so my assumption that they meant only by minimizing dependencies (seeing as their comment was followed by a list of dependencies) may have been faulty. Thanks for letting me know that in such a kind way.
Can you elaborate why you think that is the case? This is a well-established security practice. I don't see much upside to having code or binaries around that are not needed but can be potentially exploited. One of the first things I did when I used to manage servers was shutdown and remove any services not needed, disable all Apache modules not in use, etc.
A house on stilts makes it difficult to rob, but not for the man who walks on stilts. Security practices need to be implemented holistically or they are easily defeated. By themselves they aren't worth much and end up being unnecessarily cumbersome.
Removing outlying code that could be used as part of an attack can be useful for complex attacks. But they are essentially outliers - the actual code that you are actually running and is the actual target is still there, waiting to be pwnd. The time you spend trimming fat can often be better used to actually harden a system's access control or policies/procedures, perform auditing, etc.
One of the ways it reduces the size is by pulling out things that are very unlikely to be required in a container, but are important for running on real hardware. Things like udev, systemd, file system tools etc.
GitLab Container Registry. It integrates beautifully with the rest of GitLab, including the permissions and CI system. I believe it also has special support for Kubernetes though I've not tried using that.
A recent analysis showed that the debian:latest image is "updated roughly every month" https://anchore.com/blog/look-often-docker-images-updated/