Minideb – A small image based on Debian designed for use in containers (github.com/bitnami)
234 points by nikolay on Sept 19, 2017 | 73 comments



In addition to the "minimalist" aspect, this image seems to offer better practices on a security level than official Debian images. From their README: "The images are built daily and have the security release enabled, so will contain any security updates released more than 24 hours ago."

A recent analysis showed that the debian:latest image is "updated roughly every month" https://anchore.com/blog/look-often-docker-images-updated/


> In addition to the "minimalist" aspect, this image seems to offer better practices on a security level than official Debian images

I'm skeptical about this claim. Almost every image built from the Debian official image begins with `apt-get update` before you can actually install anything, which means you will almost always have the latest packages at the time of building.


While not as small, it is trivial to make an up-to-date Debian base image (or Fedora/Arch) any time you want. If you care about security you probably don't want to use random unverified images anyway.

  $ sudo debootstrap stretch mydebian http://mirrors.kernel.org/debian/
  $ cd mydebian
  $ sudo tar -c . | docker import - mydebian
Plus you can add files to the system before tarring, etc.

If you have significant work to do on an image a Dockerfile can often be far more complex than this method.


apt update only updates the package list, not the packages themselves. So unless the Dockerfile contains apt upgrade, it still ships old packages.
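
To actually refresh what's preinstalled, you'd need something like this in the Dockerfile (a sketch):

    RUN apt-get update && apt-get upgrade -y && rm -rf /var/lib/apt/lists/*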


There's nothing to upgrade. Pretty much nothing is installed already in most base images. Even `ca-certificates` etc. have to be installed.


Are you saying that most base images don't have SSL?

Because I'm a baseimage maintainer (http://github.com/phusion/baseimage) and I don't think that's true...

https://github.com/phusion/baseimage-docker/blob/master/imag...


No offence intended, and these terms are not tightly defined, but I would call your image a 'baked' image.

My feeling is that I don't know how long a base-image will stick around. If ca-certificates is installed in my base image it may end up trusting revoked certificates.

IMHO it is better to know you need to install/bake in ca-certs from a trusted source than having a built-in, potentially compromised CA cert installed.

Baked images, which I use to reduce instantiation time, or 'golden' images for immutable infrastructure, tend to have shorter lifespans, and the CA package is carried in the application dependencies, making it more likely to be up to date.


No offense inferred!

It is not intended that users will download this baseimage (although that is a supported configuration; you can use FROM phusion/baseimage), but rather that this is an image definition that users can easily rebuild and build off of.

Step one in Docker competency is "do you know exactly where your image comes from, and can you rebuild it from scratch without trusting that some rando on the internet didn't put bad stuff in there?"

Step two is "ok, do you really actually build them, though"

This image has traditionally been based on LTS Ubuntu, and if you look at the CentOS-derived version that hasn't been updated since 2014 (pokle/centos-baseimage), they chose not to include ca-certificates or much of anything else.

(I'm assuming that tianon/centos:6.5 does not install ca-certificates by default...)

I'm sure many people use FROM phusion/baseimage but personally, even as a maintainer, I don't. I'd change the image source to whatever upstream of Ubuntu I'm preferring today, and probably build that from scratch too. The value in this image is not that it comes pre-built, it's that the build is tested and supported. /side tangent


Ah, I wasn't being specific enough. "Official" (library) images are what I was referring to.


"base image" != "baseimage", obviously. there was no need for this cheap plug.


You can go ahead and explain what you think base image means then, because it's not obvious to me how this is different than that (and I'm a domain expert.)

A base image is an image that you're meant to build off of, it is not meant to be deployed as an application but as a base for your image. What part of what I said was disingenuous? I gave a link to some source code that I didn't write and provided a counter example, identifying myself as a contributor. What did you contribute?


Great misunderstanding of what I said, this is two in a row when considering the above, bravo.

What is wrong with your first comment is:

1. zenlikethat's comment obviously talks about base images in general, not about a specific base image ("...most base images...")

2. Your reply, which disagrees with that, is based on a very basic fallacy (there is one base image which contains SSL, so zenlikethat's claim is false!?)

3. Your example is a base image which you're a maintainer of

Considering (1) is obvious and (2) is a very basic fallacy nobody here should fall for, your comment seems like an intentional and unwarranted plug for "phusion/baseimage" instead of a valid and honest disagreement to zenlikethat's comment. (extra points for unwarranted plug of your domain expertise in immediate parent comment)


I'm supposed to provide examples that I am unfamiliar with?

Forgive me for misunderstanding your comment, but `"base image" != "baseimage"` did not have a great deal of substance to it, nor did it provide me with any insight about the topic.

I provided an example, to facilitate the discussion. So back to my last question, what did you contribute?

Edit: https://en.1jux.net/scale_images/357695_b.jpg

Have a laugh and a beer!


I couldn't find anything in the post that correlates the Debian updates with security notices (which is your main point). If a security advisory comes out every 2-5 weeks and Debian updates on the same schedule, then I don't see a problem. But the data is just not there.

(These have to be advisories actually affecting the image, not all of them)


I don't really work in this domain so maybe I'm missing something. If the goal is to essentially get the bare minimum needed to run a program into a Docker image, why not develop your program in your desired environment and then use something like CDE [1] to copy (or obtain a list of) all the files touched in the desired invocation of the program? That copy or list can then be put into a tarball and imported with "docker import". Philip Guo even writes about this possible use [2].

Here's a silly example:

  cde python -c "import numpy as np; print(np.random.randn(3, 3).tolist())"
  pushd cde-package/cde-root/; tar cavf ../../cde-image.tar *; popd
  docker import cde-image.tar $USER:python-randn33
  docker run $USER:python-randn33 python -c "import numpy as np; print(np.random.randn(3, 3).tolist())"
  docker run -t -i $USER:python-randn33 python
If you look at the resulting "cde-image.tar" you'll find it to be quite bare. Mine had only 387 entries (files and folders).

[1]: http://www.pgbovine.net/cde.html

[2]: http://pgbovine.net/automatically-create-docker-images.htm


Probably because syscall interception is not sufficient to create robust Linux program images. It will be an awkward moment if a stat, open, etc. that the program attempts in production doesn't work as expected because that path wasn't exercised during development / image bundling. You'd have to execute every possible code path in the CDE bundling step for this to work properly.


So it becomes a matter of whether or not you can achieve good coverage of your execution paths, to account for all possible filesystem touches? Further invocations of "cde" against the same "cde-package" folder will actually append to the "cde-root" filesystem copy, so if you could manage to canvass your program's execution paths, the resulting file tree copy should be sufficient?


You're right, it is a question of coverage of execution paths, but that's a non-trivial problem.

Have a look at the lengths AFL goes to just to get close: http://lcamtuf.coredump.cx/afl/

[tl;dr it instruments execution while using a genetic algorithm to mutate inputs, optimising for code coverage]

Statically determining dependencies is a lot easier and a lot more reliable! Particularly as you only need the base image once, and any extras on top are another layer on the Docker FS.
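
Static determination, as a rough sketch (/usr/local/bin/myapp is a stand-in for your binary):

    # list the shared libraries the binary links against
    $ ldd /usr/local/bin/myapp | awk '$3 ~ /^\// { print $3 }'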


I'm also a fan of minimal images. CDE is an interesting solution, but for dynamic languages like Python, packing everything into a virtualenv and shipping that is a reasonable solution (rough sketch below). To automatically grab linked libraries you can use something like smith[1]

[1]: https://github.com/oracle/smith
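
A minimal sketch of the virtualenv approach (numpy is just a placeholder dependency):

    FROM python:3.6-slim
    RUN python -m venv /opt/venv \
        && /opt/venv/bin/pip install numpy
    ENV PATH="/opt/venv/bin:$PATH"
    CMD ["python"]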


I imagine that if you created a product that could run my Python code without building a full container or image, you might call it "serverless."



Or a unikernel.


For anyone else interested in the actual stats, I lifted this from their press release on the image:

The minideb image currently weighs in at around 50MB uncompressed. For comparison, the debian library image is 123MB, the alpine image is 5MB, and the newly released amazonlinux image is 328MB.


Current debian:stable-slim is 55.3MB and is already optimized for containers (see: https://gist.github.com/Zorbash/183b80d37bd0a09434e3a2b1a958...).
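
You can check for yourself with something like:

    $ docker pull debian:stable-slim
    $ docker images debian:stable-slim --format '{{.Size}}'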


Good data point!

Anybody have experience with minideb vs. debian:stable-slim? Any pros/cons to either approach?


I can list some pros for minideb (compared to debian:stable-slim): it is easy to build, has good documentation, has no blobs committed in the repo, and is automatically built and tested on a daily basis.

Worth also mentioning that it is the base image for ALL the Bitnami containers, which in turn are also automatically built, updated, and tested. You can take a look at all of them on GitHub. There are a ton of them.


Once you do anything interesting (i.e. install Python), the Alpine and minideb sizes are basically identical.


I guess it's a bit more specific than you meant it, but our standard Python image is ~20MB (alpine + python3, basically); that's still under half of minideb.
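
That image is essentially just (a sketch):

    FROM alpine:3.6
    RUN apk add --no-cache python3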

Does look interesting for things that need glibc compatibility though. There are some packages to help with that in Alpine but they only go so far.


Thanks, this is where I got the data from:

https://dzone.com/articles/minideb-a-minimalist-debian-based...


From looking at what I think is the Dockerfile for that image (https://github.com/docker-library/python/blob/b1512ead24c6b1...), that image is complex; it's downloading & building Python in it and adding & removing a dev toolchain, in a few different layers.

I'm not surprised that I got something a lot smaller from just running `apk add python3.6`, although as a result they are not comparing apples to apples; their minideb example does pretty much exactly the equivalent (i.e. downloading the distro-provided package, not compiling it within their image).


Yep. I like alpine and use it for my images by preference, but some things like getting Ruby on Rails working with therubyracer can basically hit a wall in alpine, so this could be pretty handy.


In my experience python in alpine is sometimes 2x slower. How does Ruby perform under Alpine?


On the other hand, if you don't need Python because it's a container running a service like Redis or Nginx, you can keep the entire Alpine-based image at around 25-50% of the size of a bare minideb image.


I know old school native code is looked down upon these days, but there are people that write code without bloated runtime dependencies.

You don’t need to add python “to do anything interesting.”


Your numbers are wrong. Here are the actual values for each of those images uncompressed:

    debian	101M
    alpine	4.1M
    amazonlinux	160M
These were determined with the following:

    for image in debian:latest alpine:latest amazonlinux:latest; do
        docker pull $image
        size=$(docker save $image | wc -c)
        printf '\n%s is %s\n' "$image" "$(numfmt --to=iec $size)"
    done
Please note that it's important to only do this test with either a completely clean image store OR to save an exact sha256 image digest tag.

If you try to do a `docker save` on a store where you have multiple copies of an image, it's easy to mess up and save old layers too. I suspect that's what happened with your amazonlinux test above.
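
Something like this avoids the ambiguity (a sketch; <image-id> is whatever the first command prints):

    $ docker images --no-trunc --format '{{.ID}}' amazonlinux:latest
    $ docker save <image-id> | wc -c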


Sorry, not my numbers - as noted in the comment, I took them from the press release.


debian:stable-slim 56M


The BusyBox image is between 1 and 5MB.


This seems smarter than some of the container OS approaches that start from distros that have weak or no package management, and rely entirely on the "container" model to provide updates and some combination of spit and duct tape to build them. There's a smallish Fedora for containers (exists in the Docker registry) as well; it's about 70MB, which is still a little beefy.

Anybody know how big minideb is?

Edit: zwerdids posted that it's ~50MB so a wee bit smaller than the currently commonly used Fedora container image. And an order of magnitude bigger than an Alpine image.


It saves only 5-6MB at most over the Debian slim version of the images. It's not worth my time to use this instead of the officially supplied Debian images.

Edit: I would like to know where exactly this 5-6MB is saved.


I guess the removal of some essential packages.


The `install_packages` command looks like a big win compared to the rather spammy form most Dockerfiles use now to install packages, e.g.:

    $ install_packages mime-support
vs:

    $ apt-get update && apt-get install -y --no-install-recommends mime-support && \
        apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
That's a great win in itself; this is excellent.
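
Usage in a Dockerfile is then just something like (the tag is an assumption):

    FROM bitnami/minideb:stretch
    RUN install_packages mime-support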


Neat. Some overlap with https://github.com/GoogleCloudPlatform/distroless; I think one big difference is that Google's approach uses Bazel to download and unpack .debs, and this one uses standard Debian tools (debootstrap). But the end result sounds similar.


Picking up where Damn Small Linux left off. http://www.damnsmalllinux.org/download.html


There's a bunch of them, especially around networking use cases. Alpine itself was a fork of another embedded Linux distro. One appeal of minideb, and a big reason why we developed it, is that at the end of the day it is still Debian, and you have access to all the DEB packages out there, which tend to cover more ground and be more actively maintained than those of more niche Linux distros.


wiremine's question upthread as to how this compares to debian:stable-slim seems like it would benefit from an answer from you or another maintainer (assuming I correctly read the "we" in your comment).


As some other comment pointed out, we update it daily. When we started, debian-slim was not an option for us, but it has caught up in terms of size and features, so we will definitely take another look.


I love minideb. It is a great compromise when you need glibc but would have otherwise used alpine.


Curious: what are the scenarios in which you needed glibc?


Node.js packages with native code and .NET Core are a couple I've come across. Basically any C-based code prebuilt for generic Linux.


But couldn't you rebuild them yourself?


Go code with C dependencies.


Starred, thanks.

A little OT, but how does Bitnami make money? They don't seem to charge for AMIs, at least. So I guess they charge AWS/GCE for providing 1-click images? Or are they a consulting company (and if so, why choose them over the original app authors)? Or both?


Bitnami co-founder here. Most of what we produce is open source, and we aim to make money in ways that are useful to companies without limiting or handicapping our offering or alienating people. For example, we offer optional backup and monitoring services through Bitnami Cloud Hosting (https://bitnami.com/cloud/hosting), and we also have commercial services for ISVs that want to package their commercial apps through our platform. We also provide support for infrastructure providers (i.e. cloud vendors) that want the applications integrated with their platforms in specific ways.


Is there any big difference from the Debian image provided by Google?

   launcher.gcr.io/google/debian8
(besides the Debian version)


If I run dozens of containers based on the same Debian image, would Minideb or even Alpine bring a big change, considering that Docker caches the layers?


The point of smaller images, to me, isn't disk savings so much as minimizing dependencies and attack surface, given the glibc, bash, and OpenSSL vulnerabilities of the past several years. Updating container images quickly is absolutely essential given the myriad of possible problems if they were to become stale.

I suppose it wouldn't hurt to have smaller image layers when updating these containers more frequently to save on bandwidth at least.


Reducing attack surface by only minimizing dependencies is a bit like putting your house on stilts.


You're arguing a straw man by putting the word only in there.


The author said 3 different things in their comment. I was answering this:

"The point [..] isn't about disk savings as much as minimizing dependencies and surface area of attacks such as for glibc, bash, and OpenSSL in the past several years."

Technically they didn't say specifically how they would minimize surface area of attacks, so my assumption that they meant only by minimizing dependencies (seeing as their comment was followed by a list of dependencies) may have been faulty. Thanks for letting me know that in such a kind way.


Can you elaborate on why you think that is the case? This is a well-established security practice. I don't see much upside to having code or binaries around that are not needed but can potentially be exploited. One of the first things I did when I used to manage servers was shut down and remove any services that weren't needed, disable all Apache modules not in use, etc.


A house on stilts makes it difficult to rob, but not for the man who walks on stilts. Security practices need to be implemented holistically or they are easily defeated. By themselves they aren't worth much and end up being unnecessarily cumbersome.

Removing outlying code that could be used as part of an attack can be useful against complex attacks. But those are essentially outliers; the code that you are actually running, the actual target, is still there, waiting to be pwned. The time you spend trimming fat can often be better spent actually hardening a system's access controls or policies/procedures, performing auditing, etc.


Yes, they can certainly help in cases where you are offering your images to others.

Assume the initial size is zero: if you install 100MB of packages in layer 1 and then uninstall all of those packages in a second layer, your image size will still be 100MB (sketch below).

If you are optimizing for high density, a smaller size certainly helps.
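
For example (a sketch; build-essential stands in for any large package set):

    FROM debian:stretch-slim
    # layer 1: all the installed packages are committed here
    RUN apt-get update && apt-get install -y build-essential
    # layer 2: the purge is recorded, but layer 1 keeps its full size
    RUN apt-get purge -y build-essential && apt-get autoremove -y

Doing the install and the cleanup in a single RUN (so a single layer) is what keeps the image small.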


Others are very likely to already have the Debian base image cached - by using pretty much any official library image.


This is great. I created my own base images (for Python and JS, mostly) using Debian as the base; this shall be the new base.


Slightly OT: Which distribution (for being a Docker host) has the best unattended security updates incl. reboots?

Requirements:

- Quick to set up; ideally a one-liner and not something I have to google every time

- Update and reboot times can be slightly randomized, so the entire cluster won't go down at once


You looking for CoreOS?


Debian.
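
unattended-upgrades handles the reboots too; a sketch (the values are examples):

    $ sudo apt-get install -y unattended-upgrades

    # /etc/apt/apt.conf.d/50unattended-upgrades
    Unattended-Upgrade::Automatic-Reboot "true";
    Unattended-Upgrade::Automatic-Reboot-Time "03:30";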


I use debian images for my infrastructure on VPS providers. Can anyone tell me what makes this "container specific"?


One of the ways it reduces the size is by pulling out things that are very unlikely to be required in a container but are important for running on real hardware: things like udev, systemd, filesystem tools, etc.


size.


Off topic, but what software is everyone using for their registry with ACLs and pruning old images?


GitLab Container Registry. It integrates beautifully with the rest of GitLab, including the permissions and CI system. I believe it also has special support for Kubernetes though I've not tried using that.



