In addition to the "minimalist" aspect, this image seems to offer better practices on a security level than official Debian images. From their README: "The images are built daily and have the security release enabled, so will contain any security updates released more than 24 hours ago."
> In addition to the "minimalist" aspect, this image seems to offer better practices on a security level than official Debian images
I'm skeptical about this claim. Almost every image built from the Debian official image begins with `apt-get update` before you can actually install anything, which means you will almost always have the latest packages at the time of building.
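As a sketch of that usual pattern (the image and package names here are only illustrative, not from the thread): the `apt-get update` at build time pulls the current package lists, so whatever you install is fresh as of the build.

```dockerfile
# Typical Dockerfile built on an official Debian image. Running apt-get
# update at build time means the installed packages reflect the archive
# state at the moment of building. (Package choice is illustrative.)
FROM debian:stable-slim
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates \
 && rm -rf /var/lib/apt/lists/*
```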
While not as small, it is trivial to make an up-to-date Debian base image (or Fedora/Arch) any time you want. If you care about security you probably don't want to use random unverified images anyway.
$ sudo debootstrap stretch mydebian http://mirrors.kernel.org/debian/
$ cd mydebian
$ sudo tar -c . | docker import - mydebian
Plus you can add files to the system before tarring, etc.
If you have significant work to do on an image a Dockerfile can often be far more complex than this method.
No offence intended, and these terms are not tightly defined, but I would call your image a 'baked' image.
My feeling is that I don't know how long a base-image will stick around. If ca-certificates is installed in my base image it may end up trusting revoked certificates.
IMHO it is better to know you need to install/bake in ca-certs from a trusted source than to have a built-in, potentially compromised CA cert installed.
Baked images, which I use to reduce instantiation time, or 'golden' images for immutable infrastructure, tend to have shorter lifespans; the CA package is carried in the application dependencies and is more likely to be up to date.
It is not intended that users will download this baseimage (although it is a supported configuration; you can use FROM phusion/baseimage), but that this will be an image definition that users can easily rebuild and build off of.
Step one in Docker competency is "do you know exactly where your image comes from, and can you rebuild it from scratch without trusting that some rando on the internet didn't put bad stuff in there?"
Step two is "ok, do you really actually build them, though"
This image has traditionally been based on LTS ubuntu, and if you look at the CentOS derived version that hasn't been updated since 2014 (pokle/centos-baseimage), they chose not to include ca-certificates or hardly anything else.
(I'm assuming that tianon/centos:6.5 does not install ca-certificates by default...)
I'm sure many people use FROM phusion/baseimage but personally, even as a maintainer, I don't. I'd change the image source to whatever upstream of Ubuntu I'm preferring today, and probably build that from scratch too. The value in this image is not that it comes pre-built, it's that the build is tested and supported. /side tangent
You can go ahead and explain what you think base image means then, because it's not obvious to me how this is different than that (and I'm a domain expert.)
A base image is an image that you're meant to build off of, it is not meant to be deployed as an application but as a base for your image. What part of what I said was disingenuous? I gave a link to some source code that I didn't write and provided a counter example, identifying myself as a contributor. What did you contribute?
Great misunderstanding of what I said, this is two in a row when considering the above, bravo.
What is wrong with your first comment is:
1. zenlikethat's comment obviously talks about base images in general, not about a specific base image ("...most base images...")
2. Your reply which disagrees with that is based on a very basic fallacy (there is one base image which contains SSL, so zenlikethat's claim is false!?)
3. Your example is a base image which you're a maintainer of
Considering (1) is obvious and (2) is a very basic fallacy nobody here should fall for, your comment seems like an intentional and unwarranted plug for "phusion/baseimage" instead of a valid and honest disagreement to zenlikethat's comment. (extra points for unwarranted plug of your domain expertise in immediate parent comment)
I'm supposed to provide examples that I am unfamiliar with?
Forgive me for misunderstanding your comment, but `"base image" != "baseimage"` did not have a great deal of substance to it, nor did it provide me with any insight about the topic.
I provided an example, to facilitate the discussion. So back to my last question, what did you contribute?
I couldn't find anything in the post that correlates the Debian updates with security notices (which is your main point). If a security advisory comes out every 2-5 weeks and Debian updates on the same schedule, then I don't see a problem. But the data is just not there.
(These have to be advisories actually affecting the image, not all of them)
I don't really work in this domain so maybe I'm missing something. If the goal is to essentially get the bare minimum needed to run a program into a Docker image why not develop your program in your desired environment and then use something like CDE [1] to copy (or obtain a list of) all the files touched in the desired invocation of the program. That copy or list can then be put into a tarball and imported with "docker import". Philip Guo even writes about this possible use [2].
Here's a silly example:
cde python -c "import numpy as np; print(np.random.randn(3, 3).tolist())"
pushd cde-package/cde-root/; tar cavf ../../cde-image.tar *; popd
docker import cde-image.tar $USER:python-randn33
docker run $USER:python-randn33 python -c "import numpy as np; print(np.random.randn(3, 3).tolist())"
docker run -t -i $USER:python-randn33 python
If you look at the resulting "cde-image.tar" you'll find it to be quite bare. Mine had only 387 entries (files and folders).
Probably because syscall interception is not sufficient to create robust Linux program images. It will be an awkward moment if a stat, open, etc. that the program attempts in production doesn't work as expected because that path was never exercised during development / image bundling. You'd have to execute every possible code path in the CDE bundling step for it to work properly.
So it becomes a matter of whether or not you can achieve good coverage of your execution paths to account for all possible filesystem touches? Further invocations of "cde" with respect to the same "cde-package" folder will actually append to the "cde-root" file system copy so if you could manage to canvas your program's execution paths then the resulting file tree copy should be sufficient?
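A tiny hypothetical illustration of the coverage problem (the config paths and function are invented, not from CDE itself): a trace-based bundler only records files the traced run actually opens, so the branch not taken leaves its file out of the bundle.

```python
# Hypothetical illustration: with syscall tracing, only files touched by
# the traced invocation get captured. A branch not exercised during the
# cde run leaves its file out of the resulting bundle.
def load_config(profile):
    # Paths are invented for illustration.
    if profile == "prod":
        return "/etc/myapp/prod.conf"
    else:
        return "/etc/myapp/dev.conf"

# Tracing only the "dev" invocation never touches the prod config,
# so a bundle built from that trace would lack /etc/myapp/prod.conf.
print(load_config("dev"))
```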
[tl;dr it instruments execution while using a genetic algorithm to mutate inputs, optimising for code coverage]
Statically determining dependencies is a lot easier and a lot more reliable! Particularly as you only need the base image once, and any extras on top are another layer on the Docker FS.
I'm also a fan of minimal images. CDE is an interesting solution, but for dynamic languages like Python, packing everything into a virtualenv and shipping that is a reasonable solution. To automatically grab linked libraries you can use something like smith[1]
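As a rough sketch of grabbing linked libraries statically (the binary and output directory here are just example choices, and this is far less thorough than a dedicated tool like smith): `ldd` lists the resolved shared-library paths, which can be copied into a minimal tree for `docker import`.

```shell
# Sketch: collect the shared libraries a binary links against and copy
# them into a minimal root filesystem. Binary and directory names are
# illustrative.
BIN=/bin/ls
ROOT=minimal-root
mkdir -p "$ROOT/bin"
cp "$BIN" "$ROOT/bin/"
# ldd prints each resolved library path; copy them preserving directories
ldd "$BIN" | grep -o '/[^ ]*' | while read -r lib; do
    mkdir -p "$ROOT$(dirname "$lib")"
    cp "$lib" "$ROOT$lib"
done
# The tree can then be tarred and fed to "docker import".
```

This only catches dynamically linked libraries, not dlopen'd plugins or data files, which is where the dedicated tools earn their keep.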
for anyone else interested in the actual stats, I lifted this from their PR on the image:
The minideb image currently weighs in at around 50MB uncompressed. For comparison, the debian library image is 123MB, the alpine image is 5MB, and the newly released amazonlinux image is 328MB.
I can list some pros (comparing to debian:stable-slim) for Minideb:
It is easy to build, has good documentation, has no blobs committed in the repo, is automatically built and tested on a daily basis.
Worth also mentioning that it is the base image for ALL the Bitnami containers, which are likewise automatically built, updated and tested.
You can take a look at all of them on GitHub. There are a ton of them.
I guess it's a bit more specific than you meant it, but our standard Python image is ~20MB (alpine + python3, basically); that's still under half of minideb.
Does look interesting for things that need glibc compatibility though. There are some packages to help with that in Alpine but they only go so far.
From looking at what I think is the Dockerfile for that image (https://github.com/docker-library/python/blob/b1512ead24c6b1...), that image is complex; it's downloading & building Python in it and adding & removing a dev toolchain, in a few different layers.
I'm not surprised that I got something a lot smaller from just running `apk add python3.6`, although as a result they are not comparing apples to apples; their minideb example does pretty much exactly the equivalent (i.e. downloading the distro-provided package, not compiling it within their image).
Yep. I like alpine and use it for my images for preference, but some things like getting Ruby on Rails working with therubyracer can basically hit a wall in alpine, so this could be pretty handy.
On the other hand, if you don't need Python because the container runs a service like Redis or Nginx, you can keep the entire Alpine-based image at around 25-50% of the size of a bare minideb image.
Your numbers are wrong. Here's the actual values for each of those images uncompressed:
debian 101M
alpine 4.1M
amazonlinux 160M
These were determined with the following:
for image in debian:latest alpine:latest amazonlinux:latest; do
  docker pull $image
  size=$(docker save $image | wc -c)
  printf '\n%s is %s\n' "$image" "$(numfmt --to=iec "$size")"
done
Please note that it's important to only do this test with either a completely clean image store OR to save an exact sha256 image digest tag.
If you try to do a `docker save` on a store where you have multiple copies of an image, it's easy to mess up and save old layers too. I suspect that's what happened with your amazonlinux test above.
This seems smarter than some of the container OS approaches that start from distros that have weak or no package management, and rely entirely on the "container" model to provide updates and some combination of spit and duct tape to build them. There's a smallish Fedora for containers (exists in the Docker registry) as well; it's about 70MB, which is still a little beefy.
Anybody know how big minideb is?
Edit: zwerdids posted that it's ~50MB, so a wee bit smaller than the commonly used Fedora container image, and an order of magnitude bigger than an Alpine image.
Neat. Some overlap with https://github.com/GoogleCloudPlatform/distroless ; I think one big difference is that Google's approach uses bazel to download and unpack .debs, and this one uses standard Debian tools (debootstrap). But the end result sounds similar.
There's a bunch of them, especially around networking use cases. Alpine itself was a fork of another embedded Linux distro. One appeal of Minideb, and a big reason why we developed it, is that at the end of the day it is still Debian and you have access to all the DEB packages out there, which tend to cover more ground and be more actively maintained than more niche Linux distros.
wiremine's question upthread as to how this compares to debian:stable-slim seems like it would benefit from an answer from you or another maintainer (assuming I correctly read the "we" in your comment)
As some other comment pointed out, we update it daily. When we started, debian-slim was not an option for us, but it has caught up in terms of size and features, so we will definitely take another look.
A little OT, but how does Bitnami make money? They don't seem to charge for AMIs, at least. So I guess they charge AWS/GCE for providing 1-click images? Or are they a consulting company (if so, why choose them over the original app authors)? Or both?
Bitnami co-founder here. Most of what we produce is open source and we aim to make money in ways that are useful to companies but not limit or handicap our offering and alienate people. For example, we offer optional backup and monitoring services through Bitnami Cloud Hosting (https://bitnami.com/cloud/hosting) and we also have commercial services for ISVs that want to package their commercial apps through our platform. We also provide support for infrastructure providers (i.e. cloud vendors) that want the applications integrated with their platforms in specific ways
If I run dozens of containers based on the same Debian image, would Minideb or even Alpine bring a big change, considering that Docker caches the layers?
The point of smaller images to me isn't about disk savings as much as minimizing dependencies and surface area of attacks such as for glibc, bash, and OpenSSL in the past several years. Updating container images quickly is absolutely essential given the myriad of possible problems if they were to become stale.
I suppose it wouldn't hurt to have smaller image layers when updating these containers more frequently to save on bandwidth at least.
The author said 3 different things in their comment. I was answering this:
"The point [..] isn't about disk savings as much as minimizing dependencies and surface area of attacks such as for glibc, bash, and OpenSSL in the past several years."
Technically they didn't say specifically how they would minimize surface area of attacks, so my assumption that they meant only by minimizing dependencies (seeing as their comment was followed by a list of dependencies) may have been faulty. Thanks for letting me know that in such a kind way.
Can you elaborate why you think that is the case? This is a well-established security practice. I don't see much upside to having code or binaries around that are not needed but can be potentially exploited. One of the first things I did when I used to manage servers was shutdown and remove any services not needed, disable all Apache modules not in use, etc.
A house on stilts makes it difficult to rob, but not for the man who walks on stilts. Security practices need to be implemented holistically or they are easily defeated. By themselves they aren't worth much and end up being unnecessarily cumbersome.
Removing outlying code that could be used as part of an attack can be useful for complex attacks. But they are essentially outliers - the actual code that you are actually running and is the actual target is still there, waiting to be pwnd. The time you spend trimming fat can often be better used to actually harden a system's access control or policies/procedures, perform auditing, etc.
One of the ways it reduces the size is by pulling out things that are very unlikely to be required in a container, but are important for running on real hardware. Things like udev, systemd, file system tools etc.
GitLab Container Registry. It integrates beautifully with the rest of GitLab, including the permissions and CI system. I believe it also has special support for Kubernetes though I've not tried using that.
A recent analysis showed that the debian:latest image is "updated roughly every month" https://anchore.com/blog/look-often-docker-images-updated/