
TLDR: Docker uses aufs to provide copy-on-write snapshots, integral to docker container-image builds. aufs is not that widely used, and reportedly has a depth limit of 42. This script flattens an entire build process to a single snapshot to avoid said issue.
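The flattening itself boils down to round-tripping the built image's filesystem through docker's export/import. A rough sketch of the idea rather than the script itself (the image/container names are made up):

    # Run the fully-built image once so there is a container to export,
    # then squash all of its aufs layers into a single new image.
    CONTAINER=$(docker run -d my/built-image /bin/true)
    docker export "$CONTAINER" | docker import - my/flattened-image
    docker rm "$CONTAINER"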

Context: the docker people have already announced an intention to move away from the aufs dependency.

Alternatives/Reality-check: LVM2 can provide snapshots at the block layer: either through the normal approach, which allows only a single level of snapshot depth (though you can collapse a snapshot back into its origin through a process known as a merge, and then snapshot again as required), or through the new/experimental thin provisioning driver, which allows arbitrary depth (but a 16GB max volume size). In both cases it's filesystem-neutral, and the first approach is very widely deployed, which means no roll-thy-own-kernel requirement. zfs and btrfs also provide snapshots, but historically zfs has been poorly supported and slow on Linux (userspace driver, or build your own kernel) while btrfs has been unfinished and still in development. Linux also supports the snapshot-capable filesystems fossil (from plan9), gpfs (from IBM), and nilfs (from NTT). A related set of options are cluster filesystems with built-in replication, see https://en.wikipedia.org/wiki/Clustered_file_system#Distribu... Overall, the architectural perspective on the various storage design options can be hard to grasp without digging, and higher-layer solutions such as NoSQL distributed datastores remain strong options in many cases.
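To make the "snapshot, merge, snapshot again" cycle concrete, a rough sketch with classic (non-thin) LVM2 (the volume group and LV names are made up):

    # CoW space for a classic snapshot must be preallocated.
    lvcreate --snapshot --name snap0 --size 1G /dev/vg0/data
    # ... do risky work on /dev/vg0/data ...
    # Merge the snapshot back into the origin (rolling the origin back to
    # the snapshot point); the snapshot LV goes away once the merge completes.
    lvconvert --merge /dev/vg0/snap0
    # A fresh snapshot can then be taken as required.
    lvcreate --snapshot --name snap1 --size 1G /dev/vg0/data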

Trend/future?: Containers in general are moving towards formalizing environment requirements for software: "here's what I need: x-depth snapshots with y-availability and z-redundancy". In the nearish future I predict we'll see this for all types of resources (network access at layers 2 and 3, CPU, memory, disk IO, disk space, etc.) for complex, multi-component software systems, as CI/CD processes mature and container-friendly software packaging becomes normalized (we're already much of the way there for single hosts - eg. with Linux cgroups). Infrastructure will become 'smarter', and the historical disconnect between network gear and computing hosts will begin to break down. Systems and network administration will tend to merge, and the skillsets will become rarer as a result of automation.
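(For the single-host part, a rough sketch of the cgroup knobs that docker and friends drive under the hood; paths assume a typical /sys/fs/cgroup mount:)

    # Create a memory-limited group and move the current shell into it.
    mkdir /sys/fs/cgroup/memory/demo
    echo $((512 * 1024 * 1024)) > /sys/fs/cgroup/memory/demo/memory.limit_in_bytes
    echo $$ > /sys/fs/cgroup/memory/demo/tasks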




LVM snapshots have some issues:

* you have to preallocate the size of the snapshot backing storage

* if you create N snapshots of the same base block device, then for each block changed in the base, each of the N snapshots gets its own copy-on-write block added to its backing storage

* you cannot resize a snapshot (I mean the logical volume size, not the storage area for CoW data)

* you cannot shrink the snapshot backing storage
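The first point in particular bites people: the CoW area is fixed when the snapshot is created, and if it fills up the snapshot is invalidated. Rough sketch (names made up):

    # The CoW backing storage has to be sized up front...
    lvcreate --snapshot --name backup-snap --size 2G /dev/vg0/base
    # ...and watched: once Data% hits 100 the snapshot is invalidated.
    lvs -o lv_name,origin,data_percent vg0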

Snapshot-aware filesystems solve these issues. The slowness of ZFS you mention is only true for the FUSE-based toy driver. The license incompatibility between ZFS and the Linux kernel is a source of much confusion. All it means is that you cannot distribute Linux kernel binaries linked with ZFS code (where a kernel module can be seen as parts of the Linux kernel API linked with ZFS code). However, nothing prevents you from compiling the module on your own machine, and there is a nicely packaged solution for doing this, with support for the major distributions:

http://zfsonlinux.org/

there is also a new place for promoting zfs: http://open-zfs.org
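For contrast with the LVM limitations above, roughly the same operations on ZFS (the pool/dataset names are made up): there is no preallocated CoW area, and snapshots/clones can be stacked to arbitrary depth.

    # Snapshots are instant and consume space only as the origin diverges.
    zfs snapshot tank/containers/base@v1
    # A writable clone of that snapshot - which is what a layered image wants.
    zfs clone tank/containers/base@v1 tank/containers/build1
    # Per-snapshot/clone space accounting:
    zfs list -t all -r tank/containers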

AuFS seems to me a rather pragmatic approach for those who don't need the advanced features and performance of a full snapshotting filesystem, yet don't want to waste IO bandwidth just to provision a lightweight container.


All good points. I guess in response the only two things I would add are:

(1) If snapshots are for backup (the most frequent use case? I guess so!) then LVM2 can already do it for you without an exotic FS. Sure, you may have to preallocate, but it's generic (not tied to a filesystem), so if you're an infrastructure provider it future-proofs your backup implementation. Sometimes that's worth a lot more in engineering and testing cycles.

(2) You probably can shrink the snapshot backing storage if you remove the snapshots, for example after a snapshot is complete and the data has been copied elsewhere to long-term storage (cheaper/slower/remoter/more geographically dispersed disks?). You can make a new one next time you need it. That said, people who are that short on disk space are few and far between these days... it's cheap.
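For (2), the cycle I have in mind is roughly this (a sketch; the device names, mount point and destination are made up):

    # Take a short-lived snapshot, copy it off-box, then drop it so the
    # CoW backing storage goes back to the volume group.
    lvcreate --snapshot --name nightly --size 5G /dev/vg0/data
    mount -o ro /dev/vg0/nightly /mnt/nightly
    rsync -a /mnt/nightly/ backup-host:/backups/data/
    umount /mnt/nightly
    lvremove -f /dev/vg0/nightly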


The issue here is not only the depth limit, but also:

* performance overhead of each layer, however small

* disk space for files removed in intermediate steps (scenario: ADD huge-ass source tarball, commit, RUN compile+install+remove, commit - the user still has to download the huge-ass source tarball to use the final image, which doesn't even contain it; see the sketch after this list)

* there's often just no need to publish intermediate layers; there may even be a good reason to not publish them (say, I distribute a program compiled with a proprietary compiler as a step of the build, but can't distribute the compiler itself)

* simplicity of having just one image for the user to download and for the publisher to distribute, rather than a whole chain (this will become more important when we are able to use anything other than the registry to distribute images)
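To make the second point concrete, a rough sketch (hypothetical names; the Dockerfile is written via a shell heredoc). The ADD layer keeps the full tarball contents no matter what later layers delete:

    cat > Dockerfile <<'EOF'
    FROM ubuntu
    # This layer bakes the unpacked source into the image for good...
    ADD huge-source.tar.gz /build/
    # ...and deleting it in a later layer only masks it; every user
    # still downloads the ADD layer in full.
    RUN cd /build && make && make install && rm -rf /build
    EOF
    docker build -t my/app .
    docker history my/app   # the ADD layer still carries the source's full size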


All valid points.

I guess a lot depends on other aspects of your project. For example, if you are distributing frequently and rsync is an option, then bandwidth concerns are effectively nullified. Likewise, the disk space taken by diffs for a few installs on top of a base filesystem is not big, and thus not really expensive to keep. But I agree with you.
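(e.g., a sketch with made-up hosts and paths: re-pushing a flattened image tarball with rsync mostly transfers just the changed parts, assuming the tar stays uncompressed:)

    # image.tar produced by docker export, as above
    rsync -av --inplace image.tar deploy-host:/srv/images/
    # on the target: docker import - my/flattened-image < /srv/images/image.tar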

One aspect is crypto: signing a single tarball is easier than signing a bunch of files.
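e.g. (a sketch): one detached signature covers the whole flattened image:

    gpg --armor --detach-sign image.tar    # produces image.tar.asc
    gpg --verify image.tar.asc image.tar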



