
TLDR: Docker uses aufs to provide copy-on-write snapshots, integral to docker container-image builds. aufs is not that widely used, and reportedly has a depth limit of 42. This script flattens an entire build process to a single snapshot to avoid said issue.
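The flattening itself boils down to round-tripping the built image's filesystem through docker's export/import. A rough sketch of the idea rather than the script itself (the image/container names are made up):

    # Run the fully-built image once so there is a container to export,
    # then squash all of its aufs layers into a single new image.
    CONTAINER=$(docker run -d my/built-image /bin/true)
    docker export "$CONTAINER" | docker import - my/flattened-image
    docker rm "$CONTAINER"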

Context: the docker people have already announced an intention to move away from the aufs dependency.

Alternatives/Reality-check: LVM2 can provide snapshots at the block layer: either through the normal approach, which allows only a single level of snapshot depth (though you can collapse a snapshot back into its origin through a process known as a merge, and then snapshot again as required), or through the new/experimental thin provisioning driver, which allows arbitrary depth (but a 16GB max volume size). In both cases it's filesystem-neutral, and the first approach is very widely deployed, which means no roll-thy-own-kernel requirement. zfs and btrfs also provide snapshots, but historically zfs has been poorly supported and slow on Linux (userspace driver, or build your own kernel) while btrfs has been unfinished and still in development. Linux also supports the snapshot-capable filesystems fossil (from plan9), gpfs (from IBM), and nilfs (from NTT). A related set of options are cluster filesystems with built-in replication, see https://en.wikipedia.org/wiki/Clustered_file_system#Distribu... Overall, the architectural perspective on the various storage design options can be hard to grasp without digging, and higher-layer solutions such as NoSQL distributed datastores remain strong options in many cases.
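To make the "snapshot, merge, snapshot again" cycle concrete, a rough sketch with classic (non-thin) LVM2 (the volume group and LV names are made up):

    # CoW space for a classic snapshot must be preallocated.
    lvcreate --snapshot --name snap0 --size 1G /dev/vg0/data
    # ... do risky work on /dev/vg0/data ...
    # Merge the snapshot back into the origin (rolling the origin back to
    # the snapshot point); the snapshot LV goes away once the merge completes.
    lvconvert --merge /dev/vg0/snap0
    # A fresh snapshot can then be taken as required.
    lvcreate --snapshot --name snap1 --size 1G /dev/vg0/data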

Trend/future?: Containers in general are moving towards formalizing environment requirements for software: "here's what I need: x-depth snapshots with y-availability and z-redundancy". In the nearish future I predict we'll see this for all types of resources (network access at layers 2 and 3, CPU, memory, disk IO, disk space, etc.) for complex, multi-component software systems, as CI/CD processes mature and container-friendly software packaging becomes normalized (we're already much of the way there for single hosts - eg. with Linux cgroups). Infrastructure will become 'smarter', and the historical disconnect between network gear and computing hosts will begin to break down. Systems and network administration will tend to merge, and the skillsets will become rarer as a result of automation.
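(For the single-host part, a rough sketch of the cgroup knobs that docker and friends drive under the hood; paths assume a typical /sys/fs/cgroup mount:)

    # Create a memory-limited group and move the current shell into it.
    mkdir /sys/fs/cgroup/memory/demo
    echo $((512 * 1024 * 1024)) > /sys/fs/cgroup/memory/demo/memory.limit_in_bytes
    echo $$ > /sys/fs/cgroup/memory/demo/tasks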




LVM snapshots have some issues:

* you have to preallocate the size of the snapshot backing storage

* if you create N snapshots of the same base block device, then for each block changed in the base, each of the N snapshots gets its own copy-on-write block added to its backing storage

* you cannot resize a snapshot (I mean the logical volume size, not the storage area for CoW data)

* you cannot shrink the snapshot backing storage
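The first point in particular bites people: the CoW area is fixed when the snapshot is created, and if it fills up the snapshot is invalidated. Rough sketch (names made up):

    # The CoW backing storage has to be sized up front...
    lvcreate --snapshot --name backup-snap --size 2G /dev/vg0/base
    # ...and watched: once Data% hits 100 the snapshot is invalidated.
    lvs -o lv_name,origin,data_percent vg0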

Snapshot-aware filesystems solve these issues. The slowness of ZFS you mention is only true for the FUSE-based toy driver. The license incompatibility between ZFS and the Linux kernel is a source of much confusion. All it means is that you cannot distribute Linux kernel binaries linked with ZFS code (where a kernel module can be seen as parts of the Linux kernel API linked with ZFS code). However, nothing prevents you from compiling the module on your own machine, and there is a nicely packaged solution for doing this, with support for the major distributions:

http://zfsonlinux.org/

there is also a new place for promoting zfs: http://open-zfs.org
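For contrast with the LVM limitations above, roughly the same operations on ZFS (the pool/dataset names are made up): there is no preallocated CoW area, and snapshots/clones can be stacked to arbitrary depth.

    # Snapshots are instant and consume space only as the origin diverges.
    zfs snapshot tank/containers/base@v1
    # A writable clone of that snapshot - which is what a layered image wants.
    zfs clone tank/containers/base@v1 tank/containers/build1
    # Per-snapshot/clone space accounting:
    zfs list -t all -r tank/containers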

AuFS seems to me a rather pragmatic approach for those who don't need the advanced features and performance of a full snapshotting filesystem, yet don't want to waste IO bandwidth just to provision a lightweight container.


All good points. I guess in response the only two things I would add are:

(1) If snapshots are for backup (the most frequent use case? I guess so!) then LVM2 can already do it for you without an exotic FS. Sure, you may have to preallocate, but it's generic (not tied to a filesystem), so if you're an infrastructure provider it future-proofs your backup implementation. Sometimes that's worth a lot more in engineering and testing cycles.

(2) You probably can shrink the snapshot backing storage if you remove the snapshots, for example after a snapshot is complete and the data has been copied elsewhere to long-term storage (cheaper/slower/remoter/more geographically dispersed disks?). You can make a new one next time you need it. That said, people who are that short on disk space are few and far between these days... it's cheap.
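For (2), the cycle I have in mind is roughly this (a sketch; the device names, mount point and destination are made up):

    # Take a short-lived snapshot, copy it off-box, then drop it so the
    # CoW backing storage goes back to the volume group.
    lvcreate --snapshot --name nightly --size 5G /dev/vg0/data
    mount -o ro /dev/vg0/nightly /mnt/nightly
    rsync -a /mnt/nightly/ backup-host:/backups/data/
    umount /mnt/nightly
    lvremove -f /dev/vg0/nightly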


The issue here is not only the depth limit, but also:

* performance overhead of each layer, however small

* disk space for files removed in intermediate steps (scenario: ADD huge-ass source tarball, commit, RUN compile+install+remove, commit - the user still has to download the huge-ass source tarball to use the final image, which doesn't even contain it; see the sketch after this list)

* there's often just no need to publish intermediate layers; there may even be a good reason to not publish them (say, I distribute a program compiled with a proprietary compiler as a step of the build, but can't distribute the compiler itself)

* simplicity of having just one image for the user to download and for the publisher to distribute, rather than a whole chain (this will become more important when we are able to use anything other than the registry to distribute images)
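To make the second point concrete, a rough sketch (hypothetical names; the Dockerfile is written via a shell heredoc). The ADD layer keeps the full tarball contents no matter what later layers delete:

    cat > Dockerfile <<'EOF'
    FROM ubuntu
    # This layer bakes the unpacked source into the image for good...
    ADD huge-source.tar.gz /build/
    # ...and deleting it in a later layer only masks it; every user
    # still downloads the ADD layer in full.
    RUN cd /build && make && make install && rm -rf /build
    EOF
    docker build -t my/app .
    docker history my/app   # the ADD layer still carries the source's full size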


All valid points.

I guess a lot depends on other aspects of your project. For example, if you are distributing frequently and rsync is an option, then bandwidth concerns are effectively nullified. Likewise, the disk space taken by diffs for a few installs on top of a base filesystem is not big, and thus not really expensive to keep. But I agree with you.
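(e.g., a sketch with made-up hosts and paths: re-pushing a flattened image tarball with rsync mostly transfers just the changed parts, assuming the tar stays uncompressed:)

    # image.tar produced by docker export, as above
    rsync -av --inplace image.tar deploy-host:/srv/images/
    # on the target: docker import - my/flattened-image < /srv/images/image.tar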

One aspect is crypto: signing a single tarball is easier than signing a bunch of files.
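e.g. (a sketch): one detached signature covers the whole flattened image:

    gpg --armor --detach-sign image.tar    # produces image.tar.asc
    gpg --verify image.tar.asc image.tar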



