What's the situation with btrfs? It was supposed to make ZFS not such a big deal on Linux anymore.



There are a lot of missing parts. Everything beyond using it for single disks or with RAID1 is still experimental. It's slow for a lot of use cases like VMs or OLTP (http://blog.pgaddict.com/posts/friends-dont-let-friends-use-...), and using nodatacow does not count!
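For reference, the nodatacow workaround usually looks something like this (the directory is just an example, and the flag only affects files created after it is set):

    # Disable copy-on-write for new files in a directory, e.g. for VM images.
    mkdir -p /var/lib/libvirt/images
    chattr +C /var/lib/libvirt/images
    lsattr -d /var/lib/libvirt/images   # the 'C' attribute should now be listed
    # Or mount the whole filesystem with -o nodatacow instead.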

RAID1 is implemented to use the parity of the reading process's PID (odd or even) to decide which disk to read from... RAID5/6 have huge performance problems in certain use cases. The write hole is apparently still a thing.

Quotas still seem to be broken or difficult to handle, and the tooling is not exactly nice or intuitive (this may change in the future).

Hardware failure handling is nonexistent - it can't detect a broken drive.

Lots of other gotchas. Just read the mailing list. I've been fighting against btrfs for a while and I'm happy when I don't have to deal with it. It probably works fine for desktop usage, but don't bother stressing it - at the moment it will break horribly.


Lots of gotchas, yes, but data loss on Btrfs is about as rare as what you see on, say, the linux-raid@ list for mdadm or LVM, and those are much more mature. More often it's a case of some suboptimal behavior and a recommendation to get the data off the volume and onto a newly created one. Very inconvenient for some users, to be sure, but also not data loss. About the worst I've seen lately that wasn't hardware related needed data scraped out using btrfs restore (the tool that extracts data from an unmountable filesystem).


This hasn't been my experience. When an md array has lost a disk, I have added a new one and the array has transparently resynced in the background. No interruption in service or loss of data. When a Btrfs array has lost a disk, Btrfs oopsed the kernel and stopped working. When I rebooted with the hardware problem resolved, it toasted both the outdated mirror and the up-to-date disk, causing irretrievable data loss. Any attempt to boot the system led to Btrfs causing an immediate kernel panic. I had to put the disks into a system without Btrfs kernel support and use dd to zero out the Btrfs partitions before I could use them again.
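For comparison, the md recovery flow I'm describing is roughly this (device names are examples):

    # Replace a failed member of an md RAID1 array; the resync runs in the background.
    mdadm --manage /dev/md0 --fail /dev/sdb1
    mdadm --manage /dev/md0 --remove /dev/sdb1
    mdadm --manage /dev/md0 --add /dev/sdc1
    cat /proc/mdstat    # watch the rebuild progress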

That was admittedly a few years back, but it's still a long way from production use in many respects, while ZFS has been in production use for years and is well documented, has good tools, and is very robust. After several years of Btrfs problems, I gave up on it and moved to ZFS, initially on Linux and then on FreeBSD due to it being better integrated.


Just incidentally, 'wipefs -a' should clear out the partitions in a much speedier manner.
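Something like this, assuming /dev/sdb is the disk in question (destructive, obviously):

    wipefs /dev/sdb      # list the filesystem/RAID signatures it finds
    wipefs -a /dev/sdb   # erase all of them, no need to dd the whole disk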


Single, RAID 0, 1, and 10 are stable on stable hardware. Where ZFS is definitely more mature is parity RAID. If you need parity RAID and a snapshotting CoW filesystem and you're not a tester, then you'd want to pick ZFS. But Btrfs is ~6 years behind ZFS maturity-wise? Maybe in some sense longer, because ZFS had a more aggressive start?

There are definitely things Btrfs can already do that ZFS cannot, like writable snapshots, no parent-child dependency (you can create a snapshot and immediately delete the original), and easier/faster pool growth without having to build new arrays or sequentially replace devices and resilver, etc. And just in the last week there's an experimental patch for per-subvolume encryption (eventually per-file, similar to f2fs and ext4), so it's under heavy development.
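The snapshot independence looks roughly like this (paths are made up, assuming /mnt is a mounted Btrfs filesystem and 'data' is a subvolume on it):

    # Snapshots are writable by default and don't depend on their source.
    btrfs subvolume snapshot /mnt/data /mnt/data-snap
    btrfs subvolume delete /mnt/data       # the original can go away immediately
    # /mnt/data-snap remains a fully usable, writable subvolume on its own.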


An often-missed feature is the seed device. Any Btrfs volume can be flagged as a read-only seed. Mount it, add a 2nd device, and re-mount. It's a union-filesystem-like thing: changes only go to the 2nd device. If you then use 'btrfs device remove' on the 1st device, data is migrated to the 2nd device and the 1st is removed, so it replicates the filesystem. The seed setting can also be applied to multi-device volumes. So you can add this 2nd device, make that volume a read-only seed, add a 3rd device for read-write, and so on.
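The flow is roughly this, if I remember the commands right (device names are examples):

    btrfstune -S 1 /dev/sdA            # flag the (unmounted) volume as a read-only seed
    mount /dev/sdA /mnt                # mounts read-only because it's a seed
    btrfs device add /dev/sdB /mnt     # 'sprout' a writable 2nd device on top of it
    mount -o remount,rw /mnt           # changes now land only on /dev/sdB
    btrfs device remove /dev/sdA /mnt  # optional: migrate everything off the seed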


Another is that Btrfs is quite usable on embedded systems, e.g. 32-bit ARM. I'm curious whether anyone would use ZFS in such a case?


I can confirm that btrfs has major problems with tons of snapshots. The entire system will hang for minutes at random intervals and go completely unresponsive for hours when using admin commands. The many-snapshots use case is pretty common when dealing with containers, even if everything works fine for simple setups.


How many is tons? I've not had such problems with hundreds of snapshots. There are reports on the btrfs list with problems involving many thousands. I have no idea how ZFS snapshots perform when there are hundreds to thousands.

The other thing that matters a ton is kernel version. Development is so heavy - thousands of insertions and deletions per kernel cycle - that even year-old kernels are routinely considered "old" on the list. Most distros use kernels that are old by Btrfs development standards. The exceptions are Fedora, and openSUSE, which does lots of backports explicitly to support Btrfs.


Yeah, I'm talking 10k+, so this only affects specific types of applications. In basic benchmarks, ZFS seemed to perform fine well into the tens of thousands of snapshots. That's still far more snapshots than basic containers will create, so it's really only a problem for situations where you're doing tons of checkpointing.


Yeah, I've been using Docker on Btrfs and haven't had issues. I do wonder why it takes so long (maybe 4 seconds) to delete containers, when deleting the snapshot for that container is instant. I haven't found dm-based containers to be any faster.
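If anyone wants to measure where the time goes, something like this is what I'd try (the container name is an example, and the subvolume path is where the btrfs storage driver keeps layers by default; point the second command at a throwaway layer, not one Docker still tracks):

    time docker rm some_stopped_container
    time btrfs subvolume delete /var/lib/docker/btrfs/subvolumes/<throwaway-layer-id>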


Maybe the application is ignoring the SIGTERM? https://docs.docker.com/compose/reference/stop/


No I mean when I use 'docker rm' when the container isn't even running.


Depends on which mailing list you read; the usual consensus I see is that it's kinda production-ready, though openSUSE has switched to Btrfs as its main file system.


[Disclaimer: I work for SUSE]

SUSE Linux Enterprise also uses btrfs by default. There are also a bunch of features, like booting into snapshots and snapper (snapshots taken before zypper transactions and at regular intervals, with decaying granularity), which are in both openSUSE and SLE.
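Roughly what that looks like on a default install (the snapshot numbers and package name are examples):

    snapper list                 # shows the pre/post snapshot pairs zypper creates
    zypper in some-package       # wrapped in a pre/post snapshot automatically
    snapper undochange 42..43    # revert whatever changed between two snapshots
    snapper rollback             # plus a reboot, to boot into a snapshot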


It would if it were done, but it's not quite there yet.



