
One btrfs bug which is 100% reproducible:

* Start with an ext3 filesystem 70% full.

* Convert to btrfs using btrfs-convert.

* Delete the ext3_saved snapshot of the original filesystem as recommended by the convert utility.

* Enable compression (-o compress) and defrag the filesystem as recommended by the man page for how to compress all existing data.

It fails with out of disk space, leaving a filesystem which isn't repairable - deleting files will not free any space.
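For concreteness, the sequence above would look something like this. This is a dry-run sketch only: the device path is a placeholder, nothing is actually executed against a disk, and the saved-snapshot name varies by btrfs-progs version (ext2_saved in current versions):

```shell
# Dry-run sketch of the reported repro sequence; /dev/sdX1 is a placeholder.
run() { echo "would run: $*"; }

run btrfs-convert /dev/sdX1                  # in-place ext3 -> btrfs conversion
run mount -o compress /dev/sdX1 /mnt         # mount with compression enabled
run btrfs subvolume delete /mnt/ext2_saved   # drop the rollback snapshot (name depends on version)
run btrfs filesystem defragment -r -c /mnt   # defrag + compress all existing data
```

The defrag step is the one reported to fail with out of disk space.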

The fact that such a bug seems to have existed for years, hit by simply following the man pages for a common use case (migrating to btrfs to use its compression to get more free space), tells me that it isn't yet ready for primetime.




As a Linux user since kernel 0.96, I have never once considered doing an in-place migration to a new file system. That seems like a crazy thing to try to do, and I would hope it’s only done as a last resort and with all data fully backed up before trying it.

I would agree that if this is presented in the documentation as something it supports, then it should work as expected. If it doesn’t work, then a pull request to remove it from the docs might be the best course of action.


The design of the convert utility is pretty good - the convert is effectively atomic - at any point during conversion, if you kill power midway, the disk is either a valid ext4 filesystem, or a valid btrfs filesystem.


It is atomic, but at least for me it left both types of checks that btrfs can run reporting unrecoverable errors for a few blocks.

All in all not too terrible, and I just accepted the bogus data as good by resetting (?) the checksums for those blocks. But acting as if it’s just a terminal command and Bob’s your uncle... that’s just wrong.


>That seems like a crazy thing to try to do

It seems like a reasonable thing to want to do. Would you never update an installed application or the kernel to get new features or fixes? I don't really think there's much fundamental difference. If what you mean is "it seems likely to fail catastrophically", well that seems like an indication that either the converter or the target filesystem isn't in a good state.


Live migration of a file system is like replacing the foundation of your house while still living in it. It’s not the same thing as updating apps which would be more like redecorating a room. Sure it’s possible, but it’s very risky and a lot more work than simply doing a full backup and restore.


What's a full backup and restore in your analogy? Building a new house? Moving the house to a safe place, replace the foundation, bring the house back?


> Moving the house to a safe place, replace the foundation, bring the house back?

this seems sensible


> It seems like a reasonable thing to want to do

This was actually a routine thing to do, under iOS and macOS, with the transition from HFS to APFS.


Don't forget about FAT32 to NTFS long before that.


Both transitions were mandated by a supporting organization though.


There's a world of difference between a version update and completely switching software. Enabling new features on an existing ext4 file system is something I would expect to be perfectly safe. In-place converting an ext4 file system into btrfs... in an ideal world that would work, of course, but it sounds vastly more tricky even in the best case.


> Would you never update an installed application or the kernel to get new features or fixes?

Honestly, if that was an option without the certainty of getting pwned due to some RCE down the road, then yes, there are cases where I absolutely wouldn't want to update some software and just have it chugging away for years in its present functional state.


And there are no cases where you actually want a new feature?


> And there are no cases where you actually want a new feature?

Sure there are, sometimes! But at the same time, in other situations stability and predictability would take precedence over anything else, e.g. any new features that might get released probably wouldn't matter too much for a given use case.

For example, I could take a new install of MariaDB that has come out recently and use it for the same project for 5 years with no issues, because the current feature set would be enough for the majority of use cases. Of course, that's not entirely realistic, because of the aforementioned security situation.

The same applies to OS and/or kernel versions, like how you could take a particular version of RHEL or Ubuntu LTS and use it for the lifespan of some project, although in that case you do at least get security updates and such.


I don't know if you're talking only about Linux or meant your comment as a generalization, but have you heard of the in-place APFS migration from HFS+?


Similarly, Microsoft offered FAT32 to NTFS in-place migration, and it did the required checks before starting to ensure it would complete successfully. It was more than 20 years ago IIRC.


The notable thing with APFS is that the migration came automatically with an OS update (to high sierra) by default.


Yes, I think mounting ext3 filesystems as ext4 and letting them migrate to ext4 as you access them is a good way too. IIRC you had to enable a couple of flags on the ext3 filesystem with tune2fs to enable ext4 features as well, but it didn't break, and it migrated as you wrote and deleted files.
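The upgrade described above looked roughly like this (dry-run sketch; /dev/sdX1 is a placeholder and the feature-flag list is the one from the ext4 documentation of that era):

```shell
# Dry-run sketch of the classic ext3 -> ext4 in-place upgrade; nothing is executed.
run() { echo "would run: $*"; }

run tune2fs -O extents,uninit_bg,dir_index /dev/sdX1   # turn on ext4 features
run e2fsck -fD /dev/sdX1                               # full fsck required after changing the flags
run mount -t ext4 /dev/sdX1 /mnt                       # old files migrate as they are rewritten
```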


It's open source: if things don't get fixed, it's because no user cares enough to fix it. I'm certainly not wasting my time on that; nobody uses ext3 anymore!

You are as empowered to fix this as anybody else, if it presents a real problem for you.


I think the parent comment's fix is to not use btrfs and warn others about how risky they estimate it is.


> You are as empowered to fix this as anybody else, if it presents a real problem for you.

I’m getting sick and tired of this argument. It’s very user unfriendly.

Do you work in software? If yes, do you own a component? Have you tried using this argument with your colleagues that have no idea about your component?

Of course they theoretically are empowered to fix the bug, but that doesn’t make it easier. They may have no idea about filesystem internals. Or they may be just users with no programming background at all.


It's not unfriendly, it's just how it is. The fact you're comparing it to industry shows you don't get it.

Open source doesn't work for you. You can't demand that it do things you want: you have to push and justify them yourself, and hopefully you find other people along the way who want the same things and can help you.

I taught myself to program as a teenager trying to fix bugs in Linux. If I could do it, anybody can. That's what makes it empowering.


Well I think you're completely right, in a way. But I also don't see the problem with people telling other people to not use the software or to complain about it. They are totally in their right to complain about stuff. Now obviously if the complaining involves requiring people to work for some feature you want, that's entitled and wack. Same goes for throwing a fit because maintainers went for systemd instead of your own favorite init system.

But being open source doesn't mean people shouldn't or can't just say "this is bad" or "don't use x" or even "this X feature doesn't work when using Y" (now If what they are saying isn't true, sure we should call them out)

It's normal and in fact I think it should be encouraged, as not everyone can be aware of potential problems they could have with a piece of OSS that is already known in some obscure dev mailing list. The more people are informed and aware of potential issues, the less they will be surprised and complain about it when they start using it.

(Fwiw I love btrfs, and I think it's very reliable)


Read the thread again. No one asked anyone to work for free.

Please don’t put words in other people’s mouths.


You can reliably replace ext3 with ext4 in the GP's comment. It's not an extX problem, it's a BTRFS problem.

If nobody cares that much, maybe deprecate the tool, then?

Also, just because it's Open Source (TM) doesn't mean developers will accept any patch regardless of its quality. Like everything, FOSS is 85% people, 15% code.

> You are as empowered to fix this as anybody else, if it presents a real problem for you.

I have reported many bugs in Open Source software. If I had the time to study code and author a fix, I'd submit the patch itself, which I was able to do, a couple of times.


For me it's a problem of the tool that converts ext in place to BTRFS, not necessarily a problem of BTRFS.


I meant to say it’s a problem of BTRFS the project, not BTRFS the file system.


> If nobody cares that much, maybe deprecate the tool, then?

If you think that's the right thing to do, you're as free as anybody else to send documentation patches. I doubt anybody would argue with you here, but who knows :)

> Also, just because it's Open Source (TM) doesn't mean developers will accept any patch regardless of its quality.

Of course not. If you want to make a difference, you have to put in the work. It's worth it IMHO.


> If you think that's the right thing to do, you're as free as anybody else to send documentation patches :)

That won't do anything. Instead I can start a small commotion by sending a small request (to the mailing lists) to deprecate the tool, which I don't want to do. Because I'm busy. :)

Also, I don't like commotions, and prefer civilized discussions.

> Of course not. If you want to make a difference, you have to put in the work. It's worth it IMHO.

Of course. This is what I do. For example, I have a one liner in Debian Installer. I had a big patch in GDM, but after coordinating with the developers, they decided to not merge the fix + new feature, for example.


> For example, I have a one liner in Debian Installer. I had a big patch in GDM, but after coordinating with the developers, they decided to not merge the fix + new feature, for example

Surely you were given some justification as to why they didn't want to merge it? I realize sometimes these things are intractable, but in my experience that's rare... usually things can iterate towards a mutually agreeable solution.


The sad part is they didn’t.

GTK has (or had) a sliding infoline widget, which is used to show notifications. GDM used it for password related prompts. It actually relayed PAM messages to that widget.

We were doing mass installations backed by an LDAP server which had password policies, including expiration enabled.

That widget had a bug that prevented it from displaying a new message while it was in the middle of an animation, which effectively ate messages related to LDAP (“Your password will expire in X days”, etc.).

Also we needed a keyboard selector in that window, which was absent.

I gave a heads up to the GDM team, and they sent a “go ahead” as a reply. I wrote an elaborate patch, which they rejected, wanting a simpler one. I iterated the way they wanted; they said it passed muster and would be merged.

A further couple of mails were never answered. I was basically ghosted.

But the merge never came, and I moved on.


This sort of thing happens all the time in big orgs which develop a single product internally with a closed source… there isn’t any fix for this part of human nature, apparently.


> deleting files will not free any space.

Does a rebalance fix it? I have once (and only once, back when it was new) hit a "out of disk space" situation with btrfs, and IIRC rebalancing was enough to fix it.
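In case it helps anyone who hits the same wall: the usual remedy for btrfs "out of space" with plenty of apparent free space is a filtered rebalance that reclaims nearly empty data chunks (dry-run sketch; the mount point is a placeholder):

```shell
# Dry-run sketch of a filtered btrfs rebalance; nothing is executed against a disk.
run() { echo "would run: $*"; }

run btrfs balance start -dusage=5 /mnt   # rewrite only data chunks that are <=5% used
run btrfs filesystem usage /mnt          # verify unallocated space came back
```

The usage filter keeps the balance cheap; a full unfiltered balance rewrites everything.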

> for a common use case

It might have been a common use case back when btrfs was new (though I doubt it, most users of btrfs probably created the filesystem from scratch even back then), but I doubt it's a common use case nowadays.


From my perspective, a filesystem is critical infrastructure in an OS, and failing here and there and not fixing these bugs because they're not common is not acceptable.

Same for the RAID5/6 bugs in BTRFS. What's their solution? A simple warning in the docs:

> RAID5/6 has known problems and should not be used in production. [0]

Also the CLI discourages you from creating these things. Brilliant.

This is why I don't use BTRFS anywhere. An FS shall be bulletproof. Errors must only come from hardware problems, not random bugs in the filesystem.

[0]: https://btrfs.readthedocs.io/en/latest/mkfs.btrfs.html#multi...


Machines die. Hardware has bugs, or is broken. Things just bork. It's a fact of life.

Would I build a file storage system around btrfs? No, at least not without proper redundancy. But I'm told at least Synology does.

I'm pretty sure there are plenty of cases where it's perfectly usable - the feature set it has today is plenty useful and the worst case scenario is a host reimage.

I can live with that. Applications will generally break production ten billion times before btrfs does.


> Machines die. Hardware has bugs, or is broken. Things just bork. It's a fact of life.

I know, I'm a sysadmin. I care for hardware, mend it, heal it, and sometimes donate, cannibalize or bury it. I'm used to it.

> worst case scenario is an host reimage...

While hosting PBs of data on it? No, thanks.

> Would I build a file storage system around btrfs? No - without proper redundancy at least.

Everything is easy for small n. When you store 20TB on 4x5TB drives, everything can be done. When you have >5PB of storage on racks, you need at least a copy of that system running hot-standby. That's not cheap in any sense.

Instead, I'd use ZFS, Lustre, anything, but not BTRFS.

> I can live with that - applications will generally break production ten billion times before btrfs does.

In our case, no. Our systems can't stop just because a daemon died when a server among many fried itself.


I have worked on and around systems with an order of magnitude more data and a single node failing did not matter. We weren't using btrfs anyway (for data drives) and it definitely was not cheap. But storage never is.

But again, most systems are not like that. Kubernetes cluster nodes? Reimage at will. Compute nodes for vms backed by SAN? Reimage at will. Btrfs can actually make that reimage faster and it's pretty reliable on a single flash drive so why not?


Well, that was my primary point. BTRFS is not ready for these kind of big installations handled by ZFS or Lustre at this point.

On the other hand, BTRFS’ single disk performance, especially for small files, is visibly lower than EXT4 and XFS, so why bother?

There are many solutions for EXT4 which allow versioning, and if I can reimage a node (or 200) in 5 minutes flat, why should I bother with the overhead of BTRFS?

It’s not that I haven’t tried BTRFS. Its features are nice, but from my perspective, it’s not ready for prime time, yet. What bothers me is the mental gymnastics pretending that it’s mature at this point.

It’ll be a good file system. An excellent one in fact, but it still needs to cook.


My impression of btrfs is that it's very useful and stable if you stay away from the sharp edges. Until you run into some random scenario that leads you to an unrecoverable file system.

But it has been that way for now 14 years. Sure, there are far fewer sharp edges now than there were back then. For a host you can just reimage it's fine, for a well-tested fairly restricted system it's fine. I stay far away from it for personal computers and my home-built NAS, because just about any other fs seems to be more stable.


The thing is, none of the systems I run have the luxury of a filesystem which can randomly explode at any time because I pressed a button the developers didn't account for.

I was bitten by ReiserFS's superblock corruption once, and that time I had plenty of time to rebuild my system leisurely. My current life doesn't allow for that. I need to be able to depend on my systems.

Again, I believe BTRFS will be an excellent filesystem in the long run. It's not ready yet for "format, mount and forget" from my perspective. The only thing I'm against is the "it runs on my machine, so yours is a skill issue" take, which is harmful on many levels.


Synology uses btrfs on top of classic mdadm RAID; AFAIK they don't use btrfs's built-in RAID, or even any of btrfs's more advanced features.


You do you.

Personally, btrfs just works and the features are worth it.

Btrfs raid always gets brought up in these discussions, but you can just not use it. The reality is that it didn't have a commercial backer until now with Western Digital.


If it works for you, then it's great. However, this doesn't change the fact that it does not work for many others.

If I'm just not gonna use BTRFS' RAID, I can just use mdadm + any file system I want. In this case, any file system becomes "anything but btrfs" from my point of view.

I've been burnt by ReiserFS once. I'm not taking the same gamble with another FS, thanks.


A rebalance means that every file on the filesystem will be rewritten.

This is drastic, and I'd rather perform such an operation on an image copy.

This is one case where ZFS is absolutely superior; if a drive goes offline, and is returned to a set at a later date, the resilver only touches the changed/needed blocks. Btrfs forces the entire filesystem to be rewritten in a rebalance, which is much more drastic.

I am very willing to allow ZFS mirrors to be degraded; I would never, ever let this happen to btrfs if at all avoidable.
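To illustrate the difference being claimed (dry-run sketch; the pool name, device names, and mount point are all placeholders, and whether btrfs really needs a full rewrite in a given scenario depends on the setup):

```shell
# Dry-run sketch comparing the two recovery paths; nothing is executed.
run() { echo "would run: $*"; }

# ZFS: bringing a briefly-offline mirror member back triggers an incremental
# resilver that touches only blocks written while it was away
run zpool online tank sdX
run zpool status tank

# btrfs: per the comment above, restoring the array involves rewriting far more
run btrfs replace start /dev/sdX /dev/sdY /mnt
run btrfs balance start /mnt
```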


The desired "compress every file" operation will also cause every file on the filesystem to be rewritten though ...


> It might have been a common use case back when btrfs was new (though I doubt it, most users of btrfs probably created the filesystem from scratch even back then), but I doubt it's a common use case nowadays.

It's perhaps not as common as it once was, but you'd expect it to be common enough to work, and not some obscure corner case.


There’s literally no way I could migrate my NAS other than through an in-place FS conversion since it’s >>50% full.

The same probably applies to many consumer devices.



