> but the only way to encounter this bug is to be quite cavalier about how you handle a degraded array, by doing something that you shouldn't expect to be safe.
The way to encounter it is to mount the array a second time. That's hardly "cavalier handling".
A decade in, and RAID 1 doesn't work right. Saying this clearly offends the feels here at HN, but it's a fact.
> The way to encounter it is to mount the array a second time. That's hardly "cavalier handling".
`mount` is not the same as `mount -o degraded,rw`. The latter should raise the eyebrows of anyone paying attention to what they're typing. Odds are you'll have to consult the docs just to find those mount options, because a degraded mount doesn't happen automatically. This is where any careful, sane user who's concerned about their data would spend a few more minutes thinking through their entire recovery procedure.
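For the record, this is roughly the sequence that hits the corner case, as I understand it (device names are made up, and the failure mode is specific to older kernels):

```
# One disk of a two-device btrfs RAID1 has died. A plain mount refuses to
# proceed with a device missing:
mount /dev/sdb1 /mnt

# You have to ask for a degraded, writable mount explicitly:
mount -o degraded,rw /dev/sdb1 /mnt

# The trap: anything written now lands in new chunks with the "single"
# profile. On older kernels the next degraded,rw mount attempt is then
# refused, because those chunks can't tolerate a missing device.
```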
There's one corner case to RAID1 recovery where the current tooling does not fully prevent a careless user from putting the FS in a bad state. This is not the same as "RAID1 doesn't work right".
Consider that most users will be careless, because to them running a filesystem is about as exciting as any of the other dozen OS services. Having a corner case like this in RAID1 is, to me, like having a corner case in encryption...
That I immediately ran into this case on my first btrfs RAID1 problem makes me think there's a steep slope down into this corner.
If, when a drive in your RAID fails, your first response is anything but checking the docs for the recovery procedure, you're going to end up disappointed sooner or later. When your fault-tolerant system tells you something broke, leaving it in a fragile state, and your response is to tell the system to ignore that and pretend everything is normal, expect trouble. You're working in a domain where there is no one right answer, so the system cannot magically anticipate how to handle the exceptional situation. None of the above is the fault of btrfs. None of this is avoidable. This problem has to be faced by ZFS, too.
What is avoidable is that btrfs could make it harder to do the equivalent of `mount -o degraded,rw`. But ultimately there will be some mechanism for modifying a degraded array, and it'll get documented and then excerpted in blog posts and StackOverflow answers without all the context, and users will find a way to work themselves into a corner. There are all kinds of ways to do this with ZFS, too. ZFS tends to default to the approach of requiring you to copy all your data elsewhere and rebuild your array from scratch. What btrfs is doing here is no worse, except that it's a bit less up-front about the limitation, because here it's actually completely avoidable: a fixable UI bug, not a deep-seated architectural limitation.
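For reference, the documented recovery path looks roughly like this (a sketch based on the btrfs wiki's instructions for replacing a failed device; the device names and devid are assumptions):

```
# Mount the surviving member degraded, then replace the dead disk in place:
mount -o degraded /dev/sdb1 /mnt
btrfs replace start 2 /dev/sdc1 /mnt   # "2" is the devid of the missing disk

# Or add a new device and drop the missing one:
btrfs device add /dev/sdc1 /mnt
btrfs device remove missing /mnt

# Then rebalance so anything written while degraded is raid1 again:
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt
```

Either way it's a deliberate, multi-step procedure, which is the point about consulting the docs before touching a degraded array.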
No, probably not. But this particular bug isn't the reason why users who need very high levels of assurance should avoid btrfs for now. The nature of this bug does not lend itself to being used as an argument that btrfs is unsafe in general.