However, RAIDZ2 can survive up to one more of those stressed disks failing, while a 2-way mirror has everything riding on that single survivor. It seems intuitive that mirrors would be safer, yeah, but if you run the binomial distribution numbers there are some realistic combinations of array size and per-drive failure probability where RAIDZ2 really does appear to be safer.
Imagine a gambler telling you "you can either throw one k-faced die and you lose if it comes up on 1, or throw n k-faced dice and you lose if two or more come up on 1". Depending on k and n, the second really can be the better choice.
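To put rough numbers on the dice game, here is a minimal sketch. It assumes each die (drive) fails independently with probability 1/k, which real drives from the same batch may well not do. The function names are made up for illustration.

```python
from math import comb

def p_lose_single(k: int) -> float:
    """Chance of losing with one k-faced die: it shows a 1."""
    return 1.0 / k

def p_lose_many(n: int, k: int, threshold: int = 2) -> float:
    """Chance that at least `threshold` of n independent k-faced dice show a 1."""
    p = 1.0 / k
    # Binomial probability of seeing fewer than `threshold` ones.
    p_fewer = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(threshold))
    return 1.0 - p_fewer

if __name__ == "__main__":
    k = 20  # hypothetical: each die has a 1-in-20 chance of a bad roll
    for n in (5, 10):
        print(n, p_lose_single(k), p_lose_many(n, k))
```

With k = 20, five dice lose less often than the single die, but ten dice lose more often, so which game is better depends on both k and n.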
Often people cite the faster rebuild of mirrors as a safety advantage, but the same amount of rebuild IO will occur regardless of how long it takes. Yes, there will be more non-rebuild IO in a larger time window, but unless that routine load was already causing disks to fail weekly, I doubt it changes the numbers by more than a negligible amount. It will of course affect array performance though, so mirrors for performance is a good argument.
If we were to compare only survival after a fixed number of drive failures vs. storage efficiency, RAIDzN should always come out ahead in any configuration: with mirrors you can get unlucky with which drives fail, while with RAIDzN any combination of up to N failed drives is survivable. The only way to have RAIDz fail sooner than a mirror is to have a comparatively less redundant setup (your choice of N and K).
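The "unlucky choice" point can be checked by brute force. The sketch below assumes a pool of striped two-way mirrors with drives numbered so that (0, 1), (2, 3), ... are the pairs, and treats RAIDzN as surviving any N failures and no more. The helper names are hypothetical, and it ignores timing, rebuilds and correlated failures entirely.

```python
from itertools import combinations

def fatal_fraction_mirrors(num_pairs: int, failures: int) -> float:
    """Fraction of `failures`-drive combinations that take out both halves of some mirror pair."""
    pairs = [(2 * i, 2 * i + 1) for i in range(num_pairs)]
    combos = list(combinations(range(2 * num_pairs), failures))
    fatal = 0
    for combo in combos:
        failed = set(combo)
        # The pool is lost as soon as any one pair has lost both of its drives.
        if any(a in failed and b in failed for a, b in pairs):
            fatal += 1
    return fatal / len(combos)

def fatal_fraction_raidz(parity: int, failures: int) -> float:
    """RAIDzN survives any `parity` failures and is lost beyond that."""
    return 0.0 if failures <= parity else 1.0

if __name__ == "__main__":
    # Six drives either way, both at 50% storage efficiency:
    # three two-way mirrors vs. a single 6-wide RAIDZ3.
    for failures in (1, 2, 3, 4):
        print(failures, fatal_fraction_mirrors(3, failures), fatal_fraction_raidz(3, failures))
```

For six drives, 3 of the 15 possible two-drive failures and 12 of the 20 possible three-drive failures kill the mirror pool, while a 6-wide RAIDZ3 with the same usable capacity survives all of them.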
Realistically though, RAIDz recovery is longer and more stressful, so more of your drives can fail in the critical period. And assuming you have backups, your storage is there for usability. Mirroring gives you a performant, usable system during a fast recovery for the price of a small chance of complete data loss (but you have backups?), whereas RAIDz gives you long recovery pains on a degraded system, though I expect a smaller chance of data loss on a lightly loaded system.
Well, in my sample of 500 servers at work I have seen a RAID1 rebuild fail zero times and a RAID6 rebuild fail once (but we did recover it).
Granted, that was a bit of an uncommon case where:
* someone forgot to order spares after taking the last one
* we still had our consumable buying pipeline going through helpdesk
* helpdesk hadn't had any importance communicated to them about it, and because of some accounting bullshit the purchase got delayed long enough
* the drives in question were all from some Seagate fuckup of a model with much higher failure rates.
One disk failed with some media errors, and the remaining 2 got kicked out of the array for the same reason during resilvering.
We ddrescue'd the 2 onto a pair of fresh ones, and the bad blocks didn't land in the same place on both drives, so it made a full recovery. But we did learn many lessons from that.