
I’m not an expert whatsoever but what I’ve been doing for my NAS is using mirrored VDEVs. Started with one and later on added a couple more drives for a second mirror.

Coincidentally, one of the drives of my 1st mirror died a few days ago after rebooting the host machine for updates. I replaced it today, and it's been resilvering for a while.




I've read this is suboptimal because the rebuild stresses the one drive that holds the only copy of your data. What are your thoughts?


Mirrored vdevs resilver a lot faster than zX vdevs. Much less chance of the remaining drive dying during a resilver if it takes hours rather than days.


Is the amount of data read/written the same? I'm not clear why wall time is the relevant metric, unless that's the critical driver of failure likelihood. I would have guessed it's not time but bytes.


A mirror resilver is a relatively linear and sequential rewrite, so it's very fast. RAIDZ resilvers require lots of random reads and writes across all the drives, and data must be read from all drives before it can be written to the replaced drive - "herding cats" sounds appropriate here.


That makes sense. Usually the received wisdom I hear is "RAIDZ is slower due to the parity calculations", which has always seemed dubious to me on modern CPUs, given we can do much more complex compression and encryption way faster than disk speed (for HDDs, at least).


Usually what makes RAID-Z slower is not really the parity calculations, it's the fact that each RAID-Z vdev only has the read IOPS of the slowest disk in it.

So for example, a pool with 2 RAID-Z vdevs each containing 5 disks effectively has only 2x the IOPS of a single disk, while a pool with 5 mirror vdevs (i.e. RAID-10) has 10x the IOPS of a single disk.

It's not a big problem if you mostly do sequential I/O but it's a huge difference if/when you do small random reads (e.g. traversing a non-cached directory tree).
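That rule of thumb can be sketched in a few lines (the per-disk 150 IOPS figure and the simple additive model are assumptions for illustration, not exact ZFS behavior):

```python
# Rough effective read-IOPS model: a RAID-Z vdev serves small random reads
# at roughly one disk's rate; every side of a mirror vdev can serve reads.
def pool_read_iops(vdevs, disks_per_vdev, kind, disk_iops=150):
    """kind is 'raidz' or 'mirror'; disk_iops is a typical HDD figure."""
    if kind == 'raidz':
        return vdevs * disk_iops                   # one disk's worth per vdev
    return vdevs * disks_per_vdev * disk_iops      # all mirror sides contribute

# The same 10 disks, arranged two ways:
print(pool_read_iops(2, 5, 'raidz'))    # 2x 5-wide RAID-Z  -> 300
print(pool_read_iops(5, 2, 'mirror'))   # 5x 2-way mirror   -> 1500
```

Same drive count, roughly a 5x difference in small random read throughput under this model.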

In the context of resilvers, RAID-Z pools can currently only be rebuilt by traversing the block tree which, due to the fragmentation of CoW filesystems, usually leads to a lot of random reads/writes, while a RAID-10 pool can basically resilver while doing almost fully sequential I/O, which can be much, much faster.


Do random reads and writes put more stress on the drive mechanism? I'm wondering if this increase in resilver time adds risk of data loss, which is probably the more important factor than clock time. (But I'm open to learning if I am wrong)


I am backing up the whole NAS in case such a failure happens, and as was mentioned in other replies getting the replacement drive in the pool is way quicker on a mirrored vdev.


....vs stressing all of them for parity rebuild.


However, RAIDZ2 can survive one of those stressed disks failing, while a 2-way mirror has everything riding on that single survivor. It seems intuitive that mirrors would be safer, yeah, but if you run the binomial distribution numbers there are some realistic combinations of array size and individual drive failure probability where RAIDZ2 really does come out safer.

Imagine a gambler telling you "you can either throw one k-faced die and you lose if it comes up 1, or throw n k-faced dice and you lose if two come up 1". Depending on k and n, the second really can be the better choice.

Often people cite the faster rebuild of mirrors as a safety advantage, but the same amount of rebuild IO will occur regardless of how long it takes. Yes, there will be more non-rebuild IO in a larger time window, but unless that routine load was causing disks to fail weekly then I doubt it will change the numbers non-negligibly. It will of course affect array performance though, so mirrors for performance is a good argument
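The dice comparison above can be sketched with a couple of lines of binomial arithmetic; the 1% per-drive failure chance during a rebuild and the 10-wide RAIDZ2 are assumptions purely for illustration:

```python
def p_lose_mirror(p):
    # degraded 2-way mirror: data is lost if the single survivor fails
    return p

def p_lose_raidz2(n, p):
    # degraded RAIDZ2 with n surviving disks: lost if 2 or more of them fail
    p_survive = (1 - p) ** n + n * p * (1 - p) ** (n - 1)  # 0 or 1 failures
    return 1 - p_survive

p = 0.01  # assumed chance a given drive dies during the rebuild window
print(p_lose_mirror(p))      # 0.01
print(p_lose_raidz2(9, p))   # ~0.0034 -- the "n dice" bet wins here
```

Bump the per-drive probability to 5% and the comparison flips in the mirror's favor, which is exactly the "depends on k and n" point.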


If we were to only compare survival after a fixed number of drive failures vs. storage efficiency, RAIDzN should always come out ahead in any configuration: with mirrors you can get unlucky with which drives fail, while with RAIDzN any combination of up to N failures is survivable. The only way to have RAIDz fail sooner than a mirror is to pick a comparatively less redundant setup (your choice of N and K).

Realistically though, RAIDz recovery is longer and more stressful, so more of your drives can fail in the critical period. And, assuming you have backups, your storage is there for usability: mirroring gives you a performant, usable system during a fast recovery, for the price of a small chance of complete data loss (but you have backups?), vs. RAIDz, which gives you long recovery pains on a degraded system but, I expect, a smaller chance of data loss on a lightly loaded system.


Well, from my sample of the 500 servers I have at work, I have seen a RAID1 rebuild fail zero times and a RAID6 rebuild fail once (but we did recover it).

Granted, that was a bit of an uncommon case where:

* someone forgot to order spares after taking the last one

* we still had our consumables buying pipeline going through the helpdesk

* the helpdesk wasn't told how important it was, and because of some accounting bullshit the purchase got delayed long enough

* the drives in question were all from some Seagate fuckup of a model with much higher failure rates

One disk failed with some media errors, and the remaining 2 got kicked out of the array for the same reason during resilvering.

We ddrescue'd those 2 onto a pair of fresh ones, and since the bad blocks didn't land in the same place on both drives, it made a full recovery. But we did learn many lessons from that...


Wouldn't it be better to apply stress over more drives to minimize the chances that you lose a second drive during a rebuild?


To replace a drive in a RAID5/6 array, you need to read the entirety of every remaining drive.

To replace a drive in RAID1, you need to read the entirety of one drive.

Instead of one whole drive's worth of read stress, you're putting N drives' worth on the array.
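As a toy illustration of the read volumes involved (the 10 TB drive size and 8-drive array are assumed):

```python
S = 10e12                       # assumed drive size: 10 TB
raid1_rebuild = S               # mirror: read the one surviving copy
raid6_rebuild = (8 - 1) * S     # 8-drive RAID6: read every surviving drive
print(raid6_rebuild / raid1_rebuild)   # -> 7.0, i.e. 7x the read volume
```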

But it is funny that the people making arguments about "stressing drives" don't even fucking know how RAID works...


However, with RAID1 you stress the drive that has your only copy of the data. With 5/6 (RAIDz w/ 2 spares as well) you can have multiple copies of the data, so although you are overall increasing stress on your fleet, wouldn't it have a lower probability of data loss?

Drive stress (for me) is mostly a concern about data loss.


With RAID5 you'd have a higher probability of data loss. Remember, you just lost the only redundancy you had, bringing your redundancy to the same level as a RAID1 with a failed drive.

... except now you have more drives that can fail.

With RAID6, yes, you can still lose one more drive and keep your data, which is why it is recommended.

Data-safety-wise I'd go RAID6 -> RAID1/10 (the Linux implementation in particular can do RAID10 on an odd number of drives, which is nice) -> RAID5.


I believe this is the article I read when I started:

https://jrs-s.net/2015/02/06/zfs-you-should-use-mirror-vdevs...


Note that the listed probabilities of surviving N-disk failures are not correct when you actually run the calculation, though the differences may not be that important.

Listed survival probability of an N-disk failure in an 8-drive/4-vdev mirror pool:

1: 1; 2: 0.857; 3: 0.667; 4: 0.400; 5: 0; 6: N/A; 7: N/A; 8: N/A

Proper survival probability:

1: 1; 2: 0.857; 3: 0.571; 4: 0.229; 5: 0; 6: 0; 7: 0; 8: 0

Comparatively, the survival probability for an 8-drive RAIDz4 with an equal amount of usable space:

1: 1; 2: 1; 3: 1; 4: 1; 5: 0; 6: 0; 7: 0; 8: 0
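The corrected mirror numbers can be checked by brute-force enumeration over which drives fail (a quick sketch, not tied to any ZFS tooling):

```python
from itertools import combinations
from fractions import Fraction

def mirror_survival(pairs, failed):
    """P(a pool of `pairs` 2-way mirror vdevs survives `failed` random drive losses)."""
    drives = [(v, side) for v in range(pairs) for side in range(2)]
    ok = total = 0
    for combo in combinations(drives, failed):
        total += 1
        vdevs = [v for v, _ in combo]
        # the pool survives iff no vdev lost both of its drives
        ok += len(vdevs) == len(set(vdevs))
    return Fraction(ok, total)

for n in range(1, 6):
    print(n, float(mirror_survival(4, n)))
# 3 failures -> 32/56 ~ 0.571, 4 -> 16/70 ~ 0.229, matching the corrected table
```

Five failures across four vdevs must double up somewhere (pigeonhole), hence the hard 0 at N=5.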

Personally, I'd use multi-vdev mirror pools only for data that is either backed-up or data I can afford to lose completely.


That talks more about RAID5/Z1 vs mirroring.

You still need 6/Z2 if you want reasonable fault tolerance, unless you want to waste an ungodly amount of space.


My reasoning for using mirrored vdevs was more about the ease of expanding the storage than the redundancy.


We just used RAID6... expanding with more drives is easy while keeping space waste low, and the "we want to put in bigger drives now" case isn't all that problematic, considering that whatever you'd save, you'd otherwise waste on the low efficiency of a RAID1 setup.

We did have a big chassis, so we just ran a 2xRAID6 setup, and the one time it was needed we replaced the drives with bigger ones one by one, using the removed ones as spares for other machines. But that's a benefit of scale bigger than "a NAS server".


Yeah mine is a home NAS, not a professional deployment. At first I was a bit disappointed at losing half the storage due to using mirrors but honestly it’s not a big deal considering the prices I managed to find on the drives. And I also wanted to expand at my own pace. The drive I recently used to replace the dead 4TB drive cost me 110€ and it’s a 10TB drive.

I got 2 so I’ll also replace the sibling of the dead 4TB one.



