This article seems right up my alley, so here are some thoughts:

- ZFS is pretty amazing in its abilities; it ushered in the age of software RAID over hardware RAID

- ZFS shouldn't be limited to FreeBSD. The Linux port has come quite a long way. I'd advise you to use the PPA over the repo though, as many key features are missing from the version in the repos.

- TrueNAS is more targeted towards enterprise applications. If you want good utility as a home user then give Proxmox or the like a look. Then you can make it into more than just a NAS (if you're open to it).

- If you want to make things even simpler, consider something like UnRAID.

- ZFS' snapshotting can really shine in a virtualization server application, with the ability to revert KVM VMs to a previous state in a matter of seconds. Look up Jim Salter's (great dude) Sanoid project for a prime example; see the sketch after this list.

- I don't recall why, but I've heard that RAIDZ should be avoided, in favor of striped mirrors.
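
For concreteness, here's roughly what that VM revert looks like with plain zfs commands; a minimal sketch assuming a made-up per-VM dataset (Sanoid just automates the snapshot/prune side):

    # hypothetical layout: one dataset/zvol per KVM guest
    zfs snapshot tank/vms/web01@pre-update
    # update goes sideways: shut the VM down and revert in seconds
    zfs rollback tank/vms/web01@pre-update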




Raidz (or preferably raidz2) is good for archival / media streaming / local backup and has good sequential read/write performance, while striped mirrors - raid10 - are better for random-access reads and writes and are a little more redundant (i.e. reliable), but cost more in drives for the same usable space.

Raidz needs to read all of every drive to rebuild after a drive replacement while a striped mirror only needs to read one. However if you're regularly scrubbing zfs then you read it all regularly anyway.

Raidz effectively has a single spindle's worth of random or concurrent I/O, since a whole stripe needs to be read or written at a time. Raidz also has a certain amount of wastage owing to how stripes round out (it depends on how many disks are in the array), but you still get a lot more space than with striped mirrors.

For a home user on a budget raidz2 usually makes more sense IMO, unless you need more concurrent & random I/O, in which case you should probably build and benchmark different configurations.
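
To make that concrete, here's roughly what the two layouts look like at pool creation time (a sketch; device names are placeholders, and in practice you'd use /dev/disk/by-id paths):

    # six disks as one raidz2 vdev: ~4 disks of usable space, any 2 may fail
    zpool create tank raidz2 sda sdb sdc sdd sde sdf

    # the same six disks as striped mirrors (raid10-style): 3 disks usable,
    # better random/concurrent I/O and faster resilvers
    zpool create tank mirror sda sdb mirror sdc sdd mirror sde sdf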

I've been using ZFS for over 10 years, starting with Nexenta, a now-defunct oddity with a Solaris kernel and an Ubuntu userland. These days I use ZFS on Linux. I've never lost data since I started.


Side note, since it's probably a niche use case, but using NVMe drives in Z1 was both pretty effective with VMs and cheap enough on usable capacity (4×1TB for ~3TB usable).

I feel SSDs are in a totally different category that makes Z1 an actual option, whereas I don't trust it for spinners. The key difference is that a failed SSD can usually still be read (read-only), whereas a failed spinner is usually as good as bricked.


I use SATA SSDs in RAIDZ2, and have 6×1TB for approx 4TB usable. They're fantastic and I have yet to have an issue replacing a drive. Nor do I need to do it very often, as my workloads are pretty low; it's mostly an archival box.

It's been great, right up until one of the sticks of RAM started to fail...


> Raidz needs to read all of every drive to rebuild after a drive replacement while a striped mirror only needs to read one. However if you're regularly scrubbing zfs then you read it all regularly anyway.

Not quite. A rebuild only reads the disks within the affected vdev. A pool with multiple vdevs, each being raidz, does not need to read all disks to rebuild a single raidz vdev. Your statement compares one vdev vs many vdevs; it just happens that folks assume one large raidz vdev vs multiple mirror vdevs.

If you have a 3- or 4-way mirror, wouldn't ZFS read from all disks in the vdev to rebuild any added disks to the mirror (there can be more than one)?


Sure, you can build striped raidz, 3+ way mirrors, and other more exotic variants, but the two most typical corners of the configuration space are a single raidz2+ across all the drives vs striped mirrors. You lose usable space by not putting all your drives into a single parity array, and maximizing usable space is usually why you go with parity in the first place. Mixing multiple arrays only makes sense from this perspective if you have a mix of drive sizes.


> You lose usable space by not putting all your drives into a single parity array

Well, if you're aiming for a specific ratio but want greater capacity without upgrading all disks to larger ones, your only option is to add more vdevs configured the same way.

Example: 66% usable capacity, 6 disks in a raidz2 (4 usable + 2 parity) or 9 disks in a raidz3 (6 usable + 3 parity). If you want to add capacity but maintain your parity ratio (for a given fault tolerance risk) there is no raidzN with N > 3, so you must add vdev.
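
Concretely, keeping that 2-of-6 ratio means growing the pool a whole vdev at a time, something like (hypothetical disk names):

    # add a second six-disk raidz2 vdev; the pool then stripes across both vdevs
    zpool add tank raidz2 sdg sdh sdi sdj sdk sdl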

Increasing the size of the raidz vdev means you're reducing your failure tolerance.


>- ZFS shouldn't be limited to FreeBSD. The Linux port has come quite a long way. I'd advise you to use the PPA over the repo though, as many key features are missing from the version in the repos.

FreeBSD migrated from its own ZFS to OpenZFS, so there is now a single ZFS implementation across FreeBSD and Linux: https://openzfs.github.io/openzfs-docs/Getting%20Started/Fre...


> I'd advise you to use the PPA over the repo though, as many key features are missing from the version in the repos.

I would advise using ZFS only with distros that come with it (i.e. Ubuntu, Proxmox), especially if you plan to have your / on it. I wasted too much time on CentOS with ZFS, would not do it again.


ZFS on root just sounds like pain to me. I opt for MD RAID on root and then ZFS for my other volumes.

I would also say Ubuntu is probably the better choice for Linux ZFS, as CentOS seems to be lacking good support.


Once you try it, you're never going back. Snapshots are made for things like system administration. Upgrade borked your system? Just roll back.

Want to use the last version of your firewall config? I wrote a utility you might like to try, httm[1], which lets you restore from your snapshotted unique versions.

If you like ZFS, then trust me you have to have ZFS on root.

[1]: https://crates.io/crates/httm


Had you previously done a Show HN on this? I feel like I saw it once before.


Someone else posted about it awhile ago: https://news.ycombinator.com/item?id=31184404


ZFS on root is really amazing on FreeBSD and the advantage is that you can snapshot your boot drive.


Have a look at Boot Environments. It really is amazing.


Yes I know, bectl. I use it.

I just didn't want to mention it because the discussion was mainly about Linux. But FreeBSD has a really strong toolchain for this indeed.


Also a Solaris thing, yeah? (beadm etc)


Yeah, Ubuntu has done a pretty good job with ZFS on root installation.

Zero setup, works out of the box. Highly recommend ZFS, and Ubuntu with ZFS!


For me, I'm more concerned about getting the OS bootable again if something becomes corrupted at the OS level. Even with MD RAID it can be a bit of a struggle to recover, but ZFS on root seemed much harder to troubleshoot and repair. Perhaps I am mistaken in this belief though?


In Ubuntu’s implementation, root and boot are separate pools (bpool, rpool). Both are snapshotted automatically (and can be snapshotted manually). So if boot is corrupted, you roll back. I should say I haven’t tried it, though, to see how boot selection works (rolling back rpool is straightforward).

The boot corruption could occur with the default ext4 file system as well, except with ext4 there is no recourse.

Needless to say, you can always boot from a live USB and mount your ZFS pool (and perhaps roll back).
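
A minimal sketch of that live-USB recovery path, assuming Ubuntu-style pool/dataset names (yours will differ):

    # from a live environment with ZFS support: import the pool under /mnt
    zpool import -f -R /mnt rpool
    # find the snapshot you want, then roll the root dataset back to it
    zfs list -rt snapshot rpool/ROOT
    zfs rollback rpool/ROOT/ubuntu@pre-upgrade   # add -r to discard newer snapshots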


I've had to recover a ZFS on root system whose bootloader installation I had somehow screwed up, and the process is pretty straightforward.

See: https://openzfs.github.io/openzfs-docs/Getting%20Started/Ubu...


Isn’t ZFS there precisely to address your concern?!

If the OS doesn’t boot, you boot from the latest snapshot! Every time you run apt-get upgrade, a system snapshot is taken automatically and an entry is added to the boot menu.


I guess I was referring more to corruption resulting in an unbootable system. If you can't boot in, then how would you roll it back?


That's where backups come in. Any filesystem can get corrupted. Though for ZFS it's less likely than with something like ext4. Even though both have journalling, only ZFS has copy on write.


My problem was rather with problematic updates of zfs itself.

The update "helpfully" updated the initramfs for older kernels too... and if something broke, it broke the previous versions as well, so they were all unbootable. Eventually I ended up keeping a USB stick at hand with a known bootable environment :(


> I don't recall why, but I've heard that RAIDZ should be avoided, in favor of striped mirrors.

Most people care about random IO (also once your filesystem has been populated and in use for a while, true linear IO really ceases to be due to fragmentation.) Striped arrays lose random IO performance as drive count goes up; an array of mirrored pairs gains random IO performance. This is less of an issue with tiered storage and cache devices, especially given you almost have to work to find an SSD less than 256GB these days.

You can only upgrade a vdev by upgrading all its drives; it's a lot nicer cash-flow-wise to gradually upgrade a mirrored pair here and there, or upgrade exactly as many pairs as you need for the space you need.
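
For example, growing a single mirror pair in place is roughly (made-up device names; let each resilver finish before swapping the next disk):

    # let vdevs grow once every member has been replaced with a larger disk
    zpool set autoexpand=on tank
    # swap each side of one mirror for a bigger drive, one at a time
    zpool replace tank sda sdg
    zpool replace tank sdb sdh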

With RAID-Z you have a drive fail and pray a second doesn't fail during the resilver. With RAID-Z2 you can have any two drives fail. With mirrors you can lose 50% of your drives (provided that they're the right drives.)


> also once your filesystem has been populated and in use for a while, true linear IO really ceases to be due to fragmentation

Enough concurrent clients doing sequential IO also looks like random IO to a storage server.


> You can only upgrade a vdev by upgrading all its drives

This is no longer the case, or at least, should no longer be the case soon. Raidz expansion (adding drives to an existing raidz vdev) has been announced, and will trickle through to stable before too long.


> - ZFS shouldn't be limited to FreeBSD. The Linux port has come quite a long way. I'd advise you to use the PPA over the repo though, as many key features are missing from the version in the repos.

Agreed. Also for anyone using NixOS, I've found its ZFS support is first class and easy to set up:

https://www.reddit.com/r/NixOS/comments/ops0n0/big_shoutout_...


I also forgot to mention: your guidance on ZFS memory requirements is outdated. From what I've heard, recent releases have drastically reduced the ARC cache size necessary. One person reported it working phenomenally on a newer Raspberry Pi.
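
If memory is the concern, the ARC ceiling is also tunable via a module parameter; a sketch (the 1 GiB figure is just an example value):

    # cap the ARC at 1 GiB (value in bytes), e.g. on a Raspberry Pi class box
    echo "options zfs zfs_arc_max=1073741824" | sudo tee /etc/modprobe.d/zfs.conf
    # applies at next module load/boot; or set it live:
    echo 1073741824 | sudo tee /sys/module/zfs/parameters/zfs_arc_max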


True, but COW is even harder on an SD card than ext4 is, so I would really not use it on a Pi unless it's not using SD storage :)


I do think many folks using such applications are booting from disk. IIRC, Raspberry Pi 4 supports disk booting natively.


I've been using ZFS on eMMC with things like Orange Pis and Rock64s for a few years; so far it works well for me.


> - TrueNAS is more targeted towards enterprise applications. If you want good utility as a home user then give Proxmox or the like a look. Then you can make it into more than just a NAS (if you're open to it).

I have questions about this. I'm thinking of building my own NAS server, and I don't know which OS to use. On the one hand, people seem to recommend TrueNAS a lot, which is nice now that they have a Linux version, but I'm not really sure what it offers over a raw Debian apart from the web configuration UI and some extra tools. I have quite a bit of experience running Debian systems and managing RAIDs (not with ZFS, but that doesn't seem like too much of a jump), and I worry that TrueNAS, while nice at the beginning, might end up being limiting if I start to tweak too much (I plan on using that NAS for more things than just storage).


What you miss with raw Debian compared to TrueNAS is compatibility. TrueNAS makes sure that all the pieces are compatible with one another, so that when you update a component or the OS, the storage doesn’t break. The whole package is tested thoroughly before release.

Also, TrueNAS makes setup painless: users, permissions, shares, vdevs, ZFS tuning, a nice dashboard, etc. With Debian, you end up with a lot of config files and Ansible playbooks that become hard to manage.

Ideally you won’t run other stuff on a NAS, outside Docker.


There's been a movement in the industry to bring storage back onto servers. They use the fancy buzzword "hyperconvergence" now though.

I will definitely argue that TrueNAS gives stability and ease of management. Some of that can be found with Proxmox too though. I think it just really depends on which medium you prefer. Perhaps trying both is the best option?


I’d try both, they tend to target different use cases. Proxmox doesn’t come with a lot of the tooling for things like SMB, LDAP, etc that Truenas ships with but you may find you don’t actually want any of these extras. In that case Proxmox would be a better choice IMO since it’s Debian based and a bit more streamlined.


Proxmox is my way. It supports newer ZFS versions, and the support term for each major release is fine. I put the NAS functionality in an LXC container with the host's filesystem passed through.


If you want it to be strictly a NAS then TrueNAS should suffice. If you want to do anything more then I'd consider Proxmox or Ubuntu.


> If you want good utility as a home user then give Proxmox or the like a look.

Isn't Proxmox just virtualization? I didn't know it could be used as a NAS too.


As thefunnyman pointed out, TrueNAS will make for a more user-friendly sharing experience, but you could also just load up a container with Samba on Proxmox to give you the NAS feature you seek. The heavy lifting of software RAID/ZFS would then be handled by Proxmox.
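
A rough sketch of that approach, with a made-up container ID, dataset, and share name:

    # on the Proxmox host: bind-mount a ZFS dataset into an existing LXC container
    pct set 101 -mp0 /tank/media,mp=/srv/media

    # inside the container, a minimal Samba share in /etc/samba/smb.conf:
    #   [media]
    #      path = /srv/media
    #      read only = no
    #      browseable = yes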


IMO NAS setup is much more straightforward on TrueNAS. It’s doable in Proxmox inside a VM or LXC, but TrueNAS exposes all of this directly via a nice UI. I personally use Proxmox with a simple Debian NFS VM, but for less technical users who just want stuff to work I tend to recommend they stick with TrueNAS.


How well does it handle running the NFS VM and using the NAS to serve files to Jellyfin / Plex running in a VM?


Is there any reason not to use ZFS for a NAS? I am thinking of a NAS hosting multimedia files; surely the ability to roll back must come at a cost in terms of disk space for large files.


Because it's a copy-on-write file system, the snapshots don't take up any extra space until you do any writing. The common parts between snapshots are shared in storage, so you only pay the cost of any changes (rounded to some block size).
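
If you're curious where the space goes, something like this shows how much each snapshot uniquely holds (dataset name is just an example):

    # USED is the space unique to each snapshot; unchanged blocks are shared
    zfs list -rt snapshot -o name,used,referenced tank/media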


But if I remux a large video, won't I have two versions of that video taking up space on the disk? If I do that often, it will eat the capacity very quickly.


I know software RAID is better overall, but are there any advantages to hardware RAID anymore? Is it just worse at everything?


I would go as far as to say "hardware RAID" these days is limiting, expensive, and less performant than what can be achieved with software RAID.


I’m in the process of setting up a home server after buying a pair of matching 3TB Western Digital “Red” disks. I plan on installing them in a HPE ProLiant MicroServer G7 Server / HP Micro G7 N40L that I was gifted a couple of years ago. Even though it comes with a hardware RAID, I was considering setting up RAID 1 using Linux software RAID. However, according to the Linux Raid Wiki¹, Hardware RAID 1 is better than Software RAID 1.

> This is in fact one of the very few places where Hardware RAID solutions can have an edge over Software solutions - if you use a hardware RAID card, the extra write copies of the data will not have to go over the PCI bus, since it is the RAID controller that will generate the extra copy.

I was intending to use these disks for local backup and for storing rips of my extensive CD and DVD collection. As sibling comments mention, the possibility of the hardware controller failing is a worry, so I’d need to have a backup strategy for the backup disks. Since it’s going to be a home server, down-time wouldn’t be a problem.

I don’t have much experience with either hardware or software RAID so I’d welcome any advice.

¹ https://raid.wiki.kernel.org/index.php/Overview#What_is_RAID...


> > This is in fact one of the very few places where Hardware RAID solutions can have an edge over Software solutions - if you use a hardware RAID card, the extra write copies of the data will not have to go over the PCI bus, since it is the RAID controller that will generate the extra copy.

This is not something a home user needs to worry about (especially now that we've moved from PCI to PCIe).

The only great thing about HW RAID is the case where your primary drive fails but not completely, i.e. it is still seen by the BIOS and the BIOS tries to boot from it (and can't, because the drive is half-dead). A controller presents a single device to the BIOS, so it would still allow booting from the healthy drive.

But again, unless this is a server at a remote oil-drilling site serviced twice a year by air (been there, done that), this is not something a home user needs to worry about.

> the possibility of the hardware controller failing is a worry, so I’d need to have a backup strategy for the backup disks

If you use a basic mirror (striped or not), the recovery process is straightforward - for a simple mirror, just stick a disk in any other system/controller; for a striped one, you would need GetDataBack or R-Studio, or just find a newer model of RAID card from the same vendor.

In your case I would advise having a single disk in the ODD bay as a boot/system drive and using both your HDDs as LVM PVs, without fdisk shenanigans. If/when you decide to upgrade or replace disks, the migration is just a couple of commands like:

    # add the new disk as a PV and grow the volume group onto it
    pvcreate /dev/sdc
    vgextend your_vg /dev/sdc
    # move your LV's extents off the old disk onto the new one
    # (omit "-n your_lv" to migrate everything on /dev/sda)
    pvmove -n your_lv /dev/sda /dev/sdc
    # then drop the old disk from the volume group
    vgreduce your_vg /dev/sda


Wow! That's all really informative. I had thought that a non-striped, mirrored disk should be usable if the H/W RAID controller failed (given that all the data is stored on the disk), but I didn’t know for sure. I didn’t think it would also be possible to recover striped disks.

Thanks for the great advice on using LVM and the commands for replacing a disk. I only recently came across a system that used the disk block device itself as a physical volume, rather than partitioning it via fdisk or parted. Other than using LVM for creating snapshots, I haven't really used it for anything interesting.


> but I didn’t know for sure

The array metadata is just stored at the end of the disk, so it's not a problem to just attach the disk somewhere else.

> commands for replacing a disk

Take it with a grain of salt, I just wrote them from memory. But the overall process is exactly as I said.

Glad you found that helpful.


Funny, I have the identical setup with a small SSD for /, and two of the 3TB Reds for my media/music/files. I have seen enough pain from HW RAID controller setups (custom drivers, configs, long recoveries) that HW RAID was right out. Then, for soft RAID, I had bad experiences in the past where the system refused to come up when one disk in the mirror pair was bad. I'm sure there is a way to configure this properly, but at this point I said fuck it and just mirror one of the WDs to the other by hand via rsync. Think hard (or even test): what actions will need to happen when one of your disks fails? And second, do you need backup or RAID more? Fat-finger deleting a file on a RAID-backed file system will leave you with no backup unless it has snapshots.

Some notes.

- Not having RAID really doesn't matter. The primary purpose seems to be to save a little bit of space through parity, or to increase read performance from parallel operation, but none of this is valuable to me.

- I use ext4. I think it would make sense to move to a snapshot-capable system to make the periodic rsync-backups more correct (I don't even bother dropping to r/o mode, since it's fairly static data).

- What really keeps me up at night: bit flips silently corrupting files. I think btrfs or ZFS are supposed to solve this through checksumming plus periodic scrubs (see the scrub sketch after this list). I really need a periodic process to checksum every file and take action on exceptions. Note that RAID alone will not help you with this.

- This has worked pretty well so far. Twice already (over 12 years) one disk in the pair has failed, at which point I ordered a new pair (they were bigger/cheaper by that point and I figured the other one would fail soon) and rebuilt the server.
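
For reference, the ZFS answer to the bit-flip worry is a scheduled scrub: every block is checksummed, and a scrub re-reads and verifies all of it. A minimal sketch (pool name made up; distro zfs packages often ship a similar cron job or systemd timer already):

    # /etc/cron.d entry: verify the whole pool against its checksums monthly
    0 3 1 * * root /usr/sbin/zpool scrub tank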


Thanks so much for sharing your experience. I look after a couple of servers in my current job that have hardware RAID (Dell), but they were set up by my predecessors and they’ve never given any trouble since I started working there (touch wood). I have also not had any problems with Linux software RAID; e.g., I was pleasantly surprised to discover that I could move the RAID 1 disks from my previous home server (a repurposed Dell workstation) to the ProLiant MicroServer and my Debian OS had no problem recognising them.

The consensus seems to be that RAID isn’t particularly useful for a home server. Backup would be more useful, as I would be more likely to accidentally delete or over-write a file than to have a disk fail catastrophically, so I think I might use the second disk similarly to how you use yours. I would also be better served by using ZFS; its de-duplication would also be useful, so I’m going to try it as an experiment.


> according to the Linux Raid Wiki¹, Hardware RAID 1 is better than Software RAID 1.

Don't take that too much into consideration - the article was last updated in 2007 ( https://raid.wiki.kernel.org/index.php?title=Overview&action... ) so it lacks some details (the same can be said of much ZFS-related info you might find) => nowadays double-checking articles related to RAID and ZFS is a must.

In my case I bought some HBA (Host Bus Adapter) cards (e.g. LSI SAS 9211-8i), set their BIOS to not do anything special with the HDDs connected to them (so the disks can be used with other controllers as well), and used mdadm (earlier) or ZFS (now) to create my RAIDs => it works well, I get max throughput of ~200MiB/s per disk, and I have all the fancy features of ZFS without the problem of proprietary stuff tied to the controller card :)


Thanks for the tips and feedback. I’ll stick with Software RAID. I wasn’t sure that the system would have enough resources to support ZFS but I think I’ll try it out before I commit large amounts of data to it.


I think the worst point is vendor lock-in. If your controller fails, and a replacement isn't available, then you may be dead in the water. That kind of goes against the very point of RAID.


But this is why businesses spend so much on their RAID controllers: to make sure they're under warranty so that kind of thing doesn't happen.

Incidentally it's also pretty great because no business buys them second-hand without warranty, so they're usually available for half nothing.

I don't use RAID cards right now, but I do use Fibre Channel, which is also dirt cheap second-hand.


3 weeks ago I had a controller failure in a manufacturing plant in Latin America. The contract with the manufacturer was to provide a replacement on site within 4 hours. Guess what: 8 hours later the technician with the replacement controller was still on the way.

With TrueNAS I can move my drives to any other computer with the right interface and they will just work. I've done this over my past 10 years of using TrueNAS.


I can imagine but companies want to offload responsibility.

If the manufacturer is late they can blame them; after all, your IT manager paid for 4-hour support, so they've covered their ass.

If your TrueNAS fails at work, it's your ass on the line :P

I totally agree these special drive formats are really annoying. In this case I'd probably keep a spare on hand myself.


Spending lots of money on hardware support doesn't guarantee it won't fail, or even that you will get replacements fast; it just guarantees that a contract exists saying something. As architects, we should engineer our solutions to cater for hardware failure.

I call this sort of thing a technical guarantee rather than a commercial guarantee



