Oh, cool! I was referring to SHR. I thought it was a proprietary format and didn't realize you could access it from non-Synology systems. I've updated the post:
This article seems right up my alley, so here are some thoughts:
- ZFS is pretty amazing in its abilities, and it helped usher in the age of software RAID over hardware RAID
- ZFS shouldn't be limited to FreeBSD. The Linux port has come quite a long way. I'd advise you to use the PPA over the repo though, as many key features are missing from the version in the repos.
- TrueNAS is more targeted towards enterprise applications. If you want good utility as a home user then give Proxmox or the like a look. Then you can make it into more than just a NAS (if you're open to it).
- If you want to make things even simpler then consider something like UnRAID.
- ZFS's snapshotting can really shine on a virtualization server, with the ability to revert KVM VMs to a previous state in a matter of seconds. Look up Jim Salter's (great dude) Sanoid project to see a prime example.
- I don't recall why, but I've heard that RAIDZ should be avoided, in favor of striped mirrors.
Raidz (or preferably raidz2) is good for archival / media streaming / local backup and has good sequential read/write performance, while striped mirrors - raid10 - are better for random-access reads and writes and are a little bit more redundant (i.e. reliable), but cost more in drives for the same usable space.
Raidz needs to read all of every drive to rebuild after a drive replacement while a striped mirror only needs to read one. However if you're regularly scrubbing zfs then you read it all regularly anyway.
Raidz effectively has a single spindle for random or concurrent I/O since a whole stripe needs to be read or written at a time. Raidz also has a certain amount of wastage owing to how stripes round out (it depends on how many disks are in the array), but you still get a lot more space than striped mirrors.
For a home user on a budget raidz2 usually makes more sense IMO, unless you need more concurrent & random I/O, in which case you should probably build and benchmark different configurations.
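For anyone new to the terminology, a rough sketch of the two layouts being compared, using hypothetical /dev/sdX names:

    # six drives as a single raidz2 vdev (four drives' worth of usable space, any two can fail)
    zpool create tank raidz2 sda sdb sdc sdd sde sdf

    # the same six drives as striped mirrors ("raid10": three drives' worth of usable space)
    zpool create tank mirror sda sdb mirror sdc sdd mirror sde sdf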
I've been using ZFS for over 10 years, starting with Nexenta, a now-defunct oddity with a Solaris kernel and an Ubuntu userland. These days I use ZFS on Linux. I've never lost data since I started.
Side note, since it's probably a niche use-case, but using NVMe drives in Z1 was both pretty effective with VMs and cheap enough on usable capacity (4×1TB for ~3TB usable).
I feel SSDs are in a totally different category that makes Z1 an actual option, whereas I don't trust it for spinners. The key difference is that a failed SSD can usually still be read (it just becomes read-only), whereas a failed spinner is usually as good as bricked.
I use SATA SSDs in RAIDZ2, and have 6*1TB for approx 4TB usable. They're fantastic and I have yet to have an issue replacing a drive. Nor do I need to do it very often, as my workloads are pretty low; it's mostly an archival box.
It's been great, right up until one of the sticks of RAM started to fail...
> Raidz needs to read all of every drive to rebuild after a drive replacement while a striped mirror only needs to read one. However if you're regularly scrubbing zfs then you read it all regularly anyway.
Not quite. Each vdev rebuild uses the disks within it. A pool with multiple vdevs, each being raidz, does not need to read all disks to rebuild a single raidz vdev. Your statement compares one vdev vs. many vdevs; it just happens that folks assume one large vdev with raidz and multiple vdevs with mirroring.
If you have a 3- or 4-way mirror, wouldn't ZFS read from all disks in the vdev to rebuild any added disks in the mirror (there can be more than one)?
Sure, you can build striped raidz and 3+ way mirrors and other more exotic variants but the two most typical corners of the configuration space are plain raidz2+ for all the drives vs striped mirrors. You lose usable space not putting all your drives into a single parity array and maximizing usable space is usually why you go with parity. Mixing multiple arrays only makes sense from this perspective if you have a mix of drive sizes.
> You lose usable space not putting all your drives into a single parity array
Well if you're aiming for a specific ratio but want greater capacity without upgrading all disks to larger capacities, your only option is to use more vdevs configured just the same.
Example: 66% usable capacity, 6 disks in a raidz2 (4 usable + 2 parity) or 9 disks in a raidz3 (6 usable + 3 parity). If you want to add capacity but maintain your parity ratio (for a given fault-tolerance risk) there is no raidzN with N > 3, so you must add vdevs.
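To be concrete, growing the pool while keeping the 6-disk raidz2 shape would look something like this (pool and device names are placeholders):

    # add a second, identically shaped raidz2 vdev to the existing pool
    zpool add tank raidz2 sdg sdh sdi sdj sdk sdl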
Increasing the size of the raidz vdev means you're reducing your failure tolerance.
>- ZFS shouldn't be limited to FreeBSD. The Linux port has come quite a long way. I'd advise you to use the PPA over the repo though, as many key features are missing from the version in the repos.
> I'd advise you to use the PPA over the repo though, as many key features are missing from the version in the repos.
I would advise using ZFS only with distros that come with it (i.e. Ubuntu, Proxmox), especially if you plan to have your / on it. I wasted too much time on CentOS with ZFS, would not do it again.
Once you try it, you're never going back. Snapshots are made for things like system administration. Upgrade borked your system? Just rollback.
Want to go back to the last version of your firewall config? I wrote a utility you might like to try, httm[1], which allows you to restore from your snapshotted unique versions.
If you like ZFS, then trust me, you have to have ZFS on root.
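The rollback workflow described above is basically the following (a sketch; the dataset name is an assumption, Ubuntu's installer uses something along these lines for the root dataset):

    # snapshot the root dataset before a risky change
    zfs snapshot rpool/ROOT/ubuntu@pre-upgrade
    apt-get upgrade
    # upgrade borked the system? roll the dataset back
    zfs rollback rpool/ROOT/ubuntu@pre-upgrade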
I think for me I am more concerned about trying to get the OS bootable again if something becomes corrupted at the OS level. Even with MD RAID it can be a bit of a struggle to recover, but ZFS on root seemed much harder to troubleshoot and repair. Perhaps I am mistaken in this belief though?
In Ubuntu's implementation, root and boot are separate pools (bpool, rpool). Both are (and can be manually) snapshotted. So if boot is corrupted, you roll back. I should say I haven't tried it though, to see how boot selection works (rolling back rpool is straightforward though).
The boot corruption could occur with the default file system ext4 also, except with ext4 there is no recourse.
Needless to say, you can always boot from a live USB and mount your ZFS pool (and perhaps roll back).
Isn’t ZFS there precisely to address your concern?!
If the OS doesn't boot, you boot from the latest snapshot! Every time you run apt-get upgrade, a system snapshot is taken automatically and an entry is added to the boot menu.
That's where backups come in. Any filesystem can get corrupted. Though for ZFS it's less likely than with something like ext4. Even though both have journalling, only ZFS has copy on write.
My problem was rather with problematic updates of zfs itself.
The update "helpfully" updated the initramfs for older kernels too... and if something broke, it broke previous versions too, so they were all unbootable. Eventually I ended up keeping a USB stick at hand with a known-bootable environment :(
> I don't recall why, but I've heard that RAIDZ should be avoided, in favor of striped mirrors.
Most people care about random IO (also, once your filesystem has been populated and in use for a while, truly linear IO largely ceases to exist due to fragmentation). Striped parity arrays lose random IO performance as drive count goes up; an array of mirrored pairs gains random IO performance. This is less of an issue with tiered storage and cache devices, especially given you almost have to work to find an SSD smaller than 256GB these days.
You can only upgrade a vdev by upgrading all its drives; it's a lot nicer cash-flow-wise to gradually upgrade a mirrored pair here and there, or upgrade exactly as many pairs as you need for the space you need.
With RAID-Z you have a drive fail and pray a second doesn't fail during the resilver. With RAID-Z2 you can have any two drives fail. With mirrors you can lose 50% of your drives (provided that they're the right drives.)
> You can only upgrade a vdev by upgrading all its drives
This is no longer the case, or at least, should no longer be the case soon. The ability to add drives to an existing raidz vdev (raidz expansion) has been announced, and will trickle through to stable before too long.
> - ZFS shouldn't be limited to FreeBSD. The Linux port has come quite a long way. I'd advise you to use the PPA over the repo though, as many key features are missing from the version in the repos.
Agreed. Also for anyone using NixOS, I've found its ZFS support is first class and easy to set up:
I also forgot to mention: your guidance on ZFS memory requirements is outdated. From what I have heard, recent releases have drastically reduced the amount of ARC cache necessary. One person reported it working phenomenally on a newer Raspberry Pi.
> - TrueNAS is more targeted towards enterprise applications. If you want good utility as a home user then give Proxmox or the like a look. Then you can make it into more than just a NAS (if you're open to it).
I have questions about this. I'm thinking of building my own NAS server, and I don't know which OS to use. On the one hand it looks like people recommend TrueNAS a lot, which is nice now that they have a Linux version, but I'm not really sure what it offers over raw Debian apart from the web configuration and some extra tools. I have quite some experience running Debian systems and managing RAIDs (not with ZFS, but that doesn't seem like too much of a jump), and I worry that TrueNAS, while nice at the beginning, might end up being limiting if I start to tweak too much (I plan on using that NAS for more things than just storage).
What you miss with raw Debian compared to TrueNAS is compatibility. TrueNAS makes sure that all pieces are compatible with one another so that when you update each piece or the OS, the storage doesn't break. The whole package is tested thoroughly before release.
Also, TrueNAS makes setup painless: users, permissions, shares, vdevs, ZFS tuning, nice dashboard etc. With Debian, you get a lot of config files and ansible playbooks that become hard to manage.
Ideally you won’t run other stuff on a NAS, outside Docker.
There's been a movement in the industry to bring storage back onto servers. They use the fancy buzz term of "hyper convergence" now though.
I will definitely argue that TrueNAS gives stability and ease of management. Some of that can be found with Proxmox too though. I think it just really depends on which medium you prefer. Perhaps trying both is the best option?
I'd try both; they tend to target different use cases. Proxmox doesn't come with a lot of the tooling for things like SMB, LDAP, etc. that TrueNAS ships with, but you may find you don't actually want any of these extras. In that case Proxmox would be a better choice IMO since it's Debian-based and a bit more streamlined.
Proxmox is my way. It supports newer ZFS versions, and the support term for each major version is fine. I put the NAS functionality in an LXC container with the host's filesystem passed through.
As thefunnyman pointed out, TrueNAS will make for a more user friendly sharing experience, but you could also just load a container up with Samba on Proxmox, to give you the NAS feature you seek. The heavy lifting of software RAID/ZFS would then be handled by Proxmox.
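A minimal sketch of what that Samba-in-a-container setup might look like (the container ID, dataset path, share name and user are all placeholder values):

    # on the Proxmox host: bind-mount the ZFS dataset into the container
    pct set 101 -mp0 /tank/media,mp=/mnt/media

    # inside the container, in /etc/samba/smb.conf
    [media]
        path = /mnt/media
        browseable = yes
        read only = no
        valid users = alice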
IMO NAS setup is much more straightforward on TrueNAS. It's doable in Proxmox inside a VM or LXC, but TrueNAS exposes all this directly via a nice UI. I personally use Proxmox with a simple Debian NFS VM, but for less technical users that just want stuff to work I tend to recommend they stick with TrueNAS.
Is there any reason not to use ZFS for a NAS? I am thinking of a NAS hosting multimedia files; surely the ability to roll back must come at a cost in terms of disk space for large files.
Because it's a copy-on-write file system, snapshots don't take up any extra space until you do any writing. The common parts between snapshots are shared between them in storage, so you only pay the cost of any changes (rounded to some block size).
But if I remux a large video, won't I have two versions of that video taking space on the disk? If I do that often, that will eat the capacity very quickly.
I’m in the process of setting up a home server after buying a pair of matching 3TB Western Digital “Red” disks. I plan on installing them in a HPE ProLiant MicroServer G7 Server / HP Micro G7 N40L that I was gifted a couple of years ago. Even though it comes with a hardware RAID, I was considering setting up RAID 1 using Linux software RAID. However, according to the Linux Raid Wiki¹, Hardware RAID 1 is better than Software RAID 1.
> This is in fact one of the very few places where Hardware RAID solutions can have an edge over Software solutions - if you use a hardware RAID card, the extra write copies of the data will not have to go over the PCI bus, since it is the RAID controller that will generate the extra copy.
I was intending to use these disks for local backup and for storing rips of my extensive CD and DVD collection. As sibling comments mention, the possibility of the hardware controller failing is a worry, so I’d need to have a backup strategy for the backup disks. Since it’s going to be a home server, down-time wouldn’t be a problem.
I don’t have much experience with either hardware or software RAID so I’d welcome any advice.
> > This is in fact one of the very few places where Hardware RAID solutions can have an edge over Software solutions - if you use a hardware RAID card, the extra write copies of the data will not have to go over the PCI bus, since it is the RAID controller that will generate the extra copy.
This is not something to worry about for a home user (especially now that we've moved from PCI to PCIe).
The only great thing about HW RAID is that if your primary drive fails but not completely (i.e. it is still seen by the BIOS, and the BIOS tries to boot from it but can't because the drive is half-dead), the controller presents a single device to the BIOS, so it will still allow booting from the healthy drive.
But again, unless this is a server at a remote oil-drilling site serviced twice a year by air (been there, done that), this is not something a home user needs to worry about.
> the possibility of the hardware controller failing is a worry, so I’d need to have a backup strategy for the backup disks
If you use a basic mirror (striped or not) the recovery process is straightforward - for a simple mirror just stick it in any other system/controller; for a striped one you would need GetDataBack or R-Studio, or just find a newer model of the RAID card from the same vendor.
In your case I would advise having a single disk in the ODD bay as a boot/system drive and using both your HDDs as LVM PVs, without fdisk shenanigans. If/when you decide to upgrade/replace disks, the migration is just a couple of commands.
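For illustration only - assuming a volume group named data, an old disk at /dev/sdb and a new one at /dev/sdc - the kind of commands meant here would be roughly:

    pvcreate /dev/sdc          # prepare the new disk as a physical volume
    vgextend data /dev/sdc     # add it to the volume group
    pvmove /dev/sdb            # migrate all extents off the old disk
    vgreduce data /dev/sdb     # drop the old disk from the volume group
    pvremove /dev/sdb          # wipe its LVM label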
Wow! That's all really informative. I had thought that a non-striped, mirrored disk should be usable if the H/W RAID controller failed (given that all the data is stored on the disk) but I didn’t know for sure. I didn’t think it would also be possible to recover striped disks.
Thanks for the great advice on using LVM and the commands for replacing a disk. I only recently came across a system that used the disk block device itself as a physical volume - rather than partitioning it via fdisk or parted. Other than using LVM for creating snapshots, I haven't really used it for anything interesting.
Funny, I have the identical setup with a small SSD for /, and two of the 3TB Reds for my media/music/files. I have seen enough pain from HW Raid controller setups (custom drivers, configs, long recovery) that HW Raid was right out. Then, for soft raid I had bad experiences in the past where the system refused to come up with one disk in the mirror pair bad. I'm sure there is a way to configure this properly but at this point I said fuck it and just mirror one of the WDs by hand to the other via rsync. Think hard (or even test): what actions will need to happen when one of your disks fails? And second, do you need backup or RAID more? Fat-finger deleting a file on a RAID-backed file system will leave you with no backup unless it has snapshots.
Some notes.
- Not having RAID really doesn't matter. The primary purpose seems to be to save a little bit of space via clever parity schemes or to increase read performance from parallel operation, but none of this is valuable to me.
- I use ext4. I think it would make sense to move to a snapshot-capable system to make the periodic rsync-backups more correct (I don't even bother dropping to r/o mode, since it's fairly static data).
- What really keeps me up at night: bit flips silently corrupting files. I think btrfs or ZFS are supposed to solve this through checksumming plus periodic scrubs. I really need a periodic process to checksum every file and take action on exceptions. Note that RAID will not help you with this.
- This has worked pretty well so far. Twice already (over 12 years) I have had one disk in the pair fail, at which point I would order a new pair (they were bigger/cheaper by that point and I figured the other one would fail soon) and rebuild the server.
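For reference, the kind of by-hand mirroring described here can be as simple as a cron-run rsync (mount points are hypothetical):

    # one-way copy of disk1 onto disk2, preserving hardlinks/ACLs/xattrs, deleting removed files
    rsync -aHAX --delete /mnt/disk1/ /mnt/disk2/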
Thanks so much for sharing your experience. I look after a couple of servers in my current job that have hardware RAID (Dell), but they were set up by my predecessors and they've never given any trouble since I started working there (touch wood). I have also not had any problems with Linux software RAID; e.g., I was pleasantly surprised to discover that I could move the RAID 1 disks from my previous home server (a repurposed Dell workstation) to the ProLiant microserver and my Debian OS had no problem recognising them.
The consensus seems to be that RAID isn't particularly useful for a home server. Backup would be more useful, as I would be more likely to accidentally delete or overwrite a file than for a disk to fail catastrophically, so I think I might use the second disk similar to how you use yours. I would also be better served by using ZFS; its de-duplication would also be useful - so I'm going to try it as an experiment.
> according to the Linux Raid Wiki¹, Hardware RAID 1 is better than Software RAID 1.
Don't take that too much into consideration - the article was last updated in 2007 ( https://raid.wiki.kernel.org/index.php?title=Overview&action... ) so it lacks some details (the same can be said for much of the ZFS-related info that you might find) => nowadays double-checking articles related to RAID and ZFS is a must.
In my case I bought some HBA (Host Bus Adapter) cards (e.g. LSI SAS 9211-8i), set their BIOS to not do anything special with the HDDs connected to them (to be able to use them as well with other controllers) and used mdadm (earlier) or ZFS (now) to create my RAIDs => it works well, I get a max throughput of ~200MiB per disk, and I have all the fancy features of ZFS without the problem of proprietary stuff tied to the controller card :)
Thanks for the tips and feedback. I’ll stick with Software RAID. I wasn’t sure that the system would have enough resources to support ZFS but I think I’ll try it out before I commit large amounts of data to it.
I think the worst point is vendor lock-in. If your controller fails, and a replacement isn't available, then you may be dead in the water. That kind of goes against the very point of RAID.
3 weeks ago I had a controller failure in a manufacturing plant in Latin America. The contract with the manufacturer was to provide a replacement on site in 4 hours. Guess what, 8 hours later the technician with the replacement controller was still on the way.
With TrueNAS I can move my drives to any other computer with the right interface and they will just work. I have done this over my past 10 years of using TrueNAS.
Spending lots of money on hardware support doesn't guarantee it won't fail, or even that you will get replacements fast; it just guarantees that a contract exists saying something. As architects we should engineer our solutions to cater for hardware failure.
I call this sort of thing a technical guarantee rather than a commercial guarantee
I consider myself an intermediate homelabber and a TrueNAS beginner. I just built my first NAS server, so I wrote this to capture everything I wish I'd known at the start. I hope it's helpful for anyone else thinking about building their first NAS server.
Any questions or feedback about the post are more than welcome.
A few points from someone with years of managing RAID and ZFS in arrays all the way up to 50 disks:
RAID-Z1 is something I never consider without a solid backup to restore from and a plan to execute that process at least once in the lifecycle of an array.
If you suffer a total disk failure of one of those disks in the array, you have likely lost some data. The good news is that ZFS will tell you exactly which files you have lost data for and cannot rebuild. If you have backups of those files, you can overwrite them to get your integrity back.
The reason is, with a total loss of a single disk, any read error on any of the remaining disks is a lost/corrupted file.
For this reason, you need a strong (easily accessible, consistent, current) backup strategy and an acceptance of downtime with Z1.
As for ECC, it's better, but your absolute worst-case scenario is that you get a bit flip before the sync and hash happen, and now that bit-flipped data is committed to disk and you think it's OK. I prefer ECC to avoid this, but you are still reaping a multitude of benefits from ZFS without ECC.
The only valid rule for RAM and ZFS is that more RAM = more caching of recently read data. Single-user, or very-few-user, appliances will see little benefit past 8GB even with 100TB unless you happen to be reading the same data over and over. Where ZFS shines is having hundreds of gigabytes of RAM and tens or more concurrent users mostly accessing the same data. That way the vast majority of reads are served from RAM and the disks otherwise remain mostly idle.
Most of the ZFS RAM myths come from deduplication, which should be disregarded as a ZFS feature until they allow storing the DDT on an Optane-like-latency device. Even better would be offline deduplication, but I doubt that will be a thing in ZFS this decade.
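For what it's worth, if you'd rather cap the ARC than let it size itself, on Linux it's a single module parameter (8 GiB here is just an example value):

    # persistent: /etc/modprobe.d/zfs.conf
    options zfs zfs_arc_max=8589934592

    # or change it at runtime
    echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max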
> If you suffer a total disk failure of one of those disks in the array, you have likely lost some data. [...] The reason is, with a total loss of a single disk, any read error on any of the remaining disks is a lost/corrupted file.
Wait, what? If a RAID-Z1 ZFS array loses one disk, there's data loss? I've run so many RAID-1 and RAID-10 arrays with mdadm that I can't even begin to count them, and I had many drive failures. If any of those arrays had corrupted data, I would have been mad as hell.
What I am missing here? How is this even remotely acceptable?
> any read error on any of the remaining disks is a lost/corrupted file.
That is the meat of it. With traditional RAID it is the same issue, except you never know it happens because as long as the controller reads something, it's happy to replicate that corruption to the other disks. At least with ZFS, you know exactly what was corrupted and can fix it, with traditional RAID you won't know it happened at all until you one day notice a corrupted file when you go to use it.
RAID-Z1 is better than traditional RAID-5 in pretty much every conceivable dimension, it just doesn't hide problems from you.
I have encountered this literal scenario, where someone ran ZFS on top of a RAID-6 (don't do this, use Z2 instead). Two failed drives, RAID-6 rebuilt and said everything was 100% good to go. A ZFS scrub revealed a few hundred corrupted files across 50TB of data. We overwrote the corrupted files from backups, re-scrubbed, and the file system was then clean.
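The check described here is just the standard scrub-and-report cycle (pool name is a placeholder):

    zpool scrub tank
    zpool status -v tank    # -v lists any files with permanent (unrepairable) errors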
ZFS automatically self-heals an inconsistent array (for example if one mirrored drive does not agree with the other, or if a parity drive disagrees with the data stripe.)
ZFS does not suffer data loss if you "suffer a total disk failure."
I have no idea where you're getting any of this from.
If the data on disk (with no redundant copies) is bad, you’ve (usually) lost data with ZFS. It isn’t ZFS’s fault, it’s the nature of the game.
The poster built a (non-redundant) ZFS pool on top of a hardware RAID-6 device. The underlying hardware device had some failed drives, and when it rebuilt, some of the underlying data was lost.
ZFS helped by detecting it instead of letting the bad data through like would normally have happened.
The parity cannot be used in the degraded scenario that was under discussion.
See e.g. here, where increasing disk sizes vs. the specified unrecoverable read error rates are explored in relation to the question at hand: https://queue.acm.org/detail.cfm?id=1670144 (in the article Adam Leventhal from Sun, the makers of ZFS, talks about the need for triple parity).
Also, the conclusion "ensure your backups are really working" is an important point irrespective of this question, since you'll also risk losing data due to buggy software, human errors, ransomware, etc.
You're not missing anything. They're completely wrong.
In RAID-Z, you can lose one drive or have one drive with 'bit rot' (corruption of either the parity or data) and ZFS will still be able to return valid data (and in the case of bit rot, self-heal. ZFS "plays out" both scenarios, checking against the separate file checksum. If trusting one drive over another yields a valid checksum, it overwrites the untrusted drive's data.)
Regular RAID controllers cannot resolve a situation where on-disk data doesn't match parity because there's no way to tell which is correct: the data or parity.
The situation I laid out was a degraded Z1 array with the total loss of a single disk (not recognized at all by the system), plus bitrot on at least one remaining disk during resilver. Parity is gone; you have the checksum to tell you that the read was invalid, but even multiple re-reads don't give a valid checksum.
How does Z1 recover the data in this case other than alerting you of which files it cannot repair so that you can overwrite them?
Why do you have bitrot to begin with? That's what scheduled scrubbing is for. You could of course be very unlucky and have a drive fail and another get corrupted on the same day, but check how often you find issues with scrubbing and tell me how likely that scenario is.
I've had hundreds of drives in hundreds of terabytes of appliances over years. URE and resilver is a common occurrence, as in every monthly scrub across 200+ drives. This isn't 200 drives in a single array, this is over 4 appliances geographically distributed.
The drives have been champs overall, they're approaching an average runtime of about 8 years. During that 8 years we've lost about 20% of the drives in various ways.
It is almost guaranteed that when a drive fails, another drive will have a URE during the resilver process. This is a non-issue as we run RAID-Z3 with multiple online hotspares.
We could do weekly. The volume of data is large enough that even sequential scrubbing when idle is about an 18-hour operation. As it is, we're happy with monthly scrubbing on the Z3 arrays. We don't bother pulling drives until they run out of reallocatable sectors; this extends the service lifetime by a year in most cases.
I intentionally provisioned one of the long term archive only appliances with 12 hot spares. This was to prevent the need for a site visit again before we lifecycle the appliance. Currently down to seven hot spares.
That replacement will probably happen later this year. Should reduce the colo cost by power requirement reduction enough that the replacement 200TB appliance pays for itself in 18 months.
- "RAID is not a backup" primarily because "you could rm -rf". ZFS snapshots cover that failure mode to the same extent that synchronization with offsite does, but cheaper. ZFS snapshots obviously don't cover other failure modes like natural disasters or a break in, so RAID is still not a backup.
- for the ZIL to do its work properly, you need the disks not to lie when they claim that the data has been truly saved. This can be tricky to check, so perhaps think about a UPS
- if you have two M.2 slots you could use them to mirror two partitions from two different disks for your data pool's SLOG. The same could be done to form a new mirrored ZFS pool for the OS. In my case I even prefer the performance that a single-copy SLOG gives me at the risk of losing the most recent data before it's moved from the SLOG to the pool.
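A sketch of what adding such a mirrored SLOG looks like (device/partition names are hypothetical):

    # attach a mirrored log vdev made of two NVMe partitions to the data pool
    zpool add tank log mirror /dev/nvme0n1p2 /dev/nvme1n1p2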
> - "RAID is not a backup" primarily because "you could rm -rf".
or your house could burn down
or somebody could steal the computer while you're away on vacation
or lightning could strike your electrical grid service entrance or a nearby pole/transformer, causing catastrophic damage
or your house could flood
lots of other things.. if you really have important data it's important to plan to for the total destruction of the storage media and server holding it.
> - "RAID is not a backup" primarily because "you could rm -rf". ZFS snapshots cover that failure mode to the same extent that synchronization with offsite does
Not really. You need to be synchronizing to a _write-only_ backup archive. A local ZFS snapshot can be deleted locally.
> Performance topped out at 111 MiB/s (931 Mbps), which is suspiciously close to 1 Gbps.
That's because of Ethernet framing plus TCP/IPv4 overhead. You're testing the payload throughput, not the physical throughput. The theoretical maximum goodput without jumbo frames is around 95% of line rate.
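Rough back-of-the-envelope numbers, for anyone curious:

    payload per frame:  1500 (MTU) - 20 (IP) - 20 (TCP)              = 1460 bytes
    bytes on the wire:  1500 + 38 (Ethernet header/FCS/preamble/IFG) = 1538 bytes
    1460 / 1538 ≈ 0.949;  0.949 × 125 MB/s ≈ 118.6 MB/s ≈ 113 MiB/s

So 111 MiB/s is essentially line rate for a 1 Gbps link.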
Some suggestions for anyone else looking to do the same:
An i3 runs a bit cooler than a Ryzen, still 8 threads. 8TB WD Blues (they're SMR at 8TB and up). You can find ATX boards with 8 SATA ports and dual NVMe slots for caching / fast pools.
I'd be really careful about SMRs in a RAID. You can end up with no end of performance issues. It's all the downsides of a single SMR drive multiplied by however many drives are in the pools.
You actually have to be careful. There are disk sizes where the manufacturer will sell, say, an 8TB CMR and essentially the same drive with different firmware as a 10TB SMR. They'll also have a 10TB CMR model. You have to pay close attention to the model numbers. It's even more of a crapshoot if you shuck drives; you have to carefully research which externals are known to contain CMR drives.
I have had a couple of home-built TrueNAS systems for many years (since FreeNAS, ~10 years ago); here is some feedback:
- with the same disk size, but just 3 disks, I get around 240 MB/sec read speed for large files (with a 10 Gbps NIC). I guess the biggest difference is CPU power; your NAS seems very slow. On a 1 Gbps NIC I get 120 MB/sec transfer speed. My system is even virtualized; on bare metal it may be a little bit faster.
- you cannot expand your pool: if you add one more disk there is no way to cleanly migrate to a 5-disk raidz1. There is some new development that kind of does something like this, but it is not what is needed
- unless esthetics is a big deal for you, there are still $30 cases around. The extra $70 can be used for something else *
- * with a small percentage cost increase, an investment in CPU and RAM can give you the capability to run some VMs on that hardware, so that CPU will not sit at idle 99.9% of the time and be underpowered when you do use it. Using a dedicated computer just for a NAS is not very cost and power efficient, but if you group multiple functionalities it becomes a great tool. For example I run 3-4 VMs at all times, up to ~ 12 when I need it.
- that motherboard, and the comparison to a B450, is off. The MB restricts you to 4 SATA ports, while the B450 I bought for ~$120 has 6 SATA ports
- TrueNAS does not *require* an HBA firmware change; that is needed if you want to convert a RAID controller to plain HBA mode, or with certain old HBAs that need newer firmware. However, for your setup an HBA is not needed. If you want to add many disks and have good performance (like more than 500-1000 MB/sec) then you need the HBA
- your math is off. You calculate available space using ~3.8TB disks and divide by 4TB. The 4TB disks don't have 4TiB of capacity, but 4x10^12 bytes, so the percentages in your table are exactly 80%, 60% and 40%.
- that CPU does not work with 32GB DIMMs. This works only with newer Ryzen generations, not with Zen+ in this CPU.
- GPU is not missing. TrueNAS does not render anything on a GPU; there is no need for one. I ran TrueNAS for a couple of years on a computer with no video capability at all (a Ryzen 2700) without any problem; I just used a GPU for the initial installation and then removed it.
- unless you store a database for an SQL server or similar, there is no benefit to a SLOG; it is not a tiered cache, so it does not speed up file transfers in any way. You can have a disk dedicated as a read cache, but the cache content is currently wiped at every restart (a documented limitation) and it is not needed unless you want very good performance with small files over the network
Just wanted to share my appreciation for not just this post but all your work in recent times! Been following your trail since your post about Google promos and the set of useful projects you've been working on since then.
The SLOG is only used for synchronous writes, which most writes are not (as I understand it). Most workloads (i.e. anything that isn't a DB server) won't see much improvement with one.
I would recommend anyone building a home NAS like this in 2022 to look into buying some slightly older 10GbE network interfaces on ebay (an intel X520-DA2 with 2 x 10Gbps SFP+ ports can be found for $55) as a PCI-E card. It's not hard to exceed the transfer ability of an ordinary 1000BaseT port to a home switch these days.
And if you have just a few powerful workstation desktop PCs it's also worth it to connect them at 10GbE to a new switch.
Here's a fairly typical one. These have excellent FreeBSD and Linux kernel driver support.
Or go a little further and spend ~$200 on a used passive FDR Infiniband switch, $50 per used dual port FDR IB NIC, $40 for a 10 meter fibre optic cable including transceivers at each end.
Then run IP over IB on each host and you have a 56 Gbit network that all your applications will just see as another network interface on each host.
For home use I'd highly recommend sticking with just 10GbE, because then you're not locking yourself into a dead-end solution of weird used previous-gen InfiniBand stuff.
If you get a $200 switch with a few 10GbE interfaces in it you can easily expand things in the future by trunking VLANs to another, newer 10GbE-capable switch, or connecting to a switch that has multi-gig copper ports for access ports to 2.5/5GBaseT-capable desktop PCs and laptops, etc.
$40 for a 10 meter fiber optic cable is a high price when you can buy LC-LC UPC 9/125 duplex 2 meter cables for $3.50 to $4.70 apiece (or a few cents more for additional meters) and connect them between $20 transceivers. No matter what route someone goes with, I would recommend buying $30-40 of basic fiber connector cleaning supplies.
If one wants to buy weird used previous-gen dead-end stuff, there are also tons of very cheap 40GbE Mellanox Ethernet adapters on eBay with the QSFPs to go with them, and if you have a place to put a switch where noise doesn't matter, like a wiring closet somewhere, there are cheap 1U switches with 40GbE Ethernet ports that can also be broken out and used as individual 10GbE ports.
Can you share what your experience with implementing IPoIB with used gear was? I'm asking mainly because I actually got interested recently with such setups however I got rather discouraged by the driver support.
It seems that some decent support only exists for the more recent generations. The older ones like ConnectX-3 or earlier, which typically show up on eBay, are either not supported anymore or only available for older kernel versions and soon to be EOLed.
So do I understand it correctly that to use such adapters one has to actually downgrade to an older kernel version?
Or is there some basic support in the latest Linux kernels for older generations still?
Yes, if you want to use the officially supported driver for ConnectX-3 (mlx4_xxx kernel modules, LTS release of v4.9 available from Nvidia's page), you need to go with something like Ubuntu 20.04 LTS (which should be good until at least end of 2025). However, the latest Mellanox drivers (mlx5_xxx kernel modules) work just fine with the ConnectX-3, at least for basic functionality.
I've not actually used IPoIB on such gear myself, but we have been working quite a bit on reusing old/ancient HPC clusters with IB adapters, and you can generally make things work if you spend enough time on trial and error and you are not afraid of compiling code with complicated dependencies. As long as you can get the IB stuff talking, and the driver is using OFED, the IPoIB part should Just Work.
It is always going to be an adventure working with used gear. But HPC has such a high decommissioning tempo and low resale value that there will always be quite a few other enthusiasts toying about.
I bought brand new 10 Gbps NICs ($47/pc) and a switch ($120) for less than that, and replacements are readily available. The 20m AOC cables were indeed $40.
A home NAS build of this size will never exceed 10 Gbps; I barely get ~2 Gbps out of the spinning disks.
The main application where one will see real 10 Gbps speeds is if the NAS also has a fast NVMe SSD used for smaller files needed at high speeds...
For instance, I have a setup meant for working with uncompressed raw yuv420 or yuv422p 1080p and 4K video: there's a 512GB NVMe SSD and a 1TB SSD set up as individual JBOD disks and exposed to the network for video-editing scratch storage, and it will definitely saturate 10GbE.
This is actually needlessly complicated; if/when I build a more powerful desktop PC again, I'm just going to put the same working file storage on a 2TB NVMe SSD stuck directly into the motherboard.
Very valid point, but I don't know how soon you will hit the chipset speed limits; that NIC is connected to the chipset, which is connected to the CPU via a 4x PCIe link, and from there to the NVMe with another 4x PCIe link. In theory you have 4 or 8 GB/sec max bandwidth, but the CPU-chipset link is not dedicated. If you go for Threadripper the math is very different: you have lots of direct CPU connections for (multiple) NICs and multiple NVMe drives.
You may be referencing the original post's CPU/motherboard combo, which is not the same as the PCIe bus/lanes, slots and setup on my home file server and desktop PC.
Thanks for including energy usage in the article. I carry USB-C SSDs around the house for backups and storage of archived files. Of course this is a bit of a hassle, and I played with the idea of either buying or building a NAS. My current standby consumption for all computer stuff (access points, router, switches, a dozen or so microcontrollers and various smart-home stuff, but not TVs, running computers or gaming consoles) is already above 100W, and I would really like to bring this number down. An extra 30-60W makes it really hard to justify the purchase of a NAS (that I don't really need). I thought at least the Synologys would use way less power when not in use, so thanks for making me aware of this.
Yeah, I've never thought much about power consumption, but I've done a few write-ups of previous builds, and I received a lot of questions about power draw, so I decided to measure it on this one. I was surprised at how much power the system consumed, and it will be something I think about more up-front on future builds.
If you are not after speed, then you can do a redundant array of cheap nodes. Instead of using RAID, just shove an 8-12TB disk into each of a number of thin clients.
The key is that they spend most of the time turned off.
I agree and plan to buy a Kill A Watt P4460 meter. My HPE Gen 9 servers were free, but I still would like to know the operating cost of a single server.
New drives are around $16-$20/TB depending on whether you catch a sale and are willing to shuck. You can pick up used SAS drives in 3-4TB capacities for around $5/TB. I'm a crazy person, so I built this to hold 11 3TB SAS drives:
I used a Supermicro MB and ECC RAM. It's not much more expensive and it's nice having IPMI. I personally think it's crazy to forego ECC. The SAS controller, expander, and drives were used, everything else was new. Prices have gone up. The new parts were $638 at the time. The drives were ~$20/ea. The HBA and expander were ~$85 for both. After the fans, cables, extra drive cage and brackets, total cost was around $1K. This hardware is all supported out-of-the-box by TrueNAS. I haven't done the math to figure out when the cost of running this will exceed having purchased higher capacity drives.
This is what a typical used SAS drive looks like: 30K hours but very few on/off cycles. Zero defect list:
A few drives arrived with a non-zero defect list or otherwise failed burn-in. I contacted the seller on eBay and they sent me replacements without any fuss. I'm not necessarily recommending used SAS drives, but I'm not recommending against them either. I will recommend the serverbuilds forum for generally good advice on all this. I think this post got me started:
The catch with shucking drives from cheap NAS systems like WD's is that those drives have a strong likelihood of being SMR drives instead of CMR. Basically, they'll work fine until something goes wrong requiring, say, a RAID rebuild, and then you'll be out for weeks while they rebuild, praying they don't fail in the process, because the random write performance is abominable.
SMR only affects writes, so assuming you're originally fine with SMR (let's say you only write to the pool very sparingly), you can start off with SMR but then use CMR for replacements.
The pictures inside your case give me anxiety for some weird reason. That's the cleanest* inside of a PC I've seen in my life, almost artificial looking; are you building in a clean room?
I built it in my basement. Why would a newly built PC have any dust?
I'm sad you don't think much of the cable management. I think I did pretty well considering there are 11 drives and 6 fans, and I wasn't able to use custom lengths on any of the cables.
> I see people talking about snapshotting, but I haven’t found a need for it. I already have snapshots in my restic backup solution. They’re not especially convenient, but I’ve been using restic for two years, and I only recall needing to recover data from a snapshot once.
The ease with which you can revert mistakes using ZFS snapshots is much better than with restic. You can pretty much navigate to the correct snapshot on your live filesystem and restore whatever you need to restore.
It also makes backups easier as you can just send the snapshots to the backup device (another server or external storage device).
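Replication really is just piping a snapshot stream around; a sketch, with dataset and host names as placeholders:

    # initial full send, then periodic incrementals
    zfs send tank/data@snap1 | ssh backuphost zfs receive backup/data
    zfs send -i tank/data@snap1 tank/data@snap2 | ssh backuphost zfs receive backup/data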
If you set up SMB right (TrueNAS configures this out of the box; SCALE is great if you need a Linux NAS), you can use Windows Shadow Copies to access ZFS snapshots and browse or restore file contents from them.
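On the Samba side this is the shadow_copy2 VFS module; a rough sketch of the per-share bits (the shadow:format value is an assumption and has to match however your snapshots are actually named):

    vfs objects = shadow_copy2
    shadow:snapdir = .zfs/snapshot
    shadow:sort = desc
    shadow:format = auto-%Y-%m-%d_%H:%M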
Also possible with BTRFS. I set this up once for a small business with hourly snapshots during working hours. This way users could just restore older versions of files they accidentally deleted, overwrote or messed up in some other way.
Another benefit: Those snapshots were read-only, so they also served as a protection against ransomware.
I don't think BTRFS supports NFSv4 ACLs yet (i.e., Windows ACLs are natively supported on ZFS; there is a patchset so Linux also supports them, but BTRFS obviously has no integration for a patchset that only exists for ZFS).
Having NFSv4 ACL access is a huge plus since you can configure permissions natively from windows and have them enforced even on the shell.
They likely use xattrs to store the ACL (that is an option in Samba), but it's not native like it is on the TrueNAS systems with the patched kernel. I bet if you log into the Synos via SSH you don't get the ACLs enforced in the shell. With the NFSv4 ACL patch series, they would be, and you could benefit from the better options that the NFSv4 ACLs give you.
Storing them in metadata is not the same as having them natively.
They maintain their own kernel module to handle ACLs (synoacl_vfs) and they are indeed enforced locally as well. They can be read and modified by using the `synoacltool` cli.
Is that kernel module open source? One of the advantages of what TrueNAS did is that I can patch it into my own kernels if I'd need to. Plus being compatible to the NFSv4 ACL binary format, so it works via NFS too. Also handling Active Directory would be important there.
Not sure about the module being open-source, but running custom kernels is not really a thing on Synology. They seem to integrate SoC BSPs into kernels for each specific model, and they do not seem to port them across versions. Different models use different kernel versions in the same DSM (Synology distro) version.
The ACLs do work via NFS and it also works with Active Directory. They ship an AD implementation too, if you are interested in that (it is actually Samba in AD mode).
Well, TrueNAS SCALE isn't exactly designed to be run on a SoC; it's more of a normal Linux distro for NAS/SAN servers. Hence most of it is open source and there are active upstreaming efforts. Plus the entire ACL thing is native to ZFS already; it's just the glue layer that's missing in Linux. For Linux it's presented in an xattr for compatibility; for Solaris it's part of the proper permission bit fields.
> I chose raidz1. With only a handful of disks, the odds of two drives failing simultaneously is fairly low.
Which is not really the case if you bought X of the same disks and always use them together. I had that happen to me just a few months ago: 4 identical disks bought at the same time. Raidz1 reported one dead/dying disk, so I replaced it and started resilvering, which can take days and leaves the disks at 100% utilization.
So after 12 hours or so a second one failed and the data was gone.
In my case even mixing up the disks might not help but I agree it’s still helpful.
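For reference, the replace-and-resilver step being described is just (device names illustrative):

    zpool replace tank sdb sdf    # swap the dying disk for the new one; resilver starts automatically
    zpool status tank             # watch resilver progress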
I bought 4x Seagate Ironwolf Pro 12TB drives from different vendors, one failed after a year, then when I got the replacement another drive failed during the rebuild, and then 6 months later the replacement failed. Now another one of the original drives is also reporting reallocated sectors.
Same system has 4x WD Red drives which have been running fine with 0 reallocated sectors for almost 7 years.
I'm okay with claims that snapshots are much better than backups for many uses. But in this case the GP was explaining that they only used their backups once in several years, so they did not need to change their backup system.
I'm in the same boat. I configured remote backup systems on a handful of computers. I think I reached for backups only twice over the last ten years. Of course I need something, backups or snapshots, but for my use case snapshots (with a network copy) would need work to set up. And if the remote storage is worse, that would be more of a problem than the changes in the restore process.
I never do "rm prefix" in a dir. I always do "rm ./aDir/prefix" for example. This assures I'm not globbing outside a directory (or just a directory) and helps ensure I'm not shooting myself in the foot.
Yeah, I move up one directory before I delete anything.
This. I'm always going into my backup server and looking in the ".zfs/snapshot" directory to look at the history of how files have changed over the backups. Love restic, but the ZFS snapshots are fantastic.
Nice build. I recently built my second NAS (from a used R720 from eBay). The total (without disks) is pretty similar to the build documented in this article.
Having a large NAS has had an interesting (though predictable) impact on all the computers around it: pretty much every bit of data lives on the NAS (accessed by CIFS, NFS, or iSCSI). When I had to reinstall Windows, it was mostly painless because all my important data and my Steam games library were on a remote iSCSI disk. When I replaced the drives in my Linux servers, I hardly had to back up anything, as I worked almost exclusively on NFS-mounted directories. When bringing up a new Raspberry Pi for projects, it also has instant access to more terabytes of storage than it could ever need.
Also, for a homelab, getting 10GbE fiber between two machines is surprisingly cheap and easy. For certain workloads, it can be a noticeable speed boost over 1GbE.
How is performance, especially with regards to load times, if your steam library is mounted remotely?
I ask because the difference between a SSD and a hard drive can be massive in this regards, so I'd be really interested to know if the network latency is also a comparable hit.
The performance is fine. It's been years since I ran my Steam library on an HDD, so I don't have anything really to compare to (except my own expectations and impatience). The NAS is running 7 SSDs in a ZFS RAID, and exports a 1TB volume over iSCSI via a dedicated 10GbE fiber link. Anecdotally, I will often load into games faster than friends who have a local SSD (so I think this means I've gotten disk performance to be "good enough" that other hardware components start to dominate things like load times).
I'm not a hardcore gamer by any means, but I really wonder how much influence drives actually have on games.
My gaming computer had an old SATA SSD (Samsung 840 Evo IIRC). Some games took ages to load (particularly Fallout 4). I switched to a much faster NVMe drive, and subjectively it's not any faster loading games. I'd say this was a very underwhelming purchase.
There's certainly diminishing returns between "an SSD" and "a faster SSD" (unless the slower one is DRAMless or QLC), but hard drive to SSD is still a big gulf
I have pretty much this same setup already. One thing I'd add is that it's quite noisy. If you have somewhere you can put it, and your house is all Cat 6'ed up, then great. But if, like me, you have it in the same room as you, you will notice it. And it's not the PC - the Fractal case has very quiet 120mm fans - it's the HDDs.
Are you me? I have almost the exact same build as discussed in the post, and I am super annoyed with how loud the disks are. I have a cronjob that puts them to sleep every night (the NAS is in my bedroom)... For some weird reason they never stop spinning otherwise.
Made me chuckle. Until SSDs are within reach of a reasonable price range, we're out of luck. I have resorted to leaving my NAS off and turning it on when I need it.
For reference, here are the drives I have. I don't think they're by any means the loudest, nor are they the quietest. The perception of noise can have a lot to do with your surroundings - whether you have a wooden floor instead of carpet, and a whole number of other factors. For me the constant whirring is too much.
It really depends on the drives you have. When I upgraded from 3TB to 4TB and 6TB models, I was very much taken aback by how much louder these drives are despite being in the same WD Red CMR product line. Some of their more recent SMR drives have really weird acoustics that always make me think they are about to fail, but in reality that's just how they're supposed to sound - yikes.
Are there current JBOD products in 1U short-depth (15" for wall mounted rack) form factor, e.g. 4 x 3.5" hotswap drive bays with a mini-SAS connection to the unit? This would be useful as a backup device, or low-power NAS when connected to a Linux thin-client with LSI HBA.
There were some 1U products which included RAID support, priced around $500, which is a bit much for 1U chassis + SATA/SAS backplane + Pico power supply. 1U chassis with ~11" depth (seems to be a telco standard?) start around $100.
For ~$600, QNAP has an Arm-based 1U short-depth NAS with 2x10GbE and 2x2.5GbE networking, plus dual M.2 NVME slots. Maybe Armbian will run on that SoC, it's part of a supported family. https://www.qnap.com/en-us/product/ts-435xeu
The $100 ODROID M1 SBC has an M.2 NVMe slot with 4x PCIe lanes. In theory, this could be bridged to a PCIe slot + LSI HBA, within a small case, as a DIY low-power NAS.
I am sure the author will appreciate ditching the proprietary Synology to go instead with a custom ZFS server, as the reliability, recoverability, and feature set of ZFS are quite frankly hard to beat. I have been using ZFS to build my custom NASs for the last... *checks notes* 17 years. I started back when ZFS was only available on Solaris/OpenSolaris. My builds usually have between 5 and 7 drives (raidz2).
However I do not recommend his choice of 4 x 8TB drives in a raidz1. Financially and technically it doesn't make sense. He spent $733 for 24TB usable ($30.5/TB)
He should have bought fewer, larger drives. For example 14TB drives sell for $240. So a config with 3 x 14TB in a raidz1 would total $720 for 28TB usable ($25.7/TB). Smaller costs, more storage, one less drive (= increased reliability)! It's win-win-win.
Especially given his goal and hope is in a couple of years to be able to add an extra drive and reshape the raidz1 to gain usable space; a 14TB drive will then be significantly cheaper per TB than an 8TB drive (today they are about the same cost per TB).
Actually, with only 8.5TB of data to store presently, if I were him I would probably go one step further and go with a simple zfs mirror of 2 x 18TB drives. At $320 per drive that's only $640 total for 18TB usable ($35.6/TB). It's a slightly higher cost per TB (+17%), but the reliability is much improved as we have only 2 drives instead of 4, so totally worth it in my eyes. And bonus: in a few years he can swap them out with 2 bigger-capacity drives, and ZFS already supports resizing mirrors.
Where? Also, is it worthwhile to buy hard drives marketed explicitly for NAS use when you're using ZFS? For example, Seagate has the IronWolf product line explicitly for NAS, and it costs more.
Drives branded for NAS applications differ slightly from mainstream drives. For example, Seagate claims the IronWolf is "designed to reduce vibration, accelerate error recovery and control power consumption", which essentially means the drive head actuators will be operated more gently (reduced vibration), which slightly increases latency and slightly reduces power consumption, and also that the firmware is configured to do fewer retries on I/O errors, so disk commands time out more quickly in order to pass the error to the RAID/ZFS layer sooner (why wait a minute on hardware retries when the RAID can just rebuild the sector from parity or mirror disks). IMHO for home use, none of this is important. Vibration is only an issue in flimsy chassis, or in extreme situations like dozens of disks packed tightly together, or extreme noise as found in a dense data center (see the video of a Sun employee shouting at a server). And whether you have to wait a few seconds vs. a few minutes for an I/O operation to time out when a disk starts failing is completely unimportant in a non-business-critical environment like a home NAS.
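Side note: on drives that support it, that error-recovery timeout (ERC/TLER) can be inspected and set from Linux with smartctl; a sketch:

    smartctl -l scterc /dev/sda          # show current read/write recovery timeouts
    smartctl -l scterc,70,70 /dev/sda    # set both to 7.0 seconds (units are 100 ms)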
A 500W PSU won't necessarily draw more than a 250W PSU; that wattage is merely its maximum sustained load rating (what the rest of the system can ask for). The 80+ Bronze rating is likely part of the problem here; that indicates how much power is drawn from the wall compared to what is provided to your system. 80+ Titanium would net you about a 10% reduction in wall power usage. Keep in mind that manufacturers play fast and loose with the certification process and a consumer unit may not actually be what it says on the box, so you need to rely on quantitative reviews.
Other than that, spend some time in the firmware settings. Powertop also does a great job at shaving off some watts.
"A 500W PSU won't necessarily draw more than a 250W PSU"
Mostly true, but not exactly. Most computer PSUs are more efficient when operating around 50% of their rated load. So if a computer consumes 125W internally, a 250W PSU would translate to lower power consumption measured at the wall than a 500W PSU, typically by about 2-5%.
For example see the chart https://www.sunpower-uk.com/files/2014/07/What-is-Effciency.... (115 VAC input) : 88% efficiency at 25% load, vs 90.5% efficiency at 50% load. In practice if the consumption is 125W at the PSU's DC output, this translates respectively to 142W vs 138W measured at the wall.
This 2-5% difference may not seem much, but it's similar to upgrading 1 or 2 levels in the 80 PLUS ratings (Bronze, to Silver, to Gold, to Platinum, to Titanium).
Switch-mode PSUs are very inefficient at the low end of their duty cycle.
A 250W 80-bronze PSU for a 60W load will be operating at 25% capacity and 82% efficiency or better.
A 500W 80-titanium PSU at 60W will be at around 12% and 90% efficiency or better.
So, an 8% difference in minimum required efficiency...for a huge increase in cost.
It's much better to buy a high "tier" PSU (for reliability and safety), sized so that it spends most of its time at or above 20% duty cycle (which in OP's case would indeed be 250W.)
80-gold is very common in the marketplace and where most people should probably be buying.
For building a NAS, there's a myth that you need a PSU big enough to spin up all disks at once (unless you have a staggered spin-up mechanism). I wonder if that still matters now.
> I chose raidz1. With only a handful of disks, the odds of two drives failing simultaneously is fairly low.
Only if you buy different hard drives or at least from different production batches. I had a lot of trouble on the same premise and I won't make that mistake again.
Edit: He mentioned it though (a bit later in the article)
> The problem is that disks aren’t statistically independent. If one disk fails, its neighbor has a substantially higher risk of dying. This is especially true if the disks are the same model, from the same manufacturing batch, and processed the same workloads. Given this, I did what I could to reduce the risk of concurrent disk failures.
> I chose two different models of disk from two different manufacturers. To reduce the chances of getting disks from the same manufacturing batch, I bought them from different vendors. I can’t say how much this matters, but it didn’t increase costs significantly, so why not?
> I had a lot of trouble on the same premise and I won't make that mistake again.
Please elaborate, I'd love to hear your story!
I hear a lot of advice around RAID/Z levels, and it often rests on shaky math that isn't borne out in reality (like the blog posts claiming that a rebuild of an array of 8 TB drives will absolutely hit unrecoverable read errors, no exceptions, yet monthly ZFS scrubs pass with flying colors).
Yes but there’s a price/reliability/performance trade off.
Also, with disks that big failures become qualitatively different. For example, when a disk fails in a mirror, the bigger the disk size the higher the chance the 2nd disk will have unreadable blocks.
> With only a handful of disks, the odds of two drives failing simultaneously is fairly low.
The problem is when the second drive fails while you're recomputing the parity after having replaced the first faulty drive, a process which may stress the disks more (or differently) than regular operation, and which also tends to take some time. RAIDZ2 (or RAID 6) provides some redundancy during that process; otherwise you don't have any until the RAID has been rebuilt.
Building your own NAS server by hand may be a nice project, but if you'd like to get something up and running quickly, you should consider prebuilt servers like the Dell T* series or an HP Microserver. It is real server hardware, supports ECC RAM, is far less work to set up, and often provides (semi-)professional remote management.
If you plan to build a budget NAS and room is not a problem, I'd personally recommend getting an old, used Dell T20 with a Xeon E3-1225v3, at least 16GB of ECC DDR3 RAM, 2x10TB Seagate Exos drives in ZFS RAID, and a bootable USB stick with TrueNAS (or, if you prefer Linux, TrueNAS Scale / OpenMediaVault).
If room IS a problem, you could get an HP Microserver Gen8 or higher with a Xeon and the same config as above.
- Server Cost: 150 Bucks
- Total Cost: 650 Bucks (150 for server, 500 for HDD)
- Power Consumption: 33W Idle, 60W Heavy File Transfer
- Silent enough without modding the fans
- Ethernet-Transfer-Speed: 110MB/s on 30% System Load
I do not own a 10Gbit Ethernet card, but I'm pretty sure transfer speeds with 10Gbit would be acceptable, too.
Background about my situation: I've only ever used laptops and Raspberry Pis in my life. Currently, my home server is a couple of RPis but I've outgrown them and need to upgrade.
Point being, I've never built a server from scratch or even opened one up. I am planning to go with this for now: https://pcpartpicker.com/list/bzYZcb
What you are suggesting seems a better way for me to go about it. Can you share some links from Ebay or Amazon for the kind of servers I could get? Also, will these servers contain everything I need to get up and running (with the exception of hard drives, of course)?
Well, the question is, what are your real needs and what part of you is asking for MORE POWER? :-)
I personally think that the Dell T20 is more than enough for a NAS-only server, and even for a small homelab.
You could go for something similar to this (eBay), but I had MUCH more success looking at "garage sale" websites (I don't know what sites to use in the US):
> Well, the question is, what are your real needs and what part of you is asking for MORE POWER? :-)
Ha, should have mentioned that before! I want to run a few services at home, so need more than 2 GB RAM. I'll also create a NAS/Nextcloud to store "family" data (documents, pictures etc.) but nothing "data hoarder" level. So, 4 TB would easily last us a decade or more.
Do you still think I should go for such enterprise servers? (Noise, complexity etc.) Or should I just get a used desktop/laptop?
What would you recommend for power-efficient HDDs (when idle)?
I have recently built a TrueNAS box with 2x4TB SSDs, but I think I will want to expand it. Currently it runs at 14W idle. If I add 2 HDDs I expect it to increase to 40W(?). How can I optimize this?
To answer your question: You could buy WD Red, to save a little noise and power.
BUT:
You should always consider that optimizing power consumption usually means buying new hardware. I hardly ever do this. Buying new hardware is in many cases vastly more expensive than keeping your old hardware, even if the old hardware consumes more power.
I'll try to give you an example:
- My Dell T20 consumes 33W idle
- There is hardware out there that consumes 14W - difference: 19W
- Let's say my Dell costs $150 vs $600 for new hardware - difference: $450
- How long can I run my Dell T20 on that $450 without investing in new hardware?
Calculation:
- Dell's additional power per year: 0.019kW * 24h * 365 = 166kWh
- Estimated cost per year (in Germany): 166kWh * $0.30 = $49.80
- Means I can run it about 9 years before reaching the break-even point
Advice: In 5 years, buy a then-used cheap server that consumes 12W, and you'll still save a ton of money. AND buying used hardware is better for the environment...
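For anyone who wants to plug in their own numbers, here's the same break-even math as a tiny Python sketch (all figures are the example numbers from above, not measurements):

    # Break-even between keeping a cheap old server and buying efficient new hardware
    old_idle_w, new_idle_w = 33, 14      # example idle power figures from above
    old_price, new_price = 150, 600      # example purchase prices (USD)
    kwh_price = 0.30                     # electricity price (Germany, at the time)

    extra_kwh_per_year = (old_idle_w - new_idle_w) / 1000 * 24 * 365
    extra_cost_per_year = extra_kwh_per_year * kwh_price
    break_even_years = (new_price - old_price) / extra_cost_per_year

    print(f"{extra_kwh_per_year:.0f} kWh/year extra -> ${extra_cost_per_year:.2f}/year "
          f"-> break-even after {break_even_years:.1f} years")
    # ~166 kWh/year, ~$50/year, break-even after ~9 years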
I have to say, I'm put off by the power benchmark. I have a way older system with way more stuff (6 3.5 inch HDD, 2 2.5 inch HDD, 2.5 inch SSD, 8-port SAS controller, Intel NIC card) and it idles (all drives spun down) at ~30 watts.
When I first bought the system over 10 years ago, ZFS on Linux wasn't really a thing, so I used FreeBSD. I later switched and with the switch came substantial power savings.
Oh, that's interesting. TrueNAS is available on Debian now, so I wonder if there would be a big drop in power consumption.
Lawrence Systems just ran benchmarks[0] between TrueNAS Core (FreeBSD) and TrueNAS Scale (Debian), but they didn't include power consumption, unfortunately.
I've gone the TrueNAS route, but I'm running it on a QNAP TS-451. I'm running TrueNAS off of a USB stick hanging off the back, so I didn't have to do anything with the hardware, and reverting back to QTS is just a matter of setting the boot order in the BIOS.
I really like seeing other people's builds, but I know that building my own computer isn't something I want to do. I was happy to see the comparison between the DIY model and the roughly-equivalent commercial units. I'll likely buy another QNAP (to run TrueNAS on) when the time comes, and the comparison tells me that I won't get screwed too badly by doing so.
>I've gone the TrueNAS route, but I'm running it on a QNAP TS-451. I'm running TrueNAS off of a USB stick hanging off the back
Oh, I didn't realize that QNAP allows that. Synology makes it pretty hard to boot any other OS, and I assumed the other vendors were similar. I'll keep that in mind for my next build because I do have fun building servers, but I also really appreciate systems like Synology and QNAP where the hardware and case is optimized for the NAS use-case.
I think using 1 disk redundancy is a mistake. It’s not only physical failure you’re worried about, it’s an error upon rebuild when you lose a drive. Bit rot on your remaining drives can occur which wouldn’t be detected until rebuild time when you lose a drive, and that could cause you to lose your entire volume. Bit rot can be checked for but you can’t always be sure and with larger and larger sets of data it gets slower to do.
I use raid 6 and also backup my data externally to another nas as well as backup to a static usb drive. Backup requires multiple different types since failures are so catastrophic and can occur in ways you don’t expect.
My conclusion from this was that the Synology is actually excellent value, and a newer one would likely have been superior on all dimensions (including time spent).
Right, apart from the entertainment/hobby value I'm not sure I understand these guides. It might be cheaper to build, but in the end what you pay for is the software and not having to spend your time configuring it.
At some point I wanted to go the TrueNAS / FreeNAS / OwnCloud etc. route but after seeing the pages upon pages of troubleshooting and lost data horror stories I stuck with a commercial solution.
It’s hard to beat synology: small form factor, low power, quiet, excellent DSM software, web interface for file browsing, expandable array, a lot of apps including mobile apps for photo backup and backup apps, etc.
But Synology doesn’t use ZFS, which is a better filesystem than btrfs. In particular ZFS offers native encryption (instead of the clunky ecryptfs in synology), and allows ZFS send from Linux servers.
I just use pCloud mounted as a network drive and throw everything I am not currently working on onto it. With the 10Gbps I have at home, it works wonders.
Plus, the storage is unlimited. Plus, it is more resistant to failures and disaster than anything home made. Plus, I don't have to store and take care of another noisy box in my home.
What area of the world do you live where you get 10 Gbps to the Internet? Can you reliably get 10 Gbps transfers to pCloud?
I got 1 Gbps fiber in the last year, but it's more like 700-800 Mbps in practice. I consider myself lucky to even get that, as my experience before that has always been 100-200 Mbps even on a "1 Gbps" plan. I'm super jealous of people who get a full 10 Gbps Internet connection.
> While I obviously don’t want my server to corrupt my data in RAM, I’ve also been using computers for the past 30 years without ECC RAM, and I’ve never noticed data corruption.
You never noticed also exactly because you can’t know about data corruption if you don’t run with ECC memory.
Precisely. My trust in a non-ECC machine is much lower. I've had slow on-disk data corruption that I've tracked down to failing memory in retail-grade hardware, and it's been in a collection of large media files being sync'ed between machines. Those corruptions tend to be un-noticed at the time, and are effectively impossible (without original sources) to resolve later on, but happily also tend to have minimal impact.
Either way, I've been using computers longer than TFA and wouldn't build a home lab box without ECC.
> I purchased the same model of disk from two different vendors to decrease the chances of getting two disks from the same manufacturing batch.
I prefer mixing brands/models instead. Two vendors _might_ get you a different batch, but you could be choosing a bad model. I ended up building mine from three different WD models and two Seagate ones. I'm paranoid and run with two spares.
About 7 years ago there was an Amazon sale ($300) on Lenovo TS140 towers with the low-powered Xeon chip and ECC RAM and 4 drive bays. Ever since I've been unable to find a similar price point for the same quality, but wanted a backup server. I recently got a Raspberry Pi 4 (8GB model) and external USB hard drive (8TB) mirrored with a s3backer volume on backblaze B2 for about $300 total, and as a backup server it's fast enough (performance limited by the Internet speed to B2) and probably idles at 10W-15W.
One of the nice benefits of ZFS native encryption + s3backer is that if I had a total outage locally and needed to recover some files quickly I could mount the s3backer-based zpool from any machine, decrypt the dataset, and pull the individual files out of a filesystem. It's also a weird situation with cloud providers that convenient network-attached block storage is ~10X the price of object storage at the moment but performance can be similar using s3backer.
Appreciate the insights on your S3 backup solution.
I will mention that I am one of those folks with a TS140. Love that it's a power sipper. I maxed out the processor and memory, as well as loading it up with two 10 TB rust disks and two 512 GB SSDs.
As far as I can tell, building a similar system using NVMe is vastly more complicated. If you can fit everything into M.2 slots, it’s easy. Otherwise you need barely-standardized PCIe cards, backplanes, connectors ordered off a menu of enterprise things with bizarre model numbers, and possibly “RAID” cards even though you don’t actually want hardware RAID.
22TB of NVMe drives is going to be a bit more expensive than the system in the article however.
I do wonder what the power consumption figures would be though. His system was drawing an annoyingly large amount of power and I suspect that was mostly those HDDs.
Nowadays some recent consumer platforms support PCIe x16 to x4/x4/x4/x4 bifurcation, which should be enough. For RAID, a PCIe switch (which dynamically shares bandwidth between SSDs) is nonsense, isn't it?
I also run a home NAS in a Node 304. I went with a supermicro mainboard for ECC support which means I had to swap the three fans that come with the case because the mainboard only supports PWM fans. Non-PWM fans would only spin at full speed otherwise.
Regarding the SLOG device, you probably don't need it for a file server, but if you do you can definitely free a drive bay for an HDD by just using double sided tape somewhere like on the PSU. I'm sure it's also possible to put three more HDDs above the CPU, right in front of the exhaust fan. If I had a 3D printer I would try to build something bringing the total to nine HDDs.
If you need more SATA ports but are running out of PCIe slots, you may be able to reuse an empty M.2 slot. An M.2 with two lanes of PCIe 3 gives you 5 SATA ports with an adapter[0].
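Out of curiosity, a rough bandwidth sanity check on that adapter idea (the per-HDD throughput figure is an assumption on my part):

    # PCIe 3.0 x2 uplink vs five spinning disks behind the M.2 adapter
    pcie3_lane_gbps = 8 * 128 / 130          # 8 GT/s with 128b/130b encoding
    uplink_gbyte_s = 2 * pcie3_lane_gbps / 8 # two lanes, bits -> bytes

    hdd_mb_s = 250                           # optimistic sequential HDD throughput (assumption)
    drives_gbyte_s = 5 * hdd_mb_s / 1000

    print(f"uplink ~{uplink_gbyte_s:.2f} GB/s vs 5 HDDs ~{drives_gbyte_s:.2f} GB/s")
    # ~1.97 GB/s vs ~1.25 GB/s: fine for spinners, would be tight for 5 SATA SSDs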
I occasionally look at the PCIe-to-SATA market and find it confusing. It appears polarized into either cards from a reputable brand and very expensive, even with few SATA ports, or cards from an unknown brand and relatively affordable. What's your experience with this and what can you recommend (2-port or 4-port)? Are the cheap cards safe & reliable or are they to be avoided?
Basically: they can be buggy. Either they work fine for you, or you hit an edge case and have trouble. They can also have worse performance as they only have one SATA controller and split it amongst the drives.
If you're doing this at home, you can get used enterprise gear on eBay (like an LSI SAS HBA) for the same price or cheaper than brand-new consumer gear, and it will probably still be more reliable (I built a 130 TB NAS for my friend's video production business and literally everything aside from the drives and the cables was bought used on online auction, and it's been humming along fine for a while now - the only part that was bad was one stick of RAM, but the ECC errors told me that before I even got around to running my tests on the sticks)
I've been running older SAS cards for years and they've been doing just fine. They go for cheap on eBay. Each SAS port serves four SATA drives, using SAS-to-SATA cables.
Just make sure to get one that runs in IT mode or you have to mess with the firmware.
> Just make sure to get one that runs in IT mode or you have to mess with the firmware.
In case some people wonder what "IT mode" is, as I used to some years ago, what you basically want is a card that will expose the drives directly to the OS, as opposed to "volumes".
In other terms, if the card is a RAID controller, it may insist on you creating arrays and only expose those. You can circumvent it by creating single-drive arrays, but it's a pain.
Some cards can do both, but it's usually not advertised. Non-RAID cards also tend to be cheaper. Others (usually LSI) can be flashed with a non-RAID firmware, but again, it's less of a hassle to not have to do it.
I would advise using ZFS on Linux over ZFS on FreeBSD. You may find this somewhat surprising if you know my post history being a major FreeBSD advocate, but I have run into a somewhat surprising and persistent (and known, but not to me when I started building) issue with FreeBSD's USB Mass Storage support. This issue does not happen on Linux. This is among several issues I noted which affected my ability to make a budget-friendly homelab NAS.
Since you are using an M.2 drive rather than a USB drive for your boot drive, you are not affected by the issue that affected me. But I've reached a point where I would not trust FreeBSD to not have weird and esoteric hardware issues that could affect performance or reliability for storage. I'd recommend using ZFS on Linux (Note, I still use FreeBSD as my primary OS for my personal laptop).
Not using ECC is not a good tradeoff in my experience. The only ZFS corruption I have experienced was on direct-attached ZFS on a Mac, with memory errors that went undetected by Apple's diagnostics.
I set up my most recent NAS using a TerraMaster unit. It's basically a nifty case (9x5x5 inches) around a low power Intel board with a USB stick for a boot device (which I replaced with a mini 100gb USB SSD).
I don't know and don't care about TerraMaster's software (it might be awesome - I have no idea). I just rolled my own NixOS install with ZFS so that I could have a deterministic installation (I've heard good things about the TrueNAS OS as well, but I'm a control freak and like being able to rebuild the entire server with a single command and a config file, so I stick with NixOS).
The nice thing is that I essentially got a motherboard, CPU, PSU, and compact case for $350 (for the F2-422). All I had to do was upgrade the RAM (SO-DIMM) and add the drives.
I've long since reduced to only two drives for my NAS. At one point I was up to 7 drives before I realized my madness. It's cheap enough to get the storage I need with two mirrored drives, is quieter and uses less energy (I can keep it in the same room), and when I finally outgrow them in 5 years or so, the old drives will be re-purposed as backup via an external USB enclosure I keep around.
I don't know, but if I could install NixOS without difficulty, it should be possible. I installed Ubuntu server on it at first and that also worked fine. No tweaking necessary at all. You just flash the standard x64 installer on a USB stick, plug it in, and install like you would on any PC (because it basically is a PC - it even has a working HDMI port).
>If you’re new to the homelab world or have no experience building PCs, I recommend that you don’t build your own NAS.
>Before building this system, I had zero experience with ZFS, so I was excited to try it out.
Sorry, but this is amusing to me. ZFS on TrueNAS is probably fine, but you're building your production NAS, to replace the Synology device you've become "so dependent on". Don't become dependent on ZFS without knowing the implications!
I was facing this choice recently, and I agreed with the other tech-savvy person in the household that we should just use good old LVM + Btrfs. Not only does it run like a charm, it also allowed us to switch the LV from single (during the data move) to RAID 1 and eventually to RAID 5/6 with zero issues. It will also be much easier to recover from than ZFS.
On another note, it's a bad market to buy NAS drives, especially from Seagate. Seagate Exos drives are at this point in time often cheaper than IronWolf, even non Pro IronWolf. They're slightly more noisy and don't come with the free data recovery, but otherwise they're a straight upgrade over the IronWolf drives.
My tiny NAS is still using ext4 + md raid1. It’s the third incarnation of essentially the same design (previously used raid10 when drives were smaller).
When it fills up, I delete some files rather than adding disks.
Has anyone yet created open-source patches for LVM + Btrfs like what Synology does, piercing the layers so the btrfs checksums act as a tie-breaker telling LVM which disk to trust when repairing errors?
I'm not sure. We use the raidintegrity option, which layers dm-integrity on top of the raid. We then run btrfs scrubs periodically to trigger reads on the device.
I have tried a few different OSes but my favorite by far is unRaid. It’s really easy to setup and maintain and it gave me a lot of really good experience with server maintenance and the whole container ecosystem. I bought a 24 drive server chassis and am slowly filling it up. Up to 80 TB now and I only have to have one extra drive for local backup (I also backup to another box that I do periodically).
Oh i've been down this road before, and i ended up with a new Synology, though i keep my lab part on a separate machine and use the Synology for storage only :)
Benchmarks are nice, but you're comparing an old NAS to a brand new one with significantly more compute power, and had you compared it to a new Synology, you'd probably arrive at the same findings.
Part of your power consumption could very likely be from your choice of hard drives. The 8TB Seagate IronWolf has an average power consumption of 8.8W [1], whereas a WD Red 4TB requires at most 4.8W [2]. With 4 drives, that's an additional 16W of power consumption right there.
Another thing you could try is to enable powerd and measure whether it has any effect on power consumption. If it does, you can enable it in TrueNAS Core by adding it as a tunable via the web UI: System -> Tunables -> Add, with Type = rc.conf, Variable = powerd_enable, Value = yes.
I am a newbie with server builds. Can you share an eBay or Amazon link that can be a good/cheap starting point? Also, will these servers contain everything I need to get up and running (with the exception of hard drives, of course)?
With drive bays you'll see: SFF (small form factors, ie 2.5" drives) and LFF (3.5")
Usually written like "12 LFF" for say a server that can hold 12 3.5" drives.
Look out for if it comes with drive trays for the bays or not. They're usually surprisingly cheap because there's a bazillion of them out there but it could be $5-$20 each.
Then there's SAS / Raid controller stuff you need to look out for in regards to compatibility with your configuration.
For example if you're going to run TrueNAS (or whatever it's called nowadays) you'll want to search "XXXXX controller TrueNas compatibility" to stumble upon a thread or some hardware list to hear what people have to say. There actually aren't too many models, or they're clones of each other with a Dell or HP branding label on it, so you're bound to find some stuff, for mainstream stuff sold on ebay anyway.
The hardware will typically be at least power-on tested, which for this kind of equipment is usually good enough, but beyond that your experience will vary.
There will be typical gremlin stuff like an unseated cable which will throw errors and have you wondering if it's a bad drive, or bad install, or whatever... but I think if you're building your own and bringing older gear back to life, you won't be able to avoid that unless you're buying a pre-packaged NAS box.
Thanks for the detailed writeup, much appreciated!
Another thing I'd like your opinion on is this: my use cases involve running a few services at home (NAS, Nextcloud, Jellyfin, Photoprism etc.) and store "family" data (documents, pictures etc.) but nothing "data hoarder" level. So, 4 TB would easily last us a decade or more.
Do you think such enterprises servers make sense for me? (Noise, complexity etc.) Or should I just get a used desktop/laptop?
You could get away with whatever machine you have lying around with enough space that does differential back ups to an external (or with something hosted somewhere else over the internet).
Well, I don’t have any machine lying around. (I do have an RPi which is acting as my home server today. But it’s only 2GB and that’s why I am trying to replace it.)
So, my options are to buy old desktop, old enterprise server or build a new desktop. The first two are similar price points on eBay, but I am not sure about the noise and power tradeoffs between the two.
Like, if a 1U server is too noisy and power-hungry compared to the desktop, it's probably not a good choice for my case.
The bottleneck on most home labs is the LAN. A new spinning hard drive running ext3/4 or NTFS can easily saturate a 1 gig NIC. That makes the complications of running a NAS on RAID or ZFS not worth the time and effort-- unless you like playing with file systems or flooding your LAN.
I built my own nas a few years back with a multi disk zfs pool and ubuntu running mellanox 10g nics. Fast as it was, the complexity of it all, especially zfs having its own way of doing everything, made it a time hog. Just wasn't worth it.
Now I run my nas in a Debian Mint VM with consumer grade 2.5G nic adapters. Simplicity and ease of recovery are my priorities.
Nice article - though one misstatement is that ZFS does not allow you to add disks to a pool. It does [1] [2], by adding new vdevs. The linked issue is about adding support for expanding existing vdevs instead.
It's a hard pill to swallow, adding two more drives as a new vdev just to get one more drive's worth of storage (the author's case maxes out at 6 drives and currently holds 4). So often you will bite the bullet and just completely rebuild.
True RAIDz expansion is something that's supposed to be coming, possibly in Q3 2022, so it may be that by the time one needs to expand a volume, that ability will have landed. That'll be a game changer.
I went down this rabbit hole about a decade ago. Spent a lot of time and money on a home lab. While it’s cool, the payoff is just not there. I switched to Google/AWS a few years ago and never looked back.
Good article. He chose a Fractal Design case, which I really like (the company, not the specific model).
I had all kinds of thermal problems with a too small case that I used for my truenas build. It would turn off without any trace in server logs (I have real server HW and therefore expected something in logs since there is a whole separate computer for this).
I changed the case from a NAS case to another Fractal Design case with lots of space for drives and the heatsink. All thermal issues disappeared.
I just wanted to warn anyone who is building to take this seriously. Some hard drives generate a lot of heat.
I'm fairly happy with my 4-bay Synology NAS. When I last looked at ZFS, it seemed that piecemeal upgrades - like, upgrade a 4TB drive to 8TB, and get more available space - wouldn't work in ZFS, but it would in SHR, at least if you had more than 2 drives.
Having scheduled checks is a good idea: I have weekly short SMART tests, monthly long SMART tests, and quarterly data scrubs.
The TinyPilot device looks nifty - it's a Raspberry Pi as a remote KVM switch. I stumbled on that last night as I was banging my head against a familial tech support issue.
Oh, my, you are right, the TinyPilot seems awesome! I see it was developed by the author of this ZFS NAS server blog post. I just ordered one to play with :)
Cool post thanks for sharing! I wanted to throw out another option, older enterprise server hardware is surprisingly cheap on ebay. https://labgopher.com/ has a nice page to help you find good deals (I am not affiliated with them). One warning these servers are outrageously loud and power hungry.
Back around 2004 there was a technology called VRRP, and then OpenBSD got a similar thing called CARP - a quick Google search suggests these still exist, but I never see mention of them in my filter bubble for some reason.
I was obsessed with making an instant failover cluster. I never managed to get it working exactly how I wanted, and it relied on two old UPSs with dead batteries to operate as STONITH devices (they had well-supported RS-232 interfaces).
I sometimes think about investigating that idea again but maybe with raspberry pis and cheap iot plugs.
If I was building the same system, I'd use 8GB of RAM and use the NVME drive for an L2ARC. I'd put the root fs into the main zfs dataset. The machine will be faster (~128GB of cache), cheaper and draw less power due to less RAM.
I'm not sure if TrueNAS supports this kind of config, or if they enforce booting from a separate drive. But FreeBSD itself is perfectly happy to boot from ZFS.
I use 2 ZFS mirrored 4TB drives mounted on a USB C dual bay device (iDsonix brand) for backing up my ZFS pools. I have a simple script that imports the backup pool and sends snapshots from my main ZFS pools to the backup pool.
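In case it helps anyone, a rough sketch of what such a script can look like; the pool, dataset, and snapshot names below are placeholders, and it assumes you already create/rotate snapshots named @prev and @latest:

    #!/usr/bin/env python3
    # Sketch only: import the backup pool, send each dataset incrementally,
    # then export the pool. "tank", "backup", "@prev" and "@latest" are placeholders.
    import subprocess

    DATASETS = ["tank/documents", "tank/photos"]
    BACKUP_POOL = "backup"

    def run(cmd):
        print("+", cmd)
        subprocess.run(cmd, shell=True, check=True)

    run(f"zpool import {BACKUP_POOL}")
    try:
        for ds in DATASETS:
            target = f"{BACKUP_POOL}/{ds.split('/', 1)[1]}"
            # incremental send of everything between the two named snapshots;
            # recv -F rolls the target back to the common snapshot before receiving
            run(f"zfs send -I {ds}@prev {ds}@latest | zfs recv -F {target}")
    finally:
        run(f"zpool export {BACKUP_POOL}")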
My question: How do you safely store your physical backup drives/devices?
I have a fireproof box, but I don't think it was made for safely storing electronics in the event of a fire.
To be clear I meant the drives backing up the NAS, not the actual NAS.
I think backing up online ultimately is the safest choice, and it takes getting comfortable doing that and being okay with paying a fee. This is for data that I can't lose, like family photos, etc.
I started looking into using rclone directly from my FreeBSD NAS device. rclone seems to support many providers.
This is really cool. I've been tinkering, trying to get away from Dropbox and repurposed an old server to SMB share a disk that I occasionally rsync with another disk via ssh. I feel like it's not sufficient to protect against errors. What's a reliable, easy to maintain NAS solution for that purpose, Synology?
I have a similar setup to yours, but with more disks in the machine and a hot swap bay for offline backups.
Did the price comparison for Synology a few years ago and felt it just made more sense to build my own. It's just the current LTS Ubuntu release and it runs Plex, Pi-hole, file sharing, a CUPS print server and some other stuff.
I chose the same case for my NAS. Main thing I did different was rather than buying a consumer board, I bought a mini-ITX Xeon-D board from supermicro which had integrated dual 10G NICs, 6x SATA, and an ASPEED IPMI for remote management. Was $400 for that board a few years ago (soldered CPU).
The number of parity drives is often fixed, so the odds of the number of failures being higher than the number of parity drives goes up as you increase drive count.
Depends on how you look at it I suppose. The lifespan of a singular disk is likely rather long, but put a dozen of them in the same place and you'll see a failure or two every few years.
Of course, we know that having a larger sample size and seeing more failures doesn't _actually_ mean that groups of disks are less reliable, but it could seem that way if you don't think too hard about it.
Packing a CPU along with many hard drives into a small box implies cooling, and therefore noise. A large-diameter, rather slow fan fixes this, at the cost of bumping up the physical volume of the box. This is often neglected at design time.
What is the technical upside of using TrueNAS instead of samba? If you want to optimize for control, it seems a bit weird to me to settle for an "all in one" software stack.
I see, so I assume the upside is that it's a time saver. Thanks! I personally went with Samba on Linux and with btrfs. I was wondering if there's something non-obvious in TrueNAS that I'm missing out on.
For my own setup, I think the upsides are:
- ability to choose the kernel
- no need for an SSD for the base OS, since running off of RAM is rather easy on Linux
- samba can run in a container thus a bit more control security-wise
- server may run something else as well
Of course, this comes with a lot more technical hurdles. It's more of a side project than a utility, really. That's why I was wondering whether TrueNAS provides non-obvious upsides that would be lacking in a self-rolled setup.
There are two flavors of TrueNAS - Core and Scale. Core is basically a FreeBSD distro and Scale is basically a Linux distro. They're both a base OS with the typical packages anyone would need for a NAS, with sane defaults + a user-friendly web-based management system.
The upsides are that it's plug-and-play for anyone who doesn't want to research all the options available and figure out the various pitfalls on their own.
> no need for SSD for base OS since running off of RAM is rather easy on Linux
I don't understand this sentence. You're running off a RAM disk with no boot drive? What if you have a power outage?
> samba can run in a container thus a bit more control security-wise
Core supports FreeBSD jails and Scale supports Docker, so you could run Samba in a container on either if you're willing to set it up yourself.
> server may run something else as well
As before, both have jail/container functionality. I haven't used Scale myself, but Core comes with a bunch of "click to install" jail options for stuff like Plex, ZoneMinder, etc. Our machine also runs a Windows VM (ew) and a WordPress install in a jail.
Thanks, this is a great explanation! I wish the blog post would have described the TrueNAS like this.
> You're running off a RAM disk with no boot drive? What if you have a power outage?
Yes, the server only has the HDDs which contain the NAS data. The server bootloops until it gets an image from the router (ipxe boot). The disk images have systemd scripts which install everything from 0 on each boot. Coincidentally, this means system restart is how I upgrade my software.
> Core supports FreeBSD jails and Scale supports Docker
This clarifies the situation -- TrueNAS seems like an option that I would recommend for anyone who wants a quick OSS NAS setup.
Yes, but it’s usually not recommended, because the receiver side doesn’t verify that the data is identical to that at the sender side, and a small error could corrupt the file system.
ZFS send and receive is a good way to do it, but there is no ZFS send and btrfs receive!
Get LSI 2000- or 3000-series SAS HBA cards (they drive SATA disks just fine). Several manufacturers make them approximately to the reference spec, and the drivers are in the Linux kernel. If they don't come flashed with the IT firmware (no RAID capabilities), do that yourself, but the cheap ones usually come that way. The 4i models sometimes come with ordinary individual SATA connectors; the 8i will have one of two kinds of combo connectors that accept either cables to a backplane or breakout cables to ordinary SATA connectors.
If your budget server starts with a first step of buying new hardware, I'm going to ignore your advice. A 500W PSU? Nope. Buy a used ThinkCentre with a 250W PSU for $50-$100 and spare the planet more of this e-waste.
Heh, got the same case for my NAS. Still haven't gotten an optical drive though... but I don't see myself buying one either... guess I've got a lot of coasters now.
However, such a drive is getting heavily into diminishing returns territory.
e.g. a 20TB drive from Seagate is $500. A 4TB drive is $70, 8TB is $140. Getting the same spend in smaller capacity drives would give you 28TB in the 4TB drives and 24TB/32TB in the 8TB drives (for $80 under/$60 over).
Add in a second to rotate and you're spending $1000 in drives, assuming these 26TB drives replace the 20TB drives at a similar price when they trickle down to consumer hands.
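For reference, the $/TB arithmetic behind that comparison (using the prices quoted above):

    # $/TB for the drive prices quoted above, and what a ~$500 spend buys
    prices = {4: 70, 8: 140, 20: 500}   # capacity in TB: price in USD
    budget = 500
    for tb, usd in prices.items():
        n = budget // usd
        print(f"{tb:>2} TB: ${usd / tb:5.2f}/TB, {n} drive(s) = {n * tb} TB for ${n * usd}")
    #  4 TB: $17.50/TB, 7 drives = 28 TB for $490
    #  8 TB: $17.50/TB, 3 drives = 24 TB for $420
    # 20 TB: $25.00/TB, 1 drive  = 20 TB for $500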
You have to factor in the power usage of having multiple drives spinning. Though I’d agree that smaller drives are better when you have a drive failure, as resilvering is quicker.
I have a nightly restic backup from my main workstation to buckets on Backblaze and Wasabi. It backs up the few local folders I have on my workstation and all the files I care about on my NAS, which the workstation accesses over Samba. I've published my scripts on Github.[0]
I don't back up my Blu-Rays or DVDs, so I'm backing up <1 TB of data. The current backups are the original discs themselves, which I keep, but at this point, it would be hundreds of hours of work to re-rip them and thousands of hours of processing time to re-encode them, so I've been considering ways to back them up affordably. It's 11 TiB of data, so it's not easy to find a good host for it.
"CephFS supports asynchronous replication of snapshots to a remote CephFS file system via cephfs-mirror tool. Snapshots are synchronized by mirroring snapshot data followed by creating a snapshot with the same name (for a given directory on the remote file system) as the snapshot being synchronized." ( https://docs.ceph.com/en/latest/dev/cephfs-mirroring/ )
We found ZFS led to maintenance issues, but it was probably unrelated to the filesystem per se, i.e. culling a rack storage node is easier than fiddling with degraded RAIDs.
Buy another, use ZFS send/receive. It's only double the price! Better yet, put it elsewhere (georedundancy). With ZFS encryption, the target system need not know about the data.
For critical data though I use Borg and a Hetzner StorageBox.
With over 25 years of large-scale *nix sysadmin experience: please please please don't fall into the trap of thinking RAID5/Z is a good idea. It almost never is.
The number 1 trap you fall into is during a rebuild after a failed drive. In order to rebuild, every byte on every other drive has to be read. On massive arrays this process invariably throws up additional errors, except this time you might not have the parity data to recover them, and the problem snowballs. It is exacerbated by using unsuitable drives. This author seems to have chosen well, but many select drives for capacity over reliability in a quest for the most usable TB possible. A few years ago there was also the scandal of the WD Red drives that were totally unsuitable for RAID usage.
And to make matters worse there is the performance impact. A write consists of 4 operations: read, read parity, write, write parity. That gives a /4 penalty on the sum of your array's drive IOPS.
RAID6/Z2 gives you slight relief from the above risk, however at the increased cost of an additional performance hit (a /6 penalty)
If going RAID(Z), it is generally considered best practice to go for a model that includes a mirror. There are decisions to be made whether you stripe mirrors or mirror a stripe. Personally my preference for reducing complexity and improving quick rebuild is to stripe across mirrors. So that is RAID10. You pair your drives up in mirrors, and then you stripe across those pairs. The capacity penalty is 50%. The performance penalty is close to zero.
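To put rough numbers on that rule of thumb (drive count and per-drive IOPS below are purely illustrative):

    # Rough effective write IOPS per the rule of thumb above
    drives, iops_each = 6, 150          # illustrative numbers only
    raw = drives * iops_each
    print("RAID5/Z1 :", raw // 4, "write IOPS")   # read, read parity, write, write parity
    print("RAID6/Z2 :", raw // 6, "write IOPS")   # two parity blocks -> /6 penalty
    print("RAID10   :", raw // 2, "write IOPS")   # every write lands on both mirror halves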
The author also chose to skip a write buffer (ZIL) drive. This, imo, is a mistake. They are a trivial cost to add (you only require a capacity that gives you the maximum amount of data you can write to your array in 15 seconds (tunable)) and they offer a tremendous advantage. As well as gaining the benefit of SSD IOPS for your writes you also save wear on your data array by coalescing writes in to a larger chunk and buy yourself some security against power cuts etc as faster IOPS give you a reduced likelihood of coinciding with an environmental issue. And if you are especially worried you can add them as a mirrored pair.
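For the sizing rule mentioned above, the arithmetic is straightforward; a small sketch assuming the write rate is bounded by the network link and using the 15-second window from the comment above:

    # SLOG sizing: absorb ~15 s (tunable) of incoming sync writes,
    # bounded here by the network link feeding the NAS
    def slog_gib(link_gbit_s, window_s=15):
        return link_gbit_s * 1e9 / 8 * window_s / 2**30

    print(f"1 GbE : {slog_gib(1):.1f} GiB")    # ~1.7 GiB
    print(f"10 GbE: {slog_gib(10):.1f} GiB")   # ~17.5 GiB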
You can also add SSDs as a cache (L2ARC) device (I think the author missed this in their article) to speed up reads. For the author's use case this would really help with things like media catalogs, as well as buffering ahead when streaming media. The ARC in ZFS always exists, and the L1 is in RAM, but an L2ARC can be very beneficial.
The author did comment on RAM for the ARC and sizing this. ZFS will basically use whatever you give it in this regard. The really heavy use case is if you turn on deduplication but that is an expensive and often unnecessary feature. (An example good use case is a VDI server)
Last tip for ZFS: turn on compression. On a modern CPU it's practically free.
> Based on Backblaze’s stats, high-quality disk drives fail at 0.5-4% per year. A 4% risk per year is a 2% chance in any given week. Two simultaneous failures would happen once every 48 years, so I should be fine, right?
Either I misunderstood or there are some typos, but this math seems all kinds of wrong.
A 4% risk per year (assuming failure risk is independent of disk age) is less than 0.1% by week. A 2% risk per week would be a 65% risk per year!!
Two simultaneous failures in the same week for just 2 disks (again with the huge assumption of age-independent risk) would be on the order of less than 1:10^6 per week, so expected once in more than 20k years (31.2k years, to be confirmed).
Of course, either you change your drives every few years so the age-independent AFR still roughly holds, or you have to model the probability of failure using something like an exponential/Poisson model. Exercise for the reader to estimate the numbers in that case.
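A quick script to check the age-independent numbers (4% AFR, two disks):

    # 4% annual failure rate, age-independent, two disks
    afr = 0.04
    weekly_p = 1 - (1 - afr) ** (1 / 52)      # per-disk weekly failure probability
    both_same_week = weekly_p ** 2
    years_between = 1 / both_same_week / 52

    print(f"weekly per-disk prob : {weekly_p:.5f}")        # ~0.00079
    print(f"both in same week    : {both_same_week:.1e}")  # ~6.2e-07
    print(f"expected once every ~{years_between:,.0f} years")  # ~31,000 years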
People hosting large private media libraries, I assume. Think of people who eschew music/movie streaming services and instead download FLACs/videos and stream those over their local network.
It's almost entirely because people really like technology that not only promises to reward you with excellent features and stability, it follows through on those promises.
Because people don't know about CephFS yet, nor about the silliness of degraded RAID setups. I.e. trusting a ZFS community edition in a production environment can be a painful lesson.
;-)
I'm sure it'd be painful, but let's throw "infrequent" onto your description of the lesson. :-)
I've run ZFS for home storage and work backups for ~15 years, across Nexenta, ZFS-fuse, FreeBSD, and OpenZFS, backing up hundreds of machines, and have never lost data on one of them.
I know about CephFS, but performance was abysmal compared to ZFS for a home server. On a single box with 4-8 drives I didn't come close to saturating a 10G link, which ZFS managed just fine.
It was also very complex to manage compared to ZFS, with many different layers to consider.
I'm sure it shines in a data center, for which it has been designed. But unless something radical has changed in the last year, it's not for a budget homelab NAS.
The cephfs per-machine redundancy mode is usually the preferred configuration.
I.e. it usually avoids cramming everything into a single point of failure, buying specialty SAS cards, and poking at live RAID arrays to do maintenance.
I've seen too many people's TrueNAS/FreeNAS installs glitch up over the years to trust the ZFS community edition as a sane production choice. ZFS has certainly improved, but Oracle is not generally known for its goodwill toward the open-source community. ;-)
I've never run TrueNAS/FreeNAS in proper production, but I have run it at home for over a decade and never lost data, despite generally running on old hardware, multiple drive failures, motherboards dying and power outages/lightning strikes.
Overall it's been very little fuss for my home NAS system.
I wonder what he means by this. If he's referring to SHR, then it's just standard mdraid and Synology themselves have instructions on how to mount the volume in Ubuntu https://kb.synology.com/en-us/DSM/tutorial/How_can_I_recover...
edit: He later mentions encrypted volumes, but those are also just using standard eCryptfs https://www.impedancemismatch.io/posts/decrypt-synology-back...
This is one of the reasons I feel comfortable recommending Synology devices - there's not a lot of lock-in