Great question! Close, but not exactly. We do use a sparse file but only very briefly during the transition.
We start with one SSD as a single top-level vdev. When you add the second SSD you choose whether to enable FailSafe or not. If you don't enable FailSafe you can just keep adding disks and they will be added as top-level vdevs, giving you maximum read and write performance by striping data across them. Very simple, no tricks.
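In plain ZFS terms, the non-FailSafe path is roughly just the following (the pool name and device path are placeholders, not our actual naming):

```shell
# Without FailSafe: each added disk becomes another top-level vdev,
# and ZFS stripes new writes across all of them.
# "tank" and the device path below are illustrative placeholders.
zpool add tank /dev/disk/by-id/nvme-new-ssd
```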
However, if you choose FailSafe when you add your second SSD, we do a bit of ZFS topology surgery, but only very briefly. You start with a ZFS pool with a single top-level vdev running on your current SSD, and you've just added a new unused SSD and chosen to transition to FailSafe mode. First we create a sparse file sized to exactly match your current active SSD. Then we create an entirely new pool with a single top-level raidz1 vdev backed by two disks: the new SSD and the sparse file. The sparse file acts as a placeholder for your current active SSD in the new pool. We then immediately remove the sparse file, so this new pool and dataset are degraded. We then take a snapshot of the first dataset and sync the entire snapshot over to the new pool. The system is live and running off the old pool for this whole process.
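Sketched as ZFS commands, the sparse-file trick looks roughly like this (pool names, dataset names, sizes, and device paths are all illustrative placeholders):

```shell
# 1. Sparse placeholder matching the active SSD's size (size is hypothetical).
truncate -s 1T /tmp/placeholder.img

# 2. New pool: a single raidz1 vdev over the new SSD and the placeholder.
zpool create newpool raidz1 /dev/disk/by-id/nvme-new-ssd /tmp/placeholder.img

# 3. Take the placeholder offline immediately; the pool is now degraded
#    but fully functional, with roughly single-disk usable capacity.
zpool offline newpool /tmp/placeholder.img
rm /tmp/placeholder.img

# 4. Snapshot the live dataset and stream the full snapshot to the new
#    pool while the system keeps running from oldpool.
zfs snapshot oldpool/data@sync1
zfs send oldpool/data@sync1 | zfs recv -F newpool/data
```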
Once the snapshot sync has completed, we very briefly reboot to switch to the new pool. (We have the entire OS running on a writable overlay on the ZFS dataset.) This is an atomic process. Early in the boot process, before the ZFS dataset is mounted, we take an additional snapshot of the old dataset and do an incremental sync over to the new dataset. This is very quick and copies over any small changes made since the first snapshot was created.
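The early-boot catch-up is roughly an incremental send between the two snapshots (snapshot and pool names are placeholders):

```shell
# Second snapshot captures any writes made since the first full sync.
zfs snapshot oldpool/data@sync2

# Incremental send: only the delta between @sync1 and @sync2 crosses over,
# so this completes quickly before the dataset is mounted.
zfs send -i @sync1 oldpool/data@sync2 | zfs recv -F newpool/data
```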
Once this sync has completed, the two separate pools contain identical data. We then mount the new pool and boot up with it. Then we can destroy the old pool and attach the old SSD to the new pool, bringing it out of its degraded state, and the old SSD is resilvered into the new pool. The user is now booted up on a two-wide raidz1 dataset on the new pool, with data bit-for-bit identical to what they shut down on with the single-SSD dataset on the old pool.
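The final healing step, again sketched with illustrative names:

```shell
# The system is now booted from newpool; the old pool is no longer needed.
zpool destroy oldpool

# Put the old SSD into the slot the sparse placeholder occupied;
# ZFS resilvers it from the surviving raidz1 member.
zpool replace newpool /tmp/placeholder.img /dev/disk/by-id/nvme-old-ssd

# The pool returns to ONLINE once the resilver completes.
zpool status newpool
```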
Despite sounding a bit wacky, the transition process is actually extremely safe. Apart from the switch over to the new dataset, the entire process happens in the background with the system online and fully functional. The transition can fail at almost any point and it will gracefully roll back to the single SSD. We only nuke the old single SSD at the very last step, so either we can roll back, or they have a working raidz1 array.
It sounds bad that the raidz1 goes through a period of degradation, but there is no additional risk here over not doing the transition. The user is coming from a single-disk vdev that already cannot survive a disk failure. We briefly put them through a degraded raidz1 array that also cannot survive a disk loss (no riskier than how they were already operating), to then end up at a healthy raidz1 array that can survive a single disk loss, significantly increasing safety in a simple and frictionless way for the user.
Using two-wide raidz1 arrays also gets a bit of a knee-jerk reaction, but it turns out that for our use case the downsides are practically negligible and the upsides are huge. Mirrors basically give you 2x read speed over two-disk raidz1, plus less read-intensive rebuilds. Everything else is pretty much the same, or the differences are negligible. It turns out those benefits don't make a meaningful difference to us. A single SSD can already far exceed the bandwidth required to fully saturate our 2.5GbE connection, so the additional speed of a mirror is nice but not really noticeable. However, the absolute killer feature of raidz is raidz expansion. Once we've moved to a two-disk-wide raidz1 array, which is not the fastest possible two-disk configuration but more than fast enough for what we need, we can add extra SSDs and do online expansions to a three-disk raidz1 array, then a four-disk raidz1 array, and so on. As you add more disks to the raidz1 array, you also stripe reads and writes across n-1 disks, so with four disks you exceed the mirror's performance benefits anyway.
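raidz expansion itself is a single command per added disk (this assumes an OpenZFS version with raidz expansion support, 2.3 or later; pool and device names are placeholders):

```shell
# Grow the existing raidz1 vdev from two disks to three, online.
# "raidz1-0" is the vdev label as shown by `zpool status`.
zpool attach newpool raidz1-0 /dev/disk/by-id/nvme-third-ssd
```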
In theory we could start with one SSD, migrate to a mirror with the second SSD, and then migrate again to a three-disk raidz1 array using the sparse file trick. However, it's extra complexity for negligible improvement. And when moving from the mirror to the raidz1, you degrade the user's array AFTER you've told them they're running FailSafe, which changes the transition from a practically zero-additional-risk operation into an extremely high-risk one.
Ultimately, what we think this design gives us is the simplest consumer RAID implementation with the highest safety guarantees that exist today. We provide ZFS-level data assurance with Synology SHR-style one-by-one disk expansion, in an extremely simple and easy-to-use UI.
Thanks for the thorough answer. It is a little wacky and complicated but I agree it should be safe. I'm not really in the target market for your software but the hardware does look very nice. Good luck with it.