Not sure. The simplest solution is to store all files under a hashed name and symlink or hardlink them on a case-by-case basis. But some applications tend to behave weirdly with such files. Windows has its own implementations of symlinks and hardlinks, under somewhat different names, so portability could be an issue.
Sure, I was just curious, since you mentioned not wanting to use ZFS without kernel support, and BTRFS does have that. Being familiar with ZFS is, I guess, a decent explanation.
When the topic of backups came up last year, I talked about my current solution: https://news.ycombinator.com/item?id=41042790. Someone suggested a workaround in the form of zfsbootmenu but I decided to stick to the simple way of doing things.
Main issue with symlinks is needing to choose the source of truth: one needs to be the real file, and the other points to it. You also need to make sure they have the same lifetimes to prevent dangling links.
Hardlinks are somewhat better because both names point to the same inode, but they also won't work if the file needs different permissions or needs to be independently mutable from different locations.
Reflinks hit the sweet spot: each copy can have different permissions, updates trigger CoW (which prevents confusing mutations), and total disk usage is still reduced.
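To make the symlink/hardlink difference concrete, here's a rough Python sketch, assuming a POSIX-ish filesystem where both link types are allowed. The function name and file names are made up for illustration. It doesn't cover reflinks, since those depend on filesystem support.

```python
import os
import tempfile


def link_behaviour(workdir):
    """Return (hardlink_shares_inode, write_visible_via_other_name, symlink_dangles)."""
    original = os.path.join(workdir, "original.txt")
    hard = os.path.join(workdir, "hard.txt")
    soft = os.path.join(workdir, "soft.txt")

    with open(original, "w") as f:
        f.write("v1")

    os.link(original, hard)     # second name for the same inode
    os.symlink(original, soft)  # separate file that stores a path

    shares_inode = os.stat(original).st_ino == os.stat(hard).st_ino

    # An in-place write through one hardlink name is visible through the other.
    with open(hard, "w") as f:
        f.write("v2")
    with open(original) as f:
        write_visible = f.read() == "v2"

    # Remove the "real" file: the hardlink still holds the data,
    # but the symlink now points at nothing.
    os.remove(original)
    dangles = os.path.islink(soft) and not os.path.exists(soft)

    return shares_inode, write_visible, dangles


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(link_behaviour(d))
```

The shared-mutation case is exactly what a reflink would avoid, since the write would land in a private copy instead.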
I don't disagree, but I think some of these problems could potentially be solved by having somewhat of a bird's nest of a filesystem for large blobs, e.g.
/blobs/<sha256_sum>/filename.zip
and then symlinking/reflinking filename.zip to wherever it needs to be in the source tree...
It's more portable than hardlinks, solves your "source of truth" problem and has pretty wide platform support.
Platforms that don't support symlinks/reflinks could copy the files to where they need to be, then delete the blob store at the end, and be no worse off than they are now.
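Something like the following is roughly what I have in mind, as a hedged Python sketch rather than a real tool. The function names (`store_blob`, `try_reflink`, `place`) are hypothetical. The reflink attempt assumes Linux's `FICLONE` ioctl and simply falls through to a symlink, then a plain copy, wherever that isn't available.

```python
import hashlib
import os
import shutil
import tempfile

FICLONE = 0x40049409  # Linux ioctl request number for a whole-file reflink


def store_blob(blob_root, src_path):
    """Copy src_path into <blob_root>/<sha256_sum>/<filename>; return that path."""
    digest = hashlib.sha256()
    with open(src_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    blob_dir = os.path.join(blob_root, digest.hexdigest())
    os.makedirs(blob_dir, exist_ok=True)
    blob_path = os.path.join(blob_dir, os.path.basename(src_path))
    if not os.path.exists(blob_path):  # identical content is stored only once
        shutil.copyfile(src_path, blob_path)
    return blob_path


def try_reflink(blob_path, dest_path):
    """Attempt a CoW clone; return False if the platform or filesystem refuses."""
    try:
        import fcntl  # not available on Windows
    except ImportError:
        return False
    try:
        with open(blob_path, "rb") as src, open(dest_path, "wb") as dst:
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
        return True
    except OSError:
        if os.path.exists(dest_path):
            os.remove(dest_path)  # clean up the empty file left by the failed clone
        return False


def place(blob_path, dest_path):
    """Materialise the blob at dest_path; return which strategy was used."""
    os.makedirs(os.path.dirname(os.path.abspath(dest_path)), exist_ok=True)
    if try_reflink(blob_path, dest_path):
        return "reflink"
    try:
        os.symlink(os.path.abspath(blob_path), dest_path)
        return "symlink"
    except (OSError, NotImplementedError, AttributeError):
        shutil.copyfile(blob_path, dest_path)  # last resort: plain copy
        return "copy"
```

Calling `place(store_blob("blobs", "filename.zip"), "src/vendor/filename.zip")` would then drop the file into the tree using whichever strategy works. If everything ended up as "copy", the blob store can be deleted afterwards, as described above.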
Anyway, I'm just a netizen making a drive-by comment.