
ROCm. About 40%. But there is duplication there as well. Two 16GB folders containing the exact same version.



Run rmlint on it; it will replace duplicate files with reflinks (if your fs supports them; xfs and btrfs do), or with hardlinks if not.


Thanks! Hearing about this for the first time. Never felt the need before.


Does uv have any plans for symlink/hardlink deduplication?


Not sure. The simplest solution is to store all files under a hashed name and sym/hardlink to them on a case-by-case basis. But some applications tend to behave weirdly with such files, and Windows has its own implementation of symlinks and hardlinks (it just calls them something else), so portability could be an issue.
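A hedged sketch of that idea in Python; the fallback chain, paths, and function name are illustrative, not how uv actually works:

    import os
    import shutil

    def link_from_store(store_path: str, target_path: str) -> None:
        """Place a file cached under a hashed name at target_path, preferring links to copies.

        Hard links need the same filesystem; symlinks on Windows may require Developer
        Mode or elevated privileges, so a plain copy is the last resort.
        """
        try:
            os.link(store_path, target_path)          # hard link: same inode, no extra space
        except OSError:
            try:
                os.symlink(store_path, target_path)   # symlink: works across filesystems
            except OSError:
                shutil.copy2(store_path, target_path) # copy: always works, costs space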


The article says it already hard-links duplicates, but that likely doesn't help if you are using multiple versions of the interpreter and libraries.


Sounds like a great use case for ZFS’s deduplication at block level.


I use ZFS everywhere EXCEPT on this drive. Not willing to have ZFS on the primary drive till native support lands in the kernel (so, never).


Have you tried borg [0]? Also, why not BTRFS?

[0] https://borgbackup.readthedocs.io/en/stable/index.html


Have been using ZFS for the past thirteen years, and all my workflows, including backup, are based on it. It just works.


Sure, I was just curious, since you mentioned not wanting to use ZFS without kernel support, and BTRFS does have that. Being familiar with ZFS is, I guess, a decent explanation.


When the topic of backups came up last year, I talked about my current solution: https://news.ycombinator.com/item?id=41042790. Someone suggested a workaround in the form of zfsbootmenu but I decided to stick to the simple way of doing things.


or, you know... symlinks


The main issue with symlinks is needing to choose the source of truth: one has to be the real file, and the others point to it. You also need to make sure they have the same lifetime to prevent dangling links.

A hardlink is somewhat better because both names point to the same inode, but it also won't work if the file needs different permissions or needs to be independently mutable from different locations.

A reflink hits the sweet spot: each copy can have its own permissions, updates trigger CoW so there are no confusing shared mutations, and total disk usage is still reduced.
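For concreteness, a minimal sketch of the hardlink/reflink difference on Linux; the FICLONE value is the usual ioctl number from linux/fs.h, and reflinks assume a CoW-capable filesystem such as btrfs or xfs with reflink support:

    import fcntl
    import os

    # ioctl request for FICLONE on Linux (_IOW(0x94, 9, int)); check linux/fs.h on your system.
    FICLONE = 0x40049409

    def hardlink(src: str, dst: str) -> None:
        """dst shares src's inode: same data, same permissions, writes visible through both names."""
        os.link(src, dst)

    def reflink(src: str, dst: str) -> None:
        """dst gets its own inode but shares src's data blocks; writes trigger copy-on-write."""
        with open(src, "rb") as s, open(dst, "wb") as d:
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())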


I don't disagree, but I think some of these problems could potentially be solved by having something of a bird's nest of a filesystem for large blobs, e.g.

/blobs/<sha256_sum>/filename.zip

and then symlinking/reflinking filename.zip to wherever it needs to be in the source tree...

It's more portable than hardlinks, solves your "source of truth" problem and has pretty wide platform support.

Platforms that don't support symlinks/reflinks could copy the files to where they need to be then delete the blob store at the end and be no worse off than they are now.
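A minimal sketch of that layout in Python; the /blobs root and helper names are made up for illustration:

    import hashlib
    import os
    import shutil

    BLOB_ROOT = "/blobs"  # hypothetical blob store root

    def store_blob(path: str) -> str:
        """Move a file into /blobs/<sha256_sum>/<original name> and return the blob path."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        blob_dir = os.path.join(BLOB_ROOT, h.hexdigest())
        os.makedirs(blob_dir, exist_ok=True)
        blob_path = os.path.join(blob_dir, os.path.basename(path))
        if os.path.exists(blob_path):
            os.remove(path)               # same content already stored: drop the duplicate
        else:
            shutil.move(path, blob_path)  # first copy becomes the canonical blob
        return blob_path

    def place(blob_path: str, dest: str) -> None:
        """Symlink the blob into the source tree; platforms without symlinks could copy instead."""
        os.symlink(blob_path, dest)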

Anyway, I'm just a netizen making a drive-by comment.



