For a while I used the bash script rsync-time-backup to back up datasets, because its snapshots take less space than full copies: files that have not changed are hard-linked to the copies in the previous backup instead of being stored again. I now use DVC, which takes a git-like approach to the same problem and also makes datasets easier to distribute, since they can be fetched with a single pull command (dvc pull).
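To make the hard-link idea concrete, here is a minimal sketch in Python. It is not the code of rsync-time-backup, which is a bash wrapper around rsync; the function name `snapshot` and its parameters are made up for illustration. It also compares file contents byte by byte, which is a simplification compared with rsync's default check of size and modification time. Hard links only work when both snapshots live on the same filesystem.

```python
import filecmp
import os
import shutil


def snapshot(source, dest, previous=None):
    """Copy `source` into `dest`; hard-link files unchanged since `previous`.

    Hypothetical helper for illustration only, not part of any real tool.
    """
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        target_dir = os.path.normpath(os.path.join(dest, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_dir, name)
            old = (
                os.path.normpath(os.path.join(previous, rel, name))
                if previous
                else None
            )
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                # Unchanged file: reuse the data already on disk via a hard link.
                os.link(old, dst)
            else:
                # New or modified file: store a real copy (keeps timestamps).
                shutil.copy2(src, dst)
```

Calling `snapshot(src, "backup-1")` and later `snapshot(src, "backup-2", previous="backup-1")` gives two directories that each look like a full copy, while unchanged files occupy disk space only once.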