
For that many files I probably would've used rsync between local disks. shrug



And hopefully you would have written up a similar essay on the oddball experiences you had with rsync, which is even more stateful than cp and even more likely to have odd interactions when used outside its comfort zone.

Ditto for tricks like: (cd $src; tar cf - .) | (cd $dst; tar xf -).

Pretty much nothing is going to work in an obvious way in a regime like this. That's sort of the point of the article.


Or maybe not. He mentions rsnapshot in the article, which uses rsync under the hood. This implies rsync would have a very good chance of handling a large number of hardlinks... since it created them in the first place.


That doesn't follow. If backups from multiple machines go to one big file server, the backup machine will have a much larger set of files than those that come from any individual machine. Further, each backup "image" compares the directory for the previous backup to the current live system. Generally it looks something like this:

1. Initial backup or "full backup" - copy the full targeted filesystem to the time-indexed directory on the backup machine.

2. Sequential backups:

a. On the backup machine, create a directory for the new timestamp and mirror the directory structure of the previous one.

b. Hard link the files in the new structure to those in the previous backup (which may be links themselves, back to the last full backup).

c. Rsync the files to the new backup directory. Anything that needs to be transferred results in rsync transferring the file as a new copy, then moving it into the proper place. This unlinks the filename from the previous version and replaces it with the full version.

So yeah, run this scheme over a few machines and a long timeframe and the result is way more links on the backup machine than any single iteration of the backup will ever actually use.
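For the curious, the core of steps 2a-2c is roughly this (a sketch only; the host and date names are made up, and real tools like rsnapshot add rotation, locking, and config on top):

    # steps 2a/2b: new snapshot directory, hardlinked file by file to the previous one
    cp -al /backups/host/2024-01-01 /backups/host/2024-01-02

    # step 2c: rsync the live system into the new snapshot; changed files are
    # written as fresh copies (unlinking them from the old snapshot), while
    # unchanged files keep sharing inodes with every earlier backup
    rsync -a --delete remotehost:/ /backups/host/2024-01-02/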


Yes, it has more links, I realize, but this still doesn't mean it wouldn't work. Give it a shot and report back. (Hah.)


But rsnapshot created that large number of hardlinks incrementally. Without trying it there's no indication that rsync handles that any better, especially since rsync with default settings doesn't preserve hardlinks at all, and with -H it builds a quite expensive data structure to track them.
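For reference, the difference in flags (the options are real rsync options, the paths are just placeholders):

    # default: hardlinked files land on the destination as independent duplicates
    rsync -a /old-backups/ /new-disk/backups/

    # -H (--hard-links) preserves links among files within the same run,
    # at the cost of rsync keeping an in-memory table of linked inodes
    rsync -aH /old-backups/ /new-disk/backups/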

I'm surprised that cp even worked. I wouldn't have had the balls to just use cp, as it could simply have failed.

My strategy would probably have been to cp or rsync over the latest backup first (so that new incremental backups could immediately be created again) and then incrementally rsync the previous backups with hardlink detection.
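Roughly like this (a sketch, assuming the snapshots are named daily.0, daily.1, ... under /old and /new; --link-dest is the rsync feature doing the hardlink detection here):

    # 1. copy the newest snapshot first so new incremental backups can resume right away
    rsync -aH /old/daily.0/ /new/daily.0/

    # 2. bring over each older snapshot, hardlinking unchanged files against
    #    the snapshot already on the new disk instead of storing them again
    rsync -aH --link-dest=/new/daily.0/ /old/daily.1/ /new/daily.1/
    rsync -aH --link-dest=/new/daily.1/ /old/daily.2/ /new/daily.2/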



