
The pipe is not the issue; the real cost is that you are storing the full paths of all files in the archive somewhere (namely in RAM), so that you can decide whether a given file you're looking at is just another hard link to a file you have already sent.
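
For illustration, here is a minimal Python sketch of the bookkeeping a tar-like archiver has to do; the function name and structure are mine, not tar's:

    import os

    def walk_with_hardlinks(top):
        """Yield (path, link_target): link_target is None for the first
        occurrence of an inode, and the earlier path otherwise."""
        seen = {}  # (st_dev, st_ino) -> first path emitted
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.lstat(path)
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    yield path, seen[key]  # send as a hard link
                else:
                    seen[key] = path
                    yield path, None       # send the file contents

The "seen" table is exactly the per-path RAM cost described above.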

Yes, using tar for copying is not new and has its use cases, but this is not one of them.




Certainly not for all files.

The filesystem keeps track of the link count for each inode. So tar does not have to memorize the inode of every file, only of those with more than one link. Unless a large number of files have multiple hard links, this is probably not a problem.
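
A minimal Python sketch of that optimization (names are mine, not tar's): files with st_nlink == 1 are never memorized, and an entry can be dropped once all of its links have been seen.

    import os

    seen = {}  # (st_dev, st_ino) -> (first_path, links_still_expected)

    def classify(path):
        """Return the earlier path if this is another link to a file
        already handled, else None."""
        st = os.lstat(path)
        if st.st_nlink < 2:            # unique file: nothing to remember
            return None
        key = (st.st_dev, st.st_ino)
        if key not in seen:
            seen[key] = (path, st.st_nlink - 1)
            return None
        first, expected = seen[key]
        if expected == 1:
            del seen[key]              # all links seen: free the entry
        else:
            seen[key] = (first, expected - 1)
        return first

(Links that live outside the copied tree are never encountered, so those entries simply stay in the table until the end.)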

We will never know, though, since the author did not try it. Nor did he explain why not, despite it being the recommended Unix procedure.


From the original post: "We use rsnapshot (..) so most of the files have a high link count."

> it is the recommended Unix procedure

I don't think it is "recommended procedure", not today. At least on GNU/Linux, cp is just fine. It may have been different two decades ago for some reason: I remember a graphical file manager (called "dfm") that used a piped tar back in the late '90s [1]. Installing open source utilities on closed source Unixes was pretty common back then to get better programs, and tar was probably high on the list to install whereas cp less so, hence tar may in fact have been better than the preinstalled cp. I haven't seen anyone in the public technical (hence primarily open source) community advocate it even for portability reasons, so I can only regard it as "perhaps historically relevant, if at all".

Your point is valid, though, that one could have tested it here. See rbh42's answer on this point. Although I'm not sure he's right about both tar instances requiring a hash table: if tar encodes a hard link as a reference from a previously transmitted path to the new one, then the receiving side needs no table. And from my quick look, the transmitted data does seem to contain the source path for the link. The question then becomes whether tar has a better-optimized hash table (not very likely) or uses a better algorithm (see my post about "megacopy" for an attempt at the latter).
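
You can check what the stream contains with Python's tarfile module; hard links appear as link-type members whose linkname is the path of an earlier member, so the extracting side needs no inode table of its own (the script is mine, for illustration):

    import sys
    import tarfile

    # Read a tar stream from stdin and print how each member is encoded.
    with tarfile.open(fileobj=sys.stdin.buffer, mode="r|") as tf:
        for member in tf:
            if member.islnk():
                print(member.name, "-> hard link to", member.linkname)
            elif member.isreg():
                print(member.name, "(%d bytes)" % member.size)

Run it as e.g. "tar cf - somedir | python3 show_links.py".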

[1] It actually had a problem: it didn't put "--" between the options and the path, and promptly failed when I moved an item whose file name started with a minus: tar interpreted the name as options, and in addition the file manager didn't properly check the exit codes and simply unlinked the source even though it hadn't been copied. The tools before GNU may really have been wanting: the failure to use "--" may have been caused by some contemporary non-GNU tar not supporting it. Anyway, of course I moved on to another file manager. Those bits of mine were sacrificed to the god of superior recommended Unix procedures or something.
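
For illustration, a sketch of the safer behaviour in Python, assuming GNU tar's "--" end-of-options marker and a single plain file (move_via_tar is a hypothetical helper, not what dfm did):

    import os
    import subprocess

    def move_via_tar(name, dest):
        """Copy `name` into directory `dest` via a tar pipe; unlink the
        source only after both tar processes reported success."""
        src = subprocess.Popen(["tar", "cf", "-", "--", name],
                               stdout=subprocess.PIPE)
        dst = subprocess.Popen(["tar", "xf", "-"], cwd=dest,
                               stdin=src.stdout)
        src.stdout.close()  # so dst sees EOF when src exits
        if dst.wait() != 0 or src.wait() != 0:
            raise RuntimeError("tar failed; source left in place")
        os.remove(name)  # safe: the copy is known to have succeeded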



