Looking at the code, deallocating a hash table requires traversing the entire table: there is malloc()'d memory associated with each hash entry, so each entry has to be visited and free()'d. From hash_free() in coreutils hash.c:
struct hash_entry *bucket;        /* declarations from earlier in the function */
struct hash_entry *cursor, *next;

for (bucket = table->bucket; bucket < table->bucket_limit; bucket++)
  {
    for (cursor = bucket->next; cursor; cursor = next)
      {
        next = cursor->next;   /* save the link before the node is freed */
        free (cursor);         /* one free() call per overflow entry */
      }
  }
Whereas if you just don't bother to deallocate the table before the process exits, the OS reclaims all the process's memory at once, without having to walk a giant data structure. That's a fairly common situation in C programs that do explicit memory management of complex data structures in the traditional malloc()/free() style. Giant linked lists and graph structures are another common culprit: you have to pointer-chase all over the place to free() them if you allocated them in the traditional way (vs. packing them into an array or using a userspace custom allocator for the bookkeeping).
... and both malloc() and free() need to maintain (update) their own data structures, which in many implementations (I don't know about the current glibc one) are stored interspersed with the data. Walking a hash table and calling free() on each and every item, even if it doesn't look like it, actually dirties a large number of memory pages (pages that may be scattered essentially at random, precisely because it is a hash table), and those dirty pages then have to be written back to swap space on swap-out.
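As a rough illustration of why this happens, here is the general shape of the inline metadata that classic dlmalloc-style allocators keep (this layout is an assumption for illustration; the real glibc layout differs in detail):

struct chunk_header {
    size_t prev_size;   /* size of the previous chunk, if it is free */
    size_t size;        /* this chunk's size, plus status bits */
};
/* malloc() returns the address just past the header, so free(p) must
   read and update the header immediately before p.  That write alone
   dirties the page holding the block, dragging it back in from swap. */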
You could, but it's fairly uncommon to use that kind of "custom allocator" style in C, especially in traditional Unix utility programming. In my experience it's culturally more of a C++ thing, though you do find it in C in "big" programs, like scientific-computing code.
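For concreteness, here is a minimal sketch of the arena ("region") style being described. The names and structure are illustrative, not from any particular library, and it ignores alignment and oversize requests for brevity:

#include <stdlib.h>

struct arena_block {
    struct arena_block *next;
    size_t used;
    char data[];                 /* the storage handed out to callers */
};

struct arena {
    struct arena_block *blocks;  /* list of big blocks, newest first */
    size_t block_size;           /* capacity of each data[] region */
};

void *arena_alloc(struct arena *a, size_t n)
{
    if (n > a->block_size)
        return NULL;             /* oversize requests not handled in this sketch */
    struct arena_block *b = a->blocks;
    if (b == NULL || a->block_size - b->used < n) {
        b = malloc(sizeof *b + a->block_size);
        if (b == NULL)
            return NULL;
        b->used = 0;
        b->next = a->blocks;
        a->blocks = b;
    }
    void *p = b->data + b->used;
    b->used += n;                /* note: no alignment handling here */
    return p;
}

/* Freeing everything is one walk over a short list of big blocks,
   not one free() per node of the client data structure. */
void arena_free_all(struct arena *a)
{
    struct arena_block *b = a->blocks;
    while (b) {
        struct arena_block *next = b->next;
        free(b);
        b = next;
    }
    a->blocks = NULL;
}

Nodes of a list or hash table allocated this way can simply be abandoned; arena_free_all() (or just exiting) reclaims them without any pointer-chasing.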
Others have already mentioned that it isn't really necessary, but for some historical context, this wasn't always true. When I first started programming computers, apps (written in languages that aren't garbage collected) on your typical consumer-level computer/OS had to be really careful to free all allocated memory before exiting, or that memory would be gone for good (as far as the OS's ability to reallocate it to other apps was concerned) until the user rebooted the computer.
Advances in operating systems, combined with CPU features like MMUs, have made this a non-issue in most cases (it can still be an issue on embedded systems, etc.).
This was a particular problem on the Amiga, among other systems. There were attempts to retrofit resource tracking onto the OS, but this was made all but impossible by the fact that a process could malloc() a chunk of memory, write a message in it, and then hand a pointer to that message to another process, expecting the other process to free() it. This made for wonderfully cheap inter-process communication (for instance, the filesystem would load blocks from disc and pass them as messages to the process wanting them).
If it's the last thing you do before you exit the process, it isn't necessary, because the OS will reclaim your process's memory in one fell swoop. I believe that's what the linked post is advocating 'cp' should do. (At least on modern systems that's true; maybe there are some exotic old systems where not freeing your data structures before exit causes permanent memory leaks?)
It's seen as good C programming practice to free() your malloc()s, though, and having that functionality makes programs easier to extend, since what was previously the end of the program can be wrapped in a higher-level loop without leaking memory. But if you really are exiting for sure, you don't have to make the final free-memory calls. It can also be faster to skip intermediate deallocations too: just leave everything for one big final deallocation, as a kind of poor man's one-generation generational GC. Nonetheless, many C programmers see it as somehow a bit unclean not to deallocate properly. Arguably, skipping it does make some kinds of errors more likely: if you have cleanup that needs to be done that the OS doesn't do automatically, you now have different cleanup routines for the end-of-process and not-end-of-process cases.
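As a sketch of that refactoring (run_tool(), table_create(), do_work(), and table_destroy() are hypothetical names, not from any real program):

/* If run_tool() frees everything it allocates, a later caller can
   invoke it in a loop (from a daemon, a test harness, ...) without
   leaking.  If it had relied on exit() for cleanup, that refactor
   would leak once per iteration. */
int run_tool(int argc, char **argv)
{
    struct table *t = table_create();
    if (t == NULL)
        return 1;
    int status = do_work(t, argc, argv);
    table_destroy(t);   /* the free that exit() would have made redundant */
    return status;
}

int main(int argc, char **argv)
{
    return run_tool(argc, argv);
}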
I tend to do this in my C programs because in development I usually have malloc() wrapped so that any block that hasn't been free()'d is reported at exit() time. This kind of check for lost pointers is usually so cheap that you use it even if you never expect to run on a system without decent memory management.
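One common shape for such a wrapper, sketched from the description (xmalloc()/xfree() and the simple live-block counter are illustrative; a real version would usually record the caller's file/line too):

#include <stdio.h>
#include <stdlib.h>

static size_t live_blocks;

static void report_leaks(void)
{
    if (live_blocks)
        fprintf(stderr, "leak check: %zu block(s) never freed\n",
                live_blocks);
}

void *xmalloc(size_t n)
{
    static int registered;
    if (!registered) {
        atexit(report_leaks);   /* run the check at exit() time */
        registered = 1;
    }
    void *p = malloc(n);
    if (p)
        live_blocks++;
    return p;
}

void xfree(void *p)
{
    if (p)
        live_blocks--;
    free(p);
}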
As an aside, GNU libc keeps (or at least used to keep; I haven't checked in years) the pointers used by malloc()/free() next to the blocks themselves, which gives really bad behavior when freeing a large number of blocks that have been pushed out to swap: you wind up bringing pages back in just to free them, because the memory manager's working set is the size of all allocated memory. Years ago I wrote a replacement that avoided this, just to speed up Netscape's horrible performance when it resized the bdb1.85 databases it used to track browser history. The browser would just "go away", thrashing the disk for hours, and killing it only returned you to a state where it would decide to resize again an hour or so after a restart. Using LD_PRELOAD to load a malloc that kept its bookkeeping away from the allocated blocks changed hours to seconds.
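The LD_PRELOAD mechanics look roughly like this: a bare-bones interposer that just forwards to the original functions (the replacement described above would instead implement its own allocator, keeping the bookkeeping in a separate table):

#define _GNU_SOURCE             /* for RTLD_NEXT */
#include <dlfcn.h>
#include <stddef.h>

static void *(*real_malloc)(size_t);
static void (*real_free)(void *);

void *malloc(size_t n)
{
    if (!real_malloc)           /* lazy lookup of the libc symbol */
        real_malloc = (void *(*)(size_t)) dlsym(RTLD_NEXT, "malloc");
    /* bookkeeping for the returned block would go here */
    return real_malloc(n);
}

void free(void *p)
{
    if (!real_free)
        real_free = (void (*)(void *)) dlsym(RTLD_NEXT, "free");
    real_free(p);
}

Built with something like gcc -shared -fPIC -o shim.so shim.c -ldl and run as LD_PRELOAD=./shim.so program. (Caveat: a production interposer has to be careful that dlsym itself doesn't allocate during bootstrap.)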
It's not necessary per se, but many programmers do it out of good habit. It becomes much harder to, e.g., find real memory leaks in your code if you never free even the memory you haven't leaked, since a leak checker will then report everything. And normally the extra overhead it entails is imperceptible.
Obviously many programmers do not think to (or cannot easily) test their code with 40 TBs of data.
If you wrote cp in such a way that it's a reusable C library that other programs might use, then I can understand why they do this. It still doesn't make much sense for the command-line version to call it. Of course, it might run on some weird platform where the OS doesn't reclaim the memory of a process on exit(?).