
No mention of filesystem. As it's RHEL 5.4 I'm going to guess ext3, which uses indirect blocks instead of extents for large files (which a directory containing millions of files surely is). Would also be useful to confirm that dir_index is enabled.

Some useful background material:

http://computer-forensics.sans.org/blog/2008/12/24/understan...

http://static.usenix.org/publications/library/proceedings/al...
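
For anyone wanting to confirm the dir_index point, here is a minimal sketch. It assumes e2fsprogs is installed and that `tune2fs -l` prints a "Filesystem features:" line, which it does on ext2/3/4. The function names and the device path are made up for illustration; nothing here comes from the original article.

```python
import subprocess


def has_dir_index(tune2fs_output: str) -> bool:
    """Return True if dir_index is listed on the 'Filesystem features:' line."""
    for line in tune2fs_output.splitlines():
        if line.startswith("Filesystem features:"):
            # Everything after the first colon is a space-separated feature list.
            features = line.split(":", 1)[1].split()
            return "dir_index" in features
    return False


def check_device(device: str) -> bool:
    """Run `tune2fs -l` on a device (usually needs root) and check for dir_index."""
    result = subprocess.run(
        ["tune2fs", "-l", device],
        capture_output=True,
        text=True,
        check=True,
    )
    return has_dir_index(result.stdout)
```

Calling `check_device("/dev/sda1")` (hypothetical device) as root would report whether hashed directory indexes are turned on for that filesystem.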




Also, the storage is RAID-10, so it's striped, and who knows what kind of caching goes on in the hardware controller.

The numbers are not that useful. It's notable that rsync:rm went from 1:12 in his old test to 1:3 in his new test, but we really don't know anything about why.

FWIW (very little), I did a similar test on a convenient OS X box (HFS+, 1000000 zero-byte files, single spindle), and rm won. rsync was next (+25%), straight C came in a little higher, then Ruby (20% over rsync). Maybe BSD rm is awesome.
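
A rough sketch of that kind of test, for anyone who wants to try it on their own filesystem. This is not the script used for the numbers above; the function names and file count are assumptions, and it only times a plain unlink loop, not rm or rsync. It lists the directory before deleting, since removing entries while still reading a directory can skip files on some filesystems.

```python
import os
import tempfile
import time


def make_empty_files(directory: str, count: int) -> None:
    """Create `count` zero-byte files in `directory`."""
    for i in range(count):
        # Opening for write and closing immediately leaves an empty file.
        with open(os.path.join(directory, f"f{i:07d}"), "w"):
            pass


def time_unlink_all(directory: str) -> float:
    """Unlink every entry in `directory`; return elapsed seconds (listing included)."""
    start = time.perf_counter()
    with os.scandir(directory) as entries:
        paths = [entry.path for entry in entries]
    for path in paths:
        os.unlink(path)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        make_empty_files(d, 10_000)  # raise toward 1_000_000 for a real test
        print(f"unlinked in {time_unlink_all(d):.3f}s")
```

Timings from a run like this depend heavily on the page cache and on where the temp directory lives, so treat any single number with the same suspicion as the ones in the article.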



