I hit this with a log-trimming script I wrote in Perl a few months back. We had one directory with about 4 million files.
Just in case you're curious, I let it run and it took about 25 hours to get all of the directory entries. One with about 3 million files took 12 hours, so I'm not sure how long it would have taken if you'd let ls run of its own accord.
Only 200k files afterwards though. :D
I think I'll poke around with this after I get a coffee. I remember stracing the interpreter and getting annoyed at its 32k-or-bust behavior, but since I didn't have a time limit I didn't much care about runtime. Thanks for the write-up!
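
For anyone who wants to try the same kind of trimming, here is a rough sketch in Python rather than my original Perl. The function name `trim_old_files` and the age cutoff are made up for illustration, and it assumes "trimming" just means deleting regular files older than some age. The point is only that `os.scandir` hands back entries as it goes instead of building one giant list first; I haven't measured how it behaves on a directory with millions of files.

```python
import os
import time


def trim_old_files(directory, max_age_seconds, now=None):
    """Delete regular files older than max_age_seconds; return the number removed.

    Hypothetical helper for illustration, not the original Perl script.
    """
    now = time.time() if now is None else now
    removed = 0
    # os.scandir yields entries lazily, so the full listing is never held in memory.
    with os.scandir(directory) as entries:
        for entry in entries:
            # Leave subdirectories and symlinks alone.
            if not entry.is_file(follow_symlinks=False):
                continue
            age = now - entry.stat(follow_symlinks=False).st_mtime
            if age > max_age_seconds:
                os.unlink(entry.path)
                removed += 1
    return removed
```

Something like `trim_old_files("/path/to/logs", 30 * 24 * 3600)` would drop anything older than roughly 30 days; the path here is a placeholder.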