You are right that 32k buffers should be more than enough to read 5MB in a reasonable time, regardless of disk/VM architecture, but I don't think stat was the problem either. My guess is that readdir, or possibly getdents, is O(n^2) somewhere.
[just noticed it was 500M (oh wow), but same difference]
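
If anyone wants to test the O(n^2) guess, here is a rough sketch in Python. The helper names (time_listing, scaling_exponent) and the directory sizes are my own and not from this thread, and I'm assuming os.scandir ends up in readdir/getdents on your platform. It times a full listing at several directory sizes and fits the slope on a log-log scale: a slope near 1 suggests linear behaviour, a slope near 2 would support the quadratic theory. Timings at these sizes are noisy, so treat the result as a hint only.

```python
import math
import os
import tempfile
import time


def time_listing(n):
    """Create n empty files in a fresh temp directory and time one full listing."""
    with tempfile.TemporaryDirectory() as d:
        for i in range(n):
            open(os.path.join(d, f"f{i:08d}"), "w").close()
        start = time.perf_counter()
        count = sum(1 for _ in os.scandir(d))  # walk every directory entry once
        elapsed = time.perf_counter() - start
    assert count == n
    return elapsed


def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    sizes = [1000, 2000, 4000, 8000]  # arbitrary; go larger to mimic the real case
    times = [time_listing(n) for n in sizes]
    print("estimated exponent:", scaling_exponent(sizes, times))
```

Run it on the same filesystem as the slow directory, since a temp directory on tmpfs may behave very differently from the real disk.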