The original onus for the post was python's os.listdir() which as far as I know ...

nitrogen · on Aug 16, 2011

This is just a hypothesis based on very little actual knowledge, but perhaps a very long scheduling interval is responsible for the slowness with smaller reads? Consider this scenario: the virtualization hypervisor is on a reasonably loaded system, and decides to block the virtual machine for every single read. Since the physical system has several other VMs on it, whenever the VM in question loses its time slice it has to wait a long time to get another one. Thus, even if the 32K read itself happens quickly, the act of reading alone causes a delay of n milliseconds. If you increase the read size, your VM still gets scheduled 1000/n times per second, but each time it gets scheduled it reads 5MB instead of 32K.