Hacker News new | past | comments | ask | show | jobs | submit login

The original onus for the post was python's os.listdir() which as far as I know doesn't stat(). ls, just made the blog post more interesting :-).

I was surprised that the 32K reads were taking so long. It's possible since it was on a virtualized disk ("in the cloud") that something else was slowing down disk IO (like Xen).

But I can assure you that a larger read buffer performed much better in this given scenario.

I'd welcome more tests though.




This is just a hypothesis based on very little actual knowledge, but perhaps a very long scheduling interval is responsible for the slowness with smaller reads? Consider this scenario: the virtualization hypervisor is on a reasonably loaded system, and decides to block the virtual machine for every single read. Since the physical system has several other VMs on it, whenever the VM in question loses its time slice it has to wait a long time to get another one. Thus, even if the 32K read itself happens quickly, the act of reading alone causes a delay of n milliseconds. If you increase the read size, your VM still gets scheduled 1000/n times per second, but each time it gets scheduled it reads 5MB instead of 32K.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: