
One possible advantage of using mmap over a buffer pool is programmer ergonomics.

Reading data into a buffer pool in process RAM takes time to warm up, and the pool can only be accessed by a single process. In contrast, for an mmap-backed data structure, assuming that files are static once written (which can be the case for a multi-version concurrency control (MVCC) architecture), you open an mmap read-only connection from any process and, so long as the data is already in the OS cache, you get fast reads immediately. This makes managing database connections much easier, since connections are cheap and the programmer can just open as many as they want whenever and wherever they want.
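
To make that concrete, here is a minimal sketch of the "cheap read-only connection" idea using Python's mmap module. The helper name open_readonly_map and the file layout (an 8-byte header, 4096 bytes of padding, then a 9-byte record) are made up for illustration, and both "connections" live in one process here only to keep the example short; separate processes mapping the same file would share the same OS page cache in the same way.

```python
import mmap
import os
import tempfile

def open_readonly_map(path):
    """Open an immutable file and map it read-only (hypothetical helper)."""
    f = open(path, "rb")
    # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return f, m

# Write a small file once; after this point it is treated as static.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"header--" + b"x" * 4096 + b"record-42")
    path = tmp.name

# Two independent read-only "connections" to the same file.
file_a, reader_a = open_readonly_map(path)
file_b, reader_b = open_readonly_map(path)

offset = 8 + 4096  # the record starts after the header and the padding
record_a = reader_a[offset:offset + 9]  # slicing an mmap returns bytes
record_b = reader_b[offset:offset + 9]

for resource in (reader_a, reader_b, file_a, file_b):
    resource.close()
os.remove(path)
```

Neither reader had to load the file into its own pool first; once the pages are in the OS cache, both just index into the mapping.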

It is true that the cache eviction strategy used by the OS is likely to be suboptimal. So if you're in a position to only run a single database process, you might decide to make different tradeoffs.




This is true, but in the case where files are read-only, just reading directly from the files with fread()/read()/etc. works pretty well. You do have to pay the cost of a system call and a copy from the OS buffer cache into your user-space buffer, but OTOH when the page isn't in the buffer cache, the cost of reading the required data from storage is more predictable than the cost of faulting in all the 4 KB pages you're reading.
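
For comparison, a rough sketch of the plain-read approach, using os.pread (Unix only) to copy a byte range from a read-only file into a user-space buffer. The helper name read_at and the file layout are invented for the example. Each call costs a system call plus a copy, but there are no page faults in the middle of later accesses.

```python
import os
import tempfile

def read_at(fd, offset, length):
    """Read up to `length` bytes starting at `offset` (hypothetical helper).

    Loops because pread may return fewer bytes than requested; returns
    short data only when end of file is reached.
    """
    chunks = []
    while length > 0:
        chunk = os.pread(fd, length, offset)
        if not chunk:  # end of file
            break
        chunks.append(chunk)
        offset += len(chunk)
        length -= len(chunk)
    return b"".join(chunks)

# Write a small file once; after this point it is treated as static.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"header--" + b"x" * 4096 + b"record-42")
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
try:
    payload = read_at(fd, 8 + 4096, 9)     # the 9-byte record
    tail = read_at(fd, 8 + 4096 + 5, 100)  # request runs past end of file
finally:
    os.close(fd)
    os.remove(path)
```

pread takes the offset explicitly, so several threads can share one file descriptor without fighting over a seek position.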



