
One of the nice things about using mmap'd data structures is that you don't have to slurp the entire thing into memory to work on it. The OS's virtual memory system will only load a few pages at a time for whatever's needed. This means that in practical applications you're likely to be within an order of magnitude of RAM speed if you're using an SSD.
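To make the "only a few pages at a time" point concrete, here is a minimal Python sketch. The record layout (one little-endian uint64 per record), the file name, and the helper names are all made up for illustration. Reading one record from a mapped file only faults in the page that holds it, not the whole file.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<Q")  # hypothetical fixed-width record: one uint64


def write_records(path, count):
    """Write `count` records where record i holds the value i * i."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(RECORD.pack(i * i))


def read_record(path, index):
    """Map the file read-only and decode a single record.

    Only the page containing the requested bytes has to be faulted in;
    this process never reads the rest of the file.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            (value,) = RECORD.unpack_from(m, index * RECORD.size)
            return value


def demo_read():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "records.bin")  # hypothetical file name
        write_records(path, 100_000)
        return read_record(path, 12_345)
```

The same pattern should carry over to files far larger than RAM, since the mapping itself costs address space rather than memory.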

Secondly, you can use structures larger than the physical RAM on the system. It's kind of like getting a system with enormous amounts of slow-ish RAM.

Thirdly, you can build a whole bunch of these for different purposes and just open the ones you need, and not bother with the rest, again like having even more RAM.

Finally, multi-processing on these is trivial, because the OS sorts out the mess. Suddenly it's like having a slightly slow, multi-million dollar, multi-TB shared-memory supercomputer from 5-8 years ago at your disposal. It works great on Beowulf clusters if your disk storage is shared over fast links and can handle the IOPS.
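A rough sketch of the multi-process case, assuming a POSIX system (it uses `os.fork`, so it won't run on Windows). The slot layout and function names are hypothetical. Each child writes to its own slot through the same shared file mapping, and the parent sees the results with no explicit message passing.

```python
import mmap
import os
import struct
import tempfile

SLOT = struct.Struct("<Q")  # hypothetical layout: one uint64 slot per worker


def fill_slots_in_children(path, workers):
    """Fork `workers` children; each writes its own slot via a shared mapping."""
    with open(path, "wb") as f:
        f.truncate(workers * SLOT.size)  # size the backing file before mapping

    with open(path, "r+b") as f:
        # On POSIX the default mapping is MAP_SHARED, so stores made by one
        # process are visible to every other process mapping the same file.
        with mmap.mmap(f.fileno(), 0) as m:
            pids = []
            for w in range(workers):
                pid = os.fork()
                if pid == 0:
                    # Child: inherits the mapping, writes one slot, exits at once.
                    status = 1
                    try:
                        SLOT.pack_into(m, w * SLOT.size, (w + 1) * 100)
                        status = 0
                    finally:
                        os._exit(status)
                pids.append(pid)
            for pid in pids:
                os.waitpid(pid, 0)  # wait so every child's write has happened
            return [SLOT.unpack_from(m, w * SLOT.size)[0] for w in range(workers)]


def demo_fork():
    with tempfile.TemporaryDirectory() as d:
        return fill_slots_in_children(os.path.join(d, "slots.bin"), 4)
```

Giving each worker a disjoint slot sidesteps locking entirely. Anything where processes write overlapping regions would need its own synchronization on top.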

The number of things you can do with a 5-8 year old supercomputer is vast. I've seen 20 TB classifiers built on more or less commodity hardware (<$100k for a cluster of cheap machines and one big disk) that were previously thought to require a $5 million supercomputer just because of the large shared memory. You can probably build an equivalent machine for <$10k these days from Newegg parts and some decent NAS boxes.




> One of the nice things about using mmap'd data structures is that you don't have to slurp the entire thing into memory to work on it.

Right, but one of the not-so-nice-things is that you can't do the I/O asynchronously, so you can end up with poor performance depending on the access pattern. (I guess you can, with another thread running in the background touching pages, but it's more of a pain and you're not really guaranteed the data will stay in memory until you use them.) [Edit: Actually I guess if you touch by writing to those pages rather than just reading from them then they'll have to stay in memory... though do note that I'm assuming no swap here.]
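One partial alternative to a page-touching thread is to hint the kernel about upcoming accesses. The sketch below assumes Python 3.8+ on a platform that exposes `MADV_WILLNEED`, and the `prefetch` helper is a made-up name. It is only a hint: the kernel may ignore it, and nothing guarantees the pages are still resident when you get to them.

```python
import mmap
import os
import tempfile


def prefetch(m, start, length):
    """Hint that m[start:start+length] will be needed soon.

    Returns True if the hint was issued, False if this platform or Python
    build does not expose madvise/MADV_WILLNEED.
    """
    if not (hasattr(m, "madvise") and hasattr(mmap, "MADV_WILLNEED")):
        return False
    # madvise needs a page-aligned start, so round down and widen the range.
    aligned = start - (start % mmap.PAGESIZE)
    m.madvise(mmap.MADV_WILLNEED, aligned, length + (start - aligned))
    return True


def demo_prefetch():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "blob.bin")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"x" * (mmap.PAGESIZE * 8) + b"needle")
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                offset = mmap.PAGESIZE * 8
                hinted = prefetch(m, offset, 6)  # request readahead first
                # ... other work could happen here before the data is used ...
                return hinted, m[offset:offset + 6]
```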


At some cost to portability, IIRC, on Linux you can ask the operating system to keep pages in memory. Other systems will probably have similar functionality.
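For what it's worth, the POSIX call for this is `mlock(2)`. Python's `mmap` module has no wrapper for it as far as I know, so this sketch goes through `ctypes`, which is an assumption about how you'd reach it from Python. The helper names are hypothetical. It can fail if `RLIMIT_MEMLOCK` is too low, so it reports success instead of raising.

```python
import ctypes
import ctypes.util
import mmap


def try_lock(m):
    """Try to pin every page of writable mapping `m` in RAM via mlock(2).

    Returns True on success, False if the call fails (for example because
    RLIMIT_MEMLOCK is too low or the process lacks the privilege).
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.mlock.restype = ctypes.c_int
    # from_buffer needs a writable mapping; it exposes the mapping's address.
    view = (ctypes.c_char * len(m)).from_buffer(m)
    try:
        return libc.mlock(ctypes.addressof(view), len(m)) == 0
    finally:
        del view  # drop the buffer export so m.close() is allowed later


def demo_lock():
    with mmap.mmap(-1, mmap.PAGESIZE) as m:  # anonymous one-page mapping
        locked = try_lock(m)
        m[:5] = b"hello"
        return locked, bytes(m[:5])
```

Locked pages are released when the mapping is unmapped, so closing the `mmap` object is enough cleanup here.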


They can remain paged in, but unlike on FreeBSD, you can't control when dirty pages are flushed to the backing filesystem on Linux. Specifically, the MAP_NOSYNC option doesn't exist on Linux (see https://www.freebsd.org/cgi/man.cgi?sektion=2&query=mmap for a description).


Windows has NtLockVirtualMemory, but (a) it requires special permissions (meaning random apps can't do it without admin privileges), and (b) something about it feels like it's the wrong way to do it, but I can't pin down what it is exactly.



