At that point you might as well use a dynamic array which doesn't invalidate old...

CoastalCoder · on June 4, 2023

Just to clarify, our approach was to help find the bug that was misusing an instance of std::vector.

It was a temporary measure, put in place only until we found the bug.

2594119481 · on June 5, 2023

This looks like a really neat approach!

I'd be curious to see how this approach performs when integrated in a running system. In particular, there are some additional costs associated with this approach which might not appear in the listed microbenchmarks. Using mmap/mprotect/munmap like this will cause page faults and TLB shootdowns. After mprotect()-ing memory to allow writes to it, writes will still initially cause page faults. On page fault, the kernel will need to allocate a page of memory, map the page into the process, and then a TLB shootdown to clear the core-local caches of the page table. In addition to these costs, every page fault incurs a context switch to/from the kernel for each 4KB page.

The TLB shootdowns are especially costly, since they slow down all threads currently running in the process, not just the threads that are performing the memory access. As a result, this approach could slow down other parts of the program that aren't even using DynArray.

One potential approach to mitigate this is to use Large/Huge pages. Because all the costs that I mentioned are incurred on each page miss, you can make misses less frequent by using large pages. The issue is that, if you're going around allocating 1GB pages willy-nilly, then you're very quickly going to run out of system memory. You can also run into memory fragmentation issues, since large pages require the system to have 1GB of physically contiguous memory available (whereas with 4KB pages, you can slice and dice from anywhere in physical memory).

Another solution is to use caching/leverage a user-space memory allocator. One benefit of using the std::vector approach is that, if you're using a properly tuned memory allocator, once your program reaches a steady-state of memory usage, it should never need to ask the OS for more memory. Because the memory allocator is just recycling memory over and over again, it never needs to update its page tables (by using mmap() to ask for more memory from the OS), and so it never has page faults or incurs TLB shootdowns.

Regardless, there are lots of interesting trade-offs in this space. Especially between memory usage and compute. And for short-lived applications, for example, I could see DynArray being a clear win.

One of the things I really like about performance optimization is the amount that you can experiment--different solutions perform better in different scenarios, and it's fun to try them all out!