Author here. It does make use of mmap(). I worked on adding mmap() support to llama.cpp back in March, specifically so I could build things like Emacs Copilot. See: https://github.com/ggerganov/llama.cpp/pull/613

Recently I've been working with Mozilla to create llamafile, so that using llama.cpp can be even easier. We've also been upstreaming a lot of bug fixes!