Strange. I remember trying to get this to work on a 16 GB machine, and all the comments on a GitHub issue about it said it needs at least 32.
Try llamafile https://github.com/Mozilla-Ocho/llamafile
I have Mistral 7B running this way on a 10-year-old laptop, and it only seems to use a few GB thanks to its memory-mapping approach.
If it's lazy-loading just what it needs, that seems like an efficient use of memory. In any case, a 4 GB model will easily fit on the commenter's 16 GB machine.
It’s really just a difference in accounting. Memory used for memory-mapped files isn’t shown under the “used” header but under the disk cache instead. And it doesn’t need to be swapped out to be discarded, so if you’re short on memory, everything just slows down without an obvious cause.
/edit: this was with llama.cpp though, not ollama
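
To make the accounting point concrete, here's a minimal sketch (not from the thread; the file name is just a placeholder) of mmap-ing a model file the way llama.cpp's mmap loading path does. Pages you touch land in the kernel's page cache rather than in the process's "used" memory, and the kernel can drop them again without swapping:

    /* Sketch: map a model file read-only and touch its pages.
     * Compare `free -h` before and after: "used" barely moves,
     * while "buff/cache" grows by roughly the file size. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        if (argc < 2) { fprintf(stderr, "usage: %s <model.gguf>\n", argv[0]); return 1; }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        /* Map the whole file; nothing is read from disk yet. */
        unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* Touch one byte per page to fault the file into the page cache. */
        long long sum = 0;
        for (off_t i = 0; i < st.st_size; i += 4096) sum += p[i];

        printf("mapped %lld bytes, checksum %lld\n", (long long)st.st_size, sum);
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }

If you then start a second process that needs the memory, the kernel simply evicts those cached pages, which is why things can get slow with no obvious "out of memory" signal.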