
Can you say a bit more about this? Based on my non-scientific personal experience on an M1 with 64 GB of memory, that's approximately what it seems to be. If the model is 4 GB in size, loading it up and doing inference takes about 4 GB of memory. I've used LM Studio and llamafiles directly, and both seem to exhibit this behavior. I believe llamafiles use mmap by default, based on what I've seen jart talk about. LM Studio allows you to "GPU offload" the model by loading it partially or completely into GPU memory, so I'm not sure what that means for memory accounting.
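For what it's worth, the reason mmap can make memory numbers confusing is that mapped pages are faulted in lazily and stay file-backed, so different tools attribute them differently. A minimal sketch of the idea (plain Python `mmap`, a throwaway file as a stand-in for model weights; not how llamafile itself is implemented):

```python
import mmap
import os
import tempfile

# Stand-in for a model file: 16 MiB of zeros on disk.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * (16 * 1024 * 1024))
    path = f.name

with open(path, "rb") as f:
    # Map the whole file read-only. No data is read yet; the kernel
    # just sets up page-table entries. Virtual size jumps by 16 MiB,
    # resident memory barely moves.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    # Touching a byte faults in only the page containing it (4-16 KiB
    # depending on the platform), not the whole file. Pages you never
    # touch never become resident, and file-backed pages can be evicted
    # under pressure without being written to swap.
    first_byte = mm[0]
    mm.close()

os.unlink(path)
```

So "the 4 GB model takes about 4 GB" is consistent with mmap too: once inference has touched every weight, the whole file is resident. The difference shows up at load time (near-instant, since nothing is copied) and under memory pressure (clean mapped pages can simply be dropped and re-read later).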




