Noob question, and probably being asked in the wrong place.
Is there any way to find out the minimum system requirements for running ollama run commands with different models?
On my 32G M2 Pro Mac, I can run up to about 30B models using 4-bit quantization. It is fast unless I am generating a lot of text: asking a 30B model to generate 5 pages can take over a minute. Smaller models like Mistral 7B are very fast.
Install Ollama from https://ollama.ai and experiment with it using the command line interface. I mostly use Ollama’s local API from Common Lisp or Racket - so simple to do.
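For reference, here is a minimal sketch in Racket of what calling that local API looks like, assuming Ollama is serving on its default port 11434 and the model named here ("mistral") has already been pulled:

    #lang racket
    (require net/url json)

    ;; Minimal sketch of hitting Ollama's local /api/generate endpoint.
    ;; With "stream": false the server returns a single JSON object;
    ;; the generated text is under the 'response key.
    (define (ollama-generate model prompt)
      (define body
        (string->bytes/utf-8
         (jsexpr->string (hasheq 'model model
                                 'prompt prompt
                                 'stream #f))))
      (define in
        (post-pure-port (string->url "http://localhost:11434/api/generate")
                        body
                        '("Content-Type: application/json")))
      (hash-ref (read-json in) 'response))

    (displayln (ollama-generate "mistral" "Why is the sky blue?"))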
EDIT: if you only have 8G RAM, try some of the 3B models. I suggest using at least 4-bit quantization.
You can easily experiment with smaller models, for example, Mistral 7B or Phi-2 on M1/M2/M3 processors. With more memory, you can run larger models, and better memory bandwidth (M2 Ultra vs. M2 base model) means improved performance (tokens/second).
They have a high-level summary of RAM requirements for each parameter size, and how much storage each model uses, on their GitHub: https://github.com/ollama/ollama#model-library. At the time of writing it suggests at least 8 GB of RAM for 7B models, 16 GB for 13B, and 32 GB for 33B.
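As a rough rule of thumb (my own back-of-the-envelope, not an official Ollama figure): the weights of an N-billion-parameter model quantized to q bits per weight take about N * q / 8 GB, plus some overhead for the KV cache and runtime. A quick Racket sketch:

    #lang racket
    ;; Back-of-the-envelope RAM estimate for a quantized model.
    ;; The 1.2x overhead factor (KV cache, runtime) is an assumption.
    (define (approx-ram-gb params-billions bits-per-weight)
      (* params-billions (/ bits-per-weight 8.0) 1.2))

    (approx-ram-gb 7 4)   ; => 4.2  -- why Mistral 7B fits comfortably in 8G
    (approx-ram-gb 30 4)  ; => 18.0 -- why a 30B q4 model fits on a 32G Mac

The estimates line up with the comments above: a 30B model at 4-bit leaves headroom on a 32G machine, and 3B models fit comfortably in 8G.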