Sort of funny that there are no local options, considering Mozilla even funds llamafile. Hopefully they allow some API integration; if they're using standard OpenAI API calls, it should be easy to let users swap the endpoint.
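
To illustrate what I mean by swapping the endpoint, here's a minimal sketch using the official openai Python client pointed at a local OpenAI-compatible server (llamafile serves one on localhost:8080 by default; the model name and api_key values here are placeholders, not anything Mozilla ships):

    # a minimal sketch, assuming a llamafile or other
    # OpenAI-compatible server listening on localhost:8080
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8080/v1",  # swap this for the hosted endpoint
        api_key="sk-no-key-required",         # local servers typically ignore the key
    )

    resp = client.chat.completions.create(
        model="local-model",  # placeholder; whatever the server exposes
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)

The point being: if the client code looks like this, local support is a one-line config change, not a feature.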

Also, while it's nice to have a service option for those without spare compute, I think the model choice is a bit of a shame: even within the 7B class, models like Llama 3.1 8B, Qwen 2.5 7B, Tulu 3 8B, and Falcon 3 7B all clearly outclass Mistral 7B (which is also very weak at multilingual tasks and particularly inefficient at multilingual tokenization; see the sketch below for how you can check that yourself).
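
A rough way to measure that tokenization inefficiency, assuming the transformers library and these Hugging Face repo IDs (some repos may require accepting terms before download): fewer tokens for the same non-English text means more efficient tokenization.

    # a minimal sketch; repo IDs and sample text are my own picks
    from transformers import AutoTokenizer

    text = "你好，世界。これはトークン化のテストです。"
    for model_id in ["mistralai/Mistral-7B-v0.1", "Qwen/Qwen2.5-7B"]:
        tok = AutoTokenizer.from_pretrained(model_id)
        print(model_id, len(tok.encode(text)))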

The best fully open weights (Apache 2.0 or similar) small models right now are probably: OLMo 2 7B, Qwen 2.5 7B, Granite 3.1 8B, and Mistral Nemo Instruct (12B).

For those interested in scoping out some of the smaller models, a "GPU-Poor" Chat Arena launched recently (not many ratings yet, so it's very noisy; take it with a grain of salt): https://huggingface.co/spaces/k-mktr/gpu-poor-llm-arena
