llama.cpp is still under active development and periodically introduces breaking changes or new quantization formats, and it can be a lot of work to keep up with those changes as you publish more models over time. It's easier to just publish a standard float32 safetensors checkpoint that works with PyTorch, and let the community deal with other runtimes and file formats.
If it's a new architecture, there's also the additional work of adding support in llama.cpp, which means more dev time, more testing, and potentially losing the surprise of the model release if that development work has to be done out in the open.