This seems to be a persistent issue with almost all weight releases, even from bigger companies like Meta.
Are the people who release these weights not testing them in various inference engines? It seems they make it work with Hugging Face's Transformers library and then call it a day - and sometimes not even that.
Oh yes, chat template issues are sadly quite pervasive - e.g. Llama as you mentioned, but also Qwen, Mistral, Google, the Phi team, DeepSeek - it's actually very common!
My take is that large labs with closed-source models also had issues in the beginning, but have most likely standardized their chat templates by now (e.g. OpenAI using ChatML). The OSS community, on the other hand, keeps experimenting with new templates - adding tool calling, for example, causes a large headache. In https://unsloth.ai/blog/phi3 we wrote up many of the bugs we found in OSS models.
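To make the failure mode concrete, here's a minimal sketch of what every inference engine has to reproduce - the Jinja chat template shipped with the tokenizer (using SmolLM3 purely as an illustration; any chat model works the same way):

from transformers import AutoTokenizer

# The chat template is a Jinja string stored in the tokenizer config;
# Transformers renders it, and every other engine must match that output.
tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# add_generation_prompt=True appends the assistant header so the model
# knows it should start generating; forgetting it is a classic template bug.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)

An engine that renders this even slightly differently - wrong BOS handling, missing assistant header, mangled tool-call section - still produces text, just noticeably worse, which is exactly why these bugs slip through testing.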
No, they don't. Why would they? Most of them use a single inference engine, most likely developed in-house. Or they go for something like vLLM - llama.cpp especially flies under their radar.
The reason is simple: there isn't much money in it. llama.cpp is free and targets the lower end of the hardware spectrum. Corporations will run something else or, even more likely, offload the task to a contractor.
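# -hf pulls the quantized weights from Hugging Face; --jinja renders the GGUF's
# embedded chat template; -ngl 99 offloads all layers to the GPU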
./llama.cpp/llama-cli -hf unsloth/SmolLM3-3B-GGUF:Q4_K_XL --jinja -ngl 99