
It seems like a lot of innovation is around training, no? GGML (the library that reads the GGUF format) supports these values for the required 'general.architecture' key (a quick reader sketch follows the list):

  llama
  mpt
  gptneox
  gptj
  gpt2
  bloom
  falcon
  rwkv
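
For what it's worth, the architecture string is just one key/value pair in the file's metadata. Here's a minimal sketch of pulling it out by hand, assuming the GGUF v2/v3 layout (magic, version, tensor count, KV count, then typed key/value pairs); the type codes are my reading of the spec, not the official gguf library, so treat them as assumptions:

  import struct, sys

  def read_str(f):
      (n,) = struct.unpack("<Q", f.read(8))   # uint64 length prefix
      return f.read(n).decode("utf-8")

  # GGUF metadata value type codes, per the spec as I understand it
  SCALARS = {
      0: ("<B", 1), 1: ("<b", 1), 2: ("<H", 2), 3: ("<h", 2),
      4: ("<I", 4), 5: ("<i", 4), 6: ("<f", 4), 7: ("<?", 1),
      10: ("<Q", 8), 11: ("<q", 8), 12: ("<d", 8),
  }

  def read_value(f, vtype):
      if vtype in SCALARS:
          fmt, size = SCALARS[vtype]
          (v,) = struct.unpack(fmt, f.read(size))
          return v
      if vtype == 8:                           # string
          return read_str(f)
      if vtype == 9:                           # array: elem type, count, elements
          (etype,) = struct.unpack("<I", f.read(4))
          (count,) = struct.unpack("<Q", f.read(8))
          return [read_value(f, etype) for _ in range(count)]
      raise ValueError(f"unknown value type {vtype}")

  with open(sys.argv[1], "rb") as f:
      assert f.read(4) == b"GGUF"              # magic
      (version,) = struct.unpack("<I", f.read(4))
      n_tensors, n_kv = struct.unpack("<QQ", f.read(16))
      for _ in range(n_kv):
          key = read_str(f)
          (vtype,) = struct.unpack("<I", f.read(4))
          val = read_value(f, vtype)
          if key == "general.architecture":
              print(val)                       # e.g. "llama"
              break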



I've also been trying to figure out GGUF and the other model formats going around. I'm horrified to see there are no model architecture details in the file! As you say, it seems they hard-code the above architectures as constants. If a hot new model comes out, one would need to update the reader code so that it implements the new model arch. Am I understanding this right?
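
My mental model (just a hypothetical sketch, not llama.cpp's actual code) is that the loader dispatches on the architecture string to graph-building code it already ships, so an unknown string simply has nowhere to go:

  # hypothetical dispatch table; the names here are illustrative only
  def build_llama(metadata, tensors): ...
  def build_falcon(metadata, tensors): ...

  GRAPH_BUILDERS = {
      "llama": build_llama,
      "falcon": build_falcon,
      # a brand-new architecture has no entry until the reader code is updated
  }

  def load_model(arch, metadata, tensors):
      if arch not in GRAPH_BUILDERS:
          raise ValueError(f"unknown model architecture: {arch}")
      return GRAPH_BUILDERS[arch](metadata, tensors)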

I'm also a bit confused by the quantization aspect; this is a pretty complex topic. GGML seems to use 16-bit as per the article. If I were pushing it to 8-bit, I reckon I'd see no size improvement in the GGML file? The article says they encode quantization versions in the file. Where are those defined?
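
For context, here's my back-of-the-envelope math for what I'd expect if the stored weights really did drop to ~8 bits; the 7B parameter count and bits-per-weight figures are my own rough assumptions, not from the article:

  # rough size estimate: tensor data dominates the file, metadata is negligible
  N_PARAMS = 7_000_000_000          # assumed 7B-parameter model

  def approx_size_gb(bits_per_weight):
      return N_PARAMS * bits_per_weight / 8 / 1e9

  for name, bits in [("F16", 16), ("Q8_0", 8.5), ("Q4_0", 4.5)]:
      print(f"{name}: ~{approx_size_gb(bits):.1f} GB")
  # -> F16: ~14.0 GB, Q8_0: ~7.4 GB, Q4_0: ~3.9 GB

So I'd naively expect roughly half the size at 8-bit, which is why the 16-bit claim confuses me.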


Why are you horrified?

In designing software, there's often a trade-off between (i) generality/configurability and (ii) performance.

llama.cpp is built for inference, not for training or model architecture research. It seems reasonable to optimize for performance, which is what ~100% of llama.cpp users care about.


GGUF files seem to be proliferating. I think some folks (like myself) make the incorrect assumption that the format has more portability/generalizability than it appears to have. Hence, the horror!



