
More like – it won't be useful to small-time developers (since they won't have the capability to host and run it themselves) and so all the benefits will be reaped by AWS and other large players.



This is what I understood as well. They want to either democratize adoption or not release it. The last thing they/anyone wants is for another BigCo or Govt to take undue advantage of the model (through fine-tuning?) when others cannot.

That said, I can imagine a GPTQ/4-bit quantized model being smaller and easier to run on somewhat-commodity clusters?
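For concreteness, here's a minimal sketch of what that quantization step could look like using the GPTQ integration in Hugging Face transformers, assuming the weights shipped as a standard Hub checkpoint; the model id is hypothetical:

    # 4-bit GPTQ quantization via transformers.
    # Requires the optimum and auto-gptq packages installed.
    # "some-org/open-170b" is a hypothetical checkpoint id.
    from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

    model_id = "some-org/open-170b"  # hypothetical
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # GPTQ needs a small calibration set; "c4" is a built-in option.
    config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=config,
        device_map="auto",  # spread layers across available devices
    )
    model.save_pretrained("open-170b-4bit-gptq")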

Or it could run with GGML/llama.cpp on a cloud instance with a TB of RAM?
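If the weights were converted to a GGML-family format, serving them from CPU RAM could look roughly like this with llama-cpp-python; the file name and thread count are illustrative:

    # Sketch: CPU inference on a high-memory instance via llama-cpp-python.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./open-170b-q4_0.gguf",  # hypothetical quantized file
        n_ctx=2048,
        n_threads=32,  # match the instance's physical core count
    )

    out = llm("Q: What is 4-bit quantization? A:", max_tokens=64)
    print(out["choices"][0]["text"])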

After seeing what people were able to do with LLaMA, I am positive that the community will find a way to run it - albeit with some loss in performance.

It would be truly amazing if they used their compute to release quantized versions as well.


A big chunk of the development based on Facebook's LLaMA model has come from small-time developers and individuals, not large players. Facebook has already shown a viable way to release models in the way you described.


If you really need it, a 170B-parameter model can generate a few tokens per minute on commodity hardware.
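A rough back-of-envelope supports that: dense decoding has to read every weight once per token, so throughput is bounded by memory or storage bandwidth. Under that assumption (all numbers illustrative):

    # Back-of-envelope: tokens/min if inference is bandwidth-bound.
    params = 170e9
    bytes_per_param = 0.5                    # 4-bit quantization
    weight_bytes = params * bytes_per_param  # ~85 GB read per token

    for name, bw in [("NVMe SSD", 3e9), ("DDR RAM", 100e9)]:  # bytes/s
        sec_per_token = weight_bytes / bw
        print(f"{name}: {sec_per_token:.2f} s/token "
              f"({60 / sec_per_token:.2f} tokens/min)")

    # NVMe SSD: 28.33 s/token (2.12 tokens/min)  <- "a few per minute"
    # DDR RAM: 0.85 s/token (70.59 tokens/min)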



