
Serious question (since I'm not familiar with AI/ML), what's the point of releasing these "smaller" (5B, 10B, 13B) models, given there are plenty of bigger models now (Falcon 40B, LLaMa 65B)?



A common reason is to reduce cost and latency. Larger models typically require GPUs with more memory (and hence higher costs), plus the time to serve requests is also higher (more matrix multiplications to be done).
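For a rough sense of scale, here's a back-of-the-envelope memory estimate in Python (weights only, ignoring KV cache and activations; the precisions are assumptions):

    # Weights-only memory estimate; ignores KV cache and activation memory.
    # Bytes per parameter depend on precision: fp16 = 2, int8 = 1, int4 = 0.5.
    def weight_memory_gb(params_billion, bytes_per_param=2):
        return params_billion * 1e9 * bytes_per_param / 1024**3

    for size in (7, 13, 30, 65):
        print(f"{size}B @ fp16: ~{weight_memory_gb(size):.0f} GB")
    # 13B is ~24 GB (fits on one 40 GB A100); 65B is ~121 GB (needs multiple GPUs).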


Got it. That makes sense. Thank you. But what about quality, then? Can the quality of a 13B model match that of, say, a 30B model?


Flan-T5 is a 3B model that is of comparable quality to Llama 13B.

Moreover, you can fine-tune a model for your specific tasks, and you need fewer resources to fine-tune a smaller model.
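As an illustrative sketch (the model name and hyperparameters are placeholders, not something from this thread), parameter-efficient fine-tuning of a small model with transformers + peft looks roughly like this:

    # LoRA fine-tuning sketch via Hugging Face transformers + peft; names are illustrative.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model_name = "openlm-research/open_llama_3b"  # placeholder small base model
    model = AutoModelForCausalLM.from_pretrained(model_name)

    lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # only a tiny fraction of weights get updated
    # From here you train as usual (e.g. with transformers' Trainer) on your task data.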


As a general principle the larger models are better quality.

However, fine tuned small models can outperform general purpose large models on specific tasks.

There are also many lightweight tasks, like basic sentiment analysis, where the correctness of small models can be good enough to the point of being indistinguishable from large models.
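For example, a minimal sentiment-analysis call with the transformers pipeline (the default checkpoint is whatever the library ships, a distilled BERT-class classifier, used here purely for illustration):

    # Small-model sentiment analysis via the transformers pipeline.
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")
    print(classifier("Releasing smaller open models is great news."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]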


It is very expensive to train these base models, so a smaller size is more practical if you aren’t a big company with hundreds of powerful GPUs at hand. Table 15 from the LLaMA paper[1] has some insightful figures: it took 135,168 GPU-hours to train the 13B version and a bit more than 1M GPU-hours for the 65B version. And we are talking about A100 80GB GPUs here (expensive and scarce). Not everyone can afford this kind of training run (especially if it takes a few attempts, e.g. if you’ve got a bug in the tokenizer).
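Back-of-the-envelope cost, taking the GPU-hour figures above and an assumed (not quoted) cloud price per A100-80GB hour:

    # Rough cost of one training run, using the GPU-hour figures cited above.
    # The hourly price is an assumption; real cloud pricing varies a lot.
    gpu_hours = {"13B": 135_168, "65B": 1_000_000}  # 65B is "a bit more than 1M"
    price_per_gpu_hour = 2.0  # USD, assumed

    for size, hours in gpu_hours.items():
        print(f"{size}: ~${hours * price_per_gpu_hour:,.0f}")
    # 13B: ~$270,336   65B: ~$2,000,000 -- and that's for a single successful run.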

[1] https://arxiv.org/pdf/2302.13971.pdf


Hold on, are you saying I can grab the 13B OpenLLaMa model and train it? I thought all of these models are already pre-trained and represent sort of the end state. Am I completely missing the point?


A neural network is just a bunch of weights. You can always continue modifying the weights as you see fit. A network is never "done" learning.
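A minimal sketch of what "continuing to modify the weights" means in practice (the model name and training text are placeholders; this is one gradient step of continued training):

    # The weights are just tensors; loading a checkpoint and taking a gradient
    # step changes them. Model name and text are placeholders.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "openlm-research/open_llama_3b"  # illustrative
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    batch = tokenizer("Some domain-specific text to keep training on.", return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()  # the "pretrained" weights have now moved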


RAM requirements: a smaller model fits in the memory of hardware most people actually have.




