I agree with your premise: I have used the 65B variants and of course they're not as good as OpenAI's models. GPT-3 has 175B parameters, and OpenAI has done more RLHF than anyone else. Why would we expect comparable performance from models a fraction of the size with a pittance of the fine-tuning?
That said, it’s clear that replicating GPT-4+ performance is within the resources of a number of large tech orgs.
And the smaller models can definitely still be useful for plenty of tasks.
I'd agree the secret sauce behind how well the newest services perform is probably in the fine-tuning. We're seeing almost daily releases of fine-tuning datasets, training methods, and models (at lower and lower costs), so I'm personally pretty optimistic that we'll see big improvements in self-hosted LLM performance pretty quickly.