
I agree with your premise: I have used 65B variants and of course they're not as good as OpenAI's models. GPT-3 has 175B parameters, and OpenAI has done more RLHF than anyone else. Why would we expect to get comparable performance with models a fraction of the size and a pittance of the fine tuning?

That said, it’s clear that replicating GPT-4-level (or better) performance is within the resources of a number of large tech orgs.

And the smaller models can definitely still be useful for tasks.



It's worth pointing out that size isn't everything. From Meta's benchmarking [1], LLaMA 33B outperforms GPT-3 175B, Gopher 280B, and Chinchilla 70B, and even matches PaLM 540B on a bunch of common evals. Those interested in doing more comparisons can look at https://crfm.stanford.edu/helm/latest/?group=core_scenarios and https://paperswithcode.com/paper/llama-open-and-efficient-fo... to see where it sits (with some GPT 3.5 and 4 numbers here: https://paperswithcode.com/paper/gpt-4-technical-report-1)

I'd agree that the secret sauce behind how well the newest services perform is probably the fine-tuning. We're seeing almost daily releases of fine-tuning datasets, training methods, and models (at lower and lower cost), so I'm personally pretty optimistic that we'll see big improvements in self-hosted LLM performance pretty quickly.
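
As a concrete illustration of the kind of low-cost fine-tuning that's driving this, here's a minimal sketch of LoRA-style parameter-efficient fine-tuning with the Hugging Face transformers and peft libraries. The checkpoint name and hyperparameters below are illustrative assumptions on my part, not a recommendation:

    # Minimal LoRA fine-tuning setup (sketch). Assumes transformers + peft
    # are installed; checkpoint name and hyperparameters are illustrative only.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "huggyllama/llama-7b"  # hypothetical example checkpoint
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base)

    # LoRA freezes the base weights and trains small low-rank adapter
    # matrices injected into the attention projections, so the trainable
    # parameter count is a tiny fraction of the full model.
    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # prints trainable vs. total params

The point is that only the adapter weights get trained, which is why this sort of fine-tuning fits on commodity GPUs and is showing up in so many of the recent releases.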

[1] https://ar5iv.labs.arxiv.org/html/2302.13971#:~:text=Table%2....


> Why would we expect to get comparable performance with models a fraction of the size and a pittance of the fine tuning?

LLaMA incorporated new training techniques, most notably training on far more data, that make its 65B model perform way better than GPT-3's 175B, so the model-size argument is not very strong.
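
For a rough sense of why parameter count alone is misleading, here's a back-of-the-envelope comparison of training-token budgets. The figures are approximate public numbers I'm assuming for illustration, not from the parent comment:

    # Rough tokens-per-parameter comparison; figures are approximate public
    # numbers used only for illustration.
    models = {
        "GPT-3 175B": (175e9, 300e9),   # ~300B pretraining tokens
        "LLaMA 65B":  (65e9, 1.4e12),   # ~1.4T pretraining tokens
    }
    for name, (params, tokens) in models.items():
        print(f"{name}: {tokens / params:.1f} tokens per parameter")
    # GPT-3 175B: 1.7 tokens per parameter
    # LLaMA 65B: 21.5 tokens per parameter (near the ~20 Chinchilla heuristic)

LLaMA 65B saw on the order of 20x more data per parameter, which goes a long way toward explaining how a much smaller model can come out ahead on the evals.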



