
I've been training large 65B models on "rent for N hours" systems for less than $1k per customized model, then fine-tuning those to be whatever I want for even cheaper.

Two months since GPT-4.

This ride has only just started, fasten your whatevers.




Fine-tuning costs are nowhere near representative of the cost to pre-train those models.

Trying to replicate the quality of GPT-3 from scratch, using all the tricks and training optimizations that are available now but weren't used during GPT-3's actual training, will still cost you north of $500K, and that's being extremely optimistic.
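For context, that "north of $500K" figure lines up with a standard back-of-envelope estimate using the common ~6 * params * tokens FLOPs approximation. Here's a minimal sketch, assuming GPT-3-scale numbers (175B parameters, ~300B training tokens), 40% GPU utilization, and ~$1 per cheaply rented A100-hour; none of these inputs come from the comment itself:

    # Back-of-envelope pre-training cost via the ~6 * N * D FLOPs rule of thumb.
    # All inputs are illustrative assumptions.
    params = 175e9                      # GPT-3 parameter count
    tokens = 300e9                      # roughly GPT-3's training token count
    flops = 6 * params * tokens         # ~3.15e23 training FLOPs

    a100_peak_bf16 = 312e12             # A100 peak bf16 throughput, FLOP/s
    utilization = 0.40                  # assumed model FLOPs utilization
    gpu_hours = flops / (a100_peak_bf16 * utilization) / 3600   # ~700k A100-hours

    price_per_gpu_hour = 1.00           # assumed bulk/spot rental price, USD
    print(f"{gpu_hours:,.0f} GPU-hours -> ${gpu_hours * price_per_gpu_hour:,.0f}")
    # ~701,000 GPU-hours -> ~$700K, and only under very favourable assumptions
    # (single clean run, high utilization, cheap GPUs, no staff or data costs).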

A GPT-4-level model would be at least 10x this with the same optimism (meaning you manage to train it for much cheaper than OpenAI). And that's just pure hardware cost; the team you need to actually make this happen is going to be very expensive as well.

edit: To quantify how "extremely optimistic" that is, the very model you are fine-tuning, which I assume is LLaMA 65B, would cost around $18M to train on Google Cloud assuming you get a 50% discount on their listed GPU prices (2048 A100 GPUs for 5 months). And that's not even GPT-4 level.
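Rough arithmetic behind that $18M figure, sketched with a per-GPU-hour list price that is my own assumption (picked to be in the ballpark of on-demand A100 pricing at the time), not an official Google Cloud quote:

    # Cost of the quoted 2048 A100 GPUs for 5 months with a 50% discount.
    gpus = 2048
    months = 5
    hours_per_month = 730                        # average hours in a month
    gpu_hours = gpus * months * hours_per_month  # ~7.5M GPU-hours

    list_price_per_gpu_hour = 4.80               # assumed on-demand A100 price, USD
    discount = 0.50                              # the 50% discount from the comment
    cost = gpu_hours * list_price_per_gpu_hour * (1 - discount)
    print(f"{gpu_hours:,} GPU-hours -> ~${cost / 1e6:.0f}M")   # ~ $18M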


$5M to train GPT-4 is the best investment I've ever seen. I've seen startups waste more money for tremendously smaller impact.


As I stated in my comment, $5M assumes you can do a much, much better job than OpenAI at optimizing your training, only need to make a single training run, your employees' salaries are $0, and you get a clean dataset essentially for free.

Real cost is 10-20x that.

That's still a good investment, though. But the issue is you could very well sink $50M into this endeavour and end up with a model that simply isn't very good and gets rendered useless by an open-source model released a month later.

OpenAI truly has unique expertise in this field that is very, very hard to replicate.


> and end up with a model that actually is not really good and gets rendered useless

ahem Bard ahem


You are confusing training with fine-tuning, which is a different beast.


No I'm not, it's the full model on 8 GPUs for a couple hundred. After training I fine-tune for chat, but mostly command-and-control tools, and then you fine-tune for the application.



