
I've been training large 65B models on "rent for N hours" systems for less than $1k per customized model, then fine-tuning those to be whatever I want for even cheaper.

Two months since GPT-4.

This ride has only just started, fasten your whatevers.




Fine-tuning costs are nowhere near representative of the cost to pre-train those models.

Trying to replicate the quality of GPT-3 from scratch, using all the tricks and training optimizations that are available now but weren't used during GPT-3's actual training, will still cost you north of $500K, and that's being extremely optimistic.
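For context, that "north of $500K" figure lines up with a standard back-of-envelope estimate using the common ~6 * params * tokens FLOPs approximation. Here's a minimal sketch, assuming GPT-3-scale numbers (175B parameters, ~300B training tokens), 40% GPU utilization, and ~$1 per cheaply rented A100-hour; none of these inputs come from the comment itself:

    # Back-of-envelope pre-training cost via the ~6 * N * D FLOPs rule of thumb.
    # All inputs are illustrative assumptions.
    params = 175e9                      # GPT-3 parameter count
    tokens = 300e9                      # roughly GPT-3's training token count
    flops = 6 * params * tokens         # ~3.15e23 training FLOPs

    a100_peak_bf16 = 312e12             # A100 peak bf16 throughput, FLOP/s
    utilization = 0.40                  # assumed model FLOPs utilization
    gpu_hours = flops / (a100_peak_bf16 * utilization) / 3600   # ~700k A100-hours

    price_per_gpu_hour = 1.00           # assumed bulk/spot rental price, USD
    print(f"{gpu_hours:,.0f} GPU-hours -> ${gpu_hours * price_per_gpu_hour:,.0f}")
    # ~701,000 GPU-hours -> ~$700K, and only under very favourable assumptions
    # (single clean run, high utilization, cheap GPUs, no staff or data costs).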

A GPT-4-level model would be at least 10x this with the same optimism (meaning you manage to train it for much cheaper than OpenAI). And that's just pure hardware cost; the team you need to actually make this happen is going to be very expensive as well.

edit: To quantify how "extremely optimistic" that is, the very model you are fine-tuning, which I assume is LLaMA 65B, would cost around $18M to train on Google Cloud assuming you get a 50% discount on their listed GPU prices (2048 A100 GPUs for 5 months). And that's not even GPT-4 level.
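Rough arithmetic behind that $18M figure, sketched with a per-GPU-hour list price that is my own assumption (picked to be in the ballpark of on-demand A100 pricing at the time), not an official Google Cloud quote:

    # Cost of the quoted 2048 A100 GPUs for 5 months with a 50% discount.
    gpus = 2048
    months = 5
    hours_per_month = 730                        # average hours in a month
    gpu_hours = gpus * months * hours_per_month  # ~7.5M GPU-hours

    list_price_per_gpu_hour = 4.80               # assumed on-demand A100 price, USD
    discount = 0.50                              # the 50% discount from the comment
    cost = gpu_hours * list_price_per_gpu_hour * (1 - discount)
    print(f"{gpu_hours:,} GPU-hours -> ~${cost / 1e6:.0f}M")   # ~ $18M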


$5M to train GPT-4 is the best investment I've ever seen. I've seen startups waste more money for tremendously smaller impact.


As I stated in my comment, $5M assumes you can do a much, much better job than OpenAI at optimizing your training, only need to make a single training run, your employees' salaries are $0, and you get a clean dataset essentially for free.

Real cost is 10-20x that.

That's still a good investment, though. But the issue is you could very well sink $50M into this endeavour and end up with a model that simply isn't very good and gets rendered useless by an open-source model released a month later.

OpenAI truly has unique expertise in this field that is very, very hard to replicate.


> and end up with a model that actually is not really good and gets rendered useless

ahem Bard ahem


You are confusing training with fine-tuning, which is a different beast.


No I'm not, it's the full model on 8 GPUs for a couple hundred. After training I fine-tune for chat, but mostly command-and-control tools, and then you fine-tune for the application.



