
1) The costs will go down over time; much of the cost is NVIDIA's margin and the cost of training new models.

2) Absolutely. That's like one hour of an engineer's salary, for a whole month.



> The costs will go down over time; much of the cost is NVIDIA's margin and the cost of training new models

Isn't each new model bigger and heavier, thus requiring more compute to train?


Yes, but 1) you only need to train the model once, and inference is way cheaper. Train one great model (e.g. Claude 3.5) and you can get much more than $80/month worth out of it. 2) The hardware is getting much better, and prices will fall drastically once the market saturates a bit or another company starts putting out hardware that can compete with NVIDIA.
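To make that concrete, here's a back-of-envelope sketch in Python. Every number in it is a made-up assumption except the $80/month figure from this thread; the point is only that a one-time training cost amortizes quickly over a large paying user base:

    # All figures below are hypothetical, illustrative round numbers.
    TRAINING_COST = 100_000_000    # one-time training cost, USD (assumed)
    SUBSCRIBERS = 1_000_000        # paying users (assumed)
    PRICE_PER_MONTH = 80           # USD/month, the figure from the thread
    SERVING_COST_PER_USER = 20     # inference cost per user per month, USD (assumed)

    monthly_margin = SUBSCRIBERS * (PRICE_PER_MONTH - SERVING_COST_PER_USER)
    months_to_recoup = TRAINING_COST / monthly_margin
    print(f"Training cost recouped in {months_to_recoup:.1f} months")  # ~1.7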


> Train one great model (e.g. Claude 3.5) and you can get much more than $80/month worth out of it

Until the competition outcompetes you with their new model and you have to train a new superior one, because you have no moat. Which happens what, around every month or two?

> The hardware is getting much better, and prices will fall drastically once the market saturates a bit or another company starts putting out hardware that can compete with NVIDIA

Where is the hardware that can compete with NVIDIA going to come from? And if they don't have competition, which they don't, why would they bring down prices?


> Until the competition outcompetes you with their new model and you have to train a new superior one, because you have no moat. Which happens what, around every month or two?

Eventually one of you runs out of money, but your customers keep getting better models until then; and if the loser in this race releases the weights under a suitable gratis license, then both of your businesses can lose.

But that still leaves your customers with access to a model that's much cheaper to run than it was to create.


The point is not that every lab will be profitable. In the end there only needs to be one model that increases our productivity massively, which is the point I'm making.

Huge margins lead to a lot of competition trying to catch up, which is what makes market economies so successful.


Gemini models are trained and run on Google's in-house TPUs, which frankly are incredible compared to H100s. In fact, Claude was trained on TPUs.

Google, however, does not sell these; you can only lease time on them via GCP.


Then those new models get distilled into smaller ones.

Raising the max intelligence of the models tends to raise the intelligence of all the models via distillation.
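For anyone unfamiliar with the mechanics: distillation trains a small "student" model to match the output distribution of a large "teacher". A minimal sketch assuming PyTorch; the temperature value and the model names in the usage comment are illustrative, not anything specific to the labs discussed above:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Soften both distributions so the student learns the teacher's
        # relative preferences across tokens, not just its top-1 pick.
        soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_student = F.log_softmax(student_logits / temperature, dim=-1)
        # KL(teacher || student), scaled by T^2 so gradient magnitudes
        # stay comparable across temperatures (Hinton et al., 2015).
        return F.kl_div(log_student, soft_teacher,
                        reduction="batchmean") * temperature ** 2

    # Hypothetical usage: teacher is the big frontier model, student the small one.
    # teacher_logits = teacher_model(batch).logits.detach()
    # student_logits = student_model(batch).logits
    # loss = distillation_loss(student_logits, teacher_logits)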



