Is compute that expensive? An H100 rents at about $2.50/hour, so that's 80 hours of pure compute. Out of 720 hours in a month, that's a 1/9 duty cycle around the clock, or 1/3 if we assume 8-hour work days. That's really intense, constant use. And I bet OpenAI spend less on operating their own infra than the rate at which cloud providers rent it out.
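A quick back-of-envelope sketch of that duty-cycle arithmetic, assuming the 80 hours comes from a ~$200/month budget at the quoted $2.50/hour rental rate (the budget figure is my assumption, implied by the numbers above):

```python
# Rough duty-cycle arithmetic.  Assumption: the 80 hours comes from a
# ~$200/month budget at the $2.50/hour rental rate quoted above.

h100_rate_per_hour = 2.50
monthly_budget = 200.0                                 # assumed: 80 h * $2.50/h

compute_hours = monthly_budget / h100_rate_per_hour    # 80 h of pure H100 time
hours_per_month = 24 * 30                              # 720 h
workday_hours_per_month = 8 * 30                       # 240 h at 8 h/day

print(f"compute hours:        {compute_hours:.0f}")                              # 80
print(f"duty cycle, 24/7:     1/{hours_per_month / compute_hours:.0f}")          # 1/9
print(f"duty cycle, 8h days:  1/{workday_hours_per_month / compute_hours:.0f}")  # 1/3
```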
You need enough RAM to store the model and the KV-cache, which grows with context size. Assuming the model has a trillion parameters (there are only rumours about how many there actually are) and uses 8 bits per parameter, 16 H100s might be sufficient.
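A rough check of that figure, taking the rumoured 1T parameter count as an assumption and ignoring KV-cache growth:

```python
# Rough memory check for a hypothetical 1T-parameter model at 8 bits/parameter.
# The parameter count is rumour, and KV-cache/activations are not counted here.

params = 1e12                  # 1 trillion parameters (assumed)
bytes_per_param = 1            # 8-bit weights
h100_memory_gb = 80
num_gpus = 16

weights_gb = params * bytes_per_param / 1e9    # 1000 GB of weights
total_memory_gb = num_gpus * h100_memory_gb    # 1280 GB across 16 H100s

print(f"weights:        {weights_gb:.0f} GB")
print(f"16x H100:       {total_memory_gb} GB")
print(f"headroom left:  {total_memory_gb - weights_gb:.0f} GB for KV-cache etc.")
```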
A single H100 has 80GB of memory, meaning that at FP16 you could roughly fit a 40B parameter model on it, or at FP4 quantisation a 160B parameter model. We don't know (I don't think) what quantisation OpenAI use, or how many parameters o1 has, but most likely...
...they probably quantise a bit, but not loads, as they don't want to sacrifice performance. FP8 seems like a possible middle ground. o1 is just a bunch of GPT-4o instances in a trenchcoat, strung together with some advanced prompting. GPT-4o is theorised to be around 200B parameters. If you wanted to run 5 parallel generation tasks at peak during the o1 inference process, that's 5 × 200B at FP8 (about 1 TB of weights), or about 12 H100s. 12 H100s take about one full rack of kit to run.
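Putting the same arithmetic in one place; the 200B GPT-4o size, FP8 weights, and the 5 parallel generations are all speculation from the discussion above, not confirmed numbers:

```python
# End-to-end back-of-envelope.  The 200B GPT-4o size, FP8 weights and the
# 5 parallel generations during o1 inference are speculation, not known figures.

H100_GB = 80

def params_that_fit(bytes_per_param: float, memory_gb: float = H100_GB) -> float:
    """How many parameters fit in one GPU's memory at a given precision."""
    return memory_gb * 1e9 / bytes_per_param

print(f"FP16 on one H100: ~{params_that_fit(2) / 1e9:.0f}B params")    # ~40B
print(f"FP8  on one H100: ~{params_that_fit(1) / 1e9:.0f}B params")    # ~80B
print(f"FP4  on one H100: ~{params_that_fit(0.5) / 1e9:.0f}B params")  # ~160B

# Hypothetical o1 peak load: 5 parallel GPT-4o-sized generations at FP8,
# counting each copy's weights separately, as in the estimate above.
gpt4o_params = 200e9
parallel_generations = 5
total_weights_gb = parallel_generations * gpt4o_params / 1e9           # 1000 GB
print(f"5 x 200B at FP8:  {total_weights_gb:.0f} GB "
      f"≈ {total_weights_gb / H100_GB:.1f} H100s' worth of memory")    # ~12.5
```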