
Is compute that expensive? An H100 rents for about $2.50/hour, so that's 80 hours of pure compute. Assuming 720 hours in a month, that's a 1/9 duty cycle around the clock, or 1/3 if we assume an 8-hour work day. That's really intense, constant use. And I bet OpenAI spends less operating their own infra than the rate at which cloud providers rent it out.
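A quick sketch of the duty-cycle arithmetic above, taking the commenter's 80-hour figure as given:

```python
# Figures as stated in the comment above.
rental_rate = 2.50              # $/hour for one rented H100
compute_hours = 80              # hours of pure compute cited by the commenter
hours_per_month = 720           # 24 hours * 30 days
work_hours_per_month = 8 * 30   # assuming an 8-hour work day, 30 days

print(compute_hours / hours_per_month)       # ~0.111, i.e. the 1/9 duty cycle
print(compute_hours / work_hours_per_month)  # ~0.333, i.e. the 1/3 figure
```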


Are you assuming that you can do o1 inference on a single H100?


Good question. How many H100s does it take? Is there any way to guess / approximate that?


You need enough GPU memory to store the model weights plus the KV-cache, which grows with context size. Assuming the model has a trillion parameters (there are only rumours about how many there actually are) and uses 8 bits per parameter, 16 H100s might be sufficient.
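A back-of-envelope version of that estimate, counting weights only (the parameter count is rumoured, not confirmed, and the KV-cache would need additional headroom):

```python
import math

# Weights-only memory estimate under the comment's assumptions.
params = 1_000_000_000_000   # rumoured 1 trillion parameters (unconfirmed)
bytes_per_param = 1          # 8-bit quantisation
h100_memory_gb = 80          # memory of a single H100

weights_gb = params * bytes_per_param / 1e9          # 1000 GB of weights
gpus_needed = math.ceil(weights_gb / h100_memory_gb)
print(gpus_needed)  # 13 GPUs for weights alone; 16 leaves ~280 GB for KV-cache
```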


I suspect the biggest, most powerful model could easily use hundreds or even thousands of H100s.

And the 'search' part of it could use many of these clusters in parallel, and then pick the best answer to return to the user.


16? No. More on the order of 1000+ H100s computing in parallel for one request.


Does an o1 query run on a single H100, or across multiple H100s?


A single H100 has 80GB of memory, meaning that at FP16 you could roughly fit a 40B-parameter model on it, or at FP4 quantisation a 160B-parameter model. We don't know (I don't think) what quantisation OpenAI uses, or how many parameters o1 has, but most likely...

...they probably quantise a bit, but not heavily, since they don't want to sacrifice performance; FP8 seems like a plausible middle ground. o1 is just a bunch of GPT-4o instances in a trenchcoat, strung together with some advanced prompting. GPT-4o is theorised to be 200B parameters. If you wanted to run 5 parallel generation tasks at peak during the o1 inference process, that's 5 × 200B at FP8, or about 12 H100s. 12 H100s takes about one full rack of kit to run.
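The same arithmetic, spelled out; the parameter count and task count are the comment's speculation, not known figures:

```python
import math

# Speculative figures from the comment: 5 parallel generation tasks,
# each running a rumoured 200B-parameter model at FP8 (1 byte/param).
tasks = 5
params_per_model = 200e9
bytes_per_param = 1
h100_memory_gb = 80

total_gb = tasks * params_per_model * bytes_per_param / 1e9  # 1000 GB
print(math.ceil(total_gb / h100_memory_gb))  # 13, close to the ~12 cited
```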


o1 is ten times as expensive as pre-turbo GPT-4.



