Is compute that expensive? An H100 rents at about $2.50/hour, so that's 80 hours of pure compute. Out of 720 hours in a month, that's a 1/9 duty cycle around the clock, or 1/3 if we assume 8-hour work days. That's really intense, constant use. And I bet OpenAI spend less on operating their own infra than the rate at which cloud providers rent it out.
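A quick back-of-envelope sketch of that duty-cycle arithmetic, assuming the 80 hours comes from a ~$200/month budget at the quoted $2.50/hour rental rate (the budget figure is my assumption, implied by the numbers above):

```python
# Rough duty-cycle arithmetic.  Assumption: the 80 hours comes from a
# ~$200/month budget at the $2.50/hour rental rate quoted above.

h100_rate_per_hour = 2.50
monthly_budget = 200.0                                 # assumed: 80 h * $2.50/h

compute_hours = monthly_budget / h100_rate_per_hour    # 80 h of pure H100 time
hours_per_month = 24 * 30                              # 720 h
workday_hours_per_month = 8 * 30                       # 240 h at 8 h/day

print(f"compute hours:        {compute_hours:.0f}")                              # 80
print(f"duty cycle, 24/7:     1/{hours_per_month / compute_hours:.0f}")          # 1/9
print(f"duty cycle, 8h days:  1/{workday_hours_per_month / compute_hours:.0f}")  # 1/3
```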
You need enough RAM to store the model and the KV-cache, which grows with context size. Assuming the model has a trillion parameters (there are only rumours about how many there actually are) and uses 8 bits per parameter, 16 H100s might be sufficient.
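A rough check of that figure, taking the rumoured 1T parameter count as an assumption and ignoring KV-cache growth:

```python
# Rough memory check for a hypothetical 1T-parameter model at 8 bits/parameter.
# The parameter count is rumour, and KV-cache/activations are not counted here.

params = 1e12                  # 1 trillion parameters (assumed)
bytes_per_param = 1            # 8-bit weights
h100_memory_gb = 80
num_gpus = 16

weights_gb = params * bytes_per_param / 1e9    # 1000 GB of weights
total_memory_gb = num_gpus * h100_memory_gb    # 1280 GB across 16 H100s

print(f"weights:        {weights_gb:.0f} GB")
print(f"16x H100:       {total_memory_gb} GB")
print(f"headroom left:  {total_memory_gb - weights_gb:.0f} GB for KV-cache etc.")
```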
A single H100 has 80GB of memory, meaning that at FP16 you could roughly fit a 40B parameter model on it, or at FP4 quantisation a 160B parameter model. We don't know (I don't think) what quantisation OpenAI use, or how many parameters o1 has, but most likely...
...they probably quantise a bit, but not loads, as they don't want to sacrifice performance. FP8 seems like a possible middle ground. o1 is just a bunch of GPT-4o instances in a trenchcoat, strung together with some advanced prompting. GPT-4o is theorised to be around 200B parameters. If you wanted to run 5 parallel generation tasks at peak during the o1 inference process, that's 5 × 200B at FP8 (about 1 TB of weights), or about 12 H100s. 12 H100s take about one full rack of kit to run.
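Putting the same arithmetic in one place; the 200B GPT-4o size, FP8 weights, and the 5 parallel generations are all speculation from the discussion above, not confirmed numbers:

```python
# End-to-end back-of-envelope.  The 200B GPT-4o size, FP8 weights and the
# 5 parallel generations during o1 inference are speculation, not known figures.

H100_GB = 80

def params_that_fit(bytes_per_param: float, memory_gb: float = H100_GB) -> float:
    """How many parameters fit in one GPU's memory at a given precision."""
    return memory_gb * 1e9 / bytes_per_param

print(f"FP16 on one H100: ~{params_that_fit(2) / 1e9:.0f}B params")    # ~40B
print(f"FP8  on one H100: ~{params_that_fit(1) / 1e9:.0f}B params")    # ~80B
print(f"FP4  on one H100: ~{params_that_fit(0.5) / 1e9:.0f}B params")  # ~160B

# Hypothetical o1 peak load: 5 parallel GPT-4o-sized generations at FP8,
# counting each copy's weights separately, as in the estimate above.
gpt4o_params = 200e9
parallel_generations = 5
total_weights_gb = parallel_generations * gpt4o_params / 1e9           # 1000 GB
print(f"5 x 200B at FP8:  {total_weights_gb:.0f} GB "
      f"≈ {total_weights_gb / H100_GB:.1f} H100s' worth of memory")    # ~12.5
```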