Are you assuming that you can do o1 inference on a single H100?


Good question. How many H100s does it take? Is there any way to guess / approximate that?


You need enough GPU memory to hold the model weights plus the KV cache, which grows with context size. Assuming the model has a trillion parameters (there are only rumours about how many there actually are) at 8 bits per parameter, the weights alone are about 1 TB, so 16 H100s (80 GB of HBM each, 1.28 TB total) might be sufficient.
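
As a back-of-envelope sketch (the parameter count, quantization, KV-cache size, and overhead factor are all assumptions; OpenAI hasn't published o1's architecture):

  import math

  H100_MEM_GB = 80  # HBM capacity of a single H100

  def h100s_needed(params_billions, bytes_per_param, kv_cache_gb, overhead=1.2):
      # Weights: 1e9 params * bytes per param == that many GB.
      weights_gb = params_billions * bytes_per_param
      # Pad for activations, fragmentation, and runtime buffers.
      total_gb = (weights_gb + kv_cache_gb) * overhead
      return math.ceil(total_gb / H100_MEM_GB)

  # Hypothetical: 1T params, 8-bit weights, ~100 GB of KV cache.
  print(h100s_needed(1000, 1, 100))  # -> 17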


I suspect the biggest, most powerful model could easily use hundreds or even thousands of H100s.

And the 'search' part of it could run many of these clusters in parallel, then pick the best answer to return to the user; a sketch of that idea is below.
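
A minimal best-of-n sketch of that parallel-search idea (generate() and score() are hypothetical stand-ins; nothing public describes o1's actual search):

  import random
  from concurrent.futures import ThreadPoolExecutor

  def generate(prompt: str, seed: int) -> str:
      # Hypothetical stand-in for one model replica (itself a multi-GPU cluster).
      return f"candidate answer {seed}"

  def score(prompt: str, answer: str) -> float:
      # Hypothetical verifier / reward model grading a candidate.
      return random.random()

  def best_of_n(prompt: str, n: int = 16) -> str:
      # Fan the same prompt out to n replicas, keep the highest-scoring answer.
      with ThreadPoolExecutor(max_workers=n) as pool:
          candidates = list(pool.map(lambda s: generate(prompt, s), range(n)))
      return max(candidates, key=lambda a: score(prompt, a))

  print(best_of_n("prove that sqrt(2) is irrational"))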


16? No. More on the order of 1,000+ H100s computing in parallel for one request.



