Are you assuming that you can do o1 inference on a single H100?


Good question. How many H100s does it take? Is there any way to guess / approximate that?


You need enough GPU memory to hold the model weights plus the KV cache, which grows with context size. Assuming the model has a trillion parameters (there are only rumours about how many there actually are) at 8 bits per parameter, the weights alone are about 1 TB, so 16 H100s (80 GB of HBM each, 1.28 TB total) might be sufficient.
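
As a back-of-envelope sketch (the parameter count, quantization, KV-cache size, and overhead factor are all assumptions; OpenAI hasn't published o1's architecture):

  import math

  H100_MEM_GB = 80  # HBM capacity of a single H100

  def h100s_needed(params_billions, bytes_per_param, kv_cache_gb, overhead=1.2):
      # Weights: 1e9 params * bytes per param == that many GB.
      weights_gb = params_billions * bytes_per_param
      # Pad for activations, fragmentation, and runtime buffers.
      total_gb = (weights_gb + kv_cache_gb) * overhead
      return math.ceil(total_gb / H100_MEM_GB)

  # Hypothetical: 1T params, 8-bit weights, ~100 GB of KV cache.
  print(h100s_needed(1000, 1, 100))  # -> 17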


I suspect the biggest, most powerful model could easily use hundreds or even thousands of H100s.

And the 'search' part of it could run many of these clusters in parallel, then pick the best answer to return to the user; a sketch of that idea is below.
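
A minimal best-of-n sketch of that parallel-search idea (generate() and score() are hypothetical stand-ins; nothing public describes o1's actual search):

  import random
  from concurrent.futures import ThreadPoolExecutor

  def generate(prompt: str, seed: int) -> str:
      # Hypothetical stand-in for one model replica (itself a multi-GPU cluster).
      return f"candidate answer {seed}"

  def score(prompt: str, answer: str) -> float:
      # Hypothetical verifier / reward model grading a candidate.
      return random.random()

  def best_of_n(prompt: str, n: int = 16) -> str:
      # Fan the same prompt out to n replicas, keep the highest-scoring answer.
      with ThreadPoolExecutor(max_workers=n) as pool:
          candidates = list(pool.map(lambda s: generate(prompt, s), range(n)))
      return max(candidates, key=lambda a: score(prompt, a))

  print(best_of_n("prove that sqrt(2) is irrational"))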


16? No. More on the order of 1,000+ H100s computing in parallel for one request.



