> Training: The gpt-oss models were trained on NVIDIA H100 GPUs using the PyTorch framework [17] with expert-optimized Triton [18] kernels. The training run for gpt-oss-120b required 2.1 million H100-hours to complete, with gpt-oss-20b needing almost 10x fewer.
This makes DeepSeek's unusually low compute-cost claim for R1 seem reasonable. Assuming ~$2/hr per H100, the 2.1 million H100-hours for gpt-oss-120b works out to roughly $4.2M, which really isn't much money compared to the $60-100M estimates for GPT-4, widely speculated to be a ~1.8T-parameter MoE with something in the range of 200B active parameters, last I heard.
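A quick back-of-envelope sketch of that arithmetic, assuming the ~$2/hr rental rate above (the 10x figure for gpt-oss-20b is the approximate ratio from the excerpt, and the GPT-4 numbers are the speculative estimates, not confirmed figures):

```python
# Rough training-cost estimates at an assumed H100 rental rate.
H100_RATE_USD_PER_HOUR = 2.0  # assumption, not a quoted price

gpt_oss_120b_hours = 2.1e6                    # from the model card excerpt above
gpt_oss_20b_hours = gpt_oss_120b_hours / 10   # "almost 10x fewer"

print(f"gpt-oss-120b: ~${gpt_oss_120b_hours * H100_RATE_USD_PER_HOUR / 1e6:.1f}M")   # ~$4.2M
print(f"gpt-oss-20b:  ~${gpt_oss_20b_hours * H100_RATE_USD_PER_HOUR / 1e6:.2f}M")    # ~$0.42M
print("GPT-4 (speculated): ~$60-100M")  # order-of-magnitude comparison only
```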