The costs for OpenAI and Google aren't public, but looking at the open-source models, inference is very cheap: you can generally beat the public serverless prices by a factor of ~2 by using dedicated GPUs [1], and a 70b model costs about $1/million tokens serverless while tending to perform similarly to 4o on benchmarks. At $2.50/million input tokens and $10/million output tokens, OpenAI is most likely getting very fat profit margins.
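A rough back-of-envelope version of that margin claim, using the numbers above plus an assumed workload mix (the 3:1 input/output token ratio is my assumption, not a published figure):

```python
# Assumptions (not from OpenAI): serving a 4o-class model costs about
# as much as a 70b open model (~$1/M tokens serverless), and dedicated
# GPUs roughly halve that.
serverless_cost = 1.00                  # $/M tokens, 70b-class model
dedicated_cost = serverless_cost / 2    # ~2x cheaper on dedicated GPUs

price_input = 2.50                      # OpenAI $/M input tokens
price_output = 10.00                    # OpenAI $/M output tokens

# Hypothetical workload mix: 3 input tokens per output token.
input_frac, output_frac = 0.75, 0.25
blended_price = input_frac * price_input + output_frac * price_output

margin = 1 - dedicated_cost / blended_price
print(f"blended price: ${blended_price:.2f}/M tokens")
print(f"implied gross margin: {margin:.0%}")
```

Even if the true serving cost were several times higher than the dedicated-GPU estimate, the implied margin would still be comfortably positive.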
The problem for them is making enough money for the training runs (where it seems like their strategy is to raise money on the hope they achieve some kind of runaway self-improving effect that grants them an effective monopoly on the leading models, combined with regulatory pushes to ban their competitors) — but it seems very unlikely to me that they're losing money serving the models.
1: https://fireworks.ai/blog/why-gpus-on-demand