Part of the reason their API is so cheap is that they explicitly state they are going to train on your API data. OpenAI and Anthropic say they won't if you use their APIs (if you use ChatGPT, that's a different story). There are no free lunches.
This comment is misleading. There is a "free lunch" here in the sense that serving this model at scale is far cheaper than serving worse open-source models.
Yes, they are probably more willing to go down on price because of this, but the architecture is open, and they are charging similarly to a 30B-50B dense model, which is roughly the number of active parameters DeepSeek-V3 has.
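To put rough numbers on the active-parameter point: per-token inference compute scales with active, not total, parameters, at roughly 2 FLOPs per active parameter per token. A back-of-envelope sketch (the 2N rule of thumb and the 40B dense comparison point are my assumptions; the 671B-total / 37B-active figures are DeepSeek-V3's published sizes):

```python
def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token (~2 per active param)."""
    return 2 * active_params

dense_40b = flops_per_token(40e9)  # a hypothetical 40B dense model
moe_v3 = flops_per_token(37e9)     # DeepSeek-V3: 671B total, 37B active per token

ratio = moe_v3 / dense_40b
print(f"MoE vs 40B dense compute ratio: {ratio:.3f}")  # ~0.925
```

So on compute alone, pricing it like a 30B-50B dense model is roughly what you'd expect, before any discount from training on user data.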
It's a matter of degree. If 90% of the cost savings come from a new, smarter architecture, it doesn't make sense to point to the API terms as the reason it's so cheap.