> Inference costs are in free fall. The recent optimizations from DeepSeek mean that all the available GPUs could cover a demand of 10k tokens per day from a frontier model for… the entire population of the earth. Demand is nowhere near this level. The economics of selling tokens do not work anymore for model providers: they have to move higher up the value chain.
Even before DeepSeek, prices were declining by about 90% per year at constant performance. I think the economics should be viewed differently: treat this like any other industry on a learning curve, such as chips, batteries, or solar panels. Prices in these industries keep falling every year, and the winners are the companies that can keep scaling up their production. Take TSMC, for example. Nobody can produce high-quality chips at a lower price than TSMC, because of its economies of scale. For instance, one PhD at the company can spend 4 years optimizing a tiny part of the process, and it's still worth it: if that makes the process 0.001% cheaper to run, the PhD has paid for themselves at TSMC's scale.
So the economics of selling tokens do work. The question is who can keep scaling up long enough that the rest (have to) give up.
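To put rough numbers on the two claims above (a 90% yearly price decline, and a 0.001% saving paying for itself at scale), here is a minimal Python sketch. The starting price and the cost of the PhD's four years are made-up placeholders, not figures from DeepSeek or TSMC; only the 90% and 0.001% rates come from the comment.

```python
def price_after(years, start_price, annual_decline=0.90):
    """Price after compounding a fixed fractional decline each year."""
    return start_price * (1 - annual_decline) ** years


def breakeven_spend(optimization_cost, saving_fraction):
    """Process spend at which a fractional saving equals the optimization's cost."""
    return optimization_cost / saving_fraction


if __name__ == "__main__":
    # Hypothetical starting price of 10 (any currency) per million tokens.
    for year in range(4):
        print(f"year {year}: {price_after(year, 10.0):.4f}")

    # Hypothetical: four years of one PhD's work costs 1,000,000 in total.
    # 0.001% expressed as a fraction is 0.00001.
    spend = breakeven_spend(1_000_000, 0.00001)
    print(f"break-even process spend: {spend:,.0f}")
```

The second function is the whole scale argument in one division: the smaller the saving, the larger the spend it has to be applied to before it pays off, so only the largest producer can justify that kind of optimization.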