Not really sure how this article refutes what I said?
He defines it as "everything that happens from when you put a prompt in to generate an output" -> but he seems to conflate inference with a query. A single forward pass that produces the next token is inference; a query or response just means the LLM repeats that step until the stop token is emitted. (Happy to be corrected here.)
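Rough sketch of the distinction I mean, using Hugging Face transformers with GPT-2 purely as a stand-in (model choice and greedy decoding are arbitrary assumptions on my part):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tok("The cost of inference", return_tensors="pt").input_ids

for _ in range(50):                                   # one query = many inference steps
    logits = model(input_ids).logits                  # one inference = one forward pass
    next_id = logits[:, -1].argmax(-1, keepdim=True)  # greedy pick of the next token
    input_ids = torch.cat([input_ids, next_id], dim=-1)
    if next_id.item() == tok.eos_token_id:            # the loop ends at the stop token
        break

print(tok.decode(input_ids[0]))
```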
The cost of inference per token is going down, but the cost per query goes up because models consume more tokens per query - which was my point.
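Back-of-the-envelope numbers (all made up, just to show how both trends can be true at once):

```python
old_cost_per_token = 15 / 1_000_000   # $/token for an older, pricier model
new_cost_per_token = 3 / 1_000_000    # $/token for a newer, cheaper model

old_tokens_per_query = 500            # short chat-style answer
new_tokens_per_query = 10_000         # long reasoning / agentic trace

print(old_cost_per_token * old_tokens_per_query)  # ~$0.0075 per query
print(new_cost_per_token * new_tokens_per_query)  # ~$0.03 per query: cheaper tokens, pricier query
```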
Either way, charging consumers per token pretty much guarantees that serving models is profitable (each of Anthropic's prior models turns a profit). The consumer-friendly flat $20 subscription is not sustainable in the long run.
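Same kind of hypothetical arithmetic for why per-token billing stays in the black while a flat rate doesn't (the figures are invented, the shape of the math is the point):

```python
serving_cost_per_mtok = 2.0      # cost to serve 1M output tokens
api_price_per_mtok    = 15.0     # per-token (API) price for 1M output tokens
flat_subscription     = 20.0     # $/month flat plan

heavy_user_mtok = 50             # a heavy subscriber burning 50M tokens a month

api_margin = api_price_per_mtok - serving_cost_per_mtok              # positive on every token sold
flat_pnl   = flat_subscription - serving_cost_per_mtok * heavy_user_mtok

print(f"API margin per 1M tokens: ${api_margin:.2f}")   # $13.00, scales with usage
print(f"Flat plan on a heavy user: ${flat_pnl:.2f}")    # -$80.00, loses money as usage grows
```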