Hacker News

The cost of both training and inference is roughly quadratic in context length (self-attention compares every token against every other), while for the vast majority of users the marginal utility of additional context is sharply diminishing. For 99% of ChatGPT users, something like 8192 tokens, or about 20 pages of context, would be plenty. Companies have to balance the cost of training and serving models. Google did train an ultra-long-context version of Gemini, but since Gemini itself was not fundamentally better than GPT-4 or Claude, and so few people actually benefited from such a niche advantage, it didn't shift the playing field in their favor.
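A rough back-of-the-envelope sketch of that quadratic scaling (the dimensions here are illustrative, not any particular model's):

```python
def attention_flops(seq_len: int, d_model: int) -> int:
    # Per layer, the QK^T score matrix and the scores @ V product
    # each cost roughly seq_len^2 * d_model multiply-adds.
    return 2 * seq_len ** 2 * d_model

short = attention_flops(8_192, 4096)
long = attention_flops(128_000, 4096)
print(f"128k context costs ~{long / short:.0f}x the attention compute of 8k")
```

The ratio is (128000 / 8192)^2, about 244x, which is why serving very long contexts to every user is expensive even when few of them need it.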


Marginal utility only drops because effective context is really bad: most models still vastly prefer the first things they see, and those "needle in a haystack" tests are misleading in that they convince people that LLMs handle their whole context well when they just don't.

If the effective context window were actually equal to the claimed context window, I'd start worrying a bit about most of the risks that AI doomers talk about...
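For reference, a needle-in-a-haystack eval usually amounts to no more than this (a minimal sketch; the helper name, filler text, and passphrase are all made up):

```python
def build_haystack(needle: str, filler: list[str], depth: float) -> str:
    """Bury the needle at a relative depth (0.0 = start, 1.0 = end) in filler text."""
    idx = int(depth * len(filler))
    return " ".join(filler[:idx] + [needle] + filler[idx:])

filler = [f"Paragraph {i} is unremarkable filler text." for i in range(1000)]
needle = "The secret passphrase is 'kumquat'."
prompt = build_haystack(needle, filler, depth=0.5)
# The model is then asked "What is the secret passphrase?" and scored on
# exact retrieval -- a far easier task than reasoning over the whole
# context, which is why passing it says little about effective context.
```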



