The interesting question to me is how far these reasoning models can be scaled. With another 12 months of compute scaling (for synthetic data generation and RL), how good will these models be at coding? I talked with Finbarr Timbers (ex-DeepMind) about this yesterday, and his take is that we'll hit diminishing returns – not because we can't make models more powerful, but because further gains won't land in the areas that matter to users. AI models may be nearing a plateau where capability gains matter less than UX.



I think in a lot of ways we are already there. Users are clearly already having difficulty telling which model is better, or whether new models improve on old ones. People keep going back to the same gotcha questions and get different answers depending on the random seed. Even the benchmarks are getting saturated.

These models already do an excellent job with your homework, your corporate PowerPoints, and your idle questions. At some point, only experts will be able to tell whether one response is really better than another.

Our biggest challenge is going to be finding problem domains where performance is still low but can be scaled up to human level. And those will be so niche that no one will care.

Agents, on the other hand, still have a lot of potential. If you can get a model to stay on task over a long context and remain grounded, then you can start firing your staff.


Don't underestimate how much the long tail means to the general public.



