Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Yup. Exactly this. As soon as enough people get screwed by the ~80% accuracy rate, the whole facade will crumble. Unless AI companies manage to bring the accuracy up 20% in the next year, by either limiting scope or finding new methods, it will crumble. That kind of accuracy gain isn't happening with LLMs alone (ie foundational models).


Charitably, I don’t understand what those like you mean by the “whole facade” and why you use these old machine learning metrics like “accuracy rate” to assess what’s going on. Facade implies that the unprecedented and still exponential organic uptake of GPT (again see the actual data I linked earlier from Mary Meeker) is just a hype-generated fad, rather than people finding it actually useful to whatever end. Indeed, the main issue with the “facade” argument is that it’s actually what dominates the media (Marcus et al) much more than any hyperbolic pro-AI “hype.”

This “80-20” framing, moreover, implies we’re just trying to asymptotically optimize a classification model or some information retrieval system… If you’ve worked with LLMs daily on hard problems (non-trivial programming and scholarly research, for example), the progress over even just the last year is phenomenal — and even with the presently existing models I find most problems arise from failures of context management and the integration of LLMs with IR systems.


Time will tell.


My team has measurably gotten our LLM feature to have ~94% accuracy in widespread reliable tests. Seems fairly confident, speaking as an SWE not a DS orML engineer though.


Yeah, I've had similar results. Even with GPT-o1, I find almost all errors at this point come from the web search functionality and the model taking X random source as an authority. It's interesting that I find my human intelligence in the process is most useful for hand-collecting the sources and data to analyze -- and, of course, for directing the process across multiple LLM queries.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: