
>The fact that scaling continues to work has significant implications for the timelines of AGI development. The scaling hypothesis is the idea that we may have most of the pieces in place needed to build AGI, and that most of the remaining work will be taking existing methods and scaling them up to larger models and bigger datasets. If the era of scaling were over, then we should probably expect AGI to be much further away. The fact that the scaling laws continue to hold is strongly suggestive of shorter timelines.

If you understand the shape of the power-law scaling curves, shouldn't the scaling hypothesis tell you that AGI is not close, at least via a path of simply scaling up GPT-4? For example, the GPT-4 paper reports a 67% pass rate on the HumanEval benchmark, and in Figure 2 it shows a power-law improvement on a medium-difficulty subset as a function of total compute. By how many powers of ten would we have to increase GPT-4's compute just to solve some relatively simple programming problems?
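To make the shape of that curve concrete, here is a minimal sketch in Python. It assumes the power-law form -log(pass rate) = a * C^(-k) in compute C, as in the paper's Figure 2; the exponent k = 0.2 is an assumed illustrative value, not a number reported in the paper, and a is calibrated to the reported 67%:

    import math

    # Assumed power-law form for mean log pass rate vs. training compute C
    # (normalized so GPT-4 = 1):  -log(p) = a * C^(-k).
    # k = 0.2 is an illustrative assumption, not a value from the paper.
    k = 0.2
    a = -math.log(0.67)  # calibrate so p = 0.67 at C = 1

    for mult in (1, 10, 100, 1000):
        p = math.exp(-a * mult ** (-k))
        print(f"{mult:>4}x compute -> pass rate {p:.2f}")

Under these assumptions each successive factor of 10 in compute buys a smaller gain (roughly 0.67 -> 0.78 -> 0.85 -> 0.90 here), which is the sense in which a power law makes "just scale it up" expensive.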




I always enjoy reading your comments; they temper the hype around LLMs and offer a critical perspective. Anyway, I think a model stronger than GPT-4 could get better at using tools, to the point where it can self-improve by using them, for example by using all kinds of solvers and heuristics to guide the model. I don't know how to estimate that risk right now.

Edit: I don't know if it is a good thing to study the weak points of closed LLMs; even asking the LLMs themselves can give hints about possible ways to improve them. In my case, I am certainly old and my mind is a lot weaker than before, but even so I prefer not to use LLMs for gaining insight, because they will someday have better insight than I do. But the lust for knowledge is a mortal sin.


Someone did that calculation and the result is here: https://www.reddit.com/r/slatestarcodex/comments/13u40yf/

Roughly 100x GPT-4's compute to reach 85%.


And, if I'm reading their calculation right, that's 85% on the medium-difficulty bucket, not even the entire HumanEval benchmark?

(quoting from the GPT-4 paper):

>All but the 15 hardest HumanEval problems were split into 6 difficulty buckets based on the performance of smaller models. The results on the 3rd easiest bucket are shown in Figure 2
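A number in the same ballpark falls out of inverting that power-law form, under the same assumed exponent as the sketch upthread (and, like any back-of-envelope version of the linked estimate, conflating the full-benchmark 67% with the bucket's pass rate):

    import math

    # Invert -log(p) = a * C^(-k): the prefactor a cancels, leaving the
    # compute multiplier m needed to go from pass rate p_now to p_target:
    #   m = (log(p_now) / log(p_target)) ** (1 / k)
    # k = 0.2 is the same assumed exponent as above, not a reported value.
    def compute_multiplier(p_now, p_target, k=0.2):
        return (math.log(p_now) / math.log(p_target)) ** (1.0 / k)

    print(f"{compute_multiplier(0.67, 0.85):.0f}x")  # ~91x, i.e. roughly 100x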


That does seem to support the idea that we're two or three major breakthroughs away from superintelligent AGI, assuming these scaling curves keep holding as they have.



