
>The fact that scaling continues to work has significant implications for the timelines of AGI development. The scaling hypothesis is the idea that we may have most of the pieces in place needed to build AGI, and that most of the remaining work will be taking existing methods and scaling them up to larger models and bigger datasets. If the era of scaling were over, then we should probably expect AGI to be much further away. The fact that the scaling laws continue to hold is strongly suggestive of shorter timelines.

If you understand the shape of the power-law scaling curves, shouldn't the scaling hypothesis tell you that AGI is not close, at least via a path of simply scaling up GPT-4? For example, the GPT-4 paper reports a 67% pass rate on the HumanEval benchmark, and in Figure 2 it shows a power-law improvement on a medium-difficulty subset as a function of total compute. By how many powers of ten would we have to increase GPT-4's compute just to solve some relatively simple programming problems?
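To make the shape of that curve concrete, here is a minimal sketch in Python. It assumes the power-law form -log(pass rate) = a * C^(-k) in compute C, as in the paper's Figure 2; the exponent k = 0.2 is an assumed illustrative value, not a number reported in the paper, and a is calibrated to the reported 67%:

    import math

    # Assumed power-law form for mean log pass rate vs. training compute C
    # (normalized so GPT-4 = 1):  -log(p) = a * C^(-k).
    # k = 0.2 is an illustrative assumption, not a value from the paper.
    k = 0.2
    a = -math.log(0.67)  # calibrate so p = 0.67 at C = 1

    for mult in (1, 10, 100, 1000):
        p = math.exp(-a * mult ** (-k))
        print(f"{mult:>4}x compute -> pass rate {p:.2f}")

Under these assumptions each successive factor of 10 in compute buys a smaller gain (roughly 0.67 -> 0.78 -> 0.85 -> 0.90 here), which is the sense in which a power law makes "just scale it up" expensive.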




I always enjoy reading your comments; they temper the hype around LLMs and offer a critical perspective. Anyway, I think a model stronger than GPT-4 could get better at using tools, to the point where it can self-improve by using them, for example by using all kinds of solvers and heuristics to guide the model. I don't know how to estimate that risk right now.

Edit: I don't know if it is a good thing to study the weak points of closed LLMs; even asking the LLMs themselves can give hints about possible ways to improve them. In my case, I am certainly old and my mind is a lot weaker than before, but even so I prefer not to use LLMs for gaining insight, because they will someday have better insight than I do. But the lust for knowledge is a mortal sin.


Someone did that calculation and the result is here: https://www.reddit.com/r/slatestarcodex/comments/13u40yf/

Roughly 100x GPT-4's compute to reach 85%.


And, if I'm reading their calculation right, that's 85% on the medium-difficulty bucket, not even the entire HumanEval benchmark?

(quoting from the GPT-4 paper):

>All but the 15 hardest HumanEval problems were split into 6 difficulty buckets based on the performance of smaller models. The results on the 3rd easiest bucket are shown in Figure 2
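A number in the same ballpark falls out of inverting that power-law form, under the same assumed exponent as the sketch upthread (and, like any back-of-envelope version of the linked estimate, conflating the full-benchmark 67% with the bucket's pass rate):

    import math

    # Invert -log(p) = a * C^(-k): the prefactor a cancels, leaving the
    # compute multiplier m needed to go from pass rate p_now to p_target:
    #   m = (log(p_now) / log(p_target)) ** (1 / k)
    # k = 0.2 is the same assumed exponent as above, not a reported value.
    def compute_multiplier(p_now, p_target, k=0.2):
        return (math.log(p_now) / math.log(p_target)) ** (1.0 / k)

    print(f"{compute_multiplier(0.67, 0.85):.0f}x")  # ~91x, i.e. roughly 100x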


That does seem to support the idea that we're two or three major breakthroughs away from superintelligent AGI, assuming these scaling curves keep holding as they have.



