Hacker News

I have been watching this space since Skip-Thought Vectors in 2015. No. No one in 2018 suspected that large language models would smoothly scale up by simply increasing the number of parameters. Without hindsight, there is no clear and obvious path from Attention Is All You Need to InstructGPT, which only came out last February.

Point at a single person, let alone "everyone," who predicted we would have AI-based coding assistants and be integrating this technology into search engines by 2023. Anyone at all. I'd love to read a paper or even a blog post from 2018 predicting half the things that work now. You can't.

I've seen some hardcore goalpost moving before, but nothing as obviously, provably wrong as "we're basically exactly where everyone predicted we'd be back in 2018."

Is this on the path to AGI? I doubt it. You likely need some sort of actor-critic component, though the RLHF stuff works far better than it has any right to and is already far more agentic than the pure dumb language model of a year ago.



> No one in 2018 suspected that large language models would smoothly scale up by simply increasing the number of parameters.

https://d4mucfpksywv.cloudfront.net/better-language-models/l...

That was kind of the entire point of GPT-2.

Computerphile summed it up pretty well on GPT-3's release: https://youtu.be/_8yVOC4ciXc

Here are some quotes from that video:

"The thing about GPT-2 is just that it was much bigger than anything that came before. It was more parameters, and that was kind of the point of that paper."

...

"They made GPT-2 because the curve wasn't leveling off. We've gone 117 times bigger than GPT-2 and it's still not leveling off."
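The "curve not leveling off" claim is usually formalized as a power law in parameter count, L(N) ≈ (N_c / N)^α (this is the Kaplan et al. 2020 scaling-law form, not something stated in the video itself). A minimal sketch, using that paper's reported constants as illustrative assumptions:

```python
# Hedged sketch of a parameter-count scaling law, L(N) = (N_c / N)^alpha.
# The constants alpha and N_c below are the illustrative values reported by
# Kaplan et al. (2020) for non-embedding parameters; they are assumptions
# here, not figures from this thread or the Computerphile video.

def predicted_loss(n_params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """Cross-entropy loss predicted purely from model size under the power law."""
    return (n_c / n_params) ** alpha

gpt2_params = 1.5e9                  # GPT-2's parameter count
gpt3_params = 117 * gpt2_params      # the "117 times bigger" figure quoted above

for n in (gpt2_params, gpt3_params):
    print(f"N = {n:.2e}  predicted loss = {predicted_loss(n):.3f}")
```

A power law never flattens to an asymptote on a log-log plot; each constant multiple of N buys the same constant factor of loss reduction, which is why the curve "still not leveling off" was taken as a reason to keep scaling.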



