I have been watching this space since Skip-Thought Vectors in 2015. No. No one in 2018 suspected that large language models would smoothly scale up by simply increasing the number of parameters.
Without hindsight, there is no clear and obvious path from Attention Is All You Need to InstructGPT, which only came out last February.
Point at a single person, let alone "everyone," who predicted we would have AI-based coding assistance and be integrating this technology into a search engine by 2023. Anyone at all. I'd love to read a paper or even a blog post from 2018 predicting half the things that work now. You can't.
I've seen some hardcore goalpost moving before, but nothing as obviously provably wrong as "we're basically exactly where everyone predicted we'd be back in 2018."
Is this on the path to AGI? I doubt it. You likely need some sort of actor-critic component, though the RLHF stuff is working way better than it has any right to, and is already far more agentic than the pure dumb language model of a year ago.