Hacker News | dragongod2718's comments

u/Gwern's [explanation of why Google didn't produce GPT-3 earlier](https://www.lesswrong.com/posts/N6vZEnCn6A95Xn39p/are-we-in-...):

As far as I can tell, this is what is going on: they do not have any such thing, because GB and DM do not believe in the scaling hypothesis the way that Sutskever, Amodei and others at OA do.

GB is entirely too practical and short-term focused to dabble in such esoteric & expensive speculation, although Quoc's group occasionally surprises you. They'll dabble in something like GShard, but mostly because they expect to deploy it, or something like it, to production in Google Translate.

DM (particularly Hassabis, I'm not sure about Legg's current views) believes that AGI will require effectively replicating the human brain module by module, and that while these modules will be extremely large and expensive by contemporary standards, they still need to be invented and finetuned piece by piece, with little risk or surprise until the final assembly. That is how you get DM contraptions like Agent57 which are throwing the kitchen sink at the wall to see what sticks, and why they place such emphasis on neuroscience as inspiration and cross-fertilization. When someone seems to have come up with a scalable architecture for a problem, like AlphaZero or AlphaStar, they are willing to pour on the gas to make it scale, but otherwise, incremental refinement on ALE and then DMLab is the game plan. Because they have locked up so much talent and have so much proprietary code and believe all of that is a major moat to any competitor trying to replicate the complicated brain, they are fairly easygoing.

OA, lacking anything like DM's long-term funding from Google or its enormous headcount, is making a startup-like bet that they know the secret: the scaling hypothesis is true and very simple DRL algorithms like PPO on top of large simple architectures like RNNs or Transformers can emerge and meta-learn their way to powerful capabilities, enabling further funding for still more compute & scaling, in a virtuous cycle. And if OA is wrong to trust in the God of Straight Lines On Graphs, well, they never could compete with DM directly using DM's favored approach, and were always going to be an also-ran footnote.

While all of this hypothetically can be replicated relatively easily (never underestimate the amount of tweaking and special sauce it takes) by competitors if they wished (the necessary compute budgets are still trivial in terms of Big Science or other investments like AlphaGo or AlphaStar or Waymo, after all), said competitors are too hidebound and deeply philosophically wrong to ever admit fault and try to overtake OA until it's too late. This might seem absurd, but look at the repeated criticism of OA every time they release a new example of the scaling hypothesis, from GPT-1 to Dactyl to OA5 to GPT-2 to iGPT to GPT-3... (When faced with the choice between admitting all their fancy hard work is a dead end, swallowing the bitter lesson, and starting to budget tens of millions for compute, or writing a tweet explaining how, "actually, GPT-3 shows that scaling is a dead end and it's just imitation intelligence" - most people will get busy on the tweet!)
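For concreteness: the "Straight Lines On Graphs" Gwern refers to are power-law scaling curves, loss L = a * C^(-b) as a function of compute C, which plot as straight lines on log-log axes. A minimal sketch of that relationship, using made-up constants (a=10, b=0.05) purely for illustration, not real GPT measurements:

```python
import math

# Hypothetical (compute, loss) points generated from a power law
# L = a * C^(-b); a=10 and b=0.05 are invented for illustration.
a_true, b_true = 10.0, 0.05
compute = [1e18, 1e19, 1e20, 1e21, 1e22]
loss = [a_true * c ** (-b_true) for c in compute]

# On log-log axes a power law is a straight line:
#   log L = log a - b * log C
# so an ordinary least-squares line fit recovers the exponent b.
xs = [math.log10(c) for c in compute]
ys = [math.log10(l) for l in loss]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

b_fit = -slope            # fitted exponent, approximately 0.05
a_fit = 10 ** intercept   # fitted coefficient, approximately 10.0
print(b_fit, a_fit)
```

The bet is that this line keeps going: extrapolate the fit to larger C and the predicted loss keeps dropping, which is what justifies spending on "still more compute & scaling."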


I don't think this is an overestimation of intelligence. That ability is itself intelligence.


The compute intensive methods are likely to deliver results much faster.

http://incompleteideas.net/IncIdeas/BitterLesson.html


Why only a small margin?


Wait, you think AI is overfunded?


You both agree, I think. He's not saying that GPT-3 invented the revolutionary ability of prompt programming, but that prompt programming allows GPT-3 to be applied to arbitrary contexts (from programming to providing legal advice to generating fiction). That amazing generality and high quality make it applicable to most services.

So it's taking some slice of the $50tn pie.
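"Prompt programming" here just means encoding the task in the input text itself: the "program" is the pattern of examples the model is asked to continue. A sketch with a hypothetical few-shot translation prompt (the task, examples, and `->` format are all made up for illustration):

```python
# Build a few-shot prompt: the examples define the task, and the model
# is expected to continue the pattern for the final query.
examples = [
    ("cheese", "fromage"),
    ("dog", "chien"),
]
query = "house"

prompt = "English to French:\n"
prompt += "".join(f"{en} -> {fr}\n" for en, fr in examples)
prompt += f"{query} ->"
print(prompt)
```

Swapping the examples for legal Q&A pairs or story openings retargets the same model at a different service, which is the generality being described.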

