That really depends on who you ask. Transformers existed in 2019, and while GPT4 is genuinely advanced, it's not a gigantic leap. Deep learning is to neural networks what GPT4 is to transformers: a lot bigger, enabled by better hardware and more data, but fundamentally not a new technology. There need to be a lot more leaps before AGI; language models are not the holy grail, just another tool alongside reinforcement learning, expert systems (another candidate for AGI in the 90s), and soft optimization. Each of those tools is better than GPT for some problems.
- Give GPT4 short-term memory. Right now it has working memory (context size, activations) and long-term memory (training data), but no short-term memory.
- Give it the ability to have internal thoughts before speaking out loud
- Add reinforcement learning. If you want to write code, it helps if you can try things out with a real compiler and get feedback. That’s how humans do it. (A rough sketch of what such a loop might look like follows this list.)
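To make the second and third items concrete, here is a minimal sketch of an agent loop, assuming a hypothetical `llm()` call standing in for whatever model API you'd actually use. The scratchpad plays the role of "internal thoughts" the user never sees, and running the code for real provides the feedback signal:

```python
import subprocess
import sys
import tempfile

def llm(prompt: str) -> str:
    """Hypothetical stand-in for a language-model call; swap in a real API."""
    raise NotImplementedError

def solve_with_feedback(task: str, max_rounds: int = 3) -> str:
    """Draft code, run it for real, and feed any errors back to the model."""
    scratchpad = ""  # hidden "internal thoughts" the user never sees
    code = ""
    for _ in range(max_rounds):
        # Let the model reason privately before committing to an answer.
        scratchpad = llm(f"Think step by step about this task:\n{task}\n"
                         f"Notes so far:\n{scratchpad}")
        code = llm(f"Using these notes, write a Python solution:\n{scratchpad}")

        # Ground the answer in reality, the way a human would with a compiler.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return code  # ran cleanly; good enough for this sketch
        # Otherwise, append the real error message and try again.
        scratchpad += f"\nLast attempt failed with:\n{result.stderr}"
    return code
```

None of this adds short-term memory in any deep sense, of course; persisting the scratchpad between sessions would be the crudest version of that.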
I think GPT4 + these properties would be significantly more capable. And honestly, I don’t see a good reason for any of these problems to take decades to solve. We’ve already mastered RL in other domains (e.g. AlphaZero).
In the meantime, an insane amount of money is being poured into making the next generation of AI chips. Even if nothing changes algorithmically, we’ll have significantly bigger, better, cheaper models in a few years that will put gpt4 to shame.
The other hard thing is that while transformers weren’t a gigantic leap, nobody - not even the researchers involved - predicted how powerful gpt3 or gpt4 would be. We just don’t know yet what other small algorithmic leaps might make the system another order of magnitude smarter. And one more order of magnitude of intelligence will probably be enough to make gpt5 smarter than most humans.
You're guessing, which is a problem, and it's what gives you license to conclude that:
> I don’t think agi will be that far away.
As for
> Give it the ability to have internal thoughts before speaking out loud
you have no idea what that means technically, because no one knows how internal thinking could ever be mapped to compute. No one does, so it's OK not to know, but then don't use it as a prior to guess.
Some have maybe seen this before; if you haven't, I'll say it again: compute ≠ intelligence. LLMs offer convincing phantasms of reasoning, and that is what gives the outside observer the idea that they are intelligent.
The only intelligence we understand is human intelligence. That's important to emphasize, because any idea of an autonomous intelligence rests on our flavor of human intelligence, which is fuelled by desire and ambition. Ergo, any machine intelligence we imagine is, of course, Skynet.
Somebody in another conversation here pointed to the paperclip optimizer as a counterpoint, but no: Bostrom makes tons of assumptions on his way to the optimizer, like that an optimizer would optimize humans out of the equation to protect its ability to produce paperclips. There are so many leaps of logic here, all of which assume very human ideas of intelligence, such as that the optimizer must protect itself from a threat.
At the end of the day intelligence guesses. If it doesn't guess it's just an algorithm.
Now, step beyond your bias that we cannot make intelligent machines and instead imagine two intelligent machines pitted against each other. These could be anything from stock trading applications to machines of war with physical manifestations. If you want either of these things to work, they need to be protected against threats, both digital and kinetic. To think AGI systems will be left to flounder around like babies is very strange thinking indeed.
The drive to protect ourselves from threats arose via evolution, but our abhorrence of killing people to achieve our goals also came from evolution. If you assume the first is implausible, how can you keep the second?
Everything we assume about intelligence is based on what humans are like because of evolution. Figuring out what intelligence might be like without those evolutionary pressures is really hard.
Transformers came about in 2017, so extend the time horizon by a whopping 2 years and OP’s point remains.
AlexNet was 2012, which is what essentially kicked off the deep learning revolution. The original implementation of AlphaGo used deep neural networks to beat Lee Sedol at Go in 2016, a feat people had been predicting was at least 5-10 years away.
So the timeline, spanning just under the last 12 years at this point:
AlexNet -> AlphaGo (4 years) -> Transformers (1 year) -> GPT4 (6 years)
Imagine showing GPT4 to someone in 2012. They’d think it’s science fiction. The rate of progress has been absurd.
This is more of a social/cultural perception issue. In 2012 we had Siri, Google Now, and S Voice. (Alexa and Cortana both came in 2014.) So the average layman would just be like, “oh, so that’s a fancier Siri.”
> Transformers existed in 2019, and while GPT4 is genuinely advanced, it's not a gigantic leap
I'm referring to the emergent capabilities that show up with scale, which are a gigantic leap. The usual citation suggests that this came into the public consciousness in 2020 [1]. Do you take issue with the timeline or the idea that these were a gigantic leap?
> Do you take issue with ... the idea that these were a gigantic leap?
Yes, the gigantic leap was transformers. No one thought we had peaked at 340M or 1.5B parameters; in fact, the expectation from early work was that massive scaling would achieve zero-shot capabilities rather than the emergent capabilities you're alluding to, which are essentially variations of in-context learning, a relative disappointment.
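For anyone unfamiliar with the distinction being drawn here, a toy illustration (hypothetical prompts, no particular model implied): zero-shot means asking for the task cold, while in-context (few-shot) learning means demonstrating the task inside the prompt and letting the frozen model continue the pattern:

```python
# Zero-shot: the task is stated with no examples in the prompt.
zero_shot = "Translate to French: 'The cat sleeps.'"

# In-context / few-shot: the task is demonstrated inside the prompt, and the
# model picks up the pattern at inference time, with no weight updates.
few_shot = (
    "English: Good morning -> French: Bonjour\n"
    "English: Thank you -> French: Merci\n"
    "English: The cat sleeps -> French:"
)
```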
Subsequent improvements in GPT-4, which seem to be mostly just more RLHF and mixture-of-experts, are similarly not surprising and are temporizing measures while hardware and datasets are limited. The jury is still out on whether the billions spent are worth it in terms of actually getting us closer to AGI.
It seems to be worth it for OpenAI/MS who are trying to be first to market and establish vendor lock-in.
Emergent capabilities are interesting, but at the moment I don't think anyone has any idea how interesting they really are, or whether they are truly a gigantic leap in the direction of AGI. I've skimmed a handful of papers trying to answer that question, and there has hardly been a slam-dunk proof that they mean anything beyond the fact that the transformer model can generate reasonable facsimiles of what a human might say.
The reality is that GPT isn't even great at answering factual questions at this point, despite all the hype. I have a modest amount of expertise in music theory, and I've found that asking even relatively basic questions results in completely incorrect answers from GPT. It's impressive that in some cases, if you say, "No, that's incorrect," it will go back and spit out a new answer that is sometimes correct, but that's hardly a fundamental advancement in "understanding"; it's just more evidence that it's parroting what it's been trained on, which includes substantial amounts of misinformation.
We really have replaced “oh no, you put too much knowledge into a LISP program and it’s sentient!” with “oh no, you’ve stacked too many transformers and now it’s sentient!”
Sentient or no, the power of stacking transformers shocked everyone. One or two more “happy accidents” like that and we’re in for an interesting few decades.