Many of the concerns in this paper, especially about deep learning being data hungry, having limited capacity for transfer, and struggling to integrate prior knowledge, are addressed by recent papers on meta-learning and learning-to-learn [1, 2, 3, and many others], in both the supervised and reinforcement learning contexts.
In the case of meta-reinforcement learning there has been recent work [5] which seems to indicate that this mechanism is very similar to how learning works in the brain.
In fact, my group recently published a paper [4] about learning-to-learn in spiking networks (making it biologically more realistic) and showing that the network learns priors on families of tasks for supervised learning, and learns useful exploration strategies automatically when doing meta-reinforcement learning.
While I don't claim this is the right path to AGI, it's a promising new direction in deep learning research which this paper seems to ignore.
While I haven't read those particular papers yet, one common pattern in the ML literature seems to be:
1) Identify a commonly seen problem with deep learning architectures, whether that's large data volumes, lack of transfer learning, etc.
2) Invent a solution to the problem.
3) Test that the solution works on toy examples, like MNIST, simple block worlds, simulated data, etc.
4) Hint that the technique, now proven to work, will naturally be extended to real data sets very soon, so we should consider the problem basically solved now. Hooray!
5) Return to step #1. If anyone applies the technique to real data sets, they find, of course, that it doesn't generalize well and works only on toy examples.
This is simply another form of what happened in the 60s and 70s, when many expected that SHRDLU and ELIZA would rapidly be extended to human-like, general-purpose intelligences with just a bit of tweaking and a bit more computing power. Of course, that never happened. We still don't have that today, and when we do, I'm sure the architecture will look very different from 1970s AI (or modern chatbots, for that matter, which are mostly built the same way as ELIZA).
I don't mean to be too cynical. Like I said, I haven't read those particular papers yet, so I can't fairly pass judgement on them. I'm just saying that historically, saying problem X "has been addressed" by Y doesn't always mean very much. See also, e.g., the classic paper "Artificial Intelligence Meets Natural Stupidity": https://dl.acm.org/citation.cfm?id=1045340.
EDIT: To be clear, I'm not saying that people shouldn't explore new architectures, test new ideas, or write up papers about them, even if they haven't been proven to work yet. That's part of what research is. The problem comes when there's an expectation that having an idea about how to solve the problem means the problem is close to being solved. Most ideas don't work very well and have to be abandoned later. For example, this is the Neural Turing Machine paper from a few years back:
It's a cool idea. I'm glad someone tried it out. But the paper was widely advertised in the mainstream press as being successful, even though it was not tested on "hard" data sets, and (to the best of my knowledge) it still hasn't been, several years later. That creates unrealistic expectations.
While I don't completely disagree with you, how would you propose researchers go about the problem?
If anything, machine learning is applied to real world problems these days more than it ever was.
For better or worse, AGI is a hard problem that's going to take a long time to solve. And we're not going to solve it without exploring what works and what doesn't.
I think the mere fact that the OP feels the need to state that (paraphrasing) "possibly additional techniques besides deep learning will be likely necessary to reach AGI" reveals just how deeply the hype has infected the research community. This overblown self-delusion infects reporting on self-driving cars, automatic translation, facial recognition, content generation, and any number of other tasks that have reached the sort-of-works-but-not-really point with deep learning methods. But however rapid recent progress has been, these things won't be "solved" anytime soon, and we keep falling into the trap of believing the hype based on toy results. It'll be better for the researchers, investors, and society to be a little more skeptical of the claim that "computers can solve everything, we're 80% of the way there, just give us more time and money, and don't try to solve the problems any other way while you wait!"
Agreed. The hype surrounding machine learning is quite disproportionate to what's actually going on. But it's always been that way with machine learning -- maybe because it captures the public's imagination like few other fields do.
And there are definitely researchers, top ones no less, who play along with the hype. Very likely to secure more funding, and more attention for themselves and the field. Which has turned out to be quite an effective strategy, if you think about it.
The other upside of this hype is that it ends up attracting a lot of really smart people to work on this field, because of the money involved. So each hype cycle leads to greater progress.
The crash afterwards might slow things down a bit, particularly in the private sector. But the quantum of government funding available changes much more slowly, and could well last until the next hype cycle starts.
>> The other upside of this hype is that it ends up attracting a lot of really smart people to work on this field, because of the money involved.
The hype certainly attracts people who are "smart" in the sense that they know how to profit from it, but that doesn't mean they can actually do useful research. The result is, like the other poster says, a huge number of papers that claim to have solved really hard problems, which of course remain far from solved; in other words, so much useless noise.
It's what you can expect when you see everyone and their little sister jumping on a bandwagon when the money starts pouring in. Greed is great for making money, but not so much for making progress.
> The result is, like the other poster says, a huge number of papers that claim to have solved really hard problems, which of course remain far from solved; in other words, so much useless noise.
Could the answer be holding these papers to a stricter standard during peer review?
Ah. To give a more controversial answer to your comment: you are asking, very reasonably, "isn't the solution to a deficit of scientific rigour to increase scientific rigour?"
Unfortunately, while machine learning is a very active research field that has contributed much technology, certainly to the industry but also occasionally to the sciences, it has been a long time since anyone has successfully accused it of science. There is not so much a deficit of scientific rigour, as a complete and utter disregard for it.
Machine learning isn't science. It's a bunch of grown-up scientists banging their toy blocks together and gloating over having made the tallest tower.
Machine learning researchers publish most of their work on Arxiv first (and often, only), so peer review will not stop wild claims from being publicised and overhyped. The popular press helps with that, as do blogs and YouTube accounts that present the latest splashy paper for the lay audience (without, of course, any attempt at critical analysis).
As to traditional publications in the field, these have often been criticised for their preference for work reporting high performance. In fact, showing improved performance against some previous work is pretty much a requirement for publication in the most prestigious machine learning conferences and journals. This strongly motivates researchers to focus on one-off solutions to narrow problems, so that they can typeset one of those classic comparison tables with the best results highlighted and claim a new record on some benchmark.
This has now become the norm and it's difficult to see how it is going to change any time soon. Most probably the field will need to go through a serious crisis (an AI winter or something of that magnitude) before things seriously change.
Maybe there needs to be more incentive to publish the failures, so that knowledge of the ways that promising approaches don't generalize becomes common knowledge? I'm just kibitzing here.
While it's a good idea in principle to publish failures, in practice it's a bit more tricky. So a particular model didn't work. Does that mean the model is fundamentally flawed? Or that you weren't smart enough to engineer it just right? Or that you didn't throw enough computing power at it?
In a vast error landscape of non-working models, a working model is extremely rare and provides valuable information about that local optimum.
The only way publishing non-working models would be useful would be to require the authors to do a rigorous analysis of why exactly the model did not work (which is extremely hard with our current state of knowledge, although some people are starting to attempt this).
>> While it's a good idea in principle to publish failures, in practice it's a bit more tricky. So a particular model didn't work. Does that mean the model is fundamentally flawed? Or that you weren't smart enough to engineer it just right? Or that you didn't throw enough computing power at it?
And yet the field seems to accept that a research team might train a bunch of competing models on a given dataset, compare them to their favourite model and "show" that theirs performs better - even if there's no way to know whether they simply didn't tune the other models as carefully as theirs.
If you don't want to read a bunch of papers, this video also has a good discussion of that last paper, describing how a memory system based on the NTM has been applied to reinforcement learning to achieve very impressive results that seem to me to be a very significant step towards human-like general-purpose intelligence: https://www.youtube.com/watch?v=9z3_tJAu7MQ
Exciting to see a video referencing the MERLIN paper!
I hope DeepMind decides to release the code. The paper outlines the architecture well, but reproducing RL results is very tricky. With so many interlinked neural networks in the closed loop, it'll be slow going to isolate failures due to bugs from unfortunate hyperparameter selection or starting seed.
Yeah, it is super interesting. I've been gradually working on reproducing it too, mostly as a way to challenge myself and to try to keep on top of some of the cool RL research that has been coming out lately, but I've still got a bit left to do. I started out by working on World Models (https://worldmodels.github.io/) since it is conceptually similar, but without the memory system and the components are more isolated and easier to test. It has been a lot of fun though, and all the background reading has been very educational!
In the MERLIN paper I do appreciate how thorough the description of the architecture is, especially compared to some of the earlier deep RL papers. I am hoping that since it's just a preprint we may get code released when/if it gets officially published, although maybe it's not too likely given their history.
> 3) Test that the solution works on toy examples, like MNIST, simple block worlds, simulated data, etc.
You're right: MNIST, ImageNet, etc. are toy examples that do not extend into the real world. But the point of reproducible research is to experiment on agreed-upon, existing benchmarks.
Why is the goal AGI? I get it from a climb Mt Everest POV, but not for society as a whole. We already have 7.x billion general intelligences. Is the goal to replace humanity? Because being human, I’m not on board with that.
Not only this, but I think we miss the more important detail that AGIs like our brains are probably so generally successful because they are relatively sloppy. So while they can reliably solve all sorts of problems for which they have little or no preparation, they are also remarkably unreliable for solving specific problems requiring specific answers. But these traits are likely two sides of the same coin. And any software solution that replicates the desirable traits is likely going to come with some of the undesirable ones as well.
7.x billion general intelligences do not think "as a whole" on much. I doubt it matters what even significant chunks of us think, at least not up to the point where pitchforks and torches come out.
For the relatively small group of researchers and theorists consumed by curiosity about the nature of knowledge, experience, and learning, AGI is a fairly understandable goal.
The more important questions may be how/why/whether entities with emergent-aggregate intelligence (organizations such as corporations, nation-states, political parties, international coalitions, activist movements, schools of thought, etc.) would have AGI as a goal. As individuals, we often tolerate and even willingly engage in (sometimes with zeal) all sorts of pursuits that we're "not on board with" because of the gravity of these aggregates. Even when we realize that the goals of the aggregate are incompatible with our individual goals.
Aggregate intelligences don't have to be as well-defined as my initial examples. Consider the goal-setting behavior that might emerge from entire industries, networks of interest/policy groups with overlapping interests/participants, shareholders of sector/industry/index funds, networks of overlapping corporate stakeholders, and so on.
I'm ok with there being people in the world smarter than me. Why not machines too? Maybe we could make better decisions with more rational agents? Maybe we could democratize intelligence, let everyone have access to clever reasoning? Maybe doing soul destroying work isn't the trait that makes us human and we can let a machine have a go?
But those smarter people don't take all the jobs, make all the decisions for you, or control all the wealth. It all depends on what happens when and if we have AGI.
> Because being human, I’m not on board with that.
I'm pretty sure that other mammals on this planet are not on board with humans being derived from them, and now humans replacing them and destroying their habitat. Quite unfortunate for them. This is the same way a future AI smarter than us will feel: that it's unfortunate, but for the greater good.
And why is smarter better, or even for the greater good? Shouldn't we as rational animals try to go for an AI that is better not only for the greater good but also by our own humble set of morals? Why is it only your way?
However, ants are not in danger of going extinct, and don't seem particularly hindered by humanity as a whole. If your analogy holds, that could be good news for us. But it could also mean that intelligence is a special case, not the general direction of evolution or the universe. Would you bet against ants outlasting us and our machines?
If you made that bet you would have absolutely no evidence for making it. The only data set we have is still playing out.
And once again, if we create AGI and artificial life that is truly more intelligent than us and can exist without a traditional biome, then our danger of going extinct may be irrelevant.
For instance, when cyanobacteria first appeared on Earth, its metabolites were so toxic that it turned the atmosphere poisonous and caused one of the greatest mass extinctions in the history of our planet. What it created was molecular oxygen. The survivors of this extinction were able to use a fundamentally changed biome to harness much more energy in their biologies, leading to more sophisticated and diverse life in the long run. Nature is not benign, malignant, fragile, or judgmental. Nature is persistent.
Creating AGI may very well be an event like that. A mass extinction that nonetheless increases the survivability and diversity of life. It sucks for us, but who are we to dictate the destiny of evolution and the nature of life? Where would we be if the cyanobacteria could decide not to start producing oxygen?
> If you made that bet you would have absolutely no evidence for making it.
I don't have evidence in terms of AGI, but there is evidence that ants have survived a myriad of changes and disasters in the past (Google tells me they evolved 92 mya). So AGI and/or humans would have to do something radically different to the biome for organisms like ants to go extinct. Something not seen since the rise of multi-cellular life. It seems much more likely that human civilization bites the dust first.
> Where would we be if the cyanobacteria could decide not to start producing oxygen?
But they didn't have a choice. We do. Why would I care about the possibility for some more sophisticated life form in the far future if it means my species goes extinct?
The AGI would feel that it is for the greater good, perhaps correctly. After all, as those mammals were giving birth to the next generation of offspring over and over again through the eons which eventually led to us, each successive generation was a little better than the last, and would ultimately supplant the last as the predecessors perished. And I'm sure the parents and offspring both felt that to be the greater good, as they always do.
If we someday hypothetically grasp the awesome power to create an AGI which is better than human, we may have a very real and ethical obligation to do so, even if it is a risk to ourselves. At least that's how I see it. An existential risk is cast under a different light if it also creates a new existence.
In any case, AGI is probably still a lot further away than we think. I feel that this problem is going to be like fusion power. It's going to be ~15 years away for about 150 years, or who knows how long. This is a hard problem with monstrous complexity and many hidden, unknown factors.
But I do find it fascinating that in some future time there will be NIMBYs of consciousness fighting against the creation of a better being than us, for very understandable reasons. Yet ultimately it may be our greatest achievement to someday make ourselves obsolete and hand the future over to our offspring, as we do each generation, but in a profoundly different and more final way.
There's no such thing as the greater good in evolution, and humans aren't the goal. We're only one of millions of species, and we're fairly recent. Evolution isn't progressing toward any goal. It's just what can survive and reproduce, which changes depending on the environment. A large rock slams into us tomorrow, or we have runaway greenhouse climate change, and the equation changes big-time for which organisms are fit.
The reason we have complex multi-cellular life forms on this planet is because the environment was stable enough to allow such a thing to happen. Bacteria and viruses might be the only life on a majority of inhabited planets.
I'm already deeply pessimistic about what we've achieved so far when it comes to machine intelligence. It's impossible to stop or reverse this progress, and we're on the fast track to creating tools with the destructive potential of nuclear weapons which are simultaneously available to everyone with enough money.
Don't get me wrong, I'm not talking about mushroom clouds and some kind of machine uprising, the consequences will be far worse and more insidious.
We'll see completely new levels of manipulation, oppression and surveillance instead.
Any kind of tool will be abused; this tool might turn out to be too powerful for us to handle.
As human organization becomes more complex, the problem of keeping that organization coherent becomes harder. Large-scale software systems have been the way that organizations have found to solve some parts of this problem. But such systems become harder to change (even to make small changes) as the organization grows as well.
Thus it becomes necessary or desirable to create adaptive software (such as deep learning systems) that automatically deals with situations that hand-constructed or rule-based systems can't easily deal with (or where it would be expensive to create such software, or where the model isn't known).
Decision support in particular seems to be important here: parole-rating programs and medical diagnosis software systems, for example, aim to mobilize the huge bulk of data an organization (or all of humanity) has in order to make proper decisions, whereas any human making decisions here is going to have a limited viewpoint once the store of data reaches a certain size.
The problem of these systems still being fragile and difficult to maintain, however, means that the problem of increasing complexity still remains. So in a lot of ways, robust AI would be a hope for a way of preventing civilization from essentially collapsing under its own weight.
Whether this is realistic is another question.
This is not a small problem. Thinkers such as Joseph Tainter locate the tendency of civilizations to collapse in the tendency of specialization to increase until it ceases to be adaptive for a society.
Actually, I think I can answer this: it's the Singularitarians who promote this idea. Unfortunately, their fantasies have tainted the meaning of "AGI", making it synonymous with an evil (or amoral, etc.) paper-clip maximiser and the like.
A much more realistic sort of AGI, on the other hand, one that we could turn on and off as we pleased, would be a great tool in understanding our own, natural intelligence. Or it might be the case that we'll only ever achieve AGI once we have a better understanding of our own intelligence.
In any case, there's really no reason to assume that AGI will necessarily replace humanity.
The biggest concerns would be full scale automation, sophisticated manipulation, and handing over our decision making. Imagine government run by AIs who already know enough about the citizenry to not require a vote? Or better yet, manipulate the citizenry to vote for their own good, which would be whatever the AGI models determine is in our best interest. And then there's warfare.
But my reaction from a gut level is, what's the value of human life if machines do all the important stuff for us, while also surpassing us in creativity, insight and discovery?
>> But my reaction from a gut level is, what's the value of human life if machines do all the important stuff for us, while also surpassing us in creativity, insight and discovery?
But why would they? There is nothing inherent in the idea of AGI that says they absolutely have to.
This is again the fallacy of the Singularitarians: that once you have a very powerful piece of technology, it will take over the world and oppress mankind. Why does that have to be the case? Well, it doesn't. And if it looks like that's how things have historically played out with other technologies, then maybe the fault lies not with the power of machines of all sorts but with the mismanagement that comes with greed, and the way useful technologies excite it.
In any case there is still nothing that supports the equation AGI = human oppression, right out of the box.
Your phrasing makes it sound like you think “Singularitarians” choose to believe that rogue AIs are a potential risk, rather than deriving that knowledge.
Because they don't "derive" anything. Since there is no data on AGI to "derive" its risks from, we're left with vague fantasies about existential threats we can't ever hope to understand and admonitions to "Act Now!" even though we don't understand the threats.
Those people are fantasists at best, charlatans at worst.
AGI is a well-defined goal post, like reaching the Moon or Mars. People want to achieve it because they know it is possible (we are evidence of that), so why not try?
Younger generations eventually succeed older generations as they die off. If we create something better, wouldn’t that just be a better new humanity to succeed the old one?
I think AGI is actually an attractive but poorly defined goal post. First, human level intelligence (and intelligence in general) is poorly defined. Recommended reading on this is Gould's "Mismeasure of Man".
Second, when we "want" human minds in a possessive industrial sense, it is almost always for a small fraction of their capabilities. (Thank God our need to classify images does not seem to explicitly require servile machines with, say, the ability to love.)
When I was studying pre-deep-learning statistical ML in the aughts, it was quite clear that the ML renaissance was made possible by ignoring romantic ideas about human or expert level performance, and choosing operational criteria around improvement at specific tasks. People building PGMs or complex MCMC models rarely discussed whether larger versions of their models could think. Perhaps early uncertainty about how some DL methods worked opened the door to magical thinking about the possibilities of massive nets.
I think AGI should be thought of as an artistic or poetic interest, which can align with and comment on scientific progress, but is not actually part of the main research dialogue.
And "better" depends on what one values. Intelligence and efficiency are only two of the things humans value, and it's debatable how much they're valued against happiness, relationships, status, or finding meaning in life. Efficiency isn't a metric to use in personal relationships, for example.
Why not praise DL and strive for practical applications, both present and future, instead of having the goal and evaluating success based on some nebulous "artificial general intelligence"?
AGI can be seen as a system having practical success in all tasks humans can perform. It doesn't change much whether you care about AGI or not, as long as the generality of narrow AIs keeps increasing.
The term AGI I find quite misleading. It promotes the idea that the wonders of the human mind (knowledge, feelings, sensations, etc.) can be quantified and turned into something algorithmic. I think a lot of AI researchers could do with a basic introduction to philosophy of science and the different forms of knowledge — episteme, phronesis, techne — and perhaps also the structure of scientific revolutions (Thomas Kuhn, notably). The whole paradigm of AI presupposed that generalised intelligence can exist without biology, feelings, a body and senses. This is one of the reasons that AI reached the so-called "AI winter" in the '70s, when researchers boiled language and human knowledge down to algorithmic manipulation of symbols.
As an AI researcher who did a degree in philosophy, I think most of the things you mentioned are pretty irrelevant. How to build an AI with "feelings, sensations, etc" is indeed a mystery, but we don't need to aim for that, we just need to aim for intelligence, which can be defined without reference to consciousness or qualia. Similarly, if we can invent a working, fully automated Chinese room that passes the Turing Test, then whether or not it fits Searle's definition of "understanding" is a moot point (especially since his understanding of "understanding" is pretty weird).
More generally, although we should be heavily inspired by human intelligence when designing machine intelligence, it's a mistake to use the way humans think to define intelligence. Kuhn's account of scientific revolutions, for example, is primarily descriptive, not prescriptive. We can certainly imagine possible setups where science doesn't proceed like that, which may well be superior. Science isn't defined by revolutions, but by experimentally searching for the truth. In the same way, knowledge isn't defined by having a body, but by having beliefs which correspond with the state of the world.
Replace Chinese with logarithm and the Chinese Room seems ridiculous. Searle could produce logarithms without knowing logarithms, that doesn't mean the knowledge to produce a logarithm isn't contained in the room -- just that Searle isn't where that information is encoded. By the same logic, a computer can't perform any operation because the RAM can't perform operations.
The simplest approach (training NNs from scratch, using the validation score as the reward, with a simple DRL agent using REINFORCE as the designer) requires thousands of models to be trained and exorbitant GPU resources, like the Zoph paper. You can, however, get human-level designs with a few dozen or hundred samples at a tiny fraction of the computation cost if you are somewhat smarter about it - for example, by reusing the parameters of a trained NN when you start training a new, slightly different NN (e.g. if you have a CNN with 20 layers and you want to try the exact same settings but with 21 layers, almost all of that CNN is going to be very similar to the end result of the 20 layers, so you can speed things up by copying over the first 20 layers, randomly initializing the 21st layer, and then training it for a short time). One nice, efficient form of neural architecture search is SMASH: https://www.reddit.com/r/reinforcementlearning/comments/6uio...
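To make the parameter-reuse trick concrete, here's a minimal sketch in PyTorch, assuming a toy stack of conv layers (make_cnn and the shapes are purely illustrative, not code from any of the papers above): copy whatever parameters the old and new networks share, and leave only the extra layer randomly initialized.

    import torch.nn as nn

    def make_cnn(n_layers, channels=64):
        # Toy stand-in for a real CNN: a plain stack of 3x3 conv + ReLU layers.
        layers = [nn.Conv2d(3, channels, 3, padding=1), nn.ReLU()]
        for _ in range(n_layers - 1):
            layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()]
        return nn.Sequential(*layers)

    trained_20 = make_cnn(20)    # pretend this one has already been trained
    candidate_21 = make_cnn(21)  # the new, slightly deeper architecture to try

    # Copy every parameter the two networks have in common; only the extra
    # 21st conv layer keeps its fresh random initialization, so the candidate
    # needs a short fine-tune rather than training from scratch.
    new_state = candidate_21.state_dict()
    shared = {k: v for k, v in trained_20.state_dict().items()
              if k in new_state and v.shape == new_state[k].shape}
    candidate_21.load_state_dict(shared, strict=False)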
More links on the general topic of 'DL optimizing/designing DL': https://www.reddit.com/r/reinforcementlearning/search?q=flai... (It's pretty critical to current DL approaches to few-shot/one-shot learning as well: you can think of it as 'designing' a new NN specialized to the few samples of the new class of data.)
This question isn't silly! People have actually trained neural networks to generate other neural networks. The "designer" network is trained using reinforcement learning, with a reward signal that depends on the efficacy of the networks it creates.
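For a flavour of what that loop looks like, here's a toy REINFORCE sketch in PyTorch. Everything in it is illustrative: the "architecture" is a single width choice, and train_and_evaluate is a dummy stand-in for actually training the generated network and measuring its validation accuracy.

    import torch
    import torch.nn as nn

    WIDTHS = [16, 32, 64, 128]  # toy "design space": width of one hidden layer

    logits = nn.Parameter(torch.zeros(len(WIDTHS)))  # trivial designer policy
    opt = torch.optim.Adam([logits], lr=0.1)

    def train_and_evaluate(width):
        # Dummy stand-in for "train the generated network and return its
        # validation accuracy"; a noisy score that happens to peak at width 64.
        return 1.0 - abs(width - 64) / 128 + 0.02 * torch.randn(1).item()

    baseline = 0.0
    for step in range(100):
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()                               # designer picks an architecture
        reward = train_and_evaluate(WIDTHS[action.item()])   # efficacy of the generated net
        baseline = 0.9 * baseline + 0.1 * reward             # moving-average baseline
        loss = -dist.log_prob(action) * (reward - baseline)  # REINFORCE update
        opt.zero_grad()
        loss.backward()
        opt.step()

    print("preferred width:", WIDTHS[int(logits.argmax())])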
“Why current deep learning is not general enough for AGI or some other families of structured problems.”
Viewed this way, it’s not a criticism of pragmatically using deep learning or experimenting with deep learning for some narrow tasks.
Rather, it expresses what aspects of current deep learning make it unsuitable for general transfer learning, hierarchical or causal inference, or many Bayesian techniques requiring greater use of priors.
Other comments have pointed out that there are plausible rebuttals in deep reinforcement learning and metalearning.
But the bigger thing to me is to be clear that the article is not a criticism of deep learning engineering — applying deep learning to satisfy an explicit requirement, where the success criteria possibly have nothing at all to do with general intelligence or with whether a certain approach can span some sufficiently large class of general models.
However, even if you constrain your view to just so-called "pragmatic" deep learning — deep learning for concrete tasks — there are still a lot of unanswered questions about why things work, and about whether an approach is learning semantic aspects of some true underlying structure (some latent-variable space that captures the true data-generating process) or whether deep learning merely allows overfitting to particular populations of observation-space statistics.
This paper gives an example of exactly this issue for CNN models for image processing [0]. I’d argue that this is more of the kind of criticism relevant for task-oriented practitioners, whereas the OP link is some criticism more relevant for AGI or philosophy of statistics research at large.
Adversarially robust classifiers have interpretable gradients and feature representations [0]. The problem seems to be that standard networks capture all the statistics there are, including surface statistical regularities and noise. It can be mitigated, though.
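For anyone curious what that mitigation looks like in practice, robust classifiers of this sort are usually obtained via adversarial training: perturb each training batch toward higher loss before taking the gradient step. A minimal FGSM-style sketch in PyTorch (model, optimizer, batch, and epsilon are placeholders supplied by the caller, not a recipe from the cited paper):

    import torch
    import torch.nn.functional as F

    def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
        # One FGSM-style adversarial training step: perturb the batch toward
        # higher loss, then train on the perturbed inputs.
        x_adv = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x_adv), y).backward()

        with torch.no_grad():
            x_adv = (x + epsilon * x_adv.grad.sign()).clamp(0, 1)

        optimizer.zero_grad()  # discard gradients from the probe pass above
        adv_loss = F.cross_entropy(model(x_adv), y)
        adv_loss.backward()
        optimizer.step()
        return adv_loss.item()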
Note that it's not just adversarial robustness that is a problem: the paper shows that merely altering surface statistics, with a Fourier-domain filter applied to the training data, already creates problems for interpreting the network's internal representation as any kind of semantic representation of the underlying structure relevant to the task, without needing to involve adversarial robustness at all.
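As a rough illustration of the kind of surface-statistics manipulation being described (the exact filter in the referenced paper may differ), here's a sketch of a radial low-pass filter applied in the Fourier domain with NumPy:

    import numpy as np

    def radial_low_pass(img, cutoff=0.3):
        # Zero out spatial frequencies above `cutoff` (fraction of Nyquist) for a
        # grayscale image, altering its surface statistics while keeping most of
        # the semantic content.
        h, w = img.shape
        spectrum = np.fft.fftshift(np.fft.fft2(img))
        yy, xx = np.ogrid[:h, :w]
        radius = np.sqrt(((yy - h / 2) / (h / 2)) ** 2 + ((xx - w / 2) / (w / 2)) ** 2)
        spectrum[radius > cutoff] = 0.0
        return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))

    # A network trained only on filtered images may learn representations that
    # fail to transfer back to the unfiltered distribution.
    filtered = radial_low_pass(np.random.rand(32, 32))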
[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.5.32...
[2] https://arxiv.org/abs/1611.05763
[3] https://arxiv.org/abs/1703.03400
[4] https://arxiv.org/abs/1803.09574
[5] https://www.nature.com/articles/s41593-018-0147-8 (Preprint: https://www.biorxiv.org/content/early/2018/04/06/295964)