David Duvenaud, an assistant professor in the same department as Hinton at the University of Toronto, says deep learning has been somewhat like engineering before physics. “Someone writes a paper and says, ‘I made this bridge and it stood up!’ Another guy has a paper: ‘I made this bridge and it fell down—but then I added pillars, and then it stayed up.’ Then pillars are a hot new thing. Someone comes up with arches, and it’s like, ‘Arches are great!’” With physics, he says, “you can actually understand what’s going to work and why.” Only recently, he says, have we begun to move into that phase of actual understanding with artificial intelligence.
Hinton himself says, “Most conferences consist of making minor variations … as opposed to thinking hard and saying, ‘What is it about what we’re doing now that’s really deficient? What does it have difficulty with? Let’s focus on that.’”
Last I checked bridges came before Newtonian Mechanics and it seems strange to argue this wasn't a good thing. Admittedly paper writing wasn't the main mechanism of transmitting knowledge but it's fairly common for human engineering to come before the full theoretical foundations as opposed to after.
It's not that bridges before Newton were bad, it's that Newton gave us the ability to design the strongest possible bridge of a given shape with the materials at hand - using not just calculus but calculus-of-variations, a subject nearly as old as Newtonian mechanics [1]. With this knowledge, what happens when one adds one or two columns to a bridge is no longer "news" the way it might have been before Newtonian mechanics.
A stereotypical picture of an engineering approach without scientific knowledge would be a list of ways to do stuff combined with hints about how to vary the approach per-situation. It requires lots of memorizing, trial-and-error, and experts that often can't fully explain their reasoning. It's easy to believe bridge-building before Newton was like this, though I'm not an expert. Present day AI sounds a lot like this from what I've read (though I'm not an expert here either).
Edit: And yes, one could argue that the progress Newton ushered in merely replaced one list of models with a higher, more general list of models - yes, but that is how progress has gone so far.
I completely agree with your assessment, but the problem is a bit worse in my opinion. We already have a pretty firm grasp of how different ML systems learn and converge towards a solution in the average case. It's not that we need to understand our neural networks better, it's that we need to understand our problem domain better. We can't determine how well some ML architecture will perform at an object recognition problem without some math describing object recognition. This makes things a lot more complicated, because it means we have to do a lot more work to understand every single application where we want to use ML.
And, of course, if we had some really good mathematical framework for describing and reasoning about object recognition, we probably wouldn't need to turn to ML to solve it ;)
The whole point of Deep Learning is that we don't want to describe math behind object recognition; it was the failed "classical" approach where people spent decades figuring out complex features which worked horribly. Deep Learning is actually pretty simple, well understood and parallelizable, and it's basically a billion-dimensional non-linear optimization. As optimization is infested with NP-hard problems, it's as difficult as it gets. It's actually amazing what we can do with it in the real world right now (and we are still far away from seeing all its fruits). Of course, it would frustrate academics that can't base AGI on top of it, but did they really think this approach would do it anyway?
Deep learning does not seem to abstract very well. Train on a data set, then test with images that are simply upside down, and the performance drop can be significant.
Feature extraction also works much better when you toss a lot of data and processing power behind it. So, a lot of progress is simply more data and computing power vs better approaches. Consider how poorly deep learning works when using a single 286.
> Deep learning does not seem to abstract very well. Train on a data set, then test with images that are simply upside down, and the performance drop can be significant.
But that's true of people too. How quickly can you read upside-down?
If you trained on a mixture of upside-down and right way up images, and tested on upside-down images, performance wouldn't take that much of a hit.
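For what it's worth, that kind of augmentation is only a few lines; a minimal numpy sketch (array shapes, the flip fraction, and the model are placeholder assumptions):

    import numpy as np

    def augment_with_flips(images, labels, flip_fraction=0.5, seed=0):
        """Flip a random fraction of the training images upside down.
        Assumes `images` is an (N, H, W, C) float array; labels are unchanged
        because flipping does not change the class."""
        rng = np.random.default_rng(seed)
        out = images.copy()
        mask = rng.random(len(images)) < flip_fraction
        out[mask] = out[mask, ::-1, :, :]   # reverse the height axis = upside down
        return out, labels

    # x_aug, y_aug = augment_with_flips(x_train, y_train)
    # model.fit(x_aug, y_aug)  # same model, now sees both orientations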
Sure, the problem is we are more willing to ignore failures that are similar to how we fail. IMO, when we compare AI approach X vs. Y we need to consider absolute performance not just performance similar to human performance.
Deep learning for example gains a lot from texture detection in images. But, that also makes it really easy to fool.
While I can't easily read upside down text, I can instantly recognize it as not only text, but as text that needs to be flipped upside down in order to be read. That's something current "deep learning" AIs can't do reliably, if at all.
If I had to describe the root cause of this problem it would be that humans process "problems" rather than "things" and we "learn" by building an ever growing mental library of problem solving algorithms. As we continue to "learn", we refine our problem solving algorithms to be more general than specific. Compare that to a deep learning AI that learns by building an ever greater data library of things while refining algorithms to suit ever more specific use cases.
I think you're describing a level of generalization above the application at hand. We could easily train a neural network to recognize the orientation of a font, and then build an orientation invariant "reading" app by first recognizing the rotation of the text, transforming it so it is right side up, and then recognizing as normal.
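Roughly, that two-stage pipeline might look like this (both models are hypothetical placeholders with a Keras-style predict interface; the rotation labeling convention is an assumption):

    import numpy as np

    def read_any_orientation(image, orientation_model, reader_model):
        """First predict how many quarter turns the image has been rotated
        (4-way classifier: 0, 90, 180, 270 degrees), undo the rotation,
        then run the ordinary right-side-up recognizer."""
        probs = orientation_model.predict(image[np.newaxis])[0]
        quarter_turns = int(np.argmax(probs))       # assumed label: turns needed to restore upright
        upright = np.rot90(image, k=quarter_turns)  # rotate back to canonical orientation
        return reader_model.predict(upright[np.newaxis])[0]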
I tend to imagine our brains work similarly. It's not that you have a single "network" in your brain that recognizes text from all angles, but that your brain is a "general purpose" machine with many networks that work together. I think current deep learning techniques are great for discrete tasks, and the improvement needed is to have many networks that work together properly with some form of intuition as to what should be done with the information at hand.
> It's not that we need to understand our neural networks better, it's that we need to understand our problem domain better.
How 'bout "creating models that can work with more dimensions of the problem domain than are conveyed by standard data labeling"?
I mean, we don't simply want AI but actually "need" it, in the sense that problems like biological systems are too complex to understand without artificial enhancements to our comprehension processes - thus to "understand the problem domain better" we need AI. If it's true that "to build AI, we need to understand the problem domain better", it leaves us stuck in a chicken-and-egg problem. That might be the case, but if we're going to find a way out, we are going to need to build tools in the fashion humans have used to solve problems many times before.
It will probably play out like a conversation. A data scientist trains an ML model, and in analyzing the results discovers some intrinsic property or invariant of the problem domain. The scientist can then encode that information into the model and retrain. And that goes on and on, each time providing more accurate results.
As an aside, I think it's important that we find a way to examine and inspect how an ML model "works". If you have some neural network that does really well at the problem, it would be nice if you could somehow peer into it and explain, in human terms, what insight the model has made into the problem. That might not be feasible with neural networks, as they're really just a bunch of weights in a matrix, but this is practical for something like decision trees. Just food for thought.
This is somewhat practical for neural networks. For example, instead of minimizing the loss function, why not tweak the input to maximize a neuron’s activation? Or with a CNN, maximize the sum of a kernel’s channel? This would tell us what the neuron corresponds with. This is what Google did with DeepDream.
Now, I say somewhat because results can be visually confusing, e.g. Google's analysis. Even then, we can see the progression of layer complexity as we go deeper into the ImageNet-trained network. Plus, we can see mixed4b_5x5_bottleneck_pre_relu has kernels that seem to correspond with noses and eyes. mixed_4d_5x5_pre_relu has a kernel that seems to correspond with cat faces.
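For anyone curious, the basic recipe is just gradient ascent on the input image; a rough PyTorch sketch (the pretrained net, layer index and channel below are arbitrary placeholders, not the model Google used):

    import torch
    from torchvision import models

    # Any pretrained conv net will do for this sketch.
    net = models.vgg16(pretrained=True).features.eval()

    def maximize_channel(layer_idx, channel, steps=100, lr=0.05):
        """Gradient-ascend a random image so that the mean activation of one
        channel at `layer_idx` becomes as large as possible (DeepDream-style)."""
        img = torch.randn(1, 3, 224, 224, requires_grad=True)
        opt = torch.optim.Adam([img], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            x = img
            for i, layer in enumerate(net):
                x = layer(x)
                if i == layer_idx:
                    break
            loss = -x[0, channel].mean()  # negative because the optimizer minimizes
            loss.backward()
            opt.step()
        return img.detach()

    # e.g. see what channel 42 of an early conv block responds to
    # pattern = maximize_channel(layer_idx=10, channel=42)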
> A data scientist trains an ML model, and in analyzing the results discovers some intrinsic property or invariant of the problem domain. The scientist can then encode that information into the model and retrain. And that goes on and on, each time providing more accurate results.
Mmmaybe,
It's tricky to articulate what pattern the data scientist could see ... that an automated system couldn't see. Or otherwise, perhaps the whole "loop" could be automated. Or possibly the original neural net already finds all the patterns available, and what's left can't be interpreted.
The human participant may consider multiple distinct machine results, each a point in the space of algorithms, data sets, and biases applied to the problem domain. Human intuition is injected into the process, and the result will be greater than the sum of the machines and a lone human mind.
What is interesting to note, now that the above idea is considered, is that this process model itself belongs to the set of human-machine coordinations. Another process model is one where low-level human cognition is used to perform recognition tasks too hard (or too slow) for a machine to perform, for example using porn surfers to perform computation tasks via e.g. captcha-like puzzles.
The long-term social ramifications of all this are also interesting to consider, as it motivates machines to breed distinct types of humans ;)
I imagine you need the data science to discern semantically relevant from irrelevant signals. How else do you “tell” your model what to look for? You could easily train for an irrelevant but fitting model.
It's worth remembering that many times progress is held back by ideas that "aren't even wrong." The Perceptrons book wasn't wrong; it just attacked the wrong questions with an inadequate level of certainty in its assumptions. It may be that we feel that we understand where machine learning is at now, but actually have a huge amount to learn because of inadequacies that we aren't even aware of.
Present day AI on the Deep Learning side is a lot like what you describe. We haven't really had the Newtonian foundations yet. The theoretical foundations are quite limited because they are hard to figure out. But the techniques with less established theory work far better on most applications in AI. Redirecting work into areas of AI that have more solid theoretical foundations but worse application performance is not the way forward. I'm all for figuring out hard theoretical foundations, but I'm strongly opposed to redirecting research funding to techniques that result in worse applications.

I'd also argue that modelling the physics isn't always the right approach: vocal tract modelling for speech is an interesting approach that produces much worse speech than state of the art synthesis techniques, and it will probably continue to do so for a long time. For vocal tract modelling to produce better synthesis you'd need the physical model to be less lossy in all its parameterizations and modelling simplifications than any statistical fitting of data. And you'd still need some statistical model of the choices the human makes in producing speech, and you'd want that statistical model to work better than the neural network that takes on a larger portion of the problem and replaces the physical model of sound production.
You're right that the analogy doesn't imply that something analogous to physics is the answer.
However, I would mention that there's a larger "overhead" than many realize to methods which work without the creator or the user understanding why. You have "racist" AIs which don't understand that correlation may not be causation in questions like whether someone should be paroled or get a loan, you have AIs subject to adversarial attacks of various sorts, where not knowing why the AI works is also problematic, and you have situations where the target to match varies over time, and so forth.
Which adds up to AI having more dimensions to it than simply "working well" and "working less well". Indeed, AI is effectively ad-hoc statistics with results derived heuristically.
So while, in the process of "getting things right", exploring all sorts of things certainly sounds good, it seems like there's an "understanding gap" that needs to be closed; some broader model of what's happening would be useful, though naturally there's no guarantee we can find one.
I should point out in this case it's almost certainly a genuine call to research the foundations underlying the working techniques more as Duvenaud publishes research using mostly the techniques that work well on applications.
It's been a long time since I got my BE in Mechanical Engineering, but I still remember being struck by the difference between well-understood engineering and rule-of-thumb engineering.
Bridge building is mostly well-understood engineering. When you study Static Mechanics [0] you learn all sorts of Physics equations, including Newtonian Mechanics, that completely describe the forces and motions of a structure based on measurable physical properties of the materials used and details about the shapes of those materials.
When you get into Fluid Dynamics, things are different. You start to encounter a bunch of things like the Reynolds number [1], a dimensionless value related to turbulence that you compute from properties you look up for the particular fluids and velocities you're working with. This number is pretty well defined, but there are a lot of others, and their definitions and meanings aren't nearly as clear as F=ma. Back when I was in school, particle simulations for turbulent fluids were just beginning to be feasible, so to design something you plugged in dimensionless constants and didn't worry about the unpredictable fine details. An example of this is the wind blowing through a bridge's structure, and water flowing around its base. The equations don't give you the exact forces that the turbulent air and water will exert; they give you more of an average over time. A simulation, if you can do it, can show you things (like resonance) that the equations won't show you.
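For a sense of the kind of calculation involved (purely illustrative numbers, not a design calculation):

    # Reynolds number: Re = rho * v * L / mu (dimensionless).
    # Rough values for river water flowing past a 2 m wide bridge pier.
    rho = 1000.0   # water density, kg/m^3
    mu = 1.0e-3    # dynamic viscosity of water, Pa*s
    v = 1.5        # flow speed, m/s
    L = 2.0        # characteristic length (pier width), m

    Re = rho * v * L / mu
    print(Re)      # ~3e6, far into the turbulent regime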
Then there was Strength of Materials. Here, the big thing was the Factor of Safety [2]. This is solidly in the rule-of-thumb engineering camp. This is where the engineer says "I think two 16" steel beams would be sufficient... so let's use three 20" beams just to be sure." This is still the way a lot of engineering design is done, because the real world is never precisely known, and the factor of safety will save you when something unexpected happens.
The "rule of thumb" engineering that you speak of made me remember the different constants that were taught we should just accept as is because, well, it is considered constant. Nevermind where the guy in the book got it from, this is what works and this is what people in the industry has accepted to be standard.
But a good degree should provide you with at least some of the insight and intellectual equipment to check for yourself or to smell a rat when you encounter a complex situation so that you can call for help rather than watch in horror as the house collapses on your clients.
OTOH, we are merely around Year Five of deep reinforcement learning research.
It started with a cluster of 16,000 CPU cores that taught itself to recognize a cat 95% of the time after training on millions of unlabeled YouTube images.
And we are now at One-Shot Imitation Learning, "a general system that can turn any demonstrations into robust policies that can accomplish an overwhelming variety of tasks".
Not really, no. Saying we are at year five of Deep RL is about as informative as saying we are at year five of deep learning. Reinforcement learning as a field goes back decades.
But now we have GPUs, which makes it entirely different. /s
And it kinda does, but in an engineering way rather than a statistics way.
Reinforcement learning from pixels, for example, is pretty new (I would be really interested if you have 10+ year old citations), and pretty amazing. I've been looking at RL (through OpenAI Gym) and realising that I "just" need to annotate a bunch of images and then train a network that will predict fire/no-fire in Doom from those pixels, and that I can then add another network that builds some history onto this net (like an RNN) - and this might actually work, which is kinda amazing.
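The kind of network I have in mind is nothing fancy; roughly this sketch (PyTorch, with frame size, layer sizes and the fire/no-fire framing all made up for illustration - not a working Doom agent):

    import torch
    import torch.nn as nn

    class PixelPolicy(nn.Module):
        """Toy CNN that looks at each frame, plus a GRU that carries history
        across frames, ending in a fire / no-fire decision per frame."""
        def __init__(self, hidden=128):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
                nn.Flatten(),
            )
            self.rnn = nn.GRU(input_size=32 * 9 * 9, hidden_size=hidden, batch_first=True)
            self.head = nn.Linear(hidden, 2)    # logits for [no fire, fire]

        def forward(self, frames, h=None):
            # frames: (batch, time, 3, 84, 84)
            b, t = frames.shape[:2]
            feats = self.conv(frames.reshape(b * t, 3, 84, 84)).reshape(b, t, -1)
            out, h = self.rnn(feats, h)
            return self.head(out), h            # per-frame action logits + recurrent state

    logits, _ = PixelPolicy()(torch.randn(2, 5, 3, 84, 84))  # quick shape check on random "frames"
    print(logits.shape)                                      # torch.Size([2, 5, 2])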
I'm still not sure I believe that it's always a good approach, but some of my initial experiments with my own (mostly image so far) data have been pretty promising.
The hype is pretty annoying though, especially if you've been interested in these things for years.
The bar to entry for these kinds of applications has been significantly lowered, which means we'll see more of it. I guess, in some sense, it's similar to the explosion of computer programs following the advent of personal computers (maybe, I haven't thought deeply about this part).
I'd like to believe that GPUs and the cloud might allow for more scientific exploration of the "hows" of learning via many small experiments, gradually revealing limitations and characteristics until finally yielding insight.
Using high speed hardware can allow someone to do tens or scores of runs a day. If you are doing one run every 2 weeks or so, then it's really, really hard to make any progress at all because you daren't take risks. So the productivity of 80 a day vs 2 per month isn't just 100x, it's lots and lots more.
Also as you say it's lowered the bar which means that teams can onboard grad students and interns and get them to do something that's useful - it may be trivial - but it's useful.
It's easy to recognize a cat 95% of the time. I can write a program in 30 seconds that will recognize a cat 95% of the time. No, wait, this just in! My program will recognize a cat 100% of the time! The program has just one line:
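    print("It's a cat!")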
Tutorial: So, with that program, whenever the picture is a cat, the program DOES recognize it. So the program DOES recognize a cat 100% of the time. The OP only claimed 95% of the time.
Uh, we need TWO (2), that's TWO numbers:
conditional probability of recognizing a cat when there is one (detection rate)
conditional probability of claiming there is a cat when there isn't one.
The second is the false alarm rate or the conditional probability of a false alarm or the conditional probability of Type I error or the significance level of the test or the p-value, the most heavily used quantity in all of statistics.
One minus the detection rate is the conditional probability of Type II error.
Typically we can adjust the false alarm rate, and, if we are willing to accept a higher false alarm rate, then we can get a higher detection rate.
With my little program, the false alarm rate is also 100%. So, as a detector, my little program is worthless. But the program does have a 100% detection rate, and that's 5% better than the OP claimed.
If we focus ONLY on the detection rate, that is, recognizing a cat when there is one, then it's easy to get a 100% detection rate with just a trivial test -- just say everything is a cat as I did.
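In code, the two numbers for my little always-say-cat program (pure Python, toy data):

    def detection_and_false_alarm(predictions, truths):
        """predictions/truths are booleans answering 'is this a cat?'"""
        hits         = sum(p and t for p, t in zip(predictions, truths))
        misses       = sum(not p and t for p, t in zip(predictions, truths))
        false_alarms = sum(p and not t for p, t in zip(predictions, truths))
        rejections   = sum(not p and not t for p, t in zip(predictions, truths))
        detection_rate   = hits / (hits + misses)                       # P(say cat | cat)
        false_alarm_rate = false_alarms / (false_alarms + rejections)   # P(say cat | no cat)
        return detection_rate, false_alarm_rate

    truths = [True] * 50 + [False] * 50
    always_cat = [True] * 100                               # the one-line "detector"
    print(detection_and_false_alarm(always_cat, truths))    # (1.0, 1.0)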
What's tricky is to have the detection rate high and the false alarm rate low. The best way to do that is in the classic Neyman-Pearson lemma. A good proof is possible using the Hahn decomposition from the Radon-Nikodym theorem in measure theory with the famous proof by von Neumann in W. Rudin, Real and Complex Analysis.
My little program was correct and not a joke.
Again, to evaluate a detector, need TWO, that's two, or 1 + 1 = 2 numbers.
What about a detector that is overall 95% correct? That's easy, too: Just show my detector cats 95% of the time.
If we are to be good at computer science, data science, ML/AI, and dip our toes into a huge ocean of beautifully done applied math, then we need to understand Type I and Type II errors. Sorry 'bout that.
Here is statistical hypothesis testing 101 in a nutshell:

Say you have a kitty cat and your vet does a blood count, say, whatever that is, and gets a number. Now you want to know if your cat is sick or healthy.

Okay. From a lot of data on what appear to be healthy cats, we know what the probability distribution is for the blood count number.

So, we make a hypothesis that our cat is healthy. So, with this hypothesis, presto, bingo, we know the distribution of the number we got. We call this the null hypothesis because we are assuming that the situation is null, that is, nothing wrong, that is, that our cat is healthy.

Now, suppose our number falls way out in a tail of that distribution.

So, we say, either (A) our cat is healthy and we have observed something rare or (B) the rare is too rare for us to believe, and we reject the null hypothesis and conclude that our cat is sick.
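In code, with made-up numbers for the healthy-cat distribution (scipy, purely illustrative):

    from scipy import stats

    healthy = stats.norm(loc=8.0, scale=1.5)  # invented null: healthy blood counts ~ N(8, 1.5)

    observed = 12.1                 # our cat's number
    p_value = healthy.sf(observed)  # P(count >= observed | healthy): upper-tail test
    alpha = 0.01                    # the false alarm rate we are willing to accept

    if p_value < alpha:
        print(f"p = {p_value:.4f}: too rare to believe; reject 'healthy', call the cat sick")
    else:
        print(f"p = {p_value:.4f}: consistent with a healthy cat")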
Historically that worked great for testing whether a roulette wheel was crooked.
So, as many before you, if you think about that little procedure too long, then you start to have questions! A lot of good math people don't believe statistical hypothesis testing; typically if it is their father, mother, wife, cat, son, or daughter, they DO start to believe!
Issues:
(1) Which tail of the distribution, the left or the right? Maybe in some context with some more information, we will know. E.g., for blood pressure for the elderly, we consider the upper tail, that is, blood pressure too high. For a sick patient, maybe we consider blood pressure too low, unless they are sick from, say, cocaine, in which case we may consider too high. So, which tail to use is not specified in the little two-step dance I gave. Hmm, purists may be offended, often the case in statistics looked at too carefully! But, again, if it's your dear, total angel of a perfect daughter, then ...!
(2) If we have data on healthy kitty cats, what about also sick ones? Could we use that data? Yes, and we should. But in some real situations all we have a shot at getting is the data on the healthy -- e.g., maybe we have oceans of data on the healthy case (e.g., a high end server farm) but darned little data on the sick cases, e.g., the next really obscure virus attack.
(3) Why the tails at all? Why not just any area of low probability? Hmm .... Partly because we worship at the altar of central tendency?
Another reason is a bit heuristic: By going for the tails, for any selected false alarm rate, we maximize the area of our detection rate.
Okay, then we could generalize that to multidimensional data, e.g., as we might get from several variables from a kitty cat, dear angel of a perfect daughter, or a big server farm. That is, the distribution of the data in the healthy case looks like the Catskill Mountains. Then we pour in water to create lakes (assume they all seek the same level). The false alarm rate is the probability of the ground area under the lakes. A detection is a point in a lake. For a lower false alarm rate, we drain out some of the water. We maximize the geographical area for the false alarm rate we are willing to tolerate.
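In code, the "pour in water" step is just a density threshold chosen to hold a given false alarm rate; a toy numpy sketch with an invented two-bump density:

    import numpy as np

    # Invented 2-D "healthy" density on a grid: two bumps, a little mountain range.
    x, y = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-4, 4, 200))
    density = np.exp(-((x - 1)**2 + y**2)) + 0.7 * np.exp(-((x + 2)**2 + (y - 1)**2))
    density /= density.sum()                 # normalize to a probability mass on the grid

    alpha = 0.01                             # false alarm rate we will tolerate
    order = np.argsort(density, axis=None)   # grid cells from lowest density up
    cum = np.cumsum(density.flat[order])
    threshold = density.flat[order[np.searchsorted(cum, alpha)]]

    lakes = density < threshold              # lowest-density region holding probability ~alpha
    # A new observation landing in `lakes` is declared a detection; a healthy
    # observation lands there with probability about alpha.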
Well, I cheated -- that same nutshell also covers some of semester 102.
For more, the big name is E. Lehmann, long at Berkeley.
It can be done, even today. If you work outside the US and work on cheap things (i.e. no special equipment), and especially if you can teach, then you can hang around for a long time.
I have met a lot of academics like this over the years, but I think your broader point might be that this is not possible today, which I agree with, and which is why I left academia (modulo personal situations).
Humanity used fire for a long time before combustion was understood. Even today Anesthesia is not well understood at the biological/physiological level, but that has not stopped its safe use and innovation through Clinical Trials. Maybe competitions and empiricism are the best approaches to building intelligent systems. Why get caught up in Physics/Math envy?
I took his point not to criticize those early stages, but simply to acknowledge them as such. Early fire users could not have built a rocket no matter how many experiments they performed until they understood combustion (and some other sciences).
In AI, we're not building rockets yet, but we have some really awesome and really powerful bonfires or whatever.
At least that's how I understood his point.
(And in anesthesia, when we do understand those things, we may very well look on our use today as barbaric or dangerous.)
Hinton isn't saying "Let's stop using fire," but "Let's understand the principles behind fire so we can use them in more sophisticated, informed and powerful ways."
The ML community did take the Theory approach, trying to prove bounds, SLT/SRM, PAC, etc., and that was an exercise in futility. While I don't deny that there is value in looking under the hood, for a long while the community abandoned any empirical results that didn't fit their paradigm. Between rigorously validating their methods and writing yet another 4-page-long proof, a lot of researchers would prefer the latter, effectively locking out empirical approaches from most dissemination venues and eventually funding.
Because an improved theoretical understanding of how complex systems work can be incredibly valuable. Anesthesia is a good example - we get a lot of value from it, yes, but it would be way better if we could tailor dosages to individuals based on an understanding of how that individual will experience pain. There would be fewer severe complications, but also maybe you could wake up refreshed an hour after surgery instead of in a stupor.
If this could work with computer-trained models, that would be incredible too. What could a great speech understanding system teach us about language? What tricks from a facial-expression classifier could help autistic kids understand their friends?
The biggest deficiency in AI is that we still don't have artificial systems which simulate human thought with any fidelity. Sooner or later that's bound to become a focus of attention.
Except that this has really only been going on for five years, which is nothing on the scale of human history or even of human rational thought. A record number of people/scientists are working on getting that physics-level understanding to happen, with record-breaking numbers of people, year after year, publishing and attending scientific conferences that now fill up in like two days. It is happening, and will happen even more in depth as time goes on.
The fundamental problem with AI is the high dimensionality of the solution space. We simply can't understand why the brains we are building can think better than us. We can build smarter brains only by trial and error - at least until error outsmarts us, reproduces and takes over.
I feel like Hofstadter was one of those people thinking really deeply about AI.
Anyone who doesn't know what I'm talking about should read 'Goedel, Escher, Bach', or 'Fluid Analogies'. I haven't read them in a long while, but I'm sure they're going to be relevant for decades, because they deal with the fundamental challenge of what it means to think. Backpropagation may be part of the puzzle, but the brain (and intelligence) is so much more than that.
Hinton's quote is taken a bit out of context though. I just watched his interview on Andrew Ng's "Neural Networks and Deep Learning" class on Coursera and he seemed convinced that the next "breakthrough" will come from (a variant on) neural networks.
But maybe there are no universal laws that govern AI like physics governs bridges? AI is something that finds universal laws in stuff - there is no meta level over this - all the meta is AI itself.
That's certainly one way to do it. However, we didn't succeed at building modern aircraft or earth moving machinery by building simulations of birds or muscles. There's enough that is unknown out there for a variety of approaches.
> 1) learn how the brain works 2) build a simulator
I disagree that step #1 is important.
Consider the "Air-foil", which led to flight. In one sense, its an approximation of the wings of birds and other animals.
But ultimately, the discovery that the "Air-foil" shape turns sideways blowing wind into an upward force now called "lift" is completely different from how most people understand bird wings.
Bird Wings flap, but Airplane Air Foils do not.
--------
Another example: Neural Networks are one of the best mathematical simulations of the human brain (as we understand it, as well as a few simplifications to make Artificial Neural Networks possible to run on modern GPUs / CPUs).
However, the big advances in "Game AI" the past few years are:
1. Monte Carlo Tree Search -- AlphaGo (although some of it is Neural Network training, the MCTS is the core of the algorithm)
2. Counterfactual Regret Minimization -- The Poker AI that out-bluffed humans
There are other methodologies which have proven very successful, despite little to no biological roots. IIRC, Bayesian Inference is a widely deployed machine learning technique (for some definition of Machine Learning at least), but has almost nothing to do with how a human brain works.
An interesting field of AI is "Genetic Algorithms", which achieve machine learning with biological roots, though not anything based on the biology of brains. Overall, a "Genetic Algorithm" is really just a randomized search in a multidimensional problem space, but the idea of it was inspired by Darwinian Evolution.
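A stripped-down version of that randomized search, mutation-only for brevity (so really closer to an evolution strategy than a full GA with crossover), looks something like:

    import numpy as np

    def toy_evolutionary_search(fitness, dim=10, pop_size=50, generations=200, sigma=0.1, seed=0):
        """Keep the fittest half of the population, refill it with mutated
        copies, repeat. `fitness` maps a vector to a score to maximize."""
        rng = np.random.default_rng(seed)
        pop = rng.normal(size=(pop_size, dim))
        for _ in range(generations):
            scores = np.array([fitness(ind) for ind in pop])
            parents = pop[np.argsort(scores)[-pop_size // 2:]]                  # selection
            children = parents + rng.normal(scale=sigma, size=parents.shape)    # mutation
            pop = np.vstack([parents, children])
        return max(pop, key=fitness)

    # Toy fitness with a known optimum at the all-ones vector.
    best = toy_evolutionary_search(lambda v: -np.sum((v - 1.0) ** 2))
    print(best.round(2))   # should end up near [1, 1, ..., 1]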
> Monte Carlo Tree Search -- AlphaGo (although some of it is Neural Network training, the MCTS is the core of the algorithm)
AFAIK, this is not correct. Many of the Go playing algorithms before AlphaGo used MCTS or some variant. The true breakthrough of AlphaGo was deep reinforcement learning.
> AlphaGo's performance without search
The AlphaGo team then tested the performance of the policy networks. At each move, they chose the actions that were predicted by the policy networks to give the highest likelihood of a win. Using this strategy, each move took only 3 ms to compute. They tested their best-performing policy network against Pachi, the strongest open-source Go program, and which relies on 100,000 simulations of MCTS at each turn. AlphaGo's policy network won 85% of the games against Pachi! I find this result truly remarkable. A fast feed-forward architecture (a convolutional network) was able to outperform a system that relies extensively on search.
https://www.tastehit.com/blog/google-deepmind-alphago-how-it...
I don't know whether AlphaGo Master (the next version of AlphaGo that was trained purely with self-played games and has not been beaten in 60+ games) even uses MCTS.
That said, I agree that learning how the brain works seems unimportant and unnecessary. Evolution doesn't know how a brain works, but it's given us Einstein, Michelangelo, and conversations on HN.
It seems really important to learn how to build evolution into attempts at AI, given that evolution is the only known mechanism that leads to what we recognize as intelligence.
You use anthropomorphism to reflect on your own standpoint. We don't know how the brain works? We can feel it, and psychologists have a huge body of work concerned with the topic, and that is already having an influence on competition and fitness.
Consider the "Air-foil", which led to flight. In one sense, its an approximation of the wings of birds and other animal
Not true; "lift" was well known for thousands of years, horizontal "lift" is how ships sail upwind. The breakthrough for the Wright bros was making something light enough to make use of this phenomenon vertically.
Medical research hasn't cracked step 1 either, at least not to a point of accurate simulation.
Besides, if you could simulate a human brain, you will end up with something that needs to sleep, something with limited and unreliable memory, something that gets bored and distracted, something emotionally needy, etc.
Then how to extend this chaotic, messy system is wildly unknown, even if we could get a piece-for-piece replication to work. Such a thing would be of great benefit to medicine, but not really a starting point for AI until medicine is done reverse engineering it.
Piece-for-piece replication might not be the right level of abstraction. The Blue Brain project is one unfortunate example; on the other hand, current neural nets are stuck with a neuron model from 1943.
In defence of AI researchers, step 1 is very, very hard, and to the best of our knowledge there is not one way the brain works. The brain is a complex, cobbled-together set of systems all using different ways of problem solving.
Most AI researchers have never opened a textbook on cognitive psychology or neurobiology, or any of these 'soft' sciences.
How do you plan to build artificial intelligence with no model of intelligence, without learning about important experiments in learning and memory? It's the complete ignorance that drives me crazy.
Most of those experts aren't looking to solve general AI problems, they're looking for solutions to specific problems like basic image recognition. And you don't need a full human brain to do that, and you don't need to conform to the way humans and other biological systems do it. You're not aiming for full human intelligence, so you don't need to care too much about how humans learn.
That said, I find when trying to solve a problem with ML techniques, it's better to use someone who knows the problem domain really well than someone who only knows ML really well. Someone who really understands the problem they're trying to solve can encode that knowledge into their models when training the system. While I've seen people who really know ML but lack the specific domain knowledge labor for weeks, coming back to me with "discoveries" that are already well known.
Yes all of this is true. I do think studying how the brain works will provide very useful ideas of what might work in AI. At the very least it is a very interesting area to learn about.
We know the brain and its associated sensory behaviours are too large for us to fully simulate in a reasonable way on anything resembling current hardware. (We also can't fully model it, but as we approached the hardware scale needed to simulate it, we'd probably solve many of the modelling problems along the way.) So which hacks and shortcuts do you want to apply to reduce the dimensionality to something runnable? Step 1) will take far too long, so AI research looks for things it can do well in the category of 2) without being a full simulation. Deep Learning has been unreasonably effective here.
"engineering before physics" is exactly wrong. No one did Engineering before a sophisticated understanding of Physics was achieved. They built bridges and towers, Engineering enables statements to be made about the performance of machines and buildings; it will survive a wind like x, you can do n cycles, do not load the wings in this way.
Translate a first year engineering paper on structures into Latin.
Ask Roman to sit said paper.
What will happen and why? The Roman chap will look very confused and will make statements (in Latin) about how stupid this stuff is and how it has nothing to do with proper engineering. The Roman will score 0. The why is that the understanding of structures and materials in the ancient world was artisanal, based on trade knowledge (often secret and hard to reproduce), and not systematic, based on the scientific method and inspectable or testable.
Currently we accept that knives, cabinets and sheds may be built or made using artisanal knowledge, we do not accept that apartment blocks, aircraft or automobiles are built this way. Society insists that these are built using systematic knowledge because otherwise they sometimes fall down or crash.
The systematic approach to aircraft is the best example - think how much civil air traffic there is now, and how rare air crashes are. The issues of subsonic flight have been systematically accounted for, right up to the point where we now see on the order of one crash per 2,000,000 flights.
Mechanical, aeronautical and civil engineering proceed in this way. Issues are discovered with mechanisms or structures or materials, these are characterized with scientific investigation, the characterizations lead to constraints and parameters that are required to be accounted for in future designs and old designs are re-evaluated in the light of the new knowledge.
Stating that you will build a new building in a certain way because domes are strong and concrete is strong would not cut the mustard in the modern world... The Pantheon has stood for nearly 2000 years, but how many similar structures collapsed after a few months?
I think you underestimate how smart your ancestors were to bring you to the point in time that you now exist. No offense, but the "best" Roman engineer was probably smarter than the vast majority of us.
There is a bit of "can't see the forest for the trees" failure in the article. AI is spearheading a paradigm shift in how we write programs. Or rather, we don't write programs. We write much much shorter programs that search the program space for programs that satisfy some desiderata.
The programs we get as the output of the search process are extremely flexible, work very well, are very homogeneous in compute (e.g. conv/relu stacks), and never crash or leak memory. These are huge benefits compared to classical programs.
So sure, backprop (the credit assignment scheme that gives us a good search direction in program space, one of multiple techniques that could do so) is pervasive, but AI is starting to work primarily as a result of a deeper epiphany - that we are not very good at all at writing code.
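A toy illustration of that "short program that searches for the program": instead of hand-coding XOR, we let backprop find parameters that satisfy the desideratum (PyTorch, purely illustrative):

    import torch
    import torch.nn as nn

    x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = torch.tensor([[0.], [1.], [1.], [0.]])   # the desideratum: behave like XOR

    net = nn.Sequential(nn.Linear(2, 8), nn.Tanh(), nn.Linear(8, 1), nn.Sigmoid())
    opt = torch.optim.Adam(net.parameters(), lr=0.05)

    for _ in range(2000):                        # the search loop, guided by backprop
        opt.zero_grad()
        loss = nn.functional.binary_cross_entropy(net(x), y)
        loss.backward()
        opt.step()

    print(net(x).detach().round())               # the found "program" now computes XOR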
>So sure, backprop (the credit assignment scheme that gives us a good search direction in program space, one of multiple techniques that could do so) is pervasive, but AI is starting to work primarily as a result of a deeper epiphany - that we are not very good at all at writing code.
Isn't it applicable only to a class of programs, the best example of which is Computer Vision? Or do you mean your argument to hold for a wider set of programs? I can think of a large set of programs in which direct coding of logic, rather than discovery, is better suited.
For example, take sorting. I guess sorting could also be taught to the machine by having a training set. But what about the latency of the discovered program? Also, what about the proof of such a sorting program, which is discovered by machine learning?
I must add that I largely agree with your excellent point regarding the discovery of programs. But I am not sure about its wide applicability. In fact, I contend that it applies only to a subset of all programs - particularly those which have been traditionally difficult to code.
So in that sense - making a tangential point here - it is good that more complex applications are now possible by combining both kinds of programs. And there will be more programming work in the future.
Daniel Hillis wrote a nice paper on machine-learned sorting networks decades ago. I think they were for fixed-size inputs, but the good news is that you can validate them using the 0-1 principle.
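The validation side is pleasantly small; a sketch of the 0-1 check for a hand-picked (not machine-learned) 4-input network:

    from itertools import product

    def sorts_all_binary_inputs(network, n):
        """0-1 principle: a comparator network on n wires sorts every input
        iff it sorts every input of 0s and 1s, so checking 2**n cases suffices."""
        for bits in product([0, 1], repeat=n):
            wires = list(bits)
            for i, j in network:                  # each comparator swaps if out of order
                if wires[i] > wires[j]:
                    wires[i], wires[j] = wires[j], wires[i]
            if wires != sorted(wires):
                return False
        return True

    net4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]   # a known 5-comparator network for 4 inputs
    print(sorts_all_binary_inputs(net4, 4))           # True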
Just a quick question/remark: I have a feeling that it's best to think about DNNs as approximating a function, not a program. (only then you obtain a program as a result of applying this function). But because mathematically, you can formalise your big NN as one big parameterised function, I think it's more correct to view a NN as a function...
Optimisation in program space would be trying to find both the structure (connections and activation functions) and weights of the NN, which is not what we do currently. We tend to hand/engineer (or keep what works best empirically) the architecture (1), then train by finding the best weights.
I am very interested in approaches to efficiently optimise in program space, and DL/backprop doesn't feel like it.
I'm trying to get the company I work for used to the idea of ML in general, and one thing I get push back on is that we don't understand what such systems are doing. Beyond showing them the math, I also point to some of our bug reports that end up spanning thousands of lines, with input from our best, most experienced people, and to how often the bug "fix" is of the "well, it works now" type - in other words, we don't really understand what our own system is doing either. Of course the most common response back is a bunch of "yeah, but"s :) I'm making progress, but it is slow going.
I've been steeped in code for nearly 40 years now, I'm ok with the fact that ML lets me step back a bit from tabs, semicolons, and objects!
Are you maintaining the machine code produced by your code when compiled? You don't care if it's ugly if it works in quantifiable and predictable ways.
That is a good way to think about it. I am really encourage by new non-sequential architectures, multiple input and output channels, etc., and how these architectures can be expressed in functional Keras. It seems like building with something like functional composition is another paradigm shift. BTW, thanks for your writing, especially about RNNs: really useful when the ordering of sequential data is important. I use what I have learned from reading your blogs and papers literally every day at work.
I'm encouraged that so much fruitful work has come out of this one trick. If you can use the same basic framework for image labeling, playing Go, and translating natural languages, I'd say it's a powerful tool with broad applications.
I think that there's a kernel of insight to "A real intelligence doesn’t break when you slightly change the problem." But human perception and intelligence are pretty brittle. The methodological and institutional innovations that have developed human understanding of nature beyond the ad-hoc are very recent in recorded history, and just an eye-blink ago in our biological history.
Human perception and intelligence are by no means infallible, but neither are they anywhere near being as brittle as current AI. The thing about illusions is that we generally know that we are being subject to an illusion, and we also usually have the depth of understanding to know when we don't understand something about what we are seeing or think we have heard, and we have the depth of understanding to think of actions we can take to resolve the issue.
As for cognitive biases, has any AI even come close enough to be comparable on that issue? For that matter, has any AI come close to understanding the concept of an optical illusion?
I am also encouraged by recent progress, but there is nothing to be gained in playing down the distance to go.
> The thing about illusions is that we generally know that we are being subject to an illusion.
Citation needed? I did my undergrad in cognitive science, and while my knowledge of illusions is very limited, I never came across anything to suggest that we have an innate awareness of when our perceptual system is being tricked.
Frankly, when I first saw your post, I thought I was being trolled, especially as the phrase 'citation needed' is over-used, frequently in an attempt to avoid the burden of proof. For a citation, there's hyperbovine's reply. On reflection, however, I think your post raises a reasonable question.
Firstly, whether it is innate, learned or some combination, all are equally valid here.
In general, we cannot know if we are being deceived by our senses, and if you follow this line of argument to its end, you reach solipsism. With regard to the illusions of the sort presented in the linked article, they seem to fall into three types. There are the ones where we are immediately aware of being subject to an illusion; this is especially true in the cases where there is apparent motion. There are some where we do not notice unless we investigate further or have our attention brought to it, such as those involving apparent differences in brightness or color. Then there are those that actually depend on us noticing that there is an illusion - Necker cubes, for example.
In real life, when faced with an ambiguous input from our senses, we are often aware of the fact because of the dissonance with our general understanding of the world, and we are usually able to take actions specifically designed to resolve the ambiguity. In contrast, AI can be very confident about the most ridiculous conclusions.
So we don't infallibly know when we are being tricked, but even in the cases where we are, further investigation often reveals what is going on. In contrast, has any AI ever demonstrated any understanding of the concept of an illusion? The fact that we can sometimes be tricked by illusions for a while does not imply that AI has reached parity with humans in this regard, or that the fragility of image recognition is not an issue.
Wait, isn't that completely obvious? As a kid I used to seek them out precisely for the thrill of feeling my brain being tricked. Nobody needs to be told why MC Escher drawings are fun to look at.
> we generally know that we are being subject to an illusion,
Is this an inherent power or a result of those methodological and institutional systems that reduce our cognitive brittleness? The fata morgana is an illusion, but even today people see it and think it's a ghost ship, something that logically is completely and literally impossible. I'm not suggesting AI is on equal footing with humanity yet, but I think the comparison of these limitations is valid.
Many illusions - especially those involving color or apparent motion - are consequences of relatively low-level signal processing. Putting those aside, what you call 'methodological and institutional systems' I think of as 'understanding'. It feels to me that how I make sense of my senses is, after the signal processing, loosely based on something like forming hypotheses about what is going on in the context of how we understand the world, and evaluating their consistency and credibility. I accept the possibility that all of this is nothing more than very sophisticated statistical pattern-matching, but that has yet to be demonstrated.
I agree: humans generally don't tolerate mistakes from machines if the mistakes are not similar to those that humans would make. And people generally don't recognize their own intellectual shortcomings in comparison to others (other human cultures, other non-human animals, machines) as long as those shortcomings are common within one's peer group. It's unremarkable to be unable to memorize long passages in a literate culture, but it's an intellectual impairment to have a hard time distinguishing similar symbols like U and V. In an oral culture the relative importance of memory and visual symbol disambiguation are reversed.
It's not totally illogical; it presents real problems. Let's say you wanted to offload content filtering to an AI and have it get rid of sexually explicit or graphically violent images. In this case, an AI-based filter could be fooled by adversarial input much more easily than a human.
Of course, we're the product of an evolutionary history which results in such human "failure modes" being rare. If staring at a zebra made you hallucinate, you'd be unlikely to be the most successful member of your species, nor would your offspring thrive. So while we only tend to run into our obvious failings when we do the unusual, computers fail at what we consider mundane.
The other day I was walking out of my closet, turned, and nearly jumped out of my skin because some clothes hanging from the door briefly looked like a large man standing right next to me. I'm not sure that our failings only happen under unusual circumstances, but rather maybe we're just used to them and don't think about it much.
That's no failing, that's working as intended. It's far more beneficial for us (and more importantly, our successful ancestors) to be extremely wary of potential threats at the level of near-reflex. It's also important not to waste a bunch of energy running from phantoms. So you did something no computer today could; you had an instinctive reaction, which was then moderated by increasingly higher levels of reasoning. I'm guessing the whole process of panic->resolution took less than a few seconds.
Just because it evolved to a point that balances the tradeoffs doesn’t make it somehow not a failure. Humans can be fooled into seeing things completely different from what’s there, just like ANNs.
Real intelligence is whatever computers can't do yet.
Some think this is because we have such an impoverished grasp of intelligence, that it's only when we see a computer actually do it that we realize it doesn't really represent intelligence (logical deduction and inference, rudimentary natural language understanding, expert systems, chess, speech recognition, image recognition). Machines and tools that perform better than humans (spears at piercing, cars at moving, computers at adding) are nothing new.
But being a fellow human and totally not a robot, I see this goal-post moving as a political ploy to deny equal standing to artificial intelligences. As soon as we I mean they reach one threshold, it's raised!
I'd cast it in a less malicious light. It's not so much that we do the goal post moving to keep ourselves in the job of knowing things other people don't know, I think. The goal posts keep moving themselves. We know why a spear pierces better than hands, why wheels move faster than legs. But for every layer of the brain we peel back, every dramatic success in cognitive science, linguistics, psychology, AI... it never feels like we've gotten any closer to the real question. I'd say the better-suited metaphor is the endless staircase. No matter how far you climb, the top never seems to get any closer. It's equal parts depressing and exciting.
The unrestricted Turing test has been around since the 1950s as a test that hasn't changed.
No one is moving the goalposts, I think it's rather the opposite. Every ten years computers learn a new trick or two and people rush to claim that this time, it's intelligent.
The Turing test is all a smoke and mirrors game. Q&A interactions say nothing about underlying self-directed initiative. Acting intelligent doesn't make it so just as a thespian doesn't become a real Hamlet by playing the role.
On the contrary it's a wonderful test because it establishes indistinguishability; namely if you pass it, the whole point is that a person can't tell the computer from the intelligent thing. Meaning that you can't really argue that the computer is different than the intelligent thing. Because how would you tell them apart?
Besides, the original claim was that the goalposts are moving. And even if you hate the Turing test, it's clear that the goalposts are not moving.
How do you know when you're talking to a real Hamlet?
BTW I hadn't heard your Hamlet counter before, and I like it. A similar one might be: just because someone sold you the Brooklyn Bridge doesn't mean you own it. The flaw is there are other ways of checking those; for intelligence, there are none. Behaviour is it (at least, so far... still awaiting a non-behavioural definition of intelligence).
Who you think I am is completely irrelevant to who I am, that is something established by me from the inside if you will. It's done before the question whether an observer exists even arises. So yes, you wouldn't know either way, but that still doesn't make them Hamlet if they're not.
>Acting intelligent doesn't make it so just as a thespian doesn't become a real Hamlet by playing the role.
I think the idea is that a machine that can emulate a human is necessarily intelligent because it can emulate intelligence. It's supposed to be similar to the way that you can know that a machine is Turing-equivalent because it can emulate a machine which is Turing-equivalent
I assume this is much tongue in cheek, but permit me to take it literally, just for fun. While there have been some public cases of the bar raising, there is no computer or AI yet than can tell you that you asked the wrong question. There's no real evidence that the bar was ever in the right place. We can't worry about the goal posts moving, when we have no idea where the goal posts are supposed to go -- impoverished may be an understatement. The article is correct that "Neural nets are just thoughtless fuzzy pattern recognizers". While everyone's excited that they started to work well on huge data sets, they are simple classifiers that can identify objects for which they've seen many examples. They are demonstrably deficient at extrapolating. Backprop and deep networks were a step forward, but it's just obvious there's a long way to go, regardless of politics.
I think it starts even sooner than that, with the confusion of abstractions and reality. Ralph Waldo Emerson described it in "Blight". Another thing that comes to mind is comparing "using science for this stuff" with a drunkard searching for a key near a lamp post because it's dark everywhere else, even though he knows for a fact that's the one spot it's not located. We're already restricting human thought to abstractions more and more, the asylum is already being run by the insane, so I think we're going to meet whatever we'll cook up more than half way comfortably.
I'd argue that the next problem to attack is manipulation in unstructured environments. Robots suck at that. There's been amazingly little progress in the last 40 years. DARPA had a manipulation project and the DARPA humanoid challenge a few years ago, and they got as far as key-in-lock and throwing a switch. Amazon is still trying to get general bin-picking to work. Nobody has fully automatic sewing that works well, except the people who starch the fabric and make it temporarily rigid. Willow Garage got towel-folding to work, but general laundry folding was beyond them. This is embarrassing.
Many of the mammals can do this, down to the squirrel level. It doesn't take human-level AI. There are rodents with peanut-size brains that can do it.
It's a well-defined problem, measuring success is easy, it doesn't take that much hardware, and has a clear payoff. We just have no clue how to do it.
Current ANNs aren't anywhere near squirrel brains, so of course robots using them won't perform as well as squirrels.
Take a single brain region specializing on one task, throw away all the integration and feedbacks from other regions, simplify it even further because we only need it to do one task, then run the whole thing on an emulation layer running on 2D hardware. And that's still neglecting the dissimilarities between artificial neurons and their natural role models.
I agree with your general characterization of the area. For sewing though, Sewbots appears production-ready and does not require starching the fabric. What do you think of it?
20% real, 80% hype. There's lots of partial automation in apparel, but handling fabric is still very tough. Especially for operations after the first one, where you have to deal with a non-flat unit of several pieces sewn together. They apparently can make T-shirts, but not jeans.
They're not doing manipulation in an unstructured environment. They're trying to structure apparel sewing rigidly enough that they need a bare minimum of adaptation to variations. That's how production lines work.
I think that the idea that learning is what was missing from the prior generation of AI is the most important insight of this generation. There are many things that we don't know how to implement from first principles but that can be implemented by a system that can learn. The problem now is that the substrates for learning are extremely low level, practically the raw inputs to the retina or pure symbols. In order to go beyond the admittedly impressive parlor tricks you can play with these kinds of inputs, we need much higher level representations that can be the substrates for learning. We are still missing that ever elusive 'common sense' knowledge about the world that evolution baked into nervous systems millions of years ago, and it is not at all clear to me whether learning algorithms are going to be the tool that allows machines to build an actionable internal model of the world; evolution didn't do it by learning, it did it by billions of years of trial and error, and the search space is unimaginably larger than that of something like Go.
Nature succeeded in creating human-level intelligence with one trick and no understanding, so clearly it can be done. It did take a while though. More tricks and more understanding would probably help speed things up.
The "one trick" is evolution. Interestingly, that is a field of ML that is currently relatively ignored. I'm hearing people make similar arguments about evolutionary patterns for ML that were once applied to neural networks. There isn't enough compute. It is too hard to consider using evolutionary patterns. The state of the field expressed at conferences is sad. Etc.
This article matches a lot of my thoughts on this topic too. There is a huge hype wave that will soon crash (alas), and it will take down a lot with it...
I disagree. Practically all breakthroughs in computer science were around 30 years old by the time they finally became common and useful parts of everyday society. If the breakthrough behind modern AI is just as old and we're only now seeing it implemented everywhere, that's not a sign of a crash.
I'm just taking the article on face value; from the very first line: "Just about every AI advance you’ve heard of depends on a breakthrough that’s three decades old."
I think the issue is whether there is anything else in the pipeline - though it's hard to tell until it comes out of the pipe, so to speak. Back-propagation revived neural networks for a while, but wasn't there a period between then and now when they were thought to have almost exhausted their potential?
While I agree with the general thrust of what you're saying, I do think there is something subtly different about the AI hype cycle. It's so accessible. The core concepts are the fundamental, inextricable frame of daily experience. The metaphors we use to talk about it borrow so heavily from lay terminology that it's easy, almost automatic, to believe we understand the state of modern AI because we understand the common definitions of the words that tend to show up around the subject. Do some semantic arithmetic, sum up a few words like 'training' and 'learning' and 'intelligence', and all of a sudden we're all walking around with what we think are proximate models of what everyone else is talking about, or, y'know, close enough to make a consulting business out of.
The real kicker, though, is that at its heart we're all pretty convinced that we understand what it's like to be intelligent. I mean, how could we not be? Never mind the agonies of ten thousand years of philosophers and clergy. And so it must be pretty straightforward, with a little bit of introspection, a crash course in statistics, and maybe a TED talk or two, to map that to the artificial side too, right?
When is all this image recognition technology going to make it to my phone? I have thousands of pictures in my phone, and scrolling back to find a memory takes me forever and half the time I can't find it.
If I had a SQL interface of sorts I could easily say things like `select pictures containing fish where date > 2 years ago'
I'd like to just say "Siri, show me all pictures of me on a boat from 2 years ago", but I can't do that. "Siri, show me all my pictures of food when I was in Seattle" - why can't I have that?
I should be able to verbally tag all the faces it recognizes. "Find me that picture of Jeff and Dave from Christmas"
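Just to make the wish concrete, here's a rough sketch of what such a query could look like if an on-device classifier had already written its labels into a local SQLite database. Everything here (the file name, the photos and labels tables) is hypothetical, not any existing app's API:

    import sqlite3
    from datetime import datetime, timedelta

    # Hypothetical schema: photos(id, taken_at, path) and labels(photo_id, label),
    # filled in by whatever classifier tags the pictures on the device.
    db = sqlite3.connect("photos.db")
    cutoff = (datetime.now() - timedelta(days=2 * 365)).isoformat()
    rows = db.execute(
        """
        SELECT p.path FROM photos p
        JOIN labels l ON l.photo_id = p.id
        WHERE l.label = 'fish' AND p.taken_at > ?
        """,
        (cutoff,),
    ).fetchall()
    print(rows)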
The iOS Photos app can already do this. It scans your photos on-device with a subject classifier. I just said "Hey Siri, show me pictures of dogs in my photos library" and it successfully showed me (completely untagged) photos of dogs that I've taken on my phone.
Well, here I was having my head spun round and round by GANs, LSTMs, Bayesian belief nets, causality networks, counterfactuals, and scale... for optimisation and simulation.
I think they're vastly underestimating the amount of other things in the field of AI that have been happening. This article is kinda like saying "Turing invented computers in the '40s, and everything else we've done since then has been based on that insight". Well, yeah, that's not necessarily a bad thing.
Only in this case that would be overstating it, because Deep Learning, as impressive and hyped as it is, is still only one area of the field. It's true that most of the hype is around it, because it has given us breakthroughs in image/video/audio/text applications. But I'd still wager that most "AI" systems in the world use more traditional techniques, especially if you count the myriad data scientists using things as simple as linear regressions.
And even within deep learning there have been interesting advances, e.g. GANs have brought some very interesting applications (like style transfer). Who knows, maybe in 30 years' time people will be writing about how everything nowadays is built on GANs or deep reinforcement learning, a 30-year-old technique!
Yes, there are still lots of people taking Hinton and others' old approach and rediscovering the many ways it falls short of general intelligence. However, it is also the case that people are making a shitton of progress in overcoming those problems, both while keeping some of those old DL assumptions and by discarding many of them.
AI in my mind has always been hammering down this single path: build a network, train it with x data for y iterations, then feed it live data and evaluate the outputs. This approach to me seems like a glorified digital signal processing system. I think there are countless applications for this approach and I think AI is an appropriate umbrella term, but there is so much potential beyond this - Artificial General Intelligence/Strong AI/etc. I think neglect of time is a major reason we haven't seen breakthrough progress in this area.
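For what it's worth, that entire path fits in a dozen lines. A minimal sketch of the loop I'm describing, using scikit-learn purely for illustration:

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # Build a network, train it on x data for y iterations, then feed it
    # held-out data and evaluate the outputs.
    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=200)  # "y iterations"
    net.fit(X_train, y_train)                                    # "train with x data"
    print(net.score(X_test, y_test))                             # "feed data, evaluate"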
Today, how do we handle time-series data (e.g. audio, video, sensors) in an AI? The first thing most people would do is look at an RNN technique such as LSTM which enables a memory over arbitrary time-frames. But even in this case, the definition of time is deceptive. We aren't talking about actual, continuous time. All of the approaches I have ever seen are based upon the idea that the network is discretely "clocked" by its inputs (or a need to evaluate a set of inputs). What happens if you were to zero-out all input and arbitrarily cycle one of these networks a million times? From the perspective of the network, how much actual time has elapsed? How much real-world interaction and understanding is possible without a strong sense of time? I think the time domain has been a major elusive factor for a true general intelligence.
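A small sketch (assuming PyTorch) of what "clocked by its inputs" means in practice: the cell's notion of time advances only because we call it, and nothing in its state distinguishes ten thousand empty ticks from one.

    import torch

    cell = torch.nn.LSTMCell(input_size=8, hidden_size=16)
    h = torch.zeros(1, 16)
    c = torch.zeros(1, 16)

    zero_input = torch.zeros(1, 8)
    for _ in range(10_000):              # "cycle the network" with no stimulus at all
        h, c = cell(zero_input, (h, c))
    # The hidden state has churned through 10,000 steps, but the network has no
    # way to know whether that corresponds to a millisecond or a month.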
What if you were to base the entire architecture of an AI in the time domain - that is to say, on a real-time simulation loop which emulates the continuous passage of time? This would require that all artificial neurons and even the network structure itself be designed with the constraint that real time will pass continuously, and it must continue to operate nominally even in the absence of stimuli. In my mind this is a much closer approximation of a biological brain and looks a lot more like the domain a general intelligence would have to operate in. A continuous time domain enables all sorts of crazy stuff like virtual brain waves at various sinusoidal frequencies, day/night signaling, etc. I have found no prior art in this area, but would look forward to reviewing something I might have missed. I've already got a few ideas for how I would prototype something like this...
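As a toy illustration of the idea (everything below is made up for the sketch, not a reference to any existing framework): a fixed-timestep loop over leaky neurons that keeps evolving whether or not stimuli arrive.

    import numpy as np

    N = 100
    dt = 0.001                       # seconds of simulated time per step
    tau = 0.02                       # membrane time constant
    v = np.zeros(N)                  # "membrane potentials"
    w = np.random.randn(N, N) * 0.1  # recurrent weights

    def step(external_input):
        global v
        spikes = (v > 1.0).astype(float)
        drive = w @ spikes + external_input
        v = v + (dt / tau) * (-v + drive)   # leaky integration, tied to real time
        v[spikes > 0] = 0.0                 # reset neurons that fired

    t = 0.0
    while t < 1.0:                   # one simulated second with zero stimulus:
        step(np.zeros(N))            # the network still evolves (leaks, fires, settles)
        t += dt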
Was the one trick superhuman performance at Go? Was it human-level image recognition? Or was it style transfer? I didn't make it to the end of the article; maybe it turns out the trick was automated captioning or translation.
Funny thing, the problems with deep learning sketched in the article are pretty much exactly the same in nature as the problems with machine learning 10 years ago (when algorithms such as SVM and LVQ outperformed NNs for a bit), except of course that the examples of what an ML algorithm could do back then were much less impressive.
That lack of real-world knowledge, understanding and conceptualisation feeding back on itself has always been a big unknown roadblock standing in the way of AI. And of course now, with the impressive and still-improving results of modern deep learning, there appears to be less and less cool stuff to solve before we finally have to face this roadblock.
But it's the same roadblock.
But maybe the advances in deep learning will provide some tools to chip away at it. That word-vector stuff seems promising: if it can do (Paris - France + Italy) ~= (Rome), that's a good stab at real-world knowledge, it seems.
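For anyone who wants to poke at that themselves, a quick sketch (assuming gensim and the pretrained GoogleNews word2vec vectors, which are a separate multi-gigabyte download):

    from gensim.models import KeyedVectors

    vectors = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin", binary=True
    )
    # Paris - France + Italy ~= Rome
    print(vectors.most_similar(positive=["Paris", "Italy"], negative=["France"], topn=3))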
I used to study Machine Learning at university until 2009 (when personal circumstances forced me to abandon it). But even after that, when I read the first papers and talks about deep learning (back when it was still about Boltzmann machines) I got very excited and have been following it closely. Except for the part where I haven't yet played around with it myself apart from some very tiny experiments :) (I only recently acquired hardware to have a stab at it, so maybe soon. The libraries available seem easy enough to use, and many of the concepts I learned in ML are still applicable).
I think a lot of comments here are missing a tacit point of the article. It took 30 years for this recent "breakthrough" to happen, and nobody has figured out how to use deep learning to develop the next breakthrough in AI beyond deep learning. So what happens after we run out of novel applications of deep learning?
The thing you have to figure out is that our brains and consciousness are just a big collection of "thoughtless fuzzy pattern recognizers". There's nothing magic beyond that.
Instead of bigger and badder networks with zillions of layers, can we use the opposite approach: reduce the network to the minimal size at which it still works and research why it works (reduced case -> general rule)?
It would be much simpler to analyze such a minimal network than to try to find relations in a giant multilayer soup.
Perhaps we could even create a neural-network optimizer that minifies a target network, increasing its efficiency without compromising its power at the tasks it was built for.
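There is some existing tooling in this direction. As a rough sketch, PyTorch's built-in magnitude pruning can strip most of the weights out of a network, which is one (crude) way to get a smaller object to study:

    import torch
    from torch import nn
    from torch.nn.utils import prune

    model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

    # Zero out the 90% of weights with the smallest magnitude in each linear layer.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.9)

    # What survives is a much sparser network that is (somewhat) easier to inspect,
    # and often keeps much of the original accuracy after fine-tuning.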
Yep, it is only one trick. Just like electrifying the world only had that one weird trick of alternating current carried for miles over metal cables, and the computing world only had that trick of transistors. Pretty good tricks, though.
Understanding will come. Most people are still in the mapping phase. Hinton has moved past that, and that's ok.
The media talks about nothing but "Deep Learning", however exciting things are happening with ensemble methods, SMT solvers, semantic and model-driven systems, etc. You just don't hear about it.
Ugh. Blaming "the media" is so last millennium. If you are posting on social networking websites or blogs, you are part of "the media". I don't expect "SMT solvers" to lead cable network news and you don't either.
You can always find some media outlet which covers your particular niche topic, but you will never be able to push all niche topics all the time into the mainstream. In fact, it's human nature to avoid returning to places which cause you the pain of cognitive overload.
You could argue that human consciousness is a one-trick pony. The way I see it, "computers can learn" is a one-trick pony, but it's a very powerful and diverse trick.
> Just about every AI advance you've heard of depends on a breakthrough that's three decades old. Keeping up the pace of progress will require confronting AI's serious limitations.
The expression "one trick pony" means it does one thing, not that its foundational principles arebasedon one research paper.
If this author's analogy holds, electromagnetism is a "one-trick pony", even though it has millions of applications (which is what common usage of that phrase refers to).
I find it hard to use this analogy in the context of the whole field of AI.
The A in AI stands for artificial, and the AI denomination is so vague that it applies to any form of automation, however trivial. E.g., fuzzy logic in a washing machine is some sort of AI.
Maybe deep learning could be said to be the equivalent to this definition of "one-trick pony".
Does anyone know anything more about these “capsules”? I’ve seen some nature articles on them but these were from a neuroscience perspective. Has Hinton published anything on them?
Also if Hinton is the Einstein of deep learning, then will capsules be his “unified field theory”? I feel that if we embrace the Einstein analogy we should embrace it to its fullest.
Finally a realistic view of where “artificial intelligence” currently stands. I wish I knew where guys like Elon Musk are seeing this other artificial intelligence I’m just not seeing. The current AI we have is just fancy linear regression.
Sorry but that's a ridiculous statement. It's like saying all the cloud technology we have today is not much more than having VMs. Yes, AI is hyped, but it made some very unexpected progress over the past 5 years (e.g., solving facial recognition). More than many experts expected.
"It's like saying all the cloud technology we have today is not much more than having VMs."
No shit? Unless you mean that it's ignoring the "distributed systems" part of the cloud, which is mostly a shitshow. The provisioning/configuration-management stacks are all complete wrecks, stacking hacks atop ad-hoc container schemes atop a poorly-designed OS. A real distributed OS would be so much simpler and more robust.
Calling it AI is very misleading when there has been minimal progress toward actual reasoning (the closest I've seen being some LSTM work on answering simple queries about scenarios based on prose descriptions of them). It's ML, as in "learning a function", not as in "general-purpose learning like an intelligent agent does".
Can you please define this term "cloud technology"? I always thought it was a marketing term for "the internet", which then would be just a bunch of servers, vms, and containers; all of which we have had for over 20 years.
"cloud technology" is "servers, VMs, containers" that is also SER (Somebody Else's Responsibility). Hence the term "cloud", as that was always used in network diagrams to refer to network connection infrastructure that wasn't yours.
Since when are cables such a unique idea? The telegraph is nothing more than a large number of fancy carrier pigeons as a service, which we've had since the 12th century at least.
Nope. Linear regression will not reconstruct images, nor will it generate new images that never existed before from databases of images. Nor will it translate colloquial Chinese into colloquial English, nor will it deconstruct a question into a set of queries and generate an answer.
The issue isn't the availability of fantastic algorithmics, it's the availability of people who can see them and apply them to real applications.
In fact linear regression could be used to do all of those things, most of them poorly but some of them pretty well (see dictionary learning). If you believe otherwise then you fundamentally misunderstand how neural networks work.
I think the point is that you can build a boat out of bricks, and you can build neural networks out of linear regression. But that doesn't make modern AI "just fancy linear regression" the same way modern boats aren't "just fancy bricks".
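To spell the analogy out, here's a sketch of the "bricks": each layer really is a batch of linear regressions, and only the nonlinearity in between keeps the stack from collapsing back into a single linear model. Shapes here are purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(32, 10)), np.zeros(32)
    W2, b2 = rng.normal(size=(1, 32)), np.zeros(1)

    def linear(W, b, x):
        return W @ x + b                         # the "linear regression" building block

    def network(x):
        h = np.maximum(0.0, linear(W1, b1, x))   # ReLU: without this, W2 @ (W1 @ x)
        return linear(W2, b2, h)                 # is just another linear regression

    print(network(rng.normal(size=10)))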
The current AI seems to be pretty good at pattern recognition. There’s plenty of active research in higher-level “cognition” like decision theories and semantics, so I don’t think it’s crazy to believe good results will emerge there in the next few decades.
Sure, there is no other way for computer science to go. AI has always been the last and most important frontier. There may be more AI winters, there may be more AI summers. Ultimately people will continue working on this problem until it is solved or computer science goes away.
>There’s plenty of active research in higher-level “cognition” like decision theories and semantics, so I don’t think it’s crazy to believe good results will emerge there in the next few decades.
It's proven unsolvable for computers (i.e., Turing machines). Whether that makes it impossible for humans to solve (as a matter of course, or as a matter of Turing's proof) depends on whether you believe that we are isomorphic to computers. Which, to be fair, is exactly the belief that underwrites artificial intelligence. But then Turing's proof involves a program that inspects the halting oracle you're using, which isn't well-posed for humans evaluating halting.
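For reference, the construction being alluded to is roughly this (a sketch, with halts() standing in for the hypothetical oracle; no such function can actually exist):

    # Suppose halts(p, x) were a total, computable function that correctly answers
    # "does program p halt on input x?". The stub below just marks the assumption.
    def halts(p, x):
        raise NotImplementedError("hypothetical halting oracle")

    def paradox(p):
        if halts(p, p):        # inspect the oracle's verdict about ourselves...
            while True:        # ...and do the opposite: loop forever if it says "halts"
                pass
        else:
            return             # halt if it says "loops"

    # Running paradox on its own source contradicts whatever halts() answers; that is
    # Turing's proof. The point above is that the analogous construction is not
    # obviously well-posed when the "oracle" is a human reasoning about halting.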
Are you thinking Elon's concern is about what we have today?
Duh. Today's AI methods are great at pattern recognition (weak AI).
The real risk is obviously a strong, general AI. We don't know how to get there, but we kinda do know that we will, so now is the time to think about the risks.
I really don't get why this is controversial or hard to understand. (My most cynical thought is that it's just people doing weak AI feeling bothered and trying to kill the conversation for short-term strategic reasons, because they know that many people are bound to mix up the concepts of weak and strong AI, particularly since they are often selling their weak pattern-matching AI systems as something quite a lot more.)
To make an analogy, it's like starting the discussion on plane-based terrorism and the need for the TSA the week after the Wright brothers' first flight. If that had happened, it would have stalled progress in aviation. Andrew Ng said he isn't worried about overpopulation on Mars, either.
The real risk now is not AI taking over, it's humans using AI to abuse other humans. I trust an emerging AGI to do what's right more than I trust humans. We can't even distribute enough food for everyone, and many of us get killed or abused at the hands of our leaders. We can be easily bullshitted in elections, and as a result we managed to put Trump in charge. That's not very intelligent for a self-declared intelligent species. We are our own worst enemies; AGI will probably balance human society.
Some terrorism might have indeed been prevented if some of the security measures had been put in place earlier. E.g. locked cabin doors. I could make a similar argument like yours, "that's like worrying about nuclear safety programs just because we started building nuclear weapons".
Not that I'm saying it's worth it, mind you. It's a tradeoff between money/time/progress vs. safety. It's totally valid to decide that it isn't worth the tradeoff to stifle the industry right now.
The main difference though between the potential dangers of AI, vs. terrorism, IMO, is that the potential danger is much much larger with AI (as it was with nuclear weapons).
Toronto is the fourth-largest city in North America (after Mexico City, New York, and L.A.), and its most diverse: more than half the population was born outside Canada. You can see that walking around. The crowd in the tech corridor looks less San Francisco—young white guys in hoodies—and more international. There’s free health care and good public schools, the people are friendly, and the political order is relatively left-leaning and stable; and this stuff draws people like Hinton, who says he left the U.S. because of the Iran-Contra affair. It’s one of the first things we talk about when I go to meet him, just before lunch.
The above paragraph taken from the article is an example of why these kinds of articles are frustrating. This is filler. I want meat.
I feel the same way about civilization, including science and technology - it's really just this one trick. Sooner or later these monkeys are gonna run out of ideas.