I can't be the only one who considers all these AI articles smoke-and-mirrors puff pieces meant to prop up a company's value by capitalizing on the hype (hysteria?), can I? I think the first red flag is that the journalists don't seem to really understand the technical capabilities or limitations of current ML/AI applications. They accept grandiose claims at face value because there is no way to measure the real potential of AI (whose promise seems limitless, so anything appears plausible, especially coming from a big company like GOOG).
I think there are a couple of really overhyped areas right now: AR/AI/ML and IoT/IoE. While I don't mind the attention and money being thrown at tech, I can't help but feel we're borrowing more against promises, hopes, and dreams while simultaneously under-delivering, and I think that's going to hurt tech's image and erode investor confidence sooner rather than later.
A lot of things that were impractical 5-10 years ago are now moving out of the domain of the biggest companies to the smaller ones.
Applications involving computer vision and speech recognition are now buildable by small companies, which will hopefully yield a proliferation of really interesting novel applications.
Sure, we don't have Terminator-style AI, but honestly, people using the term AI need to shut up, a lot. There is no AI these days, just massive, creaky ML systems with a host of Ph.D.s being thrown onto the fire to keep them running. But the ML applications are super cool.
The Chinese curse might now be updated to "may you wind up dependent on 'really interesting novel applications'..."
Machine-learning-derived applications are impressive and put on a good show until you wind up in a situation where they are expected to work reliably. Sure, it's nice that the insurance company's phone-based, voice-recognition-driven registration (etc.) system can understand 99% of the choices people give it - except that total failure on that 1% is going to leave a large population unserved and angry. Of course the company has a keypad backup - except it doesn't, 'cause that would cost the money they claimed voice recognition would save, etc.
Machine learning apps are great for situations where
1) You don't expect 100% reliability, and the degree of unreliability doesn't even have to be quantified.
2) Either you accept that they'll degrade over time and have an army of tuners and massive data collection to keep that from happening, OR you are dealing with an environment you completely control.
These are basically the conditions for regular automation, except even more so.
We lost the word "AI" literally decades ago. Everything from search algorithms to machine learning to genetic algorithms to video game bots are called AI. The cool kids use the word "AGI" now.
There is a common expression that "AI is whatever computers can't do yet." At one time computers couldn't play board games or do vision, so those things were called AI.
It seems like your first and second paragraphs express opposite opinions. In the former you seem dismayed by the over-application/overhyping of the term "AI"; in the latter you seem frustrated by the high and ever-increasing bar for categorizing systems as "AI". Am I misinterpreting you?
My comment can be read two ways, and neither way is wrong. I wasn't really expressing an opinion as much as bringing up relevant facts. People label things that we don't know how to do yet "AI". And then when these hard problems are solved, they seem like they are not really "intelligent".
This leads both to overuse and trivialization of the word and to moving goalposts for the field. And actual progress isn't taken seriously, because nothing feels like intelligence once you understand it.
Yeah, if anything, the "AI" part of the search has been part of the decline. Google aggressively gives me what it thinks I want rather than what I ask for. It seems like it's very clever in giving me something like what an entirely average person would likely want if they mistakenly typed the text that I knowingly and intentionally typed ("Kolmogorov? do you mean Kardashian?" etc).
The search does seem able to understand simple sentences, but there's much less utility in that than one might imagine. Just consider that even an intelligent human who somehow had all of the web in their brain couldn't unambiguously answer an arbitrary simple sentence from a person whose location, background and desires were unknown to them. Serious search, like computer programming, actually benefits from a tool that does what you say, not what it thinks you mean. Altogether, that means they're a bit behind what AltaVista could give you in the Nineties, but easier to use, maybe.
Part of the situation is that the web itself has become more spam- and SEO-ridden, and Google needs its AI just to keep up with the arms race there. So "two cheers" or something, for AI.
They seem a bit better at dealing with search spam than they used to be. I was searching for something in Ho Chi Minh the other day and got a spam site saying "where to get <thing> in ho chi minh", where they'd automatically produced similar pages for every city in the world pointing to their stupid online site, and I kind of got nostalgic - I hadn't seen that kind of spam for a year or two, and it used to be constant.
Sometimes I get the vibe that it's a weird self-fulfilling prophecy of terrible SERPs. You reward freshness too much, so people play that game. But you also punish duplicate content at the same time, so the best content, if it already exists but isn't fresh, naturally falls off.
That is pure nostalgia talking. I strongly suspect that if you did you would be appalled and want to switch back after only a few searches. Search in 2016 is leaps and bounds beyond what it was in 2006.
If the internet was 1 page 10 years ago, today it is 10 million pages. So today Google search is a lot better. It's not that Google search has deteriorated; it's that the web has too much useless data.
I'm still able to find what I'm looking for in the first 5 results for most things. 10 years ago I had to go to the 2nd or 3rd page for some obscure queries. Now it's also a lot better at answering direct questions instead of just giving results.
I would argue that you need to evolve with the software. Google search isn't the same anymore. Intellectual Darwinism: evolve or become irrelevant. You can't search the way you used to, or you'll get crappy results. If you don't want to be limited by your Google search bubble, search in anon mode.
> I think there are a couple of really overhyped areas right now, AR/AI/ML and IoT/IoE. Now while I don't mind the attention and money being thrown at tech, I can't help but feel we're borrowing more against promises, hopes, and dreams, while simultaneously under-delivering and I think that's going to hurt tech's image and erode investor confidence sooner than later.
I became interested in natural language processing in the early 2000s, more as a hobby and as part of my personal projects, but even so, I remember that back then most of the AI-related discussions on things like forums and mailing lists were mentioning the AI winter as the big bad wolf that had killed an entire industry. It also killed LISP, they were saying.
Interesting to see that that memory seems to have faded away to the distant past.
I thought the same thing when I saw the title. And then I saw the byline: Cade Metz. And I read the story. Metz understands tech. Metz got this story right.
So while you may be right that many AI articles are smoke-and-mirrors from journalists who don't get the tech, I think you picked the wrong article to make that point about.
We've had artificial intelligence since you could play a computer at chess, but the expectation has always been a mechanized Arnold Schwarzenegger or HAL 9000.
The difference today is the scale at which it operates in our daily lives, and the accelerated rate at which it is growing.
You can add to this, the gigantic misuse of the term "AI" when journalists really want to say "robotics". I don't mind clever shortcuts when you need to explain something abstract or invisible to non-technical people, but, at some point, somebody will have to tell them they're two completely different fields.
>>I can't be the only one that considers all these AI articles just smoke and mirror puff pieces to prop up the companies value by capitalizing on the hype (hysteria?)
You're not.
So far I have not seen/heard anything remotely resembling AI. Neural nets are just weighted graphs.
Is anyone arguing that machine learning is not AI? The gist of the article is that Google is leaning toward ML/DL and away from the rules engines/Knowledge Graph. The headline is a shorthand, which, although it could be more precise, is not inaccurate.
Interesting that you feel that. The article mentions nothing about Google's Knowledge Graph. I don't have any privileged insight into Google, just the same surface data as the rest of you all - but I would say that, if anything, Google's Knowledge Graph can fit with _both_ a rules engine strategy and a machine learning one.
How is Google going to "organise the world's information" unless it has a model of how all the facts in the world line up? That model is the Knowledge Graph. How does Google intend to map queries that it has never seen before to pages in its vast index? With the help of the Knowledge Graph and natural language processing and machine learning.
I'm going to try to articulate something here that I've not fully worked out but that I'm sort of intuiting so cut me some slack for the next paragraph :)
People keep banging on about machine learning and the impact that it is having. This is undeniable. But we can see even from AlphaGo that a hybrid approach that combines artificial neural nets with some sort of symbolic system outperforms neural nets on their own. For AlphaGo that symbolic system is tied to the mechanics of the game of Go. For internet search that symbolic system is a generalised knowledge graph.
Do you get what I mean? I'd love to hear what others think …
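For readers who haven't seen one, a knowledge graph of the kind discussed here is commonly stored as subject-predicate-object triples and queried by pattern matching. The toy below uses a handful of illustrative facts and a hypothetical `query` helper; it is a sketch of the general idea, not a description of how Google's Knowledge Graph is actually stored or used.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# A handful of illustrative facts, stored as (subject, predicate, object).
TRIPLES: Set[Triple] = {
    ("Apache HTTP Server", "instance_of", "web server"),
    ("Apache", "instance_of", "Native American people"),
    ("Andrey Kolmogorov", "instance_of", "mathematician"),
    ("Andrey Kolmogorov", "field", "probability theory"),
}


def query(subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return every triple matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# "What do we know about Kolmogorov?"
print(query(subject="Andrey Kolmogorov"))
```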
The article doesn't mention Google's Knowledge Graph by name. But that is what the reporter is referring to in sentences such as these, which mention "a strict set of rules set by humans":
> But for a time, some say, he [Singhal] represented a steadfast resistance to the use of machine learning inside Google Search. In the past, Google relied mostly on algorithms that followed a strict set of rules set by humans.
I know because I spoke with Metz at length and was quoted in the article.
The Knowledge Graph was, by definition, a rules engine. It was GOFAI in the tradition of Minsky, the semantic web and all the brittleness and human intervention that entailed.
What he's saying here is that Google has relied on machine learning in the form of RankBrain to figure out which results to serve when it's never seen a query before. And the news, in this case, is that statistical methods like RankBrain will take a larger and larger role, and symbolic scaffolding like the Knowledge Graph will take a smaller one.
You are right that the most powerful, recent demonstrations of AI combine neural nets with other algorithms. In the case of AlphaGo, NNs were combined with reinforcement learning and Monte Carlo Tree Search. I don't think a rules engine (the symbolic system you refer to) was involved at all there. Nor is it necessary, if by studying the world our algorithms can intuit its statistical structure and correlations without having them hard-coded by humans beforehand. It turns out they do OK learning from scratch, given enough data.
So in many cases we don't need the massive data entry of a rules engine created painstakingly by humans, which is great, because those are brittle and adapt poorly to the world if left to themselves.
The Knowledge Graph is just a way of encoding the world's structure. The world may reveal its structures to our neural networks, given enough time, data and processing power.
Hmm, are you sure? Doesn't "a strict set of rules set by humans" refer to the PageRank algo alongside rules for spammy content, rules like whether meta keywords are set, and so on: all the little rules that feed into deciding where a matching page ranks in the result set? That's why it's tweakable by engineers...?
"The Knowledge Graph is just a way of encoding the world's structure." Precisely. Very well said. "The world may reveal its structures to our neural networks, given enough time, data and processing power." But that's the point, NNs don't have to perform this uncovering because we do the hard work for them in the form of Wikidata and Freebase and what have you. I don't get what you think is brittle about this.
I was referring to the very recent article [1] by Gary Marcus; I need to quote a good chunk:
> To anyone who knows their history of cognitive science, two people ought to be really pleased by this result: Steven Pinker, and myself. Pinker and I spent the 1990’s lobbying — against enormous hostility from the field — for hybrid systems, modular systems that combined associative networks (forerunners of today’s deep learning) with classical symbolic systems. This was the central thesis of Pinker’s book Words and Rules and the work that was at the core of my 1993 dissertation. Dozens of academics bitterly contested our claims, arguing that single, undifferentiated neural networks would suffice. Two of the leading advocates of neural networks famously argued that the classical symbol-manipulating systems that Pinker and I lobbied for were not “of the essence of human computation.”
For Marcus the symbolic system in AlphaGo _is_ Monte Carlo Tree Search. I'm saying that for the so-called Semantic Web the symbolic system is the Knowledge Graph. This Steven Levy article[2] from Jan. 2015 put the queries that evoke it at 25% back then. I figure it's more now and growing slowly, alongside the ML of RankBrain.
Yep, this apparently has been more of a Wall Street operation than anything else. Google needs the capital to transform itself and withstand the decline of web search revenue.
There is a tendency among non-technical admirers of ML to regard deep learning methods as beyond their creators: independent entities that will one day, given refined enough algorithms and enough energy, out-comprehend their human creators and overwhelm humanity with their artificial consciousnesses. The term “neural networks” is itself a misnomer that doesn’t at all reflect the complexity of how human neurons represent and acquire information; it’s simply a term for nonlinear classification algorithms that began catching on once the computing power to run them emerged.
The question of whether or not deep neural networks are capable of “understanding” is largely a theoretical concern for the ML practitioner, who spends the bulk of his or her time undertaking the hard work of curating manually labeled data, fine-tuning his or her neural classifier with methods (or hacks) such as dropout, stochastic gradient descent, convolution and recursion, to increase its accuracy by a few fractions of a percentage point. Ten or twenty years from now, I imagine we’ll be dealing with a novel set of ML tools that will evolve with the rise of quantum computing (the term “machine learning” will probably be ancient history, too), but the essence of these methods will probably remain: to train a mathematical model to perform task X while generalizing its performance to the real world.
As fascinating and exciting as this era of artificial intelligence is, we should also remember that these algorithms are ultimately sophisticated classifiers that don't "understand" anything at all.
This is true of ANN, and deep learning. They are mathematical models of learning that are finally practical after a couple of decades (not to diminish anything the researchers have accomplished, which is incredible).
Then there are biologically inspired neural networks, like Hierarchical Temporal Memory (HTM), that actually correlate directly to how the cortex in mammals works. These have also demonstrated learning capabilities, and seem a lot more promising on the road map to general AI, in my opinion, because after all we should be piggybacking on evolution (not that we can't find a mathematical model first).
So yeah, the hype is just hype, but it could be justified for the wrong reasons if we see breakthroughs in biologically inspired AI (the Brain Project, to name another example).
Demonstrated learning capabilities? I have not seen HTM models make any breakthroughs on any benchmarks. It's also stretching the facts to say it directly correlates to how mammalian cortexes work. At best, you could say it directly correlates to some theories of how mammalian cortexes work - neuroscience has an incredibly poor understanding of brains in general.
Before anyone believes the hype, they should read all the MIT research papers from the mid-1990s that mention the term "emergent intelligence". This was one of the biggest wastes of research money in the history of AI.
> At one point, Google ran a test that pitted its search engineers against RankBrain. Both were asked to look at various web pages and predict which would rank highest on a Google search results page. RankBrain was right 80 percent of the time. The engineers were right 70 percent of the time.
I don't really understand the point of this metric. Why are they predicting what ranks highest on Google search? Wouldn't a better metric be who predicts the correct place a user was looking for?
Is the thinking that if they are using machine learning, then whatever the user is looking for should have bubbled up to the top?
Lately I have the impression that Google, whatever it is doing, is focused more and more on presenting Google's biggest clients to the user and hoping that will be useful.
Because I am having more and more trouble finding what I want. People used to consider me a master of google-fu, able to find whatever random stuff they wanted; now I am struggling, especially since Google changed the meaning of + and "" (+ went from meaning "mandatory" to meaning "Google Plus search", and "" went from meaning "literal string" to meaning "sort of mandatory").
If I need to find some obscure term, I now know that Google won't find it, even though it found that same term in the past. Finding pages with a certain piece of information on them never happens anymore, even using the "" operator.
For example, I own an ASUS N46VM laptop with nVidia Optimus... this laptop is terrible, and I always have to look online for how to make it behave properly. Before the "+" change, I could type +N46VM and be guaranteed to get only relevant information... Recently I was desperately searching for some stuff and found that no matter what I typed into Google, it returned completely bogus results where the string N46VM was nowhere on the page, not even in the "time" dimension (i.e., if I load the page on archive.org, for example, and scan every version of it, N46VM was never on it; Google just heuristically decided the page was relevant and gave it to me wrongly).
EDIT: I am having some success with DuckDuckGo
Although their search system is clearly cruder than Google's, with far fewer heuristics and whatnot, I frequently find the stuff I want more easily on DuckDuckGo anyway, after some pages of browsing results... while on Google I browse 40 pages and all of it is completely irrelevant and unrelated (whereas DuckDuckGo shows me 40 pages with the term I want, but in the wrong context).
Same experience. Changing the special characters' meanings ruined my google-fu, and starting around ~'08-'09 they seemed to give up fighting linkfarm sites and just stopped ever ranking small sites highly unless you search for something so specific that only one small site could possibly match.
It's made the web feel way, way smaller than it used to, and probably has made it smaller by starving small-but-relevant sites of traffic. Meanwhile any content site (news sites and such) that appears on the first page of results is now virtually indistinguishable from the horrible old link farms, in terms of screen real estate devoted to scummy ads.
The Old Web had popups and such, but even so the modern web feels way dirtier, somehow.
>"It's made the web feel way, way smaller than it used to, and probably has made it smaller by starving small-but-relevant sites of traffic. "
That's the beauty of it (for Google, anyway). Now those small, traffic-starved sites are dependent on getting traffic via paid AdWords. Either that, or they have to rely on "organic" social-media cruft in order to get anywhere meaningful.
There is a "verbatim" mode of search that may help in these cases, which looks like it turns off a bunch of search heuristics. When you get your result, there's a "Search tools" button that reveals a dropdown that defaults to "All results". In that dropdown is a "Verbatim" option. Try that.
You can add the query parameter "tbs=li:1" to get verbatim results right away.
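A minimal sketch of building such a URL programmatically, assuming `tbs=li:1` still behaves as described above (Google can change or drop it at any time). `verbatim_search_url` is a hypothetical helper; the example also quotes the term with the "" operator discussed earlier in the thread.

```python
from urllib.parse import urlencode


def verbatim_search_url(terms: str) -> str:
    """Build a Google search URL with verbatim mode switched on via tbs=li:1."""
    # safe=":" keeps "li:1" readable; spaces become "+" and quotes become %22.
    params = {"q": terms, "tbs": "li:1"}
    return "https://www.google.com/search?" + urlencode(params, safe=":")


print(verbatim_search_url('"N46VM" optimus'))
```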
> "verbatim" mode... which looks like it turns off a bunch of search heuristics.
Ugh. This sounds like one of those families of PHP escaping functions: `escape_string()`, `really_escape_string()`, `escape_string_all_the_way()`, `no_really_i_mean_it_this_time_escape_string()`, etc. Google seems to have improved somewhat at fighting SEO spam, but their efforts to "helpfully" change queries have consistently made their service worse.
I'm with you and I have a pet theory why this is so:
When Google was new I found it strange that when I looked up "Apache" pages about the web server appeared before pages about the Native American tribes.
I was happy because the webserver was what I was looking for, but I found it strange because this was not what most people around me would have expected.
DuckDuckGo is like the old Google, it's great for tech and science queries. It's great for masters of the old Google fu.
Google on the other hand tries to cater for a huge and diverse audience with different expectations from the search results.
Some would expect information about the web server, some about Native American tribes. As Google's user base grew, improving overall search quality meant worse results for some specific user groups.
Google tries to solve this problem with personalized search.
My pet theory is that Google puts almost all its effort into personalized search. Regular search is probably not so important to them anymore, because almost everyone is logged into Google or at least has a long-lived Google cookie.
tl;dr DuckDuckGo is the old Google, the new Google is only good with personalized search
I feel the same way. I used to be able to find very hard-to-find things on Google. Now it's hard to find relatively common things using Google.
I don't know if it's because they've deprecated some search operators, or the index is nerfed by some invisible changes, or I'm just remembering things as better than they were.
It is definitely worse. My intuition is that it's driven both by a conscious decision to care more about the masses rather than power users, and to drive users to partner/customer sites. There is a conflict of interest here - it's in Google's interest to push users to sites that drive the most revenue for google, whether through advertising or otherwise.
If they are interested in ranking results based on the profile they have on you (search history, interests, demographics and more), trying to predict the ranks is understandable.
This is very interesting. As late as 2008, Google said they don't use any machine learning in search. Everything was hand engineered with tons of heuristics. They said they didn't trust machine learning, and that it created bizarre failure cases.
I believe it, considering 2006-2008 was when all the deep learning pieces came together (some parts were decades old, some 5 years, some 2 years). Google's main push in ML is with deep learning. That said, I would like to see the source too. I tried to find it using Google, but no luck! :)
Abstractly, we understand how neural nets work. However, looking at a specific trained neural net, it can be difficult to determine the exact reason why certain weights are the way they are, and what effect they have on the whole.
Just like how we can know somewhat how neurons work from a modeling perspective, but when you bundle millions of them together, what exactly each one is doing is not quite clear.
We understand how they work mechanically, but not why they work from a theoretical standpoint (which I assume is what you're trying to say by "abstractly").
Why do ASGD and backprop converge on a non-convex optimization problem, and what kinds of model topologies make them do better or worse? That's all basically art right now.
No that's not what he's trying to say. We know why gradient descent and back-propagation works.
But no one understands what an actual trained neural net is doing. You can look at the weights, and you can watch the inputs and outputs, but it is very difficult to understand why it does what it does.
It just fits a model to data, but there is no explanation of why that model is best. The weights are not interpretable by humans.
There have been some attempts at making models that humans can interpret. One program, named Eureqa, fits the simplest possible mathematical expression to a set of samples. A biologist tried it on his data and found that it produced an expression that fit the data really well. But he couldn't publish it because he couldn't explain why it worked. It just did. There was no understanding, no explanation.
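To make the idea concrete, here is a toy stand-in for symbolic regression: try a few candidate formulas of the form `a*g(x) + b`, fit `a` and `b` by least squares, and keep the best fit. Eureqa is generally described as using evolutionary search over a far larger space of expressions; the basis list, the synthetic data, and the helper names here are all assumptions made for illustration. Every candidate has the same size, so the sketch simply keeps the lowest error.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hand-picked candidate shapes; a real tool searches expression trees.
BASIS: Dict[str, Callable[[float], float]] = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "x^3": lambda x: x ** 3,
    "sqrt(x)": math.sqrt,
    "log(x)": math.log,
    "exp(x)": math.exp,
}


def fit_linear(us: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit y ~ a*u + b; returns (a, b, sum of squared errors)."""
    n = len(us)
    mu, my = sum(us) / n, sum(ys) / n
    suu = sum((u - mu) ** 2 for u in us)
    suy = sum((u - mu) * (y - my) for u, y in zip(us, ys))
    a = suy / suu
    b = my - a * mu
    sse = sum((a * u + b - y) ** 2 for u, y in zip(us, ys))
    return a, b, sse


def search(xs: List[float], ys: List[float]) -> Tuple[str, float, float, float]:
    """Return (basis name, a, b, sse) for the candidate with the smallest error."""
    best: Optional[Tuple[str, float, float, float]] = None
    for name, g in BASIS.items():
        a, b, sse = fit_linear([g(x) for x in xs], ys)
        if best is None or sse < best[3]:
            best = (name, a, b, sse)
    assert best is not None
    return best


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 * x * x + 1 for x in xs]  # the hidden "law" is y = 2x^2 + 1
print(search(xs, ys))  # ('x^2', 2.0, 1.0, 0.0)
```

It recovers 2x^2 + 1 exactly, but, as the comment says, a readable formula is still not an explanation of why the data follows it.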
"We know why gradient descent and back-propagation works."
I would phrase that differently: we know when gradient descent and back-propagation work, not why they work for so many real world problems.
For the when, there are zillions of published mathematical results stating "if a problem has property X, this and this method will find a solution with property Y in time T", in zillions of variations (Y can be the true optimum, a value within x% of the real optimum, the true optimum 'most of the time', etc. T can be 'eventually', 'after O(n^3) iterations', 'always', etc)
However, for most real-world problems we do not know whether they have property X, or even how to go about arguing that it is likely they have property X, other than the somewhat circular "algorithm A seems to work well for it, and we know it works for problems of type X"
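A small illustration of the "property X" point, with convexity playing the role of X: plain gradient descent reaches the same minimum from any start on a convex function, but on a non-convex one the answer depends on where you start. Both functions, the learning rate and the step count are arbitrary choices for the sketch.

```python
from typing import Callable


def gradient_descent(grad: Callable[[float], float], x0: float,
                     lr: float = 0.01, steps: int = 2000) -> float:
    """Plain gradient descent on a 1-D function, given its derivative."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def convex_grad(x: float) -> float:
    # f(x) = (x - 3)^2 has a single minimum at x = 3.
    return 2 * (x - 3)


def nonconvex_grad(x: float) -> float:
    # f(x) = x^4 - 3x^2 + x has a global minimum near x = -1.30
    # and a worse local minimum near x = 1.13.
    return 4 * x ** 3 - 6 * x + 1


for x0 in (-2.0, 2.0):
    print(x0, gradient_descent(convex_grad, x0), gradient_descent(nonconvex_grad, x0))
```

From -2.0 the non-convex run lands in the global minimum; from 2.0 it settles in the local one, and nothing in the update rule tells it so.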
Well, we do know why gradient descent works (for smooth data), at least for finding a local minimum, because finding the minimum is basically what it does by construction. Similarly, we certainly know how back-propagation works, because it's simple calculus, backwards application of the chain rule.
Perhaps what you're trying to say is we don't know why finding local minima of these problems is good at solving the problem?
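To see the "simple calculus" part, here is back-propagation written out by hand for a hypothetical two-weight network, y = w2 * tanh(w1 * x) with a squared-error loss, checked against a finite-difference estimate. It shows the mechanics only; it says nothing about why minimizing such losses works so well, which is the open question in this subthread.

```python
import math
from typing import Callable, Tuple


def loss_and_grads(w1: float, w2: float, x: float, t: float) -> Tuple[float, float, float]:
    """Forward pass of y = w2 * tanh(w1 * x) with loss 0.5 * (y - t)^2, then the
    backward pass: the chain rule applied from the loss back to each weight."""
    # Forward
    h = math.tanh(w1 * x)
    y = w2 * h
    loss = 0.5 * (y - t) ** 2
    # Backward
    dloss_dy = y - t
    dloss_dw2 = dloss_dy * h
    dloss_dh = dloss_dy * w2
    dloss_dpre = dloss_dh * (1 - h ** 2)  # d tanh(z)/dz = 1 - tanh(z)^2
    dloss_dw1 = dloss_dpre * x
    return loss, dloss_dw1, dloss_dw2


def numerical_grad(f: Callable[[float], float], w: float, eps: float = 1e-6) -> float:
    """Central finite difference, used only to check the analytic gradient."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)


w1, w2, x, t = 0.5, -1.2, 2.0, 0.3
_, g1, g2 = loss_and_grads(w1, w2, x, t)
n1 = numerical_grad(lambda w: loss_and_grads(w, w2, x, t)[0], w1)
n2 = numerical_grad(lambda w: loss_and_grads(w1, w, x, t)[0], w2)
print(g1, n1)
print(g2, n2)
```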
If that is what he was trying to say, then he's wrong. Hiddencost is right. We do not have a solid theoretical understanding of neural nets and why backprop and all the other tricks (e.g. ReLU, dropout) work. There is basic intuition about why they work, but no rigorous theory. Even worse, we have no theory about which architectures work better or worse and why.
So whenever you read about some 50-layer net trained with this architecture with this padding/stride/normalization, and you wonder how they came up with that, the answer is: some grad student sat there, thought about his past experiences and the papers he's read of architectures that have worked well, and then spent months trying a bunch of things.
I don't think that's quite right. There is a lot of understanding about neural nets and why different methods work. Maybe not to the level that satisfies purist mathematicians, but nothing really satisfies them.
Yes it's true that hyperparameters seem arbitrary, but that's just a consequence of the no-free-lunch theorem. Some models will always fit some problems better than others. There is no such thing as a perfect model. NNs turn out to be a really good prior for real world problems. I wrote about why that might be here: http://houshalter.tumblr.com/post/120134087595/approximating...
But in principle, that doesn't apply to things like layer numbers. In theory the best neural net has infinite layers of infinite size and infinite convolution, with a stride of 1. Because you can fit every other model into that, and as long as the parameters are properly regularized it can also avoid overfitting too. In the real world of course, we are limited to merely 50 layers, and need to cut corners with convolution sizes and such.
Likewise for training. The best algorithm for training is Bayesian inference on the parameters. Since that is usually very impractical, we use approximations like maximum likelihood or dropout.
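A one-parameter case where exact Bayesian inference is cheap shows the difference. Assume a coin with an unknown heads probability and a uniform Beta(1, 1) prior (both are assumptions for the example): after 3 heads in 3 flips, maximum likelihood says heads is certain, while the Bayesian posterior predictive is more cautious. For a neural net the same integral runs over millions of parameters, which is why the approximations above get used instead.

```python
from fractions import Fraction


def mle(heads: int, n: int) -> Fraction:
    """Maximum-likelihood estimate of the heads probability."""
    return Fraction(heads, n)


def posterior_predictive(heads: int, n: int, alpha: int = 1, beta: int = 1) -> Fraction:
    """P(next flip is heads) after integrating over a Beta(alpha, beta) prior.
    With alpha = beta = 1 this is Laplace's rule of succession."""
    return Fraction(heads + alpha, n + alpha + beta)


print(mle(3, 3), posterior_predictive(3, 3))  # 1 4/5
```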
No, that's not even what the theory says. In theory, all you need is a single hidden layer, and all you need to do is keep increasing the number of hidden units until things work, because that can model any function (universal approximation theorem). What you said makes no sense theoretically (an infinite number of layers and convolutions makes no sense - I think you meant to use the word arbitrary). In the real world, we need to stack layers, for some reason that is only vaguely understood.
It goes without saying that the reason we have 50 (vs. deeper) layer nets and strides, among other things, is not solely to reduce computational cost.
The universal approximation theorem just says that you can construct a giant NN to fit any function. By making it a giant lookup table basically. It says nothing about fitting functions efficiently. I.e. generalizing from little data, using fewer parameters.
In order to do that you do need to use multiple layers. And the same is true for digital circuits, which NNs basically are. I'm sure there is mathematical theory and literature on the representational power of digital circuits.
There is a limit to what you can do with only one layer of circuits, and you can do more functions more efficiently with more layers. That is, taking the results of some operations, then doing more operations on those results. Composing functions. As opposed to just memorizing a lookup table, which is inefficient.
That's why multiple layers work better. It isn't some strange mystery.
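The composition argument can be made concrete with a standard construction: a "tent" map built from 2 ReLUs, composed with itself k times, gives a sawtooth with 2^k linear pieces using only 2k hidden units. A single hidden layer of m ReLUs on a scalar input produces at most m + 1 pieces, so it needs at least 2^k - 1 units to match. The helper names are illustrative, and this is a statement about representation only, not about training.

```python
from typing import Callable, List


def relu(z: float) -> float:
    return max(0.0, z)


def tent(x: float) -> float:
    """Tent map on [0, 1] as a 2-ReLU layer: rises to 1 at x = 0.5, back to 0 at x = 1."""
    return 2 * relu(x) - 4 * relu(x - 0.5)


def deep_sawtooth(x: float, depth: int) -> float:
    """Compose the tent map `depth` times: `depth` layers of 2 ReLUs each."""
    for _ in range(depth):
        x = tent(x)
    return x


def count_linear_pieces(f: Callable[[float], float], n_grid: int = 4096) -> int:
    """Count linear pieces of f on [0, 1] via slope changes on a fine dyadic grid
    (the grid contains every breakpoint here, and the arithmetic stays exact)."""
    xs: List[float] = [i / n_grid for i in range(n_grid + 1)]
    ys = [f(x) for x in xs]
    slopes = [(ys[i + 1] - ys[i]) * n_grid for i in range(n_grid)]
    return 1 + sum(1 for a, b in zip(slopes, slopes[1:]) if a != b)


# Each row: depth, hidden units used, linear pieces produced.
for depth in range(1, 6):
    print(depth, 2 * depth, count_linear_pieces(lambda x: deep_sawtooth(x, depth)))
```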
>an infinite number of layers and convolutions makes no sense - I think you meant to use the word arbitrary
A better way to word it would be "as it approaches infinity" or "in the limit" or something. That is, the accuracy of the neural net should increase and only increase as you increase the number of layers and units (provided you have proper regularization/priors.) Since bigger models can emulate smaller models, but not vice versa.
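The "bigger models can emulate smaller models" step rests on a simple identity: a layer of two ReLUs can pass a value through unchanged, since v = relu(v) - relu(-v). The sketch puts such a layer in front of a small hypothetical net (weights chosen arbitrarily) and checks that the function does not change. It covers representation only, not whether training would actually find this solution.

```python
def relu(z: float) -> float:
    return max(0.0, z)


def small_net(x: float) -> float:
    """A hypothetical one-hidden-layer net with 3 ReLU units and fixed weights."""
    h = [relu(1.5 * x - 0.5), relu(-2.0 * x + 1.0), relu(0.7 * x)]
    return 0.8 * h[0] - 1.1 * h[1] + 0.4 * h[2]


def bigger_net(x: float) -> float:
    """The same net with an extra hidden layer of 2 ReLUs in front of it.

    That layer outputs relu(x) and relu(-x); their difference is exactly x,
    and the subtraction can be folded into the next layer's weights."""
    x_passed = relu(x) - relu(-x)
    return small_net(x_passed)


for x in (-2.0, -0.3, 0.0, 0.25, 1.7):
    print(x, small_net(x), bigger_net(x))
```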
Yes, in order to generalize better you need deeper nets. That was my whole point. But how deep? And what are the parameters of each layer? Grad students just pull those numbers from intuition. And it goes without saying that an infinitely deep net (whatever that means) would not generalize on little data, and would get even harder to train the deeper it gets. If it means what I think it means, you're basically claiming that recurrent neural nets can easily represent anything, but RNNs exist today, and they don't do the magic you're claiming they do.
The forward pass of a net is not theoretically interesting. It's the training of the net that has no theory. The training has nothing to do with digital circuits.
It goes without saying that you've handwaved some (perfectly fine) ideas about composing functions and such. And then claim "it isn't some strange mystery." That's my point. You've argued some ideas from intuition. There is little theoretical rigor around this, however.
While we understand the math of backpropagation, we don't really understand why it lets neural networks converge as well as they do. If you asked a mathematician they'd point out that yes, backpropagation can work, but there are no guarantees, not even about the probability that it will work. And yet, it does work, and often enough and in enough different cases for it to be useful.
Yes, interpretability can be an issue, but that's not what is being discussed here.
There have been some attempts at making models which humans can interpret. One program, named Eureqa, fit the simplest possible mathematical expression to a set of samples. A biologist tried it on his data and found that it created an expression which actually fit the data really well. But he couldn't publish it because he couldn't explain why it worked. It just did. But there was no understanding, no explanation.
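A toy version of "find the simplest expression that fits the samples." As far as I know, Eureqa's real search is evolutionary and far more capable. This brute-force enumeration over a tiny, hypothetical grammar (leaves x, 1, 2 and the operators +, -, *) is only an illustrative stand-in. It also shows the biologist's problem: what comes out is a formula that fits, with no explanation attached.

```python
from functools import lru_cache

LEAVES = ("x", "1", "2")  # hypothetical grammar: one variable, two constants
OPS = ("+", "-", "*")


@lru_cache(maxsize=None)
def expressions(n_leaves):
    """All expression trees with exactly n_leaves leaves; a tree is a leaf string or (op, left, right)."""
    if n_leaves == 1:
        return LEAVES
    result = []
    for k in range(1, n_leaves):
        for left in expressions(k):
            for right in expressions(n_leaves - k):
                for op in OPS:
                    result.append((op, left, right))
    return tuple(result)


def evaluate(expr, x):
    if expr == "x":
        return x
    if isinstance(expr, str):
        return float(expr)
    op, left, right = expr
    a, b = evaluate(left, x), evaluate(right, x)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    return a * b


def to_string(expr):
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    return f"({to_string(left)} {op} {to_string(right)})"


def simplest_fit(samples, max_leaves=4, tol=1e-9):
    """Return the smallest expression (fewest leaves) reproducing every (x, y) sample, or None."""
    for n in range(1, max_leaves + 1):
        for expr in expressions(n):
            if all(abs(evaluate(expr, x) - y) <= tol for x, y in samples):
                return expr
    return None


samples = [(x, x * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]]
best = simplest_fit(samples)
print(to_string(best))  # (1 + (x * x))
```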
This is a pretty well-understood issue in the ML community. Different tools have different strengths: regression (sometimes), tree-based classifiers and Bayesian nets are nice for explaining your data, but other methods (SVMs, neural networks, etc.) can be better for prediction in some circumstances.
Gradient descent is guaranteed to move the parameters closer to a more fit solution. That's almost the definition of GD. I guess you are saying that we don't know why it doesn't just get stuck in local optima?
I'm not certain, but for one, stochastic gradient descent is usually used, which helps break out of local optima. And second, they don't seem to be as much of a problem as you add more hidden units. I was under the impression there was a mathematical explanation for both of those, though I haven't done much research.
Off the top of my head, I think that as you add more and more hidden units, a subset of them will fit the function exactly just by chance, and gradient descent will only increase their parameters. In fact, as long as they start somewhere within a large region around the correct solution, GD can move the parameters downhill to the optimal solution. Not that we even want the optimal solution, just a good enough one.
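A tiny sketch of what "stuck in a local optimum" looks like, using plain full-batch gradient descent on a made-up 1-D loss (f below is purely illustrative, not a neural net). For a small enough learning rate each step does lower the loss, which is the guarantee GD actually gives, but where it ends up depends on the starting point. Stochastic noise and many hidden units, as discussed above, are not modeled here.

```python
def f(x):
    # Non-convex loss: global minimum near x = -1.30, local minimum near x = 1.13.
    return x ** 4 - 3 * x ** 2 + x


def df(x):
    # Derivative of f.
    return 4 * x ** 3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=2000):
    """Plain gradient descent; returns the final point and the loss after every step."""
    x = x0
    history = [f(x)]
    for _ in range(steps):
        x -= lr * df(x)
        history.append(f(x))
    return x, history


x_right, hist_right = gradient_descent(2.0)
x_left, _ = gradient_descent(-2.0)
print(f"start  2.0 -> x = {x_right:.3f}, loss = {f(x_right):.3f}")  # stuck in the local minimum
print(f"start -2.0 -> x = {x_left:.3f}, loss = {f(x_left):.3f}")    # reaches the global minimum
```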
At this point I'm going to point you to [1]. They interview Ilya Sutskever (ex Toronto, ex Google Brain, now director of OpenAI). He knows a little about neural networks[2] and possibly knows as much as anyone else about gradient descent.
In [1] he talks a lot about how there is no theoretical basis to think that a deep neural network should converge, and prior to around 2006 the accepted wisdom was that networks deep enough to outperform other methods of machine learning were useless because they couldn't be trained. Then they discovered that using larger random values for initialization means that they do converge, but that there is no theoretical basis to explain this at all.
I believe that has to do with the vanishing gradient issue which has been studied.
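A toy illustration of the vanishing-gradient effect, assuming a scalar "network" of depth identical layers h <- act(w * h) for simplicity. Backpropagation multiplies the gradient by w * act'(z) at every layer. The sigmoid's derivative never exceeds 0.25, so with w = 1 a sigmoid chain shrinks the gradient by at least 4x per layer; for tanh started at x = 0 the per-layer factor is exactly w, so the weight (initialization) scale alone decides whether the gradient vanishes. This sketches the general idea only and does not reproduce the specific results discussed above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# (activation, derivative) pairs
ACTIVATIONS = {
    "sigmoid": (sigmoid, lambda z: sigmoid(z) * (1.0 - sigmoid(z))),
    "tanh": (math.tanh, lambda z: 1.0 - math.tanh(z) ** 2),
}


def chain_gradient(x, w, depth, activation):
    """Forward x through `depth` layers h <- act(w * h), then return d(output)/d(x) by the chain rule."""
    act, dact = ACTIVATIONS[activation]
    pre_activations = []
    h = x
    for _ in range(depth):
        z = w * h
        pre_activations.append(z)
        h = act(z)
    grad = 1.0
    for z in reversed(pre_activations):
        grad *= w * dact(z)  # d act(w * h) / dh = w * act'(w * h)
    return grad


for depth in (1, 5, 10, 20):
    print(depth,
          chain_gradient(0.5, 1.0, depth, "sigmoid"),  # shrinks at least 4x per layer
          chain_gradient(0.0, 0.5, depth, "tanh"))     # exactly 0.5 ** depth
```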
As for why nets converge at all, there's this paper which I believe tries to establish some theory for why bigger nets don't have such a big problem with local optima: http://arxiv.org/abs/1412.0233
>The number of local minima outside that band diminishes exponentially with the size of the network. We empirically verify that the mathematical model exhibits similar behavior as the computer simulations, despite the presence of high dependencies in real networks. We conjecture that both simulated annealing and SGD converge to the band of low critical points, and that all critical points found there are local minima of high quality measured by the test error. This emphasizes a major difference between large- and small-size networks where for the latter poor quality local minima have non-zero probability of being recovered.
Your link supports my point, not yours. I'm aware of research showing that local minima are close to the global minimum. That does not mean neural nets usually converge to the global minimum, only that the local minima they converge to are close to it.
> Finally, we prove that recovering the global minimum becomes harder as the network size increases
Do you have a citation or blog post for this application of Eureqa that gave a good result but wasn't publishable from a biological study perspective? I have a background in BME and CS and am very interested in learning about this anecdote.
Neural networks are really just a fancy statistical regression.
Just the same as you could run a high-degree polynomial regression on some data and "not really know" why it chose the coefficients it did or why it yields a particular output for another input, neural networks typically involve a lot of mysterious coefficients, and it's hard to qualitatively explain why the neural network gives a particular output for another input.
But in both cases, the underlying principles and mechanism of action are very well understood.
That's not true. Depending on the method for polynomial regression, you do know why it chose the coefficients. Furthermore, you know exactly what your model looks like, because you're dictating what the model is explicitly! (... a polynomial). In contrast to a convolutional neural net where you might have no idea what the filters in the third (not first) convolutional layer are doing. This is also a pretty bad example because polynomial regression is not giving amazing groundbreaking results. Polynomial regression has been known for centuries and gives pretty meh results in the real world.
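To make "you do know why it chose the coefficients" concrete: with ordinary least squares, the coefficients are the unique solution of the normal equations (X^T X) c = X^T y, where X is the Vandermonde matrix of the inputs (assuming at least degree + 1 distinct x values). Below is a dependency-free sketch; in practice you would use a library least-squares routine, which is numerically more stable than forming X^T X.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations; returns [c0, c1, ..., c_degree]."""
    n = degree + 1
    # (X^T X)[i][j] = sum_k x_k^(i + j) and (X^T y)[i] = sum_k y_k * x_k^i
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            rhs[r] -= factor * rhs[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = rhs[i] - sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = s / A[i][i]
    return coeffs


xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
print(fit_polynomial(xs, ys, 2))  # approximately [1, 2, 3]
```

Every coefficient is a deterministic function of the data through one linear system, which is quite different from the filters a convnet learns by iterative training.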
As an aside to argonaut's response (which is correct) it is important to note that there is a lot of ongoing work in understanding what the hidden layers in a CNN are doing. For image-classification type problems, it's pretty easy to pass an image in to the network and output the features each layer extracts.
Indeed, many recent systems rely on CNN-extracted features being passed to an SVM classifier to classify images that the CNN wasn't trained on (because the SVM is easier to train with less data, and using the CNN frees one from doing feature engineering).
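Passing an input through a network and keeping every layer's output is simple to sketch. The hypothetical forward_with_activations below does it for a small fully connected ReLU net in plain Python, with made-up weights; a real CNN framework has its own mechanisms for this, which aren't shown here. The recorded activations are what you would inspect ("which units fired?") or hand to a separate downstream classifier such as an SVM.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def dense(W, b, v):
    # Affine map W @ v + b, with W given as a list of rows.
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def forward_with_activations(layers, x):
    """Run a ReLU net (ReLU after every layer) and return every layer's output."""
    activations = []
    h = x
    for W, b in layers:
        h = relu(dense(W, b, h))
        activations.append(h)
    return activations


def active_units(activations):
    """For each layer, the indices of units that 'fired' (output > 0)."""
    return [[i for i, a in enumerate(layer) if a > 0] for layer in activations]


layers = [
    ([[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]], [0.0, 0.0, -1.0]),
    ([[1.0, 1.0, 0.0], [0.0, -1.0, 1.0]], [0.0, 0.5]),
]
acts = forward_with_activations(layers, [2.0, 1.0])
features = acts[-2]  # e.g. penultimate-layer features for a separate classifier
print(active_units(acts))  # [[0, 2], [0, 1]]
```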
In my experience doing 3D image reconstruction from 1D and 2D sensors, there are often situations where the engineers can only explain what's going on using math equations - and not in any other qualitative kind of way. For some people, this might be considered not "completely understanding" something.
I submit I know of no one who can honestly and verifiably say they understand quantum spin without equations either. It just falls out of the math that way and is highly verifiable by experiment.
Determining why a particular neuron in a large model fired or did not fire is still a guessing game, as far as I know. Kind of fun to look at patterns of when certain neurons activate a few layers deep in image classification convnets https://youtu.be/ta5fdaqDT3M?t=27m42s
Deep learning experts understand how neural nets work. But:
1. They don't usually try to explain the specific activations in the hidden layers of the network. This is hard and depends on the specific trained net.
2. They can't guarantee a net's performance before experimenting with it.
RIP big data. Hello AI. Makes sense, data drives a lot of 'AI' tech. I guess what I find amusing is the push from Google to rebrand themselves as an AI company. My guess is it won't be too long until we see everyone else jumping on the AI branding bandwagon. That will kind of dilute a lot of what is being done.
I can understand Amit Singhal's opposition to replacing hand-coded features with machine learning models. He's right that ML models have bizarre failure cases across large sample sizes, but he's apparently career-endingly wrong to seemingly believe that one cannot do anything about it. He's also wrong IMO to not recognize that hand-crafted signals and features have bizarre failure cases of their own.
IMO this shifts the focus from lovingly hand-crafted signals and features to lovingly hand-crafted loss functions and variants of boosting and training algorithms to address those bizarre failures as they occur. For example, recently much ado was made about minimal changes to the input data to image recognition convolutional nets to spoof the object ID. And the simplest remedy is to augment the training data with these cases and perhaps boost the gradients of outputs that are wrong. It's not perfect, but Google search was never perfect either. Evidence: I was on the Google search team for a bit and we had all sorts of meetings to address such failures as they happened.
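One well-known way to construct those "minimal changes to the input" is the fast gradient sign method (FGSM): nudge every input component by eps in the direction that increases the loss. The sketch below applies it to a toy logistic-regression model standing in for a convnet; the weights, input and eps are made up for illustration. Each input moves by at most eps, yet the prediction flips. The remedy mentioned above then amounts to adding such examples, with their correct labels, to the training data.

```python
import math


def predict_proba(w, b, x):
    """Logistic-regression probability of class 1 (a stand-in for a real image classifier)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(w, b, x, y):
    """Gradient of the log loss with respect to the *input* x: (p - y) * w."""
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]


def fgsm(w, b, x, y, eps):
    """Move each input component by eps along the sign of the loss gradient."""
    grad = input_gradient(w, b, x, y)
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


w, b = [1.0, -2.0, 0.5], 0.0
x, y = [0.5, 0.0, 0.0], 1
x_adv = fgsm(w, b, x, y, eps=0.2)
print(predict_proba(w, b, x) > 0.5, predict_proba(w, b, x_adv) > 0.5)  # True False

# Data augmentation: keep the adversarial example with its correct label for retraining.
augmented_training_set = [(x, y), (x_adv, y)]
```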
While I agree that the quality of Google technical searches has declined dramatically recently, I believe there's a huge opportunity to fix them by understanding why the ML models are failing (shooting from the hip, I suspect it's a long-tail problem writ large) and changing the loss functions, models and training algorithms to address these failures as they're detected.
Anything less IMO is a failure of imagination in an age of 6.6 TFLOPS for ~$1000 and the ability to stuff 8 of them into a $20K server and go wild.
Tech hype is a little like the spontaneous combustion of some old oily rags in the corner: no telling just when they might ignite, but when they do the result can be a big fire, for a short while.
Once the hype gets a flicker, there are good sources of more fuel to make the fire bigger. The situation is old; e.g., in the movie Lawrence of Arabia, a news reporter talking to Prince Faisal says: "You want your story told, and I desperately want a story to tell." So, tech people who want their story told get together with tech journalists who desperately want a story to tell.
One such case doesn't mean very much, but once the fire starts, more techies and more journalists do the same because the fact that there are already lots of stories gives each new story some automatic credibility.
But, fairly soon the stories get to be about the same, with little visible progress (usual situation in reality), and interest falls, the bubble bursts, becomes yesterday's news. Then, the world moves on to another source of a hype conflagration, bubble, viral storm, whatever.
For AI, by 1985 DARPA funding at the MIT AI Lab had gotten AI going. There were expert systems and more. Lots of hype. In a few years, the fire went out, the bubble burst, and there was AI winter.
For the next bubble, say, System-K (right, doesn't mean anything), print up some labels about System-K. Then order a gross of children's bubble bottles, right, soapy water with a plastic stick with a circle at the end good for blowing bubbles. Put the labels on the bottles and send them to various departments at Stanford, start up companies in Silicon Valley, VC firms on Sand Hill Road, and tech journalists. Then stand back and watch the media conflagration for System-K! So, get stories:
I don't quite understand why people want to dismiss examples of machine learning as valid techniques for understanding the human environment. It's not as if the human brain was built and guided from nothing; many of the same adaptive principles are as present in our minds as they are in other mammals, and equally so from where all the branches divide. Even tiny organisms. And we seem to center the brain at the core of human intelligence, when there's a range of chemical and metabolic coordination going on that might bypass the brain entirely.
It's efficient, failure-resistant models that matter. We're talking about accelerated learning, finding the models that work out of all those many iterations that fail. You can model it, decompile the results and try to understand and emulate what makes things seem real, but we don't even need to analyze it, because case by case it changes and its circumstances make things very different. 'Many ways to skin a cat'.
I think the challenge of the future is finding the general API that can negotiate all the things and make all the parts communicate, the kernel if you want. We can determine optimum speech algorithms, babel communication, create seeing eyes that recognize objects, optimize forms that can negotiate physical terrain, work out what is meant in human expression, but it's not until all these units work together that the 'AI' will seem seamless in human terms.
All of those parts have discrete forms; they generate a lineage of algorithms from iterations based on code, in languages often derived from need. A Lisp might be the best way of interpreting language; a Haskell might work best for defining strict biomechanics and area physics. Different abstractions are better for the results they are designed to intuit. But when we come to create the ultimate neural net, the composite of all these machine languages that are constantly required to optimize beyond humanly intelligible understanding, what will we be using? What structure will state 'this works good enough' and stop bothering with the computation, in the familiar sense of why our eyes don't have a faster frame rate or finer detail, or let us see into UV? What regulates such a machine, and how does a machine understand failure without guidance?
I like to think of these questions when I see rough examples posited around potentials in machine learning. Getting one human system sorted is one thing; communicating the results to other sub-systems and optimizing concurrent results is another. The data model is too huge to even comprehend!
I'm just excited that these things exist, that there are individuals, research groups and companies looking at what makes us 'us'. It might help us unlock the features of the brain and evolution. Used for commercial gain - who cares, just a small cog, with revenue to continue development.
One part of me truly hopes Google succeeds in ML/AI, although I consider Google an evil company. AI and the Singularity are the most important things in this century. The implications are simply beyond our imagination. I don't care too much if Skynet takes over the earth and kicks human beings into the dustbin. If that's destiny, so be it.
Another part of me believes it's a sign that folks don't know what they are doing or writing. How can we achieve AI without understanding? Google will fall apart.