At Google Next (which happened after this post was published), Google explicitly positioned its AutoML product as a beginner approach to model building, meant to get companies up and running with machine learning at low time/cost, and not an end-all-be-all approach.
Here's a recording from a session titled "How to Get Started Injecting AI Into Your Applications" which illustrates case studies for AutoML: https://www.youtube.com/watch?v=O7iT1INWrqo
His comments about the "dangers" of academic research and PhDs are incredibly strange. The very examples he uses as possibly superior alternatives to the neural architecture search approach are also the work of PhDs and academia. Possibly the fact that he runs a machine learning trade school is leading to this out-of-place criticism.
Also, it cherry-picks examples of failed startups or ideas, but ignores all of the ML that Google did turn into viable end-user experiences: Smart Reply, WaveNet, Translate, Speech Recognition, RankBrain, Photo/Image Search, Retinal diagnosis, Waymo (soon), and numerous other components that are used in Android and the Assistant.
The whole point of venture capital or R&D is that you don't know what will work, and you expect 40 failures for 1 success. She seems to think that the way science progresses is that you iterate on a PhD thesis until it's "done" and verified, and then it becomes applied. But most PhD theses, even those that survive peer review, end up inapplicable, unused, or forgotten. Not everyone publishes a General Relativity paper. A ton of PhD papers are junk, and a good number don't even have reproducible results.
The only way to know if something will be successful is to try it and let it succeed or fail in the marketplace.
> Smart Reply, WaveNet, Translate, Speech Recognition, RankBrain, Photo/Image Search, Retinal diagnosis, Waymo (soon), and numerous other components that are used in Android and the Assistant.
These aren't examples of just productionizing a PhD thesis - they're thoughtfully designed products that solve a real problem.
Unfortunately we've watched as many ML PhD graduates launch startups which are little more than an API wrapped around the key algorithm from their thesis. These startups nearly always fail, because they don't actually address a market need.
> The only way to know if something will be successful is to try it and let it succeed or fail in the marketplace.
There are many ways to estimate potential market size, product-market fit, etc. ahead of time. They're not perfect, but they're a lot better than nothing.
> These aren't examples of just productionizing a PhD thesis
His point is that sometimes research bears fruit, and sometimes it doesn't. All universities (in my country) now have metrics for how much research must successfully bear commercial fruit, or else they lose funding from the government and the EU. As such, they have commercialisation pipelines that funnel viable commercial research into products. However, even these funding bodies don't expect anything more than a fraction of research to be commercialisable. That isn't the point of research.
> we've watched as many ML PhD graduates launch startups which are little more than an API wrapped around the key algorithm from their thesis. These startups nearly always fail, because they don't actually address a market need.
Why do you care what someone does with their PhD research anyway?
Without PhDs' work, you have no clue where to look. Pushing an idea to market not only requires market fit, but also requires BS everywhere to educate customers. Building a product takes effort, but that effort is equal to the effort of creating the idea in the first place.
It’s nice, but … I still have to scroll about 15 pages into “cat” before I see a picture which isn’t my dog. I’m receptive to the argument that this is more an advertising/positioning move than a major advance.
Many of Google's advances in machine learning really do owe their existence to just throwing computers at the problem. Jeff Dean et al realized several years back that there were techniques in the literature that didn't seem to work all that well in the small, but they might work better with a crapton of parameters, if you trained the model with unprecedented amounts of CPU time. They were right: throwing megawatts at the problem is effective.
The line on this forum is that Google's product is its customers' data, but that's never been right. Google's product is, and has always been, dirt-cheap computing. They have a really large number of computers, they are building more right now, and they want you to use them. The surplus of computing power within Google is what makes Googlers sit around thinking "sure, that was more CPU time than anyone has ever used for anything before, but what if we used 100x that much?"
So, what should we do instead? The author gives some recommendations:
"Research to make deep learning easier to use has a huge impact, making it faster and simpler to train better networks. Examples of exciting discoveries that have now become standard practice are:
* Dropout allows training on smaller datasets without over-fitting.
* Batch normalization allows for faster training.
* Rectified linear units help avoid gradient explosions.
Newer research to improve ease of use includes:
* The learning rate finder makes the training process more robust.
* Super convergence speeds up training, requiring fewer computational resources.
* “Custom heads” for existing architectures (e.g. modifying ResNet, which was initially designed for classification, so that it can be used to find bounding boxes or perform style transfer) allow for easier architecture reuse across a range of problems.
None of the above discoveries involve bare-metal power; instead, all of them were creative ideas of ways to do things differently."
These seem quite fundamental (except for the last one, which sounds like a favorite domain-adaptation strategy of the author's).
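For concreteness, here's a minimal PyTorch-style sketch (layer sizes and the task are made up, not taken from the article) of how the first three of those ideas, dropout, batch normalization, and ReLU, show up in an ordinary model definition:

    import torch.nn as nn

    # Illustrative only: a small image classifier using the three
    # "standard practice" ideas from the list above.
    model = nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, padding=1),
        nn.BatchNorm2d(32),           # batch normalization: faster, more stable training
        nn.ReLU(),                    # rectified linear units
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Dropout(p=0.5),            # dropout: regularizes, helps on smaller datasets
        nn.Linear(32 * 16 * 16, 10),  # assumes 32x32 inputs and 10 classes
    )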
I'm in industry myself, but it's quite hard to come up with fundamental strategies. In particular, I'm trying to merge nonparametric Bayesian models with deep networks. By giving the latent variables in autoencoders richer priors we might see improvements. At the same time, we need better control variates to do inference in more complex models. See my blog post on the overlooked topic of control variates: https://www.annevanrossum.com/blog/2018/05/26/random-gradien.... If we really want to be creative we need people from academia on board.
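As a toy illustration of what a control variate buys you in these stochastic-gradient settings (generic NumPy, not the method from the linked post): the score-function estimate of the gradient of E_{x~N(mu,1)}[x^2] has the same mean with or without a baseline, but the baseline slashes the variance.

    import numpy as np

    # Score-function estimate of d/dmu E_{x~N(mu,1)}[x^2]; the true value is 2*mu.
    # Subtracting a baseline b (a simple control variate) leaves the expectation
    # unchanged, because the score (x - mu) has zero mean, but reduces variance.
    rng = np.random.default_rng(0)
    mu, n = 2.0, 100_000
    x = rng.normal(mu, 1.0, size=n)

    score = x - mu                        # d/dmu log N(x; mu, 1)
    plain = (x ** 2) * score              # naive estimator
    b = 5.0                               # rough guess of E[x^2] = mu^2 + 1
    controlled = (x ** 2 - b) * score     # same expectation, lower variance

    print(plain.mean(), controlled.mean())  # both are close to 2*mu = 4
    print(plain.var(), controlled.var())    # controlled variance is much smaller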
We're not going to get very far letting machines guess what we want by making models we can't interpret.
Explainability is important, and critical in applications where lives are on the line.
Explainability where we're guessing what's happening inside a black-box model won't do either. Nothing less than complete transparency about the model and why it's doing what it's doing will do: source code that makes sense to humans. A full-on model audit. No guessing.
I can think of only one company that's attempting to do this, and it's not anyone you hear working on explainability, including DARPA.
If the visual cortex were a machine model, people would be complaining about how we can't explain it and how it's a dangerous black box. They'd probably tout the many optical illusions as demonstrations of this danger.
Yet we don't demand that other humans explain how their visual cortex works. There is a double standard here.
I don't buy the red herring. Either we're talking about the visual cortex, or about intelligence and decision making/planning. One works very differently from the other.
What's the difference between the output of a network and what an expert says? Unless you can probe mathematically why networks or the human mind work the way they do, there is no explanation for either method. I can argue that humans learn from examples the same way neural networks do; you can argue the opposite. But we have no way to say that either claim is false or true.
You can't do a full-on model audit of a neural net, but there are architectures, such as the Transformer (an attention scheme), that can give a lot of insight into what the neural net is thinking. We can also visualise which inputs maximally activate a deep neuron in a CNN. Not all DL models are truly black boxes.
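For example, the "which input maximally activates a neuron" trick is just gradient ascent on the input; a rough PyTorch sketch (the model, layer index, and channel are arbitrary placeholders, not a specific published method):

    import torch
    from torchvision import models

    # Rough activation-maximization sketch: optimize an input image so that one
    # channel of an intermediate CNN layer fires as strongly as possible.
    cnn = models.vgg16(pretrained=True).features.eval()

    img = torch.randn(1, 3, 224, 224, requires_grad=True)   # start from noise
    optimizer = torch.optim.Adam([img], lr=0.05)
    layer_idx, channel = 10, 42                              # arbitrary choices

    for _ in range(200):
        optimizer.zero_grad()
        x = img
        for i, layer in enumerate(cnn):
            x = layer(x)
            if i == layer_idx:
                break
        loss = -x[0, channel].mean()      # negative sign: maximize the activation
        loss.backward()
        optimizer.step()
    # `img` now roughly shows what that channel responds to.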
The problem is that the models have no reflection capabilities, unlike people.
The explanation is always produced by an entirely different external system, and sometimes by an actual intelligence.
State models (Markovian ones) are sometimes able to explain things, but not reliably, especially in complex cases.
On the other hand, humans tend to take a decision first and retrofit an explanation to it later, even if that explanation is completely wrong, and we assume the explanation is the reason and not the effect.
This has to be one of the worst comment threads I've seen here.
First of all: Fast.ai is a non-profit. There's no ulterior motive here. I think a lot of the commenters here are feeling clever "looking for an angle", when there honestly is none.
Secondly, I really can't think who should have accrued more benefit-of-the-doubt than Rachel Thomas. It's just silly to take the point of view expressed here as insincere or motivated reasoning. None of this means you have to agree with the points raised, the predictions made, or the conclusions drawn. Of course. But the snide tone of many of the comments here is really discordant.
Finally, it's a little...revealing, that there's so much discussion of Jeremy here, including comments that seem to assume he wrote the article. I don't even know what to say about that.
1) "Google’s AutoML highlights some of the dangers of having an academic research lab embedded in a for-profit corporation"
2) "We can remove the biggest obstacles to using deep learning ... by making deep learning easier to use"
3) "Research to make deep learning easier to use has a huge impact"
Because this article rants against academics, but is written by an academic, it reads as inconsistent and disingenuous. Wild guess...AutoML and Google are worrying competitors to whatever their business is?
I know I sound like a fanboy, but fast.ai and Jeremy Howard are just so level-headed and "practical". I highly recommend that anybody who wants to get into ML/DL try out the two courses they offer. It's a really fantastic way of teaching, and they apply these techniques to real-life problems. In particular, the emphasis on tabular data (often ignored in academia, but very much used in industry) was very helpful.
Isn’t the argument between transfer learning and neural architecture search the same problem posed by the “no free lunch” theorem, which essentially states that specialized algorithms will always beat generalized algorithms on specific tasks, and vice versa?
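For reference, the formal statement (paraphrasing Wolpert and Macready's 1997 formulation from memory) says that, summed over all possible objective functions f, any two black-box search algorithms a_1 and a_2 yield the same distribution of observed cost values d^y_m after m evaluations:

    \sum_{f} P\left(d^{y}_{m} \mid f, m, a_{1}\right)
      = \sum_{f} P\left(d^{y}_{m} \mid f, m, a_{2}\right)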
Very few people write assembler code today, because compilers are good enough most of the time.
The same can happen if we get a generic and robust enough ML tool.
Could you expand on that analogy? I'm not sure a one-model-fits-all is comparable to a one-optimizer-fits-all. Wouldn't that be an argument against transfer learning, since we don't have a universal programming language and thus multiple compilers exist?
"In evaluating Google’s claims, it’s valuable to keep in mind Google has a vested financial interest in convincing us that the key to effective use of deep learning is more computational power, because this is an area where they clearly beat the rest of us. If true, we may all need to purchase Google products. On its own, this doesn’t mean that Google’s claims are false, but it’s good be aware of what financial motivations could underlie their statements."
Excuse the snark, but:
"In evaluating FastAI's claims, it’s valuable to keep in mind FastAI has a vested financial interest in convincing us that the key to effective use of deep learning is more machine learning experts, because ML education is their business model. If true, we may all need to purchase FastAI courses. On its own, this doesn’t mean that FastAI’s claims are false, but it’s good be aware of what financial motivations could underlie their statements."
I can think of thousands of ways that Google could increase the computational power required that would be much easier than the AutoML effort (e.g., simply recommending ridiculously deep models that take 100x longer to train without giving higher performance). They are putting so much effort into AutoML because it just works. A lot of the things included in later parts of this series are very useful (learning rate search, etc.) and decrease the computational power required, but most people just want to drop in a dataset, pick the type of model (say, multiclass image classification), and leave the rest for machines to optimize.
Your FastAI criticism isn't really accurate, as FastAI is free online courseware. I suppose you could claim that jhoward makes his living from his personal brand, and FastAI helps that... but that's a bit of a stretch.
It would be much more valid criticism if they were a business, rather than an open-source software and free-course outfit. That said, even coming from a competitor, it's probably useful to get a negative perspective.
> most people just want to drop a dataset and pick the type of model (say, multiclass image classification) and leave the rest for machines to optimize.
I think the disconnect here is that you can reuse existing architectures and get state-of-the-art performance without running something like AutoML. It's not clear that creating a bespoke architecture for your specific problem is always better, let alone always a good use of your resources.
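Concretely, "reusing an existing architecture" often just means bolting a new head onto a pretrained model and fine-tuning; a minimal torchvision-style sketch (the class count and freezing strategy here are arbitrary, for illustration only):

    import torch.nn as nn
    from torchvision import models

    # Minimal transfer-learning sketch: take an ImageNet-pretrained ResNet,
    # freeze the backbone, and replace its classification head for a new task.
    model = models.resnet18(pretrained=True)
    for p in model.parameters():
        p.requires_grad = False                # freeze the pretrained backbone

    num_classes = 5                            # made-up target task
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # the new "custom head"

    # Then train only model.fc with an ordinary optimizer/loss, and optionally
    # unfreeze deeper layers later for fine-tuning.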
Frankly, I don't think this is obvious.
Case in point: sequential models (granted, probably not yet included in the first batch of AutoML). There are so many ways to model the problem that tweaking each of these ways, in a way that makes sense for your dataset, takes a lot of work.
I've built models that worked fairly OK, only to have a colleague build a separate model that had +10% prAUC by virtue of adding some additional mechanism (say, attention, a different RNN cell, more units, etc.).
I'll also say that this is aimed towards people that are unfamiliar with ML and would have trouble finding and re-implementing the state-of-the-art in their specific field.
Sorry for the tone here (unusually snarky comment).
To be clear, my point was that this kind of argument, biasing your analysis by appealing to hidden motives that cannot be disproven, is invalid.
But clearly, I was wrong because fast.ai's courses are indeed free so the comparison doesn't stand.
I'll also add that fast.ai has great ML courses that go deep into details, and focus on practical, applicable, techniques. My comment was specifically about the argument that Google is doing this just to increase Cloud revenues.
Dismissing AutoML as something that can't have an impact is how Nokia and BlackBerry laughed at the iPhone: "Huh! Who'll use a full-touch phone?" Data scientists, as a profession, are really insecure about AutoML, hence the claims that help put their minds at ease.
Data scientists in many companies have become something close to what people running SQL jobs or web scripts used to be before cron jobs came into the picture, and that's what AutoML will do here, with a set of sequential steps replacing mundane ML activities.
Wow, you sure got them good. Thanks for rooting out the financial motivations of fast.ai, a nonprofit research lab that publishes courses online for free.
Meh. As far as I'm concerned, it does _exactly_ what it says on the tin -- democratizes access to custom ML. Will you get a state of the art model out of it? Probably not. Will you be able to train a decent model and deploy it at scale fairly easily? Probably yes. That's the problem they're solving: getting people to use ML/AI without hiring a $400K/yr research scientist.
I'm not a businessperson and I try to avoid talking as if I were one, but I don't see what the, well, business incentive is for Google to "democratize access to custom ML". Accordingly, I don't see them even trying to do that.
Rather, it seems to me that their policy so far amounts to an attempt at platform lock-in. If you want to do Deep Learning like Google does, you basically have to use their tools (yes, that's tensorflow I'm talking about) and their data (in the form of pre-trained models) and eventually pay them for CPU or GPU time (you can also pay their competitors, of course).
In fact, I'd take this a step further and say that the whole Deep Learning hype is starting to sound like a total marketing trick, just to get more people to use Google's stuff, in the vain hopes of achieving their performance. However, to beat Google at their own game, with their own tools and their own data, running their models on their own computers... that does not sound like a winning proposition.
Well, the tools are free and open source, and other pretrained models are available for download. All that’s missing is an expensive PhD or two to make sense of it all, so that option is also available. You can do ML/DL on Google Cloud too in a completely portable way, using, for instance, Facebook’s PyTorch and pretrained models for it. You will discover, however, that it’s much harder and much more expensive to get anything usable that way, without Google doing most of the yak shaving for you, even though PyTorch is much easier to use than TF.