
This is an opinionated video that tries to rewrite history. For example, according to it, the "big breakthrough" with deep learning occurred in 2012 when Andrew Ng et al. got an autoencoder to learn to categorize objects in unlabeled images. WHAT? Many other researchers were doing similar work years earlier. According to whom was this the "big breakthrough"?

The video at least mentions Yann LeCun's early work with convnets, but there's no mention of Hinton et al.'s work with RBMs and DBNs in 2006, or Bengio et al.'s work with autoencoders in 2006/2007, or Hochreiter & Schmidhuber's invention of LSTM cells in 1997... I could keep going. The list of people whose work is insulted by omission is HUGE.

I stopped watching the video at that point.




Omitting the other members of the LBH conspiracy and Schmidhuber is unfortunate, but I agree with the idea that the number one reason deep learning is working now is scale. Hinton also says it himself, for example (at 5m45 in the video below): "What was wrong in the '80s was that we didn't have enough data and we didn't have enough compute power. [...] Those were the main things that were wrong."

He gives a "brief history of backpropagation" here: https://youtu.be/l2dVjADTEDU?t=4m35s


I agree that scale is an important factor in deep learning's success, but that Google experiment ended up being a good example of how not to do it. They used 16,000 CPU cores to get that cat detector. A short while later, a group at Baidu was able to replicate the same network with only 3 computers with 4 GPUs each. (The latter group was also led by Andrew Ng.)


Incidentally, seeing the speaker set up an overkill neural network for a trivial classification problem seemed off to me. Unsurprisingly, at least 75% of the neurons were unused.

Throwing a phenomenal number of neurons at a problem is not a goal; using the minimal number needed to solve it within a given time budget is.

The statement at the end of the video, “all the serious applications from here on out need to have deep learning and AI inside”, seems awfully misguided. Even DeepMind doesn't use deep learning for everything.


cs702, this deck started as a short talk, so I didn't have time to acknowledge all the great work leading up to the Google YouTube experiment. Your list is good, and I'd add Rosenblatt for the first perceptron, Rumelhart & McClelland for applying these techniques to perception, Werbos for backpropagation, Fukushima for convolutional networks, and so many more.

I found these helpful while researching the history: http://www.andreykurenkov.com/writing/a-brief-history-of-neu... http://www.scholarpedia.org/article/Deep_Learning

What else have you found particularly useful?


withfries2: Bengio's 2009 survey and Schmidhuber's "conspiracy" blog post contain useful, accurate historical background from the perspective of two leading researchers, with lots of links to additional sources:

http://www.iro.umontreal.ca/~bengioy/papers/ftml.pdf

http://people.idsia.ch/~juergen/deep-learning-conspiracy.htm...

Were I in your shoes, I would NOT have highlighted the Google YouTube experiment as "the" big breakthrough. It was just an interesting, worthwhile experiment by one of many groups of talented AI researchers who have made slow progress over decades of hard work. Why single it out?

--

PS. The YouTube experiment did not produce new theory, and from a practical standpoint, it would be a stretch to say that it reignited interest in deep learning. Consider that the paper currently has only ~800 citations, according to Google Scholar.[1] For comparison, Krizhevsky et al.'s paper describing the deep net that won ImageNet (trained on a single computer with two GPUs) has over 5000 citations.[2] And neither of these experiments deserves to be called "the" big breakthrough.

[1] https://scholar.google.com/citations?view_op=view_citation&h...

[2] https://scholar.google.com/citations?view_op=view_citation&h...


Mostly, I wanted to highlight the importance of scale (data + compute) for the accuracy of deep networks.


The video is almost cringe-worthy in its factual inaccuracy around 2012.

The reigniting of deep learning around 2012 was due to Krizhevsky, Sutskever & Hinton winning the ImageNet challenge (1000 object classes).

Contrary to how hard Google tried to sell Andrew Ng's "breakthrough" 2012 experiment with tons of PR, the paper is very weak and can't be reproduced unless you do a healthy amount of hand-waving. For example, to get an unsupervised cat, you have to initialize your image close to a cat and do gradient descent with respect to the input. Otherwise, you don't get a cat... It is not even considered a good paper, let alone a breakthrough. Also, what those 16,000 CPU cores did can be reproduced with a few 2012-class GPUs in a much smaller time span than their training time.
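To be concrete about what "gradient descent wrt the input" means here, a minimal sketch of that step, assuming a PyTorch-style model whose output is a vector of neuron activations (the model and the "cat neuron" index are placeholders, not the paper's actual setup):

    import torch

    def maximize_activation(model, neuron_index, init_image, steps=200, lr=0.1):
        # Start from a given image (the complaint above: it has to start
        # near a cat already) and ascend the gradient of one neuron's
        # activation with respect to the pixels.
        image = init_image.detach().clone().requires_grad_(True)
        optimizer = torch.optim.SGD([image], lr=lr)
        for _ in range(steps):
            optimizer.zero_grad()
            activation = model(image.unsqueeze(0))[0, neuron_index]
            (-activation).backward()    # minimizing the negative = ascent
            optimizer.step()
            with torch.no_grad():
                image.clamp_(0.0, 1.0)  # keep pixel values in a valid range
        return image.detach()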

The next slide after the 2012 "breakthrough", the one showing the JavaScript neural network, is -- contrary to what it looks like -- not TensorFlow either. It has cleverly and conveniently been given TensorFlow branding, so most people confuse the two, but it's just a separate JavaScript library akin to convnet.js.

Since your page gets a ton of hits, it's at least worth publishing a correction for these GLARING inaccuracies.


Your criticism is unreasonable. The "cat face" 2012 paper is an excellent paper, and is a breakthrough.

1. They demonstrated a way to detect high-level features with unsupervised learning, for the first time. That was the main stated goal of the paper, and they achieved it magnificently.

2. They devised a new type of autoencoder, which achieved significantly higher accuracy than other methods.

3. They improved the state of the art for 22k-class ImageNet classification by 70% (compared to a 15% improvement for 1k-class ImageNet in Krizhevsky's paper).

4. They managed to scale their model 100x compared to the largest models of the time -- not a trivial task.

You say "it can't be reproduced" and then "can be reproduced" in the same paragraph! :-)

Regarding initializing an input image close to a cat to "get a cat", I think you missed the point of that step - it was just an additional way to verify that the neuron is really detecting a cat. That step was completely optional. The main way to verify their achievement was the histogram showing how the neuron reacts to images with cats in them, and how it reacts to all other images. That histogram is the heart of the paper, not the artificially constructed image of a cat face.
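For what it's worth, a minimal sketch of that kind of histogram check, assuming a model that returns a vector of per-neuron activations for a single image (the function name and image sets here are illustrative, not the paper's actual setup):

    import numpy as np

    def activation_histograms(model, neuron_index, cat_images, other_images, bins=50):
        # Collect one neuron's responses to images with and without cats.
        cat_acts = np.array([model(img)[neuron_index] for img in cat_images])
        other_acts = np.array([model(img)[neuron_index] for img in other_images])
        lo = float(min(cat_acts.min(), other_acts.min()))
        hi = float(max(cat_acts.max(), other_acts.max()))
        cat_hist, edges = np.histogram(cat_acts, bins=bins, range=(lo, hi))
        other_hist, _ = np.histogram(other_acts, bins=bins, range=(lo, hi))
        # Well-separated histograms mean the neuron fires selectively on cats.
        return cat_hist, other_hist, edges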


The only objective response I'll give to this comment is the number of citations (as pointed out by another user): https://news.ycombinator.com/item?id=11885161

It's not perfect, but I can't give a reasonable answer to the other extreme of opinion. FWIW, as a researcher I spent quite some time on this paper, but that subjective point doesn't mean anything to you.


I've read hundreds of ML papers, and this one is better than 90% of them, maybe even 99%. It's one of the best papers published in 2012. The authors are some of the best minds in the field. Statements like "It is not even considered a good paper" or "very weak" need some explanation, to say the least.

The only negative thing I can say about that paper is that they have not open-sourced their code.


I understand, but singling out that one experiment and its lead researcher as "the" big breakthrough was -- and is -- insulting to the hard work of a long list of others who go unmentioned. The worst part about this is that I can imagine, say, journalists with deadlines relying on your video as an authoritative source of historical information.


Yeah. Between LeCun and the Google experiment, a lot of things happened.

Not to mention the use of ReLUs and the understanding of the vanishing gradient problem.



