I mean, what is the difference between using, like, 50 layers versus 75 layers? OK, yeah, we don't know, but to presume that we don't understand anything beyond 3-4 neurons is just nonsense.
By comparison, go look at any synthetic chemistry paper. There are going to be crazy procedures. One time Phil Baran used xenon tetrafluoride as an oxidizing agent. And yeah, for some reason none of the other standard oxidizing agents worked. We don't know why. Hell, it could have been that the grad student had bad hands with one of the standard ones. We can make a SWAG as to why XeF4 worked. But we'd be fooling ourselves to claim "we know", so we don't say we do. Does that discredit synthetic chemistry as an enterprise? Does that mean synthetic chemistry is not backed by theory?
That video is an introduction to the concept of CNNs for beginners. It explains in informal terms the structure and mechanics of a CNN, and probably gets around to discussing why it might work in very broad terms.
This is extremely far from what academically inclined people are talking about when they are using the word "theory".
People are hoping to see, ultimately, mathematical proofs that neural networks can e.g. efficiently learn target functions given a certain amount of training data and a certain number of trainable parameters.
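To give a flavor of the kind of statement meant here, this is the standard textbook PAC bound for a finite hypothesis class in the realizable case. It is only an illustration of the form such guarantees take, not a result about neural networks; the symbols ($\mathcal{H}$ for the hypothesis class, $m$ for the number of i.i.d. training samples, $\epsilon$ for the error tolerance, $\delta$ for the failure probability) are introduced here for the example.

```latex
% With probability at least 1 - \delta over the draw of m i.i.d. samples,
% every h in \mathcal{H} that is consistent with the training data
% has true error at most \epsilon, provided that
\[
  m \;\ge\; \frac{1}{\epsilon}\left( \ln\lvert\mathcal{H}\rvert + \ln\frac{1}{\delta} \right).
\]
```

The open question is whether comparably sharp statements, tying sample count and parameter count to guaranteed error, can be proved for the networks people actually train.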
To address your chemistry analogy: no, it doesn't discredit synthetic chemistry as an enterprise, it just means there is a gap in the theory. Similarly, the people bemoaning gaps in the deep learning theory aren't trying to discredit deep learning as an enterprise: the practical successes are very real; the theory is just lagging, as it often has in application-led fields.
> It explains in informal terms the structure and mechanics of a CNN
It actually gets quite formal. You could implement a (non-performant) CNN from everything in that video. The point is that someone coming up with the hypothesis "hey, imposing this structure should yield faster convergence to usable results" is totally inductive science, and the fact that it can be explained well from simple primitives that you don't need a math degree to understand really indicates some level of maturity in the field.
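To make "non-performant CNN from simple primitives" concrete, here is a minimal sketch of one forward pass in plain Python. It assumes the usual ingredients (a sliding-window convolution, a ReLU, and 2x2 max pooling); whether the video covers exactly these is an assumption, the function names `conv2d`, `relu`, and `max_pool2x2` are made up for illustration, and there is no training step.

```python
def conv2d(image, kernel):
    """'Valid' sliding-window convolution (cross-correlation, as in most
    deep learning libraries) of a 2D image with a 2D kernel."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            # Weighted sum over the patch currently under the kernel.
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def relu(fmap):
    """Elementwise max(0, x) over a 2D feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling; a trailing odd row/column is dropped."""
    h = len(fmap) // 2 * 2
    w = len(fmap[0]) // 2 * 2
    return [
        [
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


# A 5x5 image with a vertical edge between columns 1 and 2,
# and a hand-picked 2x2 kernel that responds to left-to-right increases.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]

features = max_pool2x2(relu(conv2d(image, kernel)))
print(features)  # [[2.0, 0.0], [2.0, 0.0]]
```

The same small kernel is reused at every position, which is the structural assumption in question: far fewer weights than a fully connected layer over the same image, and a built-in bias toward local, position-independent features.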
>People are hoping to see, ultimately, mathematical proofs that neural networks can e.g. efficiently learn target functions given a certain amount of training data and a certain number of trainable parameters.
Yeah, I don't dispute that, but to characterize our understanding of machine learning as not going beyond 3 or 4 neurons is just plain crazy.
Moreover, for some RNNs, e.g. character-based LSTMs trained as language models, we can extract feature details. I recall one case (sorry, not bothered to find it) where there was a memory cell dedicated to detecting opening and closing quotation marks. Now, this is a feature that one would expect to be obvious, or at least more so than other language features, but it's still a rather high-level (and successful) attempt to understand how these machines function.
I'm sorry, but LSTMs and convnets would seem to represent a rather high level of predictive power: someone postulated that they would work, and they turned out to be correct.