As a computer vision researcher, I'm not at all convinced that deep learning methods will be "final" in any sense. I know that in the past, neural networks were "final", and then graphical models were "final", and so on.
And while deep learning methods have indeed shown remarkable improvements recently, they're not yet state-of-the-art on the most important/relevant computer vision benchmarks.
As a computer vision researcher, it must
pain you to see that everything you've learned is
for naught when faced with deep learning
methods that get amazing performance
from raw pixels (see the MNIST results, for
example). Also see Ronan Collobert's "Natural
Language Processing (Almost) from Scratch" paper,
which handily beats the past few decades of NLP
research in parsing (in terms of efficiency,
and probably performance soon too). Or see
the Microsoft Research speech recognition
work, which has beaten all previous results by
a significant margin using deep learning.
Not at all! I'd love for vision to be solved, no matter what the method. I'm more than happy to move on to another field if that's the case.
But I don't think it is. MNIST data is not particularly challenging. It's great that deep learning methods work there -- they must be doing something right.
Getting the best results on the harder
vision challenges is simply a matter
of letting the computers run long enough.
Collobert's work, for example, took
3 months of training. I don't see why
vision challenges should be any different.
Perhaps the vision researchers, who far
outnumber the people in the few deep
learning groups, should try it.