
Yes, in order to generalize better you need deeper nets. That was my whole point. But how deep? And what are the parameters of each layer? Grad students just pull those numbers from intuition. And obviously an infinitely deep net (whatever that means) would not generalize on little data, and would get even harder to train the deeper it gets. If "infinitely deep" means what I think it means, you're basically claiming that recurrent neural nets can easily represent anything; but RNNs exist today, and they don't do the magic you're claiming they do.
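For concreteness, here's roughly what that hand-tuning looks like (a sketch in Python/PyTorch; every number in it is an arbitrary choice, which is exactly the point):

    import torch.nn as nn

    # Hypothetical example: depth, layer widths, and output size are all
    # hand-picked constants with no theory telling you what they should be.
    widths = [512, 256, 128, 64]   # why these? intuition / trial and error
    in_dim, out_dim = 784, 10      # assumed input/output sizes for the example

    layers = []
    for w in widths:
        layers += [nn.Linear(in_dim, w), nn.ReLU()]
        in_dim = w
    layers.append(nn.Linear(in_dim, out_dim))
    model = nn.Sequential(*layers)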

The forward pass of a net is not theoretically interesting. It's the training of the net that has no theory. The training has nothing to do with digital circuits.
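To make the contrast concrete, here's a minimal numpy sketch (shapes and values are mine, purely illustrative): the forward pass is just composed affine maps and a nonlinearity, while training is gradient descent on a non-convex loss, and that's the part without a satisfying theory.

    import numpy as np

    rng = np.random.default_rng(0)
    # A tiny two-layer net; sizes are arbitrary choices for the example.
    W1, b1 = 0.01 * rng.standard_normal((64, 784)), np.zeros(64)
    W2, b2 = 0.01 * rng.standard_normal((10, 64)), np.zeros(10)

    def forward(x):
        # Forward pass: just function composition, nothing mysterious.
        z1 = W1 @ x + b1
        h = np.maximum(0.0, z1)        # ReLU
        return z1, h, W2 @ h + b2

    def sgd_step(x, y, lr=0.01):
        # Training step: gradient descent on a non-convex loss. Each step is
        # simple arithmetic, but there is little theory about when the loop
        # as a whole finds weights that generalize.
        global W1, b1, W2, b2
        z1, h, out = forward(x)
        d_out = out - y                     # grad of 0.5 * ||out - y||^2
        dW2, db2 = np.outer(d_out, h), d_out
        dz1 = (W2.T @ d_out) * (z1 > 0)     # backprop through the ReLU
        dW1, db1 = np.outer(dz1, x), dz1
        W2 -= lr * dW2; b2 -= lr * db2
        W1 -= lr * dW1; b1 -= lr * db1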

You've handwaved some (perfectly fine) ideas about composing functions and such, and then claimed "it isn't some strange mystery." That's my point. You've argued some ideas from intuition. There is little theoretical rigor around this, however.




>an infinitely deep net (whatever that means) would not generalize on little data, and would get even harder to train the deeper it gets.

Not with proper priors/regularization.
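For example (a minimal sketch in Python/PyTorch; depth, widths, and the decay value are arbitrary): the "prior" typically enters as L2 weight decay, which corresponds to a Gaussian prior on the weights, alongside things like dropout and early stopping.

    import torch
    import torch.nn as nn

    # A deliberately deep, over-parameterized net; all sizes are arbitrary.
    depth, width = 20, 64
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, 10))
    model = nn.Sequential(*layers)

    # Regularization keeps it from memorizing a small dataset: here an L2
    # penalty (weight_decay), i.e. a Gaussian prior on the weights.
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)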

>You've argued some ideas from intuition. There is little theoretical rigor around this, however.

There's this paper which goes more into theoretical depth on the idea: http://arxiv.org/abs/1412.0233


It goes into theoretical detail to show one thing: that local optima are close to the global optimum. It does not prove anything else.



