> by some miracle you do, you just use a subgradient This is the most succinct c...

fspeech · on Nov 4, 2019

I don't know why you think subgradient is that important. It's just a shorthand for anything reasonable. DNNs are overwhelmingly underdetermined and have many many minimizers. It's not so important to find the best one (an impossible task for sgd) as to find one that is good enough.

wenc · on Nov 4, 2019

> I don't know why you think subgradient is that important.

I underquoted. It's more the approach to handling of nondifferentiability in deep learning problems that is of interest to me, whether it involves subgradients or some other recovery approach.

These approaches typically do not work well in general nonlinear systems, but they seem to be ok in deep learning problems. I haven't read any attempts to explain this until I read parent comment.

> It's just a shorthand for anything reasonable. DNNs are overwhelmingly underdetermined and have many many minimizers.

This is not true for general nonlinear systems, hence my interest.