Oh yeah, our deep learning is a super cool technology. No need to study math, believe us: just buy more GPUs and they will help you, throw the data on the fan. Or, even better, give us your data and we'll decide what to do with it for you. Upload your data to our cloud and relax.
Consider the following question: how many layers does a ReLU network need to approximate the function x^2 on [0,1] with 1e-6 accuracy?
Exactly, this simple example demonstrates how hard it is to learn even a simple function. When I played with your code, 10 hidden units gave an error of 0.6 after 1000 iterations, and 1000 hidden units gave an error of 0.02 after 10000 iterations, so the bigger network takes way longer to train. This is not an easy technology.
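For comparison, the x^2 question has a hand-built answer that involves no training at all. Below is a minimal sketch, assuming the sawtooth construction usually credited to Yarotsky; it is not the code discussed above, and the names `tooth` and `approx_square` are made up for illustration. Composing a small ReLU "hat" function m times gives a piecewise-linear interpolant of x^2 whose worst-case error on [0,1] is 2^(-2m-2), so m = 9 compositions should be enough for 1e-6.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def tooth(x):
    # Hat function on [0, 1] written with ReLU units:
    # 2x on [0, 1/2] and 2(1 - x) on [1/2, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)

def approx_square(x, m):
    # f_m(x) = x - sum_{s=1..m} g_s(x) / 4**s,
    # where g_s is the hat function composed with itself s times.
    out = np.array(x, dtype=float)
    g = np.array(x, dtype=float)
    for s in range(1, m + 1):
        g = tooth(g)            # one more ReLU layer
        out = out - g / 4.0**s  # accumulated by a linear skip connection
    return out

xs = np.linspace(0.0, 1.0, 100001)
for m in (3, 6, 9):
    err = np.max(np.abs(approx_square(xs, m) - xs**2))
    # Print the measured max error next to the theoretical bound 2^(-2m-2).
    print(m, err, 2.0 ** (-2 * m - 2))
```

So a network of this shape can represent x^2 to 1e-6 with only a handful of units per layer. That says nothing about whether gradient descent would find those weights, which is the hard part described above.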