The Neural Networks class on Coursera covered a lot of the same topics, with both heavy mathematical theory and a hefty amount of practical application. https://www.coursera.org/course/neuralnets
>You have to realize that our theoretical tools are very weak. Sometimes, we have good mathematical intuitions for why a particular technique should work. Sometimes our intuition ends up being wrong [...] The questions become: how well does my method work on this particular problem, and how large is the set of problems on which it works well.
I'm not very familiar with this field. Has anyone made any progress on formalizing ways to measure the capabilities of intelligent systems? If the theory is weak, there must be someone working on improving it, right?
But since that's a $55M black hole with no published results other than a mostly meaningless claim to have solved CAPTCHA (which wasn't all that tough a task to begin with), there's no way to tell; it doesn't seem like practitioners of the art are the ones evaluating his prospects for further funding. But don't believe some random dude on HN, here's Yann LeCun saying pretty much the same thing:
Hey Michael, I loved your book on Quantum Computing, but don't get me started on D-Wave, or as I see it: $15M for a huge magic box that might be faster than a $15,000 GPU cluster on some problems.
But seriously, the book rocked, and this one's coming along nicely.
This paper demonstrates something called "zero-shot learning", where you can infer the correct label of an unseen image based on similarity among representations learned in a separate NLP task.
For instance, it can label an image "tiger" even if it has never seen tigers but has only learned about the word (and inferred its relation to "cat", an image class it has seen) from reading text.
It's not intelligent, not even close. But it's an awfully strange emergent phenomenon these systems are demonstrating. Exciting stuff, I think.
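A toy sketch of the idea (mine, not the paper's code; the 3-d word vectors are made up, whereas in the paper they're learned from text alone):

    # Zero-shot labeling: map an image into a word-embedding space and
    # pick the label whose word vector is nearest.
    import numpy as np

    word_vecs = {
        "cat":   np.array([0.9, 0.1, 0.0]),
        "tiger": np.array([0.8, 0.2, 0.1]),  # no tiger images ever seen
        "truck": np.array([0.0, 0.1, 0.9]),
    }

    def zero_shot_label(image_embedding):
        # Choose the label whose word vector is most cosine-similar to
        # the image's predicted embedding.
        def cos(u, v):
            return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
        return max(word_vecs, key=lambda w: cos(word_vecs[w], image_embedding))

    # An image of a tiger, mapped into word space by a model trained only
    # on seen classes (cats, trucks, ...), still lands nearest "tiger".
    print(zero_shot_label(np.array([0.82, 0.18, 0.08])))  # -> tiger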
It seems like the implicit target in the document is to achieve a critically damped system with no ringdown on learning. However, if they're trying to go for speed, then it seems like they should accept possible overshoot, and use non-linear control theory for their weights so that they're underdamped during the initial descent, and then transition into critically damped gradient descent as they move into the flat zone. Something like a variable "damper" or weights/springs based on current error. Perhaps that is done elsewhere though, and just not described as a technique here.
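To make the idea concrete, here's a toy sketch (mine, not from the document; the damping schedule is an arbitrary choice):

    # Momentum gradient descent on a 1-d quadratic, with a "damper" that
    # stiffens as the error shrinks: underdamped (fast, possibly
    # overshooting) early on, closer to critically damped near the minimum.
    def loss(w):
        return 0.5 * (w - 3.0) ** 2

    def grad(w):
        return w - 3.0

    w, v, lr = -5.0, 0.0, 0.1   # weight, velocity, learning rate
    for step in range(200):
        err = loss(w)
        # High momentum (low friction) while the error is large; more
        # friction as we enter the flat zone: mu slides from ~0.95 to 0.5.
        mu = 0.5 + 0.45 * err / (err + 1.0)
        v = mu * v - lr * grad(w)
        w += v
    print(w)  # converges to ~3.0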
I found the statement about the cross-entropy to be untrue. When y == a, the function is non-monotonic, with 0 at the extremes but not in the middle. So the "proof" shown is confusing to me.
The statement "if the neuron's actual output is close to the desired output, i.e., y=y(x) for all training inputs x, then the cross-entropy will be close to zero"
is not true. The function peaks in the middle (~0.7).
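To spell out that case (my own quick check, using the chapter's cross-entropy for a single neuron): setting y = a gives the binary entropy function

    C(a) = -[ a \ln a + (1 - a) \ln(1 - a) ],

which is 0 at a = 0 and a = 1 and peaks at a = 1/2 with value \ln 2 \approx 0.693, i.e. the ~0.7 above.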
This is addressed in the marginal note attached to the sentence you quoted.
The essential point is that we're considering classification problems, for which the output is intended to be 0 or 1. I address the more general case of regression problems (where y may take any value) in a later exercise.
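In symbols: for a classification target y \in \{0, 1\}, the cross-entropy reduces to

    C = -\ln(1 - a)  when y = 0,    C = -\ln a  when y = 1,

and both go to 0 as the output a approaches the desired value.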
The sections on Regularization and Dropout have some amazing prose. I haven't read any of the other chapters, but just skimming through those sections has helped enlighten me on quite a few things that have confused me for years in completely different fields... such as why a random forest made up of randomly selected simple CARTs generally predicts better than a single complex CART, or why fitting a distribution to empirical data can benefit from using AIC or BIC methods.
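A quick sketch of the random-forest point (mine, on arbitrary synthetic data, not from the book):

    # A single fully grown CART versus a forest of shallow CARTs,
    # showing the variance-averaging effect alluded to above.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, n_features=20,
                               n_informative=8, random_state=0)

    # One deep tree: low bias, high variance.
    single_cart = DecisionTreeClassifier(max_depth=None, random_state=0)

    # Many shallow trees on bootstrap samples with random feature subsets:
    # each is weak, but averaging cancels much of the variance, the same
    # intuition behind dropout averaging over many sub-networks.
    forest = RandomForestClassifier(n_estimators=200, max_depth=4,
                                    random_state=0)

    print("single CART:", cross_val_score(single_cart, X, y, cv=5).mean())
    print("forest     :", cross_val_score(forest, X, y, cv=5).mean())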