This is very nice! I think the reason the swiss roll doesn't work as easily might be initialization. In two dimensions you have to be very careful with how you initialize the weights and biases, because small networks get stuck in bad local minima more easily.
I'm pretty sure he wasn't talking about the swiss roll specifically. Big gains in neural net performance have come from better initialization schemes (not dataset-specific, just in general; e.g. an initialization scheme might scale the initial weight distribution depending on the number of hidden units in the next layer), and smaller models are in general more sensitive to initialization.
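For a concrete sketch of the kind of scheme I mean (this is just an illustrative Glorot/Xavier-style example, not anything from the original post): the scale of each layer's initial weights is tied to how many units it connects, so wider layers get smaller initial weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(fan_in, fan_out):
    """Glorot/Xavier-style uniform init: the weight scale shrinks as the
    layer gets wider, which keeps activation variance roughly stable."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    W = rng.uniform(-limit, limit, size=(fan_in, fan_out))
    b = np.zeros(fan_out)  # biases usually start at zero
    return W, b

# e.g. a tiny 2-d network like the ones people fit to the swiss roll:
W1, b1 = init_layer(2, 16)   # input -> hidden
W2, b2 = init_layer(16, 1)   # hidden -> output
```

With a tiny 2-input network the fan-in/fan-out counts are small, so a naive fixed-scale init can easily land it in a bad region of the loss surface, which is why these schemes matter more for small models.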