
This is very nice! I think the reason the swiss roll doesn't work as easily might be initialization. In two dimensions you have to be very careful with how you initialize the weights and biases, because small networks get stuck in bad local minima more easily.
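For reference, I'm assuming the "swiss roll" here is the usual interleaved-spiral 2-D toy dataset; a rough sketch of generating something like it (names and parameters are my own, not from the demo):

    import numpy as np

    def make_spiral(n_per_class=200, noise=0.1, seed=0):
        # Two interleaved spirals, a common "swiss roll" style 2-D toy problem.
        rng = np.random.default_rng(seed)
        X, y = [], []
        for label in (0, 1):
            t = np.linspace(0.0, 3 * np.pi, n_per_class)
            r = t / (3 * np.pi)            # radius grows with the angle
            angle = t + label * np.pi      # offset the second spiral by half a turn
            x1 = r * np.cos(angle) + noise * rng.standard_normal(n_per_class)
            x2 = r * np.sin(angle) + noise * rng.standard_normal(n_per_class)
            X.append(np.stack([x1, x2], axis=1))
            y.append(np.full(n_per_class, label))
        return np.concatenate(X), np.concatenate(y)

    X, y = make_spiral()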



In this case you can see that it is the swiss roll, so you could just say "pick a proper initialization".

But that approach would not work when you cannot see that the data is a "swiss roll", or when it is in higher dimensions.


I'm pretty sure he wasn't talking about the swiss roll specifically. Big gains in neural net performance have come from better initialization schemes, not dataset-specific ones but general ones: for example, a scheme might adapt the initial weight distribution depending on the number of hidden units in the next layer. And smaller models are in general more sensitive to initialization.
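One well-known example of that kind of scheme is Glorot/Xavier initialization, which scales the weight range by the layer's fan-in and fan-out; a minimal sketch (just an illustration of the idea, the comment doesn't name a specific scheme):

    import numpy as np

    def glorot_uniform(fan_in, fan_out, seed=0):
        # Scale the uniform range by fan-in and fan-out so activation
        # variance stays roughly constant from layer to layer.
        rng = np.random.default_rng(seed)
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return rng.uniform(-limit, limit, size=(fan_in, fan_out))

    # e.g. the first layer of a small 2-D toy network: 2 inputs -> 8 hidden units
    W1 = glorot_uniform(2, 8)
    b1 = np.zeros(8)  # biases are typically just set to zero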



