
I already do that for reproducibility reasons, but I don't think it takes luck out of the equation. 42 may be a great seed for a model with 400 cells per layer and a terrible seed for a model with 600 cells per layer, as the different layout will lead to a totally different distribution of weights even though the seed stays the same.
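To make that concrete, here is a minimal NumPy sketch of a hypothetical two-layer MLP (the He-style scaling, dimensions, and function name are just illustrative assumptions): changing the hidden width both shifts where the second layer starts in the random stream and changes its fan-in scaling, so the same seed gives unrelated weights.

    import numpy as np

    def init_mlp(hidden, in_dim=784, out_dim=10, seed=42):
        rng = np.random.default_rng(seed)
        # He-style scaling: the std of layer 2 depends on `hidden`, and
        # layer 2 also starts at a different position in the random stream
        w1 = rng.normal(0.0, np.sqrt(2.0 / in_dim), size=(in_dim, hidden))
        w2 = rng.normal(0.0, np.sqrt(2.0 / hidden), size=(hidden, out_dim))
        return w1, w2

    w1_a, w2_a = init_mlp(hidden=400)
    w1_b, w2_b = init_mlp(hidden=600)
    print(np.allclose(w2_a, w2_b[:400]))  # False: same seed, unrelated layer-2 weights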



Indeed, but if performance is affected that much by the initialisation, then I would avoid random initialisation in the first place. There are various publications exploring different initialisation methods for different kinds of problems.

I'm afraid that I cannot go into more specific details right now, but you can get more stable training and faster convergence with a better initialisation strategy.
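As one generic illustration (not necessarily the strategy I mean): orthogonal initialisation (Saxe et al., 2014) keeps only the orthonormal factor of a random draw, so every seed yields weights with identical singular values and the luck of the draw matters less. A rough NumPy sketch:

    import numpy as np

    def orthogonal_init(fan_in, fan_out, gain=1.0, seed=0):
        # Draw a random matrix, then keep only its orthonormal factor (QR),
        # so different seeds all give weights with the same singular values
        rng = np.random.default_rng(seed)
        a = rng.normal(size=(fan_in, fan_out))
        q, r = np.linalg.qr(a)
        # Fix the sign ambiguity of QR so the result is uniformly distributed
        q *= np.sign(np.diag(r))
        return gain * q

    w = orthogonal_init(512, 512)
    print(np.allclose(w.T @ w, np.eye(512), atol=1e-6))  # True: orthonormal columns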


Couldn't you use an initialization pattern that includes all the weights of the smaller layer in the larger layer? This would keep the behavior of a subset of the units exactly the same, at least at initialization time.
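Something along these lines, perhaps (a NumPy sketch; the maximum width, dimensions, and function name are made up for illustration): draw the weights for the largest layer you might ever use from a fixed seed and slice, so the smaller layer's weights are literally a subset of the larger one's.

    import numpy as np

    def nested_init(width, max_width=1024, in_dim=784, seed=42):
        # Generate weights for the largest layer you might ever use, then
        # slice: every narrower model reuses exactly the same leading columns
        rng = np.random.default_rng(seed)
        full = rng.normal(0.0, np.sqrt(2.0 / in_dim), size=(in_dim, max_width))
        return full[:, :width]

    w_400 = nested_init(400)
    w_600 = nested_init(600)
    print(np.allclose(w_400, w_600[:, :400]))  # True: the smaller layer is embedded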



