
I already do that for reproducibility reasons, but I don't think it takes luck out of the equation. 42 may be a great seed for a model with 400 cells per layer and a terrible seed for a model with 600 cells per layer, as the different layout will lead to a totally different distribution of weights even though the seed stays the same.
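To make that concrete, here is a minimal NumPy sketch of a hypothetical two-layer MLP (the He-style scaling, dimensions, and function name are just illustrative assumptions): changing the hidden width both shifts where the second layer starts in the random stream and changes its fan-in scaling, so the same seed gives unrelated weights.

    import numpy as np

    def init_mlp(hidden, in_dim=784, out_dim=10, seed=42):
        rng = np.random.default_rng(seed)
        # He-style scaling: the std of layer 2 depends on `hidden`, and
        # layer 2 also starts at a different position in the random stream
        w1 = rng.normal(0.0, np.sqrt(2.0 / in_dim), size=(in_dim, hidden))
        w2 = rng.normal(0.0, np.sqrt(2.0 / hidden), size=(hidden, out_dim))
        return w1, w2

    w1_a, w2_a = init_mlp(hidden=400)
    w1_b, w2_b = init_mlp(hidden=600)
    print(np.allclose(w2_a, w2_b[:400]))  # False: same seed, unrelated layer-2 weights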



Indeed, but if performance is affected that much by the initialisation, then I would avoid random initialisation in the first place. There are various publications exploring different initialisation methods for different kinds of problems.

I'm afraid that I cannot go into more specific details right now, but you can get more stable training and faster convergence with a better initialisation strategy.
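As one generic illustration (not necessarily the strategy I mean): orthogonal initialisation (Saxe et al., 2014) keeps only the orthonormal factor of a random draw, so every seed yields weights with identical singular values and the luck of the draw matters less. A rough NumPy sketch:

    import numpy as np

    def orthogonal_init(fan_in, fan_out, gain=1.0, seed=0):
        # Draw a random matrix, then keep only its orthonormal factor (QR),
        # so different seeds all give weights with the same singular values
        rng = np.random.default_rng(seed)
        a = rng.normal(size=(fan_in, fan_out))
        q, r = np.linalg.qr(a)
        # Fix the sign ambiguity of QR so the result is uniformly distributed
        q *= np.sign(np.diag(r))
        return gain * q

    w = orthogonal_init(512, 512)
    print(np.allclose(w.T @ w, np.eye(512), atol=1e-6))  # True: orthonormal columns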


Couldn't you use an initialization pattern that includes all the weights of the smaller layer in the larger layer? This would keep the behavior of a subset of the units exactly the same, at least at initialization time.
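Something along these lines, perhaps (a NumPy sketch; the maximum width, dimensions, and function name are made up for illustration): draw the weights for the largest layer you might ever use from a fixed seed and slice, so the smaller layer's weights are literally a subset of the larger one's.

    import numpy as np

    def nested_init(width, max_width=1024, in_dim=784, seed=42):
        # Generate weights for the largest layer you might ever use, then
        # slice: every narrower model reuses exactly the same leading columns
        rng = np.random.default_rng(seed)
        full = rng.normal(0.0, np.sqrt(2.0 / in_dim), size=(in_dim, max_width))
        return full[:, :width]

    w_400 = nested_init(400)
    w_600 = nested_init(600)
    print(np.allclose(w_400, w_600[:, :400]))  # True: the smaller layer is embedded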



