I already do that for reproducibility reasons, but I don't really think it takes luck out of the equation. 42 may be a great seed for a model with 400 cells per layer and a terrible seed for a model with 600 cells per layer, as the different layout will lead to totally different weight values even if the seed remains the same.
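To illustrate, here is a minimal NumPy sketch (the layer sizes are hypothetical): with the same seed, the wider layer does not simply contain the narrower layer's weights, because the random stream is laid out differently for the different shape.

```python
import numpy as np

def init_layer(n_in, n_units, seed=42):
    """Draw a dense layer's weights from a fresh generator seeded with `seed`."""
    rng = np.random.default_rng(seed)
    return rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_units))

w_small = init_layer(n_in=100, n_units=400)  # 400 cells per layer
w_large = init_layer(n_in=100, n_units=600)  # 600 cells per layer

# The first 400 units of the wider layer do NOT get the narrower layer's
# weights: the same random stream fills a differently shaped matrix.
print(np.allclose(w_small, w_large[:, :400]))  # False
```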
Indeed, but if performance is affected so much by the initialisation, then I would avoid random initialisation in the first place. There are several publications exploring different initialisation methods for various problems.
I'm afraid that I cannot go into more specific details right now, but you can get more stable training and faster convergence with a better initialisation strategy.
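One well-known example from that literature is Glorot/Xavier initialisation (Glorot & Bengio, 2010), which scales the weight range by the layer's fan-in and fan-out. A minimal NumPy sketch, with hypothetical sizes:

```python
import numpy as np

def glorot_uniform(n_in, n_out, seed=0):
    """Glorot/Xavier uniform initialisation:
    W ~ U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

w = glorot_uniform(100, 400)
# Empirical std should be close to sqrt(2 / (fan_in + fan_out)).
print(w.std(), np.sqrt(2.0 / (100 + 400)))
```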
Couldn't you use an initialization pattern that includes all the weights of the smaller layer in the larger layer? This would keep the behavior of a subset of units exactly the same, at least at initialization time.
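A minimal NumPy sketch of that nested pattern (sizes hypothetical): draw the smaller layer first, then append freshly drawn columns for the extra units, so the shared units start from exactly the same weights.

```python
import numpy as np

def nested_init(n_in, n_small, n_large, seed=42):
    """Initialise a wide layer so that its first n_small units reuse exactly
    the weights the narrower layer would have received."""
    rng = np.random.default_rng(seed)
    w_small = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_small))
    w_extra = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_large - n_small))
    return w_small, np.concatenate([w_small, w_extra], axis=1)

w_small, w_large = nested_init(n_in=100, n_small=400, n_large=600)
print(np.allclose(w_small, w_large[:, :400]))  # True: the subset is preserved
```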