If you are averaging weights often enough, then it's basically the same as averaging gradients. If you average the weights of a bunch of independently-trained models, you're going to have a rough time. Even if two networks compute the exact same function, their hidden units (the rows and columns of the intermediate weight matrices) can be permuted differently, and that will totally ruin your averaging strategy.
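A quick way to see the permutation problem (a toy NumPy sketch, not anyone's actual training setup): permute the hidden units of a tiny MLP so both copies compute the same function, then naively average them and watch the output change.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 2-layer MLP: y = W2 @ relu(W1 @ x + b1)
W1 = rng.normal(size=(4, 3))
b1 = rng.normal(size=4)
W2 = rng.normal(size=(2, 4))

def forward(W1, b1, W2, x):
    return W2 @ np.maximum(W1 @ x + b1, 0.0)

# Permute the hidden units: identical function, different weight layout.
perm = np.array([1, 2, 3, 0])
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

x = rng.normal(size=3)
print(np.allclose(forward(W1, b1, W2, x), forward(W1p, b1p, W2p, x)))  # True

# Naively average the two "equivalent" models: the function changes.
Wa1, ba1, Wa2 = (W1 + W1p) / 2, (b1 + b1p) / 2, (W2 + W2p) / 2
print(np.allclose(forward(W1, b1, W2, x), forward(Wa1, ba1, Wa2, x)))  # False
```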
We average the weights themselves, and the efficiency seems to be similar to gradient gathering.
It’s also averaging in slices, not the full model. There’s never a full resync.
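I don't know the exact scheme in use, but here's a minimal sketch of what slice-wise averaging could look like: each sync round, the workers average only one shard of the flattened parameter vector, so no single round ever touches the whole model.

```python
import numpy as np

N_WORKERS, N_PARAMS, N_SHARDS = 4, 12, 3  # hypothetical sizes
rng = np.random.default_rng(1)

# Each worker holds its own (slightly diverged) copy of the parameters.
params = [rng.normal(size=N_PARAMS) for _ in range(N_WORKERS)]
shards = np.array_split(np.arange(N_PARAMS), N_SHARDS)

def sync_step(params, round_idx):
    """Average only one shard of the parameters this round."""
    idx = shards[round_idx % N_SHARDS]
    mean_slice = np.mean([p[idx] for p in params], axis=0)
    for p in params:
        p[idx] = mean_slice  # only this slice gets resynced

for r in range(6):
    # ... local training would run here, letting the copies diverge ...
    sync_step(params, r)
```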
SWA (stochastic weight averaging) is the theoretical basis for why it works, I think.
Another way of thinking about it: if every replica starts from the same weights, averaging the weights after a step is the same as taking a step with the averaged gradient, so if the gradients can be averaged, then so can the weights.
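As a sanity check of that intuition (assuming plain SGD and identical starting weights on every replica; with momentum or Adam it's only approximate), averaging the weights after one step is literally the same as stepping with the averaged gradient:

```python
import numpy as np

rng = np.random.default_rng(2)
w = rng.normal(size=5)           # shared starting weights
grads = rng.normal(size=(3, 5))  # one gradient per replica
lr = 0.1

# Option A: each replica steps locally, then the weights are averaged.
avg_of_weights = np.mean([w - lr * g for g in grads], axis=0)

# Option B: the gradients are averaged first, then one shared step is taken.
step_with_avg_grad = w - lr * grads.mean(axis=0)

print(np.allclose(avg_of_weights, step_with_avg_grad))  # True
```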