I always thought the preferred method was to average the gradients computed by the workers and apply that average as one update to the single mother-model.
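To make that concrete, here is a minimal sketch of what I mean by averaging gradients, assuming a toy one-parameter model `y = w * x` and two workers that each hold their own data shard. The names `worker_gradient`, `average_gradients`, and `train`, as well as the data and learning rate, are made up for illustration and are not taken from any particular framework.

```python
# Synchronous gradient averaging on a toy model y = w * x.
# All names and numbers here are hypothetical, for illustration only.

def worker_gradient(w, shard):
    """Gradient of the mean squared error for y = w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def average_gradients(grads):
    """Combine the per-worker gradients into a single averaged gradient."""
    return sum(grads) / len(grads)

def train(shards, w=0.0, lr=0.01, steps=200):
    """Update one shared parameter using the averaged gradient at every step."""
    for _ in range(steps):
        # Each worker computes a gradient against the same current parameter.
        grads = [worker_gradient(w, shard) for shard in shards]
        # The single shared model takes one step along the averaged gradient.
        w -= lr * average_gradients(grads)
    return w

# Two workers, each with a shard drawn from the relation y = 3 * x.
shards = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0)],
]
w_final = train(shards)
```

With equal-sized shards, the averaged gradient is the same as the gradient over the combined batch, so in this sketch the shared parameter should settle near 3.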