"Averaging Weights Leads to Wider Optima and Better Generalization"
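Weirdly, the averaging doesn't have to be synchronous: SWA averages snapshots of a single model's weights taken at different points along its SGD trajectory, rather than averaging models trained side by side.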
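A minimal sketch of the averaging step, assuming weights are plain lists of floats and a hypothetical toy problem (noisy SGD on f(w) = 0.5 * ||w||^2 with a constant learning rate). The helper names `swa_average` and `noisy_sgd_snapshots` are illustrative, not from the paper or any library. The running update w_swa <- (w_swa * n + w) / (n + 1) lets each snapshot be folded in as soon as it is produced, which is why the snapshots never need to exist at the same time. The paper's learning-rate schedule and other details are not modelled here.

```python
import random


def swa_average(snapshots):
    """Running SWA-style average of weight vectors (lists of floats).

    Uses the incremental update w_swa <- (w_swa * n + w) / (n + 1),
    so snapshots can be folded in one at a time as training proceeds.
    """
    w_swa = None
    n = 0
    for w in snapshots:
        if w_swa is None:
            w_swa = list(w)  # first snapshot: the average is just a copy
        else:
            w_swa = [(a * n + b) / (n + 1) for a, b in zip(w_swa, w)]
        n += 1
    return w_swa


def noisy_sgd_snapshots(w0, lr=0.1, steps=200, start=100, every=10, seed=0):
    """Hypothetical toy setup: noisy SGD on f(w) = 0.5 * ||w||^2.

    Yields a copy of the weights every `every` steps once `start` steps
    have passed, i.e. snapshots taken at different points in time.
    """
    rng = random.Random(seed)
    w = list(w0)
    for t in range(1, steps + 1):
        # Gradient of 0.5 * ||w||^2 is w; Gaussian noise mimics minibatch noise.
        w = [wi - lr * (wi + rng.gauss(0.0, 1.0)) for wi in w]
        if t > start and (t - start) % every == 0:
            yield list(w)


snaps = list(noisy_sgd_snapshots([5.0, -5.0]))
w_swa = swa_average(snaps)
print(len(snaps), w_swa)
```

Because `noisy_sgd_snapshots` is a generator, passing it straight to `swa_average` never holds more than the current snapshot and the running average in memory; the list is built above only so the snapshots can be inspected.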