uoaei on Jan 9, 2020 | parent | context | favorite | on: An idea from physics helps AI see in higher dimens...

You are averaging weights in distributed training? That seems like it would be rife with pitfalls unless you average after every batch.

I always thought the preferred method was to average the gradients across workers and apply that averaged update to the single mother-model.
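
To make the distinction concrete, here is a rough toy sketch, not anyone's actual training setup. It assumes plain SGD, two workers, and a made-up 1-D quadratic loss per worker; the function names and numbers are hypothetical. Under those assumptions, averaging weights after every step matches averaging gradients, while averaging weights only every few steps drifts away from it.

```python
# Toy comparison: gradient averaging vs. weight averaging across two workers.
# Each worker i holds a 1-D quadratic loss 0.5 * a_i * (w - c_i)**2 (hypothetical data).

def grad(w, a, c):
    """Gradient of 0.5 * a * (w - c)**2 with respect to w."""
    return a * (w - c)

def gradient_averaging(w, workers, lr, steps):
    """Every step: average the workers' gradients, then update one shared model."""
    for _ in range(steps):
        g = sum(grad(w, a, c) for a, c in workers) / len(workers)
        w -= lr * g
    return w

def weight_averaging(w, workers, lr, steps, sync_every):
    """Each worker runs local SGD; replicas are averaged every `sync_every` steps."""
    replicas = [w] * len(workers)
    for step in range(1, steps + 1):
        replicas = [r - lr * grad(r, a, c) for r, (a, c) in zip(replicas, workers)]
        if step % sync_every == 0:
            mean = sum(replicas) / len(replicas)
            replicas = [mean] * len(replicas)
    return sum(replicas) / len(replicas)

workers = [(1.0, 0.0), (4.0, 10.0)]  # (curvature a_i, optimum c_i) per worker
w0, lr, steps = 5.0, 0.1, 8

w_grad = gradient_averaging(w0, workers, lr, steps)
w_sync1 = weight_averaging(w0, workers, lr, steps, sync_every=1)
w_sync4 = weight_averaging(w0, workers, lr, steps, sync_every=4)

print("gradient averaging:        ", w_grad)
print("weight avg, sync every 1:  ", w_sync1)
print("weight avg, sync every 4:  ", w_sync4)
```

The first two results agree up to floating-point rounding because a plain SGD step is linear in the gradient when all replicas start from the same weights. That equivalence should not be expected to hold in general for stateful optimizers such as Adam, or once the replicas are allowed to take several local steps between syncs.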
