
For training you can often divide the batch size by n (and then only apply the optimizer update after accumulating gradients over n smaller batches, which makes it mathematically equivalent). At a cost of speed, though.



Do libraries like torch and tensorflow facilitate this?



Thank you!


It's quite trivial to implement this yourself if you want to. See gradient accumulation in fastai, for instance: https://www.kaggle.com/code/jhoward/scaling-up-road-to-the-t...
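
In plain PyTorch it's just a few extra lines; here's a minimal sketch (the toy model, random data, and accum_steps value are illustrative placeholders, not from the linked notebook):

    import torch
    from torch import nn

    # Illustrative setup: a toy model and random micro-batches stand in
    # for a real training pipeline.
    model = nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()
    data = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(16)]

    accum_steps = 4  # n: micro-batches accumulated per optimizer step

    optimizer.zero_grad()
    for i, (x, y) in enumerate(data):
        loss = loss_fn(model(x), y)
        # Scale the loss by n so the summed gradients match the average
        # gradient of one n-times-larger batch.
        (loss / accum_steps).backward()
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()

The only real subtlety is dividing the loss by n before backward() so the accumulated gradients average out the same as one big batch (assuming a mean-reduced loss).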



