
For training you can often divide the batch size by n (and then only apply the optimizer update after accumulating gradients over n smaller batches, which makes it mathematically equivalent). At a cost of speed, though.



Do libraries like torch and tensorflow facilitate this?



Thank you!


It's quite trivial to implement this yourself if you want to. See gradient accumulation in fastai, for instance: https://www.kaggle.com/code/jhoward/scaling-up-road-to-the-t...
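
In plain PyTorch it's just a few extra lines; here's a minimal sketch (the toy model, random data, and accum_steps value are illustrative placeholders, not from the linked notebook):

    import torch
    from torch import nn

    # Illustrative setup: a toy model and random micro-batches stand in
    # for a real training pipeline.
    model = nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()
    data = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(16)]

    accum_steps = 4  # n: micro-batches accumulated per optimizer step

    optimizer.zero_grad()
    for i, (x, y) in enumerate(data):
        loss = loss_fn(model(x), y)
        # Scale the loss by n so the summed gradients match the average
        # gradient of one n-times-larger batch.
        (loss / accum_steps).backward()
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()

The only real subtlety is dividing the loss by n before backward() so the accumulated gradients average out the same as one big batch (assuming a mean-reduced loss).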



