I barely understand this, so a possibly stupid question:

Does this also mean it would be possible to train on a parallel, GPU-poor setup, instead of needing lots of GPU memory and bandwidth on one machine?

Probably not. The paper presents this as an inference-time optimization.
