I barely understand this, so a possibly stupid question:

Does this also mean it would be possible to train on a parallel, GPU-poor setup, instead of needing lots of GPU memory and bandwidth on one machine?

Probably not. The paper presents this as an inference-time optimization.
