AFAIK the model can’t be quantized during backprop: the weights being trained generally have to stay in a higher-precision format like fp16 or fp32, so right there you’d need a ton of RAM.
Backprop is faster because it can be parallelized, but IIRC each backprop process needs to hold its own full copy of the model.
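
To put rough numbers on that, here's a back-of-the-envelope sketch. Everything in it is an assumption for illustration, not something measured: a hypothetical 7B-parameter model, 4-bit weights for quantized inference, fp32 weights for training, and an Adam-style optimizer that keeps two extra buffers per weight. The function names are made up for this example, and activation memory is ignored, so the real training footprint would be higher.

```python
GIB = 1024 ** 3


def inference_bytes(n_params, bits_per_weight=4):
    """Memory for the weights alone when running a quantized model."""
    return n_params * bits_per_weight / 8


def training_bytes(n_params, bytes_per_weight=4, n_replicas=1):
    """Rough training memory, assuming an Adam-style optimizer.

    Each replica holds weights + gradients + two optimizer buffers,
    all at full precision. Activations are NOT counted here.
    """
    per_replica = n_params * bytes_per_weight * 4
    return per_replica * n_replicas


if __name__ == "__main__":
    n = 7_000_000_000  # hypothetical model size, for illustration only
    print(f"4-bit inference:      {inference_bytes(n) / GIB:.1f} GiB")
    print(f"fp32 training, 1 copy: {training_bytes(n) / GIB:.1f} GiB")
    print(f"fp32 training, 4 copies: {training_bytes(n, n_replicas=4) / GIB:.1f} GiB")
```

The `n_replicas` argument is the "one full copy per backprop process" part: under these assumptions the cost scales linearly with however many parallel workers you run.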