Back of the envelope: RETRO retrieves from a database of roughly 2 trillion tokens. Assuming 3 characters per token and 3 bytes per character (UTF-8 uses 1-4), that's about 18 trillion bytes, so roughly 18 TB. Reasonable for disk, unreasonable to load into GPU memory.
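To make the arithmetic explicit, a quick sketch (the ~2 trillion token figure comes from the RETRO paper; the per-token and per-character numbers are the rough assumptions above):

```python
# Rough size estimate for RETRO's retrieval database.
tokens = 2e12            # ~2 trillion tokens in the retrieval database
chars_per_token = 3      # assumption: ~3 characters per token
bytes_per_char = 3       # assumption: UTF-8 is 1-4 bytes per character, take ~3
total_bytes = tokens * chars_per_token * bytes_per_char
print(f"~{total_bytes / 1e12:.0f} TB")   # -> ~18 TB: fine on disk, not in GPU memory
```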
Compute: Building the database requires lots of BERT pre-computation. But at inference time, RETRO needs at least one BERT forward pass (a batch of all the chunks it broke the input prompt into). Then the retrieved neighbors are encoded via the Retro encoder (which seems small, at 2 Transformer layers). Finally the input prompt is processed much like in a 32-layer GPT, with cross-attention to the encoded neighbors.
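Here is an illustrative sketch of those inference steps, with dummy stand-ins for the real models. The 64-token chunk size and the SCaNN-style nearest-neighbor index are from the paper; the function names, shapes, and vocabulary size are placeholders, not RETRO's actual code:

```python
import numpy as np

CHUNK_LEN, D_MODEL, K_NEIGHBORS = 64, 768, 2   # chunk length from the paper; rest are placeholders

def bert_embed(chunks):
    # Step 1: one BERT forward pass over all prompt chunks, batched. Dummy output.
    return np.random.randn(len(chunks), D_MODEL)

def nearest_neighbors(chunk_embeddings, k=K_NEIGHBORS):
    # Step 2: approximate nearest-neighbor lookup against the pre-computed
    # database of frozen BERT embeddings (RETRO uses a SCaNN index).
    return [[f"neighbor {i}-{j}" for j in range(k)] for i in range(len(chunk_embeddings))]

def retro_encoder(neighbors):
    # Step 3: small encoder (~2 Transformer layers) over the retrieved neighbors.
    return np.random.randn(len(neighbors), K_NEIGHBORS, CHUNK_LEN, D_MODEL)

def retro_decoder(prompt_tokens, encoded_neighbors):
    # Step 4: GPT-like decoder (32 layers in the 7.5B model) with chunked
    # cross-attention to the encoded neighbors. Dummy logits, dummy vocab size.
    return np.random.randn(len(prompt_tokens), 32000)

prompt_tokens = list(range(128))   # pretend token ids
chunks = [prompt_tokens[i:i + CHUNK_LEN] for i in range(0, len(prompt_tokens), CHUNK_LEN)]
chunk_emb = bert_embed(chunks)
neighbors = nearest_neighbors(chunk_emb)
encoded = retro_encoder(neighbors)
logits = retro_decoder(prompt_tokens, encoded)
print(logits.shape)
```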
Run it on a laptop? Perhaps on CPU. For a sense of scale, try running T0 [1], which is larger at 11B parameters.
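For reference, a minimal way to try T0pp on CPU with Hugging Face transformers (this follows the model card's usage example; at 11B parameters it needs on the order of 20-40 GB of RAM depending on dtype, so "laptop" is generous):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/T0pp")
# bfloat16 roughly halves the memory footprint vs. float32 (~22 GB vs ~44 GB).
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0pp", torch_dtype=torch.bfloat16)

inputs = tokenizer(
    "Is this review positive or negative? Review: best cast iron skillet you will ever buy",
    return_tensors="pt",
)
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```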
[1] https://huggingface.co/bigscience/T0pp