
> One thing we can do which current LLMs can't, at least directly as far as I'm aware, is to go back and re-read a section. Like on-demand RAG, or something.

Encoders can do that, and we can combine them with diffusion to generate text [0].
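
Roughly how that looks in code (a toy sketch, not the paper's actual method: plain BERT plus a Mask-Predict-style confidence schedule standing in for the learned diffusion process; the model choice, lengths, and schedule are all illustrative):

    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

    length, steps = 12, 6
    # Start from an all-[MASK] sequence (the fully "noised" state in
    # absorbing-state text diffusion), framed by [CLS]/[SEP].
    ids = torch.full((1, length), tok.mask_token_id)
    ids[0, 0], ids[0, -1] = tok.cls_token_id, tok.sep_token_id

    for step in range(steps):
        with torch.no_grad():
            # The encoder sees every position at once, masked or not.
            logits = model(input_ids=ids).logits
        probs, preds = logits.softmax(-1).max(-1)
        masked = ids[0] == tok.mask_token_id
        if not masked.any():
            break
        # Reveal the most confident masked positions this step.
        k = max(1, int(masked.sum()) // (steps - step))
        conf = probs[0].masked_fill(~masked, -1.0)
        reveal = conf.topk(k).indices
        ids[0, reveal] = preds[0, reveal]

    print(tok.decode(ids[0]))

Each denoising step re-reads the whole partially-filled sequence before committing to more tokens, which is exactly the "go back and re-read" ability the parent comment asks about.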

This works because the encoder doesn't impose the causal self-attention mask used for autoregressive decoding, so subsequent layers can re-focus their key/query vector space and route information "backwards" through the sequence.
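
Concretely, the only difference is the mask (a minimal PyTorch sketch; shapes and names are illustrative):

    import torch
    import torch.nn.functional as F

    B, H, T, D = 1, 1, 6, 16        # batch, heads, sequence length, head dim
    q = torch.randn(B, H, T, D)     # toy query/key/value tensors
    k = torch.randn(B, H, T, D)
    v = torch.randn(B, H, T, D)

    # Decoder-style: the causal mask blocks attention to future positions,
    # so token i can only read tokens 0..i.
    causal_out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

    # Encoder-style: no mask, so every token attends over the full sequence
    # in both directions, and deeper layers can carry information backwards.
    bidir_out = F.scaled_dot_product_attention(q, k, v)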

Happy reading. Feel free to get back to me!

[0]: https://arxiv.org/abs/2211.15029


