Agreed. Embeddings are pretty big: more than 1024 * 4 bits. And language is really small: ~1 bit per character. So it's not at all crazy that embeddings can be lossless. The paper shows a practical method to recover the text and shows how it applies generally. Here's a demo of how it works: https://twitter.com/srush_nlp/status/1712559472491811221
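To make the size argument concrete, here is a back-of-envelope sketch. It assumes a 1024-dimensional embedding and ~1 bit of information per character of English. It compares the conservative 4-bits-per-dimension figure above with plain float32 storage. It only bounds raw capacity. It says nothing about whether a given encoder actually preserves that information, or how the paper's recovery method works.

```python
# Back-of-envelope capacity comparison (illustrative assumptions, not from the paper).
EMBEDDING_DIM = 1024      # assumed embedding dimensionality
BITS_PER_DIM_LOWER = 4    # the conservative lower bound quoted above
BITS_PER_DIM_FP32 = 32    # if every dimension is stored as a float32
BITS_PER_CHAR = 1.0       # rough entropy estimate for English text


def max_chars(dim: int, bits_per_dim: float, bits_per_char: float) -> float:
    """Characters of text whose information content could fit in the embedding's raw bits."""
    return dim * bits_per_dim / bits_per_char


lower = max_chars(EMBEDDING_DIM, BITS_PER_DIM_LOWER, BITS_PER_CHAR)
fp32 = max_chars(EMBEDDING_DIM, BITS_PER_DIM_FP32, BITS_PER_CHAR)
print(f"lower bound: {lower:.0f} chars, float32: {fp32:.0f} chars")
```

Even under the conservative bound, the raw capacity works out to thousands of characters. That is far longer than the short passages typically embedded.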

(paper author)