Remember, this is an intermediate encoding of a hidden feature space between perception and planning. What you see at the start and at the end of the neural network might be very different. Consider this: typing at 60 words/minute, with 5 characters/word and 8 bits/character, gives a gross bit rate of 40 bits/second. With today's compression algorithms, you can easily get a 4:1 reduction in data. That leaves you at approximately 10 bits/second that are consciously processed in your brain. It is probably even less, since your brain might be much better at encoding language than even our best models. Even if some of those numbers are off by a certain factor, the number in the paper is certainly in the right ballpark in terms of orders of magnitude.
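To make the back-of-the-envelope arithmetic explicit, here is a minimal Python sketch. The function name `typing_bit_rate` and its parameters are made up for illustration; the defaults are the numbers from the comment above, and the 4:1 compression ratio is an assumed figure, not a measured one.

```python
def typing_bit_rate(words_per_minute=60, chars_per_word=5,
                    bits_per_char=8, compression_ratio=4.0):
    """Return (gross, net) bit rates in bits/second for typed text.

    compression_ratio is an assumption: 4.0 means a 4:1 reduction.
    """
    # Gross rate: raw bits typed per minute, converted to per second.
    gross = words_per_minute * chars_per_word * bits_per_char / 60
    # Net rate: what remains after the assumed compression.
    net = gross / compression_ratio
    return gross, net


gross, net = typing_bit_rate()
print(f"gross: {gross} bits/s, net: {net} bits/s")
# gross: 40.0 bits/s, net: 10.0 bits/s
```

Note that the net figure scales inversely with `compression_ratio`, so the result depends directly on which ratio you are willing to assume.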
So the argument is that compression is not processing? That's a very weird definition of processing. Also, if we do it this way, we can always argue our way down to 10 bits/s: just change the compression ratio.