Hutter Prize submissions reach compression factors above 9 on English Wikipedia text. And if you're listening to podcasts, the entropy is probably even lower. The human brain is obviously a much better language model than anything we have today, so you can assume that the latent layer in your brain deals with much less than 60 bits per second.
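For a rough sense of scale, here is a back-of-envelope sketch of that arithmetic. The speaking rate (150 words per minute), the average word length (6 characters including the space), and the 8 raw bits per character are my own illustrative assumptions, not figures from the paper or from the Hutter Prize; only the compression factor of 9 comes from the comment above.

```python
def bits_per_char(compression_factor: float, raw_bits_per_char: float = 8.0) -> float:
    """Bits per character left after compression.

    Assumes the raw text costs 8 bits per character (one byte each).
    """
    return raw_bits_per_char / compression_factor


def speech_bits_per_second(
    compression_factor: float,
    words_per_minute: float = 150.0,  # assumed conversational speaking rate
    chars_per_word: float = 6.0,      # assumed average, including the space
) -> float:
    """Estimate the information rate of a spoken transcript in bits per second."""
    chars_per_second = words_per_minute * chars_per_word / 60.0
    return chars_per_second * bits_per_char(compression_factor)


if __name__ == "__main__":
    rate = speech_bits_per_second(9.0)
    print(f"{rate:.1f} bits/s")
```

Under those assumptions the transcript alone comes out at roughly 13 bits per second, well under 60. A better model (a higher compression factor) or slower speech would push it lower still.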
During each second of listening, we're also perceiving the speaker's identity, their accent, how fast they're talking, and what emotions they're showing. Those should count toward the bit rate handled by the conscious brain.
Again: perception is not what we're talking about, and the paper acknowledges that perceptual input is orders of magnitude larger. I challenge you to listen with full comprehension to someone talking about a topic you don't know while identifying someone in a police lineup.