I imagine it's because lossy codecs are tuned for human perceptual limitations, and when you process audio, it can "pull" areas of the sound that are otherwise hidden from your perception into perceptual ranges, analogous to how fiddling with the brightness/contrast of a highly compressed JPEG image can accentuate the artifacts.
But what about lossless 16/44.1 kHz audio? It's not a "compressed" version of 24/192 kHz. Chances are that your 192 kHz music will be low-pass filtered anyway; 192 kHz is a trick that sound chipset vendors added so that integrators wouldn't have to make high-quality analog filters for their ADCs. There's likely nothing above 22.05 kHz except shaped noise, production artifacts, and very, very quiet ultrasonic noise that has nothing to do with the music.
There's no such thing as lossless 16/44.1 because it's practically impossible to make an artefact-free antialiasing/reconstruction filter pair clocked at 44.1k.
The point of higher sample rates isn't to capture bat squeaks, but to relax the engineering constraints on the anti-aliasing filter before the ADC and the reconstruction filter after the DAC.
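To put a rough number on "relax the engineering constraints", here is a minimal sketch using the textbook Butterworth order formula. It estimates the lowest-order analog low-pass that stays nearly flat up to 20 kHz and is well down by the Nyquist frequency. The 20 kHz passband edge, 0.1 dB passband loss, and 96 dB stopband target are my own illustrative assumptions, not figures from this thread, and real converters use oversampling and digital filters rather than a single analog Butterworth stage.

```python
import math


def butterworth_order(sample_rate_hz, passband_hz=20_000.0,
                      passband_loss_db=0.1, stopband_atten_db=96.0):
    """Minimum Butterworth low-pass order that loses at most
    passband_loss_db at passband_hz and attenuates by at least
    stopband_atten_db at the Nyquist frequency (sample_rate_hz / 2).

    All default values are illustrative assumptions.
    """
    stopband_hz = sample_rate_hz / 2.0
    if stopband_hz <= passband_hz:
        raise ValueError("Nyquist frequency must lie above the passband edge")
    # Standard Butterworth order estimate:
    # n >= log10((10^(As/10) - 1) / (10^(Ap/10) - 1)) / (2 * log10(fs / fp))
    ratio = ((10 ** (stopband_atten_db / 10) - 1)
             / (10 ** (passband_loss_db / 10) - 1))
    order = math.log10(ratio) / (2 * math.log10(stopband_hz / passband_hz))
    return math.ceil(order)


if __name__ == "__main__":
    for fs in (44_100, 48_000, 96_000, 192_000):
        print(f"{fs:>7} Hz: order >= {butterworth_order(fs)}")
```

Under these assumptions the required order at 44.1 kHz is well over a hundred, which nobody would build in analog, while at 96 kHz or 192 kHz it drops by roughly an order of magnitude. The transition band between 20 kHz and Nyquist is what drives the whole thing.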
Nyquist is fine in theory, and if you've never actually tried to implement a clean filter you'll likely watch the xiph video and think "Well that makes sense."
If you actually know something about practical DSP and the constraints and challenges of real filter design, you're not going to be quite so easily impressed.
Likewise with higher bit depths. "Common sense" suggests that no one should be able to hear a noise signal at -90dB.
Common sense is dead wrong, because the effects of a single bit of dither are absolutely audible.
And if you can hear the effects of noise added at -90dB, you can certainly hear the effects of quantisation noise artefacts on reverb tails and long decaying notes at -30 to -40dB, added by recording at 16 bits instead of 24 bits.
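For anyone who wants to check the arithmetic behind the bit-depth argument, here is a small sketch based on the standard ideal-quantizer formula (about 6.02 dB per bit plus 1.76 dB for a full-scale sine). The -60 dBFS "reverb tail" level is my own illustrative assumption, and the formula treats quantisation error as white noise. That is precisely the assumption that breaks down for very quiet, undithered signals, where the error becomes correlated with the signal, so read the results as a best case.

```python
import math


def quantization_snr_db(bits):
    """Ideal SNR of a full-scale sine against uniform quantisation noise."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


def margin_over_noise_db(bits, signal_dbfs):
    """How far a sine at signal_dbfs (<= 0) sits above the quantisation
    noise floor of an ideal converter with the given bit depth."""
    return quantization_snr_db(bits) + signal_dbfs


if __name__ == "__main__":
    tail_dbfs = -60.0  # hypothetical level of a decaying note or reverb tail
    for bits in (16, 22, 24):
        print(f"{bits} bits: full-scale SNR {quantization_snr_db(bits):6.1f} dB, "
              f"tail margin {margin_over_noise_db(bits, tail_dbfs):6.1f} dB")
```

With these numbers, a tail that has decayed to -60 dBFS sits only about 38 dB above the 16-bit noise floor, versus roughly 86 dB at 24 bits.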
Whether or not that level of detail is present in a typical pop or classical recording is a different issue. Realistically most music is heavily compressed and limited, so the answer is usually "no."
And not all sources have 24 bits of detail [1]. (Recordings made on the typical digital multitrack machines used in the 80s and 90s certainly don't.)
That doesn't mean that a clean unprocessed recording of music with a wide dynamic range made on no-compromise equipment won't show the difference clearly.
Speaking from experience, it certainly does.
[1] Technically no sources have 24 bits of detail. The best you'll get from a real-world converter is around 22 bits.