The analogy with images is demonstrated in this screenshot [1], which is referenced in a StackOverflow answer on the topic of image scaling in web browsers [2].
When the high resolution image is downscaled poorly, some high (spatial) frequencies are aliased down into lower (spatial) frequencies, manifesting as blocky/jagged lines. The images are 2D data in the spatial domain, while the audio is 1D data in the temporal domain, but the aliasing of high frequency signal to lower frequency artifact is similar.
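To make the audio side of the analogy concrete, here is a minimal sketch of aliasing caused by naive downsampling. It assumes a 30 kHz tone recorded at 192 kHz and then reduced to 48 kHz by keeping every fourth sample with no anti-alias filter. The 30 kHz tone and the helper names `alias_frequency` and `tone` are illustrative choices, not something taken from the linked answer.

```python
import math

def alias_frequency(f_signal, f_sample):
    """Apparent frequency of a pure tone sampled at f_sample with no anti-alias filter."""
    f = f_signal % f_sample
    return min(f, f_sample - f)

def tone(freq, rate, n):
    """n samples of a unit-amplitude cosine at freq Hz, sampled at rate Hz."""
    return [math.cos(2 * math.pi * freq * k / rate) for k in range(n)]

HI_RATE = 192_000   # original sample rate
LO_RATE = 48_000    # target sample rate
STEP = HI_RATE // LO_RATE

# A 30 kHz tone is representable at 192 kHz (Nyquist limit 96 kHz) ...
ultrasonic = tone(30_000, HI_RATE, 1_920)

# ... but naive decimation (keep every 4th sample, no low-pass filter)
# drops the Nyquist limit to 24 kHz, so the tone has to fold down.
decimated = ultrasonic[::STEP]

alias = alias_frequency(30_000, LO_RATE)
reference = tone(alias, LO_RATE, len(decimated))

# The decimated samples are indistinguishable from a genuine tone at the alias frequency.
max_error = max(abs(a - b) for a, b in zip(decimated, reference))
print(alias, max_error)
```

The decimated ultrasonic tone matches an audible 18 kHz tone sample for sample, which is the 1D counterpart of jagged lines appearing in a poorly downscaled image. A proper resampler would low-pass filter before discarding samples.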
Viewing a high resolution image on a panel that has enough pixels to display it without scaling is analogous to listening to 192 kHz audio on a fantastic speaker system that can reproduce those high frequencies accurately, instead of causing distortion by aliasing them to lower frequencies. On the other hand, viewing a high resolution image that has been downscaled poorly is analogous to listening to that 192 kHz audio on a realistic speaker system that cannot reproduce high frequencies, which results in those signals aliasing down into the audible range.
And as you say, there is a point where, for the viewer/listener's sake, it doesn't make sense to push for higher frequencies because even if you can build a panel/speaker that will faithfully reproduce those frequencies without aliasing, the eye/ear will not be able to perceive the additional detail.
Technically this is not aliasing; rather, the large and varied non-linearities of a speaker can act like a frequency mixer, which is why you'll get a 3 kHz sound (the difference frequency) when playing, say, 20 kHz and 23 kHz tones together.
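A rough sketch of that mixing effect, assuming a toy speaker model with a small quadratic term, y = x + 0.1·x². The model, the 0.1 coefficient, and the helper names are hypothetical, and real drivers have far messier non-linearities. Squaring the sum of two cosines produces a cross term 2·cos(A)·cos(B) = cos(A − B) + cos(A + B), so a difference tone at 3 kHz should appear with amplitude 0.1.

```python
import cmath
import math

RATE = 192_000
N = 1_920  # 10 ms of audio; every tone involved completes a whole number of cycles

def amplitude_at(samples, freq, rate=RATE):
    """Amplitude of the component at freq Hz, measured with a single-bin DFT."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * k / rate)
              for k, s in enumerate(samples))
    return 2 * abs(acc) / len(samples)

def two_tones(f1, f2):
    """Sum of two unit-amplitude cosines at f1 and f2 Hz."""
    return [math.cos(2 * math.pi * f1 * k / RATE) +
            math.cos(2 * math.pi * f2 * k / RATE) for k in range(N)]

def toy_speaker(x, a=0.1):
    """Hypothetical non-linear transfer curve: linear term plus a small quadratic term."""
    return x + a * x * x

clean = two_tones(20_000, 23_000)
distorted = [toy_speaker(x) for x in clean]

linear_3k = amplitude_at(clean, 3_000)         # no 3 kHz content in the input
distorted_3k = amplitude_at(distorted, 3_000)  # difference tone created by the non-linearity
print(linear_3k, distorted_3k)
```

Nothing is resampled here, so the 3 kHz component cannot be an aliasing artifact; it comes entirely from the quadratic term acting as a mixer.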
[1] http://www.maxrev.de/files/2012/08/screenshot_interpolation_...
[2] https://stackoverflow.com/a/11987735