It should be noted that these distinctions don't correlate with timing: you can have a hard realtime system that needs network packets at 50ms±10ms intervals, and a soft realtime system that needs them at 500µs±5µs.
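To make the point concrete, here is a minimal sketch (all names are hypothetical) that checks packet arrival intervals against a nominal period ± tolerance. The window is the same kind of check in both cases. Only the `hard` flag, meaning what a miss costs, separates the two classes, regardless of whether the window is milliseconds or microseconds wide. Times are integer microseconds to avoid floating-point edge cases.

```python
from dataclasses import dataclass


@dataclass
class TimingSpec:
    period_us: int     # nominal interval between packets, in microseconds
    tolerance_us: int  # allowed deviation either side of the period
    hard: bool         # True: any miss is a system failure; False: a miss only degrades quality


def check_intervals(arrivals_us, spec):
    """Return the indices i where arrivals_us[i] - arrivals_us[i-1] falls outside period ± tolerance."""
    misses = []
    for i in range(1, len(arrivals_us)):
        interval = arrivals_us[i] - arrivals_us[i - 1]
        if abs(interval - spec.period_us) > spec.tolerance_us:
            misses.append(i)
    return misses


def verdict(arrivals_us, spec):
    """Classify a run of arrivals: the same timing miss means 'failed' or 'degraded' depending on the spec."""
    if not check_intervals(arrivals_us, spec):
        return "ok"
    return "failed" if spec.hard else "degraded"


# Hard realtime with a loose window vs soft realtime with a tight one.
hard_spec = TimingSpec(period_us=50_000, tolerance_us=10_000, hard=True)
soft_spec = TimingSpec(period_us=500, tolerance_us=5, hard=False)
```

For example, `verdict([0, 50_000, 100_000, 165_000], hard_spec)` reports `"failed"` because the last interval is 15ms late, while a 10µs slip on the 500µs stream, `verdict([0, 500, 1000, 1510], soft_spec)`, only reports `"degraded"`.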
Some audio setups are run quite "close to the metal", both because that needs less buffering and because the human threshold for noticing latency seems to be fairly low, around 10ms. Keeping audio from drifting out of phase across multiple sources/sinks adds a further constraint on top of that.
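A quick back-of-the-envelope sketch of why buffer size matters here: a buffer of `frames` samples adds `frames / sample_rate` seconds of latency. The 48 kHz rate and the function names below are illustrative assumptions, and the sketch counts only buffering, ignoring converter and driver latency.

```python
def buffer_latency_ms(frames, sample_rate_hz):
    """Latency contributed by one buffer of `frames` samples, in milliseconds."""
    return 1000.0 * frames / sample_rate_hz


def max_frames_for_budget(budget_ms, sample_rate_hz, buffers=1):
    """Largest per-buffer size (in frames) such that `buffers` queued buffers stay within budget_ms."""
    return int(budget_ms * sample_rate_hz // (1000 * buffers))


# Assumed 48 kHz stream against the ~10ms perceptibility threshold.
print(buffer_latency_ms(1024, 48_000))           # a 1024-frame buffer alone is ~21.3ms
print(max_frames_for_budget(10, 48_000))         # 480 frames fit in 10ms
print(max_frames_for_budget(10, 48_000, buffers=2))  # 240 frames each if double-buffered
```

At 48 kHz a single 1024-frame buffer already exceeds 10ms, which is why low-latency setups push toward buffers of a few hundred frames or fewer.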
Correct. Imagine a dam that is overflowing: the release valves are a hard realtime system. If the dam overflows for more than a few minutes, damage will occur, so the release valves need to be opened within, say, five minutes of the overflow starting. That's a generous deadline for any computer, but still a deadline that must be met at all costs.