If you have direct language support, the runtime can either anticipate blocking operations and make sure there are enough other threads still running, or detect (from a supervisor thread) that an execution thread has already blocked and spawn a new one. How well that works depends on what you're doing. Computation is the trivial case. Network I/O is almost as good, because sockets are pretty good about not blocking when you've asked them not to. Filesystems aren't quite as good, so you have to fall back on AIO and other tricks. Page faults are worst of all. On top of that, you have to handle NUMA scheduling yourself because the OS can no longer do it for you, and so on.
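To make the "supervisor detects a blocked thread and spawns a new one" idea concrete, here's a minimal sketch in C with POSIX threads. Everything in it is hypothetical (the heartbeat counters, the 1 ms polling interval, run_one_task); a real runtime like Go's scheduler does this with much more machinery, capping the spare-thread count and parking spares once the blocked thread returns, but the basic shape is similar.

```c
/* Hypothetical sketch: workers bump a heartbeat counter after each task;
 * a supervisor polls the counters and, if one stalls, assumes that thread
 * is blocked in the kernel and starts a spare so runnable work keeps going. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <unistd.h>

#define NWORKERS 4

static atomic_ulong heartbeat[NWORKERS];       /* progress counter per slot */

/* Stand-in for "pull the next task off the run queue and run it".
 * In a real runtime this is where a task might block in a syscall. */
static void run_one_task(int slot) {
    (void)slot;
    usleep(100);                               /* pretend to do some work */
}

static void *worker(void *arg) {
    int slot = (int)(intptr_t)arg;
    for (;;) {
        run_one_task(slot);
        atomic_fetch_add(&heartbeat[slot], 1); /* report forward progress */
    }
    return NULL;
}

/* Supervisor: a stalled heartbeat between polls is treated as "blocked".
 * A real runtime would also cap spares and retire them when the blocked
 * thread comes back; this sketch just keeps spawning. */
static void *supervisor(void *arg) {
    unsigned long last[NWORKERS] = {0};
    (void)arg;
    for (;;) {
        usleep(1000);                          /* poll roughly every 1 ms */
        for (int i = 0; i < NWORKERS; i++) {
            unsigned long now = atomic_load(&heartbeat[i]);
            if (now == last[i]) {              /* no progress: spawn a spare */
                pthread_t spare;
                pthread_create(&spare, NULL, worker, (void *)(intptr_t)i);
                pthread_detach(spare);
            }
            last[i] = now;
        }
    }
    return NULL;
}

int main(void) {
    pthread_t t;
    for (int i = 0; i < NWORKERS; i++) {
        pthread_create(&t, NULL, worker, (void *)(intptr_t)i);
        pthread_detach(t);
    }
    pthread_create(&t, NULL, supervisor, NULL);
    pthread_join(t, NULL);                     /* run until killed */
    return 0;
}
```

The nice property of this shape is that detection costs almost nothing while the workers are making progress; you only pay when one of them actually stalls in the kernel. The ugly property is everything the next paragraph complains about.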
So yeah, if you have a really good runtime you can see some significant gains on some systems and workloads. More often you'll get very modest gains, plus some extremely subtle bugs that cause a precipitous drop in performance, e.g. when you start paging. That's the "ending in tears" part for the poor schlep who has to debug that stuff. It can be done, but as I said, it's not the best bang for the buck.