Interesting analogy: Erlang implements an M:N scheduling model. By default the VM starts one scheduler thread per CPU core and preemptively schedules its own lightweight processes on top of those threads. Unlike OS threads, Erlang processes are very cheap: a newly created one needs only a few hundred machine words (a couple of kilobytes). Erlangers have been using the one-process-per-request model for a very long time, and Erlang applications achieve massive concurrency this way.
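To make the shape of that model concrete, here is a rough toy sketch in Python (not Erlang, and not how the Erlang VM is actually implemented). It multiplexes many generator-based "processes" over a handful of OS threads, with a per-turn step budget loosely standing in for Erlang's preemption. All names here (`light_process`, `scheduler_loop`, `run_all`, `budget`) are made up for illustration. Two caveats: the switching is cooperative (each process has to `yield`), and CPython's GIL means the threads do not really run in parallel.

```python
import os
import queue
import threading


def light_process(pid, steps, results):
    """A lightweight 'process': a generator that can be paused after each step."""
    total = 0
    for i in range(steps):
        total += i
        yield  # one unit of work done; the scheduler may switch away here
    results[pid] = total


def scheduler_loop(run_queue, budget):
    """Runs on one OS thread: take a process, run it for at most `budget` steps."""
    while True:
        try:
            proc = run_queue.get_nowait()
        except queue.Empty:
            return  # nothing runnable right now; this thread retires
        try:
            for _ in range(budget):
                next(proc)
        except StopIteration:
            continue  # process finished, pick the next one
        run_queue.put(proc)  # budget used up: back of the queue


def run_all(n_procs, steps, n_threads=None, budget=10):
    """Multiplex n_procs lightweight processes over n_threads OS threads."""
    n_threads = n_threads or os.cpu_count() or 1
    results = {}
    run_queue = queue.Queue()
    for pid in range(n_procs):
        run_queue.put(light_process(pid, steps, results))
    threads = [
        threading.Thread(target=scheduler_loop, args=(run_queue, budget))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    done = run_all(n_procs=10_000, steps=50, n_threads=4)
    print(len(done), "processes finished on 4 OS threads")
```

The point is only the ratio: ten thousand schedulable units, four kernel threads, and a user-space run queue deciding who runs next. Each unit here costs a generator object rather than a full thread stack, which is the same trade Erlang makes at a much more serious level.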
To me it seems possible to do M:N right, but you need more abstraction and a different design. M:N seems to work well in Erlang-like cases (very lightweight processes on top of kernel threads).