Can you just have one thread per running task and give the thread back to a pool whenever the task blocks waiting for messages? Then for a synchronous RPC you can swap the server task onto the current thread without going through the OS scheduler, and swap it back when it's done. You just need a combined 'send response and get next message' operation so the server can hand the thread straight back to the client. This seems way easier and more robust, and you don't need work stealing since each running task has its own thread... what am I missing?
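For concreteness, here's a minimal sketch of what that combined operation could look like, simulated over plain OS threads and `std::sync::mpsc` (the names `Request` and `reply_and_recv` are made up for illustration; a real runtime would do the direct thread handoff itself instead of bouncing through kernel channel wakeups):

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// Hypothetical request type for the sketch: the payload plus a channel
// to send the reply on.
struct Request {
    x: u64,
    reply_to: Sender<u64>,
}

// The combined "send response and get next message" operation: reply to
// the current client, then immediately block for the next request. Over
// OS channels this is just two calls, but a user-level scheduler could
// treat it as one atomic point at which to swap the client task back
// onto this thread without an OS reschedule.
fn reply_and_recv(reply_to: Sender<u64>, resp: u64, rx: &Receiver<Request>) -> Option<Request> {
    let _ = reply_to.send(resp);
    rx.recv().ok()
}

fn main() {
    let (tx, rx) = channel::<Request>();

    // Server task: holds a thread while running, and would give it back
    // to the pool (here it simply blocks) while waiting for messages.
    let server = thread::spawn(move || {
        let mut msg = rx.recv().ok();
        while let Some(req) = msg {
            let resp = req.x * 2; // stand-in for real work
            msg = reply_and_recv(req.reply_to, resp, &rx);
        }
    });

    // Client task doing synchronous RPCs: send, then block on the reply.
    for x in 0..3 {
        let (reply_tx, reply_rx) = channel();
        tx.send(Request { x, reply_to: reply_tx }).unwrap();
        println!("reply: {}", reply_rx.recv().unwrap());
    }

    drop(tx); // close the channel so the server loop exits
    server.join().unwrap();
}
```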
It doesn't work if you want to optimistically switch to the receiving task but keep the sending task around with some work it might like to do if other CPUs become idle. (For example, we've thought about scheduling JS GCs this way while JS is blocked on layout.)
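A contrived sketch of that scenario, again over plain OS threads (all names invented): the separate `gc_slice` thread stands in for what an idle worker would steal. The point is that under strict one-thread-per-running-task there is no idle worker left to run it, because the client's only thread has been donated to the server; work stealing is what lets another CPU pick up that optional work.

```rust
use std::sync::mpsc::channel;
use std::thread;

fn main() {
    let (req_tx, req_rx) = channel::<u64>();
    let (resp_tx, resp_rx) = channel::<u64>();

    // Server (receiving) task: answers synchronous RPCs.
    let server = thread::spawn(move || {
        while let Ok(x) = req_rx.recv() {
            let _ = resp_tx.send(x + 1);
        }
    });

    // Optional work the *sending* task would like done if a CPU frees up,
    // standing in for a JS GC slice while JS is blocked on layout. In the
    // one-thread-per-running-task model, nothing is left to run this while
    // the client's thread is executing the server task.
    let gc_slice = thread::spawn(|| println!("gc slice runs on an idle CPU"));

    // The synchronous RPC: optimistically switch to the server, block for
    // the reply.
    req_tx.send(1).unwrap();
    println!("rpc reply: {}", resp_rx.recv().unwrap());

    gc_slice.join().unwrap();
    drop(req_tx); // close the channel so the server loop exits
    server.join().unwrap();
}
```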