No, I don’t, but in my experience at Google the kernel scheduler is a big problem for thread-per-request servers pushed to their limits. Async completion-passing servers are more efficient at their limits but more difficult to write, maintain, and debug. Userspace thread scheduling expands the performance envelope of thread-per-request architecture.
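To make the trade-off concrete, here is a rough Python sketch of the same handler written both ways. Everything in it is made up for illustration: `lookup_user`, `lookup_orders`, and the toy single-threaded event loop are hypothetical stand-ins, not how any real server (Google's or otherwise) is built.

```python
import threading
from collections import deque

# Hypothetical backend calls; in a real server these would block on I/O.
def lookup_user(user_id):
    return {"id": user_id, "name": f"user{user_id}"}

def lookup_orders(user):
    return [f"order-{user['id']}-{n}" for n in range(2)]

# Thread-per-request: the handler reads top to bottom and simply blocks.
def handle_blocking(user_id):
    user = lookup_user(user_id)
    orders = lookup_orders(user)
    return f"{user['name']}: {len(orders)} orders"

def serve_thread_per_request(user_ids):
    results = {}

    def worker(uid):
        results[uid] = handle_blocking(uid)

    # One OS thread per request; the kernel scheduler decides who runs.
    threads = [threading.Thread(target=worker, args=(uid,)) for uid in user_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Completion-passing: each step hands its result to a callback.
ready = deque()  # pending completions for the toy event loop

def lookup_user_cps(user_id, on_done):
    ready.append(lambda: on_done(lookup_user(user_id)))

def lookup_orders_cps(user, on_done):
    ready.append(lambda: on_done(lookup_orders(user)))

def handle_cps(user_id, respond):
    # The same logic as handle_blocking, split across nested callbacks.
    def got_user(user):
        def got_orders(orders):
            respond(f"{user['name']}: {len(orders)} orders")
        lookup_orders_cps(user, got_orders)
    lookup_user_cps(user_id, got_user)

def serve_event_loop(user_ids):
    results = {}
    for uid in user_ids:
        handle_cps(uid, lambda text, uid=uid: results.__setitem__(uid, text))
    # A single thread drains completions; no kernel thread per request.
    while ready:
        ready.popleft()()
    return results
```

Both `serve_thread_per_request([1, 2])` and `serve_event_loop([1, 2])` produce the same responses. The difference is that the second version has no call stack tying the steps of one request together, which is roughly where the writing and debugging pain comes from.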
This fits my expectation. I still tell most folks that thread-per-request is a sensible place to start. The difficulty of writing in continuation style can cause most efforts to stall out.
Of course, that last assertion needs data. And I could be wrong. :(