I do not believe the networking issue is orthogonal. I believe you will find tha...

I do not believe the networking issue is orthogonal. I believe you will find that any communication between two OS threads is significantly slower than switching one thread between an active and a parked goroutine. Futexes are slow and spinlocks burn CPU. (This is exactly the case the C++ model I mentioned ran into, and why gccgo got an N:M model.)

But as I said I'm happy to be convinced. The easiest way to demonstrate it would be to move gccgo linux/amd64 back to 1:1 without hurting the channel benchmarks. You could use that as an argument that fanning out an epoll loop among threads can be made fast.