The article claims that, aside from memory consumption, the main advantage of the Go approach is that the scheduler is integrated with the channels concept and thus knows which goroutines are blocked and don't need to be scheduled until some data shows up on their channel.
But don't OS threads work like this as well, by being integrated with file descriptors among other things? If a thread is blocked on a read, the OS knows this and won't keep scheduling that thread - right?
If so, the article's argument about time wasted on scheduling doesn't make sense.
You're basically asking if generalized preemptive multitasking that has to solve every problem equally well is really that much slower than specialized cooperative multitasking that is tailored to a specific language?
Of fucking course user-level scheduling is going to be superior. Just think about it. Languages like Erlang and Pony dispatch to the scheduler after finishing the execution of every single function. Do you really think that switching to a new thread on every function call is going to be faster than not doing that? Consider that the way you're supposed to do iteration is via recursion, which means you have a function call on every iteration and therefore invoke the scheduler on almost every iteration. The user-level scheduler can instantly switch to the next pending actor/goroutine/whatever as if it were just a regular function call. Meanwhile, your regular threads have to switch to the kernel first, then load the scheduling tree from main memory because the caches are filled with user-level data, then make a complex scheduling decision, and finally switch back and reload whatever data was just flushed out of the cache.
So yes, the article's argument about time wasted on scheduling makes a whole lot of sense, because not only is the scheduling of preemptive threads itself more expensive, it also incurs extra costs through context switches and emptying the cache.
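If you want a feel for how cheap a user-level switch actually is, here's a rough, unscientific Go sketch (the iteration count is arbitrary) that times a round trip between two goroutines over unbuffered channels; each handoff parks one goroutine and resumes the other entirely in user space:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Two goroutines hand control back and forth over unbuffered channels.
	// Each send parks the sender and the Go scheduler resumes the receiver,
	// all in user space -- no kernel entry, no OS context switch.
	const rounds = 1_000_000
	ping := make(chan struct{})
	pong := make(chan struct{})

	go func() {
		for range ping {
			pong <- struct{}{}
		}
	}()

	start := time.Now()
	for i := 0; i < rounds; i++ {
		ping <- struct{}{} // park here, wake the other goroutine
		<-pong             // park again until it answers
	}
	elapsed := time.Since(start)
	close(ping)

	fmt.Printf("%.0f ns per round trip (two goroutine switches each)\n",
		float64(elapsed.Nanoseconds())/float64(rounds))
}
```

On typical hardware this prints something in the low hundreds of nanoseconds; a comparable ping-pong between two OS threads over a pipe pays for a kernel entry and a context switch on every hop.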
> So yes, the article's argument about time wasted on scheduling makes a whole lot of sense because ... context switches ...
Right. I understand that doing multitasking closer to the application code can be more efficient because it avoids involving the kernel and paying for context switches.
What I was wondering about was this specific argument that the article was making:
> Suppose for a minute that in your new system, switching between new threads takes only 100 nanoseconds. Even if all you did was context switch, you could only run about a million threads if you wanted to schedule each thread ten times per second. More importantly, you’d be maxing out your CPU to do so. Supporting truly massive concurrency requires another optimization: Only schedule a thread when you know it can do useful work! If you’re running that many threads, only a handful can be doing useful work anyway. Go facilitates this by integrating channels and the scheduler. If a goroutine is waiting on an empty channel, the scheduler can see that and it won’t run the Goroutine.
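To make the quoted claim concrete, here's a minimal Go sketch (the goroutine count is arbitrary) of what that looks like on the Go side: every goroutine parked on an empty channel is simply taken off the run queue until someone sends to it.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	// Start a large number of goroutines that all block on empty channels.
	// The Go scheduler parks each one; they occupy memory but are never
	// considered for execution (and burn no CPU) until a value is sent.
	const n = 100_000
	chans := make([]chan int, n)
	for i := range chans {
		chans[i] = make(chan int)
		go func(c chan int) {
			<-c // parked here, off the scheduler's run queue
		}(chans[i])
	}

	time.Sleep(time.Second) // everything is asleep; CPU usage is ~0%
	fmt.Println("parked goroutines:", runtime.NumGoroutine()-1)

	// Sending makes each goroutine runnable again, one by one.
	for _, c := range chans {
		c <- 1
	}
}
```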
Doesn't the OS also perform this optimization, i.e., only scheduling threads that can do useful work? My understanding was that if you have thousands of threads that are all sleeping or blocked on a read, the OS will not schedule them, contrary to what the article is saying above. Am I wrong about this?
This is exactly what I came to the comments to point out. You're 100% right. If the operating system's scheduler is any good at all, it won't wake up a thread that is blocked on something like a read or select system call. NIO-based servers on the JVM will (in their idle state) have a bunch of operating system threads all blocked on a select system call, waiting for a connection on the TCP socket. They won't be woken up until there's something to do.
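The same behavior is easy to observe outside the JVM. Here's a small Go sketch for a Unix-ish system (it uses a raw blocking read(2) via the syscall package so the Go runtime's netpoller doesn't hide the effect): the OS thread carrying the reader sleeps inside the kernel and isn't put back on the run queue until the write arrives.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"time"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}
	fd := int(r.Fd()) // Fd() switches the pipe back to blocking mode

	go func() {
		buf := make([]byte, 1)
		// Plain blocking read(2): the OS thread running this goroutine
		// sleeps in the kernel and consumes no CPU while it waits.
		n, err := syscall.Read(fd, buf)
		fmt.Println("reader woke up:", n, err, buf[:n])
	}()

	time.Sleep(time.Second) // the reader's thread is blocked in the kernel
	w.Write([]byte{'!'})    // the kernel now marks that thread runnable
	time.Sleep(100 * time.Millisecond)
}
```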