There's Task.Yield(), which yields immediately. By default, awaiting a Task that has already completed continues synchronously on the same thread; you only get knocked onto the thread pool when you hit an IO completion or something else that genuinely has to wait. This means that chaining lots of awaits together is very efficient, at least until you hit something that forces you to actually sleep.
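To make that concrete, here's a rough sketch (the class and method names are made up for illustration). It chains a lot of awaits that all complete synchronously and checks that the thread never changes, then does a single Task.Yield() and reports where the continuation landed. It assumes a plain console app with no SynchronizationContext; in a UI app the Yield continuation would be posted back to the UI context instead.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AwaitDemo
{
    // Hypothetical helper: the awaited task is already complete,
    // so this method runs start to finish without suspending.
    private static async Task<int> AddOneAsync(int x)
    {
        await Task.CompletedTask;
        return x + 1;
    }

    // Chains `count` awaits and returns true only if every continuation
    // ran on the thread we started on.
    public static async Task<bool> ChainStaysOnThreadAsync(int count)
    {
        int startThread = Environment.CurrentManagedThreadId;
        int value = 0;
        for (int i = 0; i < count; i++)
        {
            value = await AddOneAsync(value);
            if (Environment.CurrentManagedThreadId != startThread)
                return false;
        }
        return value == count;
    }

    // Forces a real yield, then reports whether the continuation
    // resumed on a thread-pool thread.
    public static async Task<bool> ResumesOnPoolAfterYieldAsync()
    {
        await Task.Yield();
        return Thread.CurrentThread.IsThreadPoolThread;
    }
}
```

Calling `AwaitDemo.ChainStaysOnThreadAsync(10000)` never leaves the calling thread, because no await in the chain ever has to suspend. The Yield version is the one that actually reschedules.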
In practice I tend to use a lot of hand-rolled TaskCompletionSource plumbing, explicit threads and Interlocked operations where I need more control over how and where continuations run.
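As an example of the kind of thing I mean, here's a minimal sketch of an awaitable countdown built from a TaskCompletionSource and an Interlocked counter. AsyncCountdown is a made-up name, not a framework type. Passing RunContinuationsAsynchronously means whoever calls Signal() last doesn't end up running the waiters' continuations inline on its own thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical type: completes a Task once Signal() has been called `count` times.
public sealed class AsyncCountdown
{
    private readonly TaskCompletionSource<bool> _done =
        new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

    private int _remaining;

    public AsyncCountdown(int count)
    {
        if (count <= 0) throw new ArgumentOutOfRangeException(nameof(count));
        _remaining = count;
    }

    // Await this to wait for the countdown without blocking a thread.
    public Task Completion => _done.Task;

    // Safe to call from any thread; the decrement is lock-free.
    public void Signal()
    {
        int left = Interlocked.Decrement(ref _remaining);
        if (left == 0)
            _done.TrySetResult(true);
        else if (left < 0)
            throw new InvalidOperationException("Signaled more times than the initial count.");
    }
}
```

Callers do `await countdown.Completion;` and the workers each call `countdown.Signal()` when they finish.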
There's also a downside to explicit synchronization which you don't mention - if you design your threading for one load pattern, and your actual load is a different pattern, it crushes your application and it's difficult to refactor.
For instance, if you expect few users and many requests you might have a thread per user with a work queue for their requests. If you instead get many users with few requests each, you end up with thousands of threads, and switching between those costs real context switches, unlike Task yields.
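For comparison, here's roughly what the per-user queue looks like with Tasks instead of threads. This is a sketch, assuming System.Threading.Channels is available (.NET Core 3.0 or later, or the NuGet package); UserWorker is a made-up name. Each user gets an async loop rather than a dedicated thread, so an idle user holds a small heap object instead of a parked thread and its stack.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical type: runs one user's requests in order, one at a time.
public sealed class UserWorker
{
    private readonly Channel<Func<Task>> _queue =
        Channel.CreateUnbounded<Func<Task>>();

    public UserWorker()
    {
        // Starts the loop; it suspends at the first empty read.
        Completion = RunAsync();
    }

    // Completes once Close() has been called and the queue has drained.
    public Task Completion { get; }

    // Returns false if the worker has already been closed.
    public bool Enqueue(Func<Task> request) => _queue.Writer.TryWrite(request);

    public void Close() => _queue.Writer.Complete();

    private async Task RunAsync()
    {
        ChannelReader<Func<Task>> reader = _queue.Reader;
        // WaitToReadAsync returns false only after Close() and an empty queue.
        while (await reader.WaitToReadAsync())
        {
            while (reader.TryRead(out Func<Task> request))
            {
                await request();
            }
        }
    }
}
```

With this shape, going from ten users to ten thousand means ten thousand of these objects, all sharing the thread pool, rather than ten thousand threads.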
I've heard that Midori was 50% faster than Windows, and it was nearly entirely written in something like C# with something like Tasks. The runtime was extremely different (no virtual memory, no threads), but if that figure is right it suggests the model can outperform traditional OS threading.