Sure. First you need to distinguish between buffered and unbuffered channels.
Unbuffered channels basically operate like cooperative async/await but without the explicitness. In cooperative multitasking, putting something on an unbuffered channel is essentially a yield().
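To make that concrete, here's a minimal Go sketch (the names are mine, purely illustrative): the send on an unbuffered channel blocks until a receiver is ready, the same control handoff a yield() gives you in cooperative multitasking.

```go
package main

import "fmt"

func main() {
	ch := make(chan int) // unbuffered: every send rendezvous with a receive

	go func() {
		for i := 0; i < 3; i++ {
			ch <- i // blocks here until main is ready to receive, like a yield()
		}
		close(ch)
	}()

	for v := range ch {
		fmt.Println("received", v)
	}
}
```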
An awful lot of day-to-day programming is servicing requests. That could be HTTP, an RPC (eg gRPC, Thrift) or something else. For this kind of model, IMHO you almost never want to be dealing with thread primitives in application code. It's a recipe for disaster; it's so easy to make mistakes. Plus, you often need to make expensive calls of your own (eg reading from or writing to a data store of some kind), so there's not really a performance benefit.
That's what makes cooperative async/await so good for application code. The system should provide async-compatible APIs for doing network requests (etc). You never have to worry about out-of-order processing, mutexes, thread pool starvation or a million other issues.
Which brings me to the more complicated case of buffered channels. IME buffered channels are almost always a premature optimization, and one that often hides concurrency issues: if the buffered channel fills up, you may hit a deadlock that never manifested while the buffer had room. That can be hard to test for or find until it happens in production.
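A sketch of that failure mode (function name and capacity are made up for illustration): the buffer masks a missing receiver until it fills, so the code passes testing at small n and deadlocks at larger n.

```go
package main

import "fmt"

// process sends all results before draining any: a bug the buffer hides.
func process(n int) {
	results := make(chan int, 8) // looks fine as long as n <= 8

	for i := 0; i < n; i++ {
		results <- i * i // nobody is receiving yet; buffer absorbs this
	}

	for i := 0; i < n; i++ {
		fmt.Println(<-results)
	}
}

func main() {
	process(8) // passes in testing: buffer never overflows
	process(9) // 9th send blocks with no receiver; the Go runtime reports a deadlock
}
```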
But let's revisit why you're optimizing this with a buffered channel. It's rare that you're CPU-bound, and if the channel consumer talks to the network, any perceived benefit of the concurrency is swamped by the network latency anyway.
So async/await doesn't let you add buffering (and with it bugs, for little benefit) and otherwise behaves like unbuffered channels. That's why I think it's a superior programming model for most applications.
Buffers are there to deal with flow variances. What you're describing as the "ideal system" is a clockwork: your async/awaits are meshed gears. For this approach to be "ideal", it needs to handle the full dynamic range of the load on the system uniformly, which means every part of the clockwork requires the same performance envelope. Otherwise a little wheel spins so fast that it causes metal fatigue, or a flow hits the performance ceiling of an intermediary component, and the system either fails or is limited to the slowest component's cyclical rate. Because of the clockwork coupling, these 'speed bumps' are felt throughout the flow. That is why we put buffers in between two active components: now we have a greater dynamic-range window of operation without speed bumps.
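To illustrate (a sketch with made-up burst sizes and delays, not anyone's production code): a bursty producer and a steady consumer decoupled by a shallow buffer, so the producer isn't paced by the consumer's clock within a burst.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	work := make(chan int, 4) // shallow buffer absorbs bursts of up to 4

	go func() {
		for burst := 0; burst < 3; burst++ {
			for i := 0; i < 4; i++ {
				work <- burst*4 + i // whole burst fits in the buffer without blocking
			}
			time.Sleep(100 * time.Millisecond) // producer idles between bursts
		}
		close(work)
	}()

	for item := range work {
		time.Sleep(20 * time.Millisecond) // steady consumer drains between bursts
		fmt.Println("processed", item)
	}
}
```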
It shouldn't be too difficult to address testing of buffered systems at implementation time, possibly via pragma/compile-time capabilities that inject a 'delay' on the sink side to trivially create "full buffer" conditions and test for them.
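As a sketch of what that could look like in Go today, without any compiler support (sinkDelay here is an assumed stand-in for the proposed pragma): stall the sink so the buffer fills, then check via a timeout that the producer still completes under backpressure instead of silently blocking forever.

```go
package main

import (
	"fmt"
	"time"
)

func producer(out chan<- int, done chan<- struct{}) {
	for i := 0; i < 10; i++ {
		out <- i // will block once the sink falls behind and the buffer fills
	}
	close(out)
	done <- struct{}{}
}

func main() {
	const sinkDelay = 50 * time.Millisecond // injected stall; stands in for the proposed pragma

	ch := make(chan int, 2)
	done := make(chan struct{})
	go producer(ch, done)

	go func() {
		for v := range ch {
			time.Sleep(sinkDelay) // slow sink forces "buffer full" conditions
			fmt.Println("sank", v)
		}
	}()

	select {
	case <-done:
		fmt.Println("producer completed under backpressure")
	case <-time.After(2 * time.Second):
		fmt.Println("producer stuck: possible deadlock under full buffer")
	}
}
```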
There are no golden hammers because the problem domain is not as simple as a nail. Tradeoffs and considerations. I don't think I will ever ditch either (shallow, preferred) buffers or channels. They have their use.
I agree with many of your points, including coroutines being a good abstraction.
The reality, though, is that you are directly fighting or reimplementing the OS scheduler.
I haven’t found an abstraction that does exactly what I want, but unfortunately any sort of structured concurrency tends to end up with coloured functions.
Something like C++ stdexec seems interesting but there are still elements of function colouring in there if you need to deal with async. The advantage is that you can compose coroutines and synchronous code.
For me, I want a solution where I don’t need to care whether a function is running on the async event loop, a separate thread, a coprocessor, or even a different computer, and the actor/CSP model tends to capture that best. Coroutines are an implementation detail and shouldn’t be exposed in an API, but that is a strong opinion.
As you probably know, Rust ended up with async/await. This video goes deep into that and the alternatives, and it changed my opinions a bit: https://www.youtube.com/watch?v=lJ3NC-R3gSI
Golang differs from Rust by having a runtime underneath. If you're already paying for that, it's probably better to do green threading than async/await, which is what Go did. I still find Go's syntax for this more bothersome and error-prone, as you said, but there are other solutions to that.
I can see the appeal of the conceptual simplicity and of not requiring any runtime, but it has some hard tradeoffs, in particular the ones around colored functions and how that makes concurrency feel like it was tacked onto the languages that use it. Being cooperative adds a performance cost as well, which I'm not sure I'd be on board with.