If you constantly switch between two tasks from the bottom of their call stack (...

If you constantly switch between two tasks from the bottom of their call stack (as for stackless coroutines) and your stack switching code is inlined, then you can mostly avoid the mispaired call/ret penalty.

Also, if you control the compiler, an option is to compile all call/rets in and out of "io" code in terms of explicit jumps. A ret implemented as pop+indirect jump will be less less predictable than a paired ret, but has more chances to be predicted than an unpaired one.

My hope is that, if stackful coroutines become more mainstreams, CPU microarchitectures will start using a meta-predictor to chose between the return stack predictor and the indirect predictor.