If it costs as much as a context switch, you might as well just do context switches.
These hosted-language concurrency scaffoldings, whether that's asyncio, goroutines, TPL, WebFlux, etc., exist specifically so you don't have to do a full context switch. If they cost as much as a context switch, they have failed, regardless of what else is taking time in the system.
If you're not any better, just replace your whole hosted concurrency system with a statement that triggers sched_yield.
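If you want to check this on your own machine, here is a rough sketch in Python that times one asyncio task handoff against one `os.sched_yield()` call. The function names and the iteration count are mine, purely for illustration. Two caveats: `os.sched_yield` only exists on some Unix platforms, and when no other thread is runnable it mostly measures the syscall itself, so treat it as a lower bound on a real context switch rather than the cost of one.

```python
import asyncio
import os
import time

N = 100_000  # hypothetical iteration count; tune for your machine


async def _yielder(n):
    # asyncio.sleep(0) hands control back to the event loop once,
    # letting the other task run: one in-process task switch.
    for _ in range(n):
        await asyncio.sleep(0)


async def _run_pair(n):
    # Two tasks ping-ponging through the event loop.
    await asyncio.gather(_yielder(n), _yielder(n))


def time_asyncio_switch(n=N):
    """Approximate seconds per asyncio task handoff."""
    start = time.perf_counter()
    asyncio.run(_run_pair(n))
    return (time.perf_counter() - start) / (2 * n)


def time_sched_yield(n=N):
    """Approximate seconds per sched_yield call, or None if unavailable."""
    if not hasattr(os, "sched_yield"):
        return None
    start = time.perf_counter()
    for _ in range(n):
        os.sched_yield()
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    print("asyncio handoff:", time_asyncio_switch())
    print("sched_yield:    ", time_sched_yield())
```

If the first number isn't comfortably below what a real kernel context switch costs on the same box, that's the failure I'm talking about.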