You are right that the statement was overblown, however when I was testing with ...

gpderetta · 2025-10-27T12:37:05 1761568625

In my test of a similar setup in C++ (IIRC about 10 years ago!), I was able to do a context switch every other cycle. The bottleneck was literally the cycles per taken jump of the microarchitecture I was testing again. As in your case it was a trivial test with two coroutines doing nothing except context switching, so the compiler had no need to save any registers at all and I carefully defined the ABI to be able to keep stack and instruction pointers in registers even across switches.