Requiring heap allocation for coroutines was extremely contentious, to say the least. But because C++ does not have a borrow checker, all the other allocation-free designs proved to be very error-prone.
C++ coroutines do support allocators [1], so it is not a huge issue, but it does complicate the design further.
[1] The compiler is also allowed to elide the allocation, but a) it is not clear how this differs from the general permission to elide allocations that compilers already had, and b) it is a fragile optimization.
What do you mean by nested call? As a first approximation, you need one heap allocation per coroutine instance, for its activation frame. That frame is then reused every time the instance is suspended and resumed. If you instantiate a coroutine from another coroutine, then yes, you need to heap-allocate again, unless the compiler can somehow merge the activation frames or you have a dedicated allocator.
Is that a given, though? Rust's generators are decent prior art: they compile to a state machine that would only require heap allocation if its size became unbounded (for example, a recursive generator). Otherwise the generator can be stack-allocated in its entirety. This turns out to be sufficient for a large number of programs, with a simple workaround for the ones where it isn't: box the generator, making the allocation explicit.
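For comparison, this is roughly the shape of a state-machine lowering, written by hand in C++ rather than generated by a compiler, so it only approximates what rustc actually emits (`SquaresGen` and `sum_squares` are hypothetical names). Variables that must survive a yield become fields, the resume point becomes an enum, and the whole object has a fixed `sizeof`, so it can live on the stack like any other value.

```cpp
#include <cassert>
#include <optional>

// Hand-written equivalent of a generator such as (pseudocode):
//   for i in 0..limit { yield i * i; }
class SquaresGen {
public:
    explicit SquaresGen(int limit) : limit_(limit) {}

    // Resumes the state machine; returns the next value or nullopt when finished.
    std::optional<int> next() {
        switch (state_) {
        case State::Start:
            i_ = 0;
            state_ = State::Looping;
            [[fallthrough]];
        case State::Looping:
            if (i_ < limit_) {
                int value = i_ * i_;
                ++i_;          // saved "local" that survives across yields
                return value;  // corresponds to a yield point
            }
            state_ = State::Done;
            [[fallthrough]];
        case State::Done:
            return std::nullopt;
        }
        return std::nullopt;
    }

private:
    enum class State { Start, Looping, Done };
    State state_ = State::Start;
    int limit_;
    int i_ = 0;  // the loop variable, stored in the object instead of a heap frame
};

// Sums every yielded value; the generator lives entirely in this stack frame.
int sum_squares(int limit) {
    SquaresGen gen{limit};
    int total = 0;
    while (auto v = gen.next()) total += *v;
    return total;
}
```

A recursive generator breaks this scheme because the state machine would have to contain an instance of its own type, which has no finite size; boxing puts that inner instance behind a pointer, and that is where the explicit allocation comes from.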
Oh, yes, as far as I understand, Rust coroutines do not normally allocate. Unfortunately, this is not the case in C++. This was extremely contentious, to say the least, but all the alternative designs were either very unsafe or were presented very late, so the committee preferred to go with something that works now instead of something perfect at some indeterminate point in the future.
There are already proposals to improve on the design, but we will have to see if they work out.