I'm missing some of the technical details here, but from a quick glance at the article it seems like Rust's futures are lazy, i.e. a stack would only be allocated when the future is actually awaited. But in order to execute the relevant code, isn't a call stack per not-yet-finished future still needed, or am I missing something?
Afaik Rust's futures compile to a state machine, which is basically just a struct that contains the current state flag and the variables that need to be kept across yield points. An executor owns a list of such structs/futures and executes them however it sees fit (single-threaded, multi-threaded, ...). So there is no stack per future. The number of stacks depends on how many threads the executor runs in parallel.
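For illustration, here's a rough hand-written sketch (in Rust) of the kind of state machine the compiler generates for a trivial async fn; the names and layout are made up and the real lowering has more machinery, but the shape is the same: an enum holding the current state plus whatever has to survive across yield points.

    use std::future::Future;
    use std::pin::Pin;
    use std::task::{Context, Poll};

    // Roughly what `async fn add_later(a: u32, b: u32) -> u32 { a + b }`
    // lowers to (illustrative only, not actual compiler output).
    enum AddLater {
        // The variables that must be kept until the future is polled.
        Start { a: u32, b: u32 },
        Done,
    }

    impl Future for AddLater {
        type Output = u32;

        fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
            // This trivial future never waits on anything, so it completes on
            // the first poll; a real one would return Poll::Pending and move
            // to the next state at each yield point.
            let this = self.get_mut(); // ok: this enum is Unpin
            match *this {
                AddLater::Start { a, b } => {
                    let sum = a + b;
                    *this = AddLater::Done;
                    Poll::Ready(sum)
                }
                AddLater::Done => panic!("polled after completion"),
            }
        }
    }

An executor then just owns a collection of such values and calls poll on each one when it gets woken; none of them carries its own stack.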
Like a stack frame, but allocated once at a fixed size, instead of being LIFO-allocated in a reserved region (which itself must be allocated upfront, before you know how big you're going to end up).
The difference being: if your tasks need 192 B of memory each and you spawn 10 of them, you've consumed a little less than 2 kB. With green threads, you pay 10 times the starting size of a stack (generally a few kB each). That makes a big difference if you don't have much memory.
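You can actually watch that number: the future's size is just the captured state plus a discriminant, and it's known at compile time. A small sketch (function names made up for the example):

    // A stand-in await point so that `buf` has to live across a yield.
    async fn yield_point() {}

    fn main() {
        // An async block that keeps a 192-byte buffer alive across an await.
        let task = async {
            let buf = [0u8; 192];
            yield_point().await;
            buf.len()
        };
        // Prints roughly 192 plus a few bytes of bookkeeping; no per-task
        // stack is reserved anywhere.
        println!("bytes per task: {}", std::mem::size_of_val(&task));
    }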
So that's actually green threads in my book (in a good implementation I expect to be able to configure the stack size), with the nice addition that the language exposes the needed stack size.
It's a stackless coroutine. AFAIK, the term “green thread” is usually reserved for stackful coroutines, but I guess it could also be used to talk about any kind of coroutine.
It's more efficient (potentially substantially more so). In a typical threaded system you have some leaf functions which take up a substantial amount of stack space but don't block, so you don't need to save their state between context switches. In most green-threaded applications you still need to allocate that space (times the number of threads). The main advantage of this kind of green thread is that you can separate out the allocation you need to keep between context switches (which is stored per task) from the memory you only need while actually executing (which is shared between all tasks). For certain workloads this can be a substantial saving. In principle you can do this in C or C++ by stack switching at the appropriate points, but it's a pain to implement, hard to use correctly (the language doesn't help you at all), and I've not seen any actual implementations of this.
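Rust's async gives you that split for free: locals that don't live across an .await never become part of the task's saved state. A hedged sketch (names invented for the example):

    // A non-async leaf function that uses plenty of stack but never suspends:
    // its 4 KiB scratch buffer lives on the OS thread's stack only while the
    // call is running, and is effectively shared by every task on that thread.
    fn checksum(data: &[u8]) -> u64 {
        let mut scratch = [0u8; 4096];
        for (i, &b) in data.iter().enumerate() {
            scratch[i % 4096] ^= b;
        }
        scratch.iter().map(|&b| u64::from(b)).sum()
    }

    async fn task(data: Vec<u8>) -> u64 {
        let sum = checksum(&data);
        // Only `sum` and `data` have to survive this yield point; the 4 KiB
        // scratch buffer inside `checksum` is never stored per task.
        std::future::ready(()).await;
        sum
    }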
Because of the syntactic restrictions on how await works, at most you need to allocate a single function frame, never a full stack, and often it doesn't even need to be allocated separately and can live on the stack of the underlying OS thread.
They can, but the function itself cannot be suspended by something it merely calls (await being a keyword enforces this), so any function that is called can use the original OS thread's stack. Any called function can in turn be an async function, and will return a future[1] that in turn captures that function's frame. So yes, a chain of suspended async functions sort of looks like a stack, but its size is known at compile time [2].
[1] I'm not familiar with Rust semantics here, just making educated guesses.
[2] Not sure how Rust deals with recursion in this case. I assume you get a compilation error because it will fail to compute the size of the returned future, and you'll have to explicitly box it: the "stack" would then look like a linked list of activation records.
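That guess is about right: rustc rejects a directly recursive async fn (the future would have to contain itself, so its size can't be computed) and the usual fix is to box the recursive call, which gives exactly that linked-list-of-activation-records shape. A minimal sketch:

    use std::future::Future;
    use std::pin::Pin;

    // A directly recursive `async fn countdown(n: u32)` won't compile; boxing
    // the recursion turns each level into its own heap allocation, i.e. a
    // linked list of frames instead of one flat state machine.
    fn countdown(n: u32) -> Pin<Box<dyn Future<Output = ()> + Send>> {
        Box::pin(async move {
            if n > 0 {
                countdown(n - 1).await;
            }
        })
    }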
Async functions don't really call anything by themselves: the executor does, and every function called in the context of an async function runs on the executor's stack. You just end up with a fixed number of executors, each running on its own OS thread with a normal stack.
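To make that concrete, here's a deliberately minimal single-threaded "executor" sketch: it polls one future to completion on the current thread, so everything the async code does between yield points runs on this thread's ordinary stack. (A real executor would park until woken instead of spinning, and would juggle many tasks.)

    use std::future::Future;
    use std::pin::pin;
    use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

    // A waker that does nothing: good enough for an executor that just spins.
    fn noop_waker() -> Waker {
        unsafe fn noop(_: *const ()) {}
        unsafe fn clone(_: *const ()) -> RawWaker {
            RawWaker::new(std::ptr::null(), &VTABLE)
        }
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        // Safety: every vtable entry is a no-op, so nothing can be misused.
        unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
    }

    // Poll a single future to completion on the current OS thread.
    fn block_on<F: Future>(fut: F) -> F::Output {
        let mut fut = pin!(fut);
        let waker = noop_waker();
        let mut cx = Context::from_waker(&waker);
        loop {
            match fut.as_mut().poll(&mut cx) {
                Poll::Ready(out) => return out,
                // A real executor would sleep until the waker fires; we spin.
                Poll::Pending => std::thread::yield_now(),
            }
        }
    }

    fn main() {
        let answer = block_on(async { 40 + 2 });
        println!("{answer}");
    }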