I use higher-level APIs built on top of restartable sequences. Here's my understanding (could be wrong):

> I suspect that something like a heap implementation could use this.

Indeed. Let's say you want to have lots and lots and lots of threads, as described in the video schmichael linked. [0] Per-thread malloc pools become less attractive:

* too empty (lots of contention for the global pool), or
* too full (lots of wasted RAM, and probably poor CPU cache utilization as well), or
* lots of sloshing (memory moving back and forth between the per-thread pools and the global pool).
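To make those failure modes concrete, here's a rough C sketch of a per-thread free-list cache that refills from, and flushes to, a mutex-protected global pool. The names (`pool_alloc`, `pool_free`, `BATCH`, `CACHE_MAX`, `BLOCK_SIZE`) are made up for illustration. It only handles one fixed block size, and it's not how any particular real allocator works:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tuning knobs. Small values mean frequent trips to the
 * locked global pool ("too empty"); large values strand memory in
 * idle threads ("too full"). */
#define BATCH 32
#define CACHE_MAX 64
#define BLOCK_SIZE 64 /* every block is this size in this sketch */

struct block { struct block *next; };

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static struct block *global_free = NULL; /* shared, guarded by global_lock */

static _Thread_local struct block *local_free = NULL; /* per-thread cache */
static _Thread_local int local_count = 0;

void *pool_alloc(void) {
    if (local_free == NULL) {
        /* Cache empty: grab up to BATCH blocks from the shared pool.
         * This locked section is where threads contend. */
        pthread_mutex_lock(&global_lock);
        for (int i = 0; i < BATCH && global_free != NULL; i++) {
            struct block *b = global_free;
            global_free = b->next;
            b->next = local_free;
            local_free = b;
            local_count++;
        }
        pthread_mutex_unlock(&global_lock);
        if (local_free == NULL)
            return malloc(BLOCK_SIZE); /* shared pool empty too */
    }
    struct block *b = local_free;
    local_free = b->next;
    local_count--;
    return b;
}

void pool_free(void *p) {
    struct block *b = p;
    b->next = local_free;
    local_free = b;
    if (++local_count > CACHE_MAX) {
        /* Cache too full: push blocks back until half full. A thread that
         * frees what another thread allocated keeps ending up here. */
        pthread_mutex_lock(&global_lock);
        while (local_count > CACHE_MAX / 2) {
            struct block *x = local_free;
            local_free = x->next;
            x->next = global_free;
            global_free = x;
            local_count--;
        }
        pthread_mutex_unlock(&global_lock);
    }
}
```

Small `BATCH`/`CACHE_MAX` values push you toward the first problem and large ones toward the second. A producer thread freeing what a consumer thread allocated hits the third. Multiply `CACHE_MAX * BLOCK_SIZE` by a huge thread count to see where the RAM goes.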

More generally, people sometimes keep per-thread state to avoid lock contention. Some types of state might be reasonable to keep per-thread when the program is written in a thread-per-core / async style, but might not be if it's written in a thread-per-request / sync style: with that many threads, it might use too much RAM. And if you ever have to access _all_ the threads' state (say, if you are doing some counters for a monitoring system: increment just the current thread's state on write, sum them all on read), that read path might get ridiculously expensive. So per-CPU might work better.
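A minimal C11 sketch of that per-thread counter pattern, assuming a hypothetical fixed cap of `MAX_THREADS` and slots that are never recycled when a thread exits (`counter_inc` and `counter_read` are made-up names). The write path touches only the calling thread's cache line; the read path has to walk every slot ever handed out:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_THREADS 1024 /* hypothetical fixed bound on threads */

/* One slot per thread, aligned to its own cache line to avoid false sharing. */
struct slot {
    _Alignas(64) _Atomic uint64_t value;
};

static struct slot slots[MAX_THREADS];
static atomic_int next_slot;                  /* slots handed out so far */
static _Thread_local int my_slot = -1;        /* this thread's slot index */

/* Write path: only the owning thread writes its slot, so a relaxed load
 * plus store is enough; no atomic read-modify-write is needed. */
void counter_inc(void) {
    if (my_slot < 0) {
        my_slot = atomic_fetch_add(&next_slot, 1); /* once per thread */
        assert(my_slot < MAX_THREADS);
    }
    _Atomic uint64_t *v = &slots[my_slot].value;
    atomic_store_explicit(v, atomic_load_explicit(v, memory_order_relaxed) + 1,
                          memory_order_relaxed);
}

/* Read path: cost grows with the number of threads that ever counted. */
uint64_t counter_read(void) {
    int n = atomic_load(&next_slot);
    if (n > MAX_THREADS) n = MAX_THREADS;
    uint64_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += atomic_load_explicit(&slots[i].value, memory_order_relaxed);
    return sum;
}
```

With thousands of request threads, `counter_read` becomes a walk over thousands of cache lines, and the slot array itself is where the extra RAM goes.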

Per-CPU stuff doesn't require restartable sequences. You can just use the CPU number to decide which shard to access, then lock it or use atomics as you would with global state. (You still need the lock or atomics because the thread can be migrated to another CPU right after reading the CPU number.) You get less lock contention and cache-line bouncing. (Alternatively, you might get some of these benefits by picking a shard randomly, if the RNG is cheap enough, or by using a counter.)
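A minimal sketch of the per-CPU version without restartable sequences, assuming Linux/glibc for `sched_getcpu()` and a hypothetical `NSHARDS` at least as large as the CPU count (`percpu_inc` and `percpu_read` are made-up names). Note that the increment is still an atomic read-modify-write:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h> /* sched_getcpu(): glibc/Linux-specific */
#include <stdatomic.h>
#include <stdint.h>

#define NSHARDS 256 /* hypothetical; should be >= number of CPUs */

struct shard {
    _Alignas(64) _Atomic uint64_t value; /* one cache line per shard */
};

static struct shard shards[NSHARDS];

static unsigned pick_shard(void) {
    int cpu = sched_getcpu();             /* returns -1 on failure */
    return cpu < 0 ? 0u : (unsigned)cpu % NSHARDS;
}

void percpu_inc(void) {
    /* The thread may migrate between pick_shard() and this line, so two
     * threads can hit the same shard concurrently: the update must be an
     * atomic RMW. It's usually uncontended, so the cache line rarely bounces. */
    atomic_fetch_add_explicit(&shards[pick_shard()].value, 1,
                              memory_order_relaxed);
}

/* Read path: a fixed NSHARDS loads, regardless of how many threads exist. */
uint64_t percpu_read(void) {
    uint64_t sum = 0;
    for (unsigned i = 0; i < NSHARDS; i++)
        sum += atomic_load_explicit(&shards[i].value, memory_order_relaxed);
    return sum;
}
```

Replacing `sched_getcpu()` with a cheap random number or a thread-local counter gives the random-shard / counter alternative mentioned in the parenthetical, with the same atomics.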

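Restartable sequences let you avoid atomic operations entirely for per-CPU stuff. If the thread is preempted, migrated, or interrupted by a signal before the sequence's final commit instruction, the kernel aborts it and jumps to an abort handler (which typically retries), so a plain load and store on the current CPU's shard is safe.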

[0] https://www.youtube.com/watch?v=KXuZi9aeGTw