
They do happen with threads indeed, but they're a lot easier to manage there and the code is straightforward. You can essentially tie your resources to your threads, and your thread pool's size becomes your resources' allocation limit. In fact, one common strategy is to use thread-local storage to cache certain resources like RNGs or regex state machines/parsers, and you essentially have a guarantee that you'll be fine (ofc as long as the code doesn't launch threads left and right).
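To make the thread-local caching idea concrete, here is a minimal sketch in Python. The names `POOL_SIZE`, `thread_rng`, `handle` and `run` are made up for illustration; the point is only that the number of cached RNGs can never exceed the number of pool threads, however many tasks go through.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4  # hypothetical pool size; it also caps the number of cached RNGs

_local = threading.local()
_created = []                      # one entry per RNG actually constructed
_created_lock = threading.Lock()


def thread_rng() -> random.Random:
    """Return the calling thread's cached RNG, creating it on first use."""
    rng = getattr(_local, "rng", None)
    if rng is None:
        rng = random.Random()
        _local.rng = rng
        with _created_lock:
            _created.append(threading.get_ident())
    return rng


def handle(task_id: int) -> float:
    # Any number of tasks may run, but each one reuses its worker's RNG.
    return thread_rng().random()


def run(n_tasks: int = 1000) -> int:
    """Push n_tasks through the pool and return how many RNGs were created."""
    _created.clear()
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        list(pool.map(handle, range(n_tasks)))
    return len(_created)
```

Calling `run(1000)` should construct at most `POOL_SIZE` RNGs, since each worker thread builds its own once and then reuses it.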

Again, I'm not saying that you can't manage this with coroutines, but it is harder, and I've seen this problem pop up in very different contexts/languages (Kotlin/Java, Rust, Haskell) whenever coroutines are used. Large churn or an unexpected bottleneck (e.g. a saturated or disrupted network) causes a blowup of suspended coroutines, which in turn leak resources unless proper backpressure is applied. When this happens, coroutines tend to be harder to debug as well.
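For what "proper backpressure" can look like, here is a rough asyncio sketch, not a recipe: the producer has to take a slot from a semaphore before it may spawn another coroutine, so the number of live coroutines stays bounded even if the work is slow. `run_bounded`, `job` and the 1 ms sleep standing in for a network call are all assumptions for illustration.

```python
import asyncio


async def run_bounded(n_jobs: int, limit: int) -> int:
    """Process n_jobs with at most `limit` in flight; return the peak observed."""
    slots = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def job() -> None:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        try:
            await asyncio.sleep(0.001)  # stand-in for a slow network call
        finally:
            in_flight -= 1
            slots.release()  # free the slot so the producer can continue

    tasks = []
    for _ in range(n_jobs):
        # Backpressure: the producer itself suspends here when all slots are taken,
        # instead of piling up more suspended coroutines.
        await slots.acquire()
        tasks.append(asyncio.create_task(job()))
    await asyncio.gather(*tasks)
    return peak
```

With `asyncio.run(run_bounded(50, 5))` the peak never goes above 5. Without the `slots.acquire()` in the loop, all 50 jobs would be created and suspended at once.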

Plain threads work for disk IO, network IO, memory usage etc., and the resource allocations also compose well. Compare the classic flipped-order resource allocation with coroutines: coroutine 1 allocates bounded resource A, then tries to allocate B and yields. Coroutine 2 allocates B, then tries to allocate A, and bam, deadlock once A's resource pool is depleted. Essentially, suspended coroutines have hidden dependency edges on one another through resources. With threads, the thread pool size aligns precisely with the resource pool size, under high churn it will be the resource's actual slowness that blocks the pool temporarily, but there's no way to deadlock on resource allocations.
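A minimal sketch of the flipped-order case with coroutines, using Python's asyncio. The pools of size one, the `worker`/`finishes` names and the 0.2 s timeout used to detect the hang are all assumptions for illustration.

```python
import asyncio


async def worker(first: asyncio.Semaphore, second: asyncio.Semaphore) -> None:
    async with first:
        await asyncio.sleep(0)      # yield to the scheduler while holding `first`
        async with second:
            await asyncio.sleep(0)  # pretend to use both resources


async def finishes(flipped: bool, timeout: float = 0.2) -> bool:
    """Run two workers; return False if they are still stuck after `timeout`."""
    a = asyncio.Semaphore(1)  # bounded resource A, pool of one
    b = asyncio.Semaphore(1)  # bounded resource B, pool of one
    second_order = (b, a) if flipped else (a, b)
    both = asyncio.gather(worker(a, b), worker(*second_order))
    try:
        await asyncio.wait_for(both, timeout)
        return True
    except asyncio.TimeoutError:
        # Both coroutines are suspended, each waiting on what the other holds.
        return False
```

`asyncio.run(finishes(flipped=True))` times out and returns `False`, while the same two workers acquiring in a consistent order (`flipped=False`) complete normally. Nothing in the code makes the dependency between the two suspended coroutines visible.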




> With threads, the thread pool size aligns precisely with the resource pool size, under high churn it will be the resource's actual slowness that blocks the pool temporarily, but there's no way to deadlock on resource allocations.

I'm not sure I see the argument that that's definitely the case. I mean I agree it could be the case.

At a previous job we did lots of "enterprise Java" type stuff. Spring Boot (tends to hide what's going on a bit), Tomcat (fixed number of threads to process incoming HTTP requests, I think 200 by default), database connection pool (I think 10 by default in the connection pool that Spring chooses by default).

The average programmer on this team did not know about any of these pool sizes. They just wrote code and, for the most part, it worked.

But then there was the situation where, for example, one test hung on my machine, but on nobody else's. Turns out (if I remember correctly) that stream.parallel() was used, which processes things on a default thread pool, which is sized from the number of CPU cores by default. My machine had 20 cores, other people had 8 or 10 cores, so they were processing fewer items at once. On my machine this then requested more connections simultaneously than were available in the database connection pool and, I think because some other resource was also locked (again, not really obvious from reading the code), it deadlocked on my machine only. As you can imagine, it took me a whole afternoon to diagnose this!
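I can't reproduce the original code, so the following is only a guess at the general shape of the problem, sketched in Python rather than Java: each task holds one pooled connection while asking for a second one, and whether that hangs depends purely on how many workers run at once. The semaphore standing in for the connection pool, the nested acquisition, and the timeouts are all assumptions.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def count_stalled(workers: int, pool_size: int, timeout: float = 0.5) -> int:
    """Run one task per worker; return how many gave up waiting for a connection."""
    connections = threading.BoundedSemaphore(pool_size)  # stand-in for a DB pool

    def task(_: int) -> bool:
        # Outer "connection", e.g. one held by an enclosing transaction.
        if not connections.acquire(timeout=timeout):
            return False
        try:
            time.sleep(0.1)  # give the other workers time to grab theirs
            # Inner "connection", requested while still holding the outer one.
            if not connections.acquire(timeout=timeout):
                return False  # a pool without a timeout would hang here
            connections.release()
            return True
        finally:
            connections.release()

    with ThreadPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(task, range(workers)))
    return results.count(False)
```

With 2 workers and a pool of 3 every task finishes; with 4 workers and the same pool, tasks start timing out. Same code, different machine-dependent worker count, which matches the "hangs only on my machine" symptom.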

So what I'm saying is, I agree with everything you've said, but I think these problems can happen just as easily with threads, at least the way they're commonly used in e.g. Java Spring Boot.


Hah, sounds like a fun debugging session!

Those are exactly the kinds of problems I've encountered with "too many processors for too few resources". At the workplace where we used Java we used a library called Quasar which implements green threading resembling Rust async (it rewrites the bytecode into a state machine). I remember encountering a very similar deadlock, except the issue was caused by certain green threads "handing over" database connections to other green threads, and in the process yielding to the scheduler. Under high churn there was a chance that all connections ended up in the suspended set, causing a deadlock when other tasks were trying to allocate. It took a couple of days to track down because attaching a debugger and even printing caused the issue to go away.

Your example is also a fun one, but to me it actually shows exactly why an unbounded/dynamic number of processing units is an issue. Coroutines are the extreme example, where you are almost encouraged to launch as many tasks as you can.



