
First, there's more to it than simply whether the context switching and stack reserves are going to kill you when you have 700 blocking threads (and note that 700 is not a crazy high number, especially for async). Leaving correctness aside, multithreading also incurs synchronization overhead; forget the "GIL" stuff, and in the real world you will still wind up serializing your program on one of your own data structures.
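To make that concrete, here's a toy sketch (names and numbers made up, not from any real program) of the kind of self-inflicted serialization I mean: a handful of worker threads that all funnel results through one mutex-protected queue, so the lock, not the core count, sets your ceiling.

    /* Hypothetical illustration: N worker threads that all push results
     * through a single mutex-protected counter standing in for a queue.
     * The lock serializes the program no matter how many cores you have. */
    #include <pthread.h>
    #include <stdio.h>

    #define NWORKERS 8
    #define NITEMS   100000

    static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
    static long queue_depth;                /* stand-in for a real shared queue */

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < NITEMS; i++) {
            /* ... do some independent work here ... */
            pthread_mutex_lock(&q_lock);    /* every thread meets here */
            queue_depth++;                  /* "enqueue" a result */
            pthread_mutex_unlock(&q_lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NWORKERS];
        for (int i = 0; i < NWORKERS; i++)
            pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < NWORKERS; i++)
            pthread_join(t[i], NULL);
        printf("queued %ld items\n", queue_depth);
        return 0;
    }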

Second, the "don't use threads for I/O" notion comes from the fact that no matter how much compute you burn on I/O, the data isn't coming any faster, and in I/O bound programs, it's more efficient to use the I/O scheduler (kqueue and epoll in crazy programs, select in everything else) than the context scheduler.
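For the sake of illustration, a bare-bones version of "let the I/O scheduler drive" might look like the following sketch (plain select(2) here, error handling trimmed; swap the wait call for kqueue/epoll in the crazy programs). One thread blocks once for all descriptors and only touches the ones the kernel says are ready.

    /* Rough sketch: one thread, one select(2) call, many sockets.
     * The kernel reports which descriptors are readable; we never
     * burn a context switch per connection. */
    #include <sys/select.h>
    #include <unistd.h>

    void serve(int *fds, int nconns)
    {
        for (;;) {
            fd_set readable;
            int maxfd = -1;
            FD_ZERO(&readable);
            for (int i = 0; i < nconns; i++) {
                FD_SET(fds[i], &readable);
                if (fds[i] > maxfd)
                    maxfd = fds[i];
            }
            if (select(maxfd + 1, &readable, NULL, NULL, NULL) <= 0)
                continue;
            for (int i = 0; i < nconns; i++) {
                if (FD_ISSET(fds[i], &readable)) {
                    char buf[4096];
                    ssize_t n = read(fds[i], buf, sizeof buf);
                    if (n > 0) {
                        /* hand the bytes to the protocol state machine */
                    }
                }
            }
        }
    }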

Third, you're about 10 years behind the times yourself if you think that the CPU is going to strangle itself checking 700 file descriptors. I don't even think a 1997 async network program is going to lose to a 2008 threaded network program over 700 connections, and 2008 vs. 2008 isn't even a contest.

I am happy to race you. =)

Regardless, I wrote an async MySQL driver because you can't demand-thread per-packet network code, and I wanted to dump stuff to a database. Some of us will thread when threading makes a program cleaner and simpler, and run everything async when threading becomes an obstacle course.




Tom, you're not making any sense here. You stated that "threads weren't cheap". I said this really wasn't true anymore for this problem area (mostly-blocking network I/O) and mentioned an application I'd written that shows exactly this over a load regime very close to what you'd see with a heavily contended web application database.

Now you're talking about stuff like synchronization overhead, which wasn't at issue: synchronous access to a few hundred separate file descriptors (one per thread) obviously doesn't need to synchronize anything. You're stating some stuff of questionable veracity (select/poll certainly do scale poorly once you get past a few hundred descriptors -- just check the kqueue/epoll justification documents for copious benchmarks to that effect).
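To spell out the "one descriptor per thread, nothing shared" case, a minimal sketch (hypothetical, error handling omitted) is just a blocking read loop per connection; note there is no lock anywhere in it.

    /* Sketch of the thread-per-connection case under discussion:
     * each thread owns exactly one socket and blocks in read(2).
     * No shared state, so nothing to synchronize. */
    #include <pthread.h>
    #include <unistd.h>

    static void *connection_thread(void *arg)
    {
        int fd = (int)(long)arg;             /* this thread's private descriptor */
        char buf[4096];
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0) {
            /* process this connection's bytes; touches no shared data */
        }
        close(fd);
        return NULL;
    }

    void spawn_connection(int fd)
    {
        pthread_t t;
        pthread_create(&t, NULL, connection_thread, (void *)(long)fd);
        pthread_detach(t);
    }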

And you're even making up some, er, interesting new terms: what on earth is an "I/O scheduler" as distinct from a "context scheduler"? Usually when you use the former term, you're talking about the block device request scheduler (elevator algorithm, etc...) which (1) isn't involved here as we're talking about balancing network I/O and (2) is invoked in the same way for local I/O regardless of whether you're doing I/O via an async request for 700 blocks or via 700 synchronous read() calls from separate threads.

And you capped it all off with a few ad hominems that I honestly don't think are appropriate on this site, at least in a technical context. Stop flaming.

I'll say it once more: threads are cheap in this regime. The linked article is a hack to get around the problems of database access from monolithic, poorly threaded language interpreters. It's not a "performance enhancement" in any meaningful way. Even a shared-nothing web app architecture à la news.arc is likely to do better than an extra hop through this thing.


If you can be specific about how what I said was uncivil, I will apologize for offending you. I think you're wrong, but I don't think you're crazy or stupid.

You're right that we started talking about something very specific (the memory costs of stacks for 700 threads) and I quickly generalized (to the performance of threading versus async code). That's a fair critique. In my defense, the performance difference between threaded code and async code is very relevant to this article.

Here are my points:

* It is not a "myth", as you say, that async network code scales better than most threaded network code.

* The 700 concurrently served connections in your anecdote probably wouldn't cause select(2) to break a sweat, let alone kqueue/epoll.

* It's horribly unfair to tar async code with select(2)'s performance, because performant applications use kqueue or epoll to replace it. Both resolve the scaling problem you're alluding to.

* It is perhaps weird that I see a similarity between a thread scheduler, which switches CPU contexts on a timer, and a select loop, which switches them nonpreemptively on I/O events. I retain the right to say that event loops are an instance of the "scheduler" problem; if you think that's crazy, read the papers on the MIT Click modular router. In fact, do that anyway; they're great. A sketch of what I mean follows below.
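Here's a toy rendering of that analogy (Linux epoll, with a made-up task/handler structure, not code from any real program): the loop below does what a thread scheduler does, except the "context switch" to a handler happens only when the kernel reports an I/O event, never on a timer.

    /* Toy illustration of the "event loop as cooperative scheduler" analogy.
     * Each fd has a handler; the loop "schedules" a handler run whenever
     * the kernel reports that fd as ready. */
    #include <sys/epoll.h>

    struct task {
        int  fd;
        void (*run)(struct task *self);   /* nonpreemptive: runs to completion */
    };

    void scheduler_loop(int epfd)
    {
        struct epoll_event events[64];
        for (;;) {
            int n = epoll_wait(epfd, events, 64, -1);   /* block until I/O */
            for (int i = 0; i < n; i++) {
                struct task *t = events[i].data.ptr;    /* set via epoll_ctl */
                t->run(t);    /* the "context switch": hand the CPU to this task */
            }
        }
    }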



