
First, there's more to it than simply whether the context switching and stack reserves are going to kill you when you have 700 blocking threads (and note that 700 is not a crazy high number, especially for async). Leaving correctness aside, multithreading also incurs synchronization overhead; forget the "GIL" stuff, and in the real world you will still wind up serializing your program on one of your own data structures.
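To make that concrete, here's a toy sketch (names and numbers made up, not from any real program) of the kind of self-inflicted serialization I mean: a handful of worker threads that all funnel results through one mutex-protected queue, so the lock, not the core count, sets your ceiling.

    /* Hypothetical illustration: N worker threads that all push results
     * through a single mutex-protected counter standing in for a queue.
     * The lock serializes the program no matter how many cores you have. */
    #include <pthread.h>
    #include <stdio.h>

    #define NWORKERS 8
    #define NITEMS   100000

    static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
    static long queue_depth;                /* stand-in for a real shared queue */

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < NITEMS; i++) {
            /* ... do some independent work here ... */
            pthread_mutex_lock(&q_lock);    /* every thread meets here */
            queue_depth++;                  /* "enqueue" a result */
            pthread_mutex_unlock(&q_lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NWORKERS];
        for (int i = 0; i < NWORKERS; i++)
            pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < NWORKERS; i++)
            pthread_join(t[i], NULL);
        printf("queued %ld items\n", queue_depth);
        return 0;
    }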

Second, the "don't use threads for I/O" notion comes from the fact that no matter how much compute you burn on I/O, the data isn't coming any faster, and in I/O bound programs, it's more efficient to use the I/O scheduler (kqueue and epoll in crazy programs, select in everything else) than the context scheduler.
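For the sake of illustration, a bare-bones version of "let the I/O scheduler drive" might look like the following sketch (plain select(2) here, error handling trimmed; swap the wait call for kqueue/epoll in the crazy programs). One thread blocks once for all descriptors and only touches the ones the kernel says are ready.

    /* Rough sketch: one thread, one select(2) call, many sockets.
     * The kernel reports which descriptors are readable; we never
     * burn a context switch per connection. */
    #include <sys/select.h>
    #include <unistd.h>

    void serve(int *fds, int nconns)
    {
        for (;;) {
            fd_set readable;
            int maxfd = -1;
            FD_ZERO(&readable);
            for (int i = 0; i < nconns; i++) {
                FD_SET(fds[i], &readable);
                if (fds[i] > maxfd)
                    maxfd = fds[i];
            }
            if (select(maxfd + 1, &readable, NULL, NULL, NULL) <= 0)
                continue;
            for (int i = 0; i < nconns; i++) {
                if (FD_ISSET(fds[i], &readable)) {
                    char buf[4096];
                    ssize_t n = read(fds[i], buf, sizeof buf);
                    if (n > 0) {
                        /* hand the bytes to the protocol state machine */
                    }
                }
            }
        }
    }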

Third, you're about 10 years behind the times yourself if you think that the CPU is going to strangle itself checking 700 file descriptors. I don't even think a 1997 async network program is going to lose to a 2008 threaded network program over 700 connections, and 2008 vs. 2008 isn't even a contest.

I am happy to race you. =)

Regardless, I wrote an async MySQL driver because you can't demand-thread per-packet network code, and I wanted to dump stuff to a database. Some of us will thread when threading makes a program cleaner and simpler, and run everything async when threading becomes an obstacle course.




Tom, you're not making any sense here. You stated that "threads weren't cheap". I said this really wasn't true anymore for this problem area (mostly-blocking network I/O) and mentioned an application I'd written that shows exactly this over a load regime very close to what you'd see with a heavily contended web application database.

Now you're talking about stuff like synchronization overhead, which wasn't at issue: synchronous access to a few hundred separate file descriptors (one per thread) obviously doesn't need to synchronize anything. You're stating some stuff of questionable veracity (select/poll certainly do scale poorly once you get past a few hundred descriptors -- just check the kqueue/epoll justification documents for copious benchmarks to that effect).
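To spell out the "one descriptor per thread, nothing shared" case, a minimal sketch (hypothetical, error handling omitted) is just a blocking read loop per connection; note there is no lock anywhere in it.

    /* Sketch of the thread-per-connection case under discussion:
     * each thread owns exactly one socket and blocks in read(2).
     * No shared state, so nothing to synchronize. */
    #include <pthread.h>
    #include <unistd.h>

    static void *connection_thread(void *arg)
    {
        int fd = (int)(long)arg;             /* this thread's private descriptor */
        char buf[4096];
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0) {
            /* process this connection's bytes; touches no shared data */
        }
        close(fd);
        return NULL;
    }

    void spawn_connection(int fd)
    {
        pthread_t t;
        pthread_create(&t, NULL, connection_thread, (void *)(long)fd);
        pthread_detach(t);
    }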

And you're even making up some, er, interesting new terms: what on earth is an "I/O scheduler" as distinct from a "context scheduler"? Usually when you use the former term, you're talking about the block device request scheduler (elevator algorithm, etc...) which (1) isn't involved here as we're talking about balancing network I/O and (2) is invoked in the same way for local I/O regardless of whether you're doing I/O via an async request for 700 blocks or via 700 synchronous read() calls from separate threads.

And you capped it all off with a few ad hominems that I honestly don't think are appropriate on this site, at least in a technical context. Stop flaming.

I'll say it once more: threads are cheap in this regime. The linked article is a hack to get around the problems of database access from monolithic, poorly threaded language interpreters. It's not a "performance enhancement" in any meaningful way. Even a shared-nothing web app architecture à la news.arc is likely to do better than an extra hop through this thing.


If you can be specific about how what I said was uncivil, I will apologize for offending you. I think you're wrong, but I don't think you're crazy or stupid.

You're right that we started talking about something very specific (the memory costs of stacks for 700 threads) and I quickly generalized (to the performance of threading versus async code). That's a fair critique. In my defense, the performance difference between threaded code and async code is very relevant to this article.

Here are my points:

* It is not a "myth", as you say, that async network code scales better than most threaded network code.

* The 700 concurrently served connections in your anecdote probably wouldn't cause select(2) to break a sweat, let alone kqueue/epoll.

* It's horribly unfair to tar async code with select(2)'s performance, because performant applications use kqueue or epoll to replace it. Both resolve the scaling problem you're alluding to.

* It is perhaps weird that I see a similarity between a thread scheduler, which switches CPU contexts on a timer, and a select loop, which switches them nonpreemptively on I/O events. I retain the right to say that event loops are an instance of the "scheduler" problem; if you think that's crazy, read the papers on the MIT Click modular router. In fact, do that anyway; they're great. A sketch of what I mean follows below.
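Here's a toy rendering of that analogy (Linux epoll, with a made-up task/handler structure, not code from any real program): the loop below does what a thread scheduler does, except the "context switch" to a handler happens only when the kernel reports an I/O event, never on a timer.

    /* Toy illustration of the "event loop as cooperative scheduler" analogy.
     * Each fd has a handler; the loop "schedules" a handler run whenever
     * the kernel reports that fd as ready. */
    #include <sys/epoll.h>

    struct task {
        int  fd;
        void (*run)(struct task *self);   /* nonpreemptive: runs to completion */
    };

    void scheduler_loop(int epfd)
    {
        struct epoll_event events[64];
        for (;;) {
            int n = epoll_wait(epfd, events, 64, -1);   /* block until I/O */
            for (int i = 0; i < n; i++) {
                struct task *t = events[i].data.ptr;    /* set via epoll_ctl */
                t->run(t);    /* the "context switch": hand the CPU to this task */
            }
        }
    }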



