There's one thing that bugs me in this document (well, a couple more, but this is the biggest) - page 64:
> Multithreaded server coding is more intuitive - You simply follow the flow of whats going to happen to one client
Is it really? I'd agree if we were talking about something like a server spawning a scripting language per connection, perhaps, but in other cases? Most network endpoints are described as state machines. They're not supposed to "flow" - they respond to input and change their state. I'd claim that if you implement a protocol with more than five states as a properly abstracted state machine, it will be more intuitive than anything that tries to emulate "flow". And that's more natural in an asynchronous design.
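To make "properly abstracted state machine" concrete, here's a minimal sketch; the mini-protocol, states, and handler names are invented for illustration, not taken from the slides. Each state maps to one small handler and input drives the transitions, so there's no "flow" to emulate:

```python
# Invented mini-protocol for illustration: greeting -> auth -> commands.
from enum import Enum, auto

class State(Enum):
    EXPECT_GREETING = auto()
    EXPECT_AUTH = auto()
    EXPECT_COMMAND = auto()
    CLOSED = auto()

class Connection:
    def __init__(self):
        self.state = State.EXPECT_GREETING

    def on_input(self, line: str) -> str:
        # Dispatch purely on the current state; each handler returns a
        # reply and sets the next state.
        handlers = {
            State.EXPECT_GREETING: self._greeting,
            State.EXPECT_AUTH: self._auth,
            State.EXPECT_COMMAND: self._command,
        }
        if self.state not in handlers:
            return "ERR closed"
        return handlers[self.state](line)

    def _greeting(self, line):
        self.state = State.EXPECT_AUTH
        return "OK hello"

    def _auth(self, line):
        self.state = State.EXPECT_COMMAND
        return "OK authenticated"

    def _command(self, line):
        if line == "QUIT":
            self.state = State.CLOSED
            return "OK bye"
        return "OK " + line
```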
Note that the difference vanishes when the language allows for closures. With closures, you can write code that looks like "one thread per request", but could in fact be using asynchronous I/O:
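The snippet this refers to isn't preserved in the thread, so here's a rough stand-in using Python's asyncio, where coroutines play the role the closures would: the handler reads top-to-bottom like one-thread-per-client code, but every await hands control back to the event loop.

```python
import asyncio

# Not the original snippet (which isn't shown here); a rough asyncio
# equivalent. The handler reads like per-client "flow", but each await
# suspends the coroutine and returns control to the event loop.
async def handle_client(reader: asyncio.StreamReader,
                        writer: asyncio.StreamWriter) -> None:
    while True:
        line = await reader.readline()   # looks blocking, isn't
        if not line:
            break
        writer.write(b"echo: " + line)
        await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main() -> None:
    server = await asyncio.start_server(handle_client, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()

# asyncio.run(main())
```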
tl;dr: Asynchronous I/O is only hard to use in languages that lack essential features. In a modern language you can write "intuitive" code that works for both cases, and you're free to switch from multithreaded synchronous I/O to asynchronous I/O as you like.
I think it depends on what you are doing. If the app has a single logical thread of execution, then sequential code is easier to understand than a series of callbacks. If the app executes steps in parallel with forks and joins, then a state machine implemented with callbacks can be easier to understand.
The typical web app falls into the former category: validate input, read from cache, maybe read from database, maybe write to database, generate response.
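As a sketch of that shape (validate, cache, db, and render here are hypothetical stand-ins, not a real framework API):

```python
# Hypothetical handler; every name below is a placeholder. The point is the
# single logical thread of execution: each step reads in order, even though
# every await is asynchronous I/O underneath.
async def handle(request):
    form = validate(request)                      # validate input
    user = await cache.get(form.user_id)          # read from cache
    if user is None:
        user = await db.fetch_user(form.user_id)  # maybe read from database
    if form.has_update:
        await db.save_update(form)                # maybe write to database
    return render(user, form)                     # generate response
```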
If you're passing any state around with the callbacks, or have shared global state, then callbacks won't buy you anything over sequential code with loops, branches, and joins, IMO. I think that to prove correctness about the shared state you pretty much have to infer the loops, branches, and joins anyway.
Real event code never looks like this. You get a callback on read-readiness. You read as much as you can into a buffer. You split the buffer into requests. You act on each request.
Only a tiny bit of your code ever touches send/recv. It never gets EWOULDBLOCK (it's only running because select told it to!). Most of your code deals with entire requests; the state it's saving is largely identical to what a synchronous server deals with; the overhead is in having to hot-potato it through multiple functions.
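For example, the whole readiness-callback layer can be about this small (Python's stdlib selectors module here; the newline-delimited framing and handle_request() are assumptions for illustration):

```python
import selectors

# Sketch of the pattern described above: read on readiness, buffer,
# split into complete requests, act on each one.
sel = selectors.DefaultSelector()

def accept(listener):
    conn, _ = listener.accept()
    conn.setblocking(False)
    # per-connection receive buffer travels with the selector key
    sel.register(conn, selectors.EVENT_READ, data=bytearray())

def on_readable(key):
    conn, buf = key.fileobj, key.data
    data = conn.recv(65536)            # read as much as we can; select said it's ready
    if not data:
        sel.unregister(conn)
        conn.close()
        return
    buf.extend(data)
    while b"\n" in buf:                # split the buffer into complete requests
        request, _, _ = bytes(buf).partition(b"\n")
        del buf[:len(request) + 1]
        handle_request(conn, request)  # the rest of the code only sees whole requests
```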
This can be mitigated pretty well by a good library, EventMachine for example. It handles the evented complexities for you, all you need do is provide an event source and a callback.
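EventMachine itself is Ruby; as a rough analogue of the same shape, here's what asyncio's Protocol callbacks look like in Python - the library owns the event loop, you only supply the callbacks:

```python
import asyncio

# Rough analogue of the EventMachine style: the framework runs the loop
# and calls your handler methods as events arrive.
class Echo(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data)     # must return quickly; no blocking calls here

async def main():
    loop = asyncio.get_running_loop()
    server = await loop.create_server(Echo, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()

# asyncio.run(main())
```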
Don't forget you need to make sure your callback doesn't itself block, or take too long.
(Which in turn implies that all the other libraries you are using --at least ones that need to communicate-- need to have pretty deep integration with your "good library," or you're going to be writing that integration yourself.)
(Not that threading is immune here... you need to be careful choosing libraries such that they are re-entrant, etc. But threading has held more mindshare for much longer, and most OSes share a roughly similar model, so they have a large head start on this front.)
I'd like to see the code used in the benchmarks before drawing conclusions from this work. Programming Java NIO is tricky. It's possible to make some mistakes and still have things "work". One example mistake is setting the write op on the selection key when there's nothing to write on the socket channel. This bug will cause the typical select loop to consume 100% of a core. The code still works, but is much slower.
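Not the benchmark's Java NIO code, obviously, but the same class of bug is easy to show with Python's selectors: a connected socket is nearly always writable, so registering write-interest with an empty output buffer makes every select() call return immediately and the loop spins.

```python
import selectors, socket

# Same class of bug as the NIO OP_WRITE mistake described above, in Python.
# With write-interest registered and nothing queued to send, select()
# returns instantly on every iteration: the code still "works", but the
# loop burns ~100% of a core.
sel = selectors.DefaultSelector()
conn = socket.create_connection(("example.com", 80))
conn.setblocking(False)

# Bug: EVENT_WRITE registered unconditionally, even with an empty output buffer.
sel.register(conn, selectors.EVENT_READ | selectors.EVENT_WRITE)

# Fix: register EVENT_READ only, then sel.modify() to add EVENT_WRITE when
# data is actually queued, and drop it again once the buffer drains.
```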
What does it mean to be "faster"? Is it higher I/O throughput, or lower latency at the 95th or 98th percentile?
Also, the slides give little information about how the two models scale. From what I know, the performance of non-blocking I/O should degrade more gracefully under high concurrency. I would have liked to see some numbers on that.
From a performance point of view, the only difference between blocking and non-blocking I/O is the overall amount of state you need to locate when an I/O is ready to complete (and how that state is laid out in memory).
For blocking I/O, the OS needs to wake up a thread. For non-blocking I/O, the event loop returns with a handle being ready, which the user-level code then has to look up to find the relevant buffer and control structures.
For a benchmark (and some apps) it will come down to CPU cache misses on loading the relevant state structures.
For scaling (as opposed to 'speed'), it depends on how 'cheap' your threads are (for blocking) and whether your event loop can use multiple cores (for non-blocking).
If you're that afraid of concurrency, just use a single Big Damn Lock so that only one thread ever touches any shared state at a time. You'll lose out on most of the benefits, but you won't deadlock and you won't corrupt things (or at least, not any worse than you can with interdependent or chained events).
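Something like this, if you want the sketch (shared_state and handle_request are just placeholders):

```python
import threading

# One lock guards all shared state: handlers can't deadlock against each
# other or observe half-updated data, at the cost of serializing every
# access to that state.
BIG_LOCK = threading.Lock()
shared_state = {}   # placeholder for whatever the handlers share

def handle_request(key, value):
    with BIG_LOCK:
        old = shared_state.get(key)
        shared_state[key] = value
    return old
```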
Or grab a compiler and a decent beginner's textbook (or the internet) and put in your hundred hours or so to learn how to use concurrency sanely (add probably another hundred if you're planning to build your own lock-free underlying data structures).