There's one thing that bugs me in this document (well, a couple more, but this is the biggest) - page 64:
> Multithreaded server coding is more intuitive - You simply follow the flow of whats going to happen to one client
Is it really? I'd agree if we were talking about something like a server spawning a scripting language per connection, perhaps, but in other cases? Most network endpoints are described as state machines. They're not supposed to "flow" - they respond to input and change their state. I'd claim that if you implement a protocol with more than five states as a properly abstracted state machine, it will be more intuitive than anything that tries to emulate "flow". And that's more natural in an asynchronous design.
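To make "properly abstracted state machine" concrete, here's a minimal sketch; the mini-protocol, states, and handler names are invented for illustration, not taken from the slides. Each state maps to one small handler and input drives the transitions, so there's no "flow" to emulate:

```python
# Invented mini-protocol for illustration: greeting -> auth -> commands.
from enum import Enum, auto

class State(Enum):
    EXPECT_GREETING = auto()
    EXPECT_AUTH = auto()
    EXPECT_COMMAND = auto()
    CLOSED = auto()

class Connection:
    def __init__(self):
        self.state = State.EXPECT_GREETING

    def on_input(self, line: str) -> str:
        # Dispatch purely on the current state; each handler returns a
        # reply and sets the next state.
        handlers = {
            State.EXPECT_GREETING: self._greeting,
            State.EXPECT_AUTH: self._auth,
            State.EXPECT_COMMAND: self._command,
        }
        if self.state not in handlers:
            return "ERR closed"
        return handlers[self.state](line)

    def _greeting(self, line):
        self.state = State.EXPECT_AUTH
        return "OK hello"

    def _auth(self, line):
        self.state = State.EXPECT_COMMAND
        return "OK authenticated"

    def _command(self, line):
        if line == "QUIT":
            self.state = State.CLOSED
            return "OK bye"
        return "OK " + line
```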
Note that the difference vanishes when the language allows for closures. With closures, you can write code that looks like "one thread per request", but could in fact be using asynchronous I/O:
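The snippet this refers to isn't preserved in the thread, so here's a rough stand-in using Python's asyncio, where coroutines play the role the closures would: the handler reads top-to-bottom like one-thread-per-client code, but every await hands control back to the event loop.

```python
import asyncio

# Not the original snippet (which isn't shown here); a rough asyncio
# equivalent. The handler reads like per-client "flow", but each await
# suspends the coroutine and returns control to the event loop.
async def handle_client(reader: asyncio.StreamReader,
                        writer: asyncio.StreamWriter) -> None:
    while True:
        line = await reader.readline()   # looks blocking, isn't
        if not line:
            break
        writer.write(b"echo: " + line)
        await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main() -> None:
    server = await asyncio.start_server(handle_client, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()

# asyncio.run(main())
```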
tl;dr: Asynchronous I/O is only hard to use in languages that lack essential features. In a modern language you can write "intuitive" code that works for both cases, and you're free to switch from multithreaded synchronous I/O to asynchronous I/O as you like.
I think it depends on what you are doing. If the app has a single logical thread of execution, then sequential code is easier to understand than a series of callbacks. If the app executes steps in parallel with forks and joins, then a state machine implemented with callbacks can be easier to understand.
The typical web app falls into the former category: validate input, read from cache, maybe read from database, maybe write to database, generate response.
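As a sketch of that shape (validate, cache, db, and render here are hypothetical stand-ins, not a real framework API):

```python
# Hypothetical handler; every name below is a placeholder. The point is the
# single logical thread of execution: each step reads in order, even though
# every await is asynchronous I/O underneath.
async def handle(request):
    form = validate(request)                      # validate input
    user = await cache.get(form.user_id)          # read from cache
    if user is None:
        user = await db.fetch_user(form.user_id)  # maybe read from database
    if form.has_update:
        await db.save_update(form)                # maybe write to database
    return render(user, form)                     # generate response
```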
If you're passing any state around with the callbacks, or have shared global state, then callbacks won't buy you anything over sequential code with loops, branches, and joins, IMO. I think that to prove correctness about the shared state you pretty much have to infer the loops, branches, and joins anyway.
Real event code never looks like this. You get a callback on read-readiness. You read as much as you can into a buffer. You split the buffer into requests. You act on each request.
Only a tiny bit of your code ever touches send/recv. It never gets EWOULDBLOCK (it's only running because select told it to!). Most of your code deals with entire requests; the state it's saving is largely identical to what a synchronous server deals with; the overhead is in having to hot-potato it through multiple functions.
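For example, the whole readiness-callback layer can be about this small (Python's stdlib selectors module here; the newline-delimited framing and handle_request() are assumptions for illustration):

```python
import selectors

# Sketch of the pattern described above: read on readiness, buffer,
# split into complete requests, act on each one.
sel = selectors.DefaultSelector()

def accept(listener):
    conn, _ = listener.accept()
    conn.setblocking(False)
    # per-connection receive buffer travels with the selector key
    sel.register(conn, selectors.EVENT_READ, data=bytearray())

def on_readable(key):
    conn, buf = key.fileobj, key.data
    data = conn.recv(65536)            # read as much as we can; select said it's ready
    if not data:
        sel.unregister(conn)
        conn.close()
        return
    buf.extend(data)
    while b"\n" in buf:                # split the buffer into complete requests
        request, _, _ = bytes(buf).partition(b"\n")
        del buf[:len(request) + 1]
        handle_request(conn, request)  # the rest of the code only sees whole requests
```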
This can be mitigated pretty well by a good library, EventMachine for example. It handles the evented complexities for you, all you need do is provide an event source and a callback.
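EventMachine itself is Ruby; as a rough analogue of the same shape, here's what asyncio's Protocol callbacks look like in Python - the library owns the event loop, you only supply the callbacks:

```python
import asyncio

# Rough analogue of the EventMachine style: the framework runs the loop
# and calls your handler methods as events arrive.
class Echo(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data)     # must return quickly; no blocking calls here

async def main():
    loop = asyncio.get_running_loop()
    server = await loop.create_server(Echo, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()

# asyncio.run(main())
```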
Don't forget you need to make sure your callback doesn't itself block, or take too long.
(Which in turn implies that all the other libraries you are using --at least ones that need to communicate-- need to have pretty deep integration with your "good library," or you're going to be writing that integration yourself.)
(Not that threading is immune here... you need to be careful choosing libraries such that they are re-entrant, etc. But threading has held more mindshare for much longer, and most OSes share a roughly similar model, so they have a large head start on this front.)
I'd like to see the code used in the benchmarks before drawing conclusions from this work. Programming Java NIO is tricky. It's possible to make some mistakes and still have things "work". One example mistake is setting the write op on the selection key when there's nothing to write on the socket channel. This bug will cause the typical select loop to consume 100% of a core. The code still works, but is much slower.
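Not the benchmark's Java NIO code, obviously, but the same class of bug is easy to show with Python's selectors: a connected socket is nearly always writable, so registering write-interest with an empty output buffer makes every select() call return immediately and the loop spins.

```python
import selectors, socket

# Same class of bug as the NIO OP_WRITE mistake described above, in Python.
# With write-interest registered and nothing queued to send, select()
# returns instantly on every iteration: the code still "works", but the
# loop burns ~100% of a core.
sel = selectors.DefaultSelector()
conn = socket.create_connection(("example.com", 80))
conn.setblocking(False)

# Bug: EVENT_WRITE registered unconditionally, even with an empty output buffer.
sel.register(conn, selectors.EVENT_READ | selectors.EVENT_WRITE)

# Fix: register EVENT_READ only, then sel.modify() to add EVENT_WRITE when
# data is actually queued, and drop it again once the buffer drains.
```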
What does it mean to be "faster"? Is it higher I/O throughput, or lower latency at the 95th or 98th percentile?
Also, the slides give little information about how the two models scale. From what I know, the performance of non-blocking I/O should degrade more gracefully under high concurrency. I would have liked to see some numbers on that.
From a performance point of view, the only difference between blocking and non-blocking I/O is the overall amount of state you need to locate when an I/O is ready to complete (and how that state is laid out in memory).
For blocking I/O, the OS needs to wake up a thread. For non-blocking I/O, the event loop returns with a handle being ready, which the user-level code then has to look up to find the relevant buffer and control structures.
For a benchmark (and some apps) it will come down to CPU cache misses on loading the relevant state structures.
For scaling (as opposed to 'speed'), it depends on how 'cheap' your threads are (for blocking) and whether your event loop can use multiple cores (for non-blocking).
If you're that afraid of concurrency, just use a single Big Damn Lock so that only one thread ever touches any shared state at a time. You'll lose out on most of the benefits, but you won't deadlock and you won't corrupt things (or at least, not any worse than you can with interdependent or chained events).
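Something like this, if you want the sketch (shared_state and handle_request are just placeholders):

```python
import threading

# One lock guards all shared state: handlers can't deadlock against each
# other or observe half-updated data, at the cost of serializing every
# access to that state.
BIG_LOCK = threading.Lock()
shared_state = {}   # placeholder for whatever the handlers share

def handle_request(key, value):
    with BIG_LOCK:
        old = shared_state.get(key)
        shared_state[key] = value
    return old
```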
Or grab a compiler and a decent beginner's textbook (or the internet) and put in your hundred hours or so to learn how to use concurrency sanely (add probably another hundred if you're planning to build your own lock-free underlying data structures).