Java IO is faster than NIO: Old is New Again (thebuzzmedia.com)
46 points by rkalla on July 27, 2010 | 19 comments



For those that want the "Quick WTF?!", it is because:

* With modern operating systems, idle threads are practically free.
* Managing non-contending threads is extremely inexpensive now.
* Multi-core systems.
* The selectors and state-restore used by asynchronous NIO libraries in high-load environments are more expensive than putting threads to sleep and waking them up.

You mix all that together and you get a new appreciation for java.io.
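For concreteness, a minimal sketch of the thread-per-connection java.io style being described -- the port, buffer size, and class name are illustrative, not taken from Paul's benchmark:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.ServerSocket;
    import java.net.Socket;

    // One blocking thread per client; mostly-idle connections just park their thread.
    public class BlockingEchoServer {
        public static void main(String[] args) throws IOException {
            ServerSocket server = new ServerSocket(8080);
            while (true) {
                final Socket client = server.accept(); // blocks until a client connects
                new Thread(new Runnable() {
                    public void run() {
                        try {
                            InputStream in = client.getInputStream();
                            OutputStream out = client.getOutputStream();
                            byte[] buf = new byte[4096];
                            int n;
                            while ((n = in.read(buf)) != -1) { // blocking read; thread sleeps while idle
                                out.write(buf, 0, n);
                            }
                        } catch (IOException ignored) {
                        } finally {
                            try { client.close(); } catch (IOException ignored) {}
                        }
                    }
                }).start();
            }
        }
    }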


The missing detail from all of these, of course, is that transferTo doesn't work with blocking I/O. You need to use a selector on WRITE to the client socket to make sure you're actually allowed to transfer bytes. Given that transferTo involves zero-copy (always faster than copying to buffers), a mix of the two approaches is usually best.
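A rough sketch of that gating -- register the client channel for OP_WRITE and only call transferTo once the selector fires. The helper method and its error handling are my own assumptions, not from the article:

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.SocketChannel;

    // Only call transferTo() once the selector reports the client socket writable.
    public class ZeroCopySend {
        static void sendFile(SocketChannel client, FileChannel file) throws IOException {
            client.configureBlocking(false);
            Selector selector = Selector.open();
            client.register(selector, SelectionKey.OP_WRITE);
            long position = 0;
            long remaining = file.size();
            while (remaining > 0) {
                selector.select();                  // wait until the socket can accept bytes
                selector.selectedKeys().clear();
                long sent = file.transferTo(position, remaining, client); // zero-copy
                position += sent;
                remaining -= sent;
            }
            selector.close();
        }
    }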


jbooth,

Interesting post -- I don't know if that is something Paul evaluated. What mix/ratio would you propose?

Something like 1:10 threads:selector/connections?


Well, if you're getting into that, you're far enough down the rabbit hole of optimization that you probably have a reason for being there and some business-specific logic that would drive your specific tunings.

I'd generally recommend a selector identifying sockets that are OK for write and then delegating to a caching thread pool to handle the actual transferTos. If that select loop becomes your bottleneck, you can of course scale it horizontally: have a set of selectors, striping the register()s across them to reduce lock contention, and have them all feed into the same caching thread pool.
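Roughly what that hybrid looks like, as I understand it -- a single select loop handing write-ready sockets off to a cached thread pool. How the FileChannel gets associated with each connection (here via the key attachment) and the re-arming policy are my assumptions:

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Per-connection send state, attached to the SelectionKey at register() time.
    class FileSend {
        final FileChannel file;
        long position;
        FileSend(FileChannel file) { this.file = file; }
    }

    // The select loop finds writable sockets; a cached thread pool does the transfers.
    public class HybridSendLoop implements Runnable {
        private final Selector selector;
        private final ExecutorService pool = Executors.newCachedThreadPool();

        public HybridSendLoop(Selector selector) { this.selector = selector; }

        public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    selector.select();
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        final SelectionKey key = it.next();
                        it.remove();
                        if (!key.isWritable()) continue;
                        key.interestOps(0); // stop watching this socket until the hand-off finishes
                        final SocketChannel client = (SocketChannel) key.channel();
                        final FileSend send = (FileSend) key.attachment();
                        pool.submit(new Runnable() {
                            public void run() {
                                try {
                                    long remaining = send.file.size() - send.position;
                                    send.position += send.file.transferTo(send.position, remaining, client); // zero-copy
                                    if (send.position < send.file.size()) {
                                        key.interestOps(SelectionKey.OP_WRITE); // more to send: re-arm
                                        key.selector().wakeup();
                                    } else {
                                        client.close(); // done with this connection
                                    }
                                } catch (IOException ignored) {
                                }
                            }
                        });
                    }
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    }

Striping register()s across several selectors, as suggested above, then just means running several of these loops, each with its own Selector, and round-robining new connections among them while they all share the thread pool.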


This talks about numbers of threads being as high as 1000.

My job involves writing servers that can scale to at least 20,000 concurrent connections (I use Twisted for that, but I'm still interested to see what's happening in JavaLand).

I was disappointed the article didn't look at things at that kind of scale - it would be a much more impressive result if threaded/blocking io was still best at that level.


This article is just a summary of Paul Tyma's 2008 presentation (posted to HN yesterday): http://www.mailinator.com/tymaPaulMultithreaded.pdf

I am working on a multiplayer framework using Python, where one gameserver can potentially handle up to 50,000 concurrent socket connections (each interacting with a couple of other ones): http://www.flockengine.com

epoll() is definitely the way to go - it's very fast with high numbers of sockets.

- http://docs.python.org/library/select.html#edge-and-level-tr...

- http://linux.die.net/man/4/epoll


I guess I'm not surprised. My reason for using NIO in the past has been that I've had systems supporting many long-running, mostly idle open sockets. It seems logical that there would be a tradeoff to abstracting threads & state from connections.


You're spot on. NIO has always been slower, but it allows you to process more concurrent connections. It's essentially a time/space trade-off: traditional IO is faster but has a heavier memory footprint; NIO is slower but uses fewer resources.


... and I think your synopsis is spot on as well. If someone could counter this, I'd love to hear it.


I don't know the scale of "many" here, but in the thousands, I think Paul's presentation makes the case for threads, given the near-free cost of having idle threads sitting around and activating/deactivating them as long as they aren't contending for something.

A few folks have mentioned that context switching and state-restore using NIO is more expensive than this, especially on a multi-core machine, which I thought was interesting.

Again, I don't know how helpful that is to you, because your "many" could have meant 10s of thousands of connections, in which case threads wouldn't be the answer.

Unless you wanted to get fancy with a hybrid approach ;)


I did not see the specs for the systems used to perform the test. If they were using a multi-core server, this would surely slant the results in favor of the threaded model. As well, the JVM/Java was designed from the first release around blocking IO, so you have years of development behind that design philosophy.

I just don't see how, apples to apples on a single core, blocking can be faster than NIO. I think C would be a better testing ground, where you don't have layers built up over the years that reinforce one approach over the other.


> I just don't see how apples to apples, on a single core, blocking can be faster than NIO.

Well, select() and poll() both require scanning all watched fd's, so I could definitely see them being slow. Then if NIO is built on one of them, it would inherit that slowness.

epoll doesn't need to be slow, so I'm not entirely sure what's going on there. Maybe it's only servicing one or two fd's per epoll_wait()? Maybe it has to do extra work to match an event to a Java object? Maybe queuing events for the next epoll_wait() is for some bizarre reason slower than switching threads (which I imagine is a lot lighter than a full context switch to another process)?

> I think C would be a better testing ground, where you don't have built up layers over the years that reinforce one over the other.

That depends on whether you're trying to benchmark the kernel or trying to figure out what will make your particular server (presumably written in Java in this case) run faster.


> If they were using a multi-core server

Where would they find a single-core server these days?


Typically, in the non-blocking IO space, you take advantage of multiple cores by running multiple instances of your server. For instance, I run 4 Tornado instances on my quad-core server, sitting behind a simple nginx proxy (this is, of course, Python rather than Java).

Admittedly, I haven't performed any benchmarks (I chose Tornado for reasons other than performance), but as with the OP, I was hoping the author of the article would provide more detail on their benchmark setup.


In the cloud. A lot of apps run on single-core VMs.


The Atmosphere team have a good slide deck titled "Scaling the Asynchronous Web" which has a few pages on threads vs NIO: https://atmosphere.dev.java.net/conferences/2009/JavaOne/200...


Here's the previous discussion on Paul Tyma's presentation: http://news.ycombinator.com/item?id=1546711


Faster is the wrong word -- scalable. It was definitely talking about scalability and load, not just pure speed.


redrobot5050,

The first 30 slides or so are explicitly about the 25% performance gap between NIO and IO on a modern system -- I think it's fair to say the article was as much about performance as it was scalability.

I think he even addresses the misconception that NIO is perceived as "faster" simply because it's more "Scalable" -- the "myths" slide that he keeps going back to and crossing elements off of as he proves them wrong.



