Scaling node.js to 100k concurrent connections (caustik.com)
64 points by dworrad on Aug 9, 2012 | 63 comments



Before everyone gets excited about these big numbers, I would like to remind you that even higher concurrency can be achieved with even lower CPU and memory usage using Erlang. These numbers are good for Node, but don't use this as evidence that Node is magical and much better at handling large numbers of connections than other systems.


People are excited by Node doing numbers like this because there is a massive, active Javascript community - with hundreds of thousands of people using Javascript all day, every day at work. Maybe 0.01% of these people would ever consider learning Erlang, and even if they did they would not be able to use it at work - ever. As with everything, having better features means nothing if nobody adopts them. I am not saying nobody uses Erlang, and I am not saying people are not adopting it - but the numbers are just not comparable to the Javascript community. Lastly, I realise that just because you know a bit of Javascript does not mean you can architect massive real-time systems. But it is like WoW: even people who play casually aspire to having the best kit or playing for a top guild.


> As with everything, having better features means nothing if nobody adopts them

Well, it means my product is going to be superior, since I went with the better, if less well known, architecture. It's not like Erlang is some little unsupported side-project of a language: it's actually older than Javascript if you count the period before it was open-sourced, and only a few years younger if you don't, and it is used extensively by many industries.

Also, just because javascript the language is more well-known doesn't mean javascript the server architecture is more well-known. I would argue it isn't; when people want a highly concurrent, solid server, erlang is always mentioned.

Lastly, erlang is a pretty easy language to learn. I had the basics down in a day, I had a prototype pubsub server that could handle 50k connections in two. The syntax is a bit strange, and honestly it does get in the way sometimes, but it's not hard.


You're missing the point of Node. You can build a web rendering application and its AJAX parts in one codebase. You can quickly move code between server-side and client-side rendering, and so on.

Probably Erlang is "better". BTW, Java & C are quite fast also. Well-written Java applications do scale, and if they don't, there are folks out there who specialize in making them scale.

Also, Erlang is probably easy to learn as a language. But when you develop a web app, you have enough other skills to keep up with. Let's name CSS for one ;) The human brain is limited in its capacity to remember APIs and language specifics.

Also, more popular means more libraries, which in turn makes the product better. This is why so many folks turn to PHP. It's not elegant, but everything you need is already there.

Now I won't argue that you may have good reasons to use Erlang yourself, whether because you like the language's structure, like writing libraries yourself, or so on. But that doesn't make it "superior", certainly not as a platform.


So your argument is that web developers are too stupid to remember Erlang. Tell that to all those Django developers who have to juggle Python, HTML, CSS and Javascript! They must be superheroes! Ruby on Rails developers must be as well!


I never said they were stupid. Rather, I think a simpler environment enables more productivity for the developer. Assembler is hard, and some people master it in incredible ways. Does that mean C is useless? No. Node goes the way of unifying the web stack around JavaScript; I find that at least interesting. The future will tell the rest.


> Well, it means my product is going to be superior

Perhaps, but for many people, node.js will be "good enough".

http://journal.dedasys.com/2006/02/18/maximizers-satisficers...

And because of the big community, there probably already are, or will be, more libraries available for it, meaning you have more ready-made blocks to build with.

I don't write this as a "supporter" of node.js, either - I've actually known and used Erlang for the past 8 years on and (mostly) off, and would highly encourage any hacker to have a look at it, because its way of doing things is quite enlightening, and, IMO, is superior to node.js.


> they would not be able to use it at work - ever

They probably shouldn't be using node.js at work either.

Putting a client-side javascript engineer (even a decent one) on a node.js project can be really dangerous.


"client-side javascript engineer" without any server-side experience.. is that even possible?


Christ, that's some serious bubble effect you've got there. Not only is it possible, it's quite common. Until about 2 years ago the only place javascript ran was in the browser or a handful of experimental projects. And big, traditionally-organized teams often have people who are responsible only for the browser side of things.


Perhaps in Silicon Valley, but outside it most of us were expected to be jacks of all trades. There may be a specialized DBA, and a designer, but we were responsible for understanding the full stack. The designer worked in Photoshop, and the DBA only came in to design the tables and optimizations, but we were responsible for the real work.


A jack of all trades "might" work, but honestly it depends on the person. Someone who understands the full stack is usually not someone qualified to be building scalable, fault-tolerant pieces of server-side infrastructure for a company.

Node.js is so easy to screw up, so difficult to debug, and little things can take down your entire application.

The combination of the language itself in addition to the type of people who typically would choose Node.js over other more proven options would make me worried that a blind choice is being made based on language alone and not proper evaluation or understanding of the other options available.

Personally I believe it is far better to be a language-agnostic company that thinks of different server side components as services, which might be in different languages instead of trying to use a tool just because they know the language already.


If anything I'd say you have it backwards; the kind of structure you describe (and the very phrase "full stack") is a very Silicon Valley/startup thing. It's larger, more traditional software shops that tend to slice the stack into separate vertical layers and give different people responsibility for each.

Of course there are many companies large and small that do it differently. But having someone whose responsibility includes client-side javascript but not server-side code is not by any means unusual outside the valley, at least IME.


Perhaps I am generalizing based on my experience in Tucson, which is close enough to cross-pollinate with the bay, but most places where I worked, interviewed, or had friends expected everyone on the team (apart from the DBAs) to be able to touch any part of the stack.

We also tended to have companies with small teams.


How does your current set of responsibilities determine the state of your knowledge?


The only vaguely plausible rationale I could conceive of for your ridiculous assumption that all client-side javascript engineers would have server-side experience was an underlying assumption that any job in client-side javascript engineering would involve server-side responsibilities.

Evidently this wasn't your actual reason for thinking that, which just leaves me even more bewildered by your position.


Dangerous?


Interesting example from 2008:

http://www.metabrew.com/article/a-million-user-comet-applica...

I'd also add that this shouldn't be taken to say that Erlang is totally superior to node, or that Erlang makes scaling to 1M concurrent connections a piece of cake. If you're working at that level, there's no magical out-of-the-box solution.


Yawn. How about 2 million connections from a single server with Erlang & FreeBSD? http://blog.whatsapp.com/index.php/2012/01/1-million-is-so-2...


250k concurrent connections on "... a $0.12/hr rackspace 2GB cloud server." vs 2M concurrent connections on a 24 core server with 96GB of memory.


How fast can I get a median programmer to learn Erlang, learn the libraries, and be productive enough to be able to make these high concurrency apps?

Let's say they are a full stack programmer who knows some html, some css, some javascript, some java, and some sql.

I have a pretty good idea how fast I can bring someone up to speed on node.js -- I have to teach them some advanced JS concepts, some node.js conventions, and the APIs of my library. Async takes a little bit to wrap your head around, but it's not terrible.
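
For example, the first hurdle is just internalizing that nothing blocks. A toy illustration (standard fs API; the file name is made up):

  var fs = require('fs');

  // Non-blocking read: the callback runs later, so the line
  // after fs.readFile() executes first.
  fs.readFile('config.json', 'utf8', function (err, data) {
    if (err) throw err;            // errors arrive as the first argument
    console.log('loaded', data.length, 'bytes');
  });
  console.log('this line runs before the file is read');

Contrast with fs.readFileSync, which stalls the whole event loop while the file is read.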

Node.js seems like it is on the way to "worse is better."


A week with the Erlang language will enable you to write projects on par with or better than anything written in node.js as far as resilience and scalability go.

Probably another week to get up to speed with OTP for all the promises of resilient Erlang applications.

What is often missed about Erlang is that it's not really about highly concurrent applications. That property is actually a means to accomplish its primary goal: fault-tolerant applications.

See: http://www.erlang.org/download/armstrong_thesis_2003.pdf to understand the motives of the Erlang designers.


Firstly, I apologise for down-voting this response.

A week to learn Erlang. Where do I start?


http://learnyousomeerlang.com/

You can pick up the basic syntax in one full day easily (if you are an experienced developer and already understand functional programming) ... 3 or 4 days if you are new to functional coding or just very inexperienced.

It is a very brief / minimalist language from a syntax point of view. Then, it will take a week or two to get your head around OTP, which is the primary framework and has years (decades?) of mission critical work under its belt.

Then, at the end of your journey will be the really hard problems... dealing with massive netsplits at a cluster level, elections for new masters, and all the other hard problems that happen at the upper-tier of massive clusters.

If you are building an HTTP(S) app -- you can blessedly avoid a lot of these by avoiding a true massive cluster altogether and using lots of individual "micro clusters"(note) balanced / routed by HTTP middleware.

Also, check out Cowboy [webserver] (https://github.com/extend/cowboy/), Agner [package manager] (http://erlagner.org/) and of course, the always awesome rebar [build tool] (https://github.com/basho/rebar/)

... and I love lager [log tool, makes those erlang logs less alien looking] (http://basho.com/blog/technical/2011/07/20/Introducing-Lager...)

(note) This is basically a strategy of using small clusters based on locations -- so if you are across, let's say, 3 locations, you would build 3-node clusters, 1 node per location, and have them work as a unit, localizing workloads and responding to requests; then you let your higher-level middleware deal with your many groups of "micro clusters". High reliability rather cheaply, but it means you need your own system for pushing out updates.


About a week.


The magic is in javascript. How many people can write Erlang?


How many people can write clean, efficient and maintainable JavaScript?

And Erlang is not the only sane option for these kinds of problems; there is also Go. And to a lesser degree you can do the same in many other languages given the right libraries and careful thinking. It takes more effort than with Erlang or Go, but almost anything beats JavaScript in both performance and code clarity (both at the 'low' code-readability level and at the high 'project organization and design' level).


> almost anything beats JavaScript in both performance

http://shootout.alioth.debian.org/u32/which-programming-lang...

Javascript with V8 stacks up pretty well.


Depends on where you are. In Stockholm I would not be surprised if more people can write decent Erlang compared to the people who can write decent JavaScript.


Would you be surprised if in Stockholm more people can write decent JavaScript compared to the people who can write decent Erlang?


... or Java.


What about other languages?


The Netty library for Java can routinely do 500K connections without breaking a sweat. Some people have tried 1M connections with beefy machines.


I wonder how far you could push Java with a naive thread-per-connection implementation. If you reduce the per-thread stack size you can quite easily get 20K on very modest hardware.


I would be curious to know how something like Play! with all-async requests, or MVC with all-async, would do. Probably similarly impressive, but I really have no idea.


Probably not; the key to making 250,000 connections work is not doing a lot for each of those connections, or doing it very infrequently.

Once you're rendering views, etc., it's hard to maintain 50,000 req/sec (250K connections @ 5 req/sec).


Garbage collection is disabled. How then is this relevant to any real-world usage?


He's not running with GC permanently disabled; he's only disabled the automatic GC because of its huge overhead (he claims 1-second pauses every few seconds). He also mentions it's trivial to enable manual GC and run that via setInterval/setTimeout/what-have-you.
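
Something like this is all it takes (a sketch: the V8 flags are the ones he refers to, but the interval value here is an arbitrary example, not from the article):

  // start with: node --nouse-idle-notification --expose-gc server.js
  if (global.gc) {
    setInterval(function () {
      global.gc();          // collect at a moment you control
    }, 30 * 1000);          // every 30s -- tune for your heap churn
  }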


Isn't this really scaling the underlying C runtime to 100k connections?


I use Node in production. The main thing I like about it is that, looking at system usage graphs while the number of users grows, the only thing going UP is bandwidth ;)

I'd really like to see a story of someone really having 100k connected browsers. My online game currently peaks at about 1000 concurrent connections, and the node process rarely lasts longer than 2 hours before it crashes. Of course, using a db like Redis to keep user sessions makes the problem almost invisible to users, as the restart is instantaneous. I'm using socket.io, express, the crypto module, etc.

I'd really like to see real figures for node process uptime from someone having 5000+ concurrent connections.


I'm using C# for my game Tribal Hero (www.tribalhero.com). It's still in early beta so I've only had 450 concurrent users. Our CPU and memory usage barely moved from 0 to 450 users. We're using socket selects and not even async sockets, which would have even better performance. It's also backed by MySQL, though we want to eventually move to Redis. Why is Node breaking at 1k connections? That doesn't seem like much at all.


I also use MySQL as a backing store; it's practically write-only, as I keep the whole state in javascript objects. The only time data is read from MySQL is at program startup. However, having an SQL database enables me to run various complex SQL queries for reporting.

However, I do use Redis for one thing: user sessions. I turned persistence off, as Redis seems to be rock-stable and I really don't need sessions to persist. I was using a modified version of Node's MemoryStore, to which I added clean garbage collection, but with the frequent restarts I mentioned earlier it became a pain for users to have to log in again in the middle of the game. Having a separate, dedicated Redis instance handle the sessions made restarts completely seamless, as the cookie sent by the user's browser remains valid between node restarts.
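
For anyone curious, the wiring is roughly this (a sketch against the Express 3-era connect-redis API; the secret and port are placeholder values):

  var express = require('express');
  var RedisStore = require('connect-redis')(express);

  var app = express();
  app.use(express.cookieParser());
  app.use(express.session({
    secret: 'keyboard cat',                // placeholder
    store: new RedisStore({ port: 6379 })  // the dedicated Redis instance
  }));
  // Sessions now live in Redis, so a node restart no longer logs
  // players out: the browser's cookie still maps to a live session.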

I was not willing to learn a new db technology, but there wasn't really much to learn with Redis. You can set it up in minutes and it just works(tm). I highly recommend you try it.


I'll be switching over the entire game state to Redis. It'll be a bit of work but what I like the most is that it maps more naturally to objects. My db is mainly writes as well.


I do about 300,000 - 500,000 concurrent connections on nodejs, but it's all short-lived web requests.

It took a while to iron out most cases that can crash, right now I have:

web.1: up for 12h
web.2: up for 12h
web.3: up for 12h
web.4: up for 12h
web.5: up for 12h
web.6: up for 4h
web.7: up for 1h
web.8: up for 12h
web.9: up for 12h
web.10: up for 12h
web.11: up for 32m
web.12: up for 12h
web.13: up for 7h
web.14: up for 7h


Interesting. What's the system's uptime? Close to 12h or rather not? If the latter, this still means 1-2 restarts per day.

Regarding crashes, do you know of any special things to look out for? I do crash dumps and log uncaught exceptions, but sometimes node simply dies without any trace in the log files.


The highest I've seen on a dyno (this is all on Heroku) is over a day; web #13 and #14 are on 22 hours now. I think they're actually eventually running out of memory or being retired and replaced by Heroku rather than crashing, but I'm not sure; it's not being caught by the exception handling.

Most of the crashes come down to stupid things; it's so easy to make a mistake when you don't have a compiler watching your back. External dependencies can hurt if they're laggy or unavailable. Unterminated requests are a really easy accident as well.

At this point I just use exception catching and dump the results into Redis unless I'm specifically hunting down a bug and want the crash to occur:

http://api.playtomic.com/load.html
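
The exception-to-Redis part is roughly this shape (a sketch, not the actual playtomic code; the key name is made up):

  var redis = require('redis').createClient();

  process.on('uncaughtException', function (err) {
    redis.rpush('crashes', JSON.stringify({
      message: err.message,
      stack: err.stack,
      time: Date.now()
    }), function () {
      process.exit(1);  // state may be corrupt after an uncaught exception
    });
  });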


Wow, I like your real-time monitors :) I'm about to build mine, and this gives me some great ideas.

Thanks for sharing.


I could never get my socket.io instance to max out. Is there a good way to load test socket.io and web sockets?


Can uwsgi/nginx be configured similarly?

Is it common practice to have node face the web without nginx?


Link to his next post showing him breaking 250k - http://blog.caustik.com/2012/04/10/node-js-w250k-concurrent-...


It's a shame that he didn't mention kernel tuning. Without custom settings (like net.ipv4.tcp_mem), I think it's very difficult to reach these numbers.
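
Something along these lines, for example (illustrative values only -- tune them for your own workload):

  # /etc/sysctl.conf
  fs.file-max = 1000000                        # system-wide fd limit
  net.ipv4.ip_local_port_range = 1024 65535    # more ephemeral ports
  net.ipv4.tcp_mem = 786432 1048576 1572864    # TCP buffer pages: min/pressure/max
  net.core.somaxconn = 65535                   # listen() backlog ceiling

(plus raising the per-process fd limit with ulimit -n)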


I did 3M/node on physical servers, 800K/node on EC2 instances.

We mostly use Erlang on the server-side and node.js + CoffeeScript on the client-side (where they rightfully belong ;)


It struck me that the author runs his apps as root (in the screenshots). But then I remembered he's using node.js to handle "thousands of concurrent connections".


I think it's his testing machine, so it's a fire-and-forget setup.


Looks like the author is not aware of some concurrency problems, deadlocks, etc. The backend/database might not scale to 100k concurrent connections so easily.


This is where NodeJS really starts to shine - persistent connections and background operations let you do a whole bunch of cool stuff to mitigate that.

In my case I have entire db tables and collections replicated in memory and kept in sync via redis pubsub, and the 100,000s of concurrent users I have are all sharing just a few dozen persistent redis and mongodb connections between them.
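
The pattern is roughly this (a sketch; the channel name and message shape are made up):

  var redis = require('redis');
  var sub = redis.createClient();   // ONE subscriber shared by all users

  var users = {};                   // in-memory replica of the users table

  sub.subscribe('users:update');
  sub.on('message', function (channel, message) {
    var update = JSON.parse(message);
    users[update.id] = update;      // reads never touch the database now
  });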


Scaling the backend is a lot easier than dealing with concurrent front-end connections!


I would really love to know what he did to tune that Rackspace VM. I had a terrible time trying to get node.js and others to get past 5,000 concurrent websocket connections on a m1.large EC2 instance or on Rackspace.


I wonder what happens at 100k database connections. I will give it a try with Firebird and the nodejs driver.


That's the thing about these types of benchmarks. They're useful for showing that node has the throughput -- at a low level -- to serve a huge number of concurrent connections, but that doesn't translate directly to huge application throughput if you're relying on things like database access over a network. In practice, each of these problems must be solved individually.

I don't mean to minimize this accomplishment. If you're assuming you need 100k database connections in order to scale, you might be solving the wrong problem. Scaling is a matter of moving data as close to the CPU as possible. This means in-memory caching is where real performance comes in. I don't care how good your language/framework is, you can't defeat the physics of slow I/O over a network.
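
To make that concrete, a toy sketch of the idea -- consult a process-local cache before doing any network I/O (all names are illustrative):

  var cache = {};

  function cachedGet(key, ttlMs, fetch, cb) {
    var hit = cache[key];
    if (hit && Date.now() - hit.at < ttlMs) {
      return cb(null, hit.value);        // fast path: pure memory, no I/O
    }
    fetch(key, function (err, value) {   // slow path: network/database
      if (err) return cb(err);
      cache[key] = { value: value, at: Date.now() };
      cb(null, value);
    });
  }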


You would be using a connection pool instead of opening one for every client.
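
For example, roughly (a sketch using the node mysql driver's pool; credentials, table and pool size are made up):

  var mysql = require('mysql');
  var pool = mysql.createPool({
    host: 'localhost',
    user: 'app',
    database: 'game',
    connectionLimit: 20   // 100k clients share these 20 connections
  });

  // Each request borrows a connection and returns it, instead of
  // every connected client holding its own.
  pool.query('SELECT score FROM players WHERE id = ?', [42],
    function (err, rows) {
      if (err) throw err;
      console.log(rows);
    });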


I have 500-600 node connections using a single DB connection and it works fine. It's MySQL using a binary driver, though.


I remember seeing this on HN back in April.



