Good read, but despite being a major proponent of Node.js and many of the ideals it seeks to embrace, I'm not sure I'm comfortable with calling Apache "archaic". It's not like IE6, which objectively has no redeeming value as a modern platform target -- it did back in the day for sure, and is still relevant in some spheres, but overall I don't think anyone (even Microsoft) would argue that IE6 isn't "archaic".
But to call Apache, one of the most popular and successful actively developed webservers _archaic_? I think that's a bit much. It's not inherently bad just because it's not really targeting the C10K problem... just different.
[A minor nitpick to be sure, but it bothered me nonetheless as I feel like I'm seeing this "Threads bad. Async good." rhetoric passed around as fact all over the place and it's starting to feel a bit like Animal Farm ;)]
You're perfectly correct, and Apache has served, and continues to serve, as one of the most successful and widely deployed web servers to date. That said, in the context of more conveniently architecting for high volumes of traffic, Apache was conceived in a time of fundamentally different problems, and in that respect it can be viewed as a more antiquated option when scoping out the landscape of appropriate web server software.
I did not intend any pejorative connotations by calling it "archaic". I just wanted to emphasise that it has been eclipsed by newer software following different design paradigms better suited to this kind of problem.
An Apache expert would probably prove me wrong, but I've found it much simpler to use nginx to dispatch multiple domains to different backend types. For instance, I have a couple of Node apps, one WordPress blog and lots of small Django websites.
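For a sense of what that dispatching looks like, here's a minimal nginx sketch (hostnames and ports are made up):

    # nginx.conf, inside http { ... } (hypothetical hosts and ports)
    server {
        listen 80;
        server_name blog.example.com;           # the WordPress blog
        location / {
            proxy_pass http://127.0.0.1:8080;   # e.g. Apache/PHP running WP behind nginx
        }
    }

    server {
        listen 80;
        server_name app.example.com;            # one of the Node apps
        location / {
            proxy_set_header Host $host;
            proxy_pass http://127.0.0.1:3000;
        }
    }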
Apache reverse proxies are very easy to set up once you have done it once. I've reverse proxied Tomcat, PHP and ASP.NET apps just by copy-pasting a few lines in the config.
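The few lines in question are roughly this (a sketch, assuming mod_proxy and mod_proxy_http are loaded; the backend port is made up):

    # in httpd.conf or a vhost (requires mod_proxy + mod_proxy_http)
    ProxyPreserveHost On
    ProxyPass        /app  http://127.0.0.1:8080/app
    ProxyPassReverse /app  http://127.0.0.1:8080/app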
The main strength of Apache, though, is the ability to separate apps running on the same system under different users, without a reverse proxy, via .htaccess files and stuff like mod_php.
Apache also has an excellent security track record considering its vast number of deployments and years of service.
Perhaps, though, if you are a greenfield developer with no Apache experience deploying on stuff like EC2, you may as well just skip Apache and go straight to nginx.
What nginx's security record will look like in 10 years, if it becomes as popular as Apache, remains to be seen.
On the nginx side, the author discusses tweaking sysctl.conf to cut down the number of sockets stuck in TIME_WAIT, plus some other performance tweaks, resulting in a 90% reduction in occupied sockets. On the Node.js side, the author uses the cluster module to fully utilize the available CPU cores, arriving at N-1 as the magic number of Node processes to spawn, where N is the number of CPU cores (rough sketches of both below).
Definitely suggested reading for anyone running Nginx + Node.js
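To make that concrete, the sysctl side looks something like this (illustrative values, not the article's exact numbers):

    # /etc/sysctl.conf (illustrative values; apply with `sysctl -p`)
    net.ipv4.ip_local_port_range = 10240 65535   # widen the ephemeral port range
    net.ipv4.tcp_tw_reuse = 1                    # reuse TIME_WAIT sockets for new outbound connections
    net.ipv4.tcp_fin_timeout = 15                # how long orphaned connections sit in FIN-WAIT-2

And the cluster side, in sketch form (the usual N-1 pattern; the port and response are placeholders):

    // app.js: minimal cluster sketch, N-1 workers
    var cluster = require('cluster');
    var http = require('http');
    var os = require('os');

    if (cluster.isMaster) {
      // leave one core free for nginx and the OS
      var workers = os.cpus().length - 1;
      for (var i = 0; i < workers; i++) {
        cluster.fork();
      }
    } else {
      // each worker shares the same listening socket
      http.createServer(function (req, res) {
        res.writeHead(200, { 'Content-Type': 'text/plain' });
        res.end('ok\n');
      }).listen(8000);
    }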
I believe the section about TCP_FIN_TIMEOUT is wrong. tcp_fin_timeout has nothing to do with the TIME_WAIT state at all; it controls how long orphaned connections sit in FIN-WAIT-2. TCP_TIMEWAIT_LEN is the value that determines how long the kernel holds onto the TCB.
When tcp_tw_reuse is enabled, the kernel can decide to reuse sockets in TIME_WAIT before they expire or are closed by the clients.
This can be a problem, though, because the connection could still be in use by the client, so there can be collisions in the TCP sequence numbers, especially on high-traffic servers.
The kernel can try to avoid these collisions with a technique called PAWS (Protection Against Wrapped Sequence numbers, RFC 1323).
Unfortunately, PAWS works only with tcp_timestamps enabled on both sides (client and server).
tcp_timestamps also has some overhead, so it is often disabled on high-traffic servers, which leads to potential problems.
As for tcp_tw_recycle: when it is enabled, it forces verification of this TCP timestamp.
So behind NAT, multiple clients will send different TCP timestamps to the server over the same mapped connection, which points at the TIME_WAIT socket, and because the timestamps differ, the packets will be dropped by the kernel. This is the reason it is not a good idea to enable tcp_tw_recycle when you use a load balancer or in case of NAT.
A good practice is to enable tcp_tw_reuse (instead of tcp_tw_recycle), to make sure tcp_timestamp is enabled and to decrease the size of the tcp timestamp with tcp_timewait_len.
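In /etc/sysctl.conf terms, the tunable part of that advice would be roughly:

    # sketch of the recommendation above
    net.ipv4.tcp_tw_reuse = 1     # reuse TIME_WAIT sockets for new outbound connections
    net.ipv4.tcp_timestamps = 1   # required on both ends for PAWS to work
    # Note: TCP_TIMEWAIT_LEN is a compile-time constant (include/net/tcp.h),
    # not a sysctl, so shortening TIME_WAIT itself means rebuilding the kernel.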
>>A good practice is to enable tcp_tw_reuse (instead of tcp_tw_recycle), to make sure tcp_timestamp is enabled and to decrease the size of the tcp timestamp with tcp_timewait_len.
A couple of questions: what is tcp_timestamp? I assume you are not referring to tcp_timestamps?
What effect does tcp_timewait_len have on timestamps at all? Isn't it just the amount of time the connection closer holds on to TCBs?
Should nginx only be used for serving static files? Does it have any advantage when used to serve a plain data API? I want to expose a REST API (django + uwsgi) over the web, but I'm not sure if I should use nginx for it.
I think this basically boils down to: "are you likely to have many users connected concurrently at any one time?"
If your API basically involves a client connecting, quickly getting a small JSON/XML response and then disconnecting again, you are probably absolutely fine with Apache unless you have truly enormous numbers of users.
OTOH, if the socket is likely to be held open for a while, because the API responses can take some time to be returned, or the client holds the connection open in order to get a stream of data over time, then you may get more mileage out of nginx.
The service returns quick & short JSON responses, and a huge number of users are going to hit it. So basically there will be an enormous number of concurrent connections, each getting a quick, short JSON response back. No heavy work per connection; there are just a lot of them.
There's probably nothing inherently wrong or slow with running Django through nginx.
That said, one of the most common deployment strategies is gunicorn. It's better documented [1], and it's always good to separate your web app server from your static file server/CDN.
I've had great success with nginx as the main entry point, serving static files and directing traffic to Django w/ gunicorn. Add a small supervisor setup and you've got a very simple but robust server.
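The supervisor piece is only a few lines; a sketch with made-up paths and names:

    ; /etc/supervisor/conf.d/myapp.conf (hypothetical paths and names)
    [program:myapp]
    command=/srv/myapp/venv/bin/gunicorn myproject.wsgi:application --bind 127.0.0.1:8000
    directory=/srv/myapp
    user=www-data
    autostart=true
    autorestart=true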
We use nginx - uwsgi - django and are quite pleased with the combination. Between nginx and uwsgi there are plenty of configuration options to let you optimize for your particular use case, and we don't see any potential issues in terms of adding new capabilities to our setup (minus WebSockets, but as mentioned in another comment that is coming soon / available with plugins).
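The nginx side of that stack is only a handful of lines; a sketch with a made-up host and socket path:

    # nginx server block in front of uWSGI (hypothetical names/paths)
    server {
        listen 80;
        server_name api.example.com;

        location /static/ {
            alias /srv/myapp/static/;    # nginx serves static files directly
        }

        location / {
            include uwsgi_params;        # ships with nginx
            uwsgi_pass unix:/tmp/myapp.sock;
        }
    }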