Wordpress.com serves 70k requests/second using Nginx

jacques_chester · on Sept 27, 2012

I've been administering Wordpress blogs for almost a decade now; hosting for ... 5? 6 years? I'm not sure any more.

Anyhow. I started with Apache. Apache probably works fine if you load the mpm-oh-you-didnt-know-about-mpm-so-sorry module.

My next stop was lighttpd. A fine piece of software, with a persistent bug that caused it to drop into a 500 state if FastCGI instances were ever unavailable.

So now I'm on Nginx. And with some fancy footwork in the rewrite rules to support WP Super Cache, it can serve static files and gzipped HTML straight off disk.

Unsurprisingly, in this configuration, it runs like the clappers.[1][2]

One thing that always surprises me is seeing configurations where Nginx is merely a proxy or load balancer into a bunch of Apache instances. Are you nuts? Stop doing that. Just use Nginx and FastCGI -- thank me later.

[1] At least, it used to do this, until I updated everything and it silently stopped working the way it used to (which I only discovered while outlining my current config in comments below). I mean, thanks to memcached and closer servers and running Percona on a separate server it's still pretty fast, but this change bugs me and you can bet I'll be fixing it later today after work.

[2] Two optimisations I've dropped are the MySQL query cache and a PHP opcode cache.

The query cache because I don't think it buys me enough versus using memcached (just a gut feel, I've got no hard numbers to back me up) and because it often turns into a contention point in read-heavy MySQL instances. Also, by dropping it I can free up more memory for InnoDB.

The opcode cache because they tend to be flaky. I've had bad experiences with all three major ones. If you ever pop Wordpress up on a profiled PHP instance you find that its time is spent overwhelmingly on waiting for MySQL, concatenating strings and then streaming it out to the webserver, for which proper on-disk/in-memory caching is the answer. Time spent loading and compiling .php is minimal by comparison, so why put up with the hassle?

I don't use varnish because the whole philosophy is to let the OS select what to cache in memory, and ... well I already do that. Plus getting ESI to work on frequently updated objects like "recent comment" widgets is a hairy pain in the arse and I just can't be bothered.

mrb · on Sept 27, 2012

My next stop was lighttpd. A fine piece of software, with a persistent bug that caused it to drop into a 500 state if FastCGI instances were ever unavailable.

It is not a bug, it is a feature. lighttpd used to wait 60s before checking the backend again. Nowadays the default is 1s. Set disable-time to 0 if you don't like it (it should be the default IMHO.)

http://redmine.lighttpd.net/projects/lighttpd/wiki/Docs_ModF...

jacques_chester · on Sept 27, 2012

The problem was not that lighty gave a 500 error for transient unavailability.

It's that it got stuck in 500. I would have to log in every week or so and restart it when this happened.

tlack · on Sept 27, 2012

In case you're wondering whatever happened to this bug, it's still happening! I installed the latest release of Lighttpd a few weeks ago and it happened on one of our sites. So amazing to see the same bug still kicking after 3-4 years..

martindale · on Sept 27, 2012

I have this bug/feature as well, switched to nginx as a result.

ck2 · on Sept 27, 2012

I have uptime of weeks with APC opcode cache without problems on very heavy loaded servers.

You can also run a script to watch for PHP/APC segfaults and just restart the the cache.

I cannot imagine running PHP without an opcode cache, you are losing a speedup of 300% to 500%

barsxl · on Sept 27, 2012

The majority of our transient bugs on WordPress.com are probably related to APC opcode issues. We try to catch them and handle it gracefully, but it isn't always so easy.

jacques_chester · on Sept 27, 2012

Out of the opcode caches I've tried, XCache was the most stable. APC was an utter bomb, I wouldn't trust it with burnt hair.

mikescar · on Sept 27, 2012

I'm interested to hear what versions of PHP and APC you used.

In my experience, stable APC releases have been very stable, but beta APC can poop out pretty bad, even if the changelog doesn't indicate anything that might affect your application.

I've used stable APC + PHP + Apache releases to do some large things, so not sure why APC is an utter bomb in 2012.

jacques_chester · on Sept 27, 2012

We're talking ... late 2007 here. I must confess that I didn't keep an engineering diary.

mh- · on Sept 27, 2012

really? that might have been worth mentioning in all your other comments trashing opcode caching. that was 5 years ago.

mikescar · on Sept 27, 2012

No kidding. There really aren't "three major opcode caches" anymore, APC won out years ago and is now tightly integrated with PHP compiles and as a module available in any package manager.

ZoFreX · on Sept 27, 2012

It's worth revisiting. We had huge problems with it when we tried it out in 2007, but I tried it again earlier this year and was pleasantly surprised - I got everything working, easily, and nothing broke. And I got that speed up.

My gut feeling tells me that your profiling is correct, however - any significant PHP application (that isn't written terribly inefficiently) will have its bottleneck in data access and not the parse/compile stage. There's very little point to speeding up by 250% a portion of your app that only accounts for 2% of execution time!

evansolomon · on Sept 27, 2012

APC is also the most likely by 1000x to be merged into PHP core, so I would rather use/support/fix that.

jacques_chester · on Sept 27, 2012

True, and if it is, I might use it when that happens.

Maybe.

ck2 · on Sept 27, 2012

I'd be interested to read on your blog about what APC problems you've encountered - we have many wp installs with APC and the only problem is it will segfault once in a blue moon but we can automatically restart when that happens.

jacques_chester · on Sept 27, 2012

> You can also run a script to watch for PHP/APC segfaults and just restart the the cache.

... wat

> I cannot imagine running PHP without an opcode cache, you are losing a speedup of 300% to 500%

That's just not what my profiling showed.

ck2 · on Sept 27, 2012

You mentioned opcode cache problems - some people complain of segfaults and I am just pointing out you can make it restart if that ever happens. (I've had it trigger three times this entire year so far.)

There is most certainly a serious load reduction when using an opcode cache. It's not just common sense, you easily find a dozen independent benchmarks on the web proving it.

Now if you had the opcode cache running incorrectly you might not see the improvement (ie. some people misconfigure it where the memory is not shared and persistant, and instead gets destroyed as php children are created/removed).

jacques_chester · on Sept 27, 2012

My point is that loading, parsing and interpreting PHP is rarely the bottleneck in Wordpress.

In terms of microbenchmarks, opcode caches look spectacular. In terms of the larger stack, given the way Wordpress works? Meh, a few percent.

It might be worth it for a big site to shave a few dozen servers off the bill. But the overhead imposed by flakiness is not worth my time.

mh- · on Sept 27, 2012

I've never heard of anyone not get incredible speedups with APC. We're talking 250%+. Every (large) production PHP deployment I've seen, disabling APC would make everything fall over. They would need 3-4x as much capacity.

andrewaylett · on Sept 27, 2012

I think the big point here is that on a Wordpress site, most hits don't actually get as far as PHP -- most of the time, pages can be served statically from wp-supercache or similar.

velodrome · on Sept 27, 2012

"One thing that always surprises me is seeing configurations where Nginx is merely a proxy or load balancer into a bunch of Apache instances. Are you nuts? Stop doing that. Just use Nginx and FastCGI -- thank me later."

Problem is - you can't flush the output buffer.

nnq · on Sept 27, 2012

Isn't there a known fix for this for like a year or so? (http://www.justincarmony.com/blog/2011/01/24/php-nginx-and-o...)

velodrome · on Sept 27, 2012

It is only a 1k chunk. This sucks for chat-type applications (<1k per message) and slow apps.

Also, in that example, you can't use gzip. If I use a nginx+apache setup, I can do output flushing and gzip. I hope nginx fixes this issue.

*This affects php/fcgi.

nnq · on Sept 28, 2012

Nice to know! Thanks for clarifying.

_hnwo · on Sept 27, 2012

I'd be interested in seeing your 'fancy footwork' for the rewrite rules :)

jacques_chester · on Sept 27, 2012

Edit: actually, it looks like this doesn't work the way I think it does ... looks like nginx has changed enough between 0.7 and 1.0 that my original config doesn't work as correctly as it used to.

Edit 2: Actually I think WP Supercache is the one that changed.

Edit 3: This config looks more up-to-date than mine -- http://rtcamp.com/tutorials/nginx-wordpressmultisite-subdoma...

I particularly liked how they used the try_file directive to cut down on the if statements.

----

Like most people I've cobbled together what I found through Google searching. My case is further complicated because I run a multisite configuration and use the WP Domain Mapping plugin.

It works in a few stages.

WP Supercache is configured to gzip pages to disk. It used to be that you needed to add extra code by hand to support domain mapping, but WP Supercache now cooperates and puts on-disk cached files for each blog in its own directory.

Then I put specific directives in a sites-available/ config file, not the main Nginx file (I serve a static site off the same server).

First,

    server {
        listen <ipaddress> default_server;

The default_server directive means that if another listen directive doesn't pick up an incoming request, it will be handled by this config. Otherwise multisite goes kerflooie.

I like my logs to be divided by site, so:

        access_log  /var/www/log/wordpress/<sitename>.access.log;
        error_log   /var/www/log/wordpress/<sitename>.error.log;

Getting end-to-end UTF8 on a stack is a hassle because you have to do it in the database (multiple times, MySQL has about two hojillion charset configuration dials), in PHP and then in Nginx:

        charset utf-8;

Then the obvious:

        root /var/www/wordpress;

        error_page 500 501 502 503 504 = /50x.html;
        location = /50x.html {
            root /var/www/wordpress;
        }

Now for the meat:

        location / {
            # Add trailing slash to */wp-admin requests.
            rewrite /wp-admin$ $scheme://$host$uri/ permanent;

The rewrite is just a little tweak. If you go to /wp-admin, Wordpress will often redirect to the home page. If you go to /wp-admin/, it does what you expect. So this rewrite just adds a trailing slash.

            index  index.php;

For multisite, the Wordpress coders change the URLs files are served from, for reasons that are simply beyond my mortal comprehension. Anyhow, you need a rewrite rule for /files/ URLs:

            # Redirect /file/ URLs.
            rewrite    ^.*/files/(.*)    /wp-includes/ms-files.php?file=$1 last;

Then what follows is based on the common config file you'll see on a dozen blog and forum posts if you google around.

The next thing to do is try and serve a file straight off disk. Nginx does things in an unintuitive order, so even though this rule is further down, it will often fire first:

           # Rewrite URLs for WP-Super-Cache files
           # if the requested file exists, return it immediately
            # this covers the static files case
            if (-f $request_filename) {
                expires 15d;
                break;
            }

The "break" directive basically says, "Oh you found whatever.css|jpg|js? Just serve that up kthxbai".

Now we proceed to determine whether or not we'll try to serve a cached file off disk:

            set $supercache_file '';
            set $supercache_uri $request_uri;

            # don't interfere with POST events (ie form submissions)
            if ($request_method = POST) {
                    set $supercache_uri '';
            }

            # Using pretty permalinks, so bypass the cache for any query string
            if ($query_string) {
                    set $supercache_uri '';
            }

            # Don't show cached version to logged-in users
            if ($http_cookie ~* "comment_author_|wordpress|wp-postpass_" ) {
                    set $supercache_uri '';
            }

If there's still a supercache URL, we try to serve it straight off disk:

            # if we haven't bypassed the cache, specify our supercache file
            if ($supercache_uri ~ ^(.+)$) {
                    set $supercache_file /wp-content/cache/supercache/$http_host/$1index.html;
            }

            # only rewrite to the supercache file if it actually exists
            if (-f $document_root$supercache_file) {
                    rewrite ^(.*)$ $supercache_file break;
            }

Otherwise, give up and let Wordpress grind out the file:

          # all other requests go to Wordpress
          if (!-e $request_filename) {
              rewrite ^.+?(/wp-.*) $1 last;
              rewrite ^.+?(/.*\.php)$ $1 last;
              rewrite ^ /index.php last;
            }
    }

    location ~ .php$ {
        fastcgi_pass  127.0.0.1:9000;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME /var/www/wordpress$fastcgi_script_name;
        include /etc/nginx/fastcgi_params;
    }

From the Wordpress wiki I picked up a few nifty directives to make log files a little less cluttered; I've thrown those in a common file which I include here:

      include /etc/nginx/clean.conf;
    }

clean.conf looks like this:

    # Global restrictions configuration file.
    # Designed to be included in any server {} block.</p>
    location = /favicon.ico {
      log_not_found off;
      access_log off;
    }

    location = /robots.txt {
      allow all;
      log_not_found off;
      access_log off;
    }

    # Deny all attempts to access hidden files such as .htaccess, .htpasswd, .DS_Store (Mac).
    # Keep logging the requests to parse later (or to pass to firewall utilities such as fail2ban)
    location ~ /\. {
      deny all;
    }

    # Deny access to any files with a .php extension in the uploads directory
    # Works in sub-directory installs and also in multisite network
    # Keep logging the requests to parse later (or to pass to firewall utilities such as fail2ban)
    location ~* /(?:uploads|files)/.*\.php$ {
      deny all;
    }

The final step is to add this magic directive to your master nginx.conf:

    gzip_static on;

This tells Nginx to look for a file with a .gz extension. So when it gets redirected to look at whatever-blog.com/index.html, it will also check for index.html.gz. If it finds that file, it will serve it instead.

relix · on Sept 27, 2012

Googling is about the worst way to set up an nginx server configuration! The only tutorials and helpful snippets out there are riddled with bad practices.

Here's a list of "pitfalls" that are commonly used and really shouldn't: http://wiki.nginx.org/Pitfalls

I notice that you're using a lot of "if" statements. I hope you've read this http://wiki.nginx.org/IfIsEvil. I understand that in this case it might be harder to find alternatives for the if-statement, haven't looked into what you need really deeply.

jacques_chester · on Sept 28, 2012

It's been a few years since I cobbled this one together and I recall reading the IfIsEvil web page. I vaguely recall that at the time there was some limitation on how try_files worked that made it necessary to use ifs, but for the life of me I don't remember what.

The rtCamp configuration I linked to earlier makes use of try_files and says he has support for both supercache and domain mapping, so I think I'll migrate to that.

ZoFreX · on Sept 27, 2012

+1 to avoiding if. Tryfiles is amazing!

rahul286 · on Sept 27, 2012

Thanks Jacques. Author of that updated config post on rtCamp.com here.

I really think Nginx in still underestimated by large. Most article handles rewrite in nginx based on their Apache knowledge.

Nginx's try_files is magical. So does maps{..} section.

Using Nginx map, you can server static files in WordPress-multisite without PHP (much better than X-SendFile or X-Accelredirection). On large wordpress-multisite network, this can really increase Nginx's capacity by multifold.

See - http://rtcamp.com/tutorials/nginx-maps-wordpress-multisite-s...

bravura · on Sept 27, 2012

This deserves to be a full blog post.

adambenayoun · on Sept 27, 2012

Same here - interested to see that config :) --

On a side note - you're completely right - whoever use nginx as a reverse proxy should just stop doing so and transition to nginx and drop apache.

It's not that Apache isn't good - it's just that Nginx is just way better.

smcnally · on Sept 27, 2012

What's Percona's role in this environment?

jacques_chester · on Sept 27, 2012

It's slightly nicer than stock MySQL and I can more easily get Percona to help me if it goes bang (it has far more stats available than stock MySQL, for instance).

Some of my sites have sufficient write activity that using XtraDB (basically a slightly souped-up InnoDB) is a smarter option than MyISAM. For searching people can just use Google, it does a much better job than MySQL's inbuilt fulltext search.

I don't know if they've fixed it, but AFAIK tables with full text fields used to cause MyISAM joins to go to disk, even if the full text field isn't included in the query. As you can imagine, this sucks. Maybe that's been fixed.

barsxl · on Sept 27, 2012

We started using Percona's MySQL builds by default a couple of years ago. There are some nice performance and convenience features not included in the MySQL.com builds. (Un)fortunately we still have quite a few MySQL 4.1 instances happily running and are still using MyISAM across over 360 million tables. Most of that stuff is not running Percona.

mikescar · on Sept 27, 2012

Not sure in this environment, but I've been using Percona builds of MySQL for the last couple years and it's a great drop-in. Good optimizations for InnoDB Oracle-owned bits.

asithinketh · on Sept 27, 2012

When I first read about Varnish it was framed as a "correct" implementation of what Squid was trying to do. It seems like people are using Varnish in situations where they would not otherwise be using Squid.

gruseom · on Sept 27, 2012

I don't use varnish because the whole philosophy is to let the OS select what to cache in memory, and ... well I already do that.

I don't follow how you already do that. Do you mean the OS disk cache?

ddorian43 · on Sept 27, 2012

Varnish caches the rendered html and stores it in disk for os to cache like mongodb with mmap.

Wordpress super cache plugin generates .html files from the pages and saves it in the disk for the os to cache.

jacques_chester · on Sept 28, 2012

This is what I meant.

Varnish leaves it to the OS to decide what pages to keep in RAM and which to leave on disk; essentially to avoid the double-buffering problem and because the OS has a larger view of the total machine's requirements.

But with wp-supercache I already have that approximate architecture. Files are on-disk, Nginx selects them, the OS notices that some files are frequently accessed and silently caches them in RAM. Everyone wins.

And I don't need to add ESI directives to my themes to get Varnish to handle frequently-updated material correctly.

gregcmartin · on Sept 26, 2012

I was one of the core engineers @ layeredtech who managed the servers and HA for wordpress.com when they launched in 2006.

If I remember correctly we were using DNS round robin and haproxy -> apache -> mysql all on freebsd systems wow have things come a long ways since then also it's incredible the sustained growth of Wordpress after all this time. good memories... congrats Matt on all your success.

barsxl · on Sept 27, 2012

WordPress.com hasn't run on FreeBSD since the TextDrive days which was way before Layered Tech, before me and before WordPress.com was open to the public. We are 100% Debian today, but have used Ubuntu in the past. We never really used HAProxy either. I posted about out load balancer choices back in 2008 - http://barry.wordpress.com/2008/04/28/load-balancer-update/

ck2 · on Sept 26, 2012

Those must be cached pages from dozens of servers on the backend.

There is no way in heck it's realtime queries.

I can tell when authors are in the WP backend on the server just by looking at the server load because it's crippling.

The wordpress bottleneck is not nginx vs apache vs whatever, it's the problem of loading hundreds of files for any kind of page render (even to just authenticate for ajax, etc.) and over a hundred db queries in many cases.

A cache-miss in WP is a terrible, terrible thing.

Negitivefrags · on Sept 26, 2012

Cached or uncached is not relevant to the article.

They are talking about using nginx as a load balancer. It just proxies all requests off to the application servers that actually prepare the results. This article is really just about the fact that nginx can efficiently service a very impressive number of connections at once.

Those backend servers will be using caches of course though.

barsxl · on Sept 27, 2012

Many more dynamic pages than you would expect. We do hundreds of thousands of database queries/sec. We do some caching with Batcache - http://wordpress.org/extend/plugins/batcache/ but it's more to handle large spikes in traffic for single pages/blogs. CNN during the US elections, for example.

nisa · on Sept 27, 2012

Wordpress is only reasonable fast if you use APC, memcached and the MySQL query cache. I never liked the caching plugins. If you use nginx just use the fastcgi_cache module and make cookies part of the cache key or omit the caching of pages where the login cookie is set.

ck2 · on Sept 27, 2012

APC+mysql cache (not just query but keycache too) is a must, but memcached is pointless unless you have a multi-server website (on a single server APC shared memory is way faster than memcached).

All logged out users should be served a completely cached static page that bypasses PHP entirely, it's way faster than even an opcode cache and way less load.

WP-Super-Cache can serve static pages to logged out users and makes this rather easy to setup. It's practically a must for wordpress.

nisa · on Sept 27, 2012

I've did some experiments - and yes APC is a little bit faster (1-5ms per page rendering) when used as key-value store in comparision to memcached via unix socket. However I had always problems with cache fragmentation and a resulting slowdown. If you use Apache mod_fcgid the PHP processes are beeing shot down and started at random and APC is is not shared acress them. With nginx and PHP-FPM is is not so true anymore. I've had the impression memcached is more solid.

For static caching I use nginx with fastcgi_cache. This makes WP-Supercache obsolete. I bybass the cache for requests containing certain cookies, so that logged in users always receive fresh pages.

Another big Wordpress performance sink is locales. If you have a blog in a non-english language consider using a language-caching plugin. Wordpress uses a PHP gettext implementation that is quite slow. 20% speedup if you cache the generated locales.

But it's a shame to spend days or months on tuning Wordpress, what is essentially just a blog engine.

bad_user · on Sept 27, 2012

I don't know how the caching plugin in WP works, however the most difficult part of caching is always cache invalidation, which is the reason why Memcache is better than an in-memory cache, even for a single server, because then you have the flexibility of invalidating cache entries from a background process that does not run on that server.

It's also great to have the flexibility of adding more servers in case of a huge spike, even if you are running on only one server. Like on Heroku, where you can just increase the number of dynos in realtime. Doing that without a central cache that every server can access can break your MySQL because each new server will have a cold cache.

hpatoio · on Sept 28, 2012

If you want a smart cache invalidation system for NGINX please have a look at this plugin http://bit.ly/nginxmanager

Pages are cached on NGINX for not logged in user, and when a new content is published all and only the pages that contains that content are deleted from the cached and regenerated at the next request.

piers · on Sept 26, 2012

Don't forget the version of Wordpress that you download is very different from the version that's on Wordpress.com

diego · on Sept 26, 2012

It's actually not that different, and of course they make as much use of caching as possible.

So does my self-hosted wp blog, for that matter. If you are using WP Super Cache correctly, you can deal with significant amounts of traffic. I've seen peaks of hundreds of simultaneous users on my blog, with a server load < 1. The server is a modest aws small instance.

ck2 · on Sept 26, 2012

Single author mini-blogs without logged in users are super easy to cache.

But multi-author blogs with thousands of logged in users is a nightmare with wp.

I suspect 90%+ of wp blogs are in the mini-blog category though.

TillE · on Sept 26, 2012

Why would multiple authors make a difference?

ck2 · on Sept 26, 2012

Every time an article is created, saved or published, wp uses hundreds of non-cachable queries.

Every time an article is published, it causes the cache to be deleted for not only that article but related pages, which means all those pages have to be rendered again.

For one author, that can be managed. Many authors, the cache is constantly being defeated.

diego · on Sept 26, 2012

What matters is the ratio of reads per write. Blogs with multiple authors have a higher ratio than blogs with single authors.

batista · on Sept 27, 2012

>For one author, that can be managed. Many authors, the cache is constantly being defeated.

That's irrelevant. It's the pages view count that counts, not how many authors are in the same cms.

ck2 · on Sept 26, 2012

I know they replace the database class on wp.com to support replication (I think they even released the code once) but I don't think they do extensive changes otherwise unless you've actually read otherwise.

robmil · on Sept 26, 2012

It's called HyperDB, and they did indeed release it: http://codex.wordpress.org/HyperDB

stickhandle · on Sept 27, 2012

Two takeaways - (1) The article is bumf wrt the Wordpress.com setup. NGINX plays a role, but the thousands of servers for http, db, memcache, load balancing etc is integral to this discussion (and not mentioned). A typical reader couldn't relate if it did. (2) Directed at a more typical reader, use NGINX for hosting your WP.org blogs, company sites etc. Properly configured, your performance to cost ratio will be terrific. Lastly, on a personal note, it pains me when caching plugins are mentioned as the main part of the solution. No need for these folks. Use the NGINX cache instead and never worry again about a buggy/outdated cache plugin (looking at you W3TC) again. Remove that dependency and you will be happier and possibly better looking.

mstank · on Sept 27, 2012

Article didn't mention that they also use a CDN (EdgeCast). Helps make the 70k requests/second a lot more bearable.

barsxl · on Sept 27, 2012

We serve about 150k req/sec from the CDN. That (obviously) isn't included in the 70k number from the article. The 70k is only what we are serving from the origin.

misiti3780 · on Sept 27, 2012

"Ability to reconfigure and upgrade NGINX instances on-the-fly, without dropping user requests."

- how do you do this ?

mfjordvald · on Sept 27, 2012

It's explained in detail here: http://wiki.nginx.org/CommandLine#Upgrading_To_a_New_Binary_...

If you are compiling via source then you can also run "make upgrade" after "make install" and it will do it for you.

j2labs · on Sept 27, 2012

SIGHUP

nodesocket · on Sept 27, 2012

`service nginx reload` will do the trick as well.

mfjordvald · on Sept 27, 2012

Only for reloading the config, not for upgrading the binary.

sowhata · on Sept 26, 2012

And how many requests/sec per CPU?

They have like 300 servers and who knows how many cores.

What kind title is this? The target High Scalability reader must be presumed to be a complete fool.

HyprMusic · on Sept 26, 2012

According to the article they have 2 thousand servers. These are Dual Xeon 5620 4 core CPUs with hyper-threading.

So that averages out to: 70000 / (4 * 2000) = 8.75 requests per second per CPU.

That seems like quite a low number, I would presume that the load would be shared amongst less servers than this and the others would be used for replication/redundancy.

vhf · on Sept 26, 2012

It says 12 datacenters distributed around the globe.

I think we can safely assume each datacenter holds a dedicated load-balancing server. Let's assume they have only one of these at each datacenter, it gives us 70000 / (4 * 12) = 1460 requests per second per CPU, which is in line with reliably handling over 10,000 request per second of live traffic to WordPress applications from a single server.

and

In April 2008 Automattic converted all WordPress.com load balancers from Pound to NGINX. , which points to the fact that they're only talking about load balancers and they have several of them.

This link implies that one NGINX load-balancing instance cannot handle 70k r/s.

seldo · on Sept 27, 2012

It's definitely not one load balancer, but nor is it 70krps distributed across 2000 servers. I also think having only one per datacenter would be a bit risky -- I'd have at least 2. It's very frustrating that the article doesn't make clear how many load balancers they run.

barsxl · on Sept 27, 2012

WordPress.com currently has 12 load balancers per data center. They are HA and used for different subsets of traffic.

vhf · on Sept 27, 2012

Okay, so we can now do the "math" :

70 000 / (4 * 12 * 12) = 121.52 request per CPU

cowsaysoink · on Sept 26, 2012

I don't think that is right, it says that automattic has 2000 servers but they also run things like gravitar, akismet, and vaultpress. Which makes me think that using that number is completely wrong. (Also these are just the load balancers not the back end, which could be taking up a large amount of that 2000 server count)

From the article it seems more like they have ~100, maybe less, for the load balancers which comes out to 175 requests/second per cpu which is getting a little more reasonable.

barsxl · on Sept 27, 2012

Of the 2000 servers, about 90% of them are running something related to WordPress.com. There are 36 "load balancer" servers in total. I added up the req/sec across those 36 machines and came up with the 70k/sec number in the article. The requests aren't evenly distributed across that subset of machines though, so you can't just divide evenly to figure out a req/sec/CPU rate. I left another comment in this thread that mentions 5k req/sec on a "normal" load balancer and that the limiting factor isn't Nginx CPU usage.

sowhata · on Sept 26, 2012

If this is just about load balancing, then they could probably get better performance from haproxy, right? Anyone should be able to exceed 200 req/s with haproxy. In fact, you should be able to get around double that or more.

The point is the title is useless. It tells you almost nothing. That blog is a joke.

cowsaysoink · on Sept 27, 2012

Well it is average req/s so peak should be much higher, and they probably can average much more than what they are doing now in case of spikes and other such things.

But yeah the title is pretty much useless.

robmil · on Sept 26, 2012

They have two thousand servers total; that includes database servers, memcached servers, regular backends, and load balancers.

Peter Westwood gave a talk on WP.com's infrastructure in London in January; I've tried to find slides online but I can't, which is a shame because he went into quite a bit of detail about their nginx/HyperDB/memcached/mogileFS setup.

moepstar · on Sept 27, 2012

There's a video from WordCamp 2012 where Barry does some Q&A about large WordPress setups:

http://wordpress.tv/2011/08/31/barry-abrahamson-ask-barry-ab...

ww520 · on Sept 27, 2012

2000 servers? My god. Aren't most blog posts just static pages or generated static pages? A good html cache setup should help a lot.

seldo · on Sept 27, 2012

They can't possibly mean 2000 of the beefy servers described as being the load-balancers, or it wouldn't be impressive at all. But it's very frustrating that the article gives the specs of the load balancers without saying how many of them they need.

barsxl · on Sept 27, 2012

As with any tiered web application, the requests aren't evenly distributed across all the servers :) Most of our main Nginx proxies serve about 5k req/sec and have about 50k established connections. They are usually 8 cores + HT for a total of 16 "threads". They aren't at 100% utilization and the limiting factor isn't Nginx CPU usage - it's software interrupts generated by the NICs or bandwidth or something else... We have seen single machines serve upwards of 20k req/sec under "real world" conditions before.

tsahyt · on Sept 27, 2012

This figure is not from a single instance though. When running a single instance of WP I've always found it to be fantastically slow. The greatness in the described stack really lies in Nginx. It's really not WordPress serving 70k requests here, it's some decent load balancing that achieves the effect. The title is somewhat misleading.

ocharles · on Sept 27, 2012

Yea, I was trying to find some information on exactly how many servers are responsible for content generation. If this is about 100 machines then yes, 70k/s is pretty damn impressive. If it's 1000 machines, then I'm a little less impressed.

zerop · on Sept 27, 2012

We dropped nginx because of no native support for session affinity. SSL support has been introduced in HAProxy very recently.

lazyfunctor · on Sept 27, 2012

http://wiki.nginx.org/HttpUpstreamRequestHashModule this should work for you

Negitivefrags · on Sept 27, 2012

Is the ip_hash directive not good enough for what your looking for?

zerop · on Oct 4, 2012

I was not aware of ip_hash. But as per documentation it uses first 3 octets as hashing key, this may not work for me.I want to use nginx on LAN. It will throw all clients to same backend server.

http://nginx.org/en/docs/http/ngx_http_upstream_module.html#...

Negitivefrags · on Oct 6, 2012

It should be fairly trivial to patch your copy of nginx to use the entire IP address as the hashing key.

haight_geek · on Oct 9, 2012

1 instance of G-WAN serves 500k requests/second with an typical application using Oracle NoSQL (or Berkeley DB).