This is bollocks. The argument is that you can only have ~60k connections due to the limit of ephemeral ports. This applies on a per-client basis, so you can have nearly limitless connections. I wrote about this here under "The 64k connection myth":
Read the blog again. This is bollocks if you are running your own server, where you can be multi-homed, add loopback aliases, etc., and crank up the memory to support 100,000+ sockets. I'm talking PaaS, where your app is in a multi-tenant environment with no root access.
Okay, we are talking past each other. If you have a server on an EC2 instance with enough RAM, you can support more than 64K inbound connections for exactly the reasons you mention. Try generating more than 64K outbound connections, though, and you'll hit exactly what I'm talking about.
Yeah, inbound connections to your service are not the issue the article is discussing. It's the outbound connections (for example, to memcached or MongoDB) that your service makes in response to those potentially-greater-than-64k inbound connections that might trip you up. The site I'm working on right now makes 4 memcached requests for a logged-in user hitting the homepage; that design may well fall apart at ~16k simultaneous connections if it's running on a shared PaaS-type host (or well before that if I'm sharing resources with another customer on the same PaaS hardware as my app).
Thinking more about it, while there is a limit here, I'm pretty sure it's one I don't need to care about.
My current project makes 4 outbound requests for some pretty common pages. That'd limit me to ~15k simultaneous page creations, which even at 250ms response times (I'm aiming for under 50ms) implies a limit of ~60,000 page views per second (or ~300,000 page views per second at my target 50ms page render time). If my app is getting 5 billion page views a day, I don't think I'm going to be worried about the limits of shared PaaS hosting...
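For anyone checking my arithmetic, here it is as a quick sketch, using round numbers and assuming ~60k usable ephemeral ports to a single backend:

    // Back-of-envelope numbers from the comment above (round figures only).
    const usablePorts = 60_000;        // rough ephemeral-port budget to one backend IP:port
    const callsPerPage = 4;            // outbound requests per page
    const concurrentPages = usablePorts / callsPerPage;        // ~15,000 pages in flight

    const renderTimeSec = 0.25;        // 250 ms per page (pessimistic case)
    const pagesPerSecond = concurrentPages / renderTimeSec;    // ~60,000 pages/s
    const pagesPerDay = pagesPerSecond * 86_400;               // ~5.2 billion pages/day

    console.log({ concurrentPages, pagesPerSecond, pagesPerDay });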
That math is not quite correct. There is a timeout (TIME_WAIT) before the TCP stack will release a port for use in a new connection. That timeout can also be tuned, but just saying, it's not that simple...
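Rough sketch of what that timeout does to the numbers, using illustrative defaults (the actual port range and TIME_WAIT duration are OS-dependent and tunable), and assuming connections are opened and closed per request rather than kept alive:

    // Churn limit when each outbound connection is closed and its port then
    // sits in TIME_WAIT before it can be reused (illustrative values only).
    const usablePorts = 28_000;      // e.g. a default range like 32768-61000
    const timeWaitSec = 60;          // typical TIME_WAIT duration
    const newConnsPerSec = usablePorts / timeWaitSec;   // ~470 new connections/s
                                     // to one backend IP:port, unless you keep
                                     // connections alive or tune the stack
    console.log(newConnsPerSec);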
This is not 1965, and we are not dropping punch cards with an operator and coming back hours or days later when our program has worked its way to the front of the queue. Five billion page views per day is irrelevant. Your peak traffic is what matters. Hitting 60,000 views in any given second is a lot easier than hitting 5 billion in a day.
Strange to warn of a problem having already dismissed the solution (pooling). Unless your backend actually has fifty thousand cores, making fifty thousand concurrent connections to it accomplishes nothing more than moving the bottleneck around while defeating TCP flow control. Capacity is always finite, some part of the system is inevitably going to play the role of a bounded queue, and you want that visible and close to (preferably in) the frontend so you can shed load through degraded service.
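A minimal sketch of that "visible bounded queue in the frontend" idea, with a made-up in-flight budget and a stand-in for the real backend call (TypeScript/Node):

    import * as http from "http";

    const MAX_IN_FLIGHT = 200;   // hypothetical: sized to what the backend can actually do
    let inFlight = 0;

    // Stand-in for the real backend work (DB/cache calls).
    async function doBackendWork(req: http.IncomingMessage): Promise<string> {
      return "ok";
    }

    const server = http.createServer((req, res) => {
      if (inFlight >= MAX_IN_FLIGHT) {
        // Shed load here, visibly, instead of piling invisible queues onto backend sockets.
        res.writeHead(503, { "Retry-After": "1" });
        res.end("busy");
        return;
      }
      inFlight++;
      doBackendWork(req)
        .then((body) => res.end(body))
        .catch(() => { res.writeHead(500); res.end(); })
        .finally(() => { inFlight--; });
    });

    server.listen(8080);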
I wasn't able to understand what the author was trying to say under "Connection Pools"--is he suggesting that the database should be written using node.js so that, presumably, it can "scale out" and not require a connection pool?
We had a node.js app where each inbound request would go through a set of HTTP connections to the back-end CouchDB. When the concurrency reached around 1,000, there were 1,000 requests being made through 10 pipelined CouchDB connections. The result was that web requests started getting slower and slower. Think of a 6-lane road merging into a 1-lane road. Congestion.
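Roughly the shape of it, as a sketch (not our actual code; the CouchDB host, port, and path here are placeholders):

    import * as http from "http";

    // Only 10 sockets to CouchDB; every request beyond that queues behind them.
    const couchAgent = new http.Agent({ keepAlive: true, maxSockets: 10 });

    function fetchDoc(id: string): Promise<string> {
      return new Promise((resolve, reject) => {
        const req = http.get(
          { host: "couch.example.internal", port: 5984, path: `/db/${id}`, agent: couchAgent },
          (res) => {
            let body = "";
            res.on("data", (chunk) => (body += chunk));
            res.on("end", () => resolve(body));
          }
        );
        req.on("error", reject);
      });
    }

    // With ~1,000 inbound requests in flight, all of them funnel through those
    // 10 sockets, so latency climbs as the agent's request queue grows.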
So your assertion is that if you bumped up the CouchDB connection limit to 1,000 (believe it or not, Erlang--in addition to Node--is capable of this), then this slowdown would not have occurred?
My point is that the thing you are connecting to in the back-end has to scale at the same level as Node, if not better. I'm sure making independent non-pipelined requests to CouchDB will help, but if CouchDB views are slow or it can't handle the concurrency, then the web requests to node.js will start slowing down, resulting in a pipeline stall.
Sure, if you're trying to shove 1,000 things through 10 connections you're going to get contention. But you could open 10K pooled connections to several different backends and still stay under the 64K limit; this should be OK up to concurrency of ~100K. Beyond that, there's always SCTP or SPDY...
The "speed limit" is on the outbound connections from Node to the service. Node calls connect(), and receives a new port exclusively for that one connection.
A server facing the Internet can serve lots of clients because those clients have plenty of IP+port combinations to go around on their end, to allow the server to tell the difference among them even though it only has the one IP+port on its end. But 60K node.js connections from one machine, the frontend server with a single IP address, to a single IP+port on the backend server, do not have that luxury. All that identifies the connection now is the port number on the Node server, so it must be unique per connection.
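You can watch this happen from Node itself: every outbound socket to the same backend IP+port gets its own local ephemeral port, and that local port is the only part of the connection 4-tuple that varies (sketch; the backend address is a placeholder):

    import * as net from "net";

    // Open a handful of outbound connections to one backend IP+port and print
    // the local ephemeral port the kernel hands out for each.
    for (let i = 0; i < 5; i++) {
      const sock = net.connect({ host: "10.0.0.5", port: 5984 }, () => {
        console.log(
          `local ${sock.localAddress}:${sock.localPort} -> remote ${sock.remoteAddress}:${sock.remotePort}`
        );
        sock.end();
      });
      sock.on("error", (err) => console.error(err.message));
    }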
Connection pools attempt to mitigate the problem by inserting a manager (the pool) in the middle, to accept larger numbers of requests from Node and try to schedule them on a lower, sustainable number of connections to the backend. At least on an RDBMS, transactions require the app to have exclusive use of the connection for its request, so when all the real connections are scheduled out, new requests have to wait for an old connection to be relinquished.
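A toy version of that scheduling, just to make the "wait for a connection to be relinquished" part concrete (a generic sketch, not any particular driver's pool API):

    // Minimal pool: N real backend connections, callers wait when all are checked out.
    class Pool<T> {
      private idle: T[];
      private waiters: Array<(conn: T) => void> = [];

      constructor(connections: T[]) {
        this.idle = connections.slice();
      }

      acquire(): Promise<T> {
        const conn = this.idle.pop();
        if (conn !== undefined) return Promise.resolve(conn);
        // Every real connection is in use; queue until release() hands one back.
        return new Promise((resolve) => this.waiters.push(resolve));
      }

      release(conn: T): void {
        const waiter = this.waiters.shift();
        if (waiter) waiter(conn);        // hand it straight to the next waiting request
        else this.idle.push(conn);
      }
    }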
EDIT: Going back to the blog post, it said, "You really need your back-end services to scale out with node.js." Which I think means, your back end service should have multiple IP addresses, to alleviate the bottleneck described above.
Looks pretty nice. Some pictures are worth quite a few words.
Is this problem made worse by the ephemeral ports remaining unavailable after disconnect, because they're stuck in TIME_WAIT? Or does a modern TCP stack note a low RTT and release the port much sooner?
You have to think about concurrency. If there were a total of 64K simultaneous requests to that physical instance, which is running 100+ apps because it's multi-tenant, that drastically reduces the number of ports available to each app. With evented IO, a socket could be held open for 250 ms (a DB query taking its time), tying up a port and causing a potential DoS on the other apps.
Also, if there is high churn on those outgoing connections, then the problem is a little worse. The connections per second will be much lower than 60k because of the timeout before a port number is reused for a new connection. Although this timeout can be tuned.
Talking about port exhaustion is, I realize, a little confusing, because there are both the web requests coming in and the outbound requests the web tier makes to other services. I'm adding an image to the blog to hopefully clarify things.
Before you even come close to that limit, any decent PaaS solution should move your application to a less-used server or spawn your application across several servers based on load alone.
There are several assumptions in this article. The biggest is that, at scale, the same NIC will be used for both front-end and back-end stuff. What if it's not? Then you aren't penalized for your back-end ports on the front-end because they're on different routes.
Also, why is Node special here? Did people not notice Comet, Athena, Flash, Java, etc.? We've been doing long-polling and extended socket usage for a long time, not to mention browser connection reuse. This port count problem hasn't been a problem at all yet.
Finally, scaling up to 60k useful active concurrent connections on a single box is pretty damn tricky. Usually, you'll be splitting that kind of load, and doing NAT, and all of a sudden you no longer have to worry about port exhaustion.
My observation/assumption is that your app is running in a multi-tenant PaaS environment, where mysteriously you start getting bind errors because the ephemeral port space is shared with other tenants.
You have observed this? There's a trick to the way sockets work: They track a bunch of data about each connection, not just the local port number and IP. You're talking about exhausting a space of 60k+ connections/user, not connections/server.
If you've actually seen this in the wild, then I'd like to know what you're running, what you're selling, how you're doing so remarkably well, etc. This sounds like the best problem to have. :3
No, it's connections/server, unless the provider has separate back-end IPs for each user (not a bad idea, but I'll bet they're not doing that).
The tuple consists of the source IP, destination IP, source port, and destination port. The first, second, and last are, in a naive implementation, potentially universal to all applications/instances/sessions on the server, so your number space is reduced to the ~60k source ports.
This is an incredibly easy problem to solve, but it does need a small bit of attention by PaaS providers to ensure they're not going to hit it.
I am confused as to how the source IP can be the same for every connection, as even the naivest socket server will be accepting connections from many different users, each with their own source IP, which makes this ~60k conns/port/user.
You and several other people are completely failing to read and comprehend the blog post in its entirety. This has nothing whatsoever to do with where USERS are connecting from.
If you have X sessions on a single front-end server opening one or more individual connections to a single back-end server (e.g. CouchDB), at least X ephemeral ports are used on the front-end server to connect to the back-end server. As X increases, you can reach a point where you have a serious problem. If X gets anywhere near ~60k, it's game over.
http://urbanairship.com/blog/2010/09/29/linux-kernel-tuning-...