Hacker News
Performance Tuning Linux Instances on EC2 (brendangregg.com)
129 points by r4um on March 3, 2015 | 17 comments



Brendan Gregg seems to have worked extensively on extracting maximum performance from all the major UNIX kernels (FreeBSD, Linux, Solaris) out in the real world. If he writes an essay on the performance of these kernels, I feel it will be very close to the true state of affairs.


Ironically, a swappiness of 0 may be detrimental to the health of software running on a VM, as it severely limits what may be swapped out of memory. This means that the OOM killer ends up being more aggressive against the processes on your system.

A better value is 2 - this allows the kernel to swap out data in response to memory pressure, without being overly aggressive about it.
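If you want to set that, a minimal sketch in the same sysctl style as the article's other tunables (pick whatever value you've actually tested):

    # check the current value (often 60 by default)
    sysctl vm.swappiness

    # set it for the running kernel:
    sudo sysctl -w vm.swappiness=2

    # persist it across reboots in /etc/sysctl.conf:
    vm.swappiness = 2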


EC2 instances are typically configured without swap at all.


Slideshare is very annoying when you are trying to get the PDF.

First it tells you that you must sign up or sign in with LinkedIn or Facebook. Then, after finding a non-LinkedIn/Facebook sign-up, I need to enter my phone number to get a link via SMS.

SIGH


Note: there is some justification in the slides; the article itself just lists the "recommendations" outright.

(from the article)

> net.ipv4.tcp_tw_reuse = 1

(from the man page)

> Allow to reuse TIME_WAIT sockets for new connections when it is safe from protocol viewpoint. It should not be changed without advice/request of technical experts.

Why? Are sockets in TIME_WAIT a problem somehow?

> net.ipv4.ip_local_port_range = 10240 65535

Again, why? My understanding is that this controls the range of ports that the kernel selects from for new sockets; e.g., if you make a TCP connection to google.com on port 443, on _your side_ the connection is <your ip> : <a port from that range>; the default range is [32768, 61000], and this is per destination IP. (You can have two connections to two separate IPs with the same local port.) The default range is nearly 30k ports wide. Are you opening >30k connections to a single host?
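(For what it's worth, you can check what your kernel is actually using; a quick sketch, and the default varies by distro/kernel:)

    # show the current ephemeral port range (the [32768, 61000] default mentioned above)
    sysctl net.ipv4.ip_local_port_range
    # equivalently:
    cat /proc/sys/net/ipv4/ip_local_port_range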

> In the talk I described these tunables as our medicine cabinet, and to "consider these best before 2015".

Does that not mean that these are expired now? (This article was written today, though?)


The problem is opening >30k connections to a single host within a TIME_WAIT period (60 seconds), which works out to about 500 connections per second. For backend servers, e.g., an application server talking to a database, 500 connections per second is easy (although it's preferable if they can keep-alive).
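A quick way to see how many sockets are sitting in TIME_WAIT at any moment (a sketch; assumes iproute2 or net-tools is installed):

    # the summary line includes a timewait count
    ss -s | grep -i timewait
    # or count them directly with net-tools
    netstat -an | grep -c TIME_WAIT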

I liked Vincent Bernat's post about TIME_WAIT: http://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-li...


Could we say it a bit stronger than that? Opening and closing connections from the same source to the same service is wasteful since you need multiple extra round-trips and it resets all the TCP dynamic tuning like CWND.

I thought your talk was great; one minor niggle: you said that the result of too many sockets in TIME_WAIT would be dropped packets; it should refuse to open the new connection if no slots are available.


Yes, it's really wasteful to have unnecessary connect()/accept() calls, plus handshaking and buffer allocation, and TCP dynamic tuning, etc.

And you're right, thanks, TIME_WAIT full should just error on the Linux client. I was thinking of a different kernel which has bugs in this area, and ends up dropping SYNs...


> Again, why? My understanding is that this controls the range of ports that the kernel selects from for new sockets; e.g., if you make a TCP connection to google.com on port 443, on _your side_ the connection is <your ip> : <a port from that range>; the default range is [32768, 61000], and this is per destination IP. (You can have two connections to two separate IPs with the same local port.) The default range is nearly 30k ports wide. Are you opening >30k connections to a single host?

This port range is sometimes known as the "ephemeral port range" and works as you described. How can you have 32 tabs open to news.ycombinator.com port 80? The source ports on your machine are all different and all come from that range.

The 30K range is not to a single host, it's just all open connections waiting for data to return. That is to say, the connections are established, being established, or being torn down. If the connection was completely torn down you'd be able to reuse the port and there would be no issue.

So if your network working set is >28K ports, you may need to change this setting. Most people probably don't need to change this. If you do need to change this because you find your application is throwing errors about binding to ports in use, the above suggestion is fairly decent for setting and forgetting.

The one problem with the above suggestion is if you have an application binding to a port somewhere in the range of 10240-32768 or 61001-65535 (http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_number... for examples, it's obviously not complete). You can't just say that 10240-65535 is fair game for ephemeral connections, because inevitably an ephemeral connection will block a known port bind attempt, and your service will fail to run.
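If you do widen the range, one way to hedge against exactly that collision (a sketch, not from the article; it assumes a kernel recent enough to support net.ipv4.ip_local_reserved_ports, and the port numbers below are only examples):

    # in /etc/sysctl.conf:
    net.ipv4.ip_local_port_range = 10240 65535
    # keep your own services' listen ports out of the ephemeral pool
    # (example ports only; use whatever your applications actually bind)
    net.ipv4.ip_local_reserved_ports = 11211,27017

    # then:
    sudo sysctl -p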


The ephemeral port range is effectively per remote endpoint, because TCP connections are identified by the four-tuple: local IP, local port, remote IP, remote port. You can have active connections from localhost:12345 to both some_server:80 and some_other_server:80, or even some_server:443.


Didn't realize it was a 4-tuple; I was thinking of just the binding to the local port. Still, increasing the ephemeral range will cause problems with applications relying on ports outside the default ephemeral range.


It can, though it's not likely. The kernel only uses free ports even in the ephemeral range, so if an application binds to something in the ephemeral range first, then the kernel just won't pick it for new connections. You've effectively removed one port out of tens of thousands.

You may be in trouble if the kernel happened to choose an ephemeral port for an outbound connection and then an application tried to bind to it for receiving new connections.


TIME_WAIT: http://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-li...

"to prevent delayed segments from one connection being accepted by a later connection relying on the same quadruplet (source address, source port, destination address, destination port). The sequence number also needs to be in a certain range to be accepted. This narrows a bit the problem but it still exists, especially on fast connections with large receive windows."


The last point means that while Netflix trusts these recommendations to be good for them today, if you are looking at this in the future, you should verify that the performance gains are still there for those changes. Perhaps you are running in a system where one or more of these no longer gives a performance gain, or is even detrimental to your overall system health.

Trust, but verify.


I'm not sure exactly what state the connections were in, but I've had very obnoxious situations where I shut down a daemon and can't bring it back up quickly because the port it listens on is 'in use'.

Harmless-seeming timeouts can cause stupid problems.


I don't seem to see a mention of the file descriptor limit. As recently as last year the default on Ubuntu was 2048 iirc, which is laughable given that it was 2014.

I guess it's not strictly performance related, but should definitely be one of the first parameters to tune.

    # in /etc/sysctl.conf:
    fs.file-max = 100000

    # then:
    sudo sysctl -p

    # in /etc/security/limits.conf

    * soft nofile 100000
    * hard nofile 100000

    # then:
    ulimit -n 100000
(tweak the exact number as required)


I saw Brendan's talk live at re:Invent. Highly recommended.



