The performance data looks interesting, but this is work based on a pretty old kernel (originally released in 2010 or so). There have been many changes and improvements in the 3.x kernels that may overlap with this work. Publishing the code and details on GitHub is great, but working with the kernel community and merging into the mainline kernel is the only way for work like this to have a meaningful long-term existence - Google in particular has been doing a great job getting networking improvements in.
That said, it's interesting to have this kind of thing come out of large-scale production web environments in China.
> [Fastsocket] on Linux 2.6.32 achieves 470K connections per second and 83% efficiency up to 24 cores, while performance of the base 2.6.32 kernel increases non-linearly up to 12 cores and drops dramatically to 159K with 24 cores. The latest 3.13 kernel doubles the throughput to 283K when using 24 cores compared with 2.6.32. However, it has not completely solved the scalability bottlenecks, preventing performance from growing when more than 12 cores are used.
It's actually not that kernel. It's the CentOS kernel, which is the RedHat kernel, which was based on a 2.6 kernel years ago but has since had every single kernel change under the sun backported to it. It might as well be RedHat's version of 3.10. This is also why it's a bad idea to build kernel patches on top of RedHat kernels: they have nothing to do with the vanilla trees.
In any case, it doesn't matter if it's a 50-year-old kernel. If it speeds up connections per second, someone will put up a box running it on the frontend as the load balancer.
Is it possible that by using an old kernel like this one, you'd expose yourself to security vulnerabilities?
I'm new to kernel programming. Is this submission suggesting that you downgrade your kernel to a 2010-era release in order to take advantage of the performance improvements, or is the submission showing some kind of modular component which you can integrate into your current kernel?
If it's the former, then wouldn't you be pinning yourself to the old version of the kernel, so you'll have to integrate all updates by hand rather than receive them automatically during the normal update process?
This kernel is what Red Hat Enterprise Linux 6 is currently using. Red Hat maintains it, and writes patches for security vulnerabilities. It's no surprise that Sina developed, tested, and deployed this patch against what they were running in production.
By using this kernel, will you be able to automatically receive security upgrades in the future? Or will you have to apply them manually and then recompile and install the kernel yourself?
Is "developers have to apply security patches manually, then recompile and reinstall the kernel themselves rather than automatically" not a big deal in practice?
It's not a big deal, because you can automate it. As long as the patch applies cleanly (and it almost certainly will if the only vendor changes are security updates), it's going to be a pretty smooth process.
You'd need to test the new kernel before deploying in production, of course, but you'd be doing that before rolling out a vendor provided kernel change, anyway.
Old kernel perhaps, but that's still what ships with the latest CentOS 6 (and, by extension, RHEL 6). Old as it might be, it's in very wide use.
This would be a tremendous boon for those environments!
The openssl 0.9.8 with Apache/2.2.3 combo only supports TLS 1.0. I couldn't set up TLS to get better than grade "B" on Qualys' SSL Server Test. I sacrificed MSIE on WinXP, used TLS 1.0 only with just TLS_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA, and TLS_DHE_RSA_WITH_AES_256_CBC_SHA, and still got a "B". I want a Forward-Secrecy-only, AEAD-only setup. Have to upgrade to RHEL 6 for that.
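For reference, once you have a newer OpenSSL (1.0.1+, which I believe is what RHEL 6.5 ships), the forward-secrecy/AEAD-only restriction is only a few lines against the standard OpenSSL C API. A minimal sketch - make_tls12_ctx is just an illustrative name, and the cipher-string aliases may need tuning for a particular build:

    #include <stdio.h>
    #include <openssl/ssl.h>
    #include <openssl/err.h>

    /* Hypothetical helper: build a server context offering only
     * forward-secret (ECDHE/DHE) AEAD (AES-GCM) suites. Needs
     * OpenSSL >= 1.0.1 for TLS 1.2 and GCM; the 0.9.8 build
     * mentioned above cannot negotiate any of these. */
    SSL_CTX *make_tls12_ctx(void)
    {
        SSL_CTX *ctx;

        SSL_library_init();        /* required once on pre-1.1.0 OpenSSL */
        SSL_load_error_strings();

        ctx = SSL_CTX_new(SSLv23_server_method());
        if (ctx == NULL)
            return NULL;

        /* Refuse the legacy protocols outright. */
        SSL_CTX_set_options(ctx, SSL_OP_NO_SSLv2 | SSL_OP_NO_SSLv3);

        /* EECDH/EDH = ephemeral key exchange (forward secrecy),
         * AESGCM = AEAD bulk ciphers. */
        if (SSL_CTX_set_cipher_list(ctx, "EECDH+AESGCM:EDH+AESGCM") != 1) {
            ERR_print_errors_fp(stderr);
            SSL_CTX_free(ctx);
            return NULL;
        }
        return ctx;
    }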
Looks like it's based on the 2.6.32 series. I would hope they start working with the upstream kernel; otherwise this project will stay stuck in limbo like previous initiatives to improve TCP handling at the kernel level (e.g. MegaPipe).
This version does not support TCP_FASTOPEN, SO_REUSEPORT, TCP_AUTOCORKING, etc.
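For anyone unfamiliar with what's missing there: SO_REUSEPORT (mainline since 3.9) lets every worker process open its own listening socket on the same port, with the kernel spreading incoming connections across them, which attacks part of the same problem Fastsocket does. A minimal sketch of the pattern, error handling elided (make_listener is just an illustrative helper name):

    #include <stdint.h>
    #include <string.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    /* Each worker calls this to get its own listening socket on the
     * same port. With SO_REUSEPORT set before bind(), the kernel
     * load-balances new connections across all such sockets.
     * Requires Linux >= 3.9. */
    int make_listener(uint16_t port)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int one = 1;
        struct sockaddr_in addr;

        setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(port);
        bind(fd, (struct sockaddr *)&addr, sizeof(addr));
        listen(fd, 128);
        return fd;
    }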
It's RedHat's 2.6.32 kernel, which is not the vanilla Linux 2.6.32 kernel: it has been getting backported fixes ever since the tree began. RedHat does not publish the individual patches that go into their kernels, but luckily for us, Oracle maintains a project called RedPatch which publicly documents the patches going into the RHEL kernels.
As an example of how this kernel is not the vanilla 2.6 tree: on April 14, 2014, a patch [2] affecting the ipv4 subsystem was included in RedHat's 2.6.32-431.23.3.el6 kernel tree [1]. You can find that same patch [3] originally applied to the mainline Linux kernel on April 14, 2014. This is common practice, and as a result RedHat kernels more closely resemble modern kernels like 3.12 than anything else.
They provide the complete source, as required by the GPL. However, they do not provide patch sets neatly broken out like they used to; that's what the parent is referring to.
The "before and after" CPU series have nearly the same exact fit. If the data was from separate 24 hour periods, wouldn't you expect the graphs to look different? I recognize that with a large service, you'd get repetitive load patterns, but the similarity here look a little extreme.
I find the first graph peculiar on its own. Supposedly, each line is the load on one of 8 cores on the same machine. Why would some cores experience heavier load than others, very consistently, over the course of a day? I've never seen a workload exhibit that kind of long-term, core-level affinity on Linux.
Even if that were the case, there isn't normally a stable mapping between processes and physical cores. There would have to be something within the kernel itself that gives higher priority to some cores than others.
Not saying that's impossible, but I've worked on machines with more than 8 cores and never seen it happen.
It looks like there are three separate optimizations, but I think the most important one is the "enable_listen_spawn" feature. Here is how they describe it:
Fastsocket creates one local listen socket table for each CPU core. With this feature, an application process can decide to process new connections from a specific CPU core. This is done by copying the original listen socket and inserting the copy into the local listen socket table. When there is a new connection on one CPU core, the kernel tries to match a listen socket in the local listen table of that CPU core and, if there is a match, inserts the connection into the accept queue of the local listen socket. Later, the process can accept the connection from the local listen socket exclusively. This way each network softirq has its own local socket to queue new connections, and each process has its own local listen socket to pull new connections out. When the process is bound to the specified CPU core, connections delivered to that CPU core by the NIC are processed entirely by the same CPU core in all stages, including hardirq, softirq, syscall and user processing. As a result, connections are processed without contention across CPU cores, which achieves passive connection locality.
The kernel in its normal configuration will try to spread IRQs evenly across CPUs. So for their use case, where you have one worker thread per CPU handling zillions of short-lived TCP connections, they can eliminate a bunch of locking and cache thrashing that would otherwise happen when handling new connections and dispatching the related epoll events within the kernel.
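Here's a rough userspace sketch of that locality pattern: fork one worker per core, pin it with sched_setaffinity(), and give it its own listening socket (reusing the hypothetical make_listener() from the SO_REUSEPORT sketch above). Take this only as an illustration of the idea - Fastsocket's enable_listen_spawn does the per-core listen-socket copy inside the kernel, not in userspace:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/wait.h>

    int make_listener(uint16_t port);  /* from the sketch above */

    int main(void)
    {
        long ncores = sysconf(_SC_NPROCESSORS_ONLN);

        for (long i = 0; i < ncores; i++) {
            if (fork() == 0) {
                cpu_set_t set;
                CPU_ZERO(&set);
                CPU_SET((int)i, &set);
                /* Pin this worker to core i; if NIC queue i's IRQ is
                 * also steered to core i, a connection stays on one
                 * core through hardirq, softirq, syscall and the
                 * application. */
                sched_setaffinity(0, sizeof(set), &set);

                int lfd = make_listener(8080);
                for (;;) {
                    int cfd = accept(lfd, NULL, NULL);
                    if (cfd >= 0)
                        close(cfd);  /* real per-connection work here */
                }
            }
        }
        while (wait(NULL) > 0)
            ;
        return 0;
    }

Even then, plain SO_REUSEPORT hashes connections to sockets without regard to which core received the packet, so you don't get full locality; closing that gap is exactly what the per-core local listen tables are for.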
Maybe it's up to the more sensible members of the kernel "community" to reach out to the developers of code known to be interesting, and to discuss what's in it for those developers to do the work required to get it merged, given the real chance of doing a ton of work and then being ignored, etc.
There's a sense in the above of "they haven't submitted to us, so we don't care." That might not be the best way to make the kernel as good as it can be, if that is the goal of anyone active in the kernel "community" (and maybe it is).
I have a lot of sympathy for someone publishing their code and their results and then saying "I won't play stupid kernel politics; your move." I don't know if that's what is happening here, or if it's cultural differences, or something I haven't thought of. Nor do I know whether this particular development is worth merging, but hey, neither does the kernel "community", right?
This is much lower level than those. This is all about the TCP stack.
This is the OSI model, from the top down:
7) Application
6) Presentation
5) Session
4) Transport
3) Network
2) Data link
1) Physical
ZeroMQ fits in neatly at the top, layer 7 (arguably it is the presentation layer too because it uses its own protocol).
What this is talking about relates to network sockets, which sit around layers 4 and 5 (you can find lots of debate on the subject). Any speed improvement at a lower level of the stack is seen by everything in the layers above it.