Filtering millions of packets per second on commodity NICs (cloudflare.com)
145 points by jgrahamc on Oct 9, 2015 | 27 comments




Thanks! Every single post in that series is very detailed, yet the average software engineer can understand them without much trouble.

I really enjoy the effort you guys put into those detailed blog posts.


What is the aggregate PPS Cloudflare handles now? What's the goal (aside from infinite)?


We regularly see many millions of pps per server.

You might find this interesting:

https://youtu.be/UcAygzNSxlI?t=7980

https://indico.dns-oarc.net/event/21/contribution/5/material...


Awesome links, thanks!


We would also like to thank Luigi Rizzo, for his Netmap work and great feedback on our patches.

Clearly a useful contribution! The linked pull request looks more like a finished product; I appreciate it even more when companies include the details of the sausage-making.


This gives a very interesting data point for the "open-source" vs. "free software" debate. Normally free/libre software zealots would point to BSD/ISC/Apache licenses as a way to never get any downstream changes back. And yet Cloudflare did contribute back nicely to a BSD-licensed project, in a situation where they were under absolutely no obligation to do so.

In fact, even GPLv2 would not have imposed an obligation to publish changes here; only a super-strict GPLv3 would.

One data point, of course, hardly warrants a far-reaching conclusion; still, that is something very nice to see.


We open source stuff because it's a virtuous circle. We think other people will look at our code and make it better!


What exactly is process() doing in the sample? Or is that also commented out in the test?

Because if the only processing here is throwaway, this still screams for an FPGA in front of the NIC. Someone mentioned the higher R&D cost of an FPGA solution, but clearly there is massive R&D here too, just in making sure evil packets don't hit a slow code path.


You can get FPGA-based switches, e.g. from Arista. They're not cheap, but you can do whatever you like with the packets as the bytes arrive. But for most applications you'd stick with commodity cards for the cost.


FPGA-based switches from Arista are a gimmick of that particular vendor. 10G Ethernet and beyond is absolutely commodity in the FPGA world; every dev kit has one.


An FPGA dev kit probably costs more than a NIC and is harder to program.


Does anyone know about the current state of IP routing on commodity NICs and Linux? Is 14M pps on 500'000 routes possible?


You want to check out Brocade, Intel DPDK and 6WIND. Brocade's Vyatta router has DPDK support, as does Juniper's vMX.

http://www.slideshare.net/shemminger/dpdk-performance


Nice, DPDK even has a library for longest-prefix matching [1], but sadly there are no published performance results.

[1] http://dpdk.org/doc/guides/prog_guide/lpm_lib.html#lpm-api-o...
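
For a sense of the API, a minimal rte_lpm round trip might look something like this (a hedged sketch: the exact signatures and macro names have shifted across DPDK releases, and the table sizes are illustrative):

    #include <stdio.h>
    #include <rte_eal.h>
    #include <rte_ip.h>
    #include <rte_lpm.h>

    int main(int argc, char **argv)
    {
        /* Bring up the DPDK environment abstraction layer. */
        if (rte_eal_init(argc, argv) < 0)
            return 1;

        /* Table sized with headroom for ~500'000 IPv4 routes (illustrative). */
        struct rte_lpm_config cfg = {
            .max_rules = 1 << 20,
            .number_tbl8s = 1 << 8,
            .flags = 0,
        };
        struct rte_lpm *lpm = rte_lpm_create("routes", SOCKET_ID_ANY, &cfg);
        if (lpm == NULL)
            return 1;

        /* Insert 10.0.0.0/8 -> next-hop id 1 (addresses in host byte order). */
        rte_lpm_add(lpm, RTE_IPV4(10, 0, 0, 0), 8, 1);

        /* Longest-prefix lookup; returns 0 on a hit. */
        uint32_t next_hop;
        if (rte_lpm_lookup(lpm, RTE_IPV4(10, 1, 2, 3), &next_hop) == 0)
            printf("next hop: %u\n", next_hop);

        rte_lpm_free(lpm);
        return 0;
    }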


We're doing 5Mpps (routed, not bridged) on a single core with a project we're calling netmap-fwd.

I'll blog about it on blog.pfsense.org in a few days when I return from Brazil.


Subscribed to the RSS feed :)


I don't have any figures for you right now, but routing is certainly a different problem from what is described in this article, as routed packets don't need to be passed to userspace for any processing.

Makes a huge difference for performance!


Depends on the commodity NIC. But yes, an E5 with an Intel 10G card can almost do 14Mpps "out of the box".
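
(Presumably the 14Mpps figure is the 10GbE minimum-frame line rate: a 64-byte frame plus 8 bytes of preamble and a 12-byte inter-frame gap occupies 84 bytes on the wire, and 10 Gbit/s ÷ (84 bytes × 8 bits/byte) ≈ 14.88 Mpps.)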


When you have to bypass your operating system to get your hardware to perform, perhaps it is time to re-assess your choice of operating system.


This isn't so much bypassing the OS as it is redefining the boundary of the privileged space to exclude the network traffic. That lets your filtering application get at the packets directly, without having to copy them out of kernel space into user space. This is exactly the technique all high-performance network devices follow at present. The ones that aren't doing it in userspace are doing it in some sort of RTOS that doesn't even have protected memory spaces.


In HPC circles this scheme is called OS bypass:

http://blogs.cisco.com/performance/mpi-newbie-what-is-operat...


(netmap author here) I prefer to define netmap as a "network stack bypass" scheme, because we use as much of the OS as possible -- all the things it does well, we do not want to reinvent. Device drivers, system calls, synchronization support etc. are part of the kernel. Native netmap support for a NIC only involves 300-400 lines of code, or about 10% of a typical device driver.

Processes do ioctl(), mmap() and poll() for I/O -- all standard system calls implemented by the OS; there is no NIC-specific code in the application. NICs can be switched in and out of netmap mode without reloading modules (and, with the Cloudflare patch, they can even share the two modes). There are no custom memory pools or hugepages to reserve. Device configuration relies on ethtool, ifconfig etc.
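
To make that pattern concrete, a minimal netmap receive loop might look roughly like this (a sketch based on the documented user API; error handling is omitted, "eth2" is a placeholder, and process() is a hypothetical packet handler):

    #include <fcntl.h>
    #include <poll.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <net/netmap.h>
    #include <net/netmap_user.h>

    void process(const char *buf, unsigned int len);  /* hypothetical handler */

    int main(void)
    {
        /* Put the interface into netmap mode via the control device. */
        int fd = open("/dev/netmap", O_RDWR);
        struct nmreq req;
        memset(&req, 0, sizeof(req));
        strncpy(req.nr_name, "eth2", sizeof(req.nr_name) - 1);
        req.nr_version = NETMAP_API;
        ioctl(fd, NIOCREGIF, &req);

        /* Map the shared rings and packet buffers into the process. */
        char *mem = mmap(NULL, req.nr_memsize, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        struct netmap_if *nifp = NETMAP_IF(mem, req.nr_offset);
        struct netmap_ring *rx = NETMAP_RXRING(nifp, 0);

        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        for (;;) {
            poll(&pfd, 1, -1);                 /* block until packets arrive */
            while (!nm_ring_empty(rx)) {
                unsigned int i = rx->cur;
                struct netmap_slot *slot = &rx->slot[i];
                process(NETMAP_BUF(rx, slot->buf_idx), slot->len);
                rx->head = rx->cur = nm_ring_next(rx, i);  /* release the slot */
            }
        }
    }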

This approach is what let the Cloudflare folks implement their traffic steering with zero new code, just a couple of ethtool lines; the change they contributed back to support the split mode is completely agnostic of the specific NIC being used.
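
For flavor, that steering setup could be as simple as something along these lines (hypothetical interface name, port and queue number):

    # enable flow steering, then pin UDP/53 traffic to RX queue 1;
    # open only that queue in netmap mode and leave the rest to the kernel
    ethtool -K eth2 ntuple on
    ethtool -N eth2 flow-type udp4 dst-port 53 action 1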


The greatest problem is copying data back and forth between kernel space and user space.

This causes huge traffic to RAM and thrashes caches all along the way.


> The greatest problem is copying data back and forth between kernel space and user space.

Even if you could do away with the copying, e.g. with PACKET_MMAP (on Linux), the context switches will kill you...
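
To illustrate: PACKET_MMAP removes the copy but not the syscall, since the process still has to poll() into the kernel whenever the ring runs dry. A rough sketch (TPACKET_V1, error handling omitted):

    #include <poll.h>
    #include <sys/mman.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>
    #include <linux/if_ether.h>
    #include <linux/if_packet.h>

    int main(void)
    {
        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

        /* A small RX ring shared between kernel and userspace. */
        struct tpacket_req req = {
            .tp_block_size = 4096,
            .tp_block_nr   = 64,
            .tp_frame_size = 2048,
            .tp_frame_nr   = 128,   /* (4096 / 2048) * 64 */
        };
        setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req));
        char *ring = mmap(NULL, (size_t)req.tp_block_size * req.tp_block_nr,
                          PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        for (unsigned int i = 0;; i = (i + 1) % req.tp_frame_nr) {
            struct tpacket_hdr *hdr =
                (struct tpacket_hdr *)(ring + (size_t)i * req.tp_frame_size);
            while (!(hdr->tp_status & TP_STATUS_USER)) {
                /* Ring empty: here is the context switch in question. */
                struct pollfd pfd = { .fd = fd, .events = POLLIN };
                poll(&pfd, 1, -1);
            }
            /* Frame data sits at (char *)hdr + hdr->tp_mac, length
               hdr->tp_snaplen -- readable in place, no copy. */
            hdr->tp_status = TP_STATUS_KERNEL;   /* hand the slot back */
        }
    }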


Which OS would handle the load he speaks of?


HPC also uses OS bypass, so the Plan 9 team, who develop for Blue Gene and other large clusters, worked on currying system calls to maximize throughput:

http://4e.iwp9.org/papers/usecsys.pdf



