We would also like to thank Luigi Rizzo for his Netmap work and great feedback on our patches.
Clearly a useful contribution! The linked pull request looks like more of a finished product; I appreciate it even more when companies include the details of the sausage making.
This gives a very interesting data point for the "open-source" vs. "free software" debate. Normally free/libre software zealots would point to BSD/ISC/Apache licenses as a way to never get back any downstream changes. And yet Cloudflare did contribute back nicely to a BSD-licensed project, in a situation where they were under absolutely no obligation to do so.
In fact, even GPLv2 would not have imposed an obligation to publish changes here; only a super-strict GPLv3 would.
One data point, of course, hardly warrants a far-reaching conclusion; still, that is something very nice to see.
What exactly is the process() doing in the sample? Or is that also commented out in the test?
Because if the only processing here is throw-away, this still screams for an FPGA in front of the NIC. Someone mentioned higher R&D on an FPGA solution, but clearly there is massive R&D here in just making sure evil packets don't hit a slow code path.
You can get FPGA-based switches, e.g. from Arista. They're not cheap, but you can do whatever you like with the packets as the bytes arrive. But for most applications you'd stick with commodity cards for the cost.
FPGA-based switches from Arista are a gimmick of that particular vendor. 10G ethernet and beyond is absolutely commodity in the FPGA world, every dev kit has one.
I don't have any figures for you right now, but routing is certainly a different problem than what is described in this article as routed packets don't need to be passed to userspace for any processing.
This isn't so much bypassing the OS as it is redefining the boundary of the privileged space to not include the network traffic. This lets your filtering application get the network packets directly without having to copy them out of kernel space and into user space. This is exactly the same technique all high performance network devices follow at present. The ones that aren't doing it in userspace are doing it in some sort of RTOS that doesn't even have protected memory spaces.
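To make the "no copy" part concrete, here is a toy model in Python. It is only an analogy under stated assumptions, not netmap's real ring layout or API: the `Ring` class, its `produce`/`consume` methods and the slot sizes are all made up for illustration. One preallocated buffer stands in for the memory region shared between the driver and the application; the consumer gets views into that buffer instead of copies.

```python
class Ring:
    """Toy receive ring over one preallocated buffer (hypothetical, not netmap's layout)."""

    def __init__(self, slots, slot_size):
        # One flat buffer stands in for the region shared with the application.
        self.buf = bytearray(slots * slot_size)
        self.view = memoryview(self.buf)
        self.slots = slots
        self.slot_size = slot_size
        self.lengths = [0] * slots
        self.head = 0  # next slot the consumer reads
        self.tail = 0  # next slot the producer fills

    def produce(self, payload):
        """Producer side (the NIC/driver in this analogy). Returns False when full."""
        if len(payload) > self.slot_size:
            raise ValueError("payload larger than a slot")
        nxt = (self.tail + 1) % self.slots
        if nxt == self.head:
            return False  # ring full, packet would be dropped
        off = self.tail * self.slot_size
        self.view[off:off + len(payload)] = payload
        self.lengths[self.tail] = len(payload)
        self.tail = nxt
        return True

    def consume(self):
        """Consumer side: yield views into the shared buffer, never copies."""
        while self.head != self.tail:
            off = self.head * self.slot_size
            yield self.view[off:off + self.lengths[self.head]]
            # The slot is handed back to the producer once the caller asks for more.
            self.head = (self.head + 1) % self.slots
```

The application only ever touches slices of the same buffer the producer wrote into, which is the point: handing a packet over is an index update, not a memcpy.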
(netmap author here)
I prefer to define netmap as a "network stack bypass" scheme because we use as much as possible of the OS: all the things it does well, we do not want to reinvent. Device drivers, system calls, synchronization support etc. are part of the kernel. Native netmap support for a NIC only involves 300-400 lines of code, or 10% of the typical device driver.
Processes do ioctl(), mmap() and poll() for I/O. These are all standard system calls implemented by the OS; there is no NIC-specific code in the application. NICs can be switched in and out of netmap mode without reloading modules (and with the Cloudflare patch, they can even share the two modes). There are no custom memory pools or hugepages to reserve. Device configuration relies on ethtool, ifconfig etc.
This approach is what let the Cloudflare folks implement their traffic steering with zero new code, just a couple of ethtool lines; the change they contributed back to support the split mode is completely agnostic of the specific NIC being used.
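For readers who have not seen NIC flow steering before, the idea can be sketched as a toy Python model. This is my own illustration, not what ethtool or any NIC actually runs: the `pick_queue` function, the UDP/53 rule and the queue numbers are made up, and CRC32 is only a stand-in for the hash a real card would use. Traffic matching a rule goes to one dedicated RX queue (the one an application could own in netmap mode); everything else is spread over the remaining queues for the normal kernel stack.

```python
import zlib


def pick_queue(pkt, rules, rss_queues):
    """Return the RX queue index for a packet (hypothetical model).

    rules      -- dict mapping (proto, dst_port) to a dedicated queue
    rss_queues -- queues left for ordinary traffic
    """
    key = (pkt["proto"], pkt["dst_port"])
    if key in rules:
        return rules[key]  # steered flow: always the dedicated queue
    # Everything else: hash the flow so one flow always lands on one queue.
    flow = "{src_ip}:{src_port}->{dst_port}/{proto}".format(**pkt).encode()
    return rss_queues[zlib.crc32(flow) % len(rss_queues)]


rules = {("udp", 53): 4}      # e.g. send DNS to queue 4 (made-up numbers)
rss_queues = [0, 1, 2, 3]     # queue 4 is excluded from normal spreading

dns = {"proto": "udp", "src_ip": "198.51.100.7", "src_port": 4242, "dst_port": 53}
web = {"proto": "tcp", "src_ip": "198.51.100.7", "src_port": 4243, "dst_port": 443}

dns_queue = pick_queue(dns, rules, rss_queues)
web_queue = pick_queue(web, rules, rss_queues)
```

The two halves matter together: the rule pins the interesting traffic to one queue, and keeping that queue out of the spreading set keeps unrelated traffic away from it.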
https://blog.cloudflare.com/how-to-receive-a-million-packets...
https://blog.cloudflare.com/how-to-achieve-low-latency/
https://blog.cloudflare.com/kernel-bypass/
https://blog.cloudflare.com/single-rx-queue-kernel-bypass-wi...
I hope this gives a bit more context.