
So my problem with SONiC, Cumulus, and all these other "network operating system" Linux platforms is that they all seem to be designed to be fiefdoms and tend to be really stale. In my view, they bring almost nothing of value to the table.

Let's talk a bit more about SONiC...

For one, SONiC explicitly states that pull requests that aren't already planned and approved will not be accepted[1]. This defeats a good chunk of the value of having a community project. People will want to contribute and extend your platform in ways you never thought of, and they'll do it in a completely decentralized fashion.

Another issue I see is that SONiC literally holds back everything to an old Linux kernel and ships random BSP blobs that are unvetted. This is a nasty combination for anyone who wants to consider their NOS trusted or secure. They're on a 4.9.x kernel, and while that is still maintained, it is far from the best option if you want to take advantage of innovation in Linux networking.

I'm also generally confused about why this whole project isn't just "let's get the networking tools and hardware support stuff into standard Linux distributions and leverage their tooling and communities". This was also a problem I had with Cumulus. When I tore apart Cumulus, I figured out that it was less than a dozen unique tools and a distribution rebuilt for 32-bit MIPS and PowerPC. It was pretty trivial to rebase to standard Fedora or Debian and get a better platform out of it.

And finally, I don't really think this provides any real innovation. It's not really different from Cumulus, Open Network Linux, and others. And ONL actually is using more up to date kernels (5.4.x as of right now!) and offers better networking tools!

What I would love to see is all these people who keep doing this crap working in the actual Linux distribution communities to build and integrate with upstream projects so that everyone downstream gets all kinds of flexibility.

Imagine if you had a flavor of Fedora CoreOS for your network gear! The immutable OS, updated with RPM-OSTree, fresh software stack, and broad hardware support, all in one neat package.

If we treated the network gear like weaker servers, instead of specialty equipment, there's so many more interesting things you can do!

[1]: https://github.com/Azure/SONiC/wiki/Sonic-Roadmap-Planning



>This was also a problem I had with Cumulus. When I tore apart Cumulus, I figured out that it was less than a dozen unique tools and a distribution rebuilt for 32-bit MIPS and PowerPC.

Almost all of which are open source (or at least "source available"), with the exception of switchd, which cannot be open sourced because it links with proprietary ASIC SDKs. I don't see how having very few custom tools over a vanilla Linux distribution is a bad thing.

>It was pretty trivial to rebase to standard Fedora or Debian and get a better platform out of it.

If you enable upstream Debian apt sources in your sources.list then it effectively is standard Debian - plus switchd.

Of course, it is entirely possible to take all of the components of Cumulus Linux and use them on a separate operating system (enter SONiC, VyOS, etc.), so if you build out such a system that can also drive ASICs and that you prefer over Cumulus, you can take full advantage of all of Cumulus's open source contributions.

>What I would love to see is all these people who keep doing this crap working in the actual Linux distribution communities to build and integrate with upstream projects so that everyone downstream gets all kinds of flexibility

If I read you correctly, Cumulus works upstream as much as it can:

   ~/linux$ git log --author "cumulusnetworks.com" --oneline | wc -l
   773

   ~/ifupdown2$ git log --author "cumulusnetworks.com" --oneline | wc -l
   1265

   ~/frr$ git log --author "cumulusnetworks.com" --oneline | wc -l
   8107

I like to believe Cumulus is quite active in the communities of projects it uses. I feel I may have misunderstood your point, though.

>If we treated the network gear like weaker servers, instead of specialty equipment, there's so many more interesting things you can do!

I completely agree, that's the dream!

Disclaimer: I work at Cumulus


> Almost all of which are open source (or at least "source available"), with the exception of switchd, which cannot be open sourced because it links with proprietary ASIC SDKs. I don't see how having very few custom tools over a vanilla Linux distribution is a bad thing.

The problem historically with Cumulus on this was that it was heavily obfuscated. In the past, when I talked to Cumulus sales folks, it was not quite as honest as what you've said.

I don't have a problem with the "shipping a Linux distribution you can support" thing. I have a problem with "not making it so the stuff you have is available everywhere (i.e. push into Fedora _and_ Debian to feed into all distros and ecosystems)".

> If I read you correctly, Cumulus works upstream as much as it can. I like to believe Cumulus is quite active in the communities of projects it uses. I feel I may have misunderstood your point, though.

Cumulus is actually a nice exception to this rule. Most Linux-based network operating systems do not bother (including SONiC, VyOS, EOS, etc), but Cumulus does good work here. My only complaint is the focus on ifupdown2 instead of helping make cross-distro tools like NetworkManager support these things. It's been a long time since NetworkManager was limited to desktop use cases and Wi-Fi. It's the standard tool on a wide range of distributions and supports server use cases very well. I personally use it over ifupdown and netconfig on my systems.


>I have a problem with "not making it so the stuff you have is available everywhere (i.e. push into Fedora _and_ Debian to feed into all distros and ecosystems)".

Almost all of our kernel patches are in mainline Linux, and ifupdown2 and FRR are packaged on Fedora and others.

>Cumulus is actually a nice exception to this rule. Most Linux-based network operating systems do not bother (including SONiC, VyOS, EOS, etc)

In defense of VyOS, they contribute to FRR and generously offer free licenses for people who work on the projects they use (https://www.vyos.io/open-source-contributors/). I think in general there's a lot of goodwill between the people working in the open NOS space.

> My only complaint is the focus on ifupdown2 instead of helping make cross-distro tools like NetworkManager support these things

Gotcha, I understand now. I can't provide any direct insight into why ifupdown2 was chosen instead of nm. I also use nm on my personal devices - though I can't say I've ever missed the ability to e.g. configure vxlan tunnels on my personal infra ;). I guess if we'd chosen nm 10 years ago then there would be similar feelings from people who prefer /etc/network/interfaces. Of course, at the end of the day Cumulus engineering time is spent primarily on things that ship in Cumulus Linux.

Btw, appreciate the feedback :)


> Almost all of our kernel patches are in mainline Linux, and ifupdown2 and FRR are packaged on Fedora and others.

I did see FRR recently make its way into Fedora, but I haven't seen anyone package up ifupdown2 there. Is someone working on that at Cumulus? I'd be happy to do the package review if someone hasn't already grabbed it before me once it's submitted. :)

> In defense of VyOS, they contribute to FRR and generously offer free licenses for people who work on the projects they use (https://www.vyos.io/open-source-contributors/). I think in general there's a lot of goodwill between the people working in the open NOS space.

Oh, I don't doubt it. But it's weird how many of them built on Linux are still not FOSS or collaborating with their upstreams...

> Gotcha, I understand now. I can't provide any direct insight into why ifupdown2 was chosen instead of nm. I also use nm on my personal devices - though I can't say I've ever missed the ability to e.g. configure vxlan tunnels on my personal infra ;). I guess if we'd chosen nm 10 years ago then there would be similar feelings from people who prefer /etc/network/interfaces. Of course, at the end of the day Cumulus engineering time is spent primarily on things that ship in Cumulus Linux.

Well, for what it's worth, /etc/network/interfaces is supported by NetworkManager. :)

As for VXLAN configuration in my personal network, I do it for homelab stuff. Setting up layered networking is kind of necessary if I am going to be messing around with things like OpenStack and Kubernetes.

I totally get that engineering time is primarily spent on things that ship in Cumulus Linux. I just want to see more work from Cumulus that benefits everyone, especially given that networking is so hard to get right! :)

> Btw, appreciate the feedback :)

You're welcome. I'm happy to see such an engaged person from Cumulus like yourself responding well to feedback! :)


What are you trying to achieve here? What tools are you going to get, and how do you want to use them? You do understand that the switching ASIC is just a PCI device, right, and that you cannot just pump all its bandwidth into the CPU for review? The path between the data plane (ASIC) and the control plane (CPU) is limited, generally to only a few gigs in today’s high-end switches. Anything you want to do in the data plane has to be programmed on the ASIC. The only packets punted to the CPU are control plane packets that need software processing and are low bandwidth, such as LLDP, STP, BGP control, etc. This is done by programming a switch ASIC table called “my station” or “l2 user”. On some kit you can tcpdump a front panel port to the CPU, but it is rate limited, because otherwise you can kill the CPU or stop the processing of vital control plane packets (let’s DDoS the STP process, fun). Looking at traffic flow on the CPU on a 32x100G switch is not going to happen. You need to sample: sFlow, NetFlow, etc. So, given the limited bandwidth, and given that any tool needs to know how to translate your Linux configuration into Ethernet ASIC pipeline programming, what is it you want to do that you cannot do today?
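To put rough numbers on the point above, here is a back-of-the-envelope sketch in Python. The 2 Gb/s CPU path is an assumed stand-in for "a few gigs", and the function names are made up for illustration. Real sFlow samples packets and usually forwards only headers, so the actual CPU load would differ; this only shows the order of magnitude.

```python
import math


def min_sampling_ratio(front_panel_gbps: float, cpu_path_gbps: float) -> int:
    """Smallest N such that forwarding 1 in N of line-rate traffic fits the CPU path."""
    return math.ceil(front_panel_gbps / cpu_path_gbps)


def scale_up(sampled_bytes: int, ratio: int) -> int:
    """Estimate total bytes on the wire from bytes seen in a 1-in-N sample."""
    return sampled_bytes * ratio


FRONT_PANEL_GBPS = 32 * 100  # the 32x100G switch mentioned above
CPU_PATH_GBPS = 2.0          # ASSUMPTION: illustrative value for "a few gigs"

ratio = min_sampling_ratio(FRONT_PANEL_GBPS, CPU_PATH_GBPS)
print(f"front panel: {FRONT_PANEL_GBPS} Gb/s, CPU path: {CPU_PATH_GBPS} Gb/s")
print(f"at line rate you would need at least 1-in-{ratio} sampling to fit")
```

Under these assumptions the front panel carries over a thousand times more traffic than the CPU path can take, which is why sampling is the only realistic way to observe flows in software.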

Random note. I worked at a switching startup (a few, actually). At one we always ran our own latest code. After an update to a core switch everything looked good, but then people started to complain that things were very slow. Went looking. The switch looked fine but was dropping traffic towards the CPU, which should not happen. Checking the cacti graphs for that switch (10 second polling), all the graphs for the ports between the different networks showed exactly the same flat line at a max of 134MB/s on 10G ports. Hmm, strange. Hold on, that sounds like the max BW between the ASIC and the CPU port! Let's check some bits in the ASIC configuration. Yup. The new build forgot to turn on HW routing in the pipeline, so every packet was punted to the CPU for route processing. Luckily the control plane policy had the STP etc. packets in a higher-priority queue. Tweak the bit, blam, graphs go to 11 :) File bug.
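The flat line makes sense once you convert units. A quick check in Python, assuming the graphs report decimal megabytes per second (with binary units the result is about 1.12 Gb/s, so the same ballpark); the function name is just for illustration:

```python
def mbytes_to_gbits(mbytes_per_s: float) -> float:
    """Convert decimal MB/s to Gb/s (8 bits per byte, 1000 Mb per Gb)."""
    return mbytes_per_s * 8 / 1000


flat_line_gbps = mbytes_to_gbits(134)  # the ceiling seen on the graphs
port_share = flat_line_gbps / 10       # fraction of a 10G port

print(f"{flat_line_gbps:.3f} Gb/s, {port_share:.1%} of a 10G port")
```

That works out to roughly 1.07 Gb/s, about a tenth of what each 10G port could carry, which fits a CPU path of around 1 Gb/s being the bottleneck rather than the front panel ports.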


I think it is a shame and a mistake that the PC industry chose PCI Express over the battle-tested InfiniBand technology as the upgrade for PCI [1]. InfiniBand offers a native channel-based peer-to-peer connection fabric for disparate nodes, and most of the important CPU bottleneck tasks (e.g. memory protection and address translation) can be offloaded to the InfiniBand controller instead of the proprietary ASIC networking controller.

The bottleneck affects not only networking but the GPU industry as well. That's probably the main reason why Nvidia bit the bullet and bought major InfiniBand player Mellanox in a deal worth close to USD 7 billion. The bottleneck is just about bearable for video and games, but not when you have to scale the processing of big data, AI, and machine learning applications.

[1] https://www.mellanox.com/pdf/whitepapers/PCI_3GIO_IB_WP_120....


This is already done today: https://github.com/Mellanox/mlxsw/wiki



