
> to build clusters at scale, we want to avoid the need to place Maglev machines in the same layer-2 domain as the router, so hardware encapsulators are deployed behind the router, which tunnel packets from routers to Maglev machines

Is "hardware encapsulator" just a fancy way of saying they're using tunnel interfaces on the routers, or is a "hardware encapsulator" an actual thing?




We (Cumulus Networks) have customers who use VXLAN for this type of tunneling. It is supported in modern Ethernet switch chips.

They're doing basically what Google is doing, but with off-the-shelf hardware and an openly buyable NOS.


It's a real piece of hardware.


Commercially available?


Yes, it's just a 10GbE Ethernet switch that can encapsulate the traffic in VXLAN headers, so that it can traverse east/west between any of thousands (millions?) of hypervisors without requiring traffic to hairpin to a gateway router and back. The logical networks all exist in an overlay, so the customer VMs get L2/L3 isolation. The underlying hypervisors, meanwhile, know exactly which vNICs are running on each host in the cluster, so they can talk directly over a large connected underlay network at 10GbE (x2) line rate.
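To make the encapsulation concrete, here's a rough Python sketch (purely illustrative, not anything a real switch runs) of what the VXLAN wrapper looks like per RFC 7348: the VM's original Ethernet frame becomes the payload of a UDP packet to port 4789, prefixed with an 8-byte header carrying a 24-bit VNI that identifies the tenant segment:

    import struct

    VXLAN_UDP_PORT = 4789  # IANA-assigned destination port for VXLAN

    def vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
        """Wrap a VM's Ethernet frame in the 8-byte VXLAN header."""
        # Flags byte 0x08 sets the "I" bit (VNI present); the 24-bit VNI
        # occupies the upper bits of the final 32-bit word (RFC 7348).
        header = struct.pack("!B3xI", 0x08, (vni & 0xFFFFFF) << 8)
        # The result is carried as UDP payload between hypervisor/VTEP
        # addresses on the underlay, destination port 4789.
        return header + inner_frame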

This is the standard way of distributing traffic in large datacenters. That way you get extremely fast, non-blocking line rate between any two physical hosts in the datacenter. And since the physical hosts know which VMs/containers are running on them, they can pass the traffic directly to the other host if the VMs are in the same L2 network, and even do virtual routing if the VMs sit across L3 boundaries - still a single east/west hop.
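As a hypothetical sketch of the lookup each host (acting as a VTEP) performs - the VNIs, MACs, and addresses below are invented for illustration - the control plane tells every hypervisor where each vNIC lives, so the source host can tunnel straight to the destination host's underlay address in one east/west hop:

    # (vni, destination VM MAC) -> underlay IP of the hypervisor hosting that vNIC
    FORWARDING_TABLE = {
        (5001, "02:00:00:aa:bb:01"): "10.0.1.17",
        (5001, "02:00:00:aa:bb:02"): "10.0.2.43",
    }

    def tunnel_destination(vni, dst_mac):
        """Return the underlay address to tunnel to, if the VM's location is known."""
        # A miss would typically fall back to flooding or a controller query,
        # depending on the control plane in use.
        return FORWARDING_TABLE.get((vni, dst_mac))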


So it was a mystery-inducing way of referring to some commodity SDN-related tech, thanks, that's far more informative than the paper :)


Broadcom makes "switch on a chip" modules that will do VXLAN encapsulation and translation to VLAN or regular Ethernet frames. That chipset is available in lots of common 10/40/100 GbE switches from Arista, Juniper, and white-box vendors.

In a regular IP fabric environment we would call this device a VTEP.
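For what it's worth, the VLAN translation mentioned above is just a local mapping the VTEP keeps - roughly like this (values invented): frames arriving on a VXLAN segment are decapsulated and handed to the corresponding VLAN on the local ports, and vice versa on the way out:

    # Illustrative VNI <-> VLAN mapping held by a hardware VTEP
    VNI_TO_VLAN = {5001: 100, 5002: 200}
    VLAN_TO_VNI = {vlan: vni for vni, vlan in VNI_TO_VLAN.items()}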


Fair point, we should just have taken the opportunity to say SDN here.


Any way you could provide a link to this off the shelf gear?


Here's one that's pretty popular: https://www.arista.com/en/products/7050x-series


Ah, sure, Arista; I was thinking of a white-label OEM for some reason. I have no experience with the gear, but it sounds great on paper. Thanks!


There is white-label OEM gear (the Arista and Cisco gear is now just OEM hardware with their firmware running on it), but unless you're Google or Facebook and can write your own firmware, chances are you're better off with an "enterprise" solution like Arista or Cisco, who will give you support and fix firmware bugs for you.


(Tedious disclaimer: my opinion only, not speaking for anybody else. I'm an SRE at Google, and I'm oncall for this service.)

No.

Edit: expanding on this a little, it's not something that's been released, so we can't talk about it. I don't think I can comment on illumin8's proposals other than to say that I'm pretty sure they don't work here.


Google's exact ToR (top of rack) switch code isn't available, but you can buy a switch from any number of network gear vendors (Arista, Cisco, Brocade, Juniper, HP, etc.) that can do VXLAN encapsulation and send the traffic over a leaf/spine network that covers thousands of racks.


I can't imagine Google is building clusters at such an alarming rate that it would justify manufacturing its own silicon for edge deployment. That suggests whatever commodity silicon is in the magic box can probably be found in a variety of vendor equipment, wrapped in OpenFlow or similar.



