
Could you expand a bit more on your comment? I feel I'm missing some context. Specifically, what do you mean by gold plated? And why would it be tempting to ignore some aspects of distributed computing? There's a lot of context you're implying rather than stating, so could you elaborate?



It's gold plated because they basically built their own ISP by acquiring either:

a: dark fiber IRUs between cities/metro areas

b: N x 10 and 100 Gbps wavelengths as L2 transport services from city to city, from a major carrier such as level3 or zayo

c: some combination of A and B

and they use that to build backbone links between their own network equipment, which they have full control over. Google is its own AS and operates its own transport network across the lower 48 US states and around the world.

The exact design of what they're doing within their own AS at layers 1 and 2 is pretty opaque unless you happen to be a carrier partner willing to violate a whole raft of NDAs. But basically they've built their own backbone to massive scale, yet without the huge capital expense of actually laying their own fiber between cities.

Their network has incredibly low jitter because they don't run their links to saturation, and they know EXACTLY what the latency is supposed to be from router interface to router interface between the pairs of core routers installed in each major city. Down to five decimal places, most likely. When you have your own dark fiber IRUs and operate your own WDM transport platforms, you are in possession of things like OTDR traces for your dark fiber that tell you, down to four decimal places, the length in km of your fiber path.
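
Back-of-envelope, for anyone curious how a measured fiber length turns into an expected latency figure (my own illustration, using the usual ~1.468 group index for standard single-mode fiber, i.e. roughly 4.9 microseconds per km one way; the span length is hypothetical):

    # Rough sketch: expected one-way propagation delay from an
    # OTDR-measured fiber length. Illustrative numbers only; 1.468
    # is the typical group index for standard single-mode fiber.
    C_VACUUM_KM_PER_S = 299_792.458   # speed of light in vacuum
    GROUP_INDEX = 1.468               # typical for single-mode fiber

    def one_way_delay_ms(fiber_km: float) -> float:
        return fiber_km / (C_VACUUM_KM_PER_S / GROUP_INDEX) * 1000.0

    # e.g. a hypothetical 742.1843 km span measured by OTDR:
    print(round(one_way_delay_ms(742.1843), 5))   # ~3.63 ms one way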

It also helps that the sort of people who have 'enable' on the AS15169 routers and core network gear are recruited from the top tier of network engineers and appropriately compensated. If they weren't working for Google they would be working for another major global player like NTT, DT, France Telecom/Orange, SingTel or Softbank.


Where do you get the crazy idea that Google doesn't run its links to saturation? It's crazy because leaving that much capacity idle would cost an enormous amount of money.

The B4 paper states multiple times that Google runs links at close to 100% utilization, versus the industry standard of 30-40%. That's accomplished through the use of SDN technology and, even before that, through strict application of QoS.

https://web.stanford.edu/class/cs244/papers/b4-sigcomm2013.p...

A few more details about strategies here:

https://research.google.com/pubs/archive/45385.pdf

Then there's a whole bunch of other host-side optimizations, including the use of new congestion control algorithms.

http://queue.acm.org/detail.cfm?id=3022184

You might recognize the name of the last author...
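
(If I'm reading that last link right, it's the BBR work.) For what it's worth, the congestion control algorithm is a per-socket knob on stock Linux too; a minimal sketch, assuming a kernel with the tcp_bbr module available:

    # Minimal sketch: opt a single TCP socket into the "bbr" congestion
    # control algorithm on Linux. Requires kernel >= 4.9 with tcp_bbr
    # loaded, and either CAP_NET_ADMIN or "bbr" listed in
    # net.ipv4.tcp_allowed_congestion_control.
    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"bbr")
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16))
    # -> b'bbr\x00...' on success; OSError if the algorithm isn't available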


No, it would be crazy for them to run things at saturation under normal circumstances, as that leaves no headroom at all for abnormal circumstances. The opportunity cost of not using something 100% of the time is offset by the value of increased stability/predictability in the face of changing conditions.

Though you do need to define "saturation". Are you referring to bulk bandwidth or some other measure of throughput/goodput? Saturating in terms of raw bandwidth can reduce useful throughput due to latency issues.


What I mean is that they do not run their links to saturation in the same way as an ordinary ISP. And because their traffic patterns are very different from an ordinary ISP's, and much, much more geographically distributed, they can do all sorts of fun software tricks. The end result is the same: low/no jitter and no packet loss.

As contrasted with what would happen if you had a theoretical hosting operation behind 2 x 10 Gbps transit connections to two upstreams, and tried to run both circuits at 8 to 9 Gbps outbound 24x7.


For clarity, do you mean that Google can, for example, run at 99% saturation all the time, whereas a typical ISP might average 30-40%, with peaks to full saturation that cause high latency/packet loss when they occur?


Yes, that's about right. Since they control both sides of the link, they can manage the flow from higher up the [software] stack. Basically, if the link is getting saturated, the distributed system simply throttles some requests upstream by diverting traffic away from the places that would send it over that link. (And of course this requires a very complex control plane, but it's doable, and with proper [secondary] controls it probably stays understandable and manageable, and doesn't go haywire when the shit hits the fan.)
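
As a toy illustration of that kind of control loop (the names and thresholds are entirely made up, and the real control plane is obviously far more involved): watch a link's utilization and, past a high-water mark, steer just enough demand onto alternate paths to pull it back down.

    # Toy sketch of "divert traffic before the link congests".
    # All thresholds and names are invented for illustration.
    HIGH_WATER = 0.90   # start diverting above 90% utilization
    LOW_WATER = 0.75    # target utilization after diversion

    def gbps_to_divert(offered_gbps: float, capacity_gbps: float) -> float:
        """How much demand (Gbps) to push onto alternate paths."""
        if offered_gbps / capacity_gbps <= HIGH_WATER:
            return 0.0
        return offered_gbps - LOW_WATER * capacity_gbps

    # 97 Gbps offered on a 100 Gbps link -> shift 22 Gbps elsewhere
    print(gbps_to_divert(97.0, 100.0))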


So I wonder if that means they can do TCP flow control without dropping packets.


I guess they do drop packets (it's the best - easiest/cheapest/cleanest - way to backpropagate pressure, aka backpressure), but they watch for it a lot more closely. Also, as I understand it, they try to separate long-lived connections (between DCs) from internal short-lived traffic. Different teams, different patterns, different control structures.
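
A tiny host-side analogy for "drop to backpropagate pressure" (illustration only, nothing to do with actual network gear): a bounded queue that refuses new work when full, so the producer feels the pressure immediately instead of everything piling up in an unbounded buffer.

    # Bounded queue as a backpressure analogy: reject ("drop") when full
    # so the sender has to slow down or retry, rather than buffering
    # without limit.
    from collections import deque

    class BoundedQueue:
        def __init__(self, limit: int):
            self.limit = limit
            self.items = deque()

        def offer(self, item) -> bool:
            """Enqueue item, or drop it (return False) if we're full."""
            if len(self.items) >= self.limit:
                return False      # caller must back off
            self.items.append(item)
            return True

    q = BoundedQueue(limit=3)
    print([q.offer(n) for n in range(5)])   # [True, True, True, False, False]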


@puzzle: while you're not wrong, do note that B4 is not (and is not designed to be) a low-latency, low-jitter network. It's designed for massive bandwidth for inter-datacenter data transfer.


Running your own internal links to near saturation (such as a theoretical 100 Gbps DWDM or MPLS circuit between two Google datacenters in two different states) is a very different thing from running a BGP edge connection to saturation, such as a theoretical 100 Gbps short-reach, intra-building cross-connect from a huge CDN such as Limelight to a content-sink ISP such as Charter/TWTC or Comcast.


Very much so. B4 can be run near 100% because of strict admission control and optimized routing to maximize the use of all paths. It's much harder to do that on peering links where the traffic is bursty and you don't have control over the end-to-end latency and jitter. SDN isn't a magic pill for this, but it can most definitely lead to better performance and higher utilization than Ye Olde BGP Traffic Engineering.
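
For intuition on what "strict admission control" buys you, here's a deliberately simplified sketch, loosely in the spirit of centralized bandwidth allocation (my own toy version with made-up flow names, not the actual B4 algorithm): give every admitted flow its fair share of a link and redistribute whatever it doesn't need, so the link runs close to full without starving anyone.

    # Toy max-min fair allocation of one link among admitted flows
    # (demands and capacity in Gbps). Not the real B4 algorithm.
    def max_min_fair(demands: dict, capacity: float) -> dict:
        alloc = {flow: 0.0 for flow in demands}
        unsatisfied = dict(demands)
        remaining = capacity
        while unsatisfied and remaining > 1e-9:
            share = remaining / len(unsatisfied)
            for flow, want in list(unsatisfied.items()):
                grant = min(share, want)
                alloc[flow] += grant
                remaining -= grant
                if grant >= want - 1e-9:
                    del unsatisfied[flow]
                else:
                    unsatisfied[flow] = want - grant
        return alloc

    # Three flows competing for a 100 Gbps link:
    print(max_min_fair({"copy-job": 80, "index-push": 30, "logs": 10}, 100))
    # -> roughly {copy-job: 60, index-push: 30, logs: 10}; link ~100% used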


Building distributed systems makes you aware of how unreliable things are at large scale, e.g. the network. The parent comment implies that Google's network is so fast and reliable that it becomes tempting to ignore best practices and work as if it were a non-distributed system.


He's saying that it's so reliable and the throughput is so high that sometimes you have to convince yourself that your computers are halfway across the planet.


I'm not sure they're saying that; they're just claiming Google has really good, well-run networks. But even Google hasn't solved the speed-of-light issue: packets can only travel so fast. If your computer is halfway across the planet, you'll notice, no matter how fancy your network is.
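
To put a rough number on the speed-of-light point (a standard back-of-envelope figure, nothing specific to Google): light in fiber propagates at roughly 200,000 km/s, so halfway around the planet is a very noticeable round trip before any queuing or processing at all.

    # Back-of-envelope RTT over fiber halfway around the Earth.
    HALFWAY_KM = 20_000        # half of Earth's ~40,000 km circumference
    FIBER_KM_PER_S = 200_000   # rough speed of light in fiber

    rtt_ms = 2 * HALFWAY_KM / FIBER_KM_PER_S * 1000
    print(rtt_ms)   # 200.0 ms of pure propagation delay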


I assume they are talking about things like the TrueTime clocks used in Spanner, which are not available on commodity hardware.


Depends on what counts as commodity. You can just buy GPS-slaved rubidium clocks with PTP output.



