AWS's availability zones are unlike Google Cloud's AZs, in that they are physically meaningfully distant from each other.
A geological or technical event near one AWS AZ is highly unlikely to also affect the others in that region, unless the event is regional (heh) in scale. For Google Cloud, the AZs are effectively co-located buildings (or at least, Eemshaven has 3 GC AZ DCs less than 1 km from each other), making even local disruptions reasonably likely to impact availability for the whole region.
Yes. That being said, I would expect those same precautions to be taken in all regions, and thus expect no meaningful difference between regions (if that makes sense).
Different regions have different types of worries. Distance and precautions required to deal with redundancy for an earthquake may not be the same as required for a tornado, or hurricane, or flood, etc. In some cases, I imagine elsewhere in the same large city may suffice. In others, maybe it requires 50-100 miles.
You're correct that some Google Cloud zones are in the same physical building. However, the separate zones are designed with independent power, cooling, and networking. So, failure events usually affect only a single zone.
Perhaps OP can answer. The AZ ID [0], as opposed to the name, does identify a specific AZ location, so it's certainly possible to measure AZ-to-AZ latencies.
It would be interesting to also see what the lowest possible latency between the regions would be so you can see how much overhead the interchanges are adding.
For example, the distance between us-east-1 and us-west-1 is approximately 2,500 miles. Light travels at 186,000 miles/second, so that's about 13 milliseconds one-way. So the fastest possible TCP handshake (SYN / SYN-ACK / ACK) takes 39 milliseconds between the two DCs just to establish a connection.
Light travels at 186,000 miles/second in a vacuum. A quick websearch says fiber's index of refraction is ~1.467. One round trip then takes 39 ms (coincidentally the same number you said for 3 legs), vs the 62.36 ms on this chart. In theory they could reduce latency by ~23 ms then with a truly direct path. To do better would require abandoning fiber.
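The arithmetic above can be sketched in a few lines. The 2,500-mile distance, the 1.467 refractive index, and the 62.36 ms chart figure are the numbers quoted in this thread, not fresh measurements:

```python
# Back-of-envelope propagation delay, per the figures quoted above.
C_VACUUM_MI_S = 186_000   # speed of light in vacuum, miles/second
FIBER_INDEX = 1.467       # typical refractive index of optical fiber
DISTANCE_MI = 2_500       # approx. us-east-1 <-> us-west-1
MEASURED_RTT_MS = 62.36   # the round-trip time shown on the chart

def one_way_ms(distance_mi: float, medium_index: float = 1.0) -> float:
    """One-way propagation delay in milliseconds through a given medium."""
    return distance_mi / (C_VACUUM_MI_S / medium_index) * 1000

vacuum_rtt = 2 * one_way_ms(DISTANCE_MI)               # ~26.9 ms
fiber_rtt = 2 * one_way_ms(DISTANCE_MI, FIBER_INDEX)   # ~39.4 ms

print(f"vacuum RTT:        {vacuum_rtt:.1f} ms")
print(f"fiber RTT:         {fiber_rtt:.1f} ms")
print(f"measured RTT:      {MEASURED_RTT_MS} ms")
print(f"routing overhead:  {MEASURED_RTT_MS - fiber_rtt:.1f} ms")
```

The ~23 ms gap between the fiber minimum and the measured number is the headroom a truly direct path could recover.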
Neat! I'd never heard of that. I look forward to the day when it's generally used rather than yet another cool technology wasted on stupid high-frequency trading.
If you’re going to calculate distances between two geographical locations, you want to calculate Great Circle distances, as that will be the shortest path along the surface of the planet.
A straight-line distance would require tunneling a surprising depth into the planet. It's an absolute lower bound on the distance between the two points, but not a distance that could realistically be achieved in most cases.
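A quick sketch of the two distances being contrasted, using the haversine formula for the surface path and simple geometry for the through-the-planet chord. The coordinates are rough stand-ins I picked for Northern Virginia and Northern California (us-east-1 / us-west-1), not official datacenter locations:

```python
import math

EARTH_RADIUS_MI = 3_959

def great_circle_mi(lat1, lon1, lat2, lon2):
    """Haversine formula: shortest path along the Earth's surface."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))

def chord_mi(surface_mi):
    """Straight line through the planet -- the absolute lower bound."""
    theta = surface_mi / EARTH_RADIUS_MI  # central angle in radians
    return 2 * EARTH_RADIUS_MI * math.sin(theta / 2)

def max_depth_mi(surface_mi):
    """How deep the chord dips below the surface at its midpoint."""
    theta = surface_mi / EARTH_RADIUS_MI
    return EARTH_RADIUS_MI * (1 - math.cos(theta / 2))

# Ashburn, VA vs. San Jose, CA (approximate, assumed coordinates)
surface = great_circle_mi(39.0, -77.5, 37.3, -121.9)
print(f"great circle: {surface:.0f} mi")
print(f"chord:        {chord_mi(surface):.0f} mi")
print(f"max depth:    {max_depth_mi(surface):.0f} mi")
```

For a cross-US distance the chord is only slightly shorter than the great circle, but its midpoint would sit well over a hundred miles underground, which is the "surprising amount" of tunneling.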
I used to read ThousandEyes's annual free reports on networking capabilities and measurements across the Big 3 cloud providers, which were quite comprehensive in terms of covering various scenarios [0]. I don't think they publish those anymore.
With cloudping.co it isn't clear from the website, but it seems the inter-region latency they measure is over the public Internet. The Big 3 run their own backbones across all their DCs around the globe, so I reckon those numbers would look vastly different if traffic were instead relayed through those uncongested backbones.
With AWS, the cheapest way I know of (in terms of development time and cost) to get inter-region traffic onto their backbone is AWS Global Accelerator. Clients connect to Global Accelerator at the nearest anycast IP location, advertised across 150+ AWS PoPs. You can then play with endpoint groups, connection affinities, and source/port tuples to control how traffic is routed to backends in various regions.
We used this technique to prototype a VPN with exit nodes in multiple countries but entry nodes closest to clients at every AWS PoP. It worked quite nicely for a toy: https://news.ycombinator.com/item?id=21071593
Inter region traffic always goes over the backbone (this includes EIP to EIP). This also includes going from EC2 to any service like S3 in another region.
Except China. China to rest of world is not via backbone.
Unless you're using VPC peering, Transit Gateway, or PrivateLink, I doubt that user-generated traffic between regions (for example, between EC2 instances in Dublin and Sydney) is automatically routed through their backbone. Can you point to the re:Invent presentation? Genuinely curious.
Thanks. To confirm: you're pinging between the EC2 instances using their public DNS, right?
If the AWS backbone is used automagically, I wonder why anyone would pay for Transit Gateway or VPC Peering rather than do mTLS between their cross-region instances, or tunnel via WireGuard-esque transports like Tailscale or defined.net, for example. Also, since when has this been the case, if you know?
I'm curious what the bandwidth charges are for EC2 to EC2 cross-region when using their public IPs / DNS? Same as VPC Peering?
VPC Peering bandwidth is $0.01/GB. EC2 (public Internet?) bandwidth is $0.09/GB. For transfers from EC2 to EC2 via the AWS backbone, I assume I'd still be charged the public Internet bandwidth rates, right?
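Just to put the quoted rates side by side (the $/GB figures are the ones from this thread; actual AWS pricing varies by region and changes over time, and the 10 TB/month volume is a made-up example):

```python
# Hypothetical monthly cross-region transfer cost at the rates quoted above.
VPC_PEERING_PER_GB = 0.01    # $/GB, as quoted in the thread
PUBLIC_EGRESS_PER_GB = 0.09  # $/GB, as quoted in the thread

def monthly_cost(gb_per_month: float, rate_per_gb: float) -> float:
    return gb_per_month * rate_per_gb

gb = 10_000  # assumed 10 TB/month of cross-region traffic
print(f"via VPC peering: ${monthly_cost(gb, VPC_PEERING_PER_GB):,.2f}")
print(f"via public IPs:  ${monthly_cost(gb, PUBLIC_EGRESS_PER_GB):,.2f}")
```

At those rates the public-IP path is 9x the cost, which would be one obvious reason to keep paying for peering even if the backbone carried the packets either way.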
This might be a rare case that actually benefits from being an interactive map with lines colored differently/dotted differently to indicate interconnect speed (of course, keeping the table for non-js compatibility).
I don't know how they're accounting for the various data centers within a region (i.e., availability zones). It could explain the nonzero self-latency if a packet is leaving a DC.
Definitely not true. Cross-AZ latency is incredibly high compared to within the same AZ. We run a bespoke multi-master database setup that must be colocated in the same AZ due to unacceptable latencies when spread across 3 AZs. A few ms of difference.
Yes. While in my opinion most services won't care about the distinction between cross-AZ and same-AZ (you're definitely not most services :) ), you can definitely tell the difference between
* in the same region vs not
* in the same AZ vs not
* in the same placement group or not
which shouldn't be surprising. You should get a latency benefit from increased physical proximity!
1) Anything under 10 ms should be gray, since those are effectively in-region numbers (and it would make the chart easier to understand quickly)
2) Times should be tagged with a theoretical minimum time and how far off the real number is from that.
3) (Bonus points) Times should also be tagged with a 'real' minimum time that is based on the lengths of the actual fiber lines and number of interchanges.
This looks great, but I wish that this was more colorblind friendly.
Note: For anyone else that is on a Windows 10 machine and is struggling to see the difference, there is a colorblind mode that you can toggle with Winkey + Ctrl + C.
Ah, I found this when we were internally discussing building out a multi-region installation of one of our products. We were weighing the tradeoffs between a single master database, with calls from other regions to that database, vs. replicating the database between regions. The latter is more complicated to run but has far less latency.
That's when I found this chart through the magic of Google, to bring some real numbers (tm) to the latency discussion.
Ironically less colorful. :)