The per-server call rate is probably not the difficult part - this is the kind of scale where you start hitting much, much harder problems, e.g.:
- Load balancing across regions [0] without significant latency overhead
- Service-to-service mesh/discovery that scales with less than O(# of servers)
- Reliable processing (retries without triggering retry storms, optimistic hedging - rough sketch of hedging below)
- Coordinating error handling and observability
All without the engineers who actually write the functions needing to know anything about the underlying system (which requires airtight reliability, abstractions, and observability).
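To make the hedging bullet concrete, here's a minimal sketch of hedged requests in Go (the `call` type, the `hedged` function, and the replica setup are all made up for illustration, not anything from the system being discussed): fire the request at one replica, and only if it hasn't answered within a small delay, fire a copy at the next one, taking whichever response comes back first. Production systems typically also cap hedges with retry budgets and add jitter so a slow dependency doesn't snowball into a retry storm.

```go
package hedging

import (
	"context"
	"errors"
	"time"
)

// call stands in for an RPC to a single backend replica (hypothetical type).
type call func(ctx context.Context) (string, error)

// hedged sends the request to the first replica, and if no response arrives
// within `delay`, speculatively sends a copy to the next replica as well.
// The first success wins; cancelling the shared context reaps the stragglers.
func hedged(ctx context.Context, replicas []call, delay time.Duration) (string, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	type result struct {
		val string
		err error
	}
	// Buffered so straggler goroutines can always deliver and exit.
	results := make(chan result, len(replicas))

	launch := func(c call) {
		go func() {
			v, err := c(ctx)
			results <- result{v, err}
		}()
	}

	launch(replicas[0]) // primary request goes out immediately
	inFlight, next := 1, 1

	timer := time.NewTimer(delay)
	defer timer.Stop()

	for {
		select {
		case <-timer.C:
			// Still waiting: hedge with the next replica, if one is left.
			if next < len(replicas) {
				launch(replicas[next])
				next++
				inFlight++
				timer.Reset(delay)
			}
		case r := <-results:
			if r.err == nil {
				return r.val, nil // first success wins
			}
			inFlight--
			if inFlight == 0 && next == len(replicas) {
				return "", errors.New("all replicas failed")
			}
			// A hard failure is also a reason to try the next replica right away.
			if next < len(replicas) {
				launch(replicas[next])
				next++
				inFlight++
			}
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}
```

The point of the sketch is the trade-off: hedging shaves tail latency but multiplies load, which is exactly why it has to be coordinated with the retry and load-balancing layers rather than bolted on per service.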
I don't mean to comment on whether this is impressive or not; I'm just pointing out that per-server throughput would never be the difficult part of reaching this scale.
[0] And apparently for this system, load balancing across time, which is at least a mildly interesting way of thinking about it
Was just trying to break the number down into something easier to understand; I don’t know enough to say whether this is impressive or not! It depends on the complexity of the requests, and I guess the complexity of routing that many requests over such a large network. I’ve never worked at that scale.