The per-server call rate is probably not the difficult part - this is the kind of scale where you start hitting much, much harder problems, e.g.:
- Load balancing across regions [0] without significant latency overhead
- Service-to-service mesh/discovery that scales with less than O(# of servers)
- Reliable processing (retries without triggering retry storms, optimistic hedging - rough sketch of hedging below)
- Coordinating error handling and observability
All without the engineers who actually write the functions needing to know anything about the underlying system (which requires airtight reliability, abstractions, and observability).
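To make the hedging bullet concrete, here's a minimal sketch of hedged requests in Go (the `call` type, the `hedged` function, and the replica setup are all made up for illustration, not anything from the system being discussed): fire the request at one replica, and only if it hasn't answered within a small delay, fire a copy at the next one, taking whichever response comes back first. Production systems typically also cap hedges with retry budgets and add jitter so a slow dependency doesn't snowball into a retry storm.

```go
package hedging

import (
	"context"
	"errors"
	"time"
)

// call stands in for an RPC to a single backend replica (hypothetical type).
type call func(ctx context.Context) (string, error)

// hedged sends the request to the first replica, and if no response arrives
// within `delay`, speculatively sends a copy to the next replica as well.
// The first success wins; cancelling the shared context reaps the stragglers.
func hedged(ctx context.Context, replicas []call, delay time.Duration) (string, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	type result struct {
		val string
		err error
	}
	// Buffered so straggler goroutines can always deliver and exit.
	results := make(chan result, len(replicas))

	launch := func(c call) {
		go func() {
			v, err := c(ctx)
			results <- result{v, err}
		}()
	}

	launch(replicas[0]) // primary request goes out immediately
	inFlight, next := 1, 1

	timer := time.NewTimer(delay)
	defer timer.Stop()

	for {
		select {
		case <-timer.C:
			// Still waiting: hedge with the next replica, if one is left.
			if next < len(replicas) {
				launch(replicas[next])
				next++
				inFlight++
				timer.Reset(delay)
			}
		case r := <-results:
			if r.err == nil {
				return r.val, nil // first success wins
			}
			inFlight--
			if inFlight == 0 && next == len(replicas) {
				return "", errors.New("all replicas failed")
			}
			// A hard failure is also a reason to try the next replica right away.
			if next < len(replicas) {
				launch(replicas[next])
				next++
				inFlight++
			}
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}
```

The point of the sketch is the trade-off: hedging shaves tail latency but multiplies load, which is exactly why it has to be coordinated with the retry and load-balancing layers rather than bolted on per service.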
I don't mean to comment on whether this is impressive or not; I'm just pointing out that per-server throughput would never be the difficult part of reaching this scale.
[0] And apparently for this system, load balancing across time, which is at least a mildly interesting way of thinking about it
Was just trying to break the number down into something easier to understand; I don’t know enough to say whether this is impressive or not! It depends on the complexity of the requests, and I guess the complexity of routing that many requests over such a large network. I’ve never worked at that scale.