Jeff Dean - Achieving Rapid Response Times in Large Online Services (research.google.com)
77 points by tim_sw on April 4, 2012 | 8 comments



Jeff Dean's presentation comes close to suggesting concurrency as an approach, with the 2ms wait-and-see technique, but I often wonder if a truly concurrent pattern of access is feasible in some cases.

For many idempotent operations, it seems likely the most fault-tolerant and latency-optimizing pattern is to ask multiple endpoints concurrently - take the first to respond, and discard or terminate the duplicates (and if one fails, so what?). Obviously this isn't cheap; sufficient bus/network and endpoint capacity is necessary - but isn't it the natural progression of a trend?
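
Concretely, I mean something like this (a minimal sketch in Go; the replica URLs are made up): fire the same idempotent GET at every endpoint, return the first success, and let a shared context cancellation tear down the duplicates. The 2ms wait-and-see variant would just delay the backup requests instead of issuing them all at once.

    // A sketch of the "ask everyone, take the first answer" pattern for an
    // idempotent GET. The replica URLs below are hypothetical; cancelling the
    // shared context is what terminates the duplicate requests once a winner
    // comes back.
    package main

    import (
        "context"
        "fmt"
        "io"
        "net/http"
        "time"
    )

    func raceGET(ctx context.Context, urls []string) ([]byte, error) {
        ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
        defer cancel() // tears down the losing requests

        type result struct {
            body []byte
            err  error
        }
        results := make(chan result, len(urls)) // buffered so losers never block

        for _, u := range urls {
            go func(u string) {
                req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
                if err != nil {
                    results <- result{nil, err}
                    return
                }
                resp, err := http.DefaultClient.Do(req)
                if err != nil {
                    results <- result{nil, err}
                    return
                }
                defer resp.Body.Close()
                body, err := io.ReadAll(resp.Body)
                results <- result{body, err}
            }(u)
        }

        var lastErr error
        for range urls {
            r := <-results
            if r.err == nil {
                return r.body, nil // first success wins; the deferred cancel kills the rest
            }
            lastErr = r.err // "if one fails, so what?" -- wait for another replica
        }
        return nil, lastErr
    }

    func main() {
        body, err := raceGET(context.Background(), []string{
            "http://replica-a.example/item/42", // hypothetical endpoints
            "http://replica-b.example/item/42",
        })
        fmt.Println(len(body), err)
    }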

Some RAID1(0) implementations behave like this (for reads) already.

In say 5 years, might it be optimal for web browsers to initiate TCP connections to multiple web endpoints, and go with the first to initialise? One would imagine the additional TCP load to be tolerable. Or maybe even multiple full HTTP requests and take the first to fully succeed - maybe bits will be cheap enough to transmit by then that the gains are worth it.

In say 20 years, might it be optimal for networks to disperse packets via multiple diverse paths, and ignore duplicates? Just how cheap can Moore's law and Shannon's theorem push network transmission?

DNS requests are so cheap that many DNS resolvers already implement this technique - sending queries to authoritative servers in parallel and just going with the first to respond.

Clearly there are many times when concurrency is worse: non-idempotent operations, or when read-after-write consistency is required. And of course the costs have to be manageable and make sense. But I'd love to know of any economic or technical studies that have been made for the approach in general.


> In say 5 years, might it be optimal for web browsers to initiate TCP connections to multiple web endpoints, and go with the first to initialise?

In fact, browsers do something like this today to deal with broken IPv6 implementations, under the name "Happy Eyeballs": http://tools.ietf.org/html/draft-ietf-v6ops-happy-eyeballs-0...

The idea is that if an IPv6 connection isn't established after 300ms, then the browser just starts a duplicate connection over IPv4 and uses whichever one comes back sooner.
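
Roughly, the behaviour looks like this sketch (a simplification of the draft algorithm, in Go, with placeholder addresses): dial over IPv6 first, and if that hasn't connected within 300ms, race an IPv4 dial and keep whichever connection is established first.

    // A simplified Happy Eyeballs sketch (not the full draft algorithm), with
    // documentation-placeholder addresses. A fuller version would also close
    // the losing connection if both attempts succeed.
    package main

    import (
        "fmt"
        "net"
        "time"
    )

    func happyDial(v6Addr, v4Addr string) (net.Conn, error) {
        type attempt struct {
            conn net.Conn
            err  error
        }
        results := make(chan attempt, 2)
        dial := func(addr string) {
            c, err := net.DialTimeout("tcp", addr, 5*time.Second)
            results <- attempt{c, err}
        }

        go dial(v6Addr) // optimistic IPv6 attempt first

        inFlight := 1
        fallback := time.After(300 * time.Millisecond)
        var lastErr error
        for inFlight > 0 {
            select {
            case <-fallback:
                go dial(v4Addr) // IPv6 is slow: race IPv4 as well
                inFlight++
                fallback = nil // a nil channel never fires, so this branch runs once
            case r := <-results:
                if r.err == nil {
                    return r.conn, nil // first established connection wins
                }
                lastErr = r.err
                inFlight--
                if fallback != nil {
                    // IPv6 failed outright before the timer: start IPv4 now.
                    go dial(v4Addr)
                    inFlight++
                    fallback = nil
                }
            }
        }
        return nil, lastErr
    }

    func main() {
        conn, err := happyDial("[2001:db8::1]:80", "192.0.2.1:80")
        fmt.Println(conn, err)
    }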


That's another good example! But 300 ms is nearly four times the limit of human perception. So I do wonder if - as network and service costs diminish - simply using many connections for the same thing won't seem so "wasteful".


> For many idempotent operations, it seems likely the most fault-tolerant and latency-optimizing pattern is to ask multiple endpoints concurrently - take the first to respond, and discard or terminate the duplicates (and if one fails, so what?). Obviously this isn't cheap; sufficient bus/network and endpoint capacity is necessary - but isn't it the natural progression of a trend?

That's called latency leveling. It's actually done (at least on writes) by the different open source Dynamo implementations.

In addition to idempotence, unless your requests are being serialized through a single gateway (i.e., you're taking advantage of the TCP ordering guarantee), this also requires commutativity.

Don't get me wrong, it's a great approach and has many advantages (and I'll heartily endorse it for many applications) but there are a few drawbacks:

1. When you do want read-your-writes consistency even under failures, you have to use quorum protocols. Again, quorum protocols are great, but they have drawbacks: you have to wait for the slowest node in the quorum to finish, and you need to aggressively increase replication to achieve read-your-writes and maintain fault tolerance (read-your-writes requires |Writes| + |Reads| > replication factor, so with a replication factor of 3 you can only survive the failure of 1 node; see the sketch after this list).

2. Getting a de-duplicated commit log with even a partial ordering to ship to another application (or to another instance of the storage system, e.g., in a different datacenter) is more difficult. Although I suppose you could, e.g., notify the other application that a change has taken place and have that application _itself_ make (batched?) requests for the full after image of the data (as opposed to just the individual mutation).
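
To make the arithmetic in point 1 concrete, here's a back-of-the-envelope sketch (illustrative numbers only, not tied to any particular Dynamo implementation):

    // Back-of-the-envelope quorum arithmetic: read-your-writes needs
    // |Reads| + |Writes| > replication factor, and both quorums stay
    // formable only while at most min(N-R, N-W) replicas are down.
    package main

    import "fmt"

    func main() {
        const n = 3 // replication factor
        for w := 1; w <= n; w++ {
            for r := 1; r <= n; r++ {
                readYourWrites := r+w > n
                tolerated := n - r // failures the read quorum can absorb
                if n-w < tolerated {
                    tolerated = n - w // the write quorum may be the tighter constraint
                }
                fmt.Printf("N=%d W=%d R=%d read-your-writes=%v tolerates %d failure(s)\n",
                    n, w, r, readYourWrites, tolerated)
            }
        }
    }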

Either way, there's no write and wrong, just different approaches with separate trade-offs.

[Edit: just read your bio and noticed you're from Amazon/AWS: a far more heterogeneous environment than Google. I wonder how the new kinds of applications Google is building are influencing the infrastructure trade-offs underneath.]


Assuming the first GET will succeed and return quickly in 99.9..9% of cases, always sending a second GET seems wasteful, as it would put constant additional load on the network, CPUs and disks.


Here's how I'm thinking about that: the costs of network transmission, computation and storage are all decreasing faster than the global economy (and population) is growing. Per person (or spending-unit) they are getting very cheap.

At some point, those cost-reduction curves plateau and the improvements become marginal - but at whatever rate they settle, will the costs be so low that using many systems as a form of massively-distributed optimistic concurrency will be a no-brainer?

And at the meta-level: are the benefits for human interaction compelling enough to themselves help drive the costs down?


There is a lot of good stuff on Luiz Barroso's page as well: http://research.google.com/pubs/LuizBarroso.html


See also a previous discussion: http://news.ycombinator.com/item?id=3694148



