A number of the assumptions in the paper are questionable in hindsight.
Disk bandwidth consistently exceeds network bandwidth by a significant factor in common systems and this is likely to be the case for the foreseeable future. A platform with a good I/O scheduler can easily and demonstrably turn that into extra throughput on the same hardware for many use cases.
Their corroborating examples are platforms that happen to have poor I/O scheduling. In those cases the effects of a poor I/O schedule would be expected to dominate the effects of differences in bandwidth, so you would expect the extra available bandwidth to have no practical effect. But there are many platforms that do implement good I/O schedulers, and on those, differences in bandwidth materially affect performance because the scheduler can actually take advantage of it.
While not entirely their fault, their assumptions about the cost of SSDs are incorrect. The difference in cost/GB is down to a relatively small integer factor and shrinking, well under a single order of magnitude. And SSDs can deliver enormous local bandwidth in cheap systems. In many cloud clusters now, the cost of a large, local SSD JBOD (how you would want to use it) is less than the cost of the server, even with cheap servers. The cost of using SSDs over spinning disk has become increasingly marginal for many applications.
In summary: platforms with good I/O schedulers see real benefits to disk locality, assuming the application is I/O intensive. Platforms with poor I/O schedulers not so much. Make the appropriate choice for your use case and platform.
The thing that struck me is that they are looking at what networks are available. But to date I've never worked on a single system using 10GbE or above. Of course they are out there in large numbers within specific niches.
But most people are still stuck on 1Gbps. At the same time, a steadily increasing proportion of those systems have SSDs that can do 2GB/sec+ reads.
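As a rough back-of-envelope check, here is a sketch using just the 1 Gbps and 2 GB/s figures above (real-world numbers obviously vary):

    # Back-of-envelope: 1 Gbps network link vs. a 2 GB/s local SSD.
    # Figures are the ones quoted above; actual hardware varies.
    network_bytes_per_s = 1e9 / 8   # 1 Gbps ~= 125 MB/s, before protocol overhead
    ssd_bytes_per_s = 2e9           # 2 GB/s local SSD reads

    print(f"network:   {network_bytes_per_s / 1e6:.0f} MB/s")
    print(f"local SSD: {ssd_bytes_per_s / 1e6:.0f} MB/s")
    print(f"local SSD is ~{ssd_bytes_per_s / network_bytes_per_s:.0f}x the network link")  # ~16x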
I wish papers and pretty much all content had a date attached. For this paper, the latest citation is April 2011, so I'm guessing it was written within a year after that.
The disk locality speed-up is increasingly due not to bandwidth but to latency (directly related to physical distance), which still matters.
To provide some perspective: in 1 cycle of a 1GHz CPU, light (or electric potential) travels about 30cm. So round-trip communication with a disk (or whatever) that is 1m away (could be the same rack) takes 20 times longer in each direction than with one that is 5cm away. Now consider communicating across a football-field-sized datacenter.
I don't think the numbers support this. Mechanical disk latency is ~10ms while datacenter network latency is <100us, so the network contributes 1% extra latency. NVMe over fabrics is worse off because the flash latency is only ~100us and the network adds ~10us, but that's still nowhere near 20x.
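To put the propagation-delay part in numbers, here is a sketch assuming signals travel at roughly 3e8 m/s (the "30cm per ns" figure from the parent comment; real cables are somewhat slower, which doesn't change the conclusion):

    # Round-trip propagation delay vs. the device/network latencies quoted above.
    C = 3e8  # m/s, rough signal speed assumed from the parent comment

    def round_trip_us(distance_m):
        return 2 * distance_m / C * 1e6  # microseconds

    for label, d in [("5 cm (local)", 0.05),
                     ("1 m (same rack)", 1.0),
                     ("100 m (across the datacenter)", 100.0)]:
        print(f"{label:30s} {round_trip_us(d):9.4f} us")

    # Compare: mechanical disk seek ~10,000 us, datacenter network RTT <100 us,
    # NVMe flash read ~100 us. Even the ~0.7 us round trip across a
    # football-field-sized datacenter is lost in the noise next to these.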
This is really great for people who build out datacenters, but a large portion of consumers of compute+data will lease that from providers like Amazon and Azure, which means they cannot enjoy the benefit of the disk I/O ~= local network I/O equation. There's no guarantee that a parcel of EC2 nodes is in a single rack, or even in a single datacenter, and so network I/O is not comparable to disk I/O.
According to the docs, EBS goes up to 500 MB/s while a d2.8xlarge can read 3,500 MB/s from local disk, so local is still faster (if you can actually use that much throughput).
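For a sense of what that gap means in practice, here is a sketch using the 500 MB/s and 3,500 MB/s figures from the docs; the 1 TB scan is just a hypothetical workload for illustration:

    # Time to scan a hypothetical 1 TB dataset at the quoted throughputs.
    ebs_mb_s = 500          # EBS throughput cap
    local_mb_s = 3500       # d2.8xlarge local disk reads
    dataset_mb = 1_000_000  # hypothetical 1 TB

    print(f"EBS:   {dataset_mb / ebs_mb_s / 60:.1f} min")    # ~33 min
    print(f"local: {dataset_mb / local_mb_s / 60:.1f} min")  # ~4.8 min
    print(f"local is {local_mb_s / ebs_mb_s:.0f}x faster")   # 7x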
It's not really irrelevant considering the paper outlines that it's important to keep data in the same rack.
Disk locality is not that important when you're fully async, as throughput tends to be similar. However, if your code is not fully async then you usually pay significant penalties in latency, and if multiple nodes are able to access the data then you pay a huge price in caching.