
Scalable NFS, riiite.



If you have some time to read "How Google Works", you'd be surprised by how long the company ran on NFS. I assume there are lots of workloads running on Borg to this day on top of NFS. If that isn't enough for you, have a look at Isilon's client list and the kind of work those clients do; if you ever attend SIGGRAPH, most of what you see is built on top of NFS, so, essentially, all of the computer graphics you see in movies. At my last job our NFS cluster did 300,000 IOPS with 82 Gb/s throughput.
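Back-of-envelope on those two figures, assuming they were hit concurrently (which isn't guaranteed; this is just to give a sense of the implied I/O mix):

    # What average I/O size do 300k IOPS at 82 Gb/s imply?
    iops = 300_000                    # reported operations per second
    throughput_B = 82e9 / 8           # 82 Gb/s -> bytes per second
    avg_io = throughput_B / iops
    print(f"{throughput_B / 1e9:.2f} GB/s total")        # 10.25 GB/s
    print(f"~{avg_io / 1024:.0f} KiB average I/O size")  # ~33 KiB per op
    # ~33 KiB per op is plausible for a mixed media/rendering workload.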


82 Gb/s (assuming you mean gigabits) is _per-node_ throughput at Google (or FB, or I assume Amazon/Microsoft -- they all use 100GbE networks now). 300K IOPS is probably per-node, too, at this point. :-)


Having a 100 Gbps NIC in a node isn't the same thing as doing storage at that speed in an HA cluster.

Also, don't confuse true 100GbE with networks where the spine links are 100G but the node links are only bonded 10s (much more common at $FANG).


Nope. It's all 100GbE throughout as far as I know. And people do work really hard to be able to saturate that bandwidth; it is by no means trivial to saturate through the usual, naive means, without RDMA and Verbs. Years ago when I was there it was (IIRC) 40 Gbps to each node, straight up.
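To put numbers on why it's non-trivial, here's a quick sketch (my own illustrative figures; standard vs jumbo MTU assumed):

    # Raw packet rates a host must sustain to saturate 100GbE.
    link_bps = 100e9
    for mtu in (1500, 9000):                  # standard vs jumbo frames
        pps = link_bps / 8 / mtu
        print(f"MTU {mtu}: {pps / 1e6:.1f} Mpps, ~{1e9 / pps:.0f} ns per packet")
    # MTU 1500: ~8.3 Mpps, ~120 ns per packet -- far more than one core can
    # push through a conventional socket path, hence RDMA/Verbs, batching,
    # and offload to actually get there.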

It's a necessity, really. All storage at Google has been remote and distributed for at least the past decade. That puts serious demands on network throughput if you want your CPUs to actually do work and not just sit there and wait for data.
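As a rough illustration of that demand (hypothetical host size and per-core scan rate, just to show the shape of the problem):

    # If storage is all remote, every byte a core consumes crosses the network.
    cores = 96                  # hypothetical big host
    per_core = 200e6            # say each core streams 200 MB/s from remote storage
    needed_gbps = cores * per_core * 8 / 1e9
    print(f"{needed_gbps:.0f} Gb/s just to keep the cores fed")  # ~154 Gb/s
    # Even modest per-core scan rates overrun a 40G NIC, which is roughly
    # why host links went from 40G to 100G.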

Here's some detail as of 2012: https://storage.googleapis.com/pub-tools-public-publication-.... Note that host speed is 40Gbps. And here's FB talking about migrating from 40Gbps to 100Gbps in 2016: https://code.fb.com/data-center-engineering/introducing-back...


Sorry, I don't have to read it, because I was a Borg SRE for 6 years and I know how (the server part of) it works. You assume wrong.

I know there are a lot of companies that try to put some lipstick on the NFS pig and call it reliable/scalable/etc. As long as their clients don't actually try to run it at scale, or don't complain too publicly when they try and can't, they are able to get away with it.


Your concept of scale looks very different from mine; in my experience NFS does a very good job for in-datacenter workloads. CG rendering, oil/gas, and others usually take this approach for HPC, as far as I've seen. I consider this "scale". The biggest cluster I've worked on had close to 100k processes sharing NFS.

Of course, over longer networks it isn't suitable, as the round trips have too much latency. Other than that, is your experience with NFS much different?
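The round-trip point is easy to quantify. A sketch with made-up but typical RTTs, for serial (unpipelined) NFS operations such as metadata lookups:

    # Serial NFS ops are RTT-bound: one can't start before the previous returns.
    for label, rtt in (("in-datacenter", 0.0002), ("cross-region WAN", 0.030)):
        print(f"{label}: {rtt * 1e3:.1f} ms RTT -> {1 / rtt:,.0f} serial ops/s")
    # in-datacenter: 0.2 ms -> 5,000 ops/s per client thread
    # WAN:           30 ms  -> ~33 ops/s -- why NFS hurts over long links
    # while holding up fine inside a datacenter.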


What you consider 'scale' is a high-water mark set by cloud providers that is irrelevant to 99.999% of the industry.

Supporting all of a Fortune 500’s business operations is very reasonable to call ‘scale’ in the normal world.

Your comment is like a billionaire claiming that somebody who managed to hit 30 million isn't rich.


Worked at a company with 4 petabytes on NFS ... FWIW



