
MPI? I thought that went the way of CORBA. Is there any reason to use MPI over, say, ZeroMQ and Thrift/ProtoBuf/JSON/Whatever?



MPI is optimized for efficient, low-latency communication on reliable networks. ZeroMQ is optimized for high throughput on unreliable networks but sacrifices latency to do it. Most parallel algorithms assume reliable networks and value minimizing latency more than maximizing throughput.
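
To make the latency point concrete: the canonical MPI microbenchmark is a two-rank ping-pong, where the round-trip time approximates twice the one-way latency. A minimal sketch in C (compile with mpicc, run with mpirun -n 2):

    /* Two-rank ping-pong; average round trip ~ 2x one-way latency. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, i, iters = 10000;
        char byte = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        double t0 = MPI_Wtime();
        for (i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        if (rank == 0)
            printf("avg round trip: %g us\n", (MPI_Wtime() - t0) / iters * 1e6);
        MPI_Finalize();
        return 0;
    }

On InfiniBand-class hardware a tuned MPI typically reports round trips in the low single-digit microseconds; a brokered or TCP-based messaging layer is usually an order of magnitude or more slower.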

As for why not Thrift, ProtoBuf, JSON, etc.: those are serialization formats that exist primarily for portability and compactness, not for computational efficiency or low latency, and you would lose a lot of performance using them in many parallel codes. I wrote an implementation of the ProtoBuf codecs that uses faster algorithms than the ones in Google's implementation, but raw memory copies are still much faster.
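
You can see the difference in what "sending data" even means: an MPI message is just typed memory, so shipping a large array is one call with no encode/decode step on either side. A sketch (assumes at least two ranks):

    /* No serialization: the array itself is the message. */
    #include <mpi.h>

    #define N 1000000

    static double field[N];  /* e.g. a slab of simulation state */

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0)
            MPI_Send(field, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);  /* raw buffer out */
        else if (rank == 1)
            MPI_Recv(field, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Finalize();
        return 0;
    }

With ProtoBuf or JSON the same transfer would first walk the array encoding every element and then decode it on the other end; MPI hands the buffer straight to the transport (and on RDMA networks, straight to the NIC).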

MPI is not that useful for distributed systems because of its "reliable network" assumption: it is not designed to handle network failures gracefully. ZeroMQ is designed to handle network failures gracefully but is not designed to support parallel applications, only weakly coupled systems. There is a middle ground of tightly coupled, fault-tolerant systems that matches many kinds of application use cases, but there is no off-the-shelf framework for it (and it would necessarily be more complex than either of the other two cases).


MPI shines mostly in the context of tightly-coupled applications running on multiple nodes with a fast, reliable network. Most implementations are well-optimized for extremely high-bandwidth, low-latency interconnects like InfiniBand or Cray's Gemini. They can also use remote direct memory access (RDMA) to do cool stuff like let CUDA applications directly access GPU memory across the network (GPUDirect).
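
As a sketch of what that looks like (assuming an MPI built with CUDA support, e.g. Open MPI's --with-cuda; error checking omitted): you pass a device pointer straight to MPI, and with GPUDirect RDMA the NIC can read GPU memory without staging through the host.

    /* CUDA-aware MPI: d_buf lives in GPU memory, not host memory. */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv) {
        int rank;
        double *d_buf;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        cudaMalloc((void **)&d_buf, 1024 * sizeof(double));
        if (rank == 0)
            MPI_Send(d_buf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(d_buf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }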

Scientific applications that involve a lot of interprocess communication -- large CFD simulations, for example -- benefit enormously, especially since the popular MPI implementations are highly tuned for those types of apps. Also there's a ridiculous amount of legacy code, which leads to the usual lock-in effects.
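
The communication pattern in those codes is usually a halo (ghost-cell) exchange every time step, which is exactly the shape MPI's point-to-point primitives are built for. A rough sketch for a 1-D domain decomposition:

    /* Each rank owns a slab of the domain; u[0] and u[NLOCAL+1] are
       ghost cells filled from the neighbors every time step. */
    #include <mpi.h>

    #define NLOCAL 1024

    int main(int argc, char **argv) {
        int rank, size;
        double u[NLOCAL + 2] = {0};
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        /* MPI_PROC_NULL makes the boundary ranks' exchanges no-ops */
        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;
        /* send my edge cells, receive the neighbors' into my ghosts */
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[NLOCAL + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[NLOCAL], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Finalize();
        return 0;
    }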

edit: clarification


I agree with ajdecon, but I want to add: the highest-performing MPI codes scale to hundreds of thousands of processing cores. It is used for high-performance computing, where application writers want to squeeze as much performance as possible from very large, very expensive machines. Technologies such as ZeroMQ and formats such as JSON were not designed for such environments.
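
Much of that scale-up lives in the collectives: a global reduction across all ranks is one call, and implementations use topology-aware algorithms that finish in on the order of log N communication steps. A sketch:

    /* Global sum of per-rank partial results, e.g. a residual norm. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank;
        double local, global;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        local = (double)rank;  /* stand-in for a locally computed value */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        if (rank == 0)
            printf("global sum = %g\n", global);
        MPI_Finalize();
        return 0;
    }

Running the same pattern at hundreds of thousands of ranks is why vendors ship heavily tuned, sometimes hardware-offloaded, collective implementations.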


How are the bindings for Thrift/ProtoBuf/JSON in FORTRAN 77, which still runs a good majority of the scientific applications that require large-scale clusters?


That's fair. It's not very useful to me, but if you work at JPL I can see why you would still use it.


Cray also uses MPI, AFAIK, so if you are using big iron you may well see it. That would include JPL as well as pretty much all the three-letter agencies under the DoD umbrella, I suspect.



