Professor Hellerstein stresses, above all, the distributed nature of MapReduce:
" it works on 'shared-nothing' clusters of computers in a data center", "the MapReduce framework is a parallel dataflow system that works by partitioning data across machines"
To what extent does MapReduce leverage parallelism on a single machine?
Would "Distributed Programming in the Age of Big Data" be a more appropriate title?
Leveraging parallelism on a single multi-core machine has a definite resemblance to leveraging parallelism in a cluster of machines. For example, memory access costs are likely to be non-uniform (NUMA): the cores on chip X are "closer" to memory region A than the cores on chip Y are.
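To make the resemblance concrete, here is a minimal sketch of the MapReduce pattern on a single multi-core machine, using Python's multiprocessing module. The names (word_count, map_phase, merge_counts) and the worker count are illustrative assumptions, not part of any framework Hellerstein describes; the point is only that data gets partitioned across workers just as a cluster framework partitions it across machines.

    # Illustrative sketch: MapReduce-style word count on one multi-core
    # machine. All names here are hypothetical, chosen for this example.
    from collections import Counter
    from functools import reduce
    from multiprocessing import Pool

    def map_phase(chunk):
        # Map step: count words within one partition of the input.
        return Counter(word for line in chunk for word in line.split())

    def merge_counts(a, b):
        # Reduce step: merge two partial word counts.
        a.update(b)
        return a

    def word_count(lines, workers=4):
        # Partition the input across worker processes, much as a cluster
        # framework partitions data across machines.
        chunks = [lines[i::workers] for i in range(workers)]
        with Pool(workers) as pool:
            partials = pool.map(map_phase, chunks)
        return reduce(merge_counts, partials, Counter())

    if __name__ == "__main__":
        lines = ["the quick brown fox", "the lazy dog jumps", "the fox"]
        print(word_count(lines))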
This paper examines MapReduce performance on multi-core/SMP machines:
" it works on 'shared-nothing' clusters of computers in a data center", "the MapReduce framework is a parallel dataflow system that works by partitioning data across machines"
To what extent does MapReduce leverage parallelism (on a single machine)?
Would "Distributed Programming in the Age of Big Data" be a more appropriate title?