Leveraging parallelism on a single multi-core machine closely resembles leveraging parallelism across a cluster of machines. For example, memory access costs are likely to be non-uniform: the cores on chip X are "closer" to memory region A than the cores on chip Y, and so on.
This paper examines Map/Reduce performance on multi-core/SMP machines:
http://csl.stanford.edu/~christos/publications/2007.cmp_mapr...
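To make the single-machine Map/Reduce structure concrete, here is a minimal sketch of a word count on one box. It is not the system from the linked paper. The function names (`map_chunk`, `merge`, `split_input`, `word_count`) are hypothetical, chosen for illustration. The input is split into one contiguous chunk per worker, each worker runs the map step on its own chunk, and the partial results are reduced into one total. One assumption to flag: it uses threads, and in CPython the GIL means threads show the structure without giving real CPU parallelism. Swapping in `concurrent.futures.ProcessPoolExecutor` (same `map` interface, but it needs an `if __name__ == "__main__":` guard) spreads the map work across cores.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map phase (hypothetical helper): count words in one chunk of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge(a, b):
    """Reduce phase (hypothetical helper): fold partial count b into a."""
    a.update(b)  # Counter.update adds counts when given a mapping
    return a


def split_input(lines, n_workers):
    """Partition the input into contiguous chunks, at most one per worker."""
    size = max(1, -(-len(lines) // n_workers))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def word_count(lines, n_workers=4):
    """Run map over each chunk in a worker pool, then reduce the partial counts."""
    chunks = split_input(lines, n_workers)
    # Threads illustrate the pattern; use ProcessPoolExecutor for multi-core CPU work.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge, partials, Counter())


if __name__ == "__main__":
    text = ["the cat sat", "the dog sat", "a cat ran"]
    counts = word_count(text, n_workers=2)
    print(counts["sat"])  # 2
```

Contiguous chunks are used here because, on a NUMA machine, keeping each worker on its own slice of the data is where memory locality starts to matter. This sketch does nothing to pin threads or memory to particular chips, so treat it as the shape of the computation only.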