Well, the 10 beefy boxes will be much, much faster if your problem is not very distributable. Say, Facebook as an application shards very easily, because most users don't interact much with each other. Other applications, might have much more interactions.
What you're really paying for when buying a 256 processor POWER7 box is the fact that the interconnect (and therefore the time to acquire a lock/update data from another node) is much faster and more reliable than commodity networks/kernels/stack.
Depends on what you are programming on. If its in a language far removed from the machine your mileage may vary.
I have had the opportunity to try out Google's implementation of mapreduce implemented in C++ way back in time (6 years ago). These would run on fairly impoverished processors, essentially laptop grade. Have done stuff on Yahoo's Hadoop setup as well, these used high end multicore machines provisioned with oodles of RAM (I dont think I should share more than that). If I were to be generous, Hadoop ran 4 times slower as measured by wall clock times. Not only that, Hadoop required about 4 times more memory for similar sized jobs. So you ended up requiring more RAM, running for longer and potentially burning more electricity. This is by no means a benchmark or anything like that, just an anecdote.
That Hadoop would require much more memory did not surprise me, that was expected. What was really surprising was that it was so much slower. JVM is one of the most well optimized virtual machines we have out there, but its view of the processor is very antiquated and it does not surface those hardware level advances to the programmer. You pay for a hot-rod machine but run it like an old faithful crown victoria.
Four times might not seem like much, for one thing I am being generous, and it makes a big difference when you can make multiple run through the data in a single day and make changes to the code/model. Debugging and ironing out issues is a lot more efficient.
I think Hadoop gave Google a significant competitive advantage over the rest, probably still does.
Interconnect may be faster but as a whole system it is hard to compete with the raw speed of an x64 box with all the latest/greatest chipsets. You usually wind up having to write non-portable code to eke full performance out of the massive box and in the end your apps will probably still be faster on x64. They're best suited for massive parallel computation that isn't afraid of getting down to the metal and taking advantage of lots of the special chip instructions in asm. (Or alternatively you want POWER specifically because it has hardware dfp support.) The total gain from running on x64 will most likely exceed any loss from a network hop in a case where both have to go off to SAN for their data.
What you're really paying for when buying a 256 processor POWER7 box is the fact that the interconnect (and therefore the time to acquire a lock/update data from another node) is much faster and more reliable than commodity networks/kernels/stack.