Amdahl's law. The von Neumann bottleneck continues to widen with each hardware generation. We can do far more calculations in the same amount of time on a chip, thanks to the growing number of transistors we can fit on it and to improvements in internal design (superscalar architectures, etc). Memory access speed, however, has improved only marginally in comparison, and contention for memory grows as we add more distinct cores competing for it. Caching, prefetching, etc. mitigate some of the problem, but only to the extent that you can avoid cache misses (common in object-oriented code, due to frequent pointer dereferencing). Even with all these technologies, the von Neumann bottleneck is still what prevents us from doing more in the same amount of time.

Writing high-performance code is also difficult because the hardware's specialisations are often inaccessible to the programmer, who has to rely on the compiler/assembler to do the right thing. In practice this means designing code to suit the CPU (eg, vectorizing all your data) rather than to suit the programmer (eg, OOP).
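To make the layout point concrete, here is a minimal Go sketch contrasting an OOP-style slice of pointers with a "vectorized" contiguous layout. The names (Particle, Particles, sumX*) are hypothetical, purely for illustration; the point is only that one traversal chases a pointer per element while the other walks sequential memory.

```go
package main

import "fmt"

// OOP-style layout: each Particle sits at its own heap address, so a pass
// over the slice dereferences a pointer per element and tends to miss cache.
type Particle struct {
	X, Y, Z float64
}

func sumXPointers(ps []*Particle) float64 {
	var total float64
	for _, p := range ps {
		total += p.X // pointer chase per element
	}
	return total
}

// Data-oriented ("vectorized") layout: one contiguous slice per field, so a
// pass over X touches sequential memory that the prefetcher handles well.
type Particles struct {
	X, Y, Z []float64
}

func sumXContiguous(ps *Particles) float64 {
	var total float64
	for _, x := range ps.X {
		total += x // sequential, cache-friendly access
	}
	return total
}

func main() {
	const n = 1 << 20
	aos := make([]*Particle, n)
	soa := &Particles{X: make([]float64, n)}
	for i := range aos {
		aos[i] = &Particle{X: 1}
		soa.X[i] = 1
	}
	fmt.Println(sumXPointers(aos), sumXContiguous(soa))
}
```

Both functions compute the same sum; the difference is purely in how the data is laid out for the memory system.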
We should really be aiming towards NUMA, many-core processors with high-speed message passing between cores, and a programming model that suits development on that kind of architecture (eg, the Actor model).
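As a rough illustration of that programming model, here is a small actor-style sketch in Go: an "actor" is a goroutine that owns its state and is reached only through messages on a channel, so cores coordinate by passing messages rather than by locking shared memory. The counter actor and its message type are hypothetical names for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// message is what senders pass to the actor; the actor replies on the
// embedded channel instead of exposing shared state.
type message struct {
	delta int
	reply chan int
}

// counterActor owns the count; no other goroutine can touch it directly.
func counterActor(inbox <-chan message, wg *sync.WaitGroup) {
	defer wg.Done()
	count := 0
	for msg := range inbox {
		count += msg.delta
		msg.reply <- count
	}
}

func main() {
	inbox := make(chan message)
	var wg sync.WaitGroup
	wg.Add(1)
	go counterActor(inbox, &wg)

	// Senders communicate with the actor purely by message passing.
	for i := 1; i <= 5; i++ {
		reply := make(chan int)
		inbox <- message{delta: i, reply: reply}
		fmt.Println("count is now", <-reply)
	}

	close(inbox) // no more messages; the actor exits
	wg.Wait()
}
```

The same shape scales to many actors, each pinned to its own core's local memory, which is the kind of design a NUMA, message-passing machine rewards.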