IIRC, that's maybe a 16x improvement (32x if you count 32->64 bit). Which accounts for less than half of the (orders of magnitude of) improvement we should have got from Moore's law.
(More cores aren't a performance improvement; if you were willing to deal with non-serial execution, you could have just bought 32 Pentium Fours; putting them all on the same chip is convenient (and cheap), but as a price/performance improvement, it's all price, no performance.)
> (More cores aren't a performance improvement; if you were willing to deal with non-serial execution, you could have just bought 32 Pentium Fours; putting them all on the same chip is convenient (and cheap), but as a price/performance improvement, it's all price, no performance.)
That's only true if you treat ALU throughput as the whole of performance. In real-world performance, where the interconnect between cores and memory is hugely significant, a multicore processor has many advantages over a rack of otherwise equivalent single-core NUMA nodes.