Moore's Law says absolutely nothing about performance, despite what this article repeatedly implies. Moore's Law says that the number of transistors in an integrated circuit would double every two years, and that has fairly consistently held true and continues to do so.
For years, those transistors went into increasing CPU speed. Now, additional transistors go into building more CPU cores, or more execution units. Either way, Moore's Law still holds.
This is technically true, but it happens that clock speeds and MIPS have also increased at a geometric rate over long periods of time. It's a bit harder to characterize because processor architecture changes have a discontinuous effect, but on average MIPS doubled every 36 months. [1]
The regular doubling of clock speed ended years ago, around the time of the Pentium 4. (The paper you cited appeared in 2002.) In any case, Moore's Law never said anything about clock speed; that just became a popular misinterpretation.
No, it's not, but it is what people have often taken to be one of the main effects of Moore's Law, and it is the phenomenon whose end - and the implications of that end - the article is discussing. And those implications are important even if they aren't directly related to Moore's Law.
> Moore's Law says that the number of transistors in an integrated circuit would double every two years, and that has fairly consistently held true and continues to do so.
This may be getting close to a wall, too. The smallest process currently viewed as being practical is in the region of 10nm; we're not that far off.
"But that’s pretty much it – we currently know of no other major ways to exploit Moore’s Law for compute performance, and once these veins are exhausted it will be largely mined out."
This is an awfully big post on Moore's Law not to include any mention of memristors*
The man is discussing paradigm shifts. While memristors will be pretty awesome if they ever come online in commercial production, I'm not aware of any major paradigm shifts they will cause (or prevent)...?
I've only a casual interest in the area, but these are some things I read around the time of the announcement:
"Memristive devices could change the standard paradigm of computing by enabling calculations to be performed in the chips where data is stored rather than in a specialized central processing unit. Thus, we anticipate the ability to make more compact and power-efficient computing systems well into the future, even after it is no longer possible to make transistors smaller via the traditional Moore’s Law approach."
– R. Stanley Williams, senior fellow and director, Information and Quantum Systems Lab, HP
"Since our brains are made of memristors, the flood gate is now open for commercialization of computers that would compute like human brains, which is totally different from the von Neumann architecture underpinning all digital computers."
– Leon Chua, professor, Electrical Engineering and Computer Sciences Department, University of California at Berkeley.
It just seems odd to go into such depth on transistor density and CPU/memory architectures (and potential future architectures) without mentioning memristors.
I agree that utilization of cloud resources will be an increasingly fundamental component of modern device architecture, but - at the risk of sounding hyperbolic - if memristors live up to the promise, we're talking about supercomputers the size of the human brain*
> Note that the word “smartphone” is already a major misnomer, because a pocket device that can run apps is not primarily a phone at all. It’s primarily a general-purpose personal computer that happens to have a couple of built-in radios for cell and WiFi service […]
I love this. It explains both why locking down your customers' iPhones is evil (assuming it is Wrong™ to sever freedom 0 from a computer), and why people accept it (somehow they act as if it is not a "real" computer).
Sigh. There is a huge elephant in this room; it was the beast of Christmas past. Specifically, for years Microsoft colluded with Intel to make systems which consumed more memory and CPU power such that an 'upgrade' cycle would be required.
Have you ever wondered how a machine which is less than 1/100th the speed of the machine on your desk computed the bills, interest, and statements for millions of credit card users? The IBM 370 that computed and printed those statements for MasterCard back in the late 70's had a whole lot of I/O channels.
I would love to see it become relevant again for developers to assume that the computer they are targeting will be roughly the same speed, and that all of their 'features' have to be implemented with no loss of speed in the overall system. There is a lot of room for optimization; nobody has seriously attacked that problem yet, because no one has needed to - people who optimized were left in the dust by people who could assume the next generation of machines would be fast enough to make bloated code good enough.
We haven't done anything about Amdahl's law, though, and of course the thing that gives us parallelism is the interconnect between compute nexii (nexuses?). I was hoping there would be some insights along those lines in the article, but I was disappointed.
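For anyone who wants numbers to go with that, here's a quick back-of-the-envelope sketch (mine, not from the article) of what Amdahl's law says the interconnect can and can't buy you: the serial fraction caps the speedup no matter how many compute nexii you wire together.

```cpp
// Back-of-the-envelope Amdahl's law: speedup = 1 / ((1 - p) + p / n),
// where p is the parallelizable fraction of the work and n is the core count.
// Even with a perfect interconnect, speedup is capped at 1 / (1 - p).
#include <cstdio>

double amdahl_speedup(double parallel_fraction, double cores) {
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
}

int main() {
    // A program that is 95% parallel tops out at 20x, no matter how many cores you add.
    for (int cores : {2, 4, 16, 256, 65536}) {
        std::printf("p=0.95, n=%6d -> speedup %.2fx\n", cores, amdahl_speedup(0.95, cores));
    }
    return 0;
}
```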
These days, the collusion is between Intel/ARM and Mozilla / Google Chrome.
I've got a closet full of ancient hardware (well, a vast underground chamber, but we'll stick to PC hardware of the past decade or two), most of which should still run a Linux kernel and, with a threshold of somewhere between 16 and 96 MB, bootstrap a Debian installation. For basic command-line utilities, and even a few graphical programs, they're fine.
But truth is that your computer is doing much, much more, at least in terms of system demands, than it did 10 or 20 years ago. This doesn't translate to prodigious amounts of end-user productivity necessarily, but it does suck cycles and RAM (and on handhelds: battery life). Which reminds me that a lot of the interesting stuff in terms of efficiency is happening in the mobile space. Form factors, weight, and battery size and weight are all highly constrained, so efficiency gains in both hardware and software are critically important.
Back to that desktop/laptop increasing complexity: I see this as having a fractal nature -- as hardware capabilities increase, there are multiple dimensions of increasing demand complexity, very much as you'd render a fractal and, for a huge increase in the amount of computation, see only a slight refinement of the pattern.
Systems of yesteryear did what they did by keeping the tasks simple and extraneous processing to an absolute minimum. Ask anyone who's waited an afternoon for a few precious seconds of CPU on a highly overtasked mainframe what their opinion of that experience is. These days, throwing more hardware at a problem isn't a bad first approach -- it's cheaper than programmer time, and, well, if it doesn't work, hardware is fungible (especially in the cloud): time for plan B, work smarter.
Specifically, for years Microsoft colluded with Intel to make systems which consumed more memory and CPU power such that an 'upgrade' cycle would be required.
It does not seem right. Recent Ubuntu releases will not perform better on older machines than Windows 7, and I seriously doubt that there's a pact between Canonical and Intel as well.
I think this is a large part of it. When you look at something like Compiz, you say, "Well, that is nice and cool and all, but it's not really contributing to the work being done." It's eye candy that can be done because we have a GPU sitting there otherwise idle.
There is a story, probably apocryphal at this point, that a long time ago, in what seems like a different universe, an engineer working on the Xerox Star system was looking into why it was bogging down. It was reported that there were nearly 800 subroutine calls between a keystroke and the letter being rendered on the screen. They managed to cut that number in half, and performance improved by a third. Mostly the overhead was abstractions.
People always find uses for every available CPU cycle, as predicted; now we're entering a time when you will need to optimize something to get more cycles.
Windows Vista was a large jump in resource usage. At the time Microsoft was building it, it was expected that Intel would bring the NetBurst architecture to at least 4 GHz in the near future. They didn't, and the next release of Windows was not so careless with resources. Coincidence?
I would tend to agree with Knuth's suggestion that multicore stuff and the "hardware jungle" are a symptom of a lack of imagination on the part of hardware designers.
I thought the comment was going to be that the software industry is worth 300 billion dollars and look at the astonishing state of software, or that it needs to try harder...
I'm sure the imagination is there, but why bother creating hardware if programmers won't write code for it? Multicore and multinode systems are by far the least imaginative way around Moore's law, and they're the only thing 99% of software engineers are willing to touch. Not only that, programmers are fighting tooth and nail to keep using the same old languages no matter what the hardware architecture, because we are still sick with nostalgia for our single-core utopia. Compiling C to run on FPGAs is a thing that is actually done in the real world, for commercial purposes, which just makes me sick.
In an ideal world, it would go like this. First, hardware engineers would come up with an idea for a programmable hardware architecture. Second, if the hardware worked well, language designers would figure out a language that was appropriate for the architecture. Third, if the language design step was successful, programmers would learn it and write programs for the new architecture.
If you know ahead of time that the third step will always fail because learning strange new languages is assumed to be beyond the capability of 99% of programmers, then the second step (designing the language) is by definition impossible, which means the first step (designing a new programmable architecture) is also impossible.
That's why hardware designers don't seem to have any imagination -- programmers have told them to stuff their imagination and concentrate on building machines that programmers already know how to program.
As a hardware engineer myself, I must agree. We have only explored a tiny subset of all possible configurations with our computer designs, held back, perhaps, by the need to be binary-compatible with a hardware/software architecture that was obsolete in the mid 80's.
The thing I find most alarming about hardware design is that functional simulation is reaching the end of its useful life, because the majority of simulators are single-threaded. We're trying to keep up with Moore, and yet our simulation capacity has plateaued for a few years now. Still, I hear very few ASIC designers and verification engineers complaining about it. We have formal tools and FPGA prototyping, but by and large, most designs are verified and debugged under one of the big 3 simulators.
I think a lot of software guys would be astounded at how backwards the ASIC design community is. To say nothing of the monstrosity that is SystemVerilog.
Out of curiosity, how complex are these simulator tools? What sorts of physics and electrical engineering knowledge (in addition to CS knowledge for taking advantage of modern programming techniques) would one require to be able to write a minimum competing product?
I know someone who programs EDA tools and this is what I have observed:
You need to know how to make an IC out of lumps of metal, or at least know someone who does. If this were simple they wouldn't pay designers so well (sweeping generalisation based on a sample size of 1).
Your real problem, though, is the extremely conservative nature of your potential clients -- understandable when you consider the huge sums at stake in manufacturing, but frustrating nonetheless. If you can create a tool that works exactly like an existing one did in the olden days, then they might be interested. If you use "modern programming techniques" for goodness sake keep stumm about it.
To be clear, I'm specifically talking about Verilog simulators here. The EDA industry produces a wide range of tools for doing simulation, timing analysis, synthesis (converting a design to a gate-level netlist), DFT (automatic insertion of in-silicon testing facilities), layout, routing, analog design, and so on. You need a lot of tools to get a chip from your brain to a fab. A simulator for doing functional design is just one piece of the puzzle, but it's the one I'm most familiar with. I swear at one every day.
To write a digital logic simulator is actually not very difficult at all. I've implemented a sketch of a simple one myself; I'm sure there are some folks here who, if they understood some digital design principles, could write something that would put my humble attempt to shame.
But the industry doesn't want a simulator - it wants a Verilog simulator. That such a thing even exists is fundamentally wrong to my mind - a simulator should simulate the function described, not the implementation of said thing in a specific language. My own toy simulator was really a language-agnostic simulation library that you could just link to. Kind of like what SystemC is. But even polished up, I don't see there being any market for such a thing. Most designers don't think there's anything wrong with Verilog, because they don't know anything else, and most verification engineers are delighted to be using SystemVerilog because it makes us feel like we're on the cutting edge of technology for using a language that looks like a severely crippled Java, rather than one that looks like a regular crippled C. Many would not understand the previous sentence.
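To give a flavour of what I mean by a language-agnostic simulation library (this is a fresh, deliberately tiny sketch for illustration, not my actual toy; the Gate/Circuit names are made up), the core really is just nets, gates, and a settle loop:

```cpp
// A toy two-value, zero-delay gate-level simulator: no timing, no X/Z states,
// no event wheel -- just re-evaluate gates until the nets stop changing.
#include <cstdio>
#include <vector>

enum class Op { AND, OR, XOR, NOT };

struct Gate {
    Op op;
    int a, b;   // input net indices (b ignored for NOT)
    int out;    // output net index
};

struct Circuit {
    std::vector<bool> nets;   // current value of every net
    std::vector<Gate> gates;

    int add_net() { nets.push_back(false); return (int)nets.size() - 1; }

    bool eval(const Gate& g) const {
        switch (g.op) {
            case Op::AND: return nets[g.a] && nets[g.b];
            case Op::OR:  return nets[g.a] || nets[g.b];
            case Op::XOR: return nets[g.a] != nets[g.b];
            case Op::NOT: return !nets[g.a];
        }
        return false;
    }

    // Re-evaluate every gate until nothing changes (fine for acyclic logic).
    void settle() {
        bool changed = true;
        while (changed) {
            changed = false;
            for (const Gate& g : gates) {
                bool v = eval(g);
                if (nets[g.out] != v) { nets[g.out] = v; changed = true; }
            }
        }
    }
};

int main() {
    // A 1-bit half adder: sum = a XOR b, carry = a AND b.
    Circuit c;
    int a = c.add_net(), b = c.add_net(), sum = c.add_net(), carry = c.add_net();
    c.gates.push_back({Op::XOR, a, b, sum});
    c.gates.push_back({Op::AND, a, b, carry});

    for (int va = 0; va <= 1; ++va)
        for (int vb = 0; vb <= 1; ++vb) {
            c.nets[a] = va; c.nets[b] = vb;
            c.settle();
            std::printf("a=%d b=%d -> sum=%d carry=%d\n",
                        va, vb, (int)c.nets[sum], (int)c.nets[carry]);
        }
    return 0;
}
```

Everything else - four-state logic, delays, an event wheel - would layer on top of something like this; the point is that none of it needs to be welded to Verilog semantics.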
Ironically, Verilog doesn't really lend itself to parallelization very well. The language was not designed with that in mind, if it was "designed" at all. So single-threaded seems the way to go, and that space is already too crowded.
There are a couple of multi-threaded Verilog simulators out there, but they're not widely used, and I suspect the performance is not all that great, because of the semantics of the language.
There have been attempts to push new languages. One, called Bluespec, is a Haskell front end to SystemVerilog, but doesn't seem to get much traction, as far as I can tell. And even if it did, the demographics of our industry would be a problem. I'm not sure how many of us could handily pick up Haskell.
So it seems we're stuck with Verilog, and Verilog really is a problem. Hats off to Phil Moorby for coming up with a language that designers could write some code that looks a bit like C, and have it synthesize to a bunch of equivalent gates. It was a damn cool hack in the 1980s, when it came out, but it's really not a great language, and attempts to bolt on higher-level features on the side (SystemVerilog) just emphasize, in my mind, just how flawed the foundation really is.
Our little industry is becoming more and more like software every year, but our demographics are such that we just don't have the pool of really brilliant people driving innovation. I believe I'm generally considered to be a pretty good verification engineer, but I think if I were thrown into a modern software development team, I'd probably struggle to keep up (at least at first). As for the "leaders" in my industry, I know many of them personally, and while they're certainly very intelligent, not one could be compared to the luminaries of the software industry. So even amongst our leadership, I sense very little concern that something is seriously wrong. Some get downright defensive about it.
And worse, they seem to have a vested interest in maintaining the status quo.
What you've described is the sort of problem YC has been calling for. It's not a glorified to-do list, it's a real-world problem in a multi-billion dollar industry with complacent players.
It sounds like a nontrivial task to figure out:
1. how to build the specs for a digital logic simulator that would fit in with the typical EE workflow
2. what sorts of math, physics and engineering knowledge are necessary to build the tool
3. how to build the tool to be of actual value to existing circuit design firms
4. how to sell it to them (which involves #3, but is definitely not the same)
That can't really happen. FPGAs are just programmable ASICs.
(Also, I'm not sure if you're suggesting that ARM cores are by their nature field-programmable. They're not, they're just IP cores that can be integrated into larger designs).
I think sounds is saying that a runtime-reconfigurable processor could be the next step forward, adapting itself to whatever task it has to do. I don't see much promise for speed improvements from this, but it is a cool idea.
I am not talking about x86 vs. ARM. I am talking about computers that don't have CPUs connected through a bus to a single memory space.
Someone else in this discussion mentioned the idea of fusing memristor memory with processing engines within the memory itself. That's the embryo of a very interesting idea - imagine hardware assisted memory deduplication, just to start.
I'm skeptical that different configurations could provide exponential improvements down the road. Architecture will always be limited by the speed of the clock and, in the absence of flops, by the speed of the gates.
(I am also skeptical that latches and/or stateless design will replace flops any time soon)
Only five or six years ago a good chip had two cores and a clock rate of perhaps 2.4 gigahertz. Somehow the chips have been redesigned or improved to allow much lower heat loss at clock rates previously considered unreachable without special hardware. (A somewhat recent Intel chip is able to reach 4.4 gigahertz on four cores with air cooling, and AMD has eight-core server chips that produce less heat than older two-core chips.) I do not know how it was done, but apparently the engineers have found ways to improve their designs. Now the chips seem to be more powerful per clock cycle while producing less heat and reaching higher clock rates.
Naturally there are always applications that can use more raw power, but the article focuses on those segments and I think that's misguided. The real growth powered by Moore's Law in the immediate future will be processors that are lower power and cheaper, not faster. The success of the iPad has proven that faster processors are not what the public is clamoring for right now. Mainstream tasks really hit diminishing returns after two cores. Not that processors won't get faster, just that the transistor budget won't get used to that end exclusively.
True, another factor is that we can now offload more stuff to "the cloud" which is especially useful on handhelds.
As we get better connectivity everywhere it might be that we actually see a regression in performance (or at least staying static) on many mobile devices in exchange for better battery life.
I'm pessimistic about the future of the cloud on mobile devices, because we'll reach a bandwidth limit. As the frequencies fill up, the pipes are going to get more sluggish over time, unlike the trends we see in other tech sectors.
Has anyone looked at lock-free data structures and algorithms in projects with large amounts of concurrency? The approach looks promising but it's unclear how practical it is.
I was thinking about the same thing. My view is: building applications on top of heterogeneous nodes (in the cloud) and then coupling them all together via ZeroMQ.
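Something roughly along these lines (a toy sketch using ZeroMQ's plain C API, which is callable from C++; the endpoint, port, and messages are made up for the example): one node plays the worker with a REP socket, another hands it work over REQ.

```cpp
// Minimal REQ/REP coupling between two nodes via ZeroMQ's C API.
#include <zmq.h>
#include <cstdio>
#include <cstring>

int main(int argc, char** argv) {
    void* ctx = zmq_ctx_new();

    if (argc > 1 && std::strcmp(argv[1], "worker") == 0) {
        // Worker node: answer requests forever.
        void* rep = zmq_socket(ctx, ZMQ_REP);
        zmq_bind(rep, "tcp://*:5555");
        char buf[256];
        while (true) {
            int n = zmq_recv(rep, buf, sizeof(buf) - 1, 0);
            if (n < 0) break;
            buf[n] = '\0';
            std::printf("worker got: %s\n", buf);
            zmq_send(rep, "done", 4, 0);
        }
        zmq_close(rep);
    } else {
        // Client node: hand a unit of work to the worker and wait for the reply.
        void* req = zmq_socket(ctx, ZMQ_REQ);
        zmq_connect(req, "tcp://localhost:5555");
        zmq_send(req, "compute chunk 42", 16, 0);
        char buf[16];
        int n = zmq_recv(req, buf, sizeof(buf) - 1, 0);
        if (n >= 0) { buf[n] = '\0'; std::printf("client got: %s\n", buf); }
        zmq_close(req);
    }
    zmq_ctx_destroy(ctx);
    return 0;
}
```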
No (http://en.wikipedia.org/wiki/Lock-free), it's about using different data structures and algorithms and new-to-C++ (or the applicable assembly), really low-level operations /to avoid locking at all/. (The point is to avoid threads blocking, not the overhead of the locks themselves.)
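To give it a concrete flavour, here's roughly what that "new-to-C++" style looks like: a toy Treiber stack sketch whose push/pop use compare-and-swap instead of a mutex, so no thread ever blocks holding a lock. It deliberately leaks popped nodes, because safe memory reclamation and the ABA problem are exactly the hard parts real lock-free libraries have to solve.

```cpp
// Toy lock-free (Treiber) stack built on C++11 std::atomic compare-and-swap.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

struct Node {
    int value;
    Node* next;
};

class LockFreeStack {
    std::atomic<Node*> head{nullptr};
public:
    void push(int v) {
        Node* n = new Node{v, head.load(std::memory_order_relaxed)};
        // Retry until we swing head from n->next to n without interference.
        while (!head.compare_exchange_weak(n->next, n,
                                           std::memory_order_release,
                                           std::memory_order_relaxed)) {
        }
    }

    bool pop(int& out) {
        Node* n = head.load(std::memory_order_acquire);
        while (n && !head.compare_exchange_weak(n, n->next,
                                                std::memory_order_acquire,
                                                std::memory_order_relaxed)) {
        }
        if (!n) return false;
        out = n->value;   // n is intentionally leaked (no safe reclamation in this toy)
        return true;
    }
};

int main() {
    LockFreeStack s;
    std::vector<std::thread> writers;
    for (int t = 0; t < 4; ++t)
        writers.emplace_back([&s, t] { for (int i = 0; i < 1000; ++i) s.push(t * 1000 + i); });
    for (auto& w : writers) w.join();

    int v, count = 0;
    while (s.pop(v)) ++count;
    std::printf("popped %d items\n", count);   // expect 4000
    return 0;
}
```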
"For over a decade prophets have voiced the contention that the organization of a single computer has reached its limits and that truly significant advances can be made only by interconnection of a multiplicity of computers in such a manner as to permit cooperative solution."
This now-pithy statement was written by the famous Gene Amdahl in the year 1968, a time when computers ran at speeds dwarfed by today's digital clocks. It also gives us insight into an era when people were already dealing with the same problems we deal with today in developing faster and faster CPUs.
The truth of the statement may be something that functionalism or parallelism advocates don't want to hear - the so-called Parallelism Revolution will never come, at least not in its current incarnation.
The end of serial advancement, and thus the parallelization revolution, was "supposed" to happen in the 80s, and despite considerable advances in the methodologies of parallelization, it did not come. The 90s brought us standards and technologies like MPI, which standardized procedures for developing cooperative computing solutions, but still, it did not come. The 2000s sought to simplify the very act of programming by bringing the ideas of programming back to the realm of pure mathematics - by representing programs as a mathematical description of time and work itself. With languages like Haskell and ML we sought to build machines which model math, and thus the parallel nature of computation within the universe itself.
I feel it myself, the sublime glitter of gold that is locked in the idea of parallel computation - It is irresistible for a curious individual. To feel as if all the power of the world is in your hands in this moment (as opposed to 20 years from now), to wipe away the frailty that underlies all of computation today; We all would like to be able to lift a trillion billion bytes into the heavens.
There are only two problems.
The first problem lies squarely within our own human inadequacies, and it could be argued that this is where parallelism fails deepest. It is certainly true that parallelization is complex, but like all things, abstractions of complexity are necessary, and designing the abstractions in such a way that they are understandable to 'mere mortals' is a greatly undervalued aspect of technology today. So, I would posit that as a result of insufficient desire to establish simplified abstractions of parallelization, to most programmers, ideas like parallelism remain in the domain of machine learning and condensed solids analysis - A kind of electronic black art, only used by those with sufficient training to know what horrors they might wreak upon the world if they were to make some trivial programming mistake. As a result (ceteris paribus!) serial power will always be valued more highly than parallel computational capacity, which many have claimed to be the predominant driver of commercial development of scientific ideas.
The second problem is more controversial, but I think time will prove it so -- computers have managed, and will continue to manage, to get faster at an alarming rate. Regardless of our preconceptions about the mechanics of computation, I believe it is reasonable to say that computers will continue to get faster at exponential rates, even after the so-called quantum limits of computation come into play. This is reasonable for the same reason the Normal distribution manifests itself in disparate natural phenomena - the Central Limit Theorem. Sutter himself admits that people have been using the exact same logic to claim the beginning of the end for the past 60-70 years (before 'real' computers, even); I fail to see how he justifies his reasoning after making this enlightening point.
Your argument would be a lot more compelling if computers were, you know, getting faster. The idea that they might someday stop getting faster is out of date, in the sense that they stopped getting faster at least five years ago, and that's being very conservative.
That people decades in the past were wrong doesn't do anything about the fact the people one decade in the past were right. Computers have already stopped getting faster at an "alarming rate", it's a past event, it's not speculation. They're still improving and there's still some room for improvement, but we've already fallen off the exponential curve and I don't anticipate getting back on it anytime soon.
I disagree; the period in which Dr. Amdahl lived experienced an even longer 'slump' in development than the one we are in today. But I'll get down to brass tacks - an i7 is about twice as fast as a Core 2 Duo.
Computer speed is a story of layers of bottlenecks. CPU speed is just one of those potential bottlenecks. As it turns out, CPU speed has not been a bottleneck for overall computer speed for some time (over a decade), the modern bottlenecks are GPU performance, memory size and bandwidth, permanent storage latency, and network latency.
All of those roadblocks have been dramatically lowered in the last 10, 5, and even 1 years. DDR3, improvements in memory access architecture (sandy bridge, ivy bridge, etc.), GPU improvements, and SSDs have all bumped up average computing performance, even while the clock speed of CPUs has stagnated.
Indeed. The trouble is, even besides the question of fabrication ("Can we make good 1nm transistors?"), processors have hit a wall. Processors have stayed at roughly the same frequency for the past... how many years?
Without a huge process breakthrough to kick up the frequency, the only ways to speed up processors besides parallelism are incremental optimizations and algorithmic improvements on the part of the architects, and I'm betting they don't have a bunch of 10-fold improvements hidden up their sleeves.
I'm not sure what you're talking about, because even for single-threaded applications, my current computer is a couple orders of magnitude faster than the computer I had 5 years ago.
No, it's not. Not in the same price range it's not. Not on the same tasks it's not. 100 times faster? You need to sell that beast for some real cash because you've got something nobody else does.
The only way that can be true is if you didn't realize your 2007 computer was continuously in swap.
Edit: Oh, sorry, read further down the thread, wherein your secret definition of "orders of magnitude" is revealed. Even then it's not true; the only place you're getting 4x speed improvements in single threading for the same price is either the very bottom of the market (maybe) or in certain benchmarks that carefully test only certain aspects of the single thread performance. It's certainly not across the board.
So your computer today is 100 times faster than the computer you had 5 years ago? That seems like a stretch. What two CPUs are you talking about exactly?
Hmm, I guess if you had an Intel Pentium III Mobile 750MHz (circa June 2000) five years ago:
Then if you got an Intel Core i7-2600K @ 3.40GHz that would score 100 times higher on the PassMark CPU Mark.
But there's a 10 year gap between the release of those CPUs. Also, the Core i7-2600K has 4 cores and 8 threads, while that Pentium III had only one core and one thread.
My computer is many times faster than the one I had 5 years ago (or 3 years ago); my processor/cpu is somewhat faster. Faster memory, faster GPU - these make a big difference for me at least.
A good point, but all it has done is upped the stakes.
Who knows whether single-threaded performance has reached a real barrier or not?
Will it be like the sound barrier? Or more like the speed of light? (I realize that the problem is, literally, the speed of light, among other quantum effects.)
> The truth of the statement may be something that functionalism or parallelism advocates don't want to hear - the so-called Parallelism Revolution will never come, at least not in its current incarnation.
Do you want to bet everything on a possible advancement of technology, rather than invest in developing better parallelization paradigms? Even now, the state of the art is far beyond manually managing locks and so on.
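Even plain C++11 futures already get you past hand-managed locks for simple fork/join work. A small sketch of my own, just for illustration: each chunk is summed independently and the results are combined at the join point, with no mutexes or condition variables in user code.

```cpp
// Fork/join parallel sum with std::async -- no explicit locks.
#include <cstdio>
#include <future>
#include <numeric>
#include <vector>

long long parallel_sum(const std::vector<int>& data, int chunks) {
    std::vector<std::future<long long>> parts;
    size_t step = data.size() / chunks;
    for (int i = 0; i < chunks; ++i) {
        size_t lo = i * step;
        size_t hi = (i == chunks - 1) ? data.size() : lo + step;
        parts.push_back(std::async(std::launch::async, [&data, lo, hi] {
            return std::accumulate(data.begin() + lo, data.begin() + hi, 0LL);
        }));
    }
    long long total = 0;
    for (auto& f : parts) total += f.get();   // join point
    return total;
}

int main() {
    std::vector<int> data(1000000, 1);
    std::printf("sum = %lld\n", parallel_sum(data, 4));   // expect 1000000
    return 0;
}
```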
What will drive performance is human ingenuity and innovation.
How can we tell whether the slower rate of performance increase has more to do with decreased marginal utility than with physical limits?
Computers can now use less power, and laptops can run for up to 10 hours without a charge.
Rather than investing silicon in technologies that make things harder, perhaps we can improve performance by making them easier.
Maybe computers can be more garbage collection friendly, run high level languages at full speed etc. Perhaps the pendulum needs to swing towards Lisp machine type architectures.
Azul is basically a modern Lisp (well, Java, ahh) machine, but it's kind of the same thing. There is an awesome discussion between Cliff Click from Azul and Dave Moon (one of the guys who worked on the Lisp machine): http://www.azulsystems.com/blog/cliff/2008-11-18-brief-conve...
Another direction we should go in is security; we should have a trusted computing base.
Some awesome stuff done by DARPA: http://www.crash-safe.org/papers (one of the Lisp machine guys is working there too)
This would allow developers to focus more on algorithms and speed. That all said, we still have to deal with the multicore problem :)
Yes, it is a very thought-provoking article, similar to his "The Free Lunch Is Over" from 2005.
The upcoming jungle sounds like a great adventure. Many interesting challenges ahead. Though it's very, very hard to get rid of the sequential mindset, we'll really have to think in new ways.
People have been warning developers to 'get ready for multithreading' for at least a decade and somehow everything is mostly the same. Mostly because of abstractions (e.g. on GPUs) and also because the USERS of our software are ALSO getting parallelised! So we're back to one thread for one user, since most of the time you really don't want to do loads of work for one user request. Cases where parallelism matters (graphics, data stores, query engines) are already pretty solid on multithreading anyway.
Because it's really hard to let go of the "cosy" single-threaded, sequential model. I expect there will always be a place for it, as it gives important guarantees. Just like mainframes still exist (the article mentions this too - "different parts of even the same application naturally want to run on different kinds of cores"). Also, it may be that the free lunch is extended with graphene or other radically different semiconductor technology (except for quantum computing as it will also need a complete rethinking of software).
Heterogeneous, parallel computing exists in addition to the sequential model and won't replace it. I do expect cases where parallelism matters to grow as AI (voice recognition, human language recognition, driverless cars, etc) becomes more prevalent.