Most application developers won't worry about concurrent program design. Ever. More and more applications are becoming layered each year, and all of the hard work is done on the server side. Drawing an application's chrome takes little power in comparison.
Concurrent programming ends up in the data center, fussed over by the (relatively) small core of engineers and software developers. Everyone else just queries this data and makes it look good. Concurrently fetching data doesn't need a paradigm shift, just a good library. If you still need a few threads, you can use the same crummy techniques we've always used.
Cloud computing will alter Moore's Law on most devices. Devices may need twice as many transistors every 18 months, but the transistors are no longer on your desktop. They're mostly in some data center.
Someone has seen a vision of the future; and it is bright, shiny, orderly, and ... beautiful!!!
In my 30 years in and out of the software development world, I've seen _many_ visions of the future. In particular, I've been reading about the death of the desktop application since, well, since before desktop applications were around. Anytime someone starts to tell me that the future is going to be X, my response is: yes, the future may include X, but it will also include a bunch of old stuff, and a bunch of stuff that no one has foreseen. Entropy increases until a given system falls apart and is replaced by something better that works ... at least as well. Usually. (See Ted Nelson's vision of Xanadu, which was supposed to _preempt_ the World Wide Web.)
And the duct tape and baling wire holding it all together is ... wait for it ... faster processing speeds, more and better storage, faster networks, desktop apps, plug-ins, scripts, prayer, and lots and lots of consulting fees. (edit: I forgot to add faith, hope and charity as well.)
Don't get me wrong, your vision _is_ beautiful. It's worth believing in and probably worth working toward. Some version of it will probably crawl, writhing noisily and messily, from the sea of change. Just don't bet the farm on a particular version of it.
Edit: I will also add this in direct response. My desktop computers are quickly _becoming_ my data center. I'm spending most of my face time with mobile devices: laptops, smart phone/pda music player, and, of course, my beloved beautiful iPad...
So I hereby create the new buzz term PDC. Personal Desktop Cloud. Bask in its glory and power.
It's not some silly vision of the future, but an oversimplified version of today! A brief and incomplete list of applications I use in a typical day:
Remote storage and/or processing: GMail, Google Docs, Weather.com, Reddit.com, Hacker News, Outlook web client, Google.com, DuckDuckGo, Delicious, Facebook, Github, tens of blogs/articles, online help documentation for, well, everything
Local storage and/or processing: Windows+Linux, Firefox, Chrome, Outlook, Visual Studio, Emacs, Python (or other dynamic languages), Acrobat, Amarok, random Unix utilities, various games
Most of my applications exist solely to present data stored elsewhere. I see no reason the trend won't continue: for instance, why would I compile C++ code on my machine when I can farm it out? Why would I store flat code files on my machine when I can have synthesized views of the code I need to see at one time?
Split up by time, most of my attention is spent manipulating or displaying data from somewhere else (or that could be stored somewhere else).
Even if all that is true, the server apps have to be written by somebody. They don't write themselves. It's not like whole-program optimization of your C++ application in the cloud just magically happens. The same team that wrote the C++ app on your desktop is going to need to figure out how to optimize across the application, and in some cases it's more difficult: they'll be dealing with multithreading within a box and multiprocessing across boxes on the server. And then they have to work on optimizing data transfer from one cloud to another (since presumably the cloud you build on isn't the cloud you debug or edit on).
The world just got more complex, not simpler for devs.
But my original point is that there's one dev team in the middle of this dealing with concurrency, and any number of remote applications that can use it through a library because someone else worried about the hard parts. There isn't a day of reckoning where developers as a group worry about efficient concurrent computation; it's the few guys in the center.
You are describing a project, not the world. We break the world up into projects so we can manage them. No one has been able to come up with a way to manage the world (so far).
The next step in understanding "the cloud" is that it is actually "clouds." Some connected deeply, others loosely, some held in jealous, secretive isolation. There are clouds within clouds, and some clouds are outside the light cones of other clouds.
I will describe a place known as "the pit." The pit has power wires going in, and that is only out of compelling necessity. (If they could make carrying in batteries work, they would.) Equipment, data and people go in, but only people come out. The pit is a crowded place. And there is much processing of the data; decisions are made, the world changes.
We all have our own personal pits. Or at least we should.
Where is this one dev team that is dealing with concurrency? I don't see how you do this, at least not with the current state of the art in concurrency technology. They can provide some basic tools like concurrent collections, but the hard work is still app-dependent. I still need to figure out where concurrency makes sense. I still need to figure out which data races are OK and which ones are a problem. I still need to be the one to put locks in my code.
It's not like I can just push up the source code to Photoshop and say, "Make it concurrent now please".
A tangential question: why haven't chip designers decided to use the extra transistors for more complex operations? Call it Hyper-CISC, if you will. We've seen it to a certain extent with MMX extensions and the like, but how much could we speed up operations if we could take pieces of the chip, introduce redundancies for more elaborate instructions, something like the old VAXes?
Probably at least a couple of reasons:
1) You have to codegen for it. A more complex operation can be harder to codegen for (presumably more work getting data to the right places on input, and then pulled back to the right places on output), and so is less likely to be used.
2) I suspect Intel has already looked at this and seen little if any perf benefit. I bet if you asked 100 companies for a great complex operation that would benefit their code you might get 100 different answers. Of course, there has been talk of using reconfigurable logic, since a given program may have a good idea of what would be useful for it.
It is happening slowly and incrementally. These changes seem architecturally dirty, but don't rock the boat too much and so are feasible to deliver on time and on budget.
Check out the population count and string compare/search instructions in Intel SSE4.2, and the upcoming Intel AVX which make SSE more useful by allowing more operands, more operator combinations, and wider vectors.
Today's GPU is a stream processor which is arguably less complex than a general purpose CPU. I think this is in the opposite direction of the spectrum to what the parent is suggesting.
I think this is not nearly as much of an issue as many people think it is. Most software hasn't been CPU bound for some time. The declining CPU performance improvements seem to be leading to more rapid advancement on the storage and memory fronts. Switching from a spinning disk to an SSD has resulted in the single largest performance improvement I've ever experienced in 20 years of computing.
Conversely, while multi-core computing doesn't deliver increased performance in the traditional way, I'd never want to work on a computer with a single processor again, no matter how fast it is. The whole system can become unresponsive from the load of a single process. Contrary to popular belief, many software applications do take advantage of concurrency, and it's not particularly hard to use concurrency to speed things up using multiple processes and messaging queues.
There is still room for plenty of improvement. I personally would rather buy a 4-core processor with more on-die cache than an 8- or 16-core processor at this point. The major improvements I'm really looking for are faster and larger SSDs; think the equivalent of 4 256GB drives in a RAID 0 array, with at least 60,000 IOPS, 1TB of storage, and 800MB/s read and write speeds. I also want more memory in my next workstation, say 24GB.
> The whole system can become unresponsive from the load of a single process.
Two cores is great if you have only one CPU-hogging thread, but as soon as a multithreaded app gets thrown in you have the same problem again -- more runnable threads than CPUs. If your dual core CPU doesn't "become unresponsive" when you start a 2+-threaded task then your single core problem is probably not processor-related.
On the other hand, my dual-core laptop regularly screeches to a halt when certain IO loads get involved, esp. vmware, due to poor filesystem design (ext3/ext4). I've switched to data=writeback, sacrificing some integrity for responsiveness, but ultimately I think new filesystems like btrfs will go a long way towards improving response times.
"Chip designers are under so much pressure to deliver ever-faster CPUs that they’ll risk changing the meaning of your program, and possibly break it, in order to make it run faster"
Already happening. The article mentions read/write ordering. That is essentially about the order in which other processors see the reads and writes done by this processor: if procA does a bunch of writes, will procB see them in the order (program order) in which they were done, or in a different order? There can be other interleavings when you introduce reads into the mix.
Architectures like the late lamented Alpha (this was around the late '90s) had fairly weak ordering requirements, so you needed explicit memory barriers to tell the processor that you needed ordering preserved (think spinlocks, for instance). Mainstream processors (x86/x86-64) have always had strict write ordering but not read ordering. Let's see how long that lasts.