"Languages can force developers to think about applying logic over their data (and thus help with parallelism as a mindset), but they will never solve the problem of automated parallelism without the VM to tune it in the background."
Unfortunately, the evidence that automated parallelism can save us, with or without VM assistance, isn't very good. Studies that apply it to real code written without parallelism in mind consistently find that "normal" code simply isn't parallelizable, even in theory, beyond single-digit factors.
"Studies that apply it to real code written without parallelism in mind consistently find that 'normal' code simply isn't parallelizable, even in theory, beyond single-digit factors."
If you assume "normal code" is traditional imperative code with destructive state update, that is probably correct (although I'd be curious to see which studies you're referring to). But there are important examples of domain-specific languages that are relatively simple to automatically parallelize. For example, mere mortals can easily write SQL that can be efficiently executed on thousands of machines (given a sufficiently large data set, of course) -- the database system takes care of data partitioning, replication, and fault tolerance automatically.
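To make that concrete, here's a rough sketch (in Python, purely illustrative, not any particular engine's API) of why something like SELECT key, SUM(value) ... GROUP BY key parallelizes so naturally: each partition computes partial sums on its own, and the partial results merge trivially at the end. A planner can scatter the partial_aggregate step across as many machines as it likes without the query author ever thinking about it.

    from collections import defaultdict
    from concurrent.futures import ProcessPoolExecutor

    def partial_aggregate(rows):
        # Per-key partial sums for one partition of the data.
        acc = defaultdict(float)
        for key, value in rows:
            acc[key] += value
        return acc

    def merge(partials):
        # Combine the per-partition results; order doesn't matter.
        total = defaultdict(float)
        for part in partials:
            for key, value in part.items():
                total[key] += value
        return total

    if __name__ == "__main__":
        # Stand-in for "SELECT key, SUM(value) FROM t GROUP BY key" over 4 partitions.
        partitions = [[("a", 1.0), ("b", 2.0)], [("a", 3.0)], [("b", 4.0)], [("c", 5.0)]]
        with ProcessPoolExecutor() as pool:
            print(dict(merge(pool.map(partial_aggregate, partitions))))
        # {'a': 4.0, 'b': 6.0, 'c': 5.0}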
I've seen such studies on functional programming languages, too.
If you switch programming languages, then in the context of this argument you've "already lost". The point of automatic parallelization is to give us speedups without radically changing how we program. Changing languages, adopting lots of mini-DSLs, and so on are just as radical a change as having to parallelize manually.
Personally, I absolutely agree that the only way this is going to work is with new languages that make it easier, but at the same time it's never going to be fully automatic... and those new languages aren't going to look much like C, C++, or C#. Haskell is closer, and even that probably isn't different enough.
Let's assume a sufficiently smart compiler can parallelize 90% of our hot code. By Amdahl's Law, we cannot achieve a speedup greater than 10, so we're stuck: piling on more cores quickly stops helping, because the running time becomes dominated by the remaining sequential code.
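For reference, Amdahl's Law gives speedup(n) = 1 / ((1 - p) + p/n) for parallelizable fraction p on n cores; plugging in p = 0.9 (a quick Python sanity check, nothing more) shows how fast the curve flattens:

    def amdahl_speedup(p, n):
        # p: parallelizable fraction of the work, n: number of cores.
        return 1.0 / ((1.0 - p) + p / n)

    for cores in (2, 4, 8, 16, 64, 1024):
        print(cores, round(amdahl_speedup(0.9, cores), 2))
    # 2 -> 1.82, 4 -> 3.08, 8 -> 4.71, 16 -> 6.4, 64 -> 8.77, 1024 -> 9.91; the limit is 1/0.1 = 10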
I have a hunch that the trick to writing the sufficiently smart compiler of legend lies in deriving nondeterministic parallel cousins of sequential algorithms. In simple terms, if you could quickly compute potential solutions and verify them, you could get a speedup in the average case. I doubt NC = P (i.e., that no polynomial-time problem is inherently sequential), but we could perhaps tackle many real-world problems with this line of thinking.
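Here's a toy sketch of the guess-and-verify idea (Python; the factoring example and function names are made-up placeholders, not a real compiler transformation): verifying a candidate is cheap, so we can check lots of candidates in parallel and keep the first one that passes, even though no single guess is guaranteed to be the answer.

    from concurrent.futures import ProcessPoolExecutor

    N = 8051  # the number we want to factor; any composite works

    def verify(candidate):
        # Verification is the cheap direction: just check one guess.
        return candidate if N % candidate == 0 else None

    def speculative_search(candidates):
        # Check many guesses in parallel, keep the first that verifies.
        with ProcessPoolExecutor() as pool:
            for result in pool.map(verify, candidates, chunksize=16):
                if result is not None:
                    return result
        return None

    if __name__ == "__main__":
        print(speculative_search(range(2, 100)))  # 83, since 8051 = 83 * 97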
Most people have 64-bit, multi-GHz CPUs; even if you use just one core, that's a lot of processing power. So I don't think most code needs to be parallel until multiple users, AI, or large data sets show up. And better yet, those are generally the three easiest cases in which to write multi-threaded code.
That said, there is a lot of slow and terrible code out there, but throwing processing power at a terrible algorithm is rarely the best solution.
"Studies that apply it to real code written without parallelism in mind consistently find that 'normal' code simply isn't parallelizable, even in theory, beyond single-digit factors."
There Is No Escape.