In what sense is the dichotomy between CPU and GPU contrived? Those are designed around fundamentally different use cases. For low-power devices you can get a CPU and GPU integrated into a single SoC.
That's a good question. I wish I could answer it succinctly.
For me, the issue is that use cases and power budgets are secondary to the fundamental science of computation. So it's fine to have matrix-processing stuff like OpenGL and TensorFlow, but those should be built on general-purpose hardware, or else we end up with the cookie-cutter solutions we have today. Want to run a giant artificial-life simulation with genetic algorithms? Sorry, you can't do that on a GPU; that kind of branchy, divergent, irregular workload fights the lockstep SIMD model at every step. And it turns out that most of the next-gen stuff I'm interested in just can't be done on a GPU.
There was a lot of progress on transputers and clusters (the old Beowulf-cluster jokes) in the 80s and 90s. But researchers ran into memory and communication latency, and into the serial-fraction limits of Amdahl's law, and began to abandon those approaches after video cards like the 3dfx Voodoo arrived around 1997.
But there are countless other ways to implement concurrency and parallelism. If you think of all the techniques as a galaxy, then GPUs are way out at the very end of one spiral arm. We've been out on that arm for 25 years. And while video games have gotten faster (through enormous effort by millions of people), we've missed out on the low-hanging fruit on the other arms.
For example, code can be auto-parallelized without intrinsics. A compiler can statically analyze a program to find local contexts that don't affect one another, and the instructions in those independent contexts can be spread over many cores internally, much like what already happens in shaders.
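To make that concrete, here's a minimal sketch (in ordinary C++) of the kind of loop I mean: each iteration touches only its own element, so the independence is provable and the work could be fanned out over however many cores exist. Today we still have to opt in with an explicit execution policy; the buffer size and the toy per-pixel math are just my illustration.

    // Each iteration reads and writes only its own element, so static analysis
    // can prove the iterations independent and spread them across every core.
    #include <algorithm>
    #include <cstdio>
    #include <execution>
    #include <vector>

    int main() {
        std::vector<float> pixels(1 << 20, 0.5f);

        // std::execution::par_unseq is the explicit "please parallelize this"
        // request we shouldn't have to write for a loop this obviously local.
        std::for_each(std::execution::par_unseq, pixels.begin(), pixels.end(),
                      [](float& p) { p = p * 0.9f + 0.1f; });  // purely per-element work

        std::printf("first pixel = %f\n", pixels[0]);
        return 0;
    }

Build it with any C++17 toolchain (GCC's parallel algorithms need -ltbb) and the same source runs unchanged whether the machine has 4 cores or 1024.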
But IMHO the greatest travesty of the modern era is that those innovations happened (poorly) in GPUs instead of CPUs. We should be able to go to the system menu, get info on our computer, and see something like 1024+ cores running at 3 GHz. We should be able to use languages like Clojure and Erlang and Go and MATLAB and even C++ that auto-parallelize across that many cores. Then embarrassingly parallel stuff like affine rasterization and blitting would run in a few cycles with ordinary for-loops, instead of needing loops unrolled by hand or whatever other tedium distracts developers from getting real work done. Like, why do we need a completely different paradigm for shaders, outside our usual C/C++/C# workflow, where we can't call system APIs or even touch the memory of our main program directly? That's nonsense.
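For comparison, here is what a blitter looks like as an ordinary for-loop. The function name, the pitch parameters, and the OpenMP pragma are my own sketch rather than any particular API; the point is that every destination pixel is written exactly once from a distinct source pixel, so the rows parallelize trivially without leaving the language.

    #include <cstdint>
    #include <vector>

    // Copy a width x height rectangle from src into dst at (dstX, dstY).
    // Each destination pixel is written exactly once, so every row (and every
    // pixel) is independent of the others: the embarrassingly parallel case.
    void blit(const std::vector<uint32_t>& src, int srcPitch,
              std::vector<uint32_t>& dst, int dstPitch,
              int width, int height, int dstX, int dstY) {
        #pragma omp parallel for  // the hand-written hint we shouldn't need
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                dst[(dstY + y) * dstPitch + (dstX + x)] = src[y * srcPitch + x];
            }
        }
    }

A compiler that could see for itself what the pragma asserts (no two iterations touch the same memory) could do the same thing to plain code on a 1024-core CPU, with no shader language, no driver, and no separate memory space.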
And I don't say that lightly. My words are imperfect, but I do have a computer engineering degree. I know what I'm talking about, down to a very low level. Wherever I look, I just see so much unnecessary effort where humans tailor themselves to match the whims of the hardware, which is an anti-pattern at least as bad as repeating yourself. Unfortunately, the more I talk about this, the more I come off as some kind of crackpot as the world keeps rushing headlong out on the GPU spiral arm without knowing there's no there there at the end of it.
My point is that for all the progress in AI and rendering and simulation, we could have had that 20 years ago for a tiny fraction of the effort with more inspired architecture choices. The complexity and gatekeeping we see today are artifacts of those unfortunate decisions.
I dream of a day when we can devote a paltry few billion transistors on a small $100 CPU to 1000+ cores. Instead we have stuff like the Cerebras CS-2, whose wafer-scale engine packs about 2.6 trillion transistors at a price only institutions can afford, which is cool and everything, but is ultimately gatekeeping that will keep today's Anakin from building C-3PO.