GPUs have been the real source of gaming performance for PCs since the nineties, and this holds even for ARM-based smartphones and tablets, which have had their own dedicated graphics hardware since the first iPhone.
Even the puny Raspberry Pi (ARMv6 at 700 MHz!) is able to stream Full HD video thanks to its dedicated GPU.
A slow CPU can still be a bottleneck if paired with a high-end GPU, but in general a cheap CPU with an expensive GPU is a much better setup for gaming than the other way around.
(Of course there are exceptions, but this is generally true for your average AAA title).
The RasPi is a particularly unusual example here. The BCM2835 has a tiny ARM1176JZF-S taking a piggyback ride on a comparatively huge (and, despite some releases from Broadcom, still only very lightly documented) 2D vector processor, the Broadcom VideoCore IV, and it's the VC4 that actually runs the show and boots the chip (the proprietary firmware uses the commercial ThreadX microkernel, though fully open firmware is being developed). It's really a SoC built around a GPU, and the ARM is, broadly speaking, a coprocessor, posting requests to the VC4's kernel to please do stuff.
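If you're curious what "posting requests" looks like concretely, here's a hedged little sketch: asking the firmware for the SoC temperature through the mailbox property interface, via the /dev/vcio device and the property ioctl the way Broadcom's userland mailbox example does. The 0x00030006 tag and the buffer layout come from the public mailbox property docs; error handling is mostly skipped, and treat the whole thing as an illustration rather than the canonical way to talk to the firmware.

    /* Sketch: query the VC4 firmware for the SoC temperature over the
     * mailbox property interface, assuming the /dev/vcio driver and the
     * _IOWR(100, 0, char *) property ioctl from the userland example. */
    #include <stdio.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>

    #define IOCTL_MBOX_PROPERTY _IOWR(100, 0, char *)

    int main(void)
    {
        /* Property buffer: total size, request code, one tag, end marker. */
        uint32_t msg[8] __attribute__((aligned(16))) = {
            sizeof(msg), /* buffer size in bytes               */
            0,           /* 0 = this is a request              */
            0x00030006,  /* tag: get temperature               */
            8,           /* value buffer size in bytes         */
            0,           /* tag request code                   */
            0,           /* temperature id (in) / id (out)     */
            0,           /* value (out), in thousandths of °C  */
            0            /* end tag                            */
        };

        int fd = open("/dev/vcio", 0);
        if (fd < 0 || ioctl(fd, IOCTL_MBOX_PROPERTY, msg) < 0) {
            perror("mailbox");
            return 1;
        }
        close(fd);
        printf("SoC temperature: %u m°C\n", msg[6]);
        return 0;
    }

The same channel carries everything from "give me a framebuffer" to "decode this video", which is why the ARM side gets away with being so small.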
The video encoder/decoder has a little fixed-function logic, but the VC4 is rather good at vector operations itself, particularly 2D 16x16 and 32x32 ones, and probably has at least as much general compute muscle as the ARM, I'd say? It is not easy to program efficiently, however, and its pipeline doesn't seem to like branches very much. And trying to do ChaCha20 on it is tricky because I can't seem to find a way to address the diagonals...
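For anyone wondering what "the diagonals" are: ChaCha20's state is a 4x4 matrix of 32-bit words, and every double round first works straight down the columns and then across the wrap-around diagonals. A plain C version of the double round (just the standard reference formulation, nothing VC4-specific) shows why a vector unit that only addresses neat rows and columns is awkward here:

    /* Standard ChaCha20 double round over the 16-word (4x4) state. */
    #include <stdio.h>
    #include <stdint.h>

    #define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

    #define QR(a, b, c, d)                          \
        do {                                        \
            a += b; d ^= a; d = ROTL32(d, 16);      \
            c += d; b ^= c; b = ROTL32(b, 12);      \
            a += b; d ^= a; d = ROTL32(d, 8);       \
            c += d; b ^= c; b = ROTL32(b, 7);       \
        } while (0)

    static void chacha_double_round(uint32_t x[16])
    {
        /* Column round: each quarter-round walks one column of the matrix. */
        QR(x[0], x[4], x[ 8], x[12]);
        QR(x[1], x[5], x[ 9], x[13]);
        QR(x[2], x[6], x[10], x[14]);
        QR(x[3], x[7], x[11], x[15]);
        /* Diagonal round: each quarter-round walks a wrap-around diagonal. */
        QR(x[0], x[5], x[10], x[15]);
        QR(x[1], x[6], x[11], x[12]);
        QR(x[2], x[7], x[ 8], x[13]);
        QR(x[3], x[4], x[ 9], x[14]);
    }

    int main(void)
    {
        uint32_t x[16];
        for (int i = 0; i < 16; i++) x[i] = (uint32_t)i; /* dummy state */
        chacha_double_round(x);
        for (int i = 0; i < 16; i++) printf("%08x%c", x[i], i % 4 == 3 ? '\n' : ' ');
        return 0;
    }

The usual SIMD trick is to rotate the rows of the state so the diagonals line up as columns before the second half of the double round; whether the VC4's register file lets you do that cheaply is exactly the open question.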
That's not true anymore. Most of the AI "thinking" can be multithreaded, physics can be multithreaded, rendering can be multithreaded, and even core systems like resource loading are heavily multithreaded. It's not easy, but it's a reality for most gamedev now. However, there is a limit to how much can be multithreaded and how well it'll scale. There are a lot of interdependencies between objects and systems that force some level of serialization, just like in any other multithreaded application.
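To make the serialization point concrete, here's a toy pthreads sketch. The update_ai / update_physics / render_frame functions are hypothetical stand-ins, and a real engine would use a job system rather than spawning threads every frame, but the shape is the same: AI and physics can run in parallel, and the render step still has to wait for both.

    /* Toy frame: two independent systems in parallel, then a forced join. */
    #include <pthread.h>
    #include <stdio.h>

    static void update_ai(void)      { puts("AI tick"); }      /* stand-in work */
    static void update_physics(void) { puts("physics tick"); } /* stand-in work */
    static void render_frame(void)   { puts("render"); }       /* stand-in work */

    static void *ai_job(void *arg)      { (void)arg; update_ai();      return NULL; }
    static void *physics_job(void *arg) { (void)arg; update_physics(); return NULL; }

    int main(void)
    {
        pthread_t ai, physics;

        /* AI and physics have no data dependency on each other this frame,
         * so they can run on separate cores... */
        pthread_create(&ai, NULL, ai_job, NULL);
        pthread_create(&physics, NULL, physics_job, NULL);

        /* ...but rendering reads the results of both, so the frame has to
         * serialize here no matter how many cores are available. */
        pthread_join(ai, NULL);
        pthread_join(physics, NULL);
        render_frame();
        return 0;
    }

That join (or its job-graph equivalent) is where the scaling limit comes from, not from how many systems you can name.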
If you can find a game developer who cares enough to split their game's logic into more than just a "render thread" and a "logic thread", then maybe an 8-core would be useful.
Actually the problem isn't developer laziness, it's just common sense. Most games are GPU and/or bandwidth bound, and CPUs don't factor in beyond a certain point. Furthermore, if you wake up too many cores, Intel CPUs drop their turbo clocks, so if you do have a monolithic render thread, threading everything else is counterproductive.
DICE is one such developer. BF4 scales pretty well across cores, and so should any game that uses the same engine unless crippled artificially. There are other developers as well, but most make console-only games (Killzone, Uncharted, etc. are all heavily multithreaded).
Seriously though, what are you talking about?