As an example, I can look at my own Sonic Visualiser application, largely written 15-18 years ago and entirely CPU-driven. On contemporary Macs, for example, it's now horrible: it feels far slower than it did a decade ago. It just isn't what the hardware expects.
(There may be an element of toolkit-platform impedance and simple poor design on my part - it uses Qt and feels quicker on other platforms - and I don't want to argue the details here, but I think the basic principle that you really want to avoid CPU work in the frame update is sound. Preparing things via CPU on a non-time-critical path should be another matter, however; there's quite a lot of capacity there.)
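To make that split concrete, here is a minimal Python sketch of the general idea, not of how Sonic Visualiser actually works. All the names in it (`PreparedFrames`, `render_slowly`, `worker`) are hypothetical, and the "rendering" is just a grey fill standing in for expensive CPU drawing. The worker thread does the slow part; the frame update only picks up a reference to whatever was finished last.

```python
import threading


class PreparedFrames:
    """Holds the most recently completed frame; rendering happens elsewhere."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self._lock = threading.Lock()
        # Start with a blank RGBA frame so the update path always has something.
        self._front = bytes(width * height * 4)
        self.generation = 0

    def publish(self, pixels):
        """Called from the worker thread once a frame is fully rendered."""
        if len(pixels) != self.width * self.height * 4:
            raise ValueError("unexpected buffer size")
        frozen = bytes(pixels)  # copy outside the lock to keep the lock short
        with self._lock:
            self._front = frozen
            self.generation += 1

    def latest(self):
        """Called from the frame update: a cheap reference grab, no rendering."""
        with self._lock:
            return self.generation, self._front


def render_slowly(width, height, level):
    """Stand-in for expensive CPU drawing (hypothetical): one grey level, opaque."""
    buf = bytearray(width * height * 4)
    for i in range(0, len(buf), 4):
        buf[i] = buf[i + 1] = buf[i + 2] = level
        buf[i + 3] = 255
    return buf


def worker(frames, levels):
    """Non-time-critical path: render each frame, then publish it."""
    for level in levels:
        frames.publish(render_slowly(frames.width, frames.height, level))


frames = PreparedFrames(4, 2)
t = threading.Thread(target=worker, args=(frames, [10, 20, 30]))
t.start()
t.join()

# What a frame update would do: take the newest finished buffer and hand it on.
generation, pixels = frames.latest()
```

In a real application the buffer returned by `latest()` would be handed to whatever the platform offers for uploading pixels, and the frame update would never wait on the renderer; if no new frame is ready, it simply shows the previous one again.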
The “fast path” for macOS exists. I don’t know what is happening in Qt-land, but if you want to throw pixels at the screen really fast, you can do it through Core Animation. You can feed it a buffer of pixel data.
Good to know about. This was also my intuition, that there must still exist ways to blast pixel data quickly, but it's not the "happy path" in modern graphics APIs.
Blasting pixel data from the CPU to the GPU is normally about as "happy path" as it gets. The entire architecture is designed to make that kind of operation super efficient and super fast.
It's the opposite direction (GPU to CPU) which is a pain in the ass. Still fast if you do it correctly, but it's easy to end up with a stall.
Unfortunately, modern macOS renders everything at 5K and downsamples. When you're not using Metal, the rendering can do pretty significant stuff behind your back (like extending bit depth to 16-bit, color lookup to AppleRGB, trimming bit depth again, all with threads!), and it is buggier than before too. This stuff is way more expensive than the CPU rendering itself.