Pure-CPU is a side effect of XFree86 being essentially a "lowest common denominator" platform (and even then some stuff was accelerated) - it comes from the X.Org X11 releases being "base code" for vendors to build their own on.
For comparison, SGI used a wildly different architecture underneath, which heavily impacted performance. It's also why "xsgi" would default to indexed colour and you'd use 24-bit/32-bit visuals only for some windows: the server would use indexed colour visuals on the actual framebuffer to reduce memory bandwidth, and composite everything together in hardware.
It's also why there were "X11 Overlays" for GL - it meant you could easily implement parts of the GUI in X11 and still render on top of direct-rendered visuals that bypassed X11.
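To make the bandwidth and overlay points concrete, here is a toy model in Python. It is only a sketch of the general idea (palette lookup at scan-out, plus an overlay plane with a transparent index), not a description of how SGI's hardware or "xsgi" actually worked. All names (`bytes_per_frame`, `scan_out`, the palettes) are made up for illustration.

```python
def bytes_per_frame(width, height, bits_per_pixel):
    """Memory touched to read one full frame at the given depth."""
    return width * height * bits_per_pixel // 8


def scan_out(main_plane, overlay_plane, palette, overlay_palette, transparent=0):
    """Resolve per-pixel palette indices into RGB tuples.

    An overlay pixel wins unless it holds the transparent index, in which
    case the main-plane pixel shows through. This stands in for the step
    that the comment above says was done in hardware.
    """
    frame = []
    for main_row, overlay_row in zip(main_plane, overlay_plane):
        row = []
        for m, o in zip(main_row, overlay_row):
            row.append(palette[m] if o == transparent else overlay_palette[o])
        frame.append(row)
    return frame


# Hypothetical 2x2 screen: main plane drawn "directly", overlay drawn by the GUI.
palette = {0: (0, 0, 0), 1: (255, 0, 0)}
overlay_palette = {1: (255, 255, 255)}  # index 0 is reserved as transparent
main_plane = [[1, 1], [0, 0]]
overlay_plane = [[0, 1], [0, 0]]
frame = scan_out(main_plane, overlay_plane, palette, overlay_palette)
```

At 1280x1024, `bytes_per_frame` gives 1,310,720 bytes for 8-bit indexed colour versus 3,932,160 bytes for 24-bit, which is the kind of saving being described. The resolution is just an example figure.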
Interesting! I wonder if there are any writeups around with more details on the SGI X architecture?
But I was really thinking about going further, with all or nearly all of the X server executing on the GPU and only CPU shims for input peripherals, networking, etc.
The closest thing to this idea was less X11 and more the promise (never actually fulfilled, to my knowledge) of the NeXTDimension color board for the NeXT Cube, which was supposed to implement the entire graphics stack on the embedded CPU.
X11 could be implemented similarly - essentially sticking a complete standalone "X11 terminal" on an add-in card. Current GPU architectures aren't necessarily a good fit for implementing it directly, but you could combine an embedded OS running on an extra CPU, handling the protocol interactions and I/O, with a possibly more tailored interface to talk to the GPU (compare with AMD's promise of HSA, or the post-360 Xbox and PS4/PS5 architectures, where GPU and CPU share common memory).
A NeWS-style engine (possibly with something simpler than full PostScript) would work great with that, especially if you had properly shared interface libs on it.
We could run an OS on current GPUs, I think, hardware-wise. And we know from history that compilers can make up for various hardware shortcomings on the OS support side. E.g. multi-task scheduling and concurrency can be done at the compiler level.
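A rough illustration of what "scheduling at the compiler level" can look like: cooperative multitasking where every task contains explicit yield points and a plain loop switches between them, with no timer interrupt or hardware preemption involved. This is a minimal sketch using Python generators as a stand-in. The `task` and `run_round_robin` names are hypothetical, and a real compiler would insert the yield points itself rather than have them written by hand.

```python
from collections import deque


def task(name, steps, log):
    """A task that does `steps` units of work, yielding after each one."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # explicit yield point, standing in for one a compiler could insert


def run_round_robin(tasks):
    """Run tasks in turn until all have finished, one step per turn."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)  # resume the task until its next yield point
        except StopIteration:
            continue  # task finished, so do not requeue it
        ready.append(current)


log = []
run_round_robin([task("a", 2, log), task("b", 3, log)])
```

After the run, `log` holds the interleaved steps of both tasks in round-robin order.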