I think your question is partly answered by the cudaraster work[1], which is well over a decade old at this point. They did write a software rasterizer (it ran on CUDA, but is adaptable to other GPU intermediate languages). The details are interesting, but the tl;dr is that it's approximately 2x slower than hardware. To me, that means you could build a GPU out of a fully general-purpose parallel computer, but in practice the space is extremely competitive and nobody will leave that kind of performance on the table.
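For concreteness, the heart of any software rasterizer is a per-pixel coverage test. A toy sketch of the classic edge-function approach (the general technique a GPU-style rasterizer parallelizes across pixels; this is my own illustration, not cudaraster's actual tiled CUDA pipeline) might look like:

```python
# Toy software rasterizer: edge-function coverage test for one triangle.
# Illustrative only -- a real implementation (like cudaraster) runs this
# massively in parallel with tiling, binning, and fixed-point math.

def edge(ax, ay, bx, by, px, py):
    # Signed area of the parallelogram spanned by A->B and A->P.
    # Positive when P lies to the left of edge A->B (CCW winding).
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(tri, width, height):
    # Return the set of pixel coordinates whose centers a CCW triangle covers.
    (ax, ay), (bx, by), (cx, cy) = tri
    covered = set()
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at pixel centers
            if (edge(ax, ay, bx, by, px, py) >= 0 and
                edge(bx, by, cx, cy, px, py) >= 0 and
                edge(cx, cy, ax, ay, px, py) >= 0):
                covered.add((x, y))
    return covered
```

The inner test is three multiply-adds per pixel with no data dependencies between pixels, which is why it maps so naturally onto a wide parallel machine, and why the remaining 2x gap comes down to the scheduling, blending, and memory-traffic work that fixed-function hardware gets almost for free.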
I think this also informs recent academic work on RISC-V based GPU work such as RV64X. These are mostly a lot of generic cores with just a little graphics hardware added. The results are not yet compelling compared with shipping GPUs, but I think it's a promising approach.
There was also the Cell processor used in the PlayStation 3. It was essentially a small CPU cluster on a chip, architecturally very different from GPUs.
Cell was originally supposed to act as both the CPU and GPU on the PS3, but that plan didn’t pan out in the actual hardware and Sony scrambled to include an Nvidia GPU.
[1]: https://research.nvidia.com/publication/high-performance-sof...