I wonder if someday GPUs will be so powerful that we will just program them with small parallelized software renderers. And I don't mean shaders or CUDA, because those are awkward and not easy to use.

Will that ever be the case, though? I don't really know how GPUs are made and wired, or what their constraints are.

GPUs have often had a lot of "fixed" computing capabilities, meaning you have to do things in a certain way to exploit their performance. But with Vulkan being too complex, and chips getting more and more transistors, maybe a day will come when GPUs finally become easier to program at the expense of top performance, a bit like interpreted languages being slower to run but faster to write. In the software industry, it has usually been better to use faster computers and write less-than-optimal programs, because chips are always cheaper than developers.

I guess most game developers would gladly use a GPU at a fraction of its performance if it were just easier to program. Modern GPUs are so fast that it would feel okay to get 5 or 10% of the speed if it meant not having to deal with Vulkan or shaders.




The complexity of Vulkan or shaders has very little to do with the fixed-function parts of the GPU. The complexities of Vulkan are around the realities of talking to a coprocessor over a relatively low-bandwidth, high-latency pipe. It's not unlike, say, gRPC or whatever other remote call API you want. So you end up building command queues so you can send a batch of work all at once instead of dozens or hundreds of small transmissions. And then you need to deal with synchronization between those. And then with memory management of all of that.
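
To make that concrete, here's a minimal Vulkan sketch of the pattern: record a copy into a command buffer, submit the whole batch to a queue in one call, then synchronize with a fence. It assumes the device, queue, command buffer, buffers, and (unsignaled) fence were created elsewhere, and all error handling is omitted:

    #include <vulkan/vulkan.h>
    #include <cstdint>

    void submit_copy(VkDevice device, VkQueue queue, VkCommandBuffer cmd,
                     VkBuffer src, VkBuffer dst, VkDeviceSize size,
                     VkFence fence)
    {
        // Record a batch of work on the CPU side...
        VkCommandBufferBeginInfo begin{};
        begin.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
        begin.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;
        vkBeginCommandBuffer(cmd, &begin);

        VkBufferCopy region{};          // srcOffset = dstOffset = 0
        region.size = size;
        vkCmdCopyBuffer(cmd, src, dst, 1, &region);

        vkEndCommandBuffer(cmd);

        // ...then ship the whole batch across the "pipe" in one submission.
        VkSubmitInfo submit{};
        submit.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
        submit.commandBufferCount = 1;
        submit.pCommandBuffers = &cmd;
        vkQueueSubmit(queue, 1, &submit, fence);

        // Explicit synchronization: block until the GPU signals the fence.
        vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);
    }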

Very little of that goes away as the GPU becomes more programmable / powerful. Rather it gets ever more complicated, as suddenly a texture isn't just a texture anymore. It's now a buffer. And buffers can be used for lots of things. This complexity really could only go away if the GPU & CPU merged into a single unit, which isn't entirely unlike Intel's failed Larrabee as a sibling comment mentioned.

As for shaders, those are just arbitrary programs. The complexity there is entirely the complexity of whatever your renderer does, compounded by the extreme parallelism of a GPU. So this complexity really never goes away.

For your core question, the problem is that fixed-function hardware is just always faster & more efficient than programmable hardware. So as long as games commonly do the same set of easily ASIC-able work (like render triangles), then you really won't ever see that fixed-function unit go away. But something like Unreal Engine 5's Nanite is kinda the productization of the idea of doing triangle rasterization in a "software" renderer instead of the fixed-function parts of the GPU: https://www.unrealengine.com/en-US/blog/understanding-nanite...


Does unified memory as in Apple’s M chips help with reducing the complexity?


Only a little. The bulk of the complexity for large data like textures isn't around the DMA transfer. It's around things like ensuring data is properly aligned, that textures are swizzled (if that format is even documented at all), and that it's actually safe to read or write the buffer (that is, that the GPU isn't still using it). There's also the complexity of actually allocating memory: malloc/free isn't really provided, but rather something like mmap, so you want a (re)allocator on top of that.
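
As a rough illustration of that last point, here's a minimal sketch of a bump suballocator sitting on top of one large driver-provided mapping. The struct name and the fixed 256-byte alignment are assumptions for illustration, not any particular API's real requirements:

    #include <cstddef>
    #include <cstdint>

    struct GpuHeap {
        std::uint8_t* base;     // one large mapping obtained from the driver
        std::size_t   size;
        std::size_t   offset = 0;

        // Bump-allocate a sub-range, padding to the required alignment.
        void* suballoc(std::size_t bytes, std::size_t alignment = 256) {
            std::size_t aligned = (offset + alignment - 1) & ~(alignment - 1);
            if (aligned + bytes > size) return nullptr;   // heap exhausted
            offset = aligned + bytes;
            return base + aligned;
        }

        // Freeing individual blocks isn't supported here; a real allocator
        // would track lifetimes (and whether the GPU is still reading them).
        void reset() { offset = 0; }
    };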

And there's also the complexity that comes from APIs like Vulkan wanting to work on both unified and non-unified systems.

Also unified doesn't necessarily mean coherent, so there's additional complexities there.


Yes, and Apple abstracts over it even on discrete GPUs for better or worse.

https://developer.apple.com/documentation/metal/setting_reso...


Wasn't this the idea behind Intel's Larrabee hardware?

Didn't succeed at the time. Maybe it'll happen one day. Or maybe not.

If Vulkan and co. are too difficult, I personally suspect it's more fruitful to build better abstractions on top of the underlying constraints dictated by the need for massive parallelism, rather than to try to make x86-style programming paradigms fast enough for graphics-type workloads.


I think your question is partly answered by the cudaraster work [1], which is well over a decade old at this point. They basically did write a software rasterizer (it ran on CUDA, but is adaptable to other GPU intermediate languages). The details are interesting, but the tl;dr is that it's approximately 2x slower than hardware. To me, that means you could build a GPU out of a fully general-purpose parallel computer, but in practice the space is extremely competitive and nobody will leave that kind of performance on the table.
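
For a sense of what "software rasterization" means here, below is a scalar C++ sketch of the edge-function coverage test at its core (assuming counter-clockwise winding, with no clipping, binning, or depth handling). A GPU rasterizer like cudaraster distributes the per-pixel loop across many threads:

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct Vec2 { float x, y; };

    // Signed parallelogram area of (a, b, c): >= 0 means c is on or to
    // the left of edge ab (for counter-clockwise triangles).
    static float edge(Vec2 a, Vec2 b, Vec2 c) {
        return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
    }

    void rasterize(Vec2 v0, Vec2 v1, Vec2 v2,
                   std::vector<std::uint32_t>& fb, int width, int height,
                   std::uint32_t color)
    {
        // Bounding box of the triangle, clamped to the framebuffer.
        int x0 = std::max(0, (int)std::min({v0.x, v1.x, v2.x}));
        int y0 = std::max(0, (int)std::min({v0.y, v1.y, v2.y}));
        int x1 = std::min(width - 1,  (int)std::max({v0.x, v1.x, v2.x}));
        int y1 = std::min(height - 1, (int)std::max({v0.y, v1.y, v2.y}));

        // This doubly nested loop is what a GPU version spreads over threads.
        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x) {
                Vec2 p{x + 0.5f, y + 0.5f};
                // Inside iff all three edge functions agree in sign.
                if (edge(v0, v1, p) >= 0 && edge(v1, v2, p) >= 0 &&
                    edge(v2, v0, p) >= 0)
                    fb[y * width + x] = color;
            }
    }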

I think this also informs recent academic work on RISC-V-based GPUs such as RV64X. These are mostly a lot of generic cores with just a little graphics hardware added. The results are not yet compelling compared with shipping GPUs, but I think it's a promising approach.

[1]: https://research.nvidia.com/publication/high-performance-sof...


Also there was the Cell processor used in the PlayStation 3. It was a kind of small CPU cluster on a chip, very different architecturally from GPUs.

Cell was originally supposed to act as both the CPU and GPU on the PS3, but that plan didn’t pan out in the actual hardware and Sony scrambled to include an Nvidia GPU.


CUDA seems like a fine language, although a standardized language would be better. The difficulties of CUDA come from the realities of the hardware, and other languages can't change that. I'd love to be wrong about this though, what specifically do you think could be improved about CUDA (not saying this as a challenge, just an invitation for more conversation)?


One of the reasons CUDA won over OpenCL is that it is a polyglot runtime; you are mixing concepts there.


I've heard Unreal Engine 5's Nanite tech described as a software rasterizer implemented in compute shaders. https://docs.unrealengine.com/5.0/en-US/RenderingFeatures/Na...


I don't see how shaders or CUDA are not what you're talking about. You do have higher-level languages like SAC or Futhark that can target GPUs, but they can essentially do what CUDA can, just with a different lick of paint.


You can already run C++ as a shader language, and the future is some sort of mesh shaders.



