TinyRenderer – how OpenGL works: software rendering in 500 lines of code (github.com/ssloy)
328 points by graderjs on March 15, 2022 | 38 comments



More resources like this if you are interested:

If you want to understand how the GPU driver thinks under the hood, read through https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-...

If you want to see the OpenGL state machine in action, check out https://webglfundamentals.org/webgl/lessons/resources/webgl-...


Well I hope you are happy. Just lost like 3 hours of my time, because I couldn't stop reading.


Another great series from ryg is https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlu...

I'd not come across the state machine site before; it looks pretty useful.


Would you be able to point me in the direction of understanding the difference between a GPU and a CPU, from ELI5 --> [whatever]?


A CPU (generally) uses one powerful core, sequentially going through a large chunk of data, while a GPU has a large number of relatively weak cores (say, hundreds) operating in parallel on small chunks of data. For instance, if you wanted to do something with every pixel on the screen (~2 million), with a CPU, you would step through each pixel in a loop, one by one. With a GPU, you would run a small program (usually called a shader for historical reasons) for each pixel in parallel on all of its cores. Hopefully that helps.
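
A rough sketch of that difference in C++ (the names here, like shade_one_pixel, are just illustrative placeholders): the CPU version walks the framebuffer in a loop, while on a GPU you only write the per-pixel body (in a shading language rather than C++) and the hardware invokes it for every pixel in parallel.

    #include <cstdint>
    #include <vector>

    // CPU style: one core steps through every pixel sequentially.
    void shade_on_cpu(std::vector<uint32_t>& framebuffer, int width, int height) {
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x)
                framebuffer[y * width + x] = 0xFF0000FFu;   // e.g. fill with one color
    }

    // GPU style: you only write the per-pixel body (the "shader"); the hardware
    // runs it for all ~2 million pixels at once across its many cores.
    uint32_t shade_one_pixel(int x, int y) {
        (void)x; (void)y;
        return 0xFF0000FFu;
    }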


Although the GPU vendor would like you to think they're cores, they're more like ALUs or SIMD units since they all run the same instructions.


Now we are moving way out of ELI5 territory but this is a simplification that I think hurts as much as it helps. It's not that a graphics card has thousands of ALUs restricted to always executing the same instruction.

It's more like you have a fairly high number of cores, each of which consists of a large number (typically 32 or 64) of ALUs executing the same instructions in parallel.

This means that while you cannot execute a thousand different instructions in parallel you can in theory run somewhere between tens and hundreds of different instructions each across 32 or 64 different sets of input data.
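
A toy C++ picture of that (the numbers and names are illustrative, not any vendor's real model): each core runs one instruction stream, but every step applies to a whole group of 32 lanes at once. It's also why divergent branches inside a group are expensive: both sides have to be executed, with the inactive lanes masked off.

    #include <array>
    #include <cstddef>

    constexpr std::size_t kLanes = 32;   // one "warp"/"wavefront" worth of ALUs

    // One instruction stream applied to 32 sets of data in lockstep: the loop body
    // is what a single GPU core does per step, across all of its lanes at once.
    void warp_fma(std::array<float, kLanes>& acc,
                  const std::array<float, kLanes>& a,
                  const std::array<float, kLanes>& b) {
        for (std::size_t lane = 0; lane < kLanes; ++lane)   // conceptually simultaneous
            acc[lane] += a[lane] * b[lane];
    }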


A key trade-off is IPC per thread vs area. The CPU needs a high IPC per thread but by achieving this, the area goes disproportionately up and IPC / mm^2 goes down.

The GPU does not need high single-thread performance so it has lots of simpler and smaller cores, and as a result the total IPC goes way up.

And then of course the GPU has dedicated hardware for operations like rasterization and texture interpolation which the CPU lacks. On the other hand, the CPU needs to run an OS so each core has support for virtual memory, interrupts and other types of instructions that are either not needed on the GPU or at least do not need to be supported by each one of the little cores and are handled in a more global manner, making these cores even smaller.


CPU = Lots of branching and complicated operations, small number of cores
GPU = Lots of cores, not much branching and simple operations


Shout out to Neon Helium Productions: https://nehe.gamedev.net/tutorial/lessons_01__05/22004/

Man, it was impossible to find a decent OpenGL tutorial in 2008.


I remember in a later tutorial he writes part of it in asm, seemingly at random, because it's "faster", but it's the same thing a compiler would output, or worse.

https://nehe.gamedev.net/tutorial/playing_avi_files_in_openg...

Of course, now everyone who does SIMD writes it with compiler intrinsics or some code wrapper that generates terrible asm which really is slower, but they think it's "cleaner". (Which on Intel it isn't, because their intrinsics are even less readable than their asm.)
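
For a taste of the readability complaint, here is a scalar loop next to the same operation written with SSE intrinsics (a minimal sketch; the _mm_* names are the standard ones from <immintrin.h>):

    #include <immintrin.h>

    // Plain scalar version: add two float arrays.
    void add_scalar(float* dst, const float* a, const float* b, int n) {
        for (int i = 0; i < n; ++i)
            dst[i] = a[i] + b[i];
    }

    // Same operation with SSE intrinsics, four floats per step
    // (assumes n is a multiple of 4 to keep the sketch short).
    void add_sse(float* dst, const float* a, const float* b, int n) {
        for (int i = 0; i < n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
        }
    }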


I don't know exactly why but I always find SSE code to be really arduous to read.


Wow, that brings back memories.

I used to go through these and other similar sites, which reminded me of Sulaco [1] :(

I remember loving his stuff, especially as I too was involved in quake modding and 3d engines at the time.

RIP.

1: http://www.sulaco.co.za/news_in_loving_memory_of_jan_horn.ht...


The tutorial was "completely rewritten January 2000", seems the first version was posted in 1997!


This brings back memories. I don't think I've looked at these since somewhere before 2010.


This is how I learned OpenGL too :)


I used and extended this tutorial to write the rayvertex [1][2] package, which is a deferred parallel software rasterizer in R. Explicitly building the rendering pipeline really helped me understand the underlying data flow and rasterization techniques, which in turn made me understand GLSL much better. And extending the rasterizer with order-independent transparency, multicore rendering, antialiasing, and other niceties made me appreciate the work that goes into modern renderers.

Highly recommend.

[1] https://www.rayvertex.com/ [2] https://github.com/tylermorganwall/rayvertex
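
Not rayvertex's actual code, but the core rasterization step being described boils down to something like this: loop over a triangle's bounding box and use edge functions / barycentric weights to test coverage and interpolate attributes (put_pixel here is just a placeholder for the framebuffer write).

    #include <algorithm>

    struct Vec2 { float x, y; };

    // Twice the signed area of triangle (a, b, c); the sign tells which side c is on.
    float edge(const Vec2& a, const Vec2& b, const Vec2& c) {
        return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
    }

    // Minimal bounding-box rasterizer.
    template <typename PutPixel>
    void raster_triangle(Vec2 v0, Vec2 v1, Vec2 v2, PutPixel put_pixel) {
        int minx = (int)std::min({v0.x, v1.x, v2.x}), maxx = (int)std::max({v0.x, v1.x, v2.x});
        int miny = (int)std::min({v0.y, v1.y, v2.y}), maxy = (int)std::max({v0.y, v1.y, v2.y});
        float area = edge(v0, v1, v2);
        if (area == 0) return;                         // degenerate triangle
        for (int y = miny; y <= maxy; ++y)
            for (int x = minx; x <= maxx; ++x) {
                Vec2 p{x + 0.5f, y + 0.5f};
                float w0 = edge(v1, v2, p), w1 = edge(v2, v0, p), w2 = edge(v0, v1, p);
                if ((w0 >= 0 && w1 >= 0 && w2 >= 0) || (w0 <= 0 && w1 <= 0 && w2 <= 0))
                    put_pixel(x, y, w0 / area, w1 / area, w2 / area);   // barycentric weights
            }
    }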


I did one in C a while ago if anyone is interested: https://github.com/glouw/gel


This is great. The other side of "how to use OpenGL" is "what is it doing in there?" and this looks like an excellent explanation of that.


This GitHub page brings back NeheGL vibes from ~20 years ago. Very nice, thanks.


I wonder if someday GPUs will be so powerful that we will just program them with small parallelized software renderers. And I don't mean shaders or CUDA, because those are awkward and not easy to use.

Will that be the case, though? I don't really know how GPUs are made and wired, or what their constraints are.

GPUs have often had a lot of "fixed" computing capabilities, meaning you have to do things in a certain way to exploit their performance. But with Vulkan being so complex, and chips getting more and more transistors, maybe a day will come when GPUs finally become easier to program at the expense of top performance, a bit like interpreted languages being slower to run but faster to write. In the software industry it has usually been better to buy faster computers and write less-than-optimal programs, because chips are cheaper than developers.

I guess most game developers would gladly use a GPU at a fraction of its performance if it were just easier to program. Modern GPUs are so fast that it would feel okay to use 5 or 10% of the speed if it meant not having to deal with Vulkan or shaders.


The complexity of Vulkan or shaders has very little to do with the fixed-function parts of the GPU. The complexities of Vulkan are around the realities of talking to a coprocessor over a relatively low-bandwidth, high-latency pipe. It's not unlike, say, gRPC or whatever other remote call API you want. So you end up building command queues so you can send a batch of work all at once instead of dozens or hundreds of small transmissions. And then need to deal with synchronization between those. And then memory management of all that.
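
The batching pattern in a nutshell, as a toy C++ analogy (not real Vulkan types or calls): record cheap CPU-side commands into a buffer, then pay the expensive trip to the GPU once for the whole batch.

    #include <cstdint>
    #include <vector>

    // Toy "command" and "command buffer": recording is just appending cheap structs.
    struct DrawCmd { uint32_t first_vertex, vertex_count; };

    struct CommandBuffer {
        std::vector<DrawCmd> cmds;
        void draw(uint32_t first, uint32_t count) { cmds.push_back({first, count}); }
    };

    // One expensive CPU->GPU round trip for the whole batch, instead of one per draw.
    void submit(const CommandBuffer& cb) {
        // In a real API this is where the driver ships everything to the GPU,
        // plus the fences/semaphores needed before the buffers can be reused.
        (void)cb;
    }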

Very little of that goes away as the GPU becomes more programmable / powerful. Rather it gets ever more complicated, as suddenly a texture isn't just a texture anymore. It's now a buffer. And buffers can be used for lots of things. This complexity really could only go away if the GPU & CPU merged into a single unit, which isn't entirely unlike Intel's failed Larrabee as a sibling comment mentioned.

As for shaders, those are just arbitrary programs. The complexity there is entirely the complexity of whatever your renderer does, compounded by the extreme parallelism of a GPU. So this complexity really never goes away.

For your core question, the problem is that fixed-function hardware is just always faster & more efficient than programmable hardware. So as long as games commonly do the same set of easily ASIC-able work (like render triangles), then you really won't ever see that fixed-function unit go away. But something like Unreal Engine 5's Nanite is kinda the productization of the idea of doing triangle rasterization in a "software" renderer instead of the fixed-function parts of the GPU: https://www.unrealengine.com/en-US/blog/understanding-nanite...


Does unified memory as in Apple’s M chips help with reducing the complexity?


Only a little. The bulk of the complexity for large data like textures isn't around the DMA transfer; it's instead around things like ensuring data is properly aligned, that things like textures are swizzled (if that format is even documented at all), and ensuring it's actually safe to read or write to the buffer (that is, that the GPU isn't still using it). There's also the complexity of actually allocating memory: a malloc/free isn't really provided, rather something like mmap is, so you want a (re)allocator on top of that.

And there's also the complexity that comes from APIs like Vulkan wanting to work on both unified and non-unified systems.

Also unified doesn't necessarily mean coherent, so there's additional complexities there.


Yes, and Apple abstracts over it even on discrete GPUs for better or worse.

https://developer.apple.com/documentation/metal/setting_reso...


Wasn't this the idea behind Intel's Larrabee hardware?

Didn't succeed at the time. Maybe it'll happen one day. Or maybe not.

If Vulkan and co. are too difficult, I personally suspect it's more fruitful to build better abstractions on top of the underlying constraints dictated by the need for massive parallelism, rather than trying to make x86-style programming paradigms fast enough for graphics-type workloads.


I think your question is partly answered by the cudaraster work [1], which is well over a decade old at this point. They basically did write a software rasterizer (that ran on CUDA, but is adaptable to other GPU intermediate languages). The details are interesting, but the tl;dr is that it's approximately 2x slower than hardware. To me, that means you could build a GPU out of a fully general-purpose parallel computer, but in practice the space is extremely competitive and nobody will leave that kind of performance on the table.

I think this also informs recent academic work on RISC-V based GPUs such as RV64X. These are mostly a lot of generic cores with just a little graphics hardware added. The results are not yet compelling compared with shipping GPUs, but I think it's a promising approach.

[1]: https://research.nvidia.com/publication/high-performance-sof...


Also there was the Cell processor used in the PlayStation 3. It was a kind of small CPU cluster on a chip, very different architecturally from GPUs.

Cell was originally supposed to act as both the CPU and GPU on the PS3, but that plan didn’t pan out in the actual hardware and Sony scrambled to include an Nvidia GPU.


CUDA seems like a fine language, although a standardized language would be better. The difficulties of CUDA come from the realities of the hardware, and other languages can't change that. I'd love to be wrong about this though, what specifically do you think could be improved about CUDA (not saying this as a challenge, just an invitation for more conversation)?


One of the reasons CUDA won over OpenCL is that it is a polyglot runtime; you are mixing concepts there.


I've heard Unreal Engine 5's Nanite tech described as a software rasterizer implemented in compute shaders. https://docs.unrealengine.com/5.0/en-US/RenderingFeatures/Na...


I don't see how shaders or CUDA are not what you're talking about. You do have higher-level languages like SAC or Futhark that can target GPUs, but they essentially do what CUDA can, just with a different lick of paint.


You can already run C++ as a shader language, and the future is some sort of mesh shaders.


Very nice. After a quick skim of the wiki, I really like the contents and step-by-step approach of all the materials shared.

Does anyone here know of similarly good-quality OpenGL resources[1] for mathematical/scientific visualization[2]? Currently I've started going through the source code and documentation of manim-web[3] and CindyJS[4] to learn the basics; however, I would love to learn the fundamentals and be able to write my own visualization library (or confidently extend an existing one) if/when needed. Thanks in advance.

---

[1] - Personally I prefer books, but I don't mind other high-quality resources similar to the one OP has shared here.

[2] - https://github.com/topics/scientific-visualization

[3] - https://github.com/manim-web/manim-web

[4] - https://cindyjs.org/gallery/main/


Quote from his Bresenham algo: "It is definitely inefficient (multiple divisions, and the like), but it is short and readable."
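
For context, the readable-but-inefficient style that quote describes looks roughly like this (a sketch in the spirit of the tutorial, not its exact code; set_pixel stands in for the framebuffer write, and Bresenham's contribution is getting the same result with only integer additions):

    #include <algorithm>
    #include <cmath>
    #include <cstdlib>

    // Naive line drawing: one floating-point division (plus rounding) per pixel.
    template <typename SetPixel>
    void line_naive(int x0, int y0, int x1, int y1, SetPixel set_pixel) {
        int steps = std::max(std::abs(x1 - x0), std::abs(y1 - y0));
        for (int i = 0; i <= steps; ++i) {
            float t = steps ? (float)i / steps : 0.f;   // the per-pixel division
            set_pixel((int)std::round(x0 + (x1 - x0) * t),
                      (int)std::round(y0 + (y1 - y0) * t));
        }
    }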

Well, this wasn't the case for all my Amiga 500 optimization efforts back in the early '90s. Mix that with machine language and you simply hate yourself a couple of weeks/months/years/decades later, when you no longer understand what you were doing at the time of writing the code, even though you commented it.

While I really enjoyed pushing the technical limits for fun and visual effects, too much trickery killed understanding and I am glad to use compilers nowadays. It is like plain text vs encrypted text going through my old source code. Very hard to read, very hard to understand.


Pretty cool. I did something similar as an undergraduate: wrote implementations of OpenGL and created a modeler and renderer. I got a bit stuck with the maths for UV mappings and ray tracing and would love to see some of this if I ever decide to rework my old project.


Very nice learning material, thanks for sharing.


Wow, this looks fantastic :) Very inspiring!



