Very cool! I'm a bit curious about where the performance losses are. I just made a CPU path tracer for a graphics class and was able to render similar scenes in much less time. This is running on a GPU so it should be much faster, but isn't.
I've written some CUDA code and GPU assembly for unrelated projects, but it was designed to avoid warp divergence, and I don't know how you would go about doing that with a ray tracer.
It's slow because it's the wrong way to use a GPU: you're asking every pixel on the screen to consider the entire scene. It's fun to see beautiful images created by a few nested functions, but if you want perf, it's not how you make <insert favorite beautiful AAA game>.
I find this to be an issue. Inexperienced devs see the pretty pictures on shadertoy and get misled as to what they're really learning. They're learning how to solve a puzzle: "how do I draw pretty pictures given a single function whose only input is the position of the pixel?" That can be great fun to challenge yourself with. It's similar to writing something in brainfuck or making an interesting dwitter doodle, but it's not remotely a "best practice".
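For anyone who hasn't tried it: that single function really is the whole interface. This is essentially shadertoy's default new-shader template; the pixel position fragCoord plus a handful of built-in uniforms like iTime and iResolution are all you get:

```glsl
// The entire shadertoy contract: one function, run once per pixel.
void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
    vec2 uv = fragCoord / iResolution.xy;   // normalized pixel coordinates
    // Time-varying color computed purely from pixel position and time.
    vec3 col = 0.5 + 0.5 * cos(iTime + uv.xyx + vec3(0.0, 2.0, 4.0));
    fragColor = vec4(col, 1.0);
}
```

No geometry, no vertex stage, no scene graph; everything else has to be conjured out of math on fragCoord.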
Don't overgeneralize, though. Yes, using the traditional pipeline is indeed way more efficient when you're creating traditional scenes.
However, this technique of doing everything in a fragment shader can actually make sense for some types of unusual/experimental scenes: infinite repetition, fractal detail, smooth liquid-looking geometry with constantly morphing topology, detailed translucent volume rendering. These can be considerably easier to create in a fragment shader than with meshes and the traditional pipeline.
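For instance, infinite repetition is nearly free in a raymarched fragment shader: fold space with mod() before evaluating a single object's distance field, and one object becomes endless copies. A minimal sketch (my own toy code, not any shader from this thread):

```glsl
// Distance from point p to a sphere of radius r at the origin.
float sdSphere(vec3 p, float r) { return length(p) - r; }

// Infinite grid of spheres: fold space into one cell with mod(), then
// evaluate a single sphere. Try building this grid out of triangles.
float map(vec3 p)
{
    const vec3 cell = vec3(4.0);                    // grid spacing
    vec3 q = mod(p + 0.5 * cell, cell) - 0.5 * cell;
    return sdSphere(q, 1.0);
}

// Standard sphere tracing: step along the ray by the distance bound.
float march(vec3 ro, vec3 rd)
{
    float t = 0.0;
    for (int i = 0; i < 128; i++) {
        float d = map(ro + t * rd);
        if (d < 0.001 || t > 100.0) break;
        t += d;
    }
    return t;
}
```

Swapping the scene from "one sphere" to "infinitely many" cost two lines; doing the equivalent with instanced meshes is a very different amount of work.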
So, while some people are indeed doing this as an esoteric programming exercise, others realize that this technique allows exploring different kinds of realtime generated imagery.
Heya ladberg:
* Were you doing direct light sampling (next event estimation)? If so, that would explain it. This shader doesn't, because it's meant to be the most naive path tracing you can imagine, which keeps it easy to understand but makes it converge much more slowly (see the sketch after these bullets).
* Or it might be that shadertoy is vsync limited to 60 fps, which means each pixel only gets 60 samples per second; unthrottled, it could be getting ~64x as many samples before I'd see any slowdown.
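To illustrate the first bullet: the difference is whether you wait for a bounce ray to hit the light by luck, or explicitly fire a shadow ray at the light on every bounce. Here's a sketch in shadertoy-style GLSL with a made-up two-sphere scene (toy code of my own, not the posted shader; the direct-light term uses a crude aim-at-the-center approximation):

```glsl
// Toy scene: one emissive sphere (the light) and one diffuse ball.
const vec3  LIGHT_POS  = vec3(0.0, 3.0, 0.0);
const float LIGHT_R    = 0.5;
const vec3  LIGHT_EMIT = vec3(25.0);
const vec3  BALL_POS   = vec3(0.0);
const float BALL_R     = 1.0;
const vec3  BALL_ALB   = vec3(0.7);
const float PI         = 3.14159265;

// Ray/sphere intersection; returns hit distance or -1.0 on a miss.
float sphereHit(vec3 ro, vec3 rd, vec3 c, float r)
{
    vec3 oc = ro - c;
    float b = dot(oc, rd);
    float h = b * b - dot(oc, oc) + r * r;
    if (h < 0.0) return -1.0;
    float t = -b - sqrt(h);
    return t > 0.001 ? t : -1.0;
}

// Small hash-based RNG; any decent per-pixel RNG works here.
float rand(inout uint s)
{
    s = s * 747796405u + 2891336453u;
    uint w = ((s >> ((s >> 28u) + 4u)) ^ s) * 277803737u;
    return float((w >> 22u) ^ w) / 4294967295.0;
}

vec3 randomUnitVector(inout uint s)
{
    float z = 2.0 * rand(s) - 1.0;
    float a = 2.0 * PI * rand(s);
    float r = sqrt(1.0 - z * z);
    return vec3(r * cos(a), r * sin(a), z);
}

// Naive path tracing: light is only picked up when a bounce ray happens
// to hit the emissive sphere. Simple, unbiased, but high variance.
vec3 radianceNaive(vec3 ro, vec3 rd, inout uint s)
{
    vec3 thr = vec3(1.0);
    for (int i = 0; i < 4; i++) {
        float tL = sphereHit(ro, rd, LIGHT_POS, LIGHT_R);
        float tB = sphereHit(ro, rd, BALL_POS, BALL_R);
        if (tL > 0.0 && (tB < 0.0 || tL < tB)) return thr * LIGHT_EMIT;
        if (tB < 0.0) break;                      // escaped the scene
        vec3 p = ro + tB * rd, n = normalize(p - BALL_POS);
        thr *= BALL_ALB;    // cosine-weighted bounce: BRDF*cos/pdf = albedo
        ro = p + n * 0.001;
        rd = normalize(n + randomUnitVector(s));
    }
    return vec3(0.0);                             // never reached the light
}

// Next event estimation: at every bounce, also cast one shadow ray
// straight at the light and add its contribution explicitly. When a
// bounce ray then hits the light by chance, its emission is NOT added
// again (that would double-count). Far lower variance for small lights.
vec3 radianceNEE(vec3 ro, vec3 rd, inout uint s)
{
    vec3 acc = vec3(0.0), thr = vec3(1.0);
    bool cameraRay = true;    // only the camera ray sees the light directly
    for (int i = 0; i < 4; i++) {
        float tL = sphereHit(ro, rd, LIGHT_POS, LIGHT_R);
        float tB = sphereHit(ro, rd, BALL_POS, BALL_R);
        if (tL > 0.0 && (tB < 0.0 || tL < tB))
            return cameraRay ? acc + thr * LIGHT_EMIT : acc;
        if (tB < 0.0) break;
        vec3 p = ro + tB * rd, n = normalize(p - BALL_POS);

        // Direct light: aim at the light's center and weight by the solid
        // angle it subtends (crude approximation, fine for illustration).
        vec3 toL = LIGHT_POS - p;
        float d2 = dot(toL, toL);
        vec3 ldir = toL / sqrt(d2);
        if (sphereHit(p + n * 0.001, ldir, BALL_POS, BALL_R) < 0.0) {
            float cosMax = sqrt(max(0.0, 1.0 - LIGHT_R * LIGHT_R / d2));
            float omega  = 2.0 * PI * (1.0 - cosMax);
            acc += thr * (BALL_ALB / PI) * LIGHT_EMIT
                 * max(0.0, dot(n, ldir)) * omega;
        }

        thr *= BALL_ALB;
        ro = p + n * 0.001;
        rd = normalize(n + randomUnitVector(s));
        cameraRay = false;
    }
    return acc;
}
```

You'd call one of these per pixel from mainImage with a per-pixel RNG seed and accumulate over frames. Both estimators converge to the same image; NEE just gets there far sooner when the light is small, because most hemisphere samples in the naive version miss a 0.5-radius light entirely. And at a vsync-capped 60 fps, that's only 60 of these samples per pixel per second, which ties into the second bullet.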