
At any reasonable quality, AI is even more expensive than raytracing. A simple intuition for this is that you can easily run a raytracer on consumer hardware, even if at low FPS, whereas you need a beefy setup to run most AI models, and they still take a while.



While some very large models may need beefy hardware, there are multiple forms of deep learning used for similar purposes:

Nvidia's DLSS is a neural network that upscales images so that games may be rendered quickly at lower resolutions, then upscaled to the display resolution in less total time than rendering natively at the display resolution (see the back-of-envelope sketch after this list).

Nvidia's DLDSR uses a neural network to downscale a greater-than-native-resolution image, reaching quality comparable to DSR from a lower internal resolution, and therefore at lower cost.

Nvidia's RTX HDR is a post-processing filter that takes an sRGB image and converts it to HDR.

So a model that converts rasterized images into raytraced-looking versions is very likely possible, and fast. The most likely roadblock is the lack of a quality dataset for training such a model. Not all games have ray tracing, and even fewer have quality implementations.
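
For the DLSS case, here's a back-of-envelope sketch of why rendering low and upscaling can win. Every number below is made up purely for illustration; real frame times vary by game and GPU:

    # Shading cost scales roughly with pixel count, while the NN upscale
    # is a (mostly) fixed per-frame cost. All numbers are hypothetical.
    native_4k_ms = 16.0                  # hypothetical native 4K frame time
    render_1080p_ms = native_4k_ms / 4   # 1080p has a quarter of the pixels
    upscale_ms = 1.5                     # hypothetical fixed upscale pass
    print(render_1080p_ms + upscale_ms, "<", native_4k_ms)  # 5.5 < 16.0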


To be clear, DLSS is a very different beast from your typical AI upscaler. It uses the principle of temporal reuse, where real samples from previous frames are combined with samples from the current frame in order to converge towards a higher resolution over time. It's not guessing new samples out of thin air, just guessing whether old samples are still usable, which is why DLSS is so fast and accurate compared to general-purpose AI upscalers, and why you can't use DLSS on images or videos.


To add to this, DLSS 2 functions exactly the same way a non-ML temporal upscaler does: it blends pixels from the previous frame with pixels from the current frame.

The ML part of DLSS is that the blend weights are determined by a neural net, rather than handwritten heuristics.
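
A minimal sketch of that blend in numpy. This is just an illustration of the idea, not Nvidia's actual pipeline; predict_weights stands in for the neural net (or, in plain TAA, for a handwritten heuristic):

    import numpy as np

    def temporal_upscale(current, history, motion_vectors, predict_weights):
        # Reproject: look up where each pixel was in the previous frame.
        h, w, _ = current.shape
        ys, xs = np.mgrid[0:h, 0:w]
        src_y = np.clip(ys - motion_vectors[..., 1], 0, h - 1).astype(int)
        src_x = np.clip(xs - motion_vectors[..., 0], 0, w - 1).astype(int)
        reprojected = history[src_y, src_x]

        # The ML part: a per-pixel blend weight in [0, 1] deciding how
        # much the old sample can still be trusted.
        alpha = predict_weights(current, reprojected)  # shape (h, w, 1)
        return alpha * current + (1.0 - alpha) * reprojected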

DLSS 1 _did_ try to use neural networks to predict the new (upscaled) pixels outright, which went really poorly for a variety of reasons I don't feel like getting into, hence why they abandoned that approach.


> So a model that converts rasterized images into raytraced-looking versions is very likely possible, and fast.

How would this even work and not just be a DLSS derivative?

The magic of ray tracing is the ability to render lighting and reflections from objects that aren't visible on screen. So where is the information coming from that the algorithm would use to place and draw the lights, shadows, reflections, etc.?

I'm not asking to be snarky. I can usually "get there from here" when it comes to theoretical technology, but I can't work out how a raster image would contain enough data to apply accurate ray tracing for objects whose effects only show up because of ray tracing.


I'm not convinced. We have "hyper" and "lightning" diffusion models that run in 1-4 steps and are pretty quick on consumer hardware. I really have no idea which of the two would be quicker given some optimizations and hardware tailored to the use case.
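
For a sense of what that looks like today, here's a few-step img2img pass with the diffusers library. The model choice, prompt, and file name are just placeholder examples, and the timing says nothing about a tuned game-engine integration:

    import time
    import torch
    from diffusers import AutoPipelineForImage2Image
    from diffusers.utils import load_image

    pipe = AutoPipelineForImage2Image.from_pretrained(
        "stabilityai/sdxl-turbo", torch_dtype=torch.float16
    ).to("cuda")

    frame = load_image("rasterized_frame.png").resize((512, 512))

    start = time.time()
    out = pipe(
        prompt="photorealistic raytraced lighting and reflections",
        image=frame,
        num_inference_steps=2,  # few-step "turbo" sampling
        strength=0.5,           # keep most of the input frame
        guidance_scale=0.0,     # turbo-style models skip CFG
    ).images[0]
    print(f"{time.time() - start:.2f}s per frame")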


The hard part is keeping everything coherent over time in a dynamic scene with a dynamic camera. Hallucinating vaguely plausible lighting may be adequate for a still image, but not so much in a game if you hallucinate shadows or reflections of off-screen objects that aren't really there, or "forget" that off-screen objects exist, or invent light sources that make no sense in context.

The main benefit of raytracing in games is that it has accurate global knowledge of the scene beyond what's directly in front of the camera, as opposed to earlier approximations which tried to work with only what the camera sees. Img2img diffusion is the ultimate form of the latter approach in that it tries to infer everything from what the camera sees, and guesses the rest.
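
A toy illustration of the difference, with a made-up 2D "scene" (everything here is simplified to make the point):

    import numpy as np

    # Two circles in world space; the rasterized frame only captured the
    # first one. The second exists but is off-screen.
    SCENE = [(np.array([0.5, 2.0]), 0.3),    # visible
             (np.array([-3.0, 1.0]), 0.5)]   # off-screen

    def ray_query(origin, direction, scene):
        # World-space test against the full scene, visible or not
        # (simplified: ignores whether the hit is in front of the ray).
        for center, r in scene:
            oc = origin - center
            b = 2 * np.dot(oc, direction)
            c = np.dot(oc, oc) - r * r
            if b * b - 4 * c >= 0:
                return center
        return None

    def screen_query(x, frame):
        # Screen-space test: only what the camera captured exists.
        return frame.get(x)

    frame = {0.5: "circle A"}  # what the rasterizer saw
    print(ray_query(np.array([0.0, 1.0]), np.array([-1.0, 0.0]), SCENE))
    # finds the off-screen circle
    print(screen_query(-3.0, frame))  # None: no data to reflect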


Right, but I'm not actually suggesting we use diffusion. At least, not the same models we're using now. We need to incorporate at least a few sample rays so that it 'knows' what's actually off-screen, and then we give it lots of paired training data of partially rendered and fully rendered images so that it learns how to fill in the gaps. It shouldn't hallucinate very much if we do that. I don't know how to solve for temporal coherence though -- I guess we might want to train on videos instead of still images.
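
A minimal PyTorch sketch of that training setup, assuming such a paired dataset existed. The tiny conv stack and random tensors are placeholders for a real denoising network and real render pairs:

    import torch
    import torch.nn as nn

    model = nn.Sequential(  # stand-in for a real U-Net-style network
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 3, 3, padding=1),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)

    # Placeholder pairs of (partial low-sample render, converged render).
    pairs = [(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
             for _ in range(8)]

    for partial, converged in pairs:
        loss = nn.functional.l1_loss(model(partial), converged)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # At runtime you'd run one forward pass per frame:
    filled_in = model(pairs[0][0])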

Also, that new Google paper where it generates entire games from a single image has up to 60 seconds of 'memory', I think they said, so I don't think the "forgetting" is actually that big of a problem, since we can refresh the memory with a properly rendered image at least that often.

I'm just spitballing here though; I think Unreal 5.4 or 5.5 has already put something like this into practice with its new lighting system.


> We need to incorporate at least a few sample rays so that it 'knows' what's actually off-screen, and then we give it lots of paired training data of partially rendered and fully rendered images so that it learns how to fill in the gaps.

That's already a thing: there are ML-driven denoisers that take a rough raytraced image and do their best to infer what the fully converged image would look like based on their training data. For example, in the offline rendering world there are Nvidia's OptiX denoiser and Intel's OIDN, and in the realtime world there's Nvidia's DLSS Ray Reconstruction, which uses an ML model to do both upscaling and denoising at the same time.

https://developer.nvidia.com/optix-denoiser

https://www.openimagedenoise.org


Yeah, but that has something to do with

1) commercial hardware pipelines having been improved for decades at handling 3D polygons, and

2) graphical AI models being trained to understand natural language in addition to rendering.

I can imagine a new breed of specialized generative graphical AI that entirely skips language and is trained on stock 3D objects as input, which could potentially perform much better.



