I think figuring out the fastest version of a shader at runtime is very non-triv...

hansvm · 2025-02-09T20:01:14 1739131274

The last time I did anything like this (it was for CPU linear algebra code designed to run in very heterogeneous clusters), I first came up with a parameterization that approximated how I'd expect an algorithm to perform. Then, once for each hardware combination, you sweep through the possible parameterization space. I used log-scaled quantization to make it cheap to index into an array of function pointers based on input specifics.

The important thing to note is that you can do that computation just once, like when you install the game, and it isn't that slow. Your parameterization won't be perfect, but it's not bad to create routines that are much faster than any one implementation on nearly every architecture.

ijustlovemath · 2025-02-10T13:11:36 1739193096

you'd only have to test worst/median case scenes, which you could find with a bit of profiling!