Funnily enough, this is sort of what the NVIDIA drivers do: they intercept game ...

Cieric · 2025-02-09T15:31:43 1739115103

I don't work on the nvidia side of things but it's likely to be the same. Shader replacement is only one of a whole host of things we can do to make games run faster. It's actually kind of rare for use to do them since it boats the size of the driver so much. A lot of our options do change how shaders work though, like forcing a shader to use double precision floats instead of the single it was compiled with.

chrisjj · 2025-02-09T16:36:14 1739118974

> > A lot of our options do change how shaders work though, like forcing a shader to use double precision floats instead of the single it was compiled with.

That will break code sufficienly reliant on the behaviour of sungle precision, though.

Cieric · 2025-02-10T00:43:18 1739148198

In the case that does happen, then we don't apply that setting. Most of the changes applied are extensively tested and toggles like that are more often used for already broken shaders.

chrisjj · 2025-02-11T21:43:00 1739310180

> In the case that does happen

You meant "in the case we find where that does happen"?

Little consolation for the cases you don't find because e.g. you cannot afford to play a game to the end.

Cieric · 2025-02-13T15:50:25 1739461825

Fair enough, I can't say anything I've done has ever caused an issue like that (A new ticket would have been made and sent to me.) But I also can't say that it has never happened, so I'm not really in a position to disagree. We do have a good QA team though and we have an "open" beta program that also catches a lot of issues before they become more widely public.

chrisjj · 2025-02-13T18:59:55 1739473195

I for one would be most unhappy to find my game failing on card X due to drivers perverting my float type.

david-gpu · 2025-02-09T19:35:16 1739129716

Obviously, which is the reason you don't do something like that without appropriate amounts of testing.

chrisjj · 2025-02-10T00:16:18 1739146578

Testing on platforms that perverts the precision is outside appropriate, I would say.

Dylan16807 · 2025-02-09T23:18:14 1739143094

Code depending on specific low precision is very likely to be so fragile it won't make it anywhere near release.

account42 · 2025-02-10T10:09:18 1739182158

"Fragile shader code won't make it anywhwere near release" is a pretty bold claim.

If it happens to work with the Nvidia driver it's getting shipped.

Dylan16807 · 2025-02-10T17:24:05 1739208245

But since the premise is that any difference breaks the code, will it work the exact same way across different devices? That's what I'm skeptical of.

Unless every dev and tester that touches this shader is using the same hardware, which seems like an obvious mistake to avoid...

Cieric · 2025-02-10T18:39:36 1739212776

I will note, half of the customer facing bugs I get are "works on nvidia." Only to find out that it is a problem with the game and not the driver. Nvidia allows you to ignore a lot of the spec and it causes game devs to miss a lot of obvious bugs. A few examples:

1) Nvidia allows you to write to read only textures, game devs will forget to transition them to writable and will appear as corruption on other cards.

2) Nvidia automatically work with diverging texture reads, so devs will forget to mark them as a nonuniform resource index, which shows up as corruption on other cards.

3) Floating point calculations aren't IEEE compliant, one bug I fixed was x/width*width != x, On Nvidia this ends up a little higher and on our cards a little lower. The game this happened on ended up flooring that value and doing a texture read, which as you can guess, showed up as corruption on our cards.

1 and 2 are specifically required by the microsoft directx 12 spec, but most game devs aren't reading that and bugs creep in. 3 is a difference in how the ALU is designed, our cards being a little closer to IEEE compliant. A lot of these issue are related to how the hardware works, so stays pretty consistent between the different gpus of a manufacturer.

Side note: I don't blame the devs for #3, the corruption was super minor and the full calculation was spread across multiple functions (assumed by reading the dxil). The only reason it sticks out in my brain though is because the game devs were legally unable to ever update the game again, so I had to fix it driver side. That game was also Nvidia sponsored, so it's likely our cards weren't tested till very late into the development. (I got the ticket a week before the game was to release.) That is all I'm willing to say on that, I don't want to get myself in trouble.

Cieric · 2025-02-11T16:27:17 1739291237

> Floating point calculations aren't IEEE compliant

To late to edit, but I want to half retract this statement, they are IEEE compliant, but due to optimizations that can be applied by the driver developers they aren't guaranteed to be. This is assuming that the accuracy of a multiply and divide are specified in the IEEE floating point spec, I'm seeing hints that it is, but I can't find anything concrete.

chrisjj · 2025-02-11T21:48:13 1739310493

> the game devs were legally unable to ever update the game again, so I had to fix it driver side.

Could they not intercept the calls to inject a fix?

Cieric · 2025-02-13T15:52:45 1739461965

I'm just going off what I was told there, I was forced to make the fix since the game developers no longer were partnered to the company that owned the license to the content.

chrisjj · 2025-02-11T22:30:01 1739313001

PS Why is /width*width not eliminated on compilation?

Cieric · 2025-02-13T15:56:35 1739462195

Good question, I'm assuming it's due to the calculation happen across a memory barrier of some kind or due to all the branches in between so llvm is probably avoiding the optimization. It was quite a while ago so it is something I could re-investigate and actually try and fix. I would have to wait for downtime with all the other tickets I'm getting though. It's also just something that dxc itself should be doing, but I have no control over that.

chrisjj · 2025-02-11T21:45:30 1739310330

> But since the premise is that any difference breaks the code,

Not any.

> will it work the exact same way across different devices

Yes, where they run IEEE FP format.

Dylan16807 · 2025-02-12T00:43:58 1739321038

> Not any.

Just about any. It's pretty difficult to write code where changing the rounding of the last couple bits breaks it (as happens if you use wider types during the calculation), but other changes don't break it.

What real code have you seen with that behavior?

chrisjj · 2025-02-14T13:31:18 1739539878

> but other changes don't break it.

Why should that be a requirement? Obviously the driver can make other changes that can break any code.

Dylan16807 · 2025-02-14T19:20:49 1739560849

> Why should that be a requirement?

Originally I said "the premise is that any difference breaks the code".

You replied with "Not any."

That is where the requirement comes from, your own words. This is your scenario, and you said not all differences would break the hypothetical code.

This is your choice. Are we talking about code where any change breaks it (like a seeded/reproducible RNG), or are we talking about code where there are minor changes that don't break it but using extra precision breaks it? (I expect this category to be super duper rare)

chrisjj · 2025-02-16T11:31:14 1739705474

> Originally I said "the premise is that any difference breaks the code".

> You replied with "Not any."

> That is where the requirement comes from, your own words.

I wasn't stating a requirement. I was disputing your report of the premise.

> or are we talking about code where there are minor changes that don't break it but using extra precision breaks it?

Yup.

Dylan16807 · 2025-02-19T03:04:40 1739934280

> > or are we talking about code where there are minor changes that don't break it but using extra precision breaks it?

> Yup

Then I will ask again, have you ever seen code that falls into this category? I haven't. And an RNG would not fall into this category.

chrisjj · 2025-02-10T00:14:39 1739146479

Consider an RNG.

Dylan16807 · 2025-02-10T01:32:24 1739151144

I consider a floating point RNG in a shader that is seeded for reproducibility to be a bad idea.

chrisjj · 2025-02-11T21:50:59 1739310659

Why, pray?

Dylan16807 · 2025-02-12T00:50:22 1739321422

GPUs often have weird sizes for optimal behavior and you have to depend on math-based optimizations never being applied to your code.

In my opinion, floating point shaders should be treated as a land of approximations.

You asked in another comment why /width*width isn't optimized out by the compiler. But it's changes just like that that will break an RNG!

chrisjj · 2025-02-14T13:41:39 1739540499

> In my opinion, floating point shaders should be treated as a land of approximations.

Fine, but that leaves you responsible for the breakage the shader of an author that holds the opposite opinion, as he is entitled to do. Precision =/= accuracy.

Dylan16807 · 2025-02-14T19:10:40 1739560240

But do you want /width*width to be optimized? Or associative and commutative operations in general? Then you have to reject the opposite opinion.

chrisjj · 2025-02-16T21:13:57 1739740437

> But do you want /width*width to be optimized?

I do. But I don't see such optimisation as anything to with your changing float type.

Dylan16807 · 2025-02-19T03:08:36 1739934516

Changes like that will break just as much code as adding extra precision will. Because it will change how things round, and not much else, just like adding extra precision. They're both slightly disruptive, and they tend to disrupt the same kind of thing, unlike removing precision which is very disruptive all over.

SpaghettiCthulu · 2025-02-09T16:11:21 1739117481

> A lot of our options do change how shaders work though, like forcing a shader to use double precision floats instead of the single it was compiled with.

What benefit would that give? Is double precision faster than single on modern hardware?

Cieric · 2025-02-10T00:41:15 1739148075

That's specifically because gpus aren't IEEE compliant, and calculations will drift differently on different gpus. Double precision can help avoid divide by zero errors in some shaders because most don't guard against that and NANs propagate easily and show up as visual corruption.

dahart · 2025-02-10T19:11:02 1739214662

What GPUs aren’t IEEE compliant? I don’t think that’s been true for more than a decade.

Cieric · 2025-02-11T16:23:43 1739291023

After a bunch of testing and looking around I think I should actually change my statement. GPUs do offer IEEE floating point compliance by default, but don't strictly adhere to it. Multiple optimizations that can be applied by the driver developers can massively effect the floating point accuracy.

This is all kind of on the assumption that the accuracy of floating point multiplication and division is in the IEEE spec, I was told before that it was but searching now I can't seem to find it one way or the other.

I believe one of the optimizations done by nvidia is to drop f32 variables down to f16 in a shader. Which would technically break the accuracy requirement (as before if it exists). I don't have anything I can offer as proof of that due to NDA sadly though. I will note that most of my testing and work is done in PIX for Windows, and most don't have anti-cheat so they're easy to capture.

dahart · 2025-02-12T01:27:48 1739323668

What shaders (presumably GLSL & HLSL) do precision wise isn’t an IEEE compliance issue, it’s either a DX/Vulkan spec issue, OR a user compiler settings issue. Dropping compliance is and should be allowed when the code asks for it. This is why GLSL has lowp, mediump, and highp settings. I think all GPUs are IEEE compliant and have been for a long time.

Cieric · 2025-02-13T16:10:10 1739463010

I agree on the dropping compliance when asked for aspect, the problem I'm referring to more is the driver dropping compliance without the game asking for it. If the underlying system can randomly drop compliance when ever it thinks it's fine without telling the user and without the user asking, I would not consider that compliant.

dahart · 2025-02-14T15:24:04 1739546644

That is fair, if true. But I’m very skeptical any modern drivers are dropping compliance without being asked. The one possibility I could buy is that you ran into a case of someone having dropped precision intentionally in a specific game in order to “optimize” a shader. Otherwise, precision and IEEE compliance is the prerogative of the compiler. The compiler is sometimes in the driver, but it never decides on it’s own what precision to use, it uses either default or explicit precision settings. The only reason it would not produce IEEE compliant code is if it was being asked to.

dahart · 2025-02-09T23:46:16 1739144776

Only more precision. But no, doubles are not faster. At best they’re the same instruction latency & throughput as singles, and that’s only on a few expensive pro/datacenter GPUs. Even if they are technically the same instruction speed, they’re still 2x the memory & register usage, which can compromise perf in other ways. Doubles on consumer GPUs are typically anywhere from 16 to 64 times slower than singles.

FWIW, I’ve never heard of shader replacement to force doubles. It’d be interesting to hear when that’s been used and why, and surprising to me if it was ever done for a popular game.

kevingadd · 2025-02-09T17:23:03 1739121783

I would guess it's done for compatibility or quality reasons

crazygringo · 2025-02-09T15:04:51 1739113491

Wow, how did they pick which games to optimize?

Does the studio pay them to do it? Because Nvidia wouldn't care otherwise?

Does Nvidia do it unasked, for competitive reasons? To maximize how much faster their GPU's perform than competitors' on the same games? And therefore decide purely by game popularity?

Or is it some kinda of alliance thing between Nvidia and studios, in exchange for something like the studios optimizing for Nvidia in the first place, to further benefit Nvidia's competitive lead?

Cieric · 2025-02-09T15:56:55 1739116615

I can't give details on how we do our selections (not nvidia but another gpu manufacturer). But we do have direct contacts into a lot of studios and we do try and help them fix their game if possible before ever putting something in the driver to fix it. Studios don't pay us, it's mutually benefital for us to improve the performance of the games. It also help the game run better on our cards by avoiding some of the really slow stuff.

In general if our logo is in the game, we helped them by actually writing code for them, if it's not then we might have only given them directions on how to fix issues in their game or put something in the driver to tweak how things execute. From an outside perspective (but still inside on the gpu space) nvidia does give advice to keep their competitive advantage. In my experience so far ignoring barriers that are needed as per the spec, defaulting to massive numbers when the gpu isn't known ("batman and tessellation" should be enough to find that), and then doing out right weird stuff that doesn't look like something any sane person would do in writing shaders (I have a thought in my head for that one, but it's not considered public knowledge. )

flohofwoe · 2025-02-09T15:17:45 1739114265

AFAIK NVIDIA and AMD do this unasked for popular game releases because it gives them a competitive advantage if 'popular game X' runs better on NVIDIA than AMD (and vice versa). If you're an AAA studio you typically also have a 'technical liason' at the GPU vendors though.

It's basically an arms race. This is also the reason why graphics drivers for Windows are so frigging big (also AFAIK).

diegoperini · 2025-02-09T15:23:38 1739114618

Double "AFAIK" makes me trust you more :)

I think this is very accurate. The exception is probably those block buster games. Those probably get direct consultancy from NVIDIA during the development to make them NVIDIA-ready from day 1.

account42 · 2025-02-10T10:15:57 1739182557

Even smaller games often get some amount of consultation from Nvidia/AMD - they are typically in the credits in some form.

esperent · 2025-02-09T14:26:29 1739111189

I'd love to read more about this, what kind of changes they make and how many games they do it for. Do they ever release technical articles about it?

sigmoid10 · 2025-02-09T14:56:05 1739112965

The other commenter makes it sound a bit more crazy than it is. "Intercept shaders" sounds super hacky, but in reality, games simply don't ship with compiled shaders. Instead they are compiled by your driver for your exact hardware. Naturally that allows the compiler to perform more or less aggressive optimisations, similar to how you might be able to optimise CPU programs by shipping C code and only compiling everything on the target machine once you know the exact feature sets.

flohofwoe · 2025-02-09T15:19:36 1739114376

Graphics drivers on Windows definitely do plenty of 'optimizations' for specific game executables, from replacing entire shaders to 'massaging/fixing' 3D-API calls.

And AFAIK Proton does things like this too, but for different reasons (fixing games that don't adhere to the D3D API documentation and/or obviously ignored D3D validation layer messages).

crazygringo · 2025-02-09T15:07:34 1739113654

I don't know -- if that other commenter is correct, it does sound pretty crazy.

Improving your compiler for everybody's code is one thing.

But saying, if the shader that comes in is exactly this code from this specific game, then use this specific precompiled binary, or even just apply these specific hand-tuned optimizations that aren't normally applied, that does seem pretty crazy to me.

But I don't know which it is?

Cieric · 2025-02-09T15:40:36 1739115636

Finger printing based on shaders is quite rare, really most of the time we detect things like the exe name calling us or sometime, very rarely they will give us a better name through an extension. (unreal engine does this automatically). From there all the options are simple, but full shader replacements are one. In the api I work on the shaders have a built in hash value, so that along with the game identified means we know exsactly what shader it is. Most of the replacements aren't complicated though, it's just replacing slow things with faster things for our specific hardware. In the end we are the final compiler so us tweaking things to work better should be expected to a degree.

chrisjj · 2025-02-09T16:39:47 1739119187

> most of the time we detect things like the exe name calling us

What could possibly go wrong? :)

macbr · 2025-02-09T17:53:09 1739123589

In the case of the Minecraft mod Sodium, which replaces much of Minecraft's rendering internals, Nvidia optimisations caused the game to crash. So the mod devs had to implement workarounds to stop the driver from detecting that Minecraft is running... (changing the window title among other things)

https://github.com/CaffeineMC/sodium/issues/1486

https://github.com/CaffeineMC/sodium/issues/1816

immibis · 2025-02-10T19:51:51 1739217111

And plain Minecraft did the opposite by adding -XX:JavaHeapDumpPath=MojangTricksIntelDriversForPerformance_javaw.exe_minecraft.exe.heapdump to the command line, when they changed the way the game started so that it wasn't detected as minecraft.exe any more. (Side effect: if you trigger a heap dump, it gets that name)

chrisjj · 2025-02-11T21:30:40 1739309440

Yuuuk.

Benanov · 2025-02-09T17:20:51 1739121651

quack.exe performing noticeably worse in benchmarks on certain cards than quake.exe in the late 90's/early 2000's?

magicalhippo · 2025-02-10T10:10:44 1739182244

For those who don't remember:

https://web.archive.org/web/20230819072628/https://techrepor...

chrisjj · 2025-02-10T00:17:22 1739146642

Yup. Looking at you, ATI.

Cieric · 2025-02-10T00:49:08 1739148548

I mean this already happened [1]. But it's either that or games running like crap because they're not properly tested on our cards.

[1] https://www.neowin.net/news/yandex-alleges-amds-windows-driv...

chrisjj · 2025-02-11T21:37:20 1739309840

> or games running like crap because they're not properly tested on our cards.

The decision on what game testing is proper surely lies with the game dev, not the card dev.

Plus game devs generally know that non-testing cannot cause a game to run like crap. Card devs ought to know that too.

Card devs also ought to respect that a game dev may have a good business reason for leaving his game running like crap on certain cards.

Cieric · 2025-02-13T16:27:31 1739464051

I would gauge properly testing on our card to be running the game occasionally on our cards throughout the whole development process. Currently it's been a lot of games 2 weeks out from release dropping a ton of issues on us to try and figure out. I don't even care if they don't do any optimization for our cards as long as they give us a fighting chance to figure out the issues on our side.

> Card devs also ought to respect that a game dev may have a good business reason for leaving his game running like crap on certain cards.

I can't agree with this though, business decisions getting in the way of gamers enjoying games should never be something that is settled for. If the hardware is literally to old and you just can't get it to work fine, but when a top of the line card is running below 60 fps that's just negligence.

chrisjj · 2025-02-13T18:52:57 1739472777

> business decisions getting in the way of gamers enjoying games should never be something that is settled for.

That's between the game dev and his customer. None of a card dev's business.

snicker7 · 2025-02-09T14:30:07 1739111407

Imagine being the dev competing game Y and seeing the changelog.

surajrmal · 2025-02-09T14:41:54 1739112114

It wouldn't be surprising to find out Nvidia talks directly with game developers to give them hints as to how to optimize their games

account42 · 2025-02-10T10:24:16 1739183056

It wouldn't be surprising at all seeing as they are even in the credits for most games.

chrisjj · 2025-02-09T16:33:38 1739118818

> "optimized game X, runs 40% faster"

... and looks 4O% crappier? E.g. stuttery, because the driver does not get to see the code ahead of time.