> But AMD's implementation doesn't seem to have this problem From an article: > ...

kimixa · 2024-08-16T05:40:04

Indeed, the difference does appear to be in how AMD does the throttling.

From the linked numberworld blog:

> Thus on Zen4 and Zen5, there is no drawback to "sprinkling" small amounts of AVX512 into otherwise scalar code. They will not throttle the way that Intel does.

This is exactly the use case I'm talking about: relatively small chunks of AVX512-using code spread throughout the codebase. Larger chunks of work tend to be worth passing over to the GPU already.