It is very rare for programs to be memory bandwidth bound. It usually takes a lot of optimization just to get to that point, plus an access pattern that leans hard on bandwidth (such as looping through large arrays, only doing one simple calculation to each index, then doing that on many cores).
The vast majority of what people run is memory latency bound and in those cases using extra threads makes sense so that the explicit parallelism can compensate for memory latency.
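A minimal sketch of the latency-bound pattern being described: pointer chasing through a randomly permuted index array, where every load depends on the result of the previous one, so the core mostly sits stalled on memory and a second hardware thread could do useful work in the gaps. (Illustrative only; `make_chain`/`chase` are made-up names and no timings are claimed.)

```python
import random

def make_chain(n, seed=0):
    # Build a single random cycle over 0..n-1: next_idx[i] is the
    # successor of i. Because each access depends on the previous
    # load's value, there is no memory-level parallelism within one
    # chain -- the classic latency-bound access pattern.
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    next_idx = [0] * n
    for a, b in zip(order, order[1:] + order[:1]):
        next_idx[a] = b
    return next_idx

def chase(next_idx, steps):
    # Serialized dependent loads: each iteration must wait for the
    # previous one to return from memory before it can issue.
    i = 0
    for _ in range(steps):
        i = next_idx[i]
    return i

next_idx = make_chain(1 << 16)
print(chase(next_idx, 1000))
```

On a chain much larger than cache, nearly every step is a cache miss the CPU cannot hide, which is exactly the situation where SMT's "run another thread while this one waits" pays off.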
> (such as looping through large arrays, only doing one simple calculation to each index, then doing that on many cores).
...which perfectly describes a parallelized mat-vec-mult. Yes, that's not common in most applications, but I'd have a hard time naming a more basic operation in scientific (and related) computations.
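A rough roofline-style estimate makes it concrete why mat-vec-mult is bandwidth bound: each matrix element is loaded once and used in exactly one multiply-add, so arithmetic intensity is only ~0.25 flop/byte for doubles. (The machine numbers below are illustrative assumptions, not measurements.)

```python
# y = A @ x with an n x n matrix of 8-byte doubles. Traffic is
# dominated by reading A once; x and y are negligible by comparison.
n = 10_000
bytes_moved = 8 * n * n           # one 8-byte load per matrix element
flops = 2 * n * n                 # one multiply + one add per element
intensity = flops / bytes_moved   # flops per byte of DRAM traffic

# Hypothetical machine: 500 GFLOP/s peak compute, 50 GB/s DRAM bandwidth.
peak_flops = 500e9
bandwidth = 50e9
compute_time = flops / peak_flops
memory_time = bytes_moved / bandwidth

print(f"arithmetic intensity: {intensity} flop/byte")
print(f"compute-limited time:   {compute_time * 1e3:.2f} ms")
print(f"bandwidth-limited time: {memory_time * 1e3:.2f} ms")
```

With these (assumed) numbers the memory side is ~40x slower than the compute side, so adding cores or SMT threads to the same socket does nothing once the memory bus is saturated.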
We are saying the same thing here, though I think you are missing the point: this is all a response to someone asking whether SMT is still useful now that almost every CPU has many cores.
The answer is that it absolutely is: your example is niche, and most software/systems can still benefit from using more threads to work around memory latency.