
We mean energy, right? The surprising fact is that the energy cost of executing an instruction (fetch, decode, scheduling, etc.) is much higher than that of the actual arithmetic operation. SIMD therefore amortizes that per-instruction overhead over multiple elements, leading to 5-10x gains ( https://pdfs.semanticscholar.org/f464/74f6ae2dde68d6ccaf9f53...).

Even in this example, which apparently has 4x vector instructions vs 1x scalar, AVX-512 (and probably even AVX2) would reduce energy usage because a single instruction covers 8 (or 4 for AVX2) 64-bit elements.
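
As a rough sketch of what I mean (my own example, not taken from the article): a scalar loop issues one add instruction per 64-bit element, while the AVX2 version below issues one vpaddq per 4 elements, so the fetch/decode/schedule overhead is paid once per 4 elements instead of once per element.

    #include <immintrin.h>
    #include <stdint.h>
    #include <stddef.h>

    /* scalar: one add instruction per element */
    int64_t sum_scalar(const int64_t *a, size_t n) {
        int64_t s = 0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* AVX2: one vector add per 4 elements (assumes n is a multiple of 4) */
    int64_t sum_avx2(const int64_t *a, size_t n) {
        __m256i acc = _mm256_setzero_si256();
        for (size_t i = 0; i < n; i += 4)
            acc = _mm256_add_epi64(acc,
                      _mm256_loadu_si256((const __m256i *)(a + i)));
        int64_t lanes[4];
        _mm256_storeu_si256((__m256i *)lanes, acc);
        return lanes[0] + lanes[1] + lanes[2] + lanes[3];
    }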



> The surprising fact is that the energy cost of executing an instruction (scheduling etc.) is much higher than the actual operation.

Good point, decoding and scheduling are expensive and SIMD certainly eliminates a lot of that. However, the alternative algorithm has even less decoding and scheduling to do, since it completely eliminates the multiplication operations without increasing the number of additions. Even then, I wouldn't be surprised if it made no difference to energy on any x86; as I said, it was more of a fundamental observation. For a different application it's useful when selecting the actual hardware, if you're not already constrained to a particular chip.
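
To illustrate the kind of transformation I have in mind (a generic strength-reduction sketch, not the article's actual algorithm): a per-iteration multiply can be replaced by a running addition, so the loop has fewer and cheaper operations to decode and schedule, without adding any extra adds beyond the one that replaces the multiply.

    #include <stdint.h>
    #include <stddef.h>

    /* straightforward version: one multiply per iteration */
    void fill_multiplies(int64_t *out, size_t n, int64_t step) {
        for (size_t i = 0; i < n; i++)
            out[i] = (int64_t)i * step;
    }

    /* strength-reduced version: the multiply becomes a running add */
    void fill_additions(int64_t *out, size_t n, int64_t step) {
        int64_t v = 0;
        for (size_t i = 0; i < n; i++) {
            out[i] = v;
            v += step;
        }
    }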



