Hacker News

Yeah, you can do that, but it still means writing platform-specific code. What you typically do is write a cross-platform scalar implementation in standard C, and then, for each target you care about, a platform-specific vectorized implementation. Then, through some combination of compile-time feature detection, build flags, and runtime feature detection, you select which implementation to use.

(The runtime part comes in because you may want a single amd64 build of your program that uses AVX-512 if it's available but falls back to AVX2 (256-bit) and/or SSE if it's not.)
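A minimal sketch of that pattern in C, assuming GCC or Clang on x86-64: it relies on their `__builtin_cpu_supports` builtin and `target` function attribute, which are compiler extensions rather than standard C. The function names are hypothetical. Only an AVX2 path is shown; an AVX-512 or SSE path would be another `target`-attributed function checked in the same resolver.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

typedef void (*add_fn)(const int32_t *, const int32_t *, int32_t *, size_t);

/* Portable scalar fallback in standard C; used on every other target. */
static void add_i32_scalar(const int32_t *a, const int32_t *b, int32_t *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

#ifdef HAVE_X86_DISPATCH
/* Only this function is compiled for AVX2 (via the target attribute),
   so the rest of the file still runs on baseline x86-64 CPUs. */
__attribute__((target("avx2")))
static void add_i32_avx2(const int32_t *a, const int32_t *b, int32_t *out, size_t n) {
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {             /* 8 x int32 per 256-bit vector */
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(out + i), _mm256_add_epi32(va, vb));
    }
    for (; i < n; i++)                        /* scalar tail for leftovers */
        out[i] = a[i] + b[i];
}
#endif

/* Runtime feature detection: pick the best path the running CPU supports. */
static add_fn resolve_add_i32(void) {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return add_i32_avx2;
#endif
    return add_i32_scalar;
}

/* Public entry point. The cached pointer is not synchronized, which is
   fine for a single-threaded sketch. */
void add_i32(const int32_t *a, const int32_t *b, int32_t *out, size_t n) {
    static add_fn impl = NULL;
    if (!impl)
        impl = resolve_add_i32();
    impl(a, b, out, n);
}
```

Callers only ever see `add_i32`; the compile-time `#if` handles non-x86 builds, and the runtime check handles x86-64 CPUs without AVX2.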




> runtime feature detection

For any code that's meant to last a bit more than a year, I would say that should also include runtime benchmarking. CPUs change, compilers change. The hand-written assembly might be faster today, but might be sub-optimal in the future.
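One way to do runtime benchmarking is to time each candidate once at startup and keep the fastest. A minimal sketch in standard C, assuming the candidates are interchangeable behind a common function-pointer type; `sum_plain`, `sum_unrolled` and `pick_fastest_sum` are hypothetical stand-ins for, say, compiler-generated code vs. hand-written assembly.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

typedef uint64_t (*sum_fn)(const uint32_t *, size_t);

/* Hypothetical candidate 1: a plain loop. */
static uint64_t sum_plain(const uint32_t *v, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}

/* Hypothetical candidate 2: a 4-way unrolled loop, same result. */
static uint64_t sum_unrolled(const uint32_t *v, size_t n) {
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i]; s1 += v[i + 1]; s2 += v[i + 2]; s3 += v[i + 3];
    }
    for (; i < n; i++) s0 += v[i];
    return s0 + s1 + s2 + s3;
}

static const sum_fn candidates[] = { sum_plain, sum_unrolled };
#define NUM_CANDIDATES (sizeof candidates / sizeof candidates[0])

/* Time every candidate on a sample buffer and return the fastest one.
   clock() is coarse, so each candidate is run `reps` times. */
static sum_fn pick_fastest_sum(size_t sample_len, int reps) {
    uint32_t *buf = malloc(sample_len * sizeof *buf);
    if (!buf) return candidates[0];          /* no sample buffer: use the default */
    for (size_t i = 0; i < sample_len; i++) buf[i] = (uint32_t)i;

    sum_fn best = candidates[0];
    clock_t best_time = 0;
    volatile uint64_t sink = 0;              /* keeps the calls from being optimized away */
    for (size_t c = 0; c < NUM_CANDIDATES; c++) {
        clock_t start = clock();
        for (int r = 0; r < reps; r++) sink += candidates[c](buf, sample_len);
        clock_t elapsed = clock() - start;
        if (c == 0 || elapsed < best_time) { best = candidates[c]; best_time = elapsed; }
    }
    (void)sink;
    free(buf);
    return best;
}
```

`clock()` measures processor time at coarse resolution, so a real library would likely use a higher-resolution timer and a more representative sample, and might re-run the selection when the hardware changes.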


Assuming that vectorized code is faster than scalar code is pretty much universally safe (provided the algorithm lends itself to vectorization, of course). I'm not talking about runtime selection between hand-written asm and compiler-generated code, but rather runtime selection between vector and scalar.





