
Some LLVM optimization talks show otherwise, though: longer vectorized code can actually run faster than the smaller version.



...in microbenchmarks.

That's what created multi-kilobyte memcpy() implementations, which barely beat REP MOVSB but cause huge icache bloat.
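For contrast with those multi-kilobyte implementations, a REP MOVSB copy is only a handful of instruction bytes. Below is a minimal sketch, assuming GCC/Clang extended inline-asm syntax; `memcpy_rep_movsb` is a hypothetical name, not a library function. Whether it beats a vectorized loop depends on the CPU (e.g. support for Enhanced REP MOVSB) and on the copy size.

```c
#include <stddef.h>

/* Hypothetical helper: a memcpy built on the x86 REP MOVSB string
 * instruction. Assumes GCC/Clang extended inline asm. On non-x86
 * targets it falls back to a plain byte loop so the sketch compiles. */
static void *memcpy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;
#if defined(__x86_64__) || defined(__i386__)
    /* (R/E)DI = destination, (R/E)SI = source, (R/E)CX = byte count.
     * The direction flag is clear on function entry per the ABI,
     * so the CPU copies forward. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return ret;
}
```

The whole copy loop lives in microcode, so the icache cost is a few bytes regardless of size; the trade-off is startup overhead on short copies, which is where large unrolled vector versions tend to win in microbenchmarks.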
