
I really like this writeup. Note that using SIMD this way (horizontal SIMD, where the lanes of one register hold elements of a single matrix) may not be worth it if you know you will be multiplying many matrices of the same size. It may be better to do vertical SIMD: run the plain scalar algorithm on 4 or 8 matrices at a time, one matrix per lane, like GPUs would do for vertex shaders. This does mean you may have to interleave your matrices in an odd way, so that the same element of each matrix in a batch sits contiguously in memory, to optimize memory access.
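To make the vertical layout concrete, here is a rough sketch in C. Everything in it is my own illustration, not from the writeup: the names `MatBatch`, `matbatch_mul`, `matbatch_store` and `matbatch_load`, the 4x4 size, and the batch width of 8 are all assumptions. It uses no intrinsics; the idea is that the innermost loop runs across the matrices in a batch over contiguous floats, which is the shape compilers can usually auto-vectorize.

```c
#include <stddef.h>

#define N 4      /* matrix dimension (assumed 4x4 for illustration) */
#define LANES 8  /* matrices per batch; assumed to match the SIMD width */

/* One batch of LANES matrices, interleaved: e[r][c][l] is element (r, c)
   of matrix l, so the same element of every matrix is contiguous. */
typedef struct {
    float e[N][N][LANES];
} MatBatch;

/* out = a * b for every matrix in the batch. This is the ordinary scalar
   triple loop; only the innermost loop, over lanes, is new.
   out must not alias a or b. */
void matbatch_mul(MatBatch *out, const MatBatch *a, const MatBatch *b)
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            float acc[LANES] = {0};
            for (int k = 0; k < N; k++)
                for (int l = 0; l < LANES; l++)
                    acc[l] += a->e[i][k][l] * b->e[k][j][l];
            for (int l = 0; l < LANES; l++)
                out->e[i][j][l] = acc[l];
        }
    }
}

/* Interleave: copy a row-major N*N matrix into lane l of the batch. */
void matbatch_store(MatBatch *batch, int l, const float *m)
{
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            batch->e[r][c][l] = m[r * N + c];
}

/* De-interleave: copy lane l of the batch out to a row-major N*N matrix. */
void matbatch_load(const MatBatch *batch, int l, float *m)
{
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            m[r * N + c] = batch->e[r][c][l];
}
```

The store/load helpers are the "odd interleaving" cost: if your matrices arrive in ordinary row-major form, you pay for the shuffle on the way in and out, so this probably only wins when the data can stay in the batched layout across several operations.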


