
Being convenient to author doesn't necessarily make code a good fit for the hardware. Add enough branch divergence and your GPU code is going to be matched or outperformed by a competent CPU implementation (on a chip of comparable size). Branchless code can result in substantial speedups on either.
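To make the branchless point concrete, here is a minimal sketch in C, not taken from any particular codebase. The function names and the "sum everything above a threshold" task are made up for illustration. Note that an optimizing compiler may already turn the first version into a conditional move or vectorized select, so whether you see a difference depends on the compiler and the data.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy version: the `if` depends on the data, so unpredictable input
   can cause branch mispredictions on a CPU and lane divergence on a GPU. */
int64_t sum_above_branchy(const int32_t *x, size_t n, int32_t threshold) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (x[i] > threshold) {
            sum += x[i];
        }
    }
    return sum;
}

/* Branchless version: turn the comparison into an all-ones or all-zeros
   mask and AND it with the value, so every element does the same work. */
int64_t sum_above_branchless(const int32_t *x, size_t n, int32_t threshold) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t mask = -(int32_t)(x[i] > threshold); /* -1 if true, 0 if false */
        sum += x[i] & mask;                          /* x[i] or 0 */
    }
    return sum;
}
```

Both functions return the same result for the same input; the second just trades the data-dependent jump for a bit of extra arithmetic on every element.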

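To be fair, though, modern GPUs are pretty good at branching and latency hiding, while numpy-style code has poor data locality unless you have a magic compiler: each array operation makes its own full pass over memory and writes out a temporary unless something fuses them.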
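A rough illustration of the locality problem, written in C so the memory traffic is explicit. This is a sketch under assumptions: the function names and the `a * b + c` expression are hypothetical, chosen only to mimic how an array library evaluates one operation at a time versus what a fusing compiler would emit.

```c
#include <stddef.h>
#include <stdlib.h>

/* NumPy-style evaluation of out = a * b + c: each operation is its own
   full pass over memory, and the product is stored in a temporary array.
   Returns 0 on success, -1 if the temporary could not be allocated. */
int muladd_two_pass(const float *a, const float *b, const float *c,
                    float *out, size_t n) {
    if (n == 0) return 0;
    float *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) return -1;
    for (size_t i = 0; i < n; i++) tmp[i] = a[i] * b[i]; /* pass 1: write tmp */
    for (size_t i = 0; i < n; i++) out[i] = tmp[i] + c[i]; /* pass 2: read tmp */
    free(tmp);
    return 0;
}

/* Fused evaluation: one pass, and the product never leaves a register. */
void muladd_fused(const float *a, const float *b, const float *c,
                  float *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] * b[i] + c[i];
    }
}
```

For arrays larger than the cache, the two-pass version writes the temporary out to memory and reads it back, which the fused loop avoids. The results can differ in the last bit if the compiler contracts the fused expression into a fused multiply-add.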



