Hacker News new | past | comments | ask | show | jobs | submit login

Why is (1025, 1025, 1025) so much faster than (1024, 1024, 1024)?



My guess that it's happening mostly due cache conflicts. With 1024 for a simplified L1 with 32kb you can fit exactly 8 lines of the inner dimmension in the cache, which means that (0,8,0) would have the same cache location as (0, 0, 0), which is bad for tiling




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: