Hacker News new | past | comments | ask | show | jobs | submit login

Yeah on the GPU you need to get your threads to ideally load consecutive memory locations for each thread to utilize the memory bandwidth properly. Random-indexing blows this out of the water. I guess that you could pre-process on the CPU though to pack the sparse stuff for better GPU efficiency..



Consider applying for YC's Summer 2025 batch! Applications are open till May 13

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: