
Golly, another sweeping statement :) It seems to me a GPU might actually sort more slowly.

For 64-bit keys, we sort about 1 GB/s per (5-year-old) Skylake core, and perhaps 5-6 GB/s when running in parallel.

This (2018) reports 3.5 GB/s: https://benkarsin.files.wordpress.com/2018/10/dissertation.p... And a 6-year-old GPU radix sort reports 2.1 GB/s: https://github.com/Bulat-Ziganshin/Compression-Research/tree...

BTW I've worked on a product that used GPUs. That typically requires everything to move to the GPU, which is not always desirable or feasible.




This (2020) reports "Despite the fact that we send the entire data array to the video card and back, sorting on GPU of 800 MB of data is performed about 25-fold faster than on the processor."

https://dev.to/tishden/computing-with-gpu-why-when-how-and-s...

This shows an approximately 20x speedup (2021, graph 1 vs 3): https://www.irjet.net/archives/V8/i7/IRJET-V8I7714.pdf


Thanks for the example! Sounds like 1.6 GB/s on an entire Tesla K80 (300W TDP). This is in fact several times slower than our results on Skylake (with half the TDP), but note that K80 is from 2014.

The "25-fold speedup", as is often the case for such reports, comes from not optimizing the CPU side.
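
For anyone who wants to put their own numbers in the same GB/s units, here is a minimal sketch of the arithmetic: bytes sorted divided by wall-clock time. The names `throughput_gb_per_s` and `time_sort` are made up for illustration, and Python's built-in list sort is only a stand-in for whatever CPU or GPU sort is being measured, so the absolute figure it prints says nothing about the results quoted above.

```python
import random
import time


def throughput_gb_per_s(num_bytes, seconds):
    """Bytes processed per second, expressed in GB/s (1 GB = 1e9 bytes)."""
    return num_bytes / seconds / 1e9


def time_sort(num_keys, seed=0):
    """Sort num_keys random 64-bit keys; return (sorted_keys, elapsed_seconds).

    Key generation is excluded from the timed region. list.sort() is a
    placeholder (assumption): swap in the sort you actually want to measure.
    """
    rng = random.Random(seed)
    keys = [rng.getrandbits(64) for _ in range(num_keys)]
    start = time.perf_counter()
    keys.sort()
    elapsed = time.perf_counter() - start
    return keys, elapsed


if __name__ == "__main__":
    n = 1_000_000
    _, secs = time_sort(n)
    # Each key is counted as 8 bytes, matching the 64-bit keys discussed here.
    print(f"{throughput_gb_per_s(n * 8, secs):.3f} GB/s")
```

For a GPU measurement, the timed region would presumably also need to include the transfers to and from the device to be comparable with the "entire data array to the video card and back" figure.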



