Golly, another sweeping statement :) It seems to me a GPU might actually sort mo...

WithinReason · on April 16, 2022

This (2020) reports "Despite the fact that we send the entire data array to the video card and back, sorting on GPU of 800 MB of data is performed about 25-fold faster than on the processor."

https://dev.to/tishden/computing-with-gpu-why-when-how-and-s...

This shows an approximately 20x speedup (2021, graph 1 vs 3): https://www.irjet.net/archives/V8/i7/IRJET-V8I7714.pdf

janwas · on April 17, 2022

Thanks for the example! Sounds like 1.6 GB/s on an entire Tesla K80 (300 W TDP). This is in fact several times slower than our results on Skylake (with half the TDP), but note that the K80 is from 2014.

The "25-fold speedup", as is often the case for such reports, comes from not optimizing the CPU side.
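
As a rough back-of-the-envelope check of the figures quoted above, the sketch below works out what a 25-fold speedup implies about the CPU baseline. It assumes the 1.6 GB/s estimate refers to the 800 MB case and that units are decimal (1 GB = 1000 MB); the function names are made up for illustration and none of this comes from the linked article.

```python
def implied_seconds(size_mb: float, throughput_gb_s: float) -> float:
    """Elapsed time to process size_mb megabytes at throughput_gb_s (decimal units)."""
    return (size_mb / 1000.0) / throughput_gb_s


def implied_baseline_gb_s(throughput_gb_s: float, speedup: float) -> float:
    """Throughput of the slower side, given the faster side's throughput and the speedup."""
    return throughput_gb_s / speedup


# Assumption: 1.6 GB/s is the GPU throughput for the 800 MB input.
gpu_gb_s = 1.6
gpu_seconds = implied_seconds(800, gpu_gb_s)

# A 25-fold speedup then implies this CPU baseline throughput.
cpu_gb_s = implied_baseline_gb_s(gpu_gb_s, 25)

print(f"GPU: about {gpu_seconds:.2f} s for 800 MB")
print(f"Implied CPU baseline: about {cpu_gb_s * 1000:.0f} MB/s")
```

Under those assumptions the CPU baseline comes out well below 0.1 GB/s, which is consistent with the point that the CPU side was not optimized.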