Note that soon after I posted this, someone noticed[1,2] that it is possible to avoid a whole lot of work, but I am still glad I realized how much performance one can gain by dropping GMP in favor of `__int128`.
It seems like for numbers on this side of 1024 bits, writing purpose-built routines using `__int128` or SSE/AVX instructions rather than general functionality is likely to pay off.
Parent must have mistaken GMP for OMP and wanted to fire off a quick reply, no need to read the article! Some research should be done: is it because HNers browse HN just before bed?
To be honest, I made the same mistake at first glance… fortunately I am not mentally exhausted…
Given the author mentions multiple cores being available, I'd guess you could use any method, including MPI, to distribute the computation. But whether you used 1 core or 10k cores, it would be nice to have a 20x speedup on each core via this arithmetic/fixed-size optimization. Since that's the focus of the article, communication technologies feel pretty unrelated.
[1]: https://news.ycombinator.com/item?id=10915461
[2]: https://www.nu42.com/2016/01/excellent-numbers-explicit-solu...