
I mean, did you see the very complicated, extremely optimized C and C++ code further down the page? Despite all that, they "only" reached 10% of the performance of the ASM code.



The speed of this program comes partly from being written in assembly, but mostly from being written by someone who is quite clever and clearly put a lot of time into this problem. None of the other solutions spend much effort fitting their data into the CPU cache, none resort to splicing for zero-copy output, and not one does anything nearly as clever as this program does to generate its numbers. Most of this could be translated to C++ with AVX intrinsics, but the real accelerator here is not the choice of language, it's the person behind the code.
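One general trick in the spirit of "clever number generation" is to keep the current number as ASCII digits and increment that text in place, instead of re-formatting an integer with printf on every line. This is a minimal sketch of that idea, not the author's method; the function name `ascii_increment` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: increment a decimal number stored as ASCII
   digits (most significant first), in place. digits holds *len
   characters and must have room for *len + 1. When the number gains
   a digit (e.g. "99" -> "100"), *len is incremented. */
void ascii_increment(char *digits, size_t *len) {
    assert(digits != NULL && len != NULL);
    size_t i = *len;
    while (i > 0) {
        i--;
        if (digits[i] != '9') {
            digits[i]++;      /* no carry needed: done */
            return;
        }
        digits[i] = '0';      /* 9 rolls over to 0; carry moves left */
    }
    /* Every digit was 9: shift right by one and prepend a '1'. */
    memmove(digits + 1, digits, *len);
    digits[0] = '1';
    (*len)++;
}
```

Most increments touch only the last byte, so the digits can be copied straight into an output buffer with no division or formatting work per line.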


Now that I have seen the power of madvise + huge pages, everything looks like a nail. The author reckons on a 30% gain from less page-table juggling. There are techniques here that apply well outside assembly.
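For anyone wanting to try the madvise + huge pages idea from C, here is a minimal Linux-only sketch. It assumes 2 MiB transparent huge pages (the usual x86-64 size); the helper names are hypothetical. `MADV_HUGEPAGE` is only a hint: if THP is disabled the call can fail and the buffer still works with normal pages.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: 2 MiB huge pages. */
#define HUGE_PAGE_SIZE ((size_t)2 * 1024 * 1024)

/* Round len up to a whole number of huge pages. */
static size_t round_up_huge(size_t len) {
    return (len + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
}

/* Hypothetical helper: map an anonymous buffer aligned to a huge page
   boundary and ask the kernel to back it with huge pages.
   Returns NULL if mmap fails; *hint_ok reports whether madvise succeeded. */
void *alloc_huge_buffer(size_t len, int *hint_ok) {
    size_t rounded = round_up_huge(len);
    size_t map_len = rounded + HUGE_PAGE_SIZE;  /* slack for alignment */
    unsigned char *raw = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;
    uintptr_t addr = (uintptr_t)raw;
    uintptr_t aligned = (addr + HUGE_PAGE_SIZE - 1) & ~(uintptr_t)(HUGE_PAGE_SIZE - 1);
    size_t head = aligned - addr;               /* unused bytes before */
    size_t tail = map_len - head - rounded;     /* unused bytes after */
    if (head)
        munmap(raw, head);
    if (tail)
        munmap((unsigned char *)aligned + rounded, tail);
    int rc = madvise((void *)aligned, rounded, MADV_HUGEPAGE);
    if (hint_ok)
        *hint_ok = (rc == 0);
    return (void *)aligned;
}

/* Release a buffer obtained from alloc_huge_buffer with the same len. */
void free_huge_buffer(void *p, size_t len) {
    assert(p != NULL);
    munmap(p, round_up_huge(len));
}
```

Aligning the start to 2 MiB matters because the kernel can only use a huge page for a fully aligned 2 MiB range; an unaligned mapping would leave its ends on small pages.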


It's not ASM that makes the code fast, it's the way he laid out the data and code. C/C++ should be able to approach 90% of the speed of this.


Most other implementations do not use vmsplice, which is a huge win for this specific problem.
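To make the vmsplice point concrete: instead of write(), which copies the buffer into the kernel, vmsplice(2) hands user pages to a pipe. This is a minimal Linux-only sketch with a hypothetical helper name; it only helps when the output fd really is a pipe (e.g. `./prog | pv > /dev/null`), and it carries a caveat: the pipe may keep referencing the user pages, so the buffer must not be overwritten until the reader has consumed it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: push all len bytes of buf into the write end of
   a pipe with vmsplice, retrying on partial transfers and EINTR.
   Returns len on success, -1 on error. */
ssize_t vmsplice_all(int pipe_wfd, const void *buf, size_t len) {
    assert(pipe_wfd >= 0);
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        struct iovec iov = { .iov_base = (void *)p, .iov_len = left };
        ssize_t n = vmsplice(pipe_wfd, &iov, 1, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: retry */
            return -1;         /* e.g. EBADF, or fd is not a pipe */
        }
        if (n == 0)
            return -1;         /* no progress: avoid spinning forever */
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```

Usage: call `vmsplice_all(STDOUT_FILENO, buf, len)` when stdout is a pipe, and alternate between at least two buffers so one is being filled while the other may still be referenced by the pipe.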

Of course all the AVX and cache optimizations are also exceedingly clever.



