
At the time, yes, but I no longer have those numbers. Trying my best to remember, I would estimate that the 360 / 4 = 90x speedup breaks down roughly as 4x from multithreading (the speedup was close to linear), 7x from the algorithm, and 3x from fixing memory inefficiencies (4 × 7 × 3 = 84, close to 90x).

EDIT: fixed my math, thank you andruby



Sorry to correct your math here, but going from 6 hours (6*60=360 minutes) to 4 minutes is a 90x improvement, not 900x.


Interestingly, a speedup of about two orders of magnitude is roughly what you'd expect from the algorithmic optimization alone (going from N^2 to N log N with N in the range of a few thousand, as the OP said).
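A quick sanity check on that estimate: ignoring constant factors, the speedup from O(N^2) to O(N log N) is about N / log2(N), which for N in the low thousands lands right around two orders of magnitude.

```python
import math

# Back-of-envelope: N^2 / (N log2 N) = N / log2(N), ignoring constants.
for n in (1000, 4000, 10000):
    print(n, round(n / math.log2(n)))
# prints approximately:
# 1000 100
# 4000 334
# 10000 753
```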


Like I mentioned - my memory is pretty hazy about the specific speedups.

Yes, N^2 to N log N feels like it should be a bigger factor, but remember that the constants in front matter. You are replacing two-level nested but very straightforward loops - loops that rip through cached memory at blazing speed without any disruption to the CPU pipeline - with a single loop that involves quite a bit of condition checking and array element swaps (cache is busted, pipelines are busted, and the branches are very hard to predict). You gain some, you lose some.
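The OP never says what the actual workload was, but the trade-off above can be sketched with a made-up pair-counting problem: the brute-force version is two tight, branch-predictable nested loops, while the N log N version pays for a sort (swaps, data-dependent branches) and only then does a cheap linear scan.

```python
# Hypothetical illustration (the OP's real workload isn't specified):
# count pairs of values that differ by less than EPS.

EPS = 1e-6

def close_pairs_quadratic(xs):
    # O(N^2): two tight nested loops over sequential memory.
    # Trivially predictable branches, very cache/pipeline friendly,
    # but the total work grows quadratically.
    n = len(xs)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            if abs(xs[i] - xs[j]) < EPS:
                count += 1
    return count

def close_pairs_sorted(xs):
    # O(N log N): the sort does the hard work (element swaps,
    # data-dependent branches - the "constants" the comment refers to),
    # then a single scan over near-neighbours finishes the job.
    ys = sorted(xs)
    count = 0
    for i, y in enumerate(ys):
        j = i + 1
        while j < len(ys) and ys[j] - y < EPS:
            count += 1
            j += 1
    return count
```

Both return the same answer; despite the sorted version's worse per-iteration cost, for N in the thousands the N log N shape dominates, which is consistent with the ~100x estimate upthread.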



