Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Do you recall the question? Site's down. I recall thinking the right answer was to use POPCNT but maybe I'm misremembering the question.


The question was "how do you do bit-counting of a bunch of numbers". The two canonical answers are "lookup table" and "using bit shifts and multiplying by magic numbers".

The fact that there's a machine instruction for it does makes it a bad question.


Honestly I would have said popcnt as well. Lookup table or bit shifts when I can have the cpu count the 1's? I guess i'd need to benchmark it to be sure. Either way I can't say its a good question.


Popcnt isn't particularly well optimized in most micro-architectural implementations.


Looks that way with a quick test. But it looks like there may be a better way with SSE3 PSHUFB: http://wm.ite.pl/articles/sse-popcount.html


Is it? It looks like on most recent Intel CPUs, it's 3 cycle latency, 1 cycle throughput on a 64-bit register. A 8-bit LUT solution is going to less than 16-bits per cycle on any recent Intel/AMD CPU (maximum of two load ports).


Hmm, much better than I remember. I guess this goes a long way to explain why this wasn't always seen in practice: http://danluu.com/assembly-intrinsics/


just prepend "cache://" to the url


Woah, I've never seen this before -- thanks for sharing!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: