Also, his answer on #9 is wrong, or at least (edit: his explanation of the conversation is terribly confusing). With 10000 numbers, it's only efficient to create a lookup table with 8-bit integers, not with 16-bit integers: a 16-bit table has 65536 entries, more than the 10000 inputs themselves, so building it costs more than it saves, while a 256-entry 8-bit table amortizes almost immediately.
Based on his LinkedIn profile, I don't think anyone at Google would have thought of him as a "director of engineering". Being an "R&D director" at some unknown company at 24 is not at all comparable to being a director at Google, and since then he's worked at his own very small company. He was probably a candidate for Senior SRE.
Whose answer is wrong? Because no one suggested a 16-bit lookup table.
His answer was to look at 64 bits at a time and do a Kernighan-style count [0]. The "correct" answer was an 8-bit lookup table. Which is right is going to depend heavily on the data and the architecture you're using.
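For anyone following along, a rough sketch of the two approaches being compared (C, function names are mine, not from the interview):

    #include <stdint.h>

    /* Kernighan-style count: clears the lowest set bit each
       iteration, so the loop runs once per set bit. */
    static int popcount_kernighan(uint64_t x) {
        int count = 0;
        while (x) {
            x &= x - 1;   /* drop the lowest set bit */
            count++;
        }
        return count;
    }

    /* 8-bit lookup table: precompute counts for all 256 byte
       values, then sum the count for each byte of the word.
       init_lut() must be called once before popcount_lut(). */
    static uint8_t lut[256];

    static void init_lut(void) {
        for (int i = 0; i < 256; i++)
            lut[i] = (uint8_t)popcount_kernighan((uint64_t)i);
    }

    static int popcount_lut(uint64_t x) {
        int count = 0;
        for (int i = 0; i < 8; i++) {
            count += lut[x & 0xff];
            x >>= 8;
        }
        return count;
    }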
The question was "how do you do bit-counting of a bunch of numbers". The two canonical answers are "lookup table" and "using bit shifts and multiplying by magic numbers".
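The "bit shifts and magic numbers" answer is the usual SWAR trick; roughly something like this (a sketch, not anyone's exact interview answer):

    #include <stdint.h>

    /* "Magic number" popcount: sums bits in parallel within the
       word (pairs, then nibbles, then bytes), and finally uses a
       multiply to add all the byte counts into the top byte. */
    static int popcount_swar(uint64_t x) {
        x = x - ((x >> 1) & 0x5555555555555555ULL);                            /* 2-bit sums */
        x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  /* 4-bit sums */
        x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;                            /* byte sums  */
        return (int)((x * 0x0101010101010101ULL) >> 56);                       /* total      */
    }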
The fact that there's a machine instruction for it does make it a bad question.
Honestly I would have said popcnt as well. Lookup table or bit shifts when I can have the CPU count the 1's? I guess I'd need to benchmark it to be sure. Either way I can't say it's a good question.
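For reference, "have the CPU count the 1's" in C usually just means the compiler builtin, which lowers to the POPCNT instruction when the target supports it (GCC/Clang shown):

    #include <stdint.h>

    /* GCC/Clang builtin; compiles to a single POPCNT when built
       with -mpopcnt (or -march=native on a CPU that has the
       instruction), otherwise falls back to a library routine. */
    static int popcount_hw(uint64_t x) {
        return (int)__builtin_popcountll((unsigned long long)x);
    }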
Is it? It looks like on most recent Intel CPUs, it's 3-cycle latency, 1-cycle throughput on a 64-bit register. An 8-bit LUT solution is going to manage less than 16 bits per cycle on any recent Intel/AMD CPU: each byte lookup is a load, and there are at most two load ports.
Hmm, much better than I remember. I guess this goes a long way to explain why this wasn't always seen in practice: http://danluu.com/assembly-intrinsics/