In particular, if you find you cannot use vectorized functions in NumPy or SciPy and absolutely MUST index, then typing the array in Cython is a lifesaver. Indexed operations on NumPy arrays without Cython are very slow (e.g. https://stackoverflow.com/q/22239199/300539).
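As a rough illustration of a loop that genuinely has to index, here is a sketch of an exponential moving average, where each output depends on the previous one. The function name `ema` and the recurrence are hypothetical examples chosen for illustration, not taken from the linked question. It assumes `x` is a non-empty 1-D float64 array. In plain Python, every `x[i]` read creates a NumPy scalar and every `y[i] = ...` write converts one back. In Cython, declaring `x` and `y` as typed memoryviews (e.g. `double[:]`) and `i` as `Py_ssize_t` lets the same loop compile down to C-level indexing.

```python
import numpy as np

def ema(x, alpha):
    """Exponential moving average via an explicit index loop (hypothetical example).

    Assumes x is a non-empty 1-D float64 array.
    """
    y = np.empty_like(x)
    y[0] = x[0]
    for i in range(1, x.shape[0]):
        # Each step depends on the previous output, so this is not a single ufunc call.
        y[i] = alpha * x[i] + (1.0 - alpha) * y[i - 1]
    return y
```

For example, `ema(np.array([1.0, 2.0, 3.0]), 0.5)` gives `[1.0, 1.5, 2.25]`.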
Agreed. It's a bit surprising that NumPy indexed operations are even slower than the built-in list in this example. The idiomatic NumPy way to perform the iterations seems to be vectorization, but that often leads to code that is not straightforward to reason about. For this example, when it comes to optimization I'd prefer the simplicity of a typed Cython for-loop.
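To make the trade-off concrete, here is a small sketch of the same computation written both ways: the sum of squared differences between neighbouring elements. Both function names are hypothetical and the example is not taken from the thread. The loop version reads naturally but pays for a NumPy scalar on every access. The vectorized version hands the whole loop to NumPy at the cost of expressing it through `np.diff`.

```python
import numpy as np

def sum_sq_diff_loop(a):
    # Explicit loop: easy to read, but every a[i] access creates a NumPy scalar.
    total = 0.0
    for i in range(1, len(a)):
        d = a[i] - a[i - 1]
        total += d * d
    return total

def sum_sq_diff_vec(a):
    # Vectorized: np.diff computes all a[i] - a[i-1] in one C-level pass,
    # then squaring and summing are also done inside NumPy.
    return float(np.sum(np.diff(a) ** 2))
```

For this simple reduction the vectorized form is still readable. Once the loop body has branches or depends on earlier results, finding an equivalent array expression gets harder, which is where a typed Cython loop becomes attractive.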
That's because with generic Python code, each value read from a NumPy array has to be boxed into a new Python object, which means an extra memory allocation on every access (whereas the elements of a built-in list are already boxed Python objects).
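A quick way to see the boxing is to check object identity. Indexing a NumPy array twice yields two distinct `np.float64` objects, because each access allocates a fresh scalar from the raw C double. Indexing a list twice returns the same `float` object that the list already holds.

```python
import numpy as np

arr = np.arange(3.0)       # values stored as raw C doubles inside the array buffer
lst = [0.0, 1.0, 2.0]      # list holding references to existing Python float objects

# Indexing the array allocates a fresh NumPy scalar (np.float64) on every access...
arr_boxed_each_time = arr[0] is not arr[0]
# ...while indexing the list just returns a reference to the float already stored there.
lst_same_object = lst[0] is lst[0]

print(type(arr[0]).__name__, arr_boxed_each_time)  # float64 True
print(type(lst[0]).__name__, lst_same_object)      # float True
```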