Then you're lucky. When I optimized an algorithm for my diploma thesis, the code grew significantly: my Python functions suddenly had to deal with different type combinations, and, as part of the optimization, I had to replace my short numpy-based code with manual for-loops that fused several operations which had previously been simple (but less efficient) numpy arithmetic. I think it highly depends on what you want to do, whether you can still work with Python objects, and how much static typing gets in the way of solving your particular problem.
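Roughly what I mean, as a made-up sketch rather than my actual thesis code: the same operation once as whole-array numpy arithmetic, once as a fused manual loop of the kind a compiler can then turn into fast native code once the element types are fixed.

    import numpy as np

    def threshold_diff_numpy(a, b, t):
        # Short numpy version: a few whole-array operations,
        # each allocating a temporary array.
        return np.clip(np.abs(a - b) - t, 0, None)

    def threshold_diff_loop(a, b, t, out):
        # Fused manual loop: one pass over the data, no temporaries.
        # Slower in pure Python, but the form that benefits from
        # static types and compilation.
        for i in range(a.shape[0]):
            for j in range(a.shape[1]):
                d = a[i, j] - b[i, j]
                if d < 0:
                    d = -d
                d -= t
                out[i, j] = d if d > 0.0 else 0.0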
Was the performance gain worth it in the end? In my experience numpy is pretty tight on its own, but I've seen some excellent speed gains from using Cython + numpy.
No, it's a custom implementation of a simple compiler, nothing complicated. It converts Python to C++ and compiles that with nvcc, and it supports numpy arrays. It doesn't do any complex optimization passes like a full compiler; it's more like Cython, actually, with type annotations via an @gpu decorator. That allowed me to take my Python image-processing code almost literally and annotate it with @gpu. The code isn't released yet.
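Since the code isn't released, here is only a rough illustration of what an annotated kernel looks like; the module name and the exact way the decorator takes its type annotations are made up for the example, not the real API.

    import numpy as np
    from gpu_compiler import gpu   # hypothetical module name, not released

    # The type annotations tell the compiler what C++/CUDA code to emit;
    # the decorator signature shown here is illustrative only.
    @gpu(src='float32[:, :]', dst='float32[:, :]')
    def box_blur(src, dst):
        # Plain Python loops over a numpy image, taken almost literally
        # from the original code and annotated with @gpu.
        for y in range(1, src.shape[0] - 1):
            for x in range(1, src.shape[1] - 1):
                dst[y, x] = (src[y - 1, x] + src[y + 1, x] +
                             src[y, x - 1] + src[y, x + 1] +
                             src[y, x]) / 5.0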
I originally wanted to use Copperhead and got in contact with the developer a year ago, but it was too early even for "private" beta testing, so I never got access to their code. Also, my compiler is specialized for image processing, so Copperhead probably wouldn't have worked anyway. I'm only jealous of Copperhead's type inferencer. :) But then again, I have to finish my thesis, and a type inferencer wouldn't help with that goal. ;)