While that may be true, my point is that it is almost certainly possible to make your code both faster than it is now and more readable in the process.
And so saying that Python is either slow or ugly and unreadable is perhaps an unfair characterization. I may be wrong here: I haven't benchmarked the code in question. But I think that even for the algorithm you're implementing, with its special casing, that function could be significantly simplified.
Edit: I'd be curious to see example data that is passed into this function.
That may be the case. However, my point is that we started with a rather direct implementation of a formula from a paper. It was very easy to write but took hours on a test set (which extrapolated to weeks on the real data!).
Then I spent a few hours and ended up with that ugly code, which now takes a few seconds. That is dwarfed by the rest of the analysis, which takes several minutes, so further optimization would not be worth it even if you could make this function take zero time.
Maybe with a few more hours, I could get both readability and speed, but that is not worth it (at this moment, at least).
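To put rough numbers on the "not worth it" argument: by Amdahl's law, eliminating a function that takes $t_f$ out of a total run time $T$ can speed up the whole run by at most $T/(T - t_f)$. The 5 s and 300 s below are illustrative assumptions standing in for "a few seconds" and "several minutes"; they are not measured figures from the thread.

```latex
% Illustrative figures only: t_f = time spent in the function, T = total analysis time.
S_{\max} = \frac{T}{T - t_f}, \qquad
t_f = 5\,\mathrm{s},\; T = 300\,\mathrm{s}
\;\Rightarrow\; S_{\max} = \frac{300}{295} \approx 1.017
```

Under those assumptions, even a function that took zero time would trim only about 1.7% off the run.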
The comment about the benchmark data being large is exactly my point: as datasets grow faster than CPU speeds, low-level performance matters more than it did a few years ago (at least if you are working, as I am, with these large datasets).
1. You might have gotten similar performance boosts elsewhere, meaning that you wouldn't have needed to refactor this function in the first place (although a 10,000x speedup suggests that may not be true; I can absolutely see the potential for 100x speedups in this code, depending on exactly what the input data is).
2. It's likely that there are much more natural, idiomatic ways to implement your function in pandas. These would be clearer and likely just as fast, possibly faster. (Heck, there are even ways to refactor your existing code so it looks a lot like the direct-from-the-paper implementation.)
In other words, this isn't (necessarily) a case of Python having weak performance; it's a case of unidiomatic Python having weak performance. That's true in any language, though. You can write unidiomatic code in any language, and more often than not it will be slower than the idiomatic equivalent (e.g. repeatedly applying the lazy `foldl` in Haskell instead of the strict `foldl'`). I'm not enough of an expert in pandas multi-level indexes to say so for certain, but I'd bet there are ways to do what you're doing within pandas that look a lot less ugly and run at least as fast.
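Since the actual function and its input data never appear in the thread, here is a purely hypothetical illustration of the kind of rewrite being argued about: a per-row loop that mirrors a formula literally, next to the same computation done with `groupby` on one level of a `MultiIndex`. The data, the `sample`/`feature`/`x` names, and the normalization formula are all assumptions; this is not the function under discussion.

```python
import pandas as pd

# Hypothetical toy data: a two-level index (sample, feature) and one value column.
# The real function and its input data were never shown in the thread.
index = pd.MultiIndex.from_tuples(
    [("a", "f1"), ("a", "f2"), ("b", "f1"), ("b", "f3"), ("b", "f4")],
    names=["sample", "feature"],
)
df = pd.DataFrame({"x": [1.0, 3.0, 2.0, 2.0, 4.0]}, index=index)


def normalize_loop(df: pd.DataFrame) -> pd.Series:
    """Literal translation: for each row, divide x by its sample's total.

    The group total is recomputed by re-slicing the frame for every row.
    """
    values = []
    for (sample, _feature), x in df["x"].items():
        total = df.xs(sample, level="sample")["x"].sum()
        values.append(x / total)
    return pd.Series(values, index=df.index, name="x")


def normalize_groupby(df: pd.DataFrame) -> pd.Series:
    """Same computation: group totals are computed once, then broadcast back per row."""
    totals = df.groupby(level="sample")["x"].transform("sum")
    return df["x"] / totals
```

On five rows both are instant; the point is that the `groupby` version computes each group's total once in a single vectorized call instead of re-slicing the frame for every row. Whether the real function can be expressed this way is exactly what's in dispute.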
Granted, there's an argument to be made that the idiomatic way should be more obvious. But "uncommon pandas indexing tools should be more discoverable" is not the same as "Python is unworkably slow".
1. No, that function was by far the bottleneck, and I can tell you that we got a >10,000x speedup between the initial version and the final one.
2. I don't care about making it faster at this point. The function is fast enough. Maybe there is some magic incantation of pandas that is readable and computes the same values, but I'll believe it when I see it. What I thought was more idiomatic was much slower.
I think this is more a case of "the problem does not fit NumPy/pandas' structure (because of how the duplicated indices need to be handled), so you end up with ugly code."
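The duplicated-index issue is probably the crux. Here is a small hypothetical example (not the author's data) of why duplicate labels tend to force special casing in pandas: `.loc` returns a `Series` for a repeated label but a scalar for a unique one. When duplicates can simply be combined, `groupby(level=0)` removes the ambiguity; summing them is an assumption here, and the paper's formula may well need different handling.

```python
import pandas as pd

# Hypothetical series whose index contains a duplicated label ("a").
s = pd.Series([1.0, 2.0, 5.0], index=["a", "a", "b"])

# With a non-unique index, .loc returns a Series for a duplicated label
# but a scalar for a unique one, so code that loops over labels ends up
# special-casing both shapes.
dup_hit = s.loc["a"]     # Series of length 2
unique_hit = s.loc["b"]  # scalar

# Flag every row whose label occurs more than once.
is_dup = s.index.duplicated(keep=False)

# If the formula calls for combining duplicates (summing is an assumption
# here), grouping on the index level yields exactly one row per label.
collapsed = s.groupby(level=0).sum()
```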
1. You don't get 10,000x speedups by changing languages. It's likely that this optimization would have been necessary in any language.
2. You don't care about improving the code, but you did care enough to write an article saying that the language didn't fit your needs, without actually doing the due diligence to check whether it did. That's the part that gets me.