Yes, you *can* use the out arguments, but doing so 1) complicates your code, and...

a_t48 · on April 6, 2020

I agree with point (1), wholeheartedly - by default the syntax is ugly unless you wrap it in a utility. (2) isn't true as far as I can tell - you can keep reusing the same output buffer, or double buffer if the output can't be the same as the input.

Either way see https://news.ycombinator.com/item?id=22786958 where I figured out the syntax to do it in place, which shouldn't result in any allocations at all and should solve both your concerns.

quotemstr · on April 6, 2020

Regarding #2: whether you use the same buffer or a new buffer, you still have to actually write to the memory. The bottleneck at numpy scale isn't the memory allocation (mmap or whatever) but the actual memory writes (saturating the memory bus), and you need to perform the same number of writes no matter which destination array you use.

a_t48 · on April 6, 2020

It should still be a perf gain:

1. Memory allocation isn't free 2. Doing multiple loops over the buffer (once per operation) is going to be slower than doing all the operations at once, both because of caching and the opportunity to do the operations on a register and write at the end (though who knows if the python interpreter will do that).

quotemstr · on April 6, 2020

The Python interpreter is too stupid to do operation coalescing. That's the whole point of my initial comment.