
I don't think there's going to be any performance difference at all compared to the basic case, which is actually another reason why this is so good. Ultimately, all these structs just hold a double, which means that in memory they're exactly a double and nothing more; there's no type tag or anything like that. And all the dispatch is static, so there's no vtable, just the double value. The compiler does the rest, not the runtime.

(Still, I'd love to see newtypes in C++)
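
A minimal sketch of the kind of wrapper being described (the Meters name is hypothetical); the static_asserts check that, in memory, the struct really is nothing but a double:

    #include <type_traits>

    // Hypothetical strong-typedef-style wrapper: one double, no virtual
    // functions, so there is no vtable and no type tag stored with the value.
    struct Meters {
        double value;

        constexpr Meters operator+(Meters other) const {
            return Meters{value + other.value};
        }
    };

    // In memory this is exactly a double and nothing more. (C++17)
    static_assert(sizeof(Meters) == sizeof(double));
    static_assert(std::is_trivially_copyable_v<Meters>);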




It can make a difference due to weird ABIs. For example, in the ARM procedure call standard, 64-bit values can be returned in r0 and r1, but 64-bit structures can't (no matter whether they contain two 32-bit values or just one 64-bit value). However, I'd call that fairly niche - it's hard to imagine an application that would notice a significant performance difference.
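
A small illustration of the distinction, assuming the 32-bit ARM convention described above (function and type names are made up):

    struct Meters { double value; };  // hypothetical single-double wrapper

    // Plain double: under the convention described above, the 64-bit result
    // can come back in r0/r1.
    double plain(double x) { return x * 2.0; }

    // Single-member struct: a composite type, so it is returned through a
    // memory location the caller provides, i.e. an extra store/load pair.
    Meters wrapped(Meters x) { return Meters{x.value * 2.0}; }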


This is very nice to know actually, and quite relevant to code I am working on right now!

Disappointing though.

> it's hard to imagine an application that would notice a significant performance difference.

One hopes that any function called often enough for this to be a perf hit would also be designed to be easily inlined!
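
For example, if the wrapper's hot operations are defined inline in a header (a sketch, names hypothetical), the compiler can inline the call at the use site, and the struct-return convention never comes into play:

    // units.h (hypothetical header)
    struct Meters { double value; };

    // Defined in the header and usable as constexpr, so a call in a hot
    // loop can be inlined away entirely; once inlined there is no call
    // and therefore no ABI return convention to pay for.
    constexpr Meters scale(Meters m, double factor) {
        return Meters{m.value * factor};
    }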


Scientific computing will hit this sort of problem simply because the kernel is so large that it actually needs to be broken into sub-procedures for icache reasons. And the penalty is large, because there is a full store/load cycle involved, which churns the caches and increases memory bandwidth requirements.
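
A sketch of that situation, with hypothetical names: the sub-procedure lives in another translation unit, so it cannot be inlined (absent LTO), and on an ABI that returns single-member structs through memory, every call in the hot loop pays the store/load round trip:

    #include <cstddef>

    struct Meters { double value; };  // hypothetical wrapper, as above

    // Declared here, defined in another translation unit, so the compiler
    // cannot inline it into the loop below.
    Meters accumulate_step(Meters acc, double sample);

    Meters run_kernel(const double* samples, std::size_t n) {
        Meters acc{0.0};
        for (std::size_t i = 0; i < n; ++i) {
            // On an ABI that returns single-member structs through memory,
            // each call stores the result in the callee and reloads it here,
            // adding memory traffic to the hot loop.
            acc = accumulate_step(acc, samples[i]);
        }
        return acc;
    }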



