Right. The improvement that this work brings is that it performs the function sp...

simcop2387 · on Sept 11, 2020

That'll make it more stack efficient too, since it doesn't have to go through the same dance as a function call. The earlier approach probably didn't adhere to the C ABI for these calls, but it'd still push at least a return address, and I suspect some registers, during the call.

saagarjha · on Sept 11, 2020

There’s no need, though: the entry point and return address are unique; it’s literally code that is sliced out, jumped to, and it jumps back to the function it was cut out of. The only thing you’d need to save is a register or two if you can’t make the jump without doing some math.

quotemstr · on Sept 11, 2020

x86 should always be able to jump directly, and ARM sets aside x16 and x17 just for this kind of math. But all jumps should be PC-relative, so you shouldn't have to clobber anything anyway.

loeg · on Sept 11, 2020

Might be confusing to debuggers if the address-space range of a single function is discontiguous. Does the cold portion get an independent symbol with a derived name, like, e.g., "Blocks"?

quotemstr · on Sept 11, 2020

Yes --- the collection of cold basic blocks gets named "<origfunc>.cold". But it's nevertheless not really an independent function from a code POV.
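
To make the naming concrete, here is a minimal sketch in C of a function with an obviously unlikely path. The function name `checked_div` is hypothetical, and `__builtin_expect` is the GCC/Clang builtin for marking a branch as unlikely. Whether a given compiler actually moves the error block out of line depends on its version, flags, and any profile data, so treat this as something to experiment with rather than a guaranteed result.

```c
#include <stdio.h>

/* Hypothetical example; the name checked_div is illustrative only. */
int checked_div(int num, int den)
{
    /* Hint that den == 0 is rare, so this block is a candidate for the
     * cold part of the function if the compiler decides to split it. */
    if (__builtin_expect(den == 0, 0)) {
        fprintf(stderr, "checked_div: division by zero\n");
        return 0;
    }

    /* Hot path: stays in the main body of the function. */
    return num / den;
}
```

If the compiler does split it, listing the object file's symbols (for example with `nm`) should show an extra entry following the `<origfunc>.cold` pattern described above, alongside `checked_div` itself.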