Oh, I see. It's really a behavior of x86 that's causing the confusion then. The ...

zzo38computer · on March 9, 2019

I agree; that does indeed look like a good reason to set it to zero. Once the end of the function code is reached, ret is either zero or undefined, so there is no need to keep track of the value of ret; just assume that it is zero.

Now I can see how the x86 C ABIs work; thank you for explaining, because I did not know much about that before, and indeed it makes sense. (On a different instruction set, something else might make sense; I don't really know.)

kazinator · on March 4, 2019

I do understand that in all cases when the tail part of the function is executed such that it's well-defined (the only cases we care about), the return value is zero, which means it's effectively a constant. Since we have to prepare that return value in %eax (dictated by the ABI), then at some point we clear %eax; that's not semantically the same as initializing ret. If we do this later in the function, like just before returning, we can use %eax for other purposes in the meanwhile.

tux3 · on March 4, 2019

>I do understand that in all cases when the tail part of the function is executed [...] then at some point we clear %eax;

>If we do this later in the function, like just before returning, we can use %eax for other purposes in the meanwhile.

Okay, I think we completely agree here!

>that's not semantically the same as initializing ret.

Hmm, so is it that you're asking why we're initializing ret? Well, that sounds like as good an excuse as any for another incredibly long and boring wall of text =]

If I'm extra lucky I'll get called out as the amateur I am and learn something in the process (!)

---

So, here's the thing. Talking about the compiler initializing ret as a variable, separate from its storage in eax, doesn't really describe what the compiler is trying to do, at least not as I understand it. We really aren't "initializing it" so much as trying to guess its value in all possible executions (because that's just what compilers do these days).

The way the compiler works is that it first gets rid of the idea of variables that can be assigned multiple times: it switches to something called SSA (static single assignment) form, where every "variable" is assigned exactly once. Want to reassign a variable? Just declare one with a new name instead and use it going forward.
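To make the renaming idea concrete, here's a rough sketch in Python of what it does to straight-line code (no branches). The `to_ssa` helper and the tuple format are made up for illustration; a real compiler does this on its own intermediate representation, not like this.

```python
# Toy SSA renaming for straight-line code (hypothetical helper, not a real pass).
def to_ssa(stmts):
    """stmts: list of (target, operands); operands are variable names or ints."""
    version = {}  # latest version number seen for each source variable
    out = []
    for target, operands in stmts:
        # Uses refer to the most recent version of each variable.
        # (Assumes every variable is assigned before it is used.)
        new_ops = [f"{op}{version[op]}" if isinstance(op, str) else op
                   for op in operands]
        # Every assignment gets a fresh name instead of overwriting the old one.
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}{version[target]}", new_ops))
    return out

# x = 1; x = x + 2; y = x + x   (operators omitted, only names matter here)
program = [("x", [1]), ("x", ["x", 2]), ("y", ["x", "x"])]
ssa = to_ssa(program)
```

After this, the second assignment to x becomes a brand new x2 that reads x1, and y reads x2, so no name is ever assigned twice.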

The funny part of SSA is that when there's a branch, after it joins back you end up having to define a variable that could have two possible values (branch taken, not taken). That's not something you should normally be able to do with the SSA rules, so it's represented by a special Phi "value" in the compiler's intermediate representation.

A Phi basically just says "either we came from path A and we have value Va, or from B and it's Vb".

But what the compiler really wants is to forget about the branch and the Phi business, and just deduce a plain value for ret so it can move on with cold hard numbers in mind.

Since we have undefined behavior here, we can simplify Phi<Undef, [0, 0]> into just [0, 0]. And that's how the compiler "forgets" that ret was uninitialized in the first place. It just optimizes the undefinedness away while trying to guess values, if you will.
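Here's a toy model of that folding step in Python, just to show the shape of the reasoning. `UNDEF` and `simplify_phi` are names I made up, and I'm using a plain 0 where the notation above has [0, 0]; actual compilers have their own rules and more cases than this.

```python
UNDEF = object()  # stand-in for "this path left the value uninitialized"

def simplify_phi(incoming):
    """incoming: one value per predecessor path into the join point."""
    # Ignore undefined inputs: the compiler may assume they hold any value.
    defined = [v for v in incoming if v is not UNDEF]
    if not defined:
        return UNDEF  # every path is undefined, nothing to deduce
    first = defined[0]
    if all(v == first for v in defined):
        return first  # all defined paths agree, so the Phi folds to that value
    return ("phi", list(incoming))  # paths disagree, the Phi has to stay

folded = simplify_phi([UNDEF, 0])   # one path never set ret, the other set 0
kept = simplify_phi([1, 0])         # two real, different values
```

In the first call the undefined input just disappears and we're left with 0; in the second there's nothing to exploit, so the Phi stays.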

So, long story short, we're not so much trying to initialize ret as we're trying to guess its value, and forget that there were multiple branches in the first place.