
>No it isn't; it costs an extra instruction to initialize that register or memory location to zero.

If you're talking about the "xor eax, eax" that was emitted, then absolutely not. GCC is doing the most efficient thing possible with eax here.

That xor is not inserted because GCC kindly initializes the variable for you to save you from your mistakes, but because it has to initialize eax before returning (look, there's a function call before in the code example). And after deciding the return value to be 0 it knows "xor eax, eax" is always going to be faster than actually loading the real value of "ret". (In fact it's a zero-idiom, even a NOP instruction is slower than that!)

What I was really trying to talk about is the SSA semantics and not the actual emitted code. Now in fairness I'm not familiar with GCC's IR, but I'm going to blithely assume what GCC does with uninitialized variables somewhat resembles LLVM's undef semantics.

When I say that picking 0 is a good optimization, I mean that at an SSA level, the compiler is going to have a Phi between Undef and the range [0, 0]. If by "leaving those variables alone" you mean that the compiler should have returned Undef as the result of that Phi, that would be a horrible miscompilation! And of course picking something other than 0 would just be silly.
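To make the Phi argument concrete, here is a toy sketch of how a compiler might fold such a Phi. It is purely illustrative: `Value`, `ValueKind`, and `fold_phi` are hypothetical names, not GCC's or LLVM's actual data structures, and real passes track far more than this.

```c
/* Toy value lattice (hypothetical, for illustration only). */
typedef enum { V_UNDEF, V_CONST, V_VARYING } ValueKind;

typedef struct {
    ValueKind kind;
    int constant;   /* meaningful only when kind == V_CONST */
} Value;

/* Fold a two-input Phi. An Undef input may be treated as any value,
 * so the Phi can simply take the other input. */
static Value fold_phi(Value a, Value b)
{
    if (a.kind == V_UNDEF)
        return b;
    if (b.kind == V_UNDEF)
        return a;
    if (a.kind == V_CONST && b.kind == V_CONST && a.constant == b.constant)
        return a;

    /* Inputs disagree: the value is only known at run time. */
    Value varying = { V_VARYING, 0 };
    return varying;
}
```

Under this model, folding Undef with the constant 0 yields the constant 0, which is the choice described above; returning Undef instead would lose the value on the path where the variable really was assigned.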

>It's not correct for diagnostics to pertain to versions of the program that were logically altered by the compiler, rather than to the original program.

Agreed. As a user, I am not happy about this lack of diagnostics and I'm happy to call that a bug if you want. And furthermore, I don't think performing the optimizations above is fundamentally incompatible with good diagnostics. Probably just hard to implement for GCC.

Disclaimer: I am not actually a compiler engineer and this probably contains mistakes.




> GCC is doing the most efficient thing possible with eax here.

It seems that the most efficient thing with %eax is not to mention it in an instruction.

There may be some quirk/feature of modern Intel processors that clearing a register will dis-entangle it from considerations of prior hazards. So that is to say, when we execute 'xorl %eax, %eax', the processor knows that any prior value in `%eax` is no longer required by subsequent code. A new lifetime has begun for that register. Therefore, there is no need to execute a pipeline stall to wait for that prior value, if it so happens that that prior value is not yet ready. Internally to the processor, the ISA-level register `%eax` can be assigned to a different internal register at that point, under the discipline of "register renaming".

In the absence of any such hardware considerations, the clearing of %eax is pure waste.

> it has to initialize eax before returning

Where is that requirement coming from? Not from ISO C, certainly. The behavior is undefined. It was the programmer's job to ensure that the return value is calculated by a well-defined expression, not that of the compiler. From that standpoint alone, it's perfectly conforming to emit code that just leaves the existing content in %eax and returns.


> There may be some quirk/feature of modern Intel processors

Not so much a quirk as the basic behaviour of any OoO processor, due to register renaming. This is true on the vast majority of Intel (and AMD) CPUs (in fact probably all the currently sold ones, as even Atom has acquired a limited OoO engine and the Knights variants are no more).

It is so fundamental that, as described elsethread, zeroing a register is almost free (it still costs decode bandwidth and icache size).

Bottom line: in the vast majority of cases, zeroing the register is a win.

edit: having said that, I doubt that gcc is trying to optimize the UB case; probably the optimizer just requires that the variable has a value at that point.


>It seems that the most efficient thing with %eax is not to mention it in an instruction.

I think what you're missing is that UB is a property of the execution of the program, and not just of the code that was compiled.

If in practice the if branch is always taken, then there is NO undefined behavior (surprisingly, perhaps)!

That means GCC has to be prepared to handle that case, and has to set eax to 0 when the if branch is taken.
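A minimal sketch of that point, assuming a function shaped roughly like the one under discussion (the original example is not reproduced here, so `maybe_zero`, `flag`, and `demo` are made-up names). Only a call that skips the assignment reads an indeterminate value, so a program that always passes a nonzero flag never executes UB:

```c
#include <assert.h>

/* Hypothetical stand-in for the function being discussed. */
int maybe_zero(int flag)
{
    int ret;          /* deliberately left uninitialized */
    if (flag)
        ret = 0;      /* the only path that gives ret a value */
    return ret;       /* well-defined only if the branch above ran */
}

/* Only ever takes the initialized path, so no execution here has UB. */
void demo(void)
{
    assert(maybe_zero(1) == 0);
}
```

Calling `maybe_zero(0)` would be the undefined execution. The compiler still has to emit code that returns 0 for every other call, which is why it cannot simply leave the return register untouched.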

>Where is that requirement coming from? Not from ISO C, certainly. The behavior is undefined.

Not so! You don't know until the program is run.

-

Edit: (removed some speculation about assuming the branch is always taken, that was not relevant)

>There may be some quirk/feature of modern Intel processors that clearing a register will dis-entangle it from considerations of prior hazards. So that is to say, when we execute 'xorl %eax, %eax', the processor knows that any prior value in `%eax` is no longer required by subsequent code. A new lifetime has begun for that register.

That's absolutely right, by the way. I think this was introduced with Sandy Bridge, in 2011.


No, I mean the behavior is undefined in the case where the function returns the indeterminate value, which gets initialized anyway.

Look, GCC (what I have here: 7.3.0) is doing this even for the following trivial function:

    int undef(void)
    {
      int ret;
      return ret;
    }

With -O2 this turns out:

    xorl %eax, %eax
    ret

do we still know until run-time that this has UB?


>Look, GCC (what I have here: 7.3.0) is doing this even for the following trivial function:

You're absolutely right that in this case, it's wasteful. In fact, if you try Clang you will see that it only emits a ret.

Maybe GCC is doing you a courtesy (or maybe nobody taught it to completely omit an Undef return value, so it just picks zero!). Either way, it's unconditionally UB, so GCC can do whatever it likes.

>do we still know until run-time that this has UB?

Well, there is no branch here, is there? I'm clearly not going to disagree with you on this one =]

To be clear: UB is still a property of the execution of the program, but here it's not hard to statically determine that all executions of this program are Undefined Behavior.


Moreover, it still does it with "-m32 -mtune=i386". :)



