> My claim is simple and narrow: compilers should internally model such values a...

foltik · 2025-12-31T05:04:27 1767157467

Sorry, my earlier comments were somewhat vague and assumed we were on the same page about a few things. Let me be concrete.

The snippet is, after lowering:

  if (x)
    return { a = 13, b = undef }
  else
    return { a = undef, b = 37 }

LLVM represents this as a phi over the two aggregates, which amounts to one phi per field:

  a = phi [13, then], [undef, else]
  b = phi [undef, then], [37, else]

undef doesn’t mean “unknown”; it means “pick any value you like, per use”. So InstCombine is allowed to instantiate each undef to whatever makes the expression simplest, and both phis fold to constants. This is the problem:

  a = 13
  b = 37

The branch is eliminated, but only because LLVM assumes that those undefs will take specific arbitrary values chosen for convenience (fewer instructions).
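
To make the folding rule concrete, here is a toy model in Python. It is only a sketch of the idea, not LLVM's actual code: `UNDEF` and `fold_phi` are names I made up, and the real pass has extra conditions (dominance, poison) that I'm ignoring. The rule modeled is "if every incoming value is either undef or the same defined value, replace the phi with that value".

```python
# Hypothetical stand-in for LLVM's undef; not a real LLVM API.
UNDEF = object()

def fold_phi(incoming):
    """Toy phi simplification.

    incoming: list of (value, predecessor_label) pairs.
    Returns the value the phi folds to, or None if it must stay a phi.
    """
    # Collect the distinct defined (non-undef) incoming values.
    defined = {value for value, _ in incoming if value is not UNDEF}
    if not defined:
        return UNDEF          # every input is undef, so the result is undef
    if len(defined) == 1:
        return defined.pop()  # each undef is "chosen" to equal the one defined value
    return None               # several distinct defined values: keep the phi

a = fold_phi([(13, "then"), (UNDEF, "else")])
b = fold_phi([(UNDEF, "then"), (37, "else")])
print(a, b)  # prints: 13 37
```

Note that nothing in `fold_phi` looks at the branch condition. Once both phis are constants, x no longer affects the result, which is why the branch disappears.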

Yes, the spec permits this. But at that point the program has already violated the language contract by executing undefined behavior. The read is accidental by definition: the program makes no claim about the value. Treating that absence of meaning as permission to invent specific values is a semantic choice, and precisely what I am criticizing. This “optimization” is not a win unless you willfully ignore what the program means and care about nothing but instruction count.

As for utility and justification: it’s all about user experience. A good language and compiler should preserve a clear mental model between what the programmer wrote and what runs. Silent non-local behavior changes (such as the one in the article) destroy that. Bugs should fail loudly and early, not be “optimized” away.

Imagine if the spec treated type mismatches the same way. Oops, assigned a float to an int, now it’s undef. Let’s just assume it’s always 42 since that lets us eliminate a branch. That’s obviously absurd, and this is the same category of mistake.