Everybody seems to agree it is an LLVM bug, but "UB" is overloaded. There is no C/C++ UB in the original program, but the optimizers can introduce "LLVM UB" at the IR level, which then yields something substantially equivalent to what the mock transformed sources shown for illustration would probably compile to. And introducing UB where the source had none is, by definition, a compiler bug.
On your remark about char*: it is a universal type alias, but I don't think it is a universal provenance alias. Sadly, I don't think such a thing even exists de facto, and it will not exist more formally in the PVNI-ae-ud model that is being cooked up. Probably unwise to lean, as usual, on the side of aggressive optimisations without even proving that the perf impact is all that interesting, or evaluating the security/quality and education impact, if you ask me. Even more problematic without even providing an escape hatch (1). But I know very well that this state of mind has won for now and I have to cope with it. Even more C programs will be retroactively declared completely incorrect (and not even just non-portable) because the compilers became too aggressive at some point and the standard simply standardized their aggressiveness instead of telling implementers to stop being crazy.
(1) beyond, for some potential programs that would otherwise be impossible to express in C/C++, exposing all the pointers. Exposing all the pointers would be cute, but the result would not be much different from not having that kind of "opti" at all. Plus you would have to actually expose all your target pointers, so it is not really source-compatible with unmodified existing codebases. So a per-access, or at least per-accessing-pointer, universal provenance alias is needed to get a serious escape hatch.
I'm also not extremely sure we can actually implement a memory allocator in C or C++ today (given whatever the modern compilers are "optimizing" with their partly informal provenance rules), nor that we will be able to under PVNI-ae-ud (broadly the same thing, slightly debugged). (Or maybe it will merely constrain the implementation? Not sure.)
> On your remark about char*, it is a universal type alias, but I don't think it is a universal provenance alias
C doesn't have a concept of provenance in this fashion; put another way, the compiler must assume the pointers may alias. This is why we have the restrict keyword.
The only cases where the compiler can assume pointers do not alias are when they point to different types (with char being the exception).
Naturally the compiler is allowed to prove that some pointers cannot alias and optimize based on that. But if it messes up, it's a compiler bug, pure and simple.
Neither does C++. IIRC the compilers invented the notion because tons of operations are forbidden between pointers to different objects, so by tracking provenance you can e.g. detect conditions that lead to UB and trim them away; because, as usual, why would the programmer ever write a bug (from the point of view of strict conformance to the standard)? So if it is not a bug, thanks to our perfect programmer who of course knows the standard by heart even better than the compiler authors apparently do, it must be that the condition is always false, the code path is dead, and so on.
Hilarity ensues when compiler authors themselves introduce bugs into compliant programs thanks to this line of thought, which they often take way too far.
So again: of course this is a compiler bug. But it is caused by an attempt to use provenance analysis for aliasing (which could indirectly be allowed, at least in a non-buggy form, because of some arcane C or C++ rules) that was not implemented correctly. Type-based aliasing is simpler because the rules lead to it slightly more directly.