
"Strictly speaking, the C standards prohibit using a pointer to an object after its lifetime ends. It should neither be read nor dereferenced. In this context, it is a bug in the application. However, this idiom is commonly used by developers to prevent making redundant copies."

I thought that hack was dead and buried. It won't work in debug modes where buffers are zeroed in "free".

Fear of copying time is usually misplaced. The PDP-11 is gone. Unless you're copying megabytes, the copy time of recently accessed in-cache data is very small.
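To put a rough number on that claim, here is a minimal micro-benchmark sketch (the buffer size, repetition count, and the GCC/Clang-style empty asm barrier are my own choices, not anything from the thread); measure on your own target before drawing conclusions:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    int main(void) {
        enum { N = 4096, REPS = 1000000 };
        char *src = malloc(N), *dst = malloc(N);
        if (!src || !dst) return 1;
        memset(src, 0xAB, N);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < REPS; i++) {
            memcpy(dst, src, N);
            __asm__ volatile("" ::: "memory");  /* keep the copy from being optimized away */
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("~%.1f ns per 4 KiB in-cache copy\n", ns / REPS);
        free(src); free(dst);
        return 0;
    }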



This hack of using freed memory is not what (at least as far as I can tell) the issue was about: the code calls realloc and then compares the old pointer value to the new pointer value to avoid recalculating some offsets and somehow gcc decides that this is undefined behavior and refuses to equate them; click the "Autogen" link in the article to go to the issue where they are discussing what happened... like, it is "a bug in the application", but it only seems to be a bug in the application if you have extremely detailed knowledge of undefined pointer behavior, not merely because someone was making egregious implementation assumptions about the allocator.
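For concreteness, a minimal sketch of the pattern being described (my reconstruction, not the project's actual code):

    #include <stdlib.h>

    /* Grow a buffer; if realloc kept the same address, skip rebasing an
       interior cursor that was derived from the old pointer. */
    char *grow(char *buf, size_t new_size, char **cursor) {
        char *newbuf = realloc(buf, new_size);
        if (newbuf == NULL)
            return NULL;
        if (newbuf != buf) {
            /* block moved: rebase the cursor against the new block */
            *cursor = newbuf + (*cursor - buf);
        }
        /* else: keep *cursor as-is.  But *cursor was derived from buf, whose
           lifetime ended at the realloc call, so both the comparison above and
           the continued use of the old derived pointer are UB in ISO C; GCC may
           treat buf as never equal to newbuf and fold this logic away. */
        return newbuf;
    }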


I'm reminded of https://www.foonathan.net/2022/08/malloc-interface/#problem-.... While initially I was skeptical of the author's criticism of C's allocation APIs, this bug seems to be one that could've been avoided by using the proposed try_expand() instead of realloc().
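Roughly, the idea is an allocator call that tries to grow a block in place and reports failure instead of moving it, so there is never a stale pointer to compare against. A sketch of how user code could look (the name try_expand and its signature are my paraphrase of the linked proposal, and the stub below always fails; none of this is a real libc API):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>

    /* Stub for illustration: a real implementation would ask the allocator
       whether the block behind ptr can grow to new_size without moving. */
    static bool try_expand(void *ptr, size_t new_size) {
        (void)ptr; (void)new_size;
        return false;
    }

    void *grow_or_move(void *ptr, size_t old_size, size_t new_size) {
        if (try_expand(ptr, new_size))
            return ptr;                  /* same block, nothing to rebase */
        void *fresh = malloc(new_size);  /* otherwise move explicitly */
        if (fresh == NULL)
            return NULL;
        memcpy(fresh, ptr, old_size);
        free(ptr);
        return fresh;
    }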

Of course ideally we'd move away from provenance-based UB and Rust's third-class aliased mutability, to simpler conservative aliasing semantics by default, and allow programmers to opt into non-general optimizations by manually saving unchanging values like vector::size() into locals, or have the optimizer and programmer interactively explore code invariants and optimizations, making the optimizer a performance-focused pair programmer rather than a black box.


What a stupid "smart" compiler.


I think it's the kind of optimization that compiler writers insist on but developers absolutely do not want. When the compiler detects UB, any code that depends on it can be 'safely' deleted.

It's the sort of arrogant attitude that's behind driving people away from C/C++


What is driving people away from C and C++ are the CVEs that have stayed the same for the last 40 years, despite everyone repeating that one only needs to be a good enough developer and use the tools to avoid them.

The abuse of UB is how those 1980s C and C++ compilers, which generated lousy code easily outperformed by Assembly coders on 8 and 16 bit home computers, finally improved their code generation quality despite the lack of strong type information available to the compiler.

So here we are, trying to escape bad decisions from the past.


There are two target groups for a C/C++ compiler:

1) embedded or OS hackers: they want code generation to be predictable and dislike surprising optimizations. To them, undefined behavior is behavior outside the standard specific to certain compilers that they expect to not suddenly change.

2) application and especially HPC developers. They want the compiler to exploit every trick in the literature to improve performance. In return, they are aware that undefined behavior cannot be relied on.

Any compiler that is popular with both crowds would have to strike a compromise.


This is a misunderstanding of how compilers work. Compilers don't actively go out of their way to erase code that they think should be skipped; rather, as a consequence of multiple optimizations, code may end up being compiled differently than intended.

  int Foo(Bar *bar) {
    if (bar == nullptr) {
      println("Bar is null!");
    }
    return bar->ComputeFoo();
  }

E.g., one pass may say "the println code is unreachable unless a null pointer is dereferenced", and the next may say "the only code that is reachable is `return bar->ComputeFoo()`", so just compile the function as that.

Imagine that instead of a println the code is actually some large body of code, and that the function is inlined into another where bar is known not to be null. In that case you'd want the compiler to avoid compiling that code, but it can't without those passes.
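A small sketch (my example, not the commenter's) of how those passes compose after inlining:

    #include <stdio.h>

    struct bar { int value; };

    static int compute_foo(struct bar *b) {
        if (b == NULL) {
            /* imagine a large diagnostic/error-handling block here */
            printf("bar is null!\n");
        }
        return b->value;  /* unconditional dereference */
    }

    int caller(void) {
        struct bar b = { 42 };
        /* After inlining, the compiler sees &b is never NULL, so the whole
           diagnostic branch is provably dead and is not compiled at all --
           exactly the outcome you want here.  The same reachability reasoning,
           driven by the unconditional dereference, is what deletes the check
           even when the pointer is unknown. */
        return compute_foo(&b);
    }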


Compilers for languages like Ada or Eiffel will usually disregard optimization algorithms that might go into "this might crash your airplane" kind of territory, whereas in C and C++ land whatever, as long as we get a few more μs, who cares.


Indeed, the mindset is the key issue. In JVM land, where I learned most of my compiler ideas, the assumption is "optimization is not observable". JVMs move heaven and earth to make it appear as if they were simple interpreters just running the bytecodes one by one. The C/C++ world is the only one I can think of where this is manifestly not the case: optimizations are routinely observable and an endless source of confusion and frustration for developers. In the presence of even a single bug in the application, suddenly UB results in the veritable gates of hell opening and the entire lower world of the guts of abstractions comes spilling out. Woe to those who would try to debug at this level, for they fight demons of every form.


It's absolutely an optimisation that people trying to achieve maximum performance from C/C++ expect. It's impossible to have the same degree of performance without such optimisations, hence why the new language Zig has even more undefined behaviour than C.


It seems that the semantics of undefined behavior are elusive to many people in the industry, so allow me to clarify some misconceptions.

1. Zig does not have more undefined behavior than C. Much like there are two kinds of complexity, accidental and essential, C has two kinds of undefined behavior: accidental and essential. Accidental UB is stupid shit like "if your file doesn't end with a newline, UB occurs". Essential UB is things like: if the memory of a local variable is changed through /proc/mem by another process while a function is evaluated, UB occurs. Essential UB allows basic, essential optimisations to take place that everyone expects every language & compiler to be able to perform (see the sketch after this list). C has a large amount of accidental UB; Zig has none.

2. A Zig application decides what to do when a safety check triggers by overriding the panic handler. Zig's default panic handler crashes with a helpful stack trace. This is a killer feature.

3. I see a lot of people talking ignorantly about safety-critical applications. Let's talk about Level A clearance: software that is certified to run on airplanes and other safety-critical components in the United States. Here's how it works: you have to test every error condition and every branch at the machine-code layer. This makes a simpler language such as C or Zig better suited than a language with hidden control flow such as C++ or Rust, because hidden control flow makes it harder to test every branch at the machine-code level. Furthermore, such components are redundant, so that when one fails, the readings of the others are used. So crashing or otherwise indicating a faulty reading is absolutely what you want safety-critical software to do, as opposed to giving a well-defined but incorrect reading due to, for example, an integer overflow, which can happen in "safe" Rust.
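As an illustration of point 1, here is a minimal sketch (my example) of an ordinary optimization that only the "essential" kind of UB licenses:

    #include <stddef.h>

    long sum_below(const long *data, size_t n, long limit) {
        long total = 0;
        for (size_t i = 0; i < n; i++) {
            /* The compiler may keep `limit` (and `n`, `data`) in registers for
               the whole loop.  That is only sound because modification of these
               locals from outside the abstract machine -- say, another process
               writing through /proc/<pid>/mem -- is undefined behavior. */
            if (data[i] < limit)
                total += data[i];
        }
        return total;
    }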


From the Zig language reference: "Zig has many instances of undefined behavior. If undefined behavior is detected at compile-time, Zig emits a compile error and refuses to continue. Most undefined behavior that cannot be detected at compile-time can be detected at runtime. In these cases, Zig has safety checks. [...] When a safety check fails, Zig crashes with a stack trace."


> Zig crashes with a stack trace

I hope the people behind Zig understand that in critical applications that's totally unacceptable.


What exactly is the language supposed to do then? Let's say that you caused an integer overflow, or tried to perform an out-of-bounds access in a dynamic array. What's the program supposed to do, if not crash?

If your point is that some software shouldn't crash, then yes, for sure. But that's on you to not make programming errors in your code.

In fact, Zig helps you create software that doesn't crash in a way few other programming languages do, for example by not having language features that rely on implicit memory allocations. This gives you the opportunity to always have a fallback strategy when a memory allocation fails.
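A sketch in C (the thread's lingua franca; the function name is illustrative and this is obviously not Zig's allocator interface) of that "explicit allocation, explicit fallback" style:

    #include <stddef.h>
    #include <stdlib.h>

    /* Try the heap first; fall back to a caller-provided scratch buffer so the
       operation can still complete (perhaps more slowly) when malloc fails. */
    void *get_workspace(size_t size, void *scratch, size_t scratch_size, int *from_heap) {
        void *p = malloc(size);
        if (p != NULL) {
            *from_heap = 1;
            return p;
        }
        *from_heap = 0;
        return size <= scratch_size ? scratch : NULL;
    }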


You can use memcmp to check whether the pointer has not changed.

   void *oldptr = malloc(73);
   void *newptr = realloc(oldptr, 42);

   if (memcmp(&oldptr, &newptr, sizeof oldptr) == 0) {
     // not changed
   }
The value of oldptr is indeterminate if it is used as a pointer; it can still be accessed as an array of bytes.


Is a cast to uintptr_t also OK?

  void *oldptr = malloc(73);
  uintptr_t savedptr = (uintptr_t)oldptr;
  void *newptr = realloc(oldptr, 42);
  if (savedptr == (uintptr_t)newptr) {
    // not changed
  }


That is OK, at least in the sense that there's no undefined behaviour.

I'm not sure it's also OK, because the memcmp suggestion you are replying to seems a bit suspect.

There's still the problem that comparing the uintptr_t's is not guaranteed to yield the same result as the pointers they were cast from. But that's merely implementation-specific behaviour, not undefined.


I think it's fine on the assumption that a uintptr_t converted from a pointer cannot be a trap representation, which is the case on mainstream platforms.

(An unsigned type can only have trap representations if it has padding bits. Every combination of the value bits is a valid value according to the pure binary encoding.)


AFAIU the problem was that the application in question kept using oldptr, e.g.:

   int *oldptr = (int *)malloc(42 * sizeof(int));
   int *newptr = (int *)realloc(oldptr, 73 * sizeof(int));

   if (oldptr == newptr) {
     newptr[70] = 10; // ok: the "object" newptr points to can hold 73 ints
     oldptr[70] = 10; // UB (e.g. SIGABRT under a checking allocator): the "object" oldptr could only contain 42 ints, and its lifetime ended at the realloc call even though the underlying memory block now holds 73
   }


Wrong. You need to save away the old ptr and assign to realloc's 1st arg. memcmp is also wrong; compare the values, so that the compiler knows about it.

    void *ptr = malloc(73);

    void *old = ptr;
    void *ptr = realloc(ptr, 42);
    bool realloced = old != ptr;


I don't quite see what you're getting at, but I do know you can't declare ptr twice in the same scope in C; did you miss a curly brace somewhere to open a new scope, or else want to use different names?

Not comparing the values is the point. Your code uses the pointer that was passed to realloc, and that is undefined behavior according to ISO C.

The game you're playing with the identifiers is pointless; there is no difference between what you're trying to do and just:

   void *oldptr = malloc(73);
   void *newptr = realloc(oldptr, 42);
   bool realloced = oldptr != newptr; // undefined behavior
That's what the article is referring to, and what I'm specifically addressing with the memcmp. Accessing the pointer as an array of bytes doesn't use its value as a pointer. Bytes cannot be indeterminate; they are not allowed to have trap representations.

(There could be a false negative: the address didn't change, but the pointer bit pattern did. On 64 bit systems, the C library could easily put a tag into the upper bits of the pointer, and have realloc change the tag even if the address is the same.)

The problem described in the article is that the compiler generated a false negative even when the pointer didn't change, due to the undefined behavior. The idea is something like that since oldptr was passed to realloc, it is garbage. The newptr is good, and we need not compare garbage to non-garbage; we can just declare them to be unequal.

That's what you might get if you follow your advice of "compare the values, so that the compiler knows about it".



