
From the article:

> Dereferencing a nullptr gives a segfault (which is not a security issue, except in older kernels).

I know a lot of people make that assumption, and compilers used to work that way pretty reliably, but I'm pretty confident it's not true. With undefined behavior, anything is possible.




Linux hit a related situation: a harmless null pointer dereference was treated by GCC as a signal that a subsequent null check could never fire, causing the check to be optimized away. https://lwn.net/Articles/575563/
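For anyone who hasn't seen it, the pattern looks roughly like this (an illustrative sketch, not the actual kernel code):

    struct tun_struct { int flags; };

    int get_flags(struct tun_struct *tun) {
        int flags = tun->flags;   // dereference happens first
        if (!tun)                 // compiler may assume tun is non-null here...
            return -1;            // ...and delete this branch as dead code
        return flags;
    }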


My opinion on that is that such code MUST NOT be optimized away; instead it should be a compile error.


You might wish for that, but the ship has sailed. Undefined behavior means that the implementation can do whatever it wants. That said, I do expect tools, both sanitizers and static analyzers, to improve and detect more of these kinds of cases.
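As a rough sketch of where the tooling is today (flag spelling from memory, so treat it as approximate), UBSan will already flag the simple case at runtime:

    // build with something like: clang++ -g -fsanitize=null,undefined deref.cpp
    int read_through(int *p) {
        return *p;   // UBSan reports a "load of null pointer" style error here
    }

    int main() {
        return read_through(nullptr);
    }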


The original intention of standardization was that compilers would gradually reach consensus on what the behaviour in certain cases should be, and once that happened the standard would be updated to standardize that behaviour. Compilers are allowed - indeed encouraged - to provide additional guarantees beyond the minimum defined in the standard (indeed, part of the point of UB is that a compiler is allowed to specify what the behaviour is in that case).


Well, not exactly. There are things that are UB according to the standard but that particular compilers give an option to make defined: see `-fwrapv`, for example.
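Concretely, something like this is UB per the standard, but GCC and Clang make it defined two's-complement wrapping under -fwrapv:

    #include <limits>

    int increment(int x) {
        return x + 1;   // UB if x == INT_MAX per the standard;
                        // with -fwrapv it is defined to wrap to INT_MIN
    }

    // increment(std::numeric_limits<int>::max()) wraps to INT_MIN under -fwrapv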


There have been static analyzers that will detect this for years. They report "check for null after use" or some such.


The problem, as far as I understand it (though I’m a layman), is that by the time the dead code optimization pass runs, the code has been transformed so much that there’s no obvious way for the compiler to tell the difference between “obvious programmer-intended null check that we shouldn’t optimize out” and “spurious dead code introduced by macro expansion” or (in C++) “by template instantiation”.
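A made-up example of the kind of "spurious" check the optimizer is supposed to remove: a defensive macro expanded right after the pointer has already been dereferenced:

    struct node { int value; };

    #define VALUE_OR_ZERO(p) ((p) ? (p)->value : 0)   // defensive macro

    int use(node *n) {
        int v = n->value;             // n already dereferenced here
        return v + VALUE_OR_ZERO(n);  // after expansion, the (p) check is
    }                                 // dead code the compiler should drop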


Couldn't user-defined branches be tagged by such a compiler, so that if a tagged branch is eliminated an error is generated with a reference to the tagged line in question?


That is a good idea and I’ll admit that I’m not sure why it isn’t implemented.


Why should it be a compile error? The pointer may be null, but is not guaranteed to be.

If you mean that C++ should require a null check before dereferencing any pointer that is not guaranteed to be non-null, then that would break most existing C++ code out there, so it's a non-starter.


In the particular situation they're talking about, you have a pointer to a struct, which you dereference by accessing one of its fields. The null check happens after the dereference, which is almost certainly a mistake.


Absolutely. In my experience, if clang can deduce that a function will definitely trigger UB, such as definitely dereferencing a null pointer, it generally optimizes the entire function after the dereference into a single ud2 instruction (which raises the #UD exception on the CPU).
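A hedged sketch of what that looks like (exact codegen obviously depends on the compiler version and flags):

    int always_ub() {
        int *p = nullptr;
        return *p;   // provable null dereference: clang at -O2 commonly
    }                // compiles the whole body down to a single ud2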

This is something really hardwired into the C and C++ languages. Even if the underlying operating system perfectly supports dereferencing null pointers, compilers will always treat it as undefined behavior. (On Linux, root can mmap a page of memory at address 0, and certain linker options can cause the linker to place the text section starting at address 0 as well.)
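Sketch of the Linux case, for the curious (needs root and vm.mmap_min_addr set to 0, so very much a "don't do this in production" example):

    #include <sys/mman.h>
    #include <cstdio>

    int main() {
        void *p = mmap((void *)0, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        // The page at address 0 now exists and is readable/writable,
        // but the compiler still treats *(int *)0 as undefined behavior.
        return 0;
    }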


The irony is that it's mostly unsafe when you do test for null, in a position where the compiler can omit the test; if there's no evidence the pointer can be null, you just get a normal memory access. The optimizer is not optimized for the most intuitive behavior.


The null checks are only optimized away if you've already dereferenced the pointer before the null check within a scope. The optimizer's rationale is: you've already dereferenced it, so it must not be null, therefore the null check is unnecessary.

Also, you can "safely" dereference nullptr, just so long as you don't attempt to actually access the memory. C++ references are nothing more than a fancy pointer with syntactic sugar.

For example:

    int* foo = nullptr;
    int& bar = *foo;               // no blow up
    std::cout << bar << std::endl; // blowup here

My personal $0.02 is that the C++ standard falls short with language like "undefined/unspecified behavior, no diagnostic required." A lot of problems could be prevented if diagnostics (read: warnings) were required, assuming devs pay attention to the warnings, which doesn't always happen. For example, Google ProtoBuf has chosen, at its own and its clients' peril, to ignore potential over/underflow errors and vulnerabilities by ignoring signed/unsigned comparison warnings.
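For what it's worth, the warning being ignored there (-Wsign-compare) flags exactly the kind of underflow bug I mean:

    #include <vector>

    int sum_pairs(const std::vector<int>& v) {
        int total = 0;
        // -Wsign-compare fires on this comparison; if v is empty,
        // v.size() - 1 underflows to a huge unsigned value and the
        // loop walks past the end of the vector.
        for (int i = 0; i < v.size() - 1; ++i)
            total += v[i] + v[i + 1];
        return total;
    }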


Dereferencing a null pointer to convert it to a reference causes undefined behavior; there's nothing safe about it!

"Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior."


UB isn't "safe", so I'm unsure what your comment is getting at.


I guess the point I was trying to make is that what is referred to colloquially as dereferencing is different from how the compiler sees it. We see "*foo" and we know that to be UB, but the compiler doesn't really see it until the load. Until it's actually used, it's effectively a dead store and will be eliminated anyway.

    int& bar = *foo;
Doesn't actually dereference foo. No load is issued from the address stored in foo. Until you either load or store through bar, no null dereference has occurred.

Further, if bar is never used, no actual dereference has occurred. In fact, no assembly instructions will be emitted for the above statement, because it is pure syntactic sugar. Pointers and references in C++ are the same, except with different syntax and the secret club handshake that references are presumed to never be null (but there are ways they can become null, and thus the UB).

Edit: formatting, at least attempted


The problem is that we don't know what the compiler might think...

If I write something along the lines of

  int& bar = *foo;

  if(!foo) {
    // do something
  }

The compiler very well might (and would be perfectly within its rights to) completely eliminate everything inside the if(!foo) block, since it can assume the pointer is non-null because it has already been dereferenced.


This is very definitely false. That is totally UB, launch-the-missiles stuff. Check your references before you repeat this silliness.


Definitely not true. Consider an IoT device without an MMU.


Most of the ones I am familiar with had 0 as a non-writable address, so you'd still crash. [Edit: Though that's probably hardware specific, and the hardware was usually custom.] It might be called "bus error" or some such instead of "segfault", but it was pretty much the same behavior.


Plenty of microcontrollers have a vector table at address 0. Best place to start injecting code.


Sure. The 68000 series did. But address 0 held the initial stack pointer, and address 4 held the initial program counter. Those two were usually mapped to ROM, because they had to have the right values even on cold boot. But that also meant they weren't writable. So if you had a null pointer, you could read through it, but an attempt to write through it would give you a bus error.
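For illustration, reading through the bottom of the address space on a part laid out like that would look something like this in C++ (still formally UB as far as the standard is concerned, which is the whole point of this thread):

    #include <cstdint>

    void peek_reset_vectors() {
        std::uintptr_t base = 0x0;   // hypothetical 68000-style memory map
        auto *vectors = reinterpret_cast<volatile std::uint32_t *>(base);
        std::uint32_t sp = vectors[0];   // initial stack pointer, mapped to ROM
        std::uint32_t pc = vectors[1];   // initial program counter, mapped to ROM
        (void)sp; (void)pc;
        // vectors[0] = 0;               // a write here would raise a bus error
    }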



