
From the article:

> Dereferencing a nullptr gives a segfault (which is not a security issue, except in older kernels).

I know a lot of people make that assumption, and compilers used to work that way pretty reliably, but I'm pretty confident it's not true. With undefined behavior, anything is possible.




Linux hit a related situation: a harmless null pointer dereference was treated by GCC as a signal that a subsequent null check could never fire, causing the check to be optimized away. https://lwn.net/Articles/575563/
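For anyone who hasn't seen it, the pattern looks roughly like this (an illustrative sketch, not the actual kernel code):

    struct tun_struct { int flags; };

    int get_flags(struct tun_struct *tun) {
        int flags = tun->flags;   // dereference happens first
        if (!tun)                 // compiler may assume tun is non-null here...
            return -1;            // ...and delete this branch as dead code
        return flags;
    }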


My opinion on that is that such code MUST NOT be optimized away; instead it should be a compile error.


You might wish for that, but the ship has sailed. Undefined behavior means that the implementation can do whatever it wants. That said, I do expect tools, both sanitizers and static analyzers, to improve and detect more of these kinds of cases.
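As a rough sketch of where the tooling is today (flag spelling from memory, so treat it as approximate), UBSan will already flag the simple case at runtime:

    // build with something like: clang++ -g -fsanitize=null,undefined deref.cpp
    int read_through(int *p) {
        return *p;   // UBSan reports a "load of null pointer" style error here
    }

    int main() {
        return read_through(nullptr);
    }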


The original intention of standardization was that compilers would gradually reach consensus on what the behaviour in certain cases should be, and once that happened the standard would be updated to standardize that behaviour. Compilers are allowed - indeed encouraged - to provide additional guarantees beyond the minimum defined in the standard (indeed, part of the point of UB is that a compiler is allowed to specify what the behaviour is in that case).


Well, not exactly. There are things that are UB according to the standard but that particular compilers give an option to make defined: see `-fwrapv`, for example.
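Concretely, something like this is UB per the standard, but GCC and Clang make it defined two's-complement wrapping under -fwrapv:

    #include <limits>

    int increment(int x) {
        return x + 1;   // UB if x == INT_MAX per the standard;
                        // with -fwrapv it is defined to wrap to INT_MIN
    }

    // increment(std::numeric_limits<int>::max()) wraps to INT_MIN under -fwrapv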


There have been static analyzers that will detect this for years. They report "check for null after use" or some such.


The problem, as far as I understand it (though I’m a layman), is that by the time the dead code optimization pass runs, the code has been transformed so much that there’s no obvious way for the compiler to tell the difference between “obvious programmer-intended null check that we shouldn’t optimize out” and “spurious dead code introduced by macro expansion” or (in C++) “by template instantiation”.
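A made-up example of the kind of "spurious" check the optimizer is supposed to remove: a defensive macro expanded right after the pointer has already been dereferenced:

    struct node { int value; };

    #define VALUE_OR_ZERO(p) ((p) ? (p)->value : 0)   // defensive macro

    int use(node *n) {
        int v = n->value;             // n already dereferenced here
        return v + VALUE_OR_ZERO(n);  // after expansion, the (p) check is
    }                                 // dead code the compiler should drop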


Couldn't user-defined branches be tagged by such a compiler, so that if a tagged branch is eliminated an error is generated with a reference to the tagged line in question?


That is a good idea and I’ll admit that I’m not sure why it isn’t implemented.


Why should it be a compile error? The pointer may be null, but is not guaranteed to be.

If you mean that C++ should require a null check before dereferencing any pointer that is not guaranteed to be non-null, then that would break most existing C++ code out there, so it's a non-starter.


In the particular situation they're talking about, you have a pointer to a struct, which you dereference by accessing one of its fields. The null check happens after the dereference, which is almost certainly a mistake.


Absolutely. In my experience, if clang can deduce that a function will definitely trigger UB, such as definitely dereferencing a null pointer, it generally optimizes the entire function after the dereference into a single ud2 instruction (which raises the #UD exception on the CPU).
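A hedged sketch of what that looks like (exact codegen obviously depends on the compiler version and flags):

    int always_ub() {
        int *p = nullptr;
        return *p;   // provable null dereference: clang at -O2 commonly
    }                // compiles the whole body down to a single ud2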

This is something really hardwired into the C and C++ languages. Even if the underlying operating system perfectly supports dereferencing null pointers, compilers will always treat it as undefined behavior. (On Linux, root can mmap a page of memory at address 0, and certain linker options can cause the linker to place the text section starting at address 0 as well.)
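Sketch of the Linux case, for the curious (needs root and vm.mmap_min_addr set to 0, so very much a "don't do this in production" example):

    #include <sys/mman.h>
    #include <cstdio>

    int main() {
        void *p = mmap((void *)0, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        // The page at address 0 now exists and is readable/writable,
        // but the compiler still treats *(int *)0 as undefined behavior.
        return 0;
    }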


The irony is that it's mostly unsafe when you do test for null, in a position where the compiler can omit the test; if there's no evidence the pointer can be null, you just get a normal memory access. The optimizer is not optimized for the most intuitive behavior.


The null checks are only optimized away if you've already dereferenced the pointer before the null check within a scope. The optimizer's rationale is: you've already dereferenced it, so it must not be null, therefore the null check is unnecessary.

Also, you can "safely" dereference nullptr, just so long as you don't attempt to actually access the memory. C++ references are nothing more than a fancy pointer with syntactic sugar.

For example:

    int* foo = nullptr;
    int& bar = *foo;               // no blow up
    std::cout << bar << std::endl; // blowup here

My personal $0.02 is that the C++ standard falls short with language like "undefined/unspecified behavior, no diagnostic required." A lot of problems could be prevented if diagnostics (read: warnings) were required, assuming devs pay attention to the warnings, which doesn't always happen. For example, Google ProtoBuf has chosen, at its own and its clients' peril, to ignore potential over/underflow errors and vulnerabilities by ignoring signed/unsigned comparison warnings.
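For what it's worth, the warning being ignored there (-Wsign-compare) flags exactly the kind of underflow bug I mean:

    #include <vector>

    int sum_pairs(const std::vector<int>& v) {
        int total = 0;
        // -Wsign-compare fires on this comparison; if v is empty,
        // v.size() - 1 underflows to a huge unsigned value and the
        // loop walks past the end of the vector.
        for (int i = 0; i < v.size() - 1; ++i)
            total += v[i] + v[i + 1];
        return total;
    }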


Dereferencing a null pointer to convert it to a reference causes undefined behavior; there's nothing safe about it!

"Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior."


UB isn't "safe", so I'm unsure what your comment is getting at.


I guess the point I was trying to make is that what is referred to colloquially as dereferencing is different from how the compiler sees it. We see "*foo" and we know that to be UB, but the compiler doesn't really see it until the load. Until it's actually used, it's effectively a dead store and will be eliminated anyway.

    int& bar = *foo;
Doesn't actually dereference foo. No load is issued from the address stored in foo. Until you either load or store through bar, no null dereference has occurred.

Further, if bar is never used, no actual dereference has occurred. In fact, no assembly instructions will be emitted for the above statement, because it is pure syntactic sugar. Pointers and references in C++ are the same, except with different syntax and the secret club handshake that references are presumed to never be null (but there are ways they can become null, and thus the UB).

Edit: formatting, at least attempted


The problem is that we don't know what the compiler might think...

If I write something along the lines of

  int& bar = *foo;

  if(!foo) {
    // do something
  }

The compiler very well might (and would be perfectly within its rights to) completely eliminate everything inside the if(!foo) block, since it can assume the pointer is non-null because it has already been dereferenced.


This is very definitely false. That is totally UB, launch-the-missiles stuff. Check your references before you repeat this silliness.


Definitely not true. Consider an IoT device without an MMU.


Most of the ones I am familiar with had 0 as a non-writable address, so you'd still crash. [Edit: Though that's probably hardware specific, and the hardware was usually custom.] It might be called "bus error" or some such instead of "segfault", but it was pretty much the same behavior.


Plenty of microcontrollers have a vector table at address 0. Best place to start injecting code.


Sure. The 68000 series did. But address 0 held the initial stack pointer, and address 4 held the initial program counter. Those two were usually mapped to ROM, because they had to have the right values even on cold boot. But that also meant they weren't writable. So if you had a null pointer, you could read through it, but an attempt to write through it would give you a bus error.
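For illustration, reading through the bottom of the address space on a part laid out like that would look something like this in C++ (still formally UB as far as the standard is concerned, which is the whole point of this thread):

    #include <cstdint>

    void peek_reset_vectors() {
        std::uintptr_t base = 0x0;   // hypothetical 68000-style memory map
        auto *vectors = reinterpret_cast<volatile std::uint32_t *>(base);
        std::uint32_t sp = vectors[0];   // initial stack pointer, mapped to ROM
        std::uint32_t pc = vectors[1];   // initial program counter, mapped to ROM
        (void)sp; (void)pc;
        // vectors[0] = 0;               // a write here would raise a bus error
    }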



