The C++ Iceberg (fouronnes.github.io)
127 points by thunderbong 5 months ago | 110 comments



I love how crazy C and C++ can be sometimes.

We did a C preprocessor meta programming iceberg a while back: https://jadlevesque.github.io/PPMP-Iceberg/


I really like this interactive iceberg! Have you made any others?


No we haven't, but it was largely inspired by https://suricrasia.online/iceberg/


For those who missed it: each item on the iceberg is actually a link that explains the topic in more detail.


The most terrifying concept is still the good old undefined behavior. How is it possible that a simple `gets()` call can pop a reverse shell and empty your bank account? Or steal national secrets and potentially cause a war? Undefined truly means undefined, though few people realize the scope at first when they read those innocent sounding words:

The behavior is undefined


What I don't get is, why are we still adding more undefined behaviour to the standard?

For example, C++23 got std::unreachable. From my dips into Rust, I expected something similar to std::abort, terminating the program in some sane and possibly helpful way.

However, in C++, std::unreachable just invokes undefined behaviour when called. It's not usable as a guard from programming errors, it's just an optimization hint. You still have to write the guard yourself.

I'm left wondering about the use-cases for this.


Rust has core::hint::unreachable_unchecked, which is an unsafe function that, since we promised never to call it, also promises never to return. This has Undefined Behaviour if you call it, since you formally promised not to (the consequence of unsafely promising something and then going back on it is usually Undefined Behaviour in Rust).

Rust does also have the safe macro core::unreachable which will panic if reached, but while the compiler can assume this probably won't happen and optimise accordingly, it usually can't know that it won't happen (and if it can your decision to add the macro may be unwise)

One reason C++ gets more and more Undefined Behaviour is consistency. The committee will, unless prompted hard not to, draw analogies to existing UB in the language and use that to justify making your new thing Undefined in the tricky cases. The results are pretty embarrassing.


std::unreachable is actually a quite useful optimisation tool. For instance, if you are sure that a switch-case default branch is actually unreachable, you can put a std::unreachable into the default branch to hint to the compiler that it may remove a range check in front of the switch-case jump table access (since you promised the compiler that the default branch is never reached).

It's a double edged sword of course. If control flow actually hits the default branch, then all bets are off (because that means the code will access an out-of-range jump table slot and jump somewhere into the wilderness).

AFAIK compilers are free to perform something like Rust's panic when std::unreachable is actually hit, but that only makes sense in debug mode, because in the above example the compiler would need to add a range check to figure out if panic needs to be called, and that would completely defeat the idea of removing that range check in the first place.


I recommend putting std::unreachable() outside the switch and omitting the default label, for the sole reason that compilers are more likely to warn about a missed case label this way (-Wswitch vs -Wswitch-enum in gcc/clang; the former is included in -Wall, the latter isn't included even in -Wextra).

This also allows expressing intent: no default label means that I meant to handle all cases, and having a default means that I opted into a fallback, please don't warn. That's probably why -Wswitch-enum isn't enabled by default, too many false positives without a convenient way to suppress the warning.


Hmm, what would that look like? Like this?

    if ((val < min) || (val >= max)) {
        __builtin_unreachable();
    }
    switch (val) {
        ...
    }
I haven't actually tried that, but if it works as intended it would actually be better yeah (not really in the case where I'm using it, because I'm switching on an integer, not an enum).


More like:

  switch (e) {
    case A:
      foo(); break;
    case B:
      bar(); break;
  }
  std::unreachable();
instead of:

  switch (e) {
    case A:
      foo(); break;
    case B:
      bar(); break;
    default:
      std::unreachable();
  }
The former is more likely to produce a warning if there is an enumerator C that you forgot to handle, or if you later add an enumerator C and forget to update the switch.

edit:

duh, it's supposed to be returns here instead of breaks.


But then all case branches hit the std::unreachable. How can that work in practice?


I think it's supposed to be a return not a break.


yeah, thanks! I added an edit.


It's pretty funny that you've just demonstrated how stupid this construction is in practice by making that exact mistake.


Having it available means we can use it explicitly. For example, I could see a compiler flag making `std::vector<T>::operator[]` be checked and then if profiling warrants, remove the check by explicitly checking if my index is out of bounds and invoking UB. Not saying that’s the pattern people will use, but having an escape hatch makes safer-by-default behavior more approachable.
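Roughly this kind of escape hatch, for instance (a sketch; get_unchecked is an illustrative name, not a real library facility):

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Hypothetical helper: keep hardened operator[] everywhere else, but in a
    // profiled hot spot promise the compiler the index is in range so any
    // bounds check can be elided. UB if the promise is ever broken.
    template <typename T>
    T& get_unchecked(std::vector<T>& v, std::size_t i) {
        if (i >= v.size()) {
            std::unreachable();
        }
        return v[i];
    }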


>For instance if you are sure that a switch-case default branch is actually unreachable you can put a std::unreachable into the default-branch to hint the compiler that it may remove a range check in front of the switch-case jump table access

I guess they stole this idea from Terry Davis, who implemented it in HolyC. When you use square brackets it just doesn't do any range check. Terry, as always, was ahead of his time. Like so:

switch[abc] { [...] }


Tbf it's a fairly obvious idea when looking at the compiler output for a large switch-case statement.


  What I don't get is, why are we still adding more undefined behaviour to the standard?
Because it allows flexibility for compilers to implement features efficiently on various platforms.

You can define undefined behavior if you wish. Make your own unreachable that prints an error and aborts gracefully when reached in debug builds, and calls std::unreachable in release builds.
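Something along these lines, for example (a minimal sketch; the macro name is made up):

    #include <cstdio>
    #include <cstdlib>
    #include <utility>

    // Debug builds: report the location and abort. Release builds: plain
    // std::unreachable(), i.e. an optimisation hint with UB if ever reached.
    #ifndef NDEBUG
      #define MY_UNREACHABLE()                                            \
          do {                                                            \
              std::fprintf(stderr, "reached unreachable code at %s:%d\n", \
                           __FILE__, __LINE__);                           \
              std::abort();                                               \
          } while (0)
    #else
      #define MY_UNREACHABLE() std::unreachable()
    #endif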


> What I don't get is, why are we still adding more undefined behaviour to the standard?

Why do you believe this is some kind of problem? Can you explain in concrete terms what issue you have with undefined behavior?

> However, in C++, std::unreachable just invokes undefined behaviour when called.

It does, by design and as a testament to great design choices.

> It's not usable as a guard from programming errors, it's just an optimization hint. You still have to write the guard yourself.

I don't understand what point you tried to make. Nothing in your comment has any relation with undefined behavior. Instead, you're complaining that in your personal opinion different languages have similar-sounding features that work differently.

Oddly enough, you should really read up on std::unreachable, because one of the reasons explicitly stated in its references is to "trap them to prevent further execution".

To explain what this means in unambiguous terms: the standard ensures that a) std::unreachable is valid C++, made available to the world with the semantics of marking a code path as unreachable, and b) everyone is allowed to handle that as they see fit by flipping flags in the build system.

Think about it for a second. You want to call std::abort when std::unreachable is hit. Great, go with that. I don't, and instead I want the compiler to optimize away that code path and anything it would touch, but on debug builds I want it to output a warning and trigger a breakpoint. Have you noticed that all these cases introduce behavior that's entirely different, arbitrary, and implementation-defined? How do you, as a standards committee, get to cover all possible use cases?

By specifying it as undefined behavior.

https://en.cppreference.com/w/cpp/utility/unreachable


> entirely different, arbitrary, and implementation-defined?

> By specifying it as undefined behavior.

No. If you wanted implementation-defined behavior, you'd specify it as implementation defined.


You might be interested in the compiler option -fsanitize=undefined. I think it works for gcc and clang. I don't think it catches all undefined behavior, but it catches some.
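For example, something like this gets flagged at runtime (assuming a reasonably recent gcc or clang):

    // ub.cpp -- build with: g++ -fsanitize=undefined -g ub.cpp && ./a.out
    #include <climits>

    int main() {
        int x = INT_MAX;
        return x + 1;   // signed integer overflow: UB, reported by UBSan at runtime
    }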


On libstdc++ std::unreachable() also reliably crashes if you define either _GLIBCXX_DEBUG or _GLIBCXX_ASSERTIONS. libc++ should have a similar macro. I expect MS STL to also reliably crash on debug builds here, as it's quite heavy on debug assertions in the standard library anyway by default (and debug and release builds are explicitly not ABI compatible there).


It's faster and/or easier to implement. Sometimes by a lot.


And if you ever reach it then maybe the program will crash, or maybe demons will start flying out of your nose. No one really knows, and that's what makes undefined behavior exciting!


-funreachable-traps


It doesn't 'invoke' undefined behavior in the sense that it calls some function 'doRandomShit()'.

What happens is: the compiler sees unreachable and assumes that this code path will never run, so it just yeets it out of existence. For example, maybe it will remove the preceding if-statement and, rather than check its condition, just insert true or false. After all, why waste cycles checking a condition if we know it will never resolve to the unreachable path?
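Something like this sketch, for instance:

    #include <utility>

    int deref(int* p) {
        if (p == nullptr) {
            std::unreachable();   // promise: this branch never runs
        }
        return *p;                // so the compiler may drop the null check
                                  // and emit an unconditional load
    }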

In Rust, the same if-statement won't be optimized away, and if the code happens to go for the unreachable path, it will call panic.

Now, the example is a simplification, but it's just to demonstrate that UB comes from compilers (very aggressively) assuming UB will never happen, and not because someone decided to "add it".


The UB mess only started when compiler writers decided it is ok to exploit UB for optimisations which then leads to wildly unpredictable behaviour when you actually hit UB.

But also: turn warnings to 11, use UBSAN, ASAN, TSAN and static analysers. These are all easy steps to make C and C++ code a lot more robust.


No, UB has always been just undefined. `gets()`, one of the most exploited functions in the history of computers, does not require any clever malice from the compiler. And there are a lot of such examples. In fact practice has shown that it has always been very, very hard to write a program that cannot reach UB.

Thinking that you know what `undefined` really is results in undefined behavior.


> No, UB has always been just undefined

This turns out not to be the case.

There actually is text in the current standard as to what acceptable behavior is when encountering undefined behavior.

This was turned into a "note" in later versions of the standard, but initially it was just as much a normative part of the standard as everything else.

The text is still there.

Compiler writers want to exploit UB, hence they say that this text effectively doesn't exist.

https://blog.metaobject.com/2018/07/a-one-word-change-to-c-s...


Compilers don't claim that part of the standard doesn't exist, they just go with option 1: ignoring the situation. E.g. if one branch of an if-statement is UB, compilers ignore that that branch can happen and only need to compile the branch they know exists.


No, that's not ignoring the situation. Ignoring the situation is compiling the code and performing whatever operation happens, for example an out of bounds access.


What's the behavior of this piece of code according to the spec? You can choose any C/C++ spec:

   void f() {
      char x[100];
      gets(x);
   }


The key here is the meaning of "ignoring" in "ignoring the situation completely with unpredictable results".

Most programmers (and pretty much all compiler writers) interpret it as meaning that the compiler can assume it can't happen (ignore the possibility) and translate accordingly.

Some, like mpweiher interpret it as meaning that the compiler must implement a trivial (but somehow unspecified) translation to the underlying hardware and let the behavior be what the hardware provides.

edit: We went through this many many times.


This craziness (a belief that if you don't specify what something means it must have some "obvious" behaviour that is coincidentally always whatever the programmer wanted) is endemic in C and C++ communities.

Right now, notionally serious people are making the case that C++ should enshrine the "bag-of-bits" model of pointers, which is exactly this sort of stupidity. At first it looks like they want pointers to be addresses, which is kinda dumb but whatever, and you keep reading and they're like aha, also I should be able to dereference my bag-of-bits and have that work too... Full blown Provenance Via Integers, expect to spend the next forty years trying to specify how any of the optimisations you rely on can survive the resulting tangle, say hi to anybody who wanted Consume ordering and is still waiting.


> This craziness (a belief that if you don't specify what something means it must have some "obvious" behaviour that is coincidentally always whatever the programmer wanted) is endemic in C and C++ communities.

That belief is endemic because that is the way things actually worked for many years! Compilers used to give us the obvious behavior, because they weren't smart enough to outfox us by doing anything else, because the CPUs we used were not fast enough to let them do the kinds of comprehensive optimization analyses which are now routine.


No. The reality is that every new generation of programmers is completely comfortable with optimizations that existed before it started programming and surprised by optimizations added after that. Today we take for granted SSA, extensive interprocedural optimizations, value-range propagation and LTO. Before that was aggressive inlining and SRA; before that was smart register allocation and stack optimizations. Even before that it was instruction scheduling.

Each one of these optimizations has broken someone's code.


The idea that "undefined behavior can't happen" is ludicrous, because it happens all the time. The idea that most programmers also believe this nonsense is entirely unsupported.

It is very popular in a small range of very influential companies.


> Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution ...

As much as I'm not a fan of the status quo, the argument for changing "Possible" back to "Permissible" isn't particularly solid; the rest of the paragraph makes it clear it's not an exhaustive list of behaviours, merely examples from a "range".


Yes, I agree that there are better solutions, this one is just minimal.

The "range" clearly indicates that you can do something that's semantically in-between the options provided.

What current compilers do is nowhere near that range.


Just because 'undefined behaviour' means 'theoretically anything can happen' doesn't mean that it will happen in practice. You also don't need to summon the UB ghosts to delete all your files, a regular logic bug works just as well, even in a language without any UB.

In older compilers, UB behaviour was a lot more predictable because the code transformations done by optimizing compilers were by far not as complex as today.

In the C standard, 'undefined behaviour' is basically a catch-all phrase for a lot of different things: from badly designed stdlib functions that miss obvious security checks (like gets()) to the compiler generating nonsensical code.

The latter is what's much more dangerous. It's trivial to just not use the C stdlib functions, but code generation problems caused by UB are much harder to catch.


No, it's the former that's more dangerous, and it has been there from the beginning. Every time you dereference a bad pointer (already free'd, corrupted, badly casted, ...), every time you go over the bounds of some array, every time you read memory that hasn't been initialized, every time you (write) access the same memory from two threads without synchronization, etc. - this is the bread and butter of binary exploitation. Exploits are fully expected under C/C++ spec, after all the behavior is undefined.


But at the same time it is totally expected in C and C++ that range checking is the responsibility of the programmer, and most of those problems are trivial to catch with Valgrind or ASAN. E.g. yes, it is a problem, yes, 70% of all CVEs are caused by memory corruption, but the problem is totally obvious and trivial to debug. It's basically part of the contract when deciding to use C or C++.

Compared to such 'obvious' memory corruption problems caused by invalid pointers or missing range checks, obscure code generation issues caused by UB are much harder to identify and the behaviour may also change randomly between compiler versions - this is the actually scary thing about UB, and it's also a relatively "new" thing.


The point is exactly that "it is totally expected", and it has caused massive losses over the decades. It has been abused for everything ranging from turning your friend's PC off, to ransomware, to state-on-state espionage and even causing nuclear incidents in Iran's enrichment facilities.

The behavior is undefined.


Sheesh, the sky didn't fall, the world didn't end, and humanity survived. UB won't end the world. Relax.


Perhaps we're not in a discussion that's worth continuing, but I'm talking about programming language design. There are languages with UB, and there are languages without UB - and I think in 2024 the verdict is clear.


Maybe from your point of view. But from elsewhere UB can be a useful tool because it allows wiggle room for compilers to support a wide range of platforms without sacrificing efficiency. It doesn't have to be as crazy as in modern C/C++ compilers, but some sane UB definitely makes sense (even Rust has that inside unsafe blocks).

One could also argue that most UB in C/C++ should actually be IB, but that's a different discussion.


This "wiggle room" would be at most implementation-defined and in some cases just unspecified values; these probably aren't desirable, but yes, they're less dangerous than UB.

Rust has UB if you don't obey the rules (Safe Rust doesn't because the rules are not your responsibility in Safe Rust, the unsafe superpowers allow you to do things the enforcement can't check but now all rules are your responsibility) but it's not somehow magically "more sane" than C++ it's ultimately lowering to the same LLVM IR that Clang uses. It's tolerable because you're only confronting the risk when you explicitly choose to do so. If you don't need to squeeze that last few drops of performance you can leave it. If you don't have the confidence, if your experts are too busy to review it, if you haven't had a proper night's sleep, you can write safe Rust today.


> the problem is totally obvious and trivial to debug.

We live in parallel universes that must have intersected in this thread.


Debugging tools got a lot better in the last two decades; memory corruption issues and leaks are by far not as scary anymore as in the 90s and early 2000s. For instance, check out the realtime memory debugging panel in Visual Studio that's been there since VS2015. Also, as mentioned at the top of the thread: compiler warnings, ASAN, UBSAN, TSAN and static analyzers are essential these days when writing C or C++ code.

The only barrier is to actually know about and use those tools in the development workflow.

Rust is still superior of course (for memory safety at least), because it actually prevents most of those issues in the first place, but it's not like the rest of the world has been twiddling thumbs in the meantime.


It certainly got better, but it's still not trivial. With the sanitizers, you need to have code that actually triggers the UB; coverage is not enough. If your UTF-8 parsing code only hits UB on some very specific input, then unless your tests or own environment happen to have this input, you'll be none the wiser.

Fuzz testing helps, but it's not commonly used in my experience.


It's not like UB is some magic function that gets called when encountering UB.

Every UB could very well be wrapped in error checking and we could have 0 UB out there. We don't do that, but that's our choice, not some supernatural entity that possesses the machine when it runs into UB.


> Every UB could very well be wrapped in error checking

Not in C and C++. How would you check that a pointer is safe to dereference for example?


What do you mean? Just check if it's not null?


Not all non-null pointers are safe to dereference.


Like?


On the hardware and assembly level, all unmapped memory regions are illegal to access and will cause a segfault.

In high level languages it gets a lot more complicated (look up "pointer provenance").


Like one that is free'd or points to a stack object that is no longer in scope. Or one that points one past the last element of an array.
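For instance (a sketch; compilers typically warn about this, but it compiles):

    int* make_dangling() {
        int local = 42;
        return &local;     // non-null, but points to a dead stack object
    }
    // *make_dangling() would be UB: the pointer is non-null,
    // yet it is not safe to dereference.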


It's unsafe to do that, but it's not UB


C23:

> The unary * operator denotes indirection. If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object. If the operand has type "pointer to type", the result has type "type". If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.

below in a note (emphasis mine):

> Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime.

Now the note is not normative, but I assume there is normative wording defining what the "invalid" pointer values are, scattered around in the standard.

C++:

https://eel.is/c++draft/expr.unary.op#1.sentence-4

C++ is very particular in what it means for a pointer to point to an object, and this is also UB there.


I don't have all ca 200 UBs in my head (see https://gist.github.com/Earnestly/7c903f481ff9d29a3dd1), but I'm pretty sure all types of illegal memory accesses count as UB.

For instance from skimming the list:

- An object is referred to outside of its lifetime (6.2.4).

- The value of a pointer to an object whose lifetime has ended is used (6.2.4).

- An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).


Technically the array subscript example is UB even before the dereference. a[1][7] is equivalent to `*(a[1] + 7)` and a[1] + 7 itself is UB.

(6.5.7) Additive operators

> If the pointer operand and the result do not point to elements of the same array object or one past the last element of the array object, the behavior is undefined.

Interestingly a[1][5] is also UB, but for the dereferencing a past-the-end pointer, not for the arithmetic.
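Spelled out, assuming the declaration from the quoted example:

    int a[4][5];
    // a[1][7] is *(a[1] + 7); the arithmetic a[1] + 7 already goes more than
    // one past the end of the 5-element array a[1], so it is UB before any
    // dereference happens.
    // a[1] + 5 is a valid one-past-the-end pointer, so a[1][5] is UB only
    // because of the dereference, not because of the arithmetic.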


Yes, that's exactly what I meant.

Also, dereferencing a pointer with the wrong dynamic type.

> all ca 200 UBs in my head

Technically there is no enumerated set of all possible UB. Anything not explicitly defined in the standard is UB.

/extremely-pedantic


The effort to write an appendix listing all the UB and IFNDR is an ambition of WG21, but there's no specific timeline.

It's a bit like that "Migrate off obsolete DB server" card that's been sat in your team's (well most teams) backlog for a couple of years. Everybody agrees it should be done... But like not this sprint.

My assessment is that at their current rate WG21 is adding new UB to the language slightly faster than it's being documented for the appendix, so this won't actually finish.


This turns out not to be the case.

Compilers have been known to remove exactly that sort of error checking code, as there was undefined behavior present.

It's a mess.


Yes, I know it's not possible for a specific C++ dev to do such checks, as compilers can optimize them away.

What I meant is "we (as in) the C/C++ community (or rather the specification committee)" could tomorrow decide that, let's say, all null pointer dereferences should lead to program exit. The existence of UB is a choice.


UB is a choice in that we could require symbolic execution of all code. This is incredibly slow, so we don't.

There are many cases of UB that would be cheap to check, but there are many more that are incredibly expensive to check.


> The most terrifying concept is still the good old undefined behavior.

I think people tend to parrot undefined behavior as if it's some kind of gotcha when in practice this only means two things:

* Don't write code whose behaviour you didn't bother to understand,

* You bothered to check what your implementation does and decided to write implementation-specific code for your own personal reasons.

By definition, each and every single programming language in existence which is not specified in an official reference document is comprised exclusively of undefined behavior. Why? Because none of its expected behavior is defined. That's what these nonsensical discussions boil down to.

So why is this somehow expected to be such an insurmountable gotcha with C++ although the rest of the world uses languages which are entirely undefined without any concern?

Boggles the mind.


> I think people tend to parrot undefined behavior as if it's some kind of gotcha when in practice this only means two things:

No, not really. In fact an excellent counter example can be found on this very iceberg: https://blog.regehr.org/archives/161

C++ at some optimisation levels says the example program disproved Fermat's Last Theorem, which should come as a surprise as there are no known counter examples. The program is also valid C. When compiled with good C compilers at all optimisation levels (both gnu and clang count as good), the program never exits because it doesn't find a counter example. But the gnu and clang C++ compilers do behave as described. The difference is caused by what the C and C++ language standards consider to be UB.

Why? C++ (but not C) defines an infinite loop as undefined behaviour, and at high enough optimisation levels the compiler spots the infinite loop (which is the correct behaviour if the theorem is true) and goes with the other option. (As an aside, it's impressive it does detect this particular infinite loop.)

Why is that a problem? Well lots of programs are deliberate infinite loops. In practice many don't qualify as UB because of various conditions listed in the article, but you would have to be a language lawyer to know them. The author gives real life examples of how he was caught by the surprise removal of his loop.

Deliberate infinite loops are actually fairly common and are perfectly correct code. Proving they are infinite is famously undecidable, so you can usually get away with it safe in the knowledge the compiler won't spot it. But compilers keep chipping away at the edge cases, so one day you can find your program changes its behaviour drastically just because it was compiled with a new version of the compiler, or even just different optimisation options.
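The program in the linked post has roughly this shape (simplified from memory; see the post for the exact code):

    // Searches for a counterexample to Fermat's Last Theorem for n == 3.
    // If none exists in range (none does), the loop never terminates -- yet a
    // C++ compiler may assume this side-effect-free loop terminates and "find" one.
    int fermat(void) {
        const int MAX = 1000;
        int a = 1, b = 1, c = 1;
        while (1) {
            if (a * a * a == b * b * b + c * c * c) return 1;
            a++;
            if (a > MAX) { a = 1; b++; }
            if (b > MAX) { b = 1; c++; }
            if (c > MAX) { c = 1; }
        }
        return 0;
    }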


> No, not really. In fact an excellent counter example can be found on this very iceberg: https://blog.regehr.org/archives/161

I'm not sure you read the blog post you cited, because it states exactly this:

> In C, overflowing a signed integer has undefined behavior, and a program that does this has no meaning at all. It is ill-formed.

I think this is quite clear. This is exactly what I stated in my first example. I'm not sure why you missed that. It's the whole point of undefined behavior, and what confuses people about what it means.

Then there's the nuance that those who mindlessly parrot undefined behavior as some kind of gotcha clearly fail to understand, which is the fact that implementations (meaning, compiler vendors) are given free rein to implement whatever implementation-specific behavior they see fit. Why? Because the standard left the behavior undefined, which also means the standard does not prevent implementations from defining their own behavior.

Do you understand the whole purpose of this? The blog post you cited clearly shows examples of C programs that use data types which are system- and implementation-specific, and thus overflow is left undefined because, just like the integer types, its behavior can and does differ among systems. It would be absurd to specify that overflows should wrap around/saturate values/stay MAX/throw exceptions/terminate the program, because systems implement this differently. The role of a programming language is to write programs that machines run, and thus the programs need to express what each machine does, not what some other machine does.

I recommend you re-read what I wrote in my post to see the whole point of undefined behavior and its critical importance. It is high time that this misconception is put to rest.


> I recommend you re-read what I wrote in my post to see the whole point of undefined behavior and its critical importance. It is high time that this misconception is put to rest.

Uhmmm, did you read what I said? I ask because it looks like a red haze descended after you read the first few words. I've been writing C for 40 years. I know why C made integer overflows UB. It seemed perfectly sensible to me. That's why I didn't mention it. I'm not sure why you brought it up. In the article it says this:

    Third, there are no integer overflow games going on here, as long as the code is compiled for a platform where an int is at least 32 bits. This is easy to see by inspecting the program. The termination problems are totally unrelated to integer overflow.
The thing I was discussing was C++ defining an infinite loop as UB. I'm not alone in thinking that was a bad idea. The C standardisation group apparently also thinks that, because such loops are not UB in C. You can check that for yourself: compile the Fermat program with gnu C and it behaves sanely. Compile it with gnu C++ and it gets the wrong answer. clang behaves identically. The author of the article also thinks it's amazing:

    In other words, from the point of view of the language semantics, a loop of this form is no better than an out-of-bounds array access or use-after-free of a heap cell. This somewhat amazing fact ...
So yes, I'm firmly in the camp the C++ committee lost the plot here.

I think the way C and C++ handle UB belongs to a bygone era. In their defence, when I was writing C 40 years ago there was no choice. Checking if every integer operation overflowed imposed an unacceptable performance penalty. While today, in languages like Rust, compilers statically check for UB at compile time, back then C compilers were barely more than overblown assemblers. UB simply meant "you get what the hardware gives you". Notice that means on a given arch there was no real undefined behaviour. I knew perfectly well what would happen on x86 if an integer overflowed, I knew whether a char was signed, and I wrote my programs accordingly, sometimes deliberately exploiting what I knew would happen. UB only bit you hard when you ported your program.

What happened is compiler writers seeking to squeeze every last bit of performance pushed "undefined" to mean "since it's undefined, I can do whatever I damned well please", and with C++ and infinite loops they've managed to push that beyond absurdity. Meanwhile other languages have gone in the opposite direction. Rather than giving programmers more rope to hang themselves with, they've got runtime and compile time checks that either outright ban undefined behaviour, or warn you about it.


I don't think the standard (C99) uses the term "undefined" in its description of the gets() function. (Also gets has been removed from the standard in C11.) But maybe it uses the term to describe what happens when you change memory that shouldn't be changed. Well, what do you think the C standard should say in such a case, without knowledge of the system you're running on?

There are of course some functions where it would have been nice to allow passing in an invalid value and the function just returning its normal error or "no" return value. E.g. isalnum might be nicer if it didn't crash for input > 255. (But perhaps it's not possible in all implementations without adding some other constraints that didn't exist before.)


Forget gets() - that's obviously a terrible idea, and most people see immediately why; it is no longer in modern C, for example.

Consider that abs() of signed integers has UB! Because the signed integers aren't balanced it's tricky to decide what should happen if we abs() the most negative value of a particular size of signed integer. In a language like Rust you get to specify what you meant here, if you don't then you get a panic when it happens. But C++ instead gives this... Undefined Behaviour!

Or take division by zero. It's easy enough to say that you think the signed integers should have wrapping arithmetic. Doesn't make much sense, but hey, it's defined at least. However, even if you define overflow as wrapping, division by zero is a different problem. In Rust they can explicitly say this can't happen using NonZero types (e.g. NonZeroI16 is a 16-bit signed integer, but it can't be zero) and get the high performance machine code, but in C++ to get that same code they just say division by zero is Undefined Behaviour.
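Concretely, a small sketch:

    #include <climits>
    #include <cstdlib>

    int f(int x, int d) {
        // std::abs(INT_MIN) is UB: the mathematical result, -INT_MIN,
        // is not representable in int.
        int a = std::abs(x);   // UB if x == INT_MIN
        // Integer division by zero is UB as well, independently of whatever
        // overflow rule you might pick for the other operations.
        return a / d;          // UB if d == 0
    }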


How does arithmetic work with NonZeroI16 and remain efficient? What's the codegen of adding/subtracting two NonZeroI16 values?


They don't support general arithmetic. They do have member functions for abs, multiplication, negation and power operations, but they are all checked/saturating if they can result in 0.

If you want to do actual arithmetic, you have to get the internal i16, perform the arithmetic, then rewrap it.

https://godbolt.org/z/T5e36zzM5


Whether we can efficiently do other arithmetic depends on what we're doing, but that's not necessarily on our critical path even if the division is.


gets() might be (mostly) gone, but scanf("%s", buf) still exists and produces no warnings or errors from the compiler.
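The field-width form at least bounds the write (a sketch):

    #include <cstdio>

    int main() {
        char buf[100];
        // std::scanf("%s", buf);   // unbounded write: can overflow buf, UB
        std::scanf("%99s", buf);    // writes at most 99 chars plus the NUL
    }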


The PHP Hammer would be the perfect tool to chip away at the C++ Iceberg with!

https://blog.codinghorror.com/the-php-singularity/


Popularized articles about C++ are very hit or miss. It's partly because the language is so hit and miss in its non-design.

The most glaring error in the iceberg IMO is the claim that shared_ptr is an antipattern.

Shared_ptr is brilliant for immutable data in parallel programs. It’s not an antipattern. It’s useless for sure as a building block for data structures and other linked data with potential for circular references.

But really, if a resource is a) immutable and b) needed in a shared context then a thin wrapper around a map of shared pointers or such saves the pain of needing to implement a much more complex resource management scheme.


As a C++ user, shared_ptr is great for some things, but it is an anti-pattern. Shared_ptr<const T> is much much better. The problem is that shared_ptr<T> isn’t value-semantic and so destroys local reasoning. That said, there are places for it, but it’s very easy to make a mess with it.

I'm a huge fan of stlab::copy_on_write<T>, which is fundamentally very similar but which is value-semantic, doesn't let you make loops, and gives you local reasoning.



Maybe I’m missing a joke here but to me “godbolt is a real person” seems way out of place. Compiler Explorer is a great contribution to the community and he gives informative and humorous presentations. I would like to know why he got dragged into this list.


A lot of programmers refer to the compiler explorer as Godbolt due to the domain name, such as saying "Godbolt link", "try it on Godbolt", etc. They do not realise that the domain name is Godbolt.org not because "Godbolt" is the name for the compiler explorer, but that the creator/maintainer's name is... Matthew Godbolt.

I personally found out the domain was named after the creator when I saw a CppCon talk about the explorer from him!


No, seriously: when I told my colleagues, they thought he was a comic book character.


A lot of classic C++isms, some good ones. constexpr weirdness is among my top choices.


I agree, but the "constexpr" weirdness links to a video with a clickbait title.

You are expected to watch most of the video, before learning that the purpose of the video is to argue that in almost all cases one should use "static constexpr" and not just "constexpr".

The argumentation can be interesting, but it would have been nicer to just state the conclusion from the beginning, allowing a decision of whether it is desirable to spend time to also know the reasons for this.


> A good way to think about constant expressions is that `constexpr` is a hint that lets the compiler know that the variable or function CAN be evaluated at compile time, but it is not required

In D, CTFE (Compile Time Function Evaluation) is guaranteed to run at compile time. CTFE happens whenever there's a constant-expression in the grammar, such as the initialization of an enum:

    enum A = func(3);
func(3) evaluates at compile time.


Calling it a hint is very much over-simplifying things, as there are also restrictions on what a constexpr function is and is not allowed to do, although those restrictions become fewer and fewer as time goes on, since compilers are now advanced enough to emulate more system-level things at compile time.

Also, constexpr on a variable is not a hint - it must be evaluated at compile time, whereas a constexpr function will be evaluated at either compile time or runtime depending on the context and arguments passed to it. For a function that you want either to work at compile time or not at all (compile error), consteval is the hardened version.

Yes, sadly this means "constexpr" variables and "consteval" functions must be calculated at compile time, while "constexpr" functions can be used both at compile time and runtime. An annoying distinction.
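To illustrate the distinction (C++20 and later; the names are illustrative):

    constexpr int sq(int x) { return x * x; }     // usable at compile time or runtime
    consteval int sq_ct(int x) { return x * x; }  // immediate function: compile time only

    int runtime_value() { return 5; }

    void demo() {
        constexpr int a = sq(3);           // constexpr variable: must be a constant expression
        int b = sq(runtime_value());       // same function, evaluated at runtime
        constexpr int c = sq_ct(4);        // fine
        // int d = sq_ct(runtime_value()); // error: argument isn't a constant expression
        (void)a; (void)b; (void)c;
    }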


In D, any function can be used to evaluate at compile time. If it is used for a constant-expression and cannot be evaluated at compile time, it is a compilation error. Also, the entire function needn't be evaluatable at compile time - only the path taken through it.

Simply no need for tagging them one way or another.


Kudos to whoever made this. No, really, thanks for making me laugh, I needed this.


It would be great to get an explanation for each of the items here for non-C++ devs.


Just click on a term to be redirected to an explanation.


I don't see why std::optional being a monad is implied to be a bad/confusing thing? It would be significantly more awkward to use without any monadic operations on offer.


There's also the Static Initialization Order Fiasco.


Damn, I should write more C++ again. I am missing so much fun. (Edit: I am not sure myself if I mean this sarcastic or not)


A form of Stockholm syndrome, perhaps? ;-)


inline definitely means inline, as in the definition is provided in-line with the declaration.

People are just confusing it with the inlining optimization that a compiler might do.


It’s more precise to say that inline means “this definition might be duplicated.” Non-inline functions also are allowed to have their definitions provided inline with the declaration; the difference is that the linker will error if it encounters a duplicate non-inline definition.
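For example, in a header included by several translation units (a sketch):

    // widgets.hpp
    #pragma once

    inline int twice(int x) { return x + x; }  // inline: each TU may carry a copy,
                                               // and the linker merges them

    int thrice(int x) { return 3 * x; }        // not inline: if two TUs include this
                                               // header, the link fails with a
                                               // multiple-definition error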


Are you thinking of "weak" rather than "inline"? I always thought of "inline" as a hint to the compiler -- the compiler can inline, but it may not, depending on some compiler-specific optimisation objective (execution speed, code size, ...).


No, I am not thinking of weak; this is why “inline does not mean inline” is on the iceberg :). Although practically speaking inline is also used as a hint. But chiefly it refers to inline linkage.

I’m actually also not familiar with weak, but a quick glance suggests that it might not mean the same thing as inline. Can weak symbols be defined multiple times? EDIT: yes, according to an SO post compilers often use weak to implement inline.


You thought wrong.

Marking the symbol as weak is what inline indeed does from a technical point of view.


I fail to see how it's more precise. Obviously if the definition is inline, it's duplicated in every point where the declaration is imported.

Beginners fail to understand for example that providing a class member function definition within the class definition makes it inline, but that's already what you're doing textually, and that's what the keyword means.


You can declare and inline-define a function in a CPP file and you don't need inline because there are no other places that can import it.

It's only needed for functions defined in header files where multiple definitions may exist after compiling object files.


If you textually define a function "inline" in a header file yet do not mark it inline, you are duplicating it at every point where the declaration is imported, yet, not being inline for linkage purposes, you will get multiple-definition errors at link time.

Member and template functions are special cased to be implicitly inline.


That's not an inline definition, that's a normal definition, which should never be in a header file.

Non-inline definitions need to be in their own translation unit.


Of course, but that disproves that an "inline definition is a definition that is provided inline with a declaration" [in a header file]. An inline definition is a definition that is marked inline, implicitly or explicitly. Whether it is defined together with the declaration or in a header vs. a translation unit is immaterial.


I don't understand how that disproves it; your code is not well-formed.


Are you saying that code defined in header files is "not well-formed"?


Definition of inline members definitely does not have to appear with the declaration (and usually does not, for legibility reasons).

    struct X { inline void f(); };
is a perfectly valid inline.


That's a forward declaration; you'll need to provide the inline definition in the same translation unit.



