ISO C became unusable for operating systems development (arxiv.org)
203 points by pcw888 on Jan 21, 2022 | 475 comments



I had an online discussion some years back where I suggested that C nail the size of char to 8 bits. The other participant responded that there was a CPU whose chars were 32 bits, and wasn't it great that a C compiler for it could be Standard compliant?

I replied by pointing out that nearly every non-trivial C program would have to be recoded to work on that architecture. So what purpose did the Standard allowing that actually achieve?

I also see no problem with a vendor of a C compiler for that architecture making a reasonable dialect of C for it. After all, to accommodate the memory architecture of the x86, nearly all C compilers in the '80s adopted near/far pointers; while not Standard compliant, that really didn't matter, and they were tremendously successful.

D made some decisions early on that worked out very well:

1. 2's complement wraparound arithmetic

2. sizes of basic integer types are fixed: 1 byte for chars, 2 for shorts, 4 for ints, 8 for longs

3. floating point is IEEE

4. chars are UTF-8 code units (*)

5. chars are unsigned

These 5 points make for tremendous simplicity gains for D programmers, and ironically increase portability of D code.

After reading the paper, I'm inclined to change the definition of UB in D so that it no longer means the behavior can be assumed never to happen, nor that it must be unintended.

(*) thanks for the correction


> So what purpose did the Standard allowing that actually achieve?

I believe the situation was that there were C implementations for DSPs (32-bit-addressable only) and IBM mainframes (36-bit addressable only), and when ANSI/ISO C was established, they naturally wanted their implementations to be able to conform to that new standard. So the standard was made flexible enough to accommodate such implementations.

Similarly for why signed overflow is undefined behavior: there were existing implementations that trapped (CPU interrupt) on signed overflow.

I might have gotten the details wrong, but that's what I remember from reading comp.std.c (Usenet) in the 90s.


I had another such discussion where I suggested that C abandon support for EBCDIC. I was told it was great that C supported any character set! I said C certainly does not, and gave RADIX50 as an example.

How many C programs today would work with EBCDIC? Zero? There's no modern point in C not requiring ASCII, at a minimum.


FWIW, C++ dropped support for EBCDIC when it dropped trigraphs. IBM complained till the last minute.


What is it about RADIX50 that makes it incompatible with the C standard? The lack of a reserved null value?


https://en.wikipedia.org/wiki/DEC_RADIX_50

Note the lack of lower case, characters like { } ( ) [ ] * : & | # !, and on and on.


Oh, as a source file encoding. Well, it lacks the characters needed for trigraphs, so maybe we could add support with a quadgraphs extension...


> I was told it was great that C supported any character set

I'm all for extensibility mechanisms to shut down the people like this. You want EBCDIC? It's now a github repository; go ahead and maintain it, or shut up.


> char's are UTF-8 code points

I'm guessing you mean that char is a UTF-8 code unit as you keep saying they're only one byte and a code point is far too large to fit in a byte / octet.

But that still seems very weird because a UTF-8 code unit is almost but not quite the same as a byte, so that users might be astounded when they can't put a byte into a char in this scheme (because it isn't a valid UTF-8 code unit) and yet almost no useful value is accrued by such a rule.


Ah, code unit is correct. I am always getting code unit and code point conflated. Sorry about that.

> a UTF-8 code unit is almost but not quite the same as a byte

D dodges that problem by having `byte` for signed 8 bits, and `ubyte` for unsigned 8 bits. That gets rid of the "is char unsigned or signed" confusion and bugs.


Aha, so if my D code doesn't care about text at all, I only need to worry about ubyte (or if I want them signed, byte) and char is irrelevant to me. That makes some sense.

Still it seems weird to me to separate the concerns "This byte isn't a valid UTF-8 code unit" and "This string isn't valid UTF-8". I don't know what flavour of error handling D has, but I can't conceive of too many places where I want to handle those cases differently, so if they're separate I have to do the same work twice.

Also, one very natural response to invalid UTF-8 is to emit U+FFFD (the replacement character �) but of course that doesn't fit into a UTF-8 code unit, so an API which takes a ubyte and gives back a char must error.


I think you'll find that once you get used to the idea that char is for UTF-8, and ubyte is for other representations, it not only feels natural but the code becomes clearer.

As for invalid code points, you get to choose the behavior:

1. throw exception

2. use the replacement char

3. ignore it

All are valid in real life, and so D is accommodating.


What is D's type for a unicode character (code point?)


A `string`, which is a typedef for `const(char)[]`, or a readonly view of an array of characters.


Note that because of the above definition of char (any UTF-8 code unit) these "strings" aren't necessarily valid UTF-8 and you might be better off with its dchar type (which is UTF-32 and thus can hold an entire Unicode code point)

It's unclear to me whether there's actually enforcement to ensure char can't become say 0xC0 (this byte can't occur in UTF-8 because it claims to be the prefix of a multi-byte sequence, yet it also implies that sequence should only be one byte long... which isn't multi-byte)

I spent a while reading the D website and came away if anything more confused on this topic.


Checking to see if the string is valid UTF-8 requires code to execute, which gets in the way of performance. Therefore, you can check with functions like:

https://dlang.org/phobos/std_utf.html#validate

when it is convenient in your code to do so.


I don't see much value in a distinct char type which has no real semantics beyond being an unsigned byte.

Calling validate on a whole string makes at least some sense, I don't love it, but it's not crazy. But it's obviously never worth "validating" a single char, so why not just call them bytes (or ubytes if that's the preferred D nomenclature)


There are some extra semantics that strings have in D, but you have a point. Why make it special?

It's because strings are such a basic thing in programming, that baking them into language has a way of standardizing them. Instead of everyone having their own string type, there is one string type.

One of the problems with strings not being a builtin type is what does one do with string tokens? (C++ has this problem with its library string type.) Being an array of bytes in a function signature gives no clue if it is intended to be a string, or an array of other data.

It's just a big advantage to building it in, and making it interact smoothly with the other core language features.


> I had an online discussion some years back where I suggested that C nail the size of char to 8 bits. He responded that there was a CPU that had chars be 32 bits, and wasn't that great that a C compiler for it would be Standard compliant?

Back in C's infancy, there existed architectures where a byte held 9 bits, and C compilers had to be written for them. The 36-bit PDP-10 springs to mind, and some Burroughs or Honeywell mainframes had them – I remember a little C reference book authored by Kernighan, Ritchie and somebody else explicitly calling out the fact that a C implementation could not rely on the byte always being 8 bits long, and also stressing that the «sizeof» operator reports the number of bytes in a type irrespective of the bit width of the byte.

9-bit-byte architectures have all but perished; C, however, has carried along the legacy of those creative days of computer architecture design.


I know. I've programmed on 36 bit machines (the amazing PDP-10), and machines with 10 bit bytes (Mattel Intellivision).

But that was over 40 years ago.

Time to move on.


> After reading the paper, I'm inclined to change the definition of UB in D to not mean it can be assumed to not happen and not be unintended.

What's the current definition of UB in D?


Same as in C.


I don't think it would be a very good outcome if people forked C such that everyone working on DSP platforms and new platforms that you just haven't heard of had to use a fork with flexible CHAR_BIT while the standard defined it to be 8. Who is served by this forking? Plenty of software works fine with different CHAR_BIT values, although some poorly-written programs do need to be fixed.


It'd almost certainly have to be a fork anyway.

> Plenty of software works fine with different CHAR_BIT values,

Most won't. And good luck writing a diff program (for example) that works with CHAR_BIT == 32.

> although some poorly-written programs do need to be fixed.

Blaming all the problems on poorly-written programs does not work out well in real life. Good language design makes errors detectable and unlikely, rather than blaming the programmer.


Somehow it never occurred to me to run `diff` on a SHARC. Horses for courses.


> Plenty of software works fine with different CHAR_BIT values, although some poorly-written programs do need to be fixed.

Since C traditionally doesn’t have much of a testing culture I’m not sure you can trust any decent sized project here. I’d even be surprised if you could change the size of ‘int’. And moving off 2s complement is definitely out.


`long` is such a mess in C I never use it anymore. I use `int` and `long long`. `long` is so useless today that it's a code smell and the Standard should just delete it.


Both Linux (in C) and Rust choose to name types based on the physical size so as to be unambiguous where possible, although they don't entirely agree on the resulting names

Linux and Rust agree u32 is the right name for a 32-bit unsigned integer type and u64 is the name for the 64-bit unsigned integer type, but Rust calls the signed 32-bit integer i32 while Linux names that s32 for example.

It would of course be very difficult for C itself to declare that they're now naming the 32-bit unsigned integer u32 - due to compatibility. But Rust would actually be able to adopt the Linux names (if it wanted to) without issue, because of the Edition system, simply say OK in the 2023 edition, we're aliasing i32 as s32, and it's done. All your old code still works, and raw identifiers even let any maniacs who named something important "s32" still access that from modern "Rust 2023" code.


I didn't choose the bit-width-suffixed integer names (like i32 and u64) because:

1. they are slower to type

2. they just aren't pretty to look at

3. saying and hearing `int` is so much easier than `eye thirty-two`. Just imagine relying on a screen reader and listening to all those eye thirty two's

4. 5 minutes with the language, and you know what size `int` is, as there is no ambiguity. I haven't run across anyone confused by it in 20 years :-)


If I was designing a new language I'd rather specify integer types by their range of expected values, and let the storage size be an implementation detail.

(not sure if this would ever come up, but if a variable had e.g. values 1000-1003, then technically you could optimize it down to a 2-bit value.)


Depending on what your language is for you probably want both.

WUFFS type base.u8[0 ..= 42] is a type which fits in one byte but only has values zero through 42 inclusive, and it's a different type from base.u32[0 ..= 42] because that one takes four bytes.

Rust cares about this stuff to some extent but only internally, for types with a "niche" where it can squirrel away the None case of Option in an invalid value without needing more space. e.g. Rust's optional pointer is the same size as a traditional C-style pointer, because NULL isn't a valid pointer value so that means None, and Rust's optional NonZeroU64 is the same size as a u64 (64-bits) because zero isn't valid so that means None.


Ada is what you're looking for!


Don't. Every type has its place in a well-defined structure that has survived major architecture changes. Use 'int' if the expected range of values fits in 16 bits, 'long' if it fits in 32 bits, and 'long long' otherwise. 'int' is an at-least-16-bit type and 'long' an at-least-32-bit type, typically the fastest such types on any architecture (matching the register width on most 32/64-bit platforms, except 64-bit Windows, and being 32 bits wide even on 8/16-bit architectures), and each of these types has its own promotion class.

Don't use fixed width types by default as printing them or converting to string correctly will get ugly: printf("... %" PRId32 " ...");

Generally avoid the use of fixed-width 8-bit or 16-bit types in calculations or you might shoot yourself in the foot due to integer promotion rules.

Use fixed width types only for things that pass the boundary of a system, like in HW registers, serializing, database storage or network packets, but cast only at the end when converting to the type (optionally followed by bitwise manipulations) and cast immediately to other type when processing (after bitfield extraction and such things, of course).
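
A small illustration of that promotion pitfall (rough sketch, assuming the common case of a 32-bit int):

  #include <inttypes.h>
  #include <stdio.h>

  int main(void) {
      uint16_t a = 0xFFFF;
      /* Both operands are promoted to (signed) int, so with a 32-bit int the
         multiplication overflows int: undefined behavior, despite the
         unsigned-looking types. */
      uint32_t bad = a * a;
      /* Forcing one operand to uint32_t makes the arithmetic unsigned. */
      uint32_t good = (uint32_t)a * a;
      printf("%" PRIu32 " %" PRIu32 "\n", bad, good);
      return 0;
  }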


Ralf Jung has a blog post looking at some of the claims in this paper [0]. Some hopefully representative quotes:

> The paper makes many good points, but I think the author is throwing out the baby with the bathwater by concluding that we should entirely get rid of this kind of Undefined Behavior. The point of this blog post is to argue that we do need UB by showing that even some of the most basic optimizations that all compilers perform require this far-reaching notion of Undefined Behavior.

<snip>

> I honestly think trying to write a highly optimizing compiler based on a different interpretation of UB would be a worthwhile experiment. We sorely lack data on how big the performance gain of exploiting UB actually is. However, I strongly doubt that the result would even come close to the most widely used compilers today—and programmers that can accept such a big performance hit would probably not use C to begin with. Certainly, any proposal for requiring compilers to curtail their exploitation of UB must come with evidence that this would even be possible while keeping C a viable language for performance-sensitive code.

> To conclude, I fully agree with Yodaiken that C has a problem, and that reliably writing C has become incredibly hard since undefined behavior is so difficult to avoid. It is certainly worth reducing the amount of things that can cause UB in C, and developing practical tools to detect more advanced kinds of UB such as strict aliasing violations.

<snip>

> However, I do not think this problem can be solved with a platform-specific interpretation of UB. That would declare all but the most basic C compilers as non-compliant. We need to find some middle ground that actually permits compilers to meaningfully optimize the code, while also enabling programmers to actually write standards-compliant programs.

[0]: https://www.ralfj.de/blog/2021/11/24/ub-necessary.html



I had the same basic thought while reading this: the only alternative here is to interpret this sort of undefined behavior in a "different" way (such as the way K&R originally interpreted it) and leave it up to the programmer to intuit what other assumptions the compiler is making on their behalf. Like... I can assume that the compiler will interpret:

    if (i++ < i)
as

    mov ax, i
    inc ax
    cmp ax, i
    jge else
... but that's still me making a (reasonable?) assumption. The C spec essentially just says "don't program that way".


Odd intuition, since you never store i back after incrementing. Also, I believe i++ might increment after the comparison rather than before, vs. ++i incrementing first


Oh wow I totally missed the i++ < i Don't mind me


++ is itself a store.


You know what I mean.


I actually don't, but maybe I just lack enough background knowledge on the discussion to understand your intent


The article talks about a similar code snippet being optimized out by the compiler because i++ can never be less than i originally was (unless you take into account the actual behavior of computers).


I think they're saying that the assembly translation doesn't store back `i`, whereas the C version does, so it's not straightforward to assume that the assembly compiled from the C won't do that.


I really wish we got

cc: warning: UB line: 123 file: foo.c


How many warnings do you want for this small function?

  void oh_the_humanity(int *ptr, int val) {
    *ptr = val + 1;
  }
Off the top of my head:

* UB: ptr may be pointing at a float variable. (It's not illegal to convert a float* to an int*; it's only UB when you actually dereference it with the wrong type.)

* UB: val + 1 may overflow.

* UB: potential data race on writing *ptr.

* UB: ptr may be a one-past-the-end-of-the-array pointer, which can be validly constructed, but may not be dereferenced.

* UB: ptr may be pointing to an object whose lifetime has expired.

* UB: ptr may be uninitialized.

* UB: val may be uninitialized.

As you can see, UB is intensely specific to the actual data values; it's not really possible to catch even a large fraction of UB statically without severe false positive rates.


Yeah, I know, I get it; it's me being wishful. But more seriously, I wish compilers could at least emit a warning when they optimize something based on UB:

  % cat foo.c
  #include <stdlib.h>
   
  int
  foo(int *bar)
  {
   int baz = *bar;
  
   if (bar == NULL) exit(2);
  
   return (baz);
  }
  % cc -O3 -Wall -Wextra -c foo.c
  % objdump -dr foo.o            
  
  foo.o: file format Mach-O 64-bit x86-64
  
  
  Disassembly of section __TEXT,__text:
  
  0000000000000000 _foo:
         0: 55                            pushq %rbp
         1: 48 89 e5                      movq %rsp, %rbp
         4: 8b 07                         movl (%rdi), %eax
         6: 5d                            popq %rbp
         7: c3                            retq
  % cc -O0 -Wall -Wextra -c foo.c
  % objdump -dr foo.o            
  
  foo.o: file format Mach-O 64-bit x86-64
  
  
  Disassembly of section __TEXT,__text:
  
  0000000000000000 _foo:
         0: 55                            pushq %rbp
         1: 48 89 e5                      movq %rsp, %rbp
         4: 48 83 ec 10                   subq $16, %rsp
         8: 48 89 7d f8                   movq %rdi, -8(%rbp)
         c: 48 8b 45 f8                   movq -8(%rbp), %rax
        10: 8b 08                         movl (%rax), %ecx
        12: 89 4d f4                      movl %ecx, -12(%rbp)
        15: 48 83 7d f8 00                cmpq $0, -8(%rbp)
        1a: 0f 85 0a 00 00 00             jne 10 <_foo+0x2a>
        20: bf 02 00 00 00                movl $2, %edi
        25: e8 00 00 00 00                callq 0 <_foo+0x2a>
    0000000000000026:  X86_64_RELOC_BRANCH _exit
        2a: 8b 45 f4                      movl -12(%rbp), %eax
        2d: 48 83 c4 10                   addq $16, %rsp
        31: 5d                            popq %rbp
        32: c3                            retq
  % 
That's very similar to something that bit me in embedded work, except it was with a pointer to a structure. The compiler realizes I've dereferenced NULL, and since that's UB anyway there's no need to do the NULL check later, and it merrily scribbles over the exception vectors or whatever.


That's a nice example. It'd definitely be nice to have a warning for this one.

Fwiw GCC does have a related warning flag (-Wnull-dereference) but I'm not sure it's made exactly for this. I believe it works based on functions being annotated for possibly returning NULL, e.g. malloc. It's also not enabled by -Wall or -W because apparently there were too many false positives: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96554

I imagine patches would be welcome. I'm guessing there are more people who like to wish for more compiler features than there are people who like to develop compilers :)


edit: Thanks, I just checked and the warning doesn't work

https://godbolt.org/z/4TP1hfx4j

But your hint found -fno-delete-null-pointer-checks which does the trick

https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

And -fno-delete-null-pointer-checks made it into LLVM too. It's good to know, but a little late for when I needed it, cheers :)

  % cc -O3 -fno-delete-null-pointer-checks -Wall -Wextra -c foo.c
  % objdump -dr foo.o                                            
  
  foo.o: file format Mach-O 64-bit x86-64
  
  
  Disassembly of section __TEXT,__text:
  
  0000000000000000 _foo:
         0: 55                            pushq %rbp
         1: 48 89 e5                      movq %rsp, %rbp
         4: 48 85 ff                      testq %rdi, %rdi
         7: 74 04                         je 4 <_foo+0xd>
         9: 8b 07                         movl (%rdi), %eax
         b: 5d                            popq %rbp
         c: c3                            retq
         d: bf 02 00 00 00                movl $2, %edi
        12: e8 00 00 00 00                callq 0 <_foo+0x17>
    0000000000000013:  X86_64_RELOC_BRANCH _exit


If you want a general approach you can turn on ubsan in trap-only mode and see what traps have ended up in your output.
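
Something along these lines, for GCC or Clang (rough sketch; exact flag spellings vary between versions, so check your compiler's docs):

  % cc -O2 -fsanitize=undefined -fsanitize-undefined-trap-on-error -c foo.c
  % objdump -d foo.o | grep ud2

Each remaining ud2 (on x86-64) marks a spot where the compiler kept a UB trap in the generated code.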


But such code might be generated by macros (or some code generator), in which case silent elimination of unnecessary code is expected and wanted behavior.


In this particular case the code is wrong and can be fixed easily by swapping the assignment and the comparison.

Likewise, code generators shouldn’t be generating this faulty code.

Raising a compiler error here is the only right thing to do.

There are of course more ambiguous examples, though obvious examples like the one above are sadly way too common.


Why can't we say that the original code is wrong then? The whole point of having something be UB rather than implementation defined is because the language committee believes that it represents a bug in your program.


(Or templates, in the C++ case. Think of all the inlined iterator functions...)


Well if we're not super worried about implementation difficulty right now:

* 1 and 4-7: Don't worry about where the pointer goes, as long as you treat it like an int in this function. If there are going to be warnings about improper pointers, they should be at the call sites.

* 2 If overflow will either trap, wrap, or behave like a bignum, then it's not the dangerous kind of UB, so no warning by default. Consider extending C so the coder can more easily control integer overflow.

* 3 Anything could race. Out of scope, don't worry about it.


You can't detect most UB at compile time. LLVM has a system to detect it at runtime. There is a significant performance penalty, but it can be useful during testing.

See https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html


It’s not actually that significant compared to something like asan. You can ship software with ubsan enabled if you want to.


Compilers will routinely warn if they can detect UB at compile time.

The problem is that except in a few trivial cases it is impossible to detect UB at compile time. Even whole program static analysis can only catch a small subset.


Yeah, it'd be nice if we could solve the halting problem.


>> We sorely lack data on how big the performance gain of exploiting UB actually is. However, I strongly doubt that the result would even come close to the most widely used compilers today—and programmers that can accept such a big performance hit would probably not use C to begin with. Certainly, any proposal for requiring compilers to curtail their exploitation of UB must come with evidence that this would even be possible while keeping C a viable language for performance-sensitive code.

There actually is data, [1] and it goes against intuition because it shows that UB makes code slower.

It turns out that when programmers are faced with the possibility of UB, they (quite rightly) try to work around it. (In my case, I implemented signed two's-complement arithmetic entirely with unsigned types. I also implemented my own array types, including bounds checks when indexing the arrays.) These workarounds make their code slower, in general.
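
(A rough sketch of that workaround: do the arithmetic in unsigned types, where wraparound is defined, and convert back at the end.)

  #include <stdint.h>

  /* Wrapping "signed" addition built from unsigned arithmetic.  The
     conversion back to int32_t is implementation-defined rather than
     undefined, and is two's complement on every platform that matters. */
  static int32_t wrapping_add(int32_t a, int32_t b) {
      return (int32_t)((uint32_t)a + (uint32_t)b);
  }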

When you think about it that way, it makes sense because programmers almost never want UB in their software, so on average, the software that people care about will work around UB and become slower.

UB in C was defensible when platforms were not homogenous and when compiler writers did not abuse the spirit of it. But nowadays, most of it is indefensible because platforms are mostly homogenous, and compiler writers have been abusing UB for "performance."

We saw the same thing happen with speculative execution in chips. For years, chip manufacturers got away with lots of tricks to increase performance. Then Spectre and Meltdown were discovered. As a result, a lot of software had to be changed, which resulted in the software slowing down.

(Yes, there are now mitigations in chips, but I think those mitigations will be successfully attacked too. I think that it will continue that way until most or all speculative execution is removed.)

Likewise, with compilers exploiting UB against the original spirit of UB, which was just to make C easy to port to any architecture, we are probably going to have a reckoning about it in the same way we did Spectre and Meltdown. In a way, we already do, but it's spread out like a thousand paper cuts. Maybe that means that it will stay invisible enough that programmers never wake up; I hope not.

tl;dr: compilers exploiting UB actually slows down good code in much the same way that Spectre and Meltdown do.

[1]: https://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_20...


That data is bullshit. It measures -fno-strict-overflow -fno-strict-aliasing. I in fact agree flipping defaults on these UBs is reasonable, but that is very far from "portable assembler" or UB-free C.

THE primary UB of C is out-of-bounds write. Any proposal for UB-free C should address how to compile out-of-bounds write.

If you insist that out-of-bounds writes should compile in the "portable assembler" way and reliably corrupt memory, instead of allowing compilers to assume they don't happen, then compilers can't allocate local variables in registers. If a variable is on the stack, an out-of-bounds write may reliably corrupt it, so how can it be in a register?
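
A minimal sketch of why (hypothetical example): if an out-of-bounds store must be assumed able to hit any stack slot, then x below cannot live only in a register:

  int f(void) {
      int a[4];
      int x = 42;   /* obvious candidate for a register */
      a[4] = 7;     /* out-of-bounds write: UB */
      return x;     /* if the OOB store must "reliably" corrupt adjacent
                       stack memory, x has to sit in memory where it can be
                       hit, so it can't stay in a register */
  }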

"Portable assembler" people invariably say "that's not what I meant!" when compiler people raise this issue, but the question is "then what do you mean?" And I am yet to see any good answer.


Could you explain your position here more fully? I'm a C programmer mostly in the "portable assembler" camp, and I don't see why insisting that out-of-bounds writes be performed prevents compilers from keeping local variables in registers. I think the difference might be your use of the word "reliably".

I don't think anyone in my camp is insisting that the results be reliable in the sense you seem to take for granted. What we want is for the compiler to attempt to do what it's instructed to do, without eliding code based on counterfactual reasoning that assumes the absence of undefined behavior. In the absence of an explicit read that is being ignored, we're generally fine with cached values of local variables being out of date.

Maybe you could give an example that makes your point more clearly, and we can see if we actually disagree?



Unfortunately, these don't help much.

I thought about responding to jcranmer's post instead of yours, but wasn't sure how to approach it. It's a good comment, and I appreciated his attempt, but I feel like he's thoroughly missing the point of the complaints about UB. The complaint isn't that too much UB exists in C, but that compiler writers seem to use the presence of UB as an excuse for being allowed to break code that has worked historically. And the complaint isn't just that the code is broken, but that no apparent care is taken to avoid doing so despite the small amount of gain. It's a worldview mismatch, and I don't know how to bridge it.

Your comment seems about the same as the one I responded to. You seem to assume that people who complain about UB in C would have a problem with keeping local variables in registers, but I've never seen anyone actually make this complaint. Take for example the Arxiv paper we are discussing: he doesn't bring this up. This makes me suspect that your mental model of the people who are complaining about UB in C is probably flawed. I understand the technical issue you are gesturing toward, I just don't see it as being something that refutes any of the issues brought up in the Arxiv paper.

My hope was that a concrete example might help to clarify, but I do realize this might not be the right forum for that.


There's not a bright line between optimization passes being aware of UB in reasonable vs. "exploitative" ways. The principled way to think about optimization is that an optimization pass can assume certain invariants about the code but must preserve other invariants through the transformation as long as the assumptions hold. I think basically any invariant you assume in a memory-unsafe language like C could be invalidated by real code.

A lot of the things people complain about are cases where the compiler got better at reasoning through valid transformations to the code, not cases where the compiler started assuming some new invariant. E.g. I'm sure that even very old compilers have had optimization passes that remove redundant NULL checks, which logically depend on dereferencing NULL pointers being undefined. But these probably were more ad hoc and could only reason about relatively simple cases and constrained scopes. Another example is integer overflow. I'd bet a lot of older compilers have loop optimizations that somehow depend on the loop counter not overflowing, or pointer addresses not wrapping around.

I think it's perfectly reasonable to say that C's definition of undefined behaviour is too expansive and too difficult to avoid in real code.

But I think that a lot of the complaints end up being equivalent to "don't depend on UB in ways that I notice in my code" or "don't depend on UB in ways that compilers from the 90s didn't". That's not something you can practically apply or verify when building a compiler because there isn't a bright line between assuming an invariant in reasonable vs unreasonable ways, unless you can define a narrow invariant that distinguishes them.


No, I don't assume people are against register allocation, but every concrete proposal I have seen kind of implies such a conclusion. I am trying to understand what people actually want, since that seems clearly different from what they say they want.

Okay let's discuss a concrete example.

    *x = 12345678;
    f();
    return *x; // can you copy propagate 12345678 to here?
f() does this:

    for (int *p = 0; p < (int *)MEMSIZE; p++)
        if (*p == 12345678)
            *p = 12345679;
That is, f scans memory for 12345678 and replaces all instances with 12345679. There is no doubt this actually works that way in assembly. Things like cheat engines do this! C compilers assume this doesn't happen, because it is UB.

Hence, a portable-assembly C compiler can't omit any load. Now I understand there is a minority of people who will answer "that's what I want!", but as with register allocation, I think people generally want this to optimize. But that necessarily implies memory search-and-replace can't compile in a portable-assembly manner.


I can't really speak to the "portable assembler" point of view here, but if I was trying to make UB less dangerous I would say that the code had better return either 12345678 or 12345679, as long as no other memory addresses have 12345678 stored in them. Or it could trap.


> I can't really speak to the "portable assembler" point of view here, but if I was trying to make UB less dangerous I would say that the code had better return either 12345678 or 12345679

If 12345678 is acceptable to you then the language specification is already doing what you want. The alternative is to require arbitrary havocs on every memory address upon any function call to a different translation unit. Nightmare.

> Or it could trap.

UBSan exists and works very well. But introducing runtime checks for all of this stuff is not acceptable in the C or C++ communities, outside of very small niches regarding cryptography, because of the incredible overhead required. Imagine if your hardware doesn't support signed integer overflow detection. Now suddenly every single arithmetic operation is followed by a branch to check for overflow in software. Slow.
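
To make the cost concrete, a software-checked signed add looks something like this (rough sketch); every add becomes an add plus a compare-and-branch:

  #include <limits.h>

  /* Returns 1 and writes the sum on success, 0 if the add would overflow. */
  int checked_add(int a, int b, int *out) {
  #if defined(__GNUC__) || defined(__clang__)
      return !__builtin_add_overflow(a, b, out);   /* GCC/Clang builtin */
  #else
      if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
          return 0;
      *out = a + b;
      return 1;
  #endif
  }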


> If 12345678 is acceptable to you then the language specification is already doing what you want.

No it's not.

The compiler is allowed to look at this code and make it print 5. Or make it delete my files. This code is undefined and the compiler could do anything without violating the C standard.


> The compiler is allowed to look at this code and make it print 5. Or make it delete my files.

It is allowed to do this but it won't. "The compiler will maximally abuse me if I put any undefined behavior in my program" is not a concern that is actually based in any reality. In the above program the compiler cannot meaningfully prove that undefined behavior exists and if it could it would yell at you and raise an error rather than filling your hard drives with pictures of cats.

This meme has done so much damage to the discussion around UB. The gcc and clang maintainers aren't licking their lips just waiting to delete hard drives whenever people dereference null.

Go compile that program. You can stick it in compiler explorer. It is going to print 12345678.


> It is allowed to do this but it won't.

It is very possible for a non-malicious compiler to end up eliminating this code as dead.

That's the biggest risk. I only mentioned "delete my files" to demonstrate how big the gap in the spec is, because you were saying the spec is already "doing what I want", which happens long before we get to what compilers will or won't do.


A programmer wrote things in a specific order for a specific reason.

Lets instead assume that the variable assignments above are to some global configuration variables and then f() also references those and the behavior of f() changes based on the previously written code.

The objections from the 'C as portable assembler' camp are:

* Re-ordering operations across context boundaries (curly braces and function calls). -- Re-ordering non-volatile stores/loads within a context is fine, and shouldn't generate warnings.

* Eliminating written instructions (not calling f()) based on optimizations. -- Modification to computed work should always generate a warning so the optimization can be applied to the source code, or bugs corrected.


> A programmer wrote things in a specific order for a specific reason.

Is it not possible that the programmer introduced a bug?

Consider the bug that caused the 911 glitch in Android phones recently. An unstable comparator was defined in a type, violating the contract that Comparable has with Java's sorting algorithms. When Java detects that this implementation violates the assumptions its sorting algorithms make, it throws an exception. Should it not do this and instead say that the programmer wrote that specific comparator on purpose and it should loop forever or produce an incorrect sort? I think most people would say "no". So why is the contract around pointer dereferencing meaningfully different?


> Modification to computed work should always generate a warning so the optimization can be applied to the source code, or bugs corrected.

This only works out in very, very limited cases. What if this opportunity only presents itself after inlining? What if it's the result of a macro? Or a C++ template?

Just because the compiler can optimize something out in one case doesn't mean you can just delete it in the code...


Globals and locals are different. All compilers will give a global a specific memory location and load and store from it. Locals by contrast can be escape analyzed.


The example didn't show where X was defined; it could be anything.


No, it could not have been, for copy propagation to be valid. It had to be a local, except under some very special conditions.


How about circumstances such as opting in to a semantically new version of C?

  #ifndef __CC_VOLATILE_FUNCTIONS
  /* No volatile function support; VOLFUNC expands to nothing */
  #define VOLFUNC
  #else
  /* This compiler supports volatile functions. Only volatile functions
     may cause external side effects without likely bugs. */
  #define VOLFUNC volatile
  #endif
https://www.postgresql.org/docs/current/xfunc-volatility.htm...

Similarly stable functions could have results cached. You might also note that PostgreSQL assumes any undeclared function is volatile.


> any concrete proposal I have seen kind of implies such conclusion.

No it does not. In your example, I personally would prefer it did not propagate the 12345678. Good grief, I wrote the deref there.

> C compilers assume this doesn't happen, because it is UB.

Incorrectly. IMHO.

> but like register allocation,

You are silently equating int x; with a memory deref. There is no need to do this.

Anyway, here is part of the rationale for C:

"Although it strove to give programmers the opportunity to write truly portable programs, the Committee did not want to force programmers into writing portably, to preclude the use of C as a ``high-level assembler'': the ability to write machine-specific code is one of the strengths of C. It is this principle which largely motivates drawing the distinction between strictly conforming program and conforming program"

http://port70.net/~nsz/c/c89/rationale/a.html#1-5

So the idea of C as a portable assembler is not some strange idea proposed by ignorant people who don't understand C, but an idea that was fundamental to C and fundamental to the people who created the ANSI/ISO C standard.

But hey, what do they know?


Thanks for making an example!

I'm against compiling this particular example to return a constant. I presumably wrote the awkward and unnatural construction return *x because I want to force a load from x at return. If I wanted to return a constant, I'd have written it differently! I'm odd, though, in that I occasionally do optimizations to the level where I intentionally need to force a reload to get the assembly that I want.

Philosophically, I think our difference might be that you want to conclude that one answer to this question directly implies that the "compiler can't omit any load", while I'd probably argue that it's actually OK for the compiler to treat cases differently based on the apparent intent of the programmer. Or maybe it's OK to treat things differently if f() can be analyzed by the compiler than if it's in a shared library.

It would be interesting to see whether your prediction holds: do a majority actually want to return a constant here? My instinct is that C programmers who complain about the compiler treatment of UB behavior will agree with me, but that C++ programmers dependent on optimizations of third party templates might be more likely to agree with you.


Oh, so you are in the "that's what I want!" camp. But I am pretty sure you are in the minority, or at the very least an economic minority. The slowdown implied by this semantics is large, and easily costs millions of dollars.

> while I'd probably argue that it's actually OK for the compiler to treat cases differently based on the apparent intent of the programmer.

This is actually what I am looking for, i.e. an answer to "then what do you mean?". The standard should define how to divine the apparent intent of the programmer, so that compilers can do the divination consistently. So far, proposals have been lacking in detailed instructions for how to do this divination.


> and easily costs millions of dollars

Looks like UB bugs can cost more. It's a new age of UB sanitizers as a reaction to a clear problem with UB.


Bugs based on optimizations that compilers make based on assumptions enabled by undefined behavior (like the null check issue from 2009 in the Linux kernel) actually don't cost very much. They get a disproportionate amount of scrutiny relative to their importance.


My experience with a lot of the UB-is-bad crowd is that they don't have much of an appreciation for semantics in general. That is to say, they tend to react to particular compiler behavior, and they don't have any suggestions for how to rectify the situation in a way that preserves consistent semantics. And when you try to pin down the semantics, you usually end up with a compiler that has to "do what I mean, not what I said."

A salient example that people often try on me, that I don't find persuasive, is the null pointer check:

  void foo(int *x) {
    int val = *x;
    if (!x) return;
    /* do real stuff */
  }
"The compiler is stupid for eliminating that null check!" "So you want the code to crash because of a null pointer dereference then." "No, no, no! Do the check first, and dereference only when it's not null!" "That's not what you said..."


> "No, no, no! Do the check first, and dereference only when it's not null!"

I don't think I've heard anyone express this preference. If they did, I'd agree with you that they are being unreasonable. My impression is that practically everyone on the "portable assembler" team would think it's perfectly reasonable to attempt the read, take the SIGSEGV if x is NULL, and crash if it's not handled. Most would also be happy skipping the read if the value can be shown to be unused. None would be happy with skipping the conditional return.

Where it gets thornier is when there is "time travel" involved. What if the unprotected read occurs later, and instead of just returning on NULL, we want to log the error and fall through:

  void foo(int *x) {
    if (!x) /* log error */;
    int val = *x;
    return;
  }
Is it reasonable to have the later unconditional read of x cause the compiler to skip the earlier check of whether x is NULL? In the absence of an exit() or return in the error handler, I think it would be legal for the compiler to skip both the error logging and the unused load and silently return, but I don't want the compiler to do this. I want it to log the error I asked it to (and then optionally crash) so I can identify the problem. Alternatively, I'd probably be happy with some compile time warning that tells me what I did wrong.


How do you feel about:

  void foo1(int *p) {
    *p = 7;
    printf("%d\n", *p);
    free(p);
  }
May the compiler replace that with puts("7") and free()? Recall that free() is a no-op when the pointer is NULL.

Are you arguing that each function may reduce pointer accesses down to one but not down to zero? What about

  int foo2() {
    int *p = malloc(400);
    *p = 0;
    /* ... code here was inlined then optimized away ... */
    int ret = 7;
    free(p);
    return ret;
  }
? May we remove '*p = 0;', whether we remove the malloc+free or not?

Generally speaking the compiler tries not to reason about the whole function but just to look at the smallest possible collection of instructions, like how add(x, x) can be pattern matched to mul(x, 2) and so on. Having to reduce to one but not zero memory accesses per function is not easily compatible with that model. We would have to model branches that make the access conditional, the length of accesses may differ (what if 'p' is cast to char* and loaded), read vs. write, multiple accesses with different alignment, and maybe other things I haven't considered.

Both gcc and clang provide -fsanitize=null, which checks all pointer accesses for null-ness before performing them. These can be surprising: your code (libraries you didn't write, headers you included) may be dereferencing NULL and relying on the optimizer to remove the invalid access. IMO there should be a "pointer is null-checked after use" warning; it's a clear example where the programmer wrote something other than what they intended.


Quick responses before I go to bed:

I think compilers should emit code that performs writes to memory when the program asks for them, even if the pointer is then freed. Not doing so too easily leads to security issues. If the pointer turns out to be NULL at runtime, then let it crash. Using 'volatile' often works as a workaround, though.

I have no strong opinion on substituting puts() for printf(). I think incorporating the standard library into the spec was unhelpful, but isn't a major issue either way. It occasionally makes tracing slightly more difficult, but presumably helps with performance enough to offset this, and it's never bitten me as badly as the UB optimizations have.


The existence of volatile is a strong argument that it was never the intention that C should map every pointer dereference at the source level to loads and stores at the asm level.


Not at all.

It was a workaround (unreliable at that, IIRC) for compilers getting more aggressive in ignoring the intentions of C, including the early standards.

"The Committee kept as a major goal to preserve the traditional spirit of C. There are many facets of the spirit of C, but the essence is a community sentiment of the underlying principles upon which the C language is based. Some of the facets of the spirit of C can be summarized in phrases like

- Trust the programmer.

- Don't prevent the programmer from doing what needs to be done. ..."

TRUST THE PROGRAMMER.

http://port70.net/~nsz/c/c89/rationale/a.html#1-5


> It was a workaround (unreliable at that, IIRC) for compilers getting more aggressive in ignoring the intentions of C, including the early standards.

But volatile was in C89? So how can it be a response to compilers "ignoring the intentions of C" in the early standards?

If anything, an example in the C89 standard makes it very explicit that an implementation may do exactly what gpderetta said (emphasis added):

> An implementation might define a one-to-one correspondence between abstract and actual semantics: at every sequence point, the values of the actual objects would agree with those specified by the abstract semantics. The keyword volatile would then be redundant.

> Alternatively, an implementation might perform various optimizations within each translation unit, such that the actual semantics would agree with the abstract semantics only when making function calls across translation unit boundaries. In such an implementation, at the time of each function entry and function return where the calling function and the called function are in different translation units, the values of all externally linked objects and of all objects accessible via pointers therein would agree with the abstract semantics. Furthermore, at the time of each such function entry the values of the parameters of the called function and of all objects accessible via pointers therein would agree with the abstract semantics. In this type of implementation, objects referred to by interrupt service routines activated by the signal function would require explicit specification of volatile storage, as well as other implementation-defined restrictions.

[0]: https://port70.net/~nsz/c/c89/c89-draft.html


Going from trust the programmer to requiring volatile semantics for each access is quite a stretch. I would say you are part of an extremely small minority.


Don't compilers already have ways to mark variables and dereferences so as to say 'I really want access to this value to happen'?

They are free to optimize away access to any result that is not used based on code elimination, and these annotations limit what can be eliminated.

However, basing code elimination on UB, as pointed out earlier, would eliminate essentially all code if you were rigorous enough, because you basically cannot eliminate UB from C code, which is not in any way useful.


> Don't compilers already have ways to mark variables and dereferences in a way to say 'I really want access to this value happen'?

The standard defines it, it's `volatile` I believe. But it does not help with the examples above as far as I understand (removing the log, removing the early return, time-travel...).


I believe it completely solves the question

> ? May we remove '*p = 0;', whether we remove the malloc+free or not?

Sure, it does not solve the question of when arbitrarily removing NULL pointer checks is OK.

It is true that when the compiler is inlining code or expanding a macro, it may encounter a NULL check that is spurious in environments that do not map page 0, based on the observation that the pointer was dereferenced previously.

And this assumption is incorrect in environments that do map page 0, causing wrong code generation.


On a lot of systems load from address zero doesn't segfault, I'm fine with the cpu loading it. I'm disappointed that the new compiler removed the check someone wrote a dozen years ago to prevent the cpu from overwriting something important though.


The null pointer need not map to address zero, precisely to cater to this sort of corner case.


In the case where loads from address zero are legal, the dereference-means-non-null-pointer assumption is disabled.



That case wasn't one of the cases where null pointers are valid memory, judging from the commit message. (There was a case where gcc had a bug where it didn't disable that check properly, but this doesn't appear to be that case.)


Actually, yes, I do want that code to crash if x is NULL.


Then you should logically be OK with deleting the null pointer check, as that check is unreachable.


Are you familiar with this classic example? http://blog.llvm.org/2011/05/what-every-c-programmer-should-...

The problem (if I understand it right) is that "Redundant Null Check Elimination" might be run first, and will get rid of the safety return. But then the "Dead Code Elimination" can be run, which gets rid of the unused read, and thus removes the crash. Which means that rather than being logically equivalent, you can end up with a situation where /* do real stuff */ (aka /* launch missiles */) can be run despite the programmer's clear intentions.


Right, each optimization is fine on its own but the combination is dangerous.


Not really. My understanding of the opinion (which I don't share, FWIW, and you probably know this) is that the null dereference should not be deleted, and it should be marked as an undeleteable trap similar to what an out-of-bounds access might be in Rust. That is, not unreachable in the sense of "code can never reach here", but unreachable in the sense that if it is hit then no other code executes.


The compiler can't know if page 0 is mapped.


Nope.


I would expect the code to crash when passed a null pointer, absolutely! And the if statement is there to let me recover after using "continue" in the debugger. And if I run on an ISA without protected memory, that load will not actually crash, and I'm OK with that. That's a level of UB differing-behavior (more like ID, really) that's reasonable.

I know of no experienced C programmer who would expect the compiler to re-order the statements. That sounds very much like a strawman argument to me!

I'd also be OK with a compiler that emitted a warning: "This comparison looks dumb because the rules say it should do nothing." Emitting that warning is helpful to me, the programmer, to understand where I may have made a mistake. Silently hiding the mistake and cutting out parts of code I wrote, is not generally helpful, even if there exist some benchmark where some inner loop will gain 0.01% of performance if you do so.

After all: The end goal is to produce and operate programs that run reliably and robustly with good performance, with minimum programmer cost. Saying that the possible performance improvements of code that nobody will run because it's buggy absolutely trump every other concern in software development, would be a statement I don't think reflects the needs of the software development profession.


> ...compiler writers seem to use the presence of UB as an excuse for being allowed to break code that has worked historically.

I may be dense, but I just don't understand this. Programmer compiles old C with a new compiler with certain optimizations set to on, and then is disappointed when the new compiler optimizes UB into some unsafe UB goofiness. I think expecting different behavior re: UB is folly. The language is explicit that this is the greyest of grey areas.

An aside -- what is so funny to me, not re: you, but re: some C programmers and Rust, is I've heard, just this week, 1) "Why do you need Rust? Unsafe memory behavior -- that's actually the programmers fault", 2) "Rust doesn't have a spec, a language must have a spec, a spec is the only thing between us and anarchy" (yet C has a spec and this is the suck), and 3) "Rust is too fast moving to use for anything because I can't compile something I wrote 3 months ago on a new compiler and have it work" (which seems, mostly, apocryphal.)

> It's a worldview mismatch, and I don't know how to bridge it.

I made the aside above only because -- maybe C and its values are the problem.


The concept of "friendly C" is not new, and this is one of the straw men that it knocked down, because it is actually possible to define "what I mean."

It's totally OK to say that an out-of-bounds write may or may not be detectable by reading any particular other local or global variable or heap value. That doesn't in turn translate to compilers being allowed to turn a bug in a SPECint kernel into an empty infinite loop (a case that happened!)


So what is the definition? That's the entire issue. If the definition is -fno-strict-overflow -fno-strict-aliasing, we are in violent agreement. We should consider making it standard. If the definition is for C to be free of UB, we are in violent disagreement, that would completely ruin C's performance.


Not surprising; that is the only way it managed to win the performance game against safer systems programming languages.

"Oh, it was quite a while ago. I kind of stopped when C came out. That was a big blow. We were making so much good progress on optimizations and transformations. We were getting rid of just one nice problem after another. When C came out, at one of the SIGPLAN compiler conferences, there was a debate between Steve Johnson from Bell Labs, who was supporting C, and one of our people, Bill Harrison, who was working on a project that I had at that time supporting automatic optimization...The nubbin of the debate was Steve's defense of not having to build optimizers anymore because the programmer would take care of it. That it was really a programmer's issue.... Seibel: Do you think C is a reasonable language if they had restricted its use to operating-system kernels? Allen: Oh, yeah. That would have been fine. And, in fact, you need to have something like that, something where experts can really fine-tune without big bottlenecks because those are key problems to solve. By 1960, we had a long list of amazing languages: Lisp, APL, Fortran, COBOL, Algol 60. These are higher-level than C. We have seriously regressed, since C developed. C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine. This is one of the reasons compilers are ... basically not taught much anymore in the colleges and universities."

-- Fran Allen interview, Excerpted from: Peter Seibel. Coders at Work: Reflections on the Craft of Programming

So it was time to embrace UB as a means to recover ground, and here we are, back at hardware memory tagging and sanitizers as the ultimate mitigations for a patient that will never recover from its bad habits.


> and reliably corrupt memory

Who said memory corruption had to occur "reliably"??

"Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results"

> compilers can't allocate local variables in register.

Why not?

> If it's on stack, out-of-bounds write may reliably corrupt stack, so how can it be on register?

"Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results"

> And I am yet to see any good answer.

What would qualify an answer as "good" IYHO?


>> and reliably corrupt memory

> Who said memory corruption had to occur "reliably"??

Usually when I write code that corrupts memory, it reliably corrupts memory in an unreliable way.


What is that word salad supposed to mean?


Consistent inconsistency or inconsistent consistency. Similar, but slightly different.


I want a safe C. If that means I have to deal with fewer available optimizations, I'm all for that! That's why I write my two's-complement arithmetic with unsigned types and why I actually have bounds checks on the array accesses in my code.

That means I would gladly take a compiler that could not allocate local variables in registers.

Or even better, just do a bounds check. The standard does not preclude storing a raw pointer and a length together. Why couldn't the compiler do something like that and do a bounds check?

(Yes, the compiler would have to use its own version of libc to make that happen, but it could.)

In other words, I want correct code first, performant code second. I'll take the performance hit for correct code.

People claim that exploiting UB gives 1-2% speedup and that that's important. What they conveniently leave out is that you only get that speedup when your code has UB, which means that you only get that speedup when your code has a bug in it.

Fast wrong code is bad. Very bad. Correct code, even if slower, is an absolute must.
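For concreteness, here is a minimal sketch of the unsigned-arithmetic-plus-bounds-checks approach I mean (simplified, not my real code): arithmetic goes through unsigned types, where wraparound is defined, and array accesses go through an explicit check.

    #include <stdint.h>
    #include <stdlib.h>

    /* "Signed" add with defined wraparound: the unsigned add wraps modulo
       2^32, and converting back to int32_t is implementation-defined on
       today's two's-complement targets, never UB. */
    static int32_t wrap_add(int32_t a, int32_t b) {
        return (int32_t)((uint32_t)a + (uint32_t)b);
    }

    /* Bounds-checked read: abort instead of reading out of bounds. */
    static int32_t checked_get(const int32_t *arr, size_t len, size_t i) {
        if (i >= len) abort();
        return arr[i];
    }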


> People claim that exploiting UB gives 1-2% speedup and that that's important. What they conveniently leave out is that you only get that speedup when your code has UB, which means that you only get that speedup when your code has a bug in it.

Assuming there is no UB and optimizing under that assumption gives a speedup. You get the speedup whether that assumption is wrong or not. If that assumption is wrong, your program might also blow up in hilarious ways. But in no way does making the assumption that your code has no undefined behavior rely on your code having undefined behavior.
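For a concrete (textbook, not mine) illustration of the kind of assumption involved: because overflow of the signed `i` below is UB, the compiler may assume the loop terminates and, for non-negative n, runs exactly n+1 times, which enables unrolling and vectorization. If wrapping were defined (as with -fwrapv) and n could be INT_MAX, the loop might be infinite, and those transforms get harder.

    void fill(float *a, int n) {
        for (int i = 0; i <= n; ++i)   /* ++i assumed never to overflow */
            a[i] = 1.0f;
    }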


> Assuming there is no UB and optimizing under that assumption gives a speedup.

Sure, but to make sure you do not have any UB, at least for things like signed arithmetic, you basically have to either give up signed arithmetic, which is what I've done by using only unsigned, or ensure your arithmetic is always within range, which requires formal methods. Otherwise, you have UB, and that 1-2% speedup is because of that UB. If you do what I do, you get no speedup. If you use formal methods to ensure ranges, you do, but at an enormous cost.

> But in no way does making the assumption that your code has no undefined behavior rely on your code having undefined behavior.

But it does. Because the optimizations that assume UB only kick in when UB could be present. And if UB could be present, it most likely is.

Once again, those optimizations will never kick in on my code that uses purely unsigned arithmetic. So yes, those optimizations do require UB.


Is it formal methods to conclude that the vast majority of arithmetic, which for the software I typically write consists of iterators and buffer offsets dealing with fixed size buffers, can never overflow an int? Is it formal methods when I conclude that the 12-bit output from an ADC cannot overflow a 32-bit int when squared? Is it formal methods to conclude that a system with 256 kilobytes of RAM is not going to have more than 2^31 objects in memory? I guess I do formal methods then, at an enormous cost. Sometimes I have runtime checks with early bail outs that allow code later down the stream to be optimized.


There are always exceptions, of course.

But then again, there's also the Ariane 5.


> I would gladly take a compiler that could not allocate local variables in registers

That sounds rather extreme. I imagine it would be ruinous to performance. To my knowledge the various safe alternatives to C have no trouble with register allocation. Safe Rust and Ada SPARK, for instance, or even Java.

> The standard does not preclude storing a raw pointer and a length together. Why couldn't the compiler do something like that and do a bounds check?

That technique is called fat pointers. It's been well researched. Walter Bright, who just posted a comment elsewhere in this thread, has a blog post on it. [0] I imagine ABI incompatibility is one of the main reasons it hasn't caught on.

C++ compilers do something similar, storing array lengths at an address like myArray[-1]. They need to store array lengths at runtime because of the delete[] operator (unless they can optimise it away, of course). C++ still allows all sorts of pointer arithmetic though, so it wouldn't be easy for a compiler to offer Java-like guarantees against out-of-bounds access. Doing so takes a sophisticated tool like Valgrind rather than just another flag to gcc.

[0] https://digitalmars.com/articles/C-biggest-mistake.html
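The core of the fat-pointer idea is roughly this (a sketch with my own naming, not the design from the article): the pointer carries its length, and every access checks it.

    #include <stddef.h>
    #include <stdlib.h>

    typedef struct {
        int    *ptr;
        size_t  len;
    } fat_ptr;

    static int fat_read(fat_ptr p, size_t i) {
        if (i >= p.len) abort();   /* out-of-bounds access trapped at run time */
        return p.ptr[i];
    }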


Why not compile at -O0? That gives you everything that you want today, including no register allocation.


-O0 doesn't change what the compiler code generation is allowed to do around UB. Optimizing tends to expose UB by producing undesired operation, but the optimizer is only allowed to make transformations that result in the code still complying with the standard. So code generation without the optimizer could still produce undesired operation, and turning the optimizer off is no guarantee that it'll do what you want.


If you want code to be run line by line as written in source file I think you want -Og. O0 means optimize compilation speed.

Wouldn’t recommend relying on either of them for making your build correct. It breeds the false accusation that it’s the optimizations that break your code, when the code is in fact already broken and might fail for other reasons.


At least for GCC/Clang this isn't what O0 means. Excerpt from the GCC manual:

> Most optimizations are completely disabled at -O0 or if an -O level is not set on the command line, even if individual optimization flags are specified.

And

> Optimize debugging experience. -Og should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation and a good debugging experience. It is a better choice than -O0 for producing debuggable code because some compiler passes that collect debug information are disabled at -O0.

(Og isn't really implemented in clang yet, but the mindset is the same)


Yes, this is a consistent position, but bounds checking all indexing and making pointers larger by including a length will cause a slowdown much larger than 1-2%. My estimate is at least a 10% slowdown. So citing a 1-2% slowdown is misleading.


I agree 1-2% might be misleading, but Dr. John Regehr puts overflow checking overhead at 5%. [1]

I'll take 5% or even 10% for correctness because, again, fast and wrong is very bad.

[1]: https://blog.regehr.org/archives/1154



> the original spirit of UB, which was just to make C easy to port to any architecture

I thought the original spirit of undefined behavior was 'we have competing compiler implementations and don't have the political will to pick one'.


I don't think compiler implementations are responsible for the standard's refusal to endorse two's complement (which, if I understand correctly, is the original root cause of signed overflow being UB).


I think those are one and the same; I was just more polite about it, I guess. I think yours is closer to the truth than mine.


I pretty strongly disagree with these quotes you have here. You don't need UB to do optimization; in fact, because of UB in C, many forms of optimization are impossible because of aliasing uncertainty. If things are defined to not alias, then even though everything is fully defined there is a lot more optimization that can be performed, by not reloading variables from memory and instead using local cached-in-register copies of variables. This is why naive Rust can in some cases be faster than naive C (without use of the restrict keyword), and this will only improve with time as compilers better support it.


> in fact because of UB in C many forms of optimization are impossible in C because of aliasing uncertainty.

This doesn't seem right? Specific types of aliasing make optimization more difficult/impossible because they are specifically not UB. Rust's aliasing optimizations can (theoretically?) occur specifically because same-type aliasing is UB in Rust. I'm not sure how "UB prevents optimization" falls out of that?


> If things are defined to not alias

... What do you call it then when you put aliasing pointers into pointer variables "defined not to alias"? Can you explain how this reduces UB?


I think they're trying to say that if your language prevents aliasing (i.e. pointers by definition do not alias and it is impossible to make them alias), then the compiler doesn't need to worry about it. C doesn't prevent aliasing, therefore the compiler has to worry about every case of aliasing that isn't undefined.

But it's not the UB here that prevents optimization, it's precisely the fact that aliasing is (sometimes) defined. If it were always UB, the compiler wouldn't ever have to worry about it... but that'd require some interesting changes to the language to keep it usable. Definitely not backwards compatible changes.
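A minimal sketch of the optimization at stake (my example, names made up). In fill(), dst and count might alias, so a conforming compiler has to reload *count on every iteration; with restrict, the programmer's promise that they don't alias, it can keep *count in a register:

    void fill(int *dst, const int *count) {
        for (int i = 0; i < *count; i++)   /* *count reloaded each iteration */
            dst[i] = 0;                    /* this store could modify *count */
    }

    void fill_restrict(int *restrict dst, const int *restrict count) {
        for (int i = 0; i < *count; i++)   /* *count can be hoisted to a register */
            dst[i] = 0;
    }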


> pointers by definition do not alias and it is impossible to make them alias

This is equivalent to saying that pointer arithmetic is disallowed. Pointers are by their nature in the C virtual machine offsets into a linear memory space, so for any two pointers, x and y, there exists a c such that (ptr_t)x+c == (ptr_t)y, and thus there can always be aliasing.


> Pointers are by their nature in the C virtual machine offsets into a linear memory space

Historically, many platforms - such as Multics or Windows 3.x - didn’t have a linear memory space, they had some kind of memory segmentation. The industry has largely moved away from that towards the flat address space model. Go back to the 1980s, it was still a much bigger thing, and people used C on those platforms, and the standard was written to support them. The actual detailed control of memory segmentation is inherently non-portable so cannot be addressed by the standard, but the standard defines pointer arithmetic in such a way to support those platforms - pointer arithmetic on unrelated pointers is undefined, because if the pointers belong to different memory segments the results can be meaningless and useless.


Even in cases of segmented memory architectures there is still a requirement that a void pointer be castable (reversibly) to an integral type (7.18.1.4 in C99, 3.3.4 in C90, the first ISO standard).


This is not true in either of those C versions. C99§7.18.1.4 describes (u)intptr_t as optional, which (per §7.18) means that <stdint.h> need not provide those typedefs if the (compiler) implementation doesn't provide an integer type that allows reversible casting from a void pointer. Similarly, it's not clear in C90§3.3.4 that the implementation has to implement these casts in a reversible manner, although that is the obvious way of implementing it.

That being said, I can't think of an implementation that didn't support this, even if they did have to pack segment and offset information into a larger integer.
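A tiny illustration of the round trip in question, assuming the implementation provides uintptr_t at all:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        int x = 42;
        void *p = &x;
        uintptr_t bits = (uintptr_t)p;   /* pointer -> integer */
        void *q = (void *)bits;          /* integer -> pointer */
        printf("%d\n", p == q);          /* prints 1 on conforming implementations */
        return 0;
    }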


It wouldn’t work on capability-based architectures where pointer validity is enforced by tag bits. Cast the pointer to an integer, lose the tag; cast it back, tag is missing and pointer is invalid (except for privileged system software which has the authority to enable the tag bit on an arbitrary pointer.)

Do such architectures still exist? Do/did they support C? Well, 128-bit MI pointers on IBM i (fka AS/400) are like this - a hardware tag bit protects them against forgery - and ILE C lets you manipulate such pointers (it calls them “system pointers”, _SYSPTR), so that would be a real world example of a pointer in C which can be cast to an integer but cannot be cast back. (IBM i also has 64-bit pointers which aren’t capabilities and hence aren’t tag-protected and can be cast to/from integers - but they don’t point into the main operating system address space, which is a single-level store single address space shared by all non-Unix processes, they only point into per-process private address spaces, so-called “teraspaces”.)

I think some UB in C is motivated by allowing C to be used on these kinds of architectures, even if they are now exceptionally rare. When C was initially being standardised in the 1980s, many people thought these kinds of architectures were the future, I think they were surprised by the fact they’ve never gone mainstream


ARM Morello (on the front page earlier this week: https://news.ycombinator.com/item?id=30007474) is a capability-based architecture, with 129-bit pointers. Compilers for it provide a uintptr_t that is appropriate, but it is far stricter about the kinds of operations that can be done in the reverse direction.


Does that matter? If you change the integer then you are not allowed to cast the altered value to a pointer.

And a compiler could enforce this if it really wanted to.

When you only have to worry about pointer arithmetic on actual pointers, it's straightforward to make sure they never alias.


Actually no, the C abstract machine is not contiguous. If x and y point inside distinct objects that are not (even recursively) part of the same object, there is no valid c that you can add to x to reach y.

edit: allowing that would prevent even basic optimizations like register allocation.

edit: s/virtual/abstract/


The C abstract machine requires reversible transformations from pointer to integral types (7.18.1.4 in C99, 3.3.4 in C90, the first ISO standard).

Practically speaking modern devices are in fact mostly flat, so you can of course do this, although you do brush up against undefined behavior to get there.


Reversible transformations don't imply a flat address space. All it means is that there's an integral type with enough bits to record any address in them.


Pointer provenance is more complex and subtle.


What is the C virtual machine? I thought there wasn't one


They mean the abstract machine, in terms of which the semantics are defined.


See also decades of Fortran performance dominance, only overcome by throwing a gazillion C programmers at the problem.


In theory only, because those gazillion C programmers tend to misuse restrict, so in the end it mostly ends up in flames.


but doesn't Fortran get away with it exactly because argument aliasing is UB? You can use restrict to get Fortran semantics.


Actually yeah, you are right. I somehow misread it as a defense of UB because of the last sentence. A total brain-fart on my part, sorry.


I'm afraid the author, as usual, confuses the cases where UB produces strange results with the cases where the compiler actively and deliberately exacerbates the results. He also claims gcc's interpretation of UB provides a lot of performance. Ironically, gcc and clang already experiment with different interpretations of UB, such as signed integer overflow and implicit initialization.


Can you elaborate on the signed integer overflow and the implicit initialization differences you're referring to?



But these are options; it's not a big deal to me that a compiler offers special options for special use cases. It's not clear to me whether you are saying that the *default* for clang and GCC differs. Aren't they both using `-fno-wrapv` by default?


These options provide a different interpretation of UB like signed integer overflow and implicit initialization. It's in reference to Ralf's blog post:

>I honestly think trying to write a highly optimizing compiler based on a different interpretation of UB would be a worthwhile experiment. We sorely lack data on how big the performance gain of exploiting UB actually is. However, I strongly doubt that the result would even come close to the most widely used compilers today—and programmers that can accept such a big performance hit would probably not use C to begin with. Certainly, any proposal for requiring compilers to curtail their exploitation of UB must come with evidence that this would even be possible while keeping C a viable language for performance-sensitive code.

He doesn't seem to know that such interpretations are already implemented by widely used C compilers.


He doesn't really address any of the claims made, and makes some startling claims himself. For example:

> This example demonstrates that even ICC with -O1 already requires unrestricted UB.

The example demonstrates nothing of the sort. It demonstrates that ICC uses unrestricted undefined behaviour, not that this is required in any way shape or form. (The only way the word "requires" is reasonable here is that the behavior seen "requires" use of this kind of UB to be present. But that's something very different, and doesn't match with the rest of his use).

> writing C has become incredibly hard since undefined behavior is so difficult to avoid

No, it has become difficult because compilers exploit UB in insane ways. The platform specific UB that he claims is "not an option" is, incidentally, exactly how UB is defined in the standard:

Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

This was made non-binding in later versions of the standard, so optimiser engineers act as if these words don't exist.

And of course there is no reason given for this interpretation being "not an option". Except for "but I wanna". Yes, it's not an option if you really want to exploit UB in the insane ways that compilers these days want to exploit it.

But that's not necessary in any way shape or form.

> That would declare all but the most basic C compilers as non-compliant.

Yes, because by any sane interpretation of the standard they are non-compliant.

On a more general note, I find this idea that things are necessary because I want to do them really bizarre.


> This was made non-binding in later versions of the standard, so optimiser engineers act as if these words don't exist.

This is reminiscent of the argument in "How One Word Broke C" [0, HN discussion at 1]. I'm not particularly convinced this argument is correct. In my opinion, it's "ignoring the situation completely" that's the phrase of interest; after all, what is assuming that UB cannot occur but "ignoring the situation completely"?

I'm a nobody, though, so take that with an appropriate grain of salt.

[0]: https://news.quelsolaar.com/2020/03/16/how-one-word-broke-c/ (currently broken; archive at https://web.archive.org/web/20210307213745/https://news.quel...

[1]: https://news.ycombinator.com/item?id=22589657


Apologies, but "ignoring the situation completely" is the exact opposite of "assuming the situation cannot occur and acting on that assumption".


I'm not sure how "acting on an assumption that the situation cannot occur" is distinguishable from "choosing to ignore situations completely whenever they come up". The former is a blanket description of how you treat individual instances.

For example:

    int f(int* i) {
        int val = *i;
        if (i == NULL) { return 0; }
        return val;
    }
I submit that there are two situations here:

1. i is NULL. Program flow will be caught by the NULL check and the return value is 0.

2. i is not NULL. The NULL check cannot be hit, and the return value is val.

As allowed by the standard, I'll just ignore the situation with UB, leaving

> i is not NULL. The NULL check cannot be hit, and the return value is val.

Alternatively, I can assume that UB cannot occur, which eliminates option 1, leaving

> i is not NULL. The NULL check cannot be hit, and the return value is val.

You get the same result either way.

And that gets to the root of the problem: What exactly does "ignoring the situation completely" mean? In particular, what is the scope of a "situation"?


Not sure what such an absurd code example is supposed to show.

However, ignoring the situation completely in this case is emitting the code as written. This is not hard at all, despite all the mental gymnastics expended to pretend that it is hard.

That is the compiler's job: emit the code that the programmer wrote. Even if that code is stupid, as in this case. It is not the compiler's job to second guess the programmer and generate the code that it believes the programmer meant to write.

In this case:

   1. Dereference i.
   2. Check i.
   3. If i is null return 0.
   4. Return the value from step 1.
If you want, I can also write it down as assembly code.

Again, this isn't hard.

"Ignoring the situation completely" means ignoring the fact that this is UB and just carrying on.


> Not sure what such an absurd code example is supposed to show.

I had a few things in mind:

1. Highlight that it's not the wording change from "permissible" to "possible" that enables UB-based optimization - it's the interpretation of one of the behaviors the standard lists

2. It's the vagueness of "the situation" that's at the heart of the issue.

3. "Ignoring the situation completely" can produce the same result as "assuming UB never happens", contrary to your assertion (albeit subject to an interpretation that you disagree with)

(And absurd or not, similar code does show up in real life [0]. It's part of why -fno-delete-null-pointer-checks exists, after all).

> However, ignoring the situation completely in this case is emitting the code as written.

> "Ignoring the situation completely" means ignoring the fact that this is UB and just carrying on.

Wouldn't "emitting the code as written" or "ignoring the fact that this is UB and just carrying on" fall under the "to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message)"? If it does, why have the "ignoring the situation completely with unpredictable results" wording in the first place?

I'm not really convinced your definition of "ignoring the situation completely" is the only valid definition, either. Why wouldn't the standard's wording allow the interpretation in the example?

> That is the compiler's job: emit the code that the programmer wrote.

Why bother with optimizations, then?

[0]: https://lwn.net/Articles/342330/


>> That is the compiler's job: emit the code that the programmer wrote.

> Why bother with optimizations, then?

Why not?

If you can optimise without compromising the intended semantics of the code, go ahead. If you cannot, do not.

Note: you will have to apply judgement, because the C standard in particular happens to be incomplete.


> Why not?

Because "That is the compiler's job: emit the code that the programmer wrote", and optimizing means the compiler is allowed to emit code the programmer did not write.

> If you can optimise without compromising the intended semantics of the code, go ahead.

And how does the compiler know what the "intended semantics" of the code are, if it isn't precisely what the programmer wrote?

> you will have to apply judgement

How is that compatible with "It is not the compiler's job to second guess the programmer and generate the code that it believes the programmer meant to write"? Applying judgement to guess what the programmer intended sounds exactly like "generating the code that [the compiler] believes the programmer meant to write".


> It is not the compiler's job to second guess the programmer and generate the code that it believes the programmer meant to write.

Not if the compiler is run at some (hopefully available) "absolute vanilla", translate-this-code-literally-into-machine-language-and-do-absolutely-nothing-else settings, no. But if you add some optimise-this or optimise-that or optimise-everything switch? Then second-guessing the programmer and generating the code it believes the programmer meant to write (if only they knew all these nifty optimisation techniques) is exactly the compiler's job.

And isn't that the subject under discussion here; Undefined Behaviour and compiler optimisation?

> "Ignoring the situation completely" means ignoring the fact that this is UB and just carrying on.

Q: How?

A: In an undefined way. (Right, since this is "Undefined Behaviour"?)

So... Whatever the heck happens, happens. It's, like, not defined. If you want it defined (as "Do exactly what I wrote anyway!"), then don't be so hung up on the Undefined Behaviour bit; just switch off all the optimisation bits in stead.


> Not if the compiler is run at some (hopefully available) "absolute vanilla"

Not this straw man again, please!

You can optimise, as long as you keep the semantics of the code.

When what the programmer wrote conflicts with an optimisation the compiler would like to perform:

"- Trust the programmer."

http://port70.net/~nsz/c/c89/rationale/a.html

So if the compiler wanted to put a 4 element array in registers as an optimisation, but the programmer wrote code that accesses element 6, guess what? You can't do that optimisation!

[You can also emit a warning/error if you want]

Also, if your fantastic optimisation breaks tons of existing code, it's no good.

"Existing code is important, existing implementations are not. A large body of C code exists of considerable commercial value. "

http://port70.net/~nsz/c/c89/rationale/a.html

>> "Ignoring the situation completely" means ignoring the fact that this is UB and just carrying on. >Q: How? >A: In an undefined way. (Right, since this is "Undefined Behaviour"?)

Nope. There are bounds. Again, this isn't hard, it says so right there.


> You can optimise, as long as you keep the semantics of the code.

In order to do this, we need to define the semantics. Actually doing this is a much harder exercise than you might think, and largely involves creating a PDP-11 emulator since modern machines simply do not behave like the systems that C exposes. We don't even have a flat buffer of memory to address.

Things get way way worse when you start considering nontraditional architectures and suddenly the only way to match the proposed semantics is to introduce a huge amount of software overhead. For example, we might decide to define signed integer overflow as the normal thing that exists on x86 machines. Now when compiling to a weird architecture that does not have hardware overflow detection you need to introduce software checks at every arithmetic operation so you can match the desired semantics appropriately.


Sorry, I don't understand. It seems to me that current optimizing compilers are fully compliant with the option of "ignoring the situation completely with unpredictable results". That's how UB-exploiting optimizations work: the compiler ignores the possibility that the erroneous case can ever happen and takes into account only the valid states.


Apologies, but "ignoring the situation completely with unpredictable results" is the exact opposite of "assuming the situation cannot occur and acting on that assumption".

If my program attempts to write outside the bounds of declared array, ignoring the situation (that this is UB) is letting that write happen, and letting the chips fall where they might.

How is assuming it cannot/must not happen, and then optimising it away because it did happen "ignoring the situation"??


So a compiler that is allowed to ignore a situation would still be required to generate code for that situation? I don't even know how that would be possible.


It ignores the fact that this is UB.

It is not just "possible", but exceedingly simple to comply with what is written in the standard and generate code in that situation. For example:

Writing beyond the bounds of an array whose length is known is apparently UB.

   int a[4];
   a[2]=2;
   a[6]=10;
To generate code for the last line, use the same algorithm you used to generate code for the second line, just parameterised with 6 instead of 2.

What was impossible about that?


If each of a[0...3] is stored only inside registers, which register should the compiler pick for a[6]?


Who is forcing the compiler to store fracking arrays in registers?

Once again, “incompatible with weird optimizations I want to do” is not close to the same thing as “not possible”.

If the two are incompatible, it is obviously the weird optimization that has to go.

How do you pass the address of that array to another function?


> Who is forcing the compiler to store fracking arrays in registers?

The people who want their program to be fast. "Just do everything in main memory" means your program will be extremely slow.


1. Arrays ≠ Everything. Please stop with the straw men.

2. "Compiler Advances Double Computing Power Every 50 Years, And The Interval Keeps Growing". (i.e. even Probsting was wildly optimistic when it comes to the benefits of compiler research). The benefits of these shenanigans are much less than even the critics thought, never mind what the advocates claim. They are actually small and shrinking.

https://zeux.io/2022/01/08/on-proebstings-law/


Sorry, we are going in circle. I thought we were discussing what it means for the compiler to ignore UB.

Also calling scalar replacement of aggregates a weird optimization is very strange.


It means ignoring the situation and emitting the code that the programmer wrote. The programmer did not write "put this array in registers". The programmer did write "access element 6 of this array".

This isn't hard.

"Keep the spirit of C. The Committee kept as a major goal to preserve the traditional spirit of C. There are many facets of the spirit of C, but the essence is a community sentiment of the underlying principles upon which the C language is based. Some of the facets of the spirit of C can be summarized in phrases like

- Trust the programmer.

- Don't prevent the programmer from doing what needs to be done.

- Keep the language small and simple.

- Provide only one way to do an operation.

- Make it fast, even if it is not guaranteed to be portable.

The last proverb needs a little explanation. The potential for efficient code generation is one of the most important strengths of C. To help ensure that no code explosion occurs for what appears to be a very simple operation, many operations are defined to be how the target machine's hardware does it rather than by a general abstract rule."

http://port70.net/~nsz/c/c89/rationale/a.html

> calling scalar replacement of aggregates a weird optimization is very strange.

No it's not, certainly for arrays. It's a little less weird for structs than it is for arrays. But the fact that you consider calling it weird "very strange" is telling, and to me is the heart of this disconnect.

The Optimiser-über-alles community feels that the needs of the optimiser far outweigh the explicit instructions of the programmer. Many programmers beg to differ.


At this point it is obvious that you wouldn't accept anything except a trivial translation to ASM. That's a perfectly reasonable thing to want, but you'll also understand that's not what 99.99% of C and C++ programmers want; even assuming (but not conceding) that might have been the original intent of the language, 50 years later user expectations have changed and there is no reason compiler authors should be bound by some questionable dogma.


> Apologies, but "ignoring the situation completely with undpredictable results" is the exact opposite of "assuming the situation cannot occur and acting on that assumption".

Seems to me it's not the opposite but the exact same thing.


Not sure how you can say that.

   int a[4];
   a[2]=2;
   a[6]=3;
"Ignore the situation": emit the code the code for a[6]=3; in the same way you emitted the code for a[2]=2. You've ignored the fact that this is UB.

"Assume the situation cannot occur": don't know, but according to the UB extremists the compiler can now do anything it wants to do, including not emitting any code for this fragment at all (which appears to happen) or formatting your hard disk (which doesn't, last I checked).

Assuming that a[6]=3 does not occur, despite the fact that it is written right there, would also allow putting a into registers, which "ignoring the situation" would not.


To me, the "situation" is that `a[6]` is being assigned to. So I'd make the compiler ignore the situation, and omit line 3.

Then the compiler could optimize the remaining program, possibly by putting `a` into registers.

I think you have a different view on what the "situation" is, so that your view of ignoring it is different from mine (and CRConrads, etc).


Torvalds was a strong advocate of GCC 2.95 (iirc), early on in Linux history, because he knew the kind of code it would emit and didn't trust the newer compilers to produce code that was correct in those circumstances.

The workarounds and effort required to tell a compiler today that no, you really did want to do the thing you said might well be insupportable. I figure they started going astray about the time self modifying code became frowned upon.


To be fair, the backend in the early GCC 3.x series was just kind of stupid sometimes. Even now I find strange if cheap and harmless heisengremlins in GCC-produced x86 code (like MOV R, S; MOV S, R; MOV R, S) from time to time, while the Clang output, even if not always good, is at least reasonable. This is not to diss the GCC team—the effort required to port a whole compiler to a completely new IR with a completely different organizing principle while keeping it working most of that time boggles the mind, frankly. But the result does occasionally behave in weird ways.


Can Linux compile under clang nowadays?


While sibling comment is correct that the simple answer is "yes", I'd like to add that Google uses Clang as the preferred compiler for Android. There's a little bit of drift between Android's version of Linux and mainline, but the effect is still that building Linux with Clang is being extensively used and tested.



Yes, when that Linux variant is called Android.

It has been that way for almost 5 years now.


One thing that is not mentioned in the article, is that next to undefined behavior, there is also implementation defined behavior.

For example, if signed integer overflow would be implementation defined behavior, then any weirdness would be limited to just the integer operation that overflows.

Lots of other stuff can be expressed as implementation defined behavior. That would probably kill some optimizations.

So the question is more: do we want a portable assembler? In that case as many C constructs as possible need to have defined behavior, either defined by the standard or as part of the compiler documentation.

Another possibility is to have standards for C on x86, amd64, arm, etc. Then we can strictly define signed integer overflow, etc. And say that on x86, pointers don't have alignment restrictions, so a pointer that points to storage of suitable size can be used to store an object of a different type, etc.

If the goal is to run SPEC as fast as possible, then making sure every program triggers undefined behavior is the way to go.
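To make the distinction concrete, a small illustration (my example): converting an out-of-range value to a signed type yields an implementation-defined result (or raises an implementation-defined signal), so the compiler has to document it and produce something consistent, whereas signed overflow during arithmetic is undefined behaviour, with no constraints at all on what happens.

    signed char narrow(int v) {
        return (signed char)v;   /* out of range: implementation-defined result */
    }

    int add(int a, int b) {
        return a + b;            /* may overflow: undefined behaviour */
    }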


I have a dumb question. Why can “we” write pretty good apps in languages other than C, but can’t write operating systems? Is talking to hardware so much different than talking to APIs?

Another point of view on the same question: looking at software and hardware, the latter evolved insanely, but the former didn't seemingly get faster, at least in userland. Why bother with UB-related optimizations at all for a wide spectrum of software? Is there even software which benefits from -O3 and doesn't use vectorization intrinsics? Why can't “we” just hardcode jpeg, etc. for a few platforms? Is that really easier to maintain, as opposed to maintaining never-ending sources of UB?

Iow, why does e.g. my serial port or ATA or network driver have to be implemented in C, if data mostly ends up in stream.on(‘data’, callback) anyway?


It boils down to abstractions papering over ABI details.

How do you write "put 0x12345678 to register 0x04000001" in assembler? mov eax, 0x04000001 / mov [eax], 0x12345678

How do you write it in C-as-portable-assembler? You write *(u32 *)0x04000001 = 0x12345678;

How do you write it in Java? You can't, the language has no such ability and if you try it's a syntax error. You have to call into a routine written in a lower-level language.

How do you write it in C-as-abstract-machine? You can't, the language has no such ability and if you try it's undefined behaviour. You have to call into a routine written in a lower-level language.

By the way, you can't write an operating system in C-as-portable-assembler either. No access to I/O port space, no way to define the headers for the bootloader, no way to execute instructions like LGDT and LIDT, no way to get the right function prologues and epilogues for interrupt handlers and system calls, no way to invoke system calls. All those things are usually written in assembler. Writing operating systems in C has always been a lie. Conversely, you can extend the compiler to add support for those things and then you can write an operating system in extended-C!
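In practice the C-as-portable-assembler version also wants volatile, so the compiler cannot reorder or elide the store; a sketch, with a made-up (and, unlike the example above, aligned) register address:

    #include <stdint.h>

    #define DEVICE_REG ((volatile uint32_t *)0x04000000u)   /* hypothetical MMIO address */

    void poke_device(void) {
        *DEVICE_REG = 0x12345678u;   /* the store actually reaches the device */
    }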


This addresses a part of my question, which I didn't make clear, thanks! I mean, after all this assembler stuff one could just use BASIC or similar. Yes, Java has no concept of PEEK/POKE, IN/OUT, but it just wasn't designed for that. Meanwhile, 1980s small systems were all assembly + basic/fortran. Of course they had no kernel in a modern sense, but all the devices were there: a speaker (SOUND), a serial line to a streamer/recorder (BLOAD), a graphics controller; no DMA though, but it's just a controller with the same “ports” as well, which can read/write memory and generate interrupts.

I don't get why we don't just skip C to something high-level after wrapping all this pio/dma/irq/gdt/cr3 stuff into __cdecl/__stdcall format and then use the ffi of a language which would decide to support that.

I also don't understand the GC arguments down the thread, because GC over malloc seems to be just a synthetic detail. You definitely can implement GC over a linear address space: just bump-alloc it until the limit, or page-table it however you want for DMA. Malloc is not hardware, it isn't even a kernel thing. Apps run on mmap and brk, which are similar to what kernels have hardware-wise. Mmap is basically a thin layer over paging and/or DMA.

It was so easy and then blasted into something unbelievably complex in just a few years. Maybe 80386 wasn't a good place to run a typescript-over-asm kernel, but do we still have this limitation today?


We don't do that mostly because many communities cargo-cult C and C++, so nothing changes unless you have companies like Apple, Microsoft and Google stepping in and asserting "this is how we do it now if you want to play on our platform".

Arguably, with their push for Swift and Java/Kotlin, Apple and Google are much further ahead than Microsoft on this matter, given that .NET tends to suffer from WinDev worshiping C++ and COM.

You can get that BASIC experience nowadays when playing with uLisp, MicroPython and similar environments for embedded platforms, most of them more powerful than 16-bit home computers.


Let's assume you want write most of the operating system in the high level language and as little as possible in assembler.

For most languages, writing a hardware trap handler becomes quite a bit of an issue. In a trap handler you cannot rely on an extensive runtime system. Anything that does garbage collection is probably out. Anything that does dynamic memory allocation is out as well.

Beyond that, how easy is it to create pointers, create datastructures that match a specific memory layout, etc. Low level device drivers need to talk to hardware in very specific ways. If it is hard to talk to the hardware, most people are probably not going to bother using that language for an operating system.

In theory you could mix and match languages in a kernel. For example, a filesystem could be written in a language that has an extensive runtime system.


I'd say that Rust (and, to a smaller extent, Zig, and, AFAICT, Ada) lets you write code that is guaranteed not to allocate, and define the exact memory layout of certain structures, all while offering much tighter protections than C.

Of course, there are things that cannot be expressed in safe code in either language. But marking fragments of code as unsafe, where C-like unconstrained access is allowed, helps a lot to minimize such areas and make them explicit.

There is definitely room for more expressive and safe languages in the kernel-level space. We can look at Google Fuchsia or maybe at Redox OS; both are very real operating systems trying to use safer languages, with some success.


I think stability plays a big role in C continuing to remain dominant. Rust and Zig aren't there yet, and wrt Rust in particular, the ownership model doesn't play nearly as nicely in non-deterministic environments: hardware such as an external display or printer that might get randomly unplugged at any point works against the grain of a static memory ownership analysis, and takes far more code to deal with.


I'd say it's a good example of why more static checks like lifetimes are useful.

With them, you can at least tell apart data structures that are fleeting and can disappear when a cable is ejected, and those which should stay put. This might help avoid another double free or another dereference of a freed pointer.



Ada still has plenty of UB; it's just that the cases where it arises are usually more explicit.


If we don't adopt X because it doesn't solve 100% of the problems, then we will never improve.


If anything in the kernel is written in a language that has an extensive runtime system... Well, extensive runtime systems are pretty reliably resource hungry. And when they might suddenly need which sorts of resources tends to be unpredictable.

Vs. the kernel must keep working reliably when resources are running low.


But today Linux simply kills processes to free memory. What would prevent a GC (which also serves allocations, not only collects them back) from just doing that on an emergency cycle? Destroy a record in a process array and reclaim its pages (of course without doing any allocations on the way, or by using an emergency pool). Or even just reclaim its pages and wait for it to crash on a page fault if you feel lazy.

> which sorts

Dynamic languages only do memory-related ops unpredictably, or is it more than that?


I would guess that the big difference between an app and an OS is that the OS needs to do more complicated things with memory addresses.

An app that runs has its own nicely mapped address space. And it interfaces with devices through system calls. An operating system has to keep the actual addresses of everything in mind, and it usually has to talk to devices through virtual addresses.

As an example of what I think might be the problem. If the OS wants to read data from a device, it might allocate a buffer, wait for the device to write into that buffer, and then later read it. For the compiler, that is essentially "reading uninitialized memory" and thus undefined behavior.


The example works because the compiler has no way to know that the programmer intends the memory to be filled by e.g. a DMA transfer from a device.

If a programmer could communicate this idea to the compiler, it would be somehow safer to write such code. There is a big difference between intentionally reading what looks like initialized memory, and doing so by an oversight.
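In practice today (outside what ISO C itself guarantees), that communication is usually done with volatile or a compiler barrier. A hedged sketch, with made-up names:

    #include <stdint.h>

    /* buffer the device DMAs into; volatile tells the compiler it cannot
       assume it knows the contents, so the read below is a real load and
       is not treated as use of an uninitialized object */
    static volatile uint8_t dma_buf[512];

    uint8_t first_byte_after_transfer(void) {
        /* wait_for_dma_complete();   hypothetical synchronization step */
        return dma_buf[0];
    }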


It's not so much about 'intent'. The spec simply says this operation is undefined behavior. You could have a compiler that you could somehow inform "please just define this behavior as reading whatever is in memory there". But that supports the original point of the article, that plain ISO C is not suitable for OS programming.


Which is why many who learn "my compiler C" then get surprised by what happens when their code runs somewhere else, and then most likely blame the other compiler instead of blaming themselves for not learning the differences between ISO C and their daily compiler.


> Is talking to hardware so much different than talking to APIs?

It depends. If your hardware is behind a bus or controller device that's serviced by a separate driver, then you're using APIs of that bus/controller driver.

But think of having to talk to the TCP/IP stack using system calls - you are using an API, but you'll still need to have some structure beyond just moving data back and forth over a bus. A USB mass storage driver is going to need different data moving over the USB interface than a USB network interface driver.

Different buses work differently as well - USB device addressing is different than SATA or PCI-E device addressing.

If you are really talking directly to a device, you're manipulating registers, bits, ports, etc. You may have to involve IRQs, etc. Your serial port, for example, can hold 16 bytes before it generates an IRQ to tell the CPU it has data if it's a 16550 I think. Your SATA interface doesn't work like that, it can actually DMA data directly to memory. But both of these could be streamable devices to an operating system.


I think that the statement should be refined to say that it is not possible to develop a monolithic kernel based OS in ISO standard C. A monolithic kernel generally relies on passing pointers to memory between components with few restrictions. This can be problematic for some languages/compilers. However a microkernel OS that provides more structure around how data is shared between OS components can support development of many OS components in multiple languages. Even languages requiring significant runtimes like Java or C# could be used for many OS components like file systems, network stacks or device drivers.

Historically, it has been difficult to beat monolithic kernels for performance and efficiency, and through significant effort, monolithic kernel based OS's exist that are reliable enough to be useful. However, the monolithic kernel is not the only OS architecture.


While maybe theoretically of interest, it is far afield of the pragmatic considerations that underlie the paper: the operating systems in common use TODAY.


Ever heard of Android?


microEJ, Meadow Project, Android are such examples.


Yes, C provides a way to talk to ABIs, in addition to APIs. It's not just "talking to hardware", it's talking to other software in a reliable way, such that you can upgrade your C compiler and have code written in C89 talk to code written in C11, which is unheard of in most of the other languages that don't support a stable ABI. (Think Python 2 being incompatible with Python 3.)

Software has gotten much faster. Yes, almost all software benefits from -O3. What do you mean "hardcode"? as far as I know libjpeg can be linked statically...

UB is easy to maintain: take integer addition and overflow. You just issue an ADD instruction, and however that CPU executes the ADD instruction is how integer overflow works on that platform; then in the C standard you write "integer overflow is undefined behavior".


> Why can “we” write pretty good apps in languages other than C, but can’t write operating systems? Is talking to hardware so much different than talking to APIs?

Operating systems are written in other languages, such as C++ and Rust.

One requirement is that a language must be compiled and thus cannot rely on a runtime. That excludes Go and Java.

The language needs to support direct memory manipulation.

The compiled binary cannot be emitting system calls since that binary IS the kernel. Thus the compiler must be told to not link or include standard libraries.

You need to disable certain optimizations like advanced vectorization extensions and red zone use on the stack.

There are others. Lots of specific control needed.


This is why Rust is so exciting: it's the first new language without a runtime that we've seen graduate from toy-language space in a while. (Python, Ruby, Go and TypeScript/Node.js are the other graduates I'm thinking about.)


>One requirement is that a language must be compiled and thus cannot rely on a runtime. That excludes Go and Java.

Maybe I'm wrong, but I know that there exist CPUs made specifically to natively execute Java bytecode, so in reality, if the hardware has baked-in language interpretation, it would actually be possible to write an OS completely in Java.


ARM "Jazelle" was capable of this, but it required a C implementation of a JVM. Any GC-dependant language has this problem.



True, you can design a CPU for anything. However, an OS that depends on such a CPU is not portable to anything else, and can't easily run most programs that people depend on (emulators are possible, but not easy). Also, most CPU advances haven't gone into such a thing, and it is tricky to apply those advances while also providing what the language needs. None of this is impossible, but it makes such CPUs of questionable value in today's world.

Note that you can port any OS written in C to such a CPU with "just" a new compiler backend and a few drivers. Your OS won't take advantage of the features the CPU provides, but it will work.


Eh, can you really properly implement an OS on a CPU without interrupts? I wouldn't categorise anything in that space as a driver.


Good point. I assumed there was some form of interrupt system, but not all CPUs need to have it, and lacking that your OS choices will be limited.


Running Java bytecode natively is neither necessary nor sufficient, as you can compile Java to any other native ISA, but you do still need a relatively heavy runtime for GC.

Having said that, there have been OSs written in languages with heavy runtimes, even GC.



I was answering the question in a general sense for the more prolific operating systems and on generic commonly available general-purpose processors.

Yes one can implement a CPU that natively executes a runtime for a high-level language, make your own ASIC, or FPGA, etc. that does this. That is a more advanced response to the general question.

Knowing the detailed points I mentioned will help understand why specialization of processors is needed to support other higher-level languages that do not meet the requirements I laid out.


Which just proves your lack of knowledge that those runtimes target generic commonly available general-purpose processors.

None of those products use FPGAs or custom ASICs.


> just proves your lack of knowledge

Tone is not needed.

For TamaGo, it seems to allow developers to run their application, not build an OS on the hardware. But I have not played with it; you are right.

> TamaGo is a framework that enables compilation and execution of unencumbered Go applications on bare metal

The environment does not seem to allow building a generic operating system [1]. F-Secure ported the runtime itself to boot natively. But please correct me.

> There is no thread support

The environment you run in is specifically curated for Go applications, such as the memory layout. I'd call this an "appliance" rather than enabling Go to be used for full-fledged generic operating system implementations.

[1] https://github.com/f-secure-foundry/tamago/wiki/Internals


Tone follows the last paragraph; just returning the ball.

An OS is an OS, regardless of the userspace.



> I have a dumb question. Why can “we” write pretty good apps in languages other than C, but can’t write operating systems? Is talking to hardware so much different than talking to APIs?

To some small extent, yes. But I don't think that is the main issue here.

The real issue is that the stakes are much, much higher when implementing an operating system than when writing, say, an image editor. You can live with an occasional crash in a userland app. But the same crash in an operating system may open the door to taking over the entire computer, possibly even remotely.


There are some other candidates, as expressed below, but really the main problem is how difficult it is to write and deploy an operating system that's of usable capability. Even just hitting enough of POSIX to get a GUI and a browser up is a pretty huge amount of work.

How many operating systems do we use that are less than 20 years old?


I suppose because there aren't many languages that allow you to manipulate arbitrary memory locations and cast portions of that to arbitrary types, and also allow relatively easy inline ASM. Which maybe isn't 100% necessary, but seems to be helpful at an OS level.


There are operating systems written in other languages than C.

A driver doesn't need to be implemented in C, but the kernel API is likely written in C, your code needs to talk to it somehow. If your driver is written in C, it's as simple as #include.


Isn’t this sort of circular? If very low-level, I mean in/out, lgdt, etc, were exposed as v8-bare-metal modules, it would be as simple as require() then.

  t = require("awesome-8253")
  p = require("pio-node")
  // cast your usual 0x42, 0x43, 0x61 spells


Yes, contrary to C myths, any language can have those primitives as Assembly libs, which ISO C also requires anyway.


A lot of C++ codebases benefit greatly from -O3 due to the more aggressive inlining and interprocedural optimizations.

Also, many UB-exploiting optimizations like strict aliasing are enabled by default at all optimization levels in GCC.


In theory the difference between undefined behaviour and implementation defined behaviour is that ID behaviour must be documented. In practice good luck finding that documentation for each CPU and compiler combination. In fact good luck just finding it for LLVM and x64.


Undefined behavior entails "there are no restrictions on the behavior of the program", meaning anything can happen, including executing the opposite of the program text. Implementation defined behavior is saner in the sense that program behavior is still defined. Examples of the latter are the exact type of std::size_t or the number of bits in a byte: https://en.cppreference.com/w/cpp/language/ub


The linked reference page does not say that implementation-defined behaviour must be sensible, only that it must be defined. Contrast with unspecified behaviour where "Each unspecified behavior results in one of a set of valid results."

I expect that most instances of implementation-defined behaviour come with additional rules which state that the implementation has to define something sensible.


No, that's the difference between unspecified and implementation defined behavior.


This unspecified behavior is perhaps a bit lesser-known than UB, here is a (hopefully non-null ;-) pointer:

https://stackoverflow.com/questions/18420753/unspecified-und...


If you want to be pedantic about it then I guess Clang and GCC don't implement the standards since they treat implementation defined as unspecified.


I don't think making it defined would help much. Overflowing a signed integer is a bug in logic. It would be ideal to have a crash on that. Continuing is going to be bad one way or another, unless you luck out and the way the implementation works happens to save your buggy code. It can't be relied upon in the general case though.

Imo the way is to develop more tools that detect (either by analysis or at runtime) those bugs and run the code with those attached as often as you can afford it (to take the performance penalty).


That's the thing. When C is portable assembler, you expect signed integer overflow to be the same as in assembler. On x86 you don't expect a trap.

There are quite a few idioms where overflow is used intentionally. There is no reason to turn that into undefined behavior if it works fine on the underlying platform.
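One such idiom (my example, not the parent's): detecting wraparound after the fact. Written with unsigned types it is well-defined C; the same pattern written with signed ints "worked" on x86 for decades but is UB, and modern compilers may delete the check entirely.

    #include <stdbool.h>
    #include <stdint.h>

    bool add_overflows(uint32_t a, uint32_t b, uint32_t *out) {
        uint32_t sum = a + b;   /* wraps modulo 2^32 by definition */
        *out = sum;
        return sum < a;         /* wrapped iff the sum is smaller than an operand */
    }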


There isn’t a “the same as assembler” that’d make sense in C.

For instance ARM, x86 scalar, and x86 SIMD all have different integer overflow rules for shift operations.


I don't expect C to be a portable assembler. I expect it to be a simple fast language with clear rules. I am not sure what you mean by an idiom here. I suppose you mean an assembler idiom, as in C it was always a simple bug in logic. Obviously you can't just use idioms from one language in another without checking what the rules in the other language are.

Platform specific behavior should be as rare as possible. It's a recipe for bugs. The rule is simple enough and major compilers have flags to prevent the overflow. Obviously you pay the performance penalty for using them. It's a choice you're free to make as it should be.


"Can't be relied upon in general case" is implementation defined behavior, not undefined. UB is much worse than what you think, it's not merely can't be relied, but the program can launch nuclear rockets when it happens. Preventing such interpretations is very helpful.


I meant that you can't rely on platform/implementation specific behavior to save you. It's the worst of both worlds: you don't get performance benefits of UB and you introduce a disaster waiting to happen once your code runs on another platform or is compiled with another compiler.

I know what UB is. I think the idea is brilliant and saves millions of dollars of burnt coal every day. Sometimes security matters more and then you compile your code with every flag/sanitizer you can find to exchange performance for security.


Performance benefits of UB aren't obvious, and even if they existed, it's not obvious they are a good tradeoff. It's not only security, but any application mildly interested in correctness. Which leaves gamedev as the only consumer of aggressive UB optimization, and with the advent of online gaming even those might be more interested in security.


Overflowing a signed integer is not always a bug in logic, if you know the underlying representation then purposefully overflowing can be pretty useful.


According to this, it is a bug in the vast majority of cases (over 90%): http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p090...


The problem is that the vast majority of overflows are indeed logic errors. If the behaviour were defined, I wouldn't be able to use ubsan to catch them.


As was pointed out to me previously, UBSAN actually provides checking for unsigned overflow even though it's defined. So if signed overflow was defined, that would not stop UBSAN from flagging it.

Because in reality as you observe so many overflows are unintended. Most of the time programmers, especially dealing with 16-bit or wider integer types, treat them as though they were mathematical integers, and so overflow is extraordinary and worth flagging.

Unfortunately UBSAN doesn't prevent you getting this wrong, perhaps somewhere important. If your test inputs never trip the corner case where overflow occurs you can switch off UBSAN in release builds, ship it, and overflow on a real system where it has real consequences.
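
For reference, the unsigned check mentioned above is a separate opt-in sanitizer (a sketch, assuming Clang; the exact spelling may differ between compiler versions):

    /* wrap.c */
    #include <stdio.h>

    unsigned add(unsigned a, unsigned b) { return a + b; }  /* defined: wraps */

    int main(void) {
        printf("%u\n", add(4000000000u, 4000000000u));
        return 0;
    }

    /* clang -O2 -fsanitize=signed-integer-overflow,unsigned-integer-overflow wrap.c
       flags the wraparound at runtime even though it is not UB */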


Well, TIL, it goes to show how rare intentional overflows are.


> the vast majority of overflows are indeed logic errors

This topic has turned up before. [0]

edit: I got the default behaviour the wrong way round here:

I think C# gets this right. Ordinarily it handles integer overflow by throwing an exception, but it has an unchecked keyword which gives you wrapping. [1] If you're writing code that is expected to wrap, you use the unchecked keyword, and you carry on using the usual arithmetic operators. (I believe you can also instruct the C# compiler to default to unchecked behaviour, so there's also a checked keyword. This strikes me as a mistake.)

Strictly speaking Java gives you the option of checked vs unchecked integer arithmetic, but with terrible ergonomics: the '+' operator always silently wraps, if you want throw-on-overflow behaviour you have to call a method. [2] This is of course so unsightly that Java programmers tend to stick with the infix arithmetic operators regardless of the wrapping behaviour.

C++ has templates and operator overloading, so you can use a library to get signed arithmetic to wrap, or throw, without undefined behaviour. [3] Such libraries are very rarely used, though.

See also this very good blog post by John Regehr, who specialises in this kind of thing. [4] To quote the post:

> Java-style wrapping integers should never be the default, this is arguably even worse than C and C++’s UB-on-overflow which at least permits an implementation to trap.

[0] https://news.ycombinator.com/item?id=26538606

[1] https://docs.microsoft.com/en-us/dotnet/csharp/language-refe...

[2] https://docs.oracle.com/en/java/javase/17/docs/api/java.base...

[3] https://www.boost.org/doc/libs/1_78_0/libs/safe_numerics/doc...

[4] https://blog.regehr.org/archives/1401
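
For completeness, the closest C itself gets to the checked form today is a compiler extension rather than anything in ISO C. A minimal sketch, assuming GCC or Clang (both provide __builtin_add_overflow; C23 standardizes a similar ckd_add in <stdckdint.h>):

    #include <stdlib.h>

    /* returns a + b, or aborts on overflow: the "checked" behaviour */
    int checked_add(int a, int b) {
        int result;
        if (__builtin_add_overflow(a, b, &result))
            abort();
        return result;
    }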


What if you write code that doesn't wrap but you don't want the exception or any kind of check either?

That's the whole point of UB. Let the compiler optimize by assuming your integers are in range. I get that it doesn't matter in a language like C#/Java (they are so slow additional checks don't register) but in C throwing has costs and code that wraps would require additional logic on some platforms.

One way or another if you want C to "throw" (trap) you can get that by using compiler flags.
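
Concretely, a sketch of the main knobs as I understand them for current GCC/Clang (check your compiler's docs for the exact semantics):

    /* -fwrapv                            : signed overflow becomes defined wrapping
       -ftrapv                            : signed overflow aborts at runtime
       -fsanitize=signed-integer-overflow : UBSan reports it (or, on Clang, traps with
                                            -fsanitize-trap=signed-integer-overflow) */
    int next_id(int x) { return x + 1; }  /* behaviour at INT_MAX depends on the flag */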


"unchecked" is the default in C#, and "checked" is opt-in, except for compile-time expressions (and System.Decimal, which always throws on overflow).

You can tell the compiler to use "checked" by default for a given project, but it's fairly rare to see that in practice.

I wish it was the other way around, but I guess they didn't consider the overhead of "checked" acceptable as a default back in 1999; and now it's a back-compat issue.


There is discussion about changing the defaults in .NET 7.


C# 11, rather? I wouldn't expect them to change anything on bytecode level, since the choice between wraparound and overflow is always explicit there.

Do you recall where that discussion is taking place? I can't find a proposal to this effect in the usual GitHub repo.


Because the change is to the Visual Studio templates, setting the checkbox enabled by default; there's no need to change the language for something it already supports.

It was mentioned on one of the regular YouTube videos with the team, but I am failing to find it now.


Thanks, I've edited that in.


So you're fine with throwing your hands up ("it's unreliable, it's a bug that should never happen") when the standard lacks a clear recipe for checking beforehand whether signed integer arithmetic will overflow? Everyone rolls their own, introducing even more bugs.


Well, I prefer to have standard tools to check, or a way to compile so it traps on the overflow. Major compilers provide that if you value security over performance and are not sure about the correctness of your logic.


You don't actually want implementation defined behavior. There is no restriction on implementation defined behavior, it just needs to be documented. Suitable documentation includes "the optimizer assumes this never happens and optimizes accordingly.", or "Look at the source code."


I haven't got the time to read the paper yet, but I believe I'd emerge with more or less the same opinion I've had before: nobody's forcing you to pass -O2 or -O3. It's stupid to ask the compiler to optimize and then whine that it optimizes. I usually am OK with the compiler optimizing, hence I ask it to do so. I'm glad that others who disagree can selectively enable or disable only the optimizations they're concerned about. Most of the optimizations that people whine about seem quite sane to me. Of course, sometimes you find real bugs in the optimizer (yesterday someone on libera/#c posted a snippet where gcc with -O2 (-fdelete-null-pointer-checks) removes completely legit checks).


At least in compilers like gcc, optimization needs to be enabled to get sane warnings emitted by the compiler, so some people are indeed being forced to pass in -O2 to get sane build warnings for projects.

I would really like it for the C standard to clean up Undefined Behaviour. Back in the 1980s when ANSI C was first specified, a lot of the optimizations that modern compiler writers try to justify via Undefined Behaviour simply weren't part of many compilers' repertoires, so most systems developers didn't need to worry about UB and there was no push for the standard to do so as a result.

If people really want the optimizations afforded by things like assuming an int can't overflow to a negative number in a for loop, my personal position is that the code should be annotated such that the optimization is enabled. At the very least, the compiler should warn that it is making assumptions about potential UB when applying such optimizations.

There is this false belief that all legacy code should be able to compiled with a new compiler with no changes and expect improved performance. Anyone who works on real world large systems knows that you can't migrate to newer compilers or updated OSes with zero effort (especially if there's any C++ involved). I understand that compiler writers want to improve their performance on SPEC, but the real world suffers from the distortions caused by viewing optimizations through the narrow scope of benchmarks like SPEC.


> If people really want the optimizations afforded by things like assuming an int can't overflow to a negative number in

Perhaps people wanted int overflow to be UB because that is semantically clean. Code where integer overflow is intended and not a bug is IMHO a minority of all integer usage; in the vast majority it is a bug, similar to an invalid memory access. So it would be reasonable for some compilers/platforms to handle integer overflow with a CPU exception translated to a signal killing the process (similar to SIGSEGV for invalid memory access), and to allow wrapping only for integer operations explicitly marked for it (i.e. unsigned in C).

It is just our experience that makes 'integer overflow should wrap' and 'invalid memory access should crash' the default positions, while 'integer overflow should crash' and 'invalid memory access should be ignored/return 0' are the alternative positions. Conceptually, both are just stricter or looser ways to handle code errors, and it makes sense that C prescribes neither one.


> At least in compilers like gcc, optimization needs to be enabled to get sane warnings emitted by the compiler, so some people are indeed being forced to pass in -O2 to get sane build warnings for projects.

That's a fair point and it irritates me too. But you're not forced to run -O2 in production if you don't want to. I've had projects where we compile with multiple compilers and different settings -- including different libcs -- just to get all the warnings, while we ship non-optimized or minimally optimized debug builds to the customer..


You can't turn off -O2 in some projects. The Linux kernel has plenty of code that requires optimization to be on to build successfully -- look at things like BUILD_BUG_ON(). And it's projects like the Linux kernel that have had security issues caused by overzealous compiler optimizations. Getting blindsided silently by compilers has already proven that the current approach towards Undefined Behaviour by compiler writers is not working.


> If people really want the optimizations afforded by things like assuming an int can't overflow to a negative number in a for loop, my personal position is that the code should be annotated such that the optimization is enabled.

That's not an entirely absurd idea, especially if you wanted to enable optimization piece-wise for legacy code. But in the end does it matter whether you specify the optimizations you want in the build system or in the code?

> At the very least, the compiler should warn that it is making assumptions about potentially UB when applying such optimizations.

The assumptions are usually "assume no UB." I don't know what kind of warnings you expect but this idea has been brought up before and it's not good because it would just make the compiler flood warnings on any code that does any arithmetic on variables (as opposed to compile time constants).


Again, not all optimizations that compiler writers can come up with are Good Ideas. Since the Meltdown and Spectre processor vulnerabilities came to light back in 2017, many large projects have been forced to switch to prioritizing security over performance. With that shift in the industry, I think it's fair to expect compiler writers to give greater weight to potential security issues (including issues related to UB). It really is a different world now than it was 5 years ago on this front.


Meltdown shouldn’t have required anyone to switch anything since it was a purely Intel hardware bug.

Even saying Spectre causes you to care more about memory safety sounds like marketing fluff. Fixing memory bugs doesn’t even prevent Spectre issues.


When Spectre and Meltdown hit, the Linux kernel was forced down the path of adding mitigations which pessimised the generated code. Performance is no longer the primary driver of generated code. That is a change in technical direction and is hardly "marketing fluff". And it doesn't just impact Linux, it impacts languages that are implemented using JIT engines that virtually everyone uses. Thankfully Linux was able to badger the C compiler into doing what was needed, but these sorts of needs of system level projects are way outside of what ISO C permits, and they're hard technical requirements now that can't be ignored.


Memory safety was always and is still more important than performance in code generation, and it's by far the most important thing in security. Spectre mitigations in userland are mostly about removing exact sources of timing data rather than changing anything else.

There are code changes to stop branch prediction but it's mainly limited to the kernel. Or even limited to the Linux kernel, because other OSes have different approaches to it.


IIRC a few mitigations were rolled back as the additional safety was not worth the performance cost.


I’m amused by the people who ask for optimization and then complain about it. Or the “it did what I said, not what I meant” crowd.

But, officially, undefined behavior is always undefined, not just at higher optimization levels.


Alternatively if my compiler has a "run faster" flag I would not expect it to change the "semantics" of my code in the process.

Additionally in C UB is often not intentional nor trivial to detect in your codebase since it may be the interaction of two pieces of code that are not anywhere obviously close to each other. There comes a point where faster but broken code isn't better it's just broken.


"Run faster" involves things like register allocation. How can you justify register allocation doesn't change semantics of code? It absolutely does! If you smash the stack, variables on stack will be smashed, and variables on register will not. This is change of semantics.

Compilers justify register allocation by assuming stack can't be smashed, contrary to hardware reality. Because smashing the stack is UB. That's what C UB is for.
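
A hedged illustration of what that means in practice (nothing here is guaranteed; it depends entirely on where the compiler chooses to keep i):

    int observe(void) {
        int buf[4];
        int i = 1;
        buf[4] = 42;   /* out-of-bounds write: smashes the stack, UB */
        return i;      /* unoptimized, i may sit next to buf and come back as 42;
                          with register allocation, i likely stays 1 */
    }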


C exposes a (fairly low-level) abstract machine with its own semantics, encapsulating the operational semantics of the CPU. Optimization flags are expected to affect the uarch-operational semantics, but not the semantics of the C abstract machine.

Think of it like trading one cycle-accurate emulator for a better, more tightly-coded cycle-accurate emulator. Or a plain loop-and-switch bytecode interpreter for a threaded-code bytecode interpreter. The way your code is running on the underlying machine changes, but the semantics of your code relative to the abstract machine the code itself interacts with should not change.

> Compilers justify register allocation by assuming stack can't be smashed, contrary to hardware reality.

...which should be entirely fine, as the existence of a stack isn't part of the exposed semantics of the C abstract machine. Values can be "on the stack" — and you can get pointers to them — but nothing in the C abstract machine says that you should be able to smash the stack. There's nothing in the C standard itself describing a physically-laid-out stack in memory. (There are ABI calling conventions defined in the standard, but these are annotations for target-uarch codegen, not facts about the C abstract machine.) There could just as well be a uarch with only the ability to allocate things "on the stack" by putting them in a heap — in fact, this is basically how you'd have to do things if you wrote a C compiler to target the JVM as a uarch — and a C compiler written for such a uarch would still be perfectly compliant with the C standard.

If C were an interpreted language, the fact that the stack can be smashed would be referred to as a "flaw in the implementation of the runtime" — the runtime allowing the semantics of the underlying uarch to leak through to the abstract-machine abstraction — rather than "a flaw in the interpreter" per se; and you'd then expect a better runtime implementation to fix the problem, by e.g. making all pointers to stack values actually be hardware capabilities, or making each stack frame into its own memory segment, or something crazy like that.

As a compiled language — and especially one that has encapsulation-breaking holes in its abstract machine, like the C inline-assembly syntax — you can't exactly expect a runtime shim to fix up the underlying uarch to conform to the C abstract machine. But you could at least expect the C compiler to not let you do anything that invokes those cases where it's exposing potentially-divergent uarch semantics rather than a normalized C-abstract-machine semantics. As if, where there should be a shim "fixing up" a badly-behaved uarch, you'd instead hit a NotImplementedException in the compiler, resulting in a compilation abort.


If you wanted to prevent undefined behavior at compile time by having the compiler throw an exception, you'd actually need to prevent potentially undefined behavior, for the simple reason that compile time doesn't know what your inputs at runtime are. You could come up with a subset of C, but it would be a restrictive and probably useless subset. C strings go out the window, for instance, because you don't know at compile time if a given char * is indeed a NUL-terminated string. You also wouldn't be able to store that pointer anywhere, because it could be freed as soon as you return from the function - and that's assuming we're disallowing threads and signals in our C subset, or else you just couldn't use the pointer at all. (This, by the way, argues that it's a flaw in the language definition, not in the implementation of anything.)

There are a huge number of languages where you pass strings along with their length. There are also plenty of languages where you know (either via compile-time analysis and an insistence on writing code in an analyzable way, as in Rust, or via mild runtime overhead, as in Swift or Haskell or Java) that the memory remains still valid when you reference it. But those languages are not a subset of C. They either need more information about a program than a C program contains, or they disallow operations that the C abstract machine can perform.


You're right that you can't define a safe subset of C without making it impractical. MISRA C defines a C subset intended to help avoid C's footguns, but it still isn't actually a safe language. There are alternative approaches though:

1. Compile a safe language to C (whether a new language or an existing one)

2. Formal analysis of C, or of some practical subset of C, to prove the absence of undefined behaviour

Work has been done on both approaches.

ZZ compiles to C. [0] Dafny can compile to C++, but it seems that's not its primary target. [1][2]

There are several projects on formal analysis of C. [3][4][5][6]

[0] https://github.com/zetzit/zz

[1] https://github.com/dafny-lang/dafny

[2] https://dafny-lang.github.io/dafny/

[3] https://frama-c.com/

[4] https://www.microsoft.com/en-us/research/project/vcc-a-verif...

[5] https://www.eschertech.com/products/ecv.php

[6] https://trust-in-soft.com/


Is it better to write ZZ or Dafny code than to write Java, Python, Haskell, Rust, or Swift code?

(They look cool and I'm enjoying reading about them, but for the things I do for my day job where C is one of the obvious options, I'm not sure they're better options, but usually one of the languages above is a better option.)


ZZ's whole goal is to transpile to C, which (as their website mentions) has some practical advantages. Consider exotic hardware platforms where there's a vendor-provided proprietary C compiler and no other choice.

Dafny has a slightly different emphasis, as it's about formal verification, so it's more competing with Ada SPARK than with mainstream languages.

I can't comment on the general ergonomics on ZZ and Dafny as I've not tried them out, but presumably they're going to be well behind a language like Java with its huge ecosystem of IDEs etc. Neither seems particularly mature, either, so I wouldn't bet the farm on them.

> Is it better to write ZZ or Dafny code than to write Java, Python, Haskell, Rust, or Swift code?

If you've got the choice and you're just trying to do 'normal day job programming' I imagine that yes you'd be much better off just using Java. If you want a language that's quite like C or C++ but much better regarding safety, there are Rust, Zig, and the too-often-overlooked Ada language.


I think you’re conflating undefined behavior with implementation-defined behavior.

How integers act at the extremes of their values, for example, is implementation-defined — when you're using the integer types (rather than using IEEE754 floats for everything to get well-defined formal semantics), you're implicitly telling the compiler that you don't care what happens at the boundaries, and that you want whichever behavior the target uarch's ISA entails.

Think of it like exposing an opaque native FFI type in a managed-runtime language. The semantics of that type aren't "up to" the runtime; all the runtime does is ensure that only a safe set of interaction primitives on the type are exposed into the managed runtime. ("Safe" in the sense of not corrupting the runtime's own abstract machine semantics; not in the sense of making any guarantees about the operations leaving the value in a well-defined state per the underlying machine.) In C, integers — and the operations on them — are exactly such "opaque foreign types." As are pointers, for that matter. (And this is also why casting between integers and pointers is UB — since the internal representations of each aren't part of the semantic model of the C abstract machine, it cannot make any guarantee about their interconvertibility.)

C compilers shouldn’t abort on implementation-defined behavior. Having that behavior be implementation-defined is the well-defined semantics that the C abstract machine applies to such things.

It’s only cases where there’s no clear semantics even per implementation, that the C compiler should swerve and say “I don’t know what that should mean; so I give up.”


I agree that integer overflow is implementation-defined, but I wasn't talking about integer overflow, I was talking about char *.

Reading from a pointer past the bounds of the object being pointed to is undefined, not implementation-defined. How do you propose to write a size_t strlen(const char *s) that successfully compiles with your proposed compiler?
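
To make the point concrete, here's the obvious implementation (a minimal sketch). Every read it performs is defined only because of a promise the compiler has no way to check:

    #include <stddef.h>

    size_t my_strlen(const char *s) {
        size_t n = 0;
        while (s[n] != '\0')   /* in-bounds only if s really points into a
                                  NUL-terminated array; nothing here can verify that */
            n++;
        return n;
    }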


Exactly. If I wanted to be operating at the level where I am worrying about registers and stack smashing and such, I would just drop down to assembly directly. You use C because it has an (in theory) abstract machine that you can target, avoiding the need to use assembly.


Even comparing two pointers can expose potentially divergent uarch semantics. Should that be disallowed at compile time?


No no no; you see, in these kinds of debates "UB" is only the things I disagree with!


Many of the people who complain about undefined behavior effectively want a #pragma dwim.


The standard committee’s position is that if undefined behavior isn’t easy for the programmer to detect, why would it be easy for the compiler to detect?

I’m a little more familiar with the C++ committee than the C committee. The C++ committee prefers to declare something an error than to declare it undefined behavior. They only declare something undefined when they believe it would be unreasonably difficult for the compiler to detect the problem (e.g., using multiple, incompatible, definitions for an inline function; which can happen if you have a macro expanding differently in different parts of the code, or you have multiple definitions for the function, but only one definition is ever visible to the compiler at any moment in time).

I’m pretty sure the “signed overflow is undefined” rule is something of a special case: it should be easy to detect when source code doesn’t have a hard coded upper bound, but giving an error or warning in all cases will create too many false positives, and declaring that it wraps on overflow has been deemed unacceptable by the committee.


UB problems happen only when compilers detect it, understand it is an opportunity for optimization, because they are allowed to do anything, then do that optimization. When compiler can't detect UB, it actually works as expected: when integers overflow they do exactly that and nothing else, when null pointers are dereferenced they do exactly that and nothing else, when uninitialized memory is read it does exactly that and nothing else.


> UB problems happen only when compilers detect it, understand it is an opportunity for optimization, because they are allowed to do anything, then do that optimization.

This is a deep misconception, and not at all how most undefined behavior is related to optimization.

Trivial example: Compilers assume that your variables aren't written to randomly from other threads. Without this assumption, virtually no optimization would be possible. Therefore, data races are UB - they violate a hard assumption of the compiler. But at no point did the compiler say "oh, you wrote to this variable without synchronization! Now I'll show you, hehehe!".

This is the same for e.g. removed overflow checks. Compilers can optimize many loops only if they assume signed integer overflow never happens. So, because compilers should be able to assume this, it is UB if it happens in your code. The same logic that deduces "this loop cannot wrap around" deduces "this if condition [an overflow check] cannot ever be true".
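
The textbook instance of that coupling is the overflow check that the overflow rule itself deletes (a sketch; actual behaviour depends on compiler and flags):

    /* intended as "does x + 1 overflow?", but since signed overflow is UB the
       compiler may assume x + 1 > x always holds and fold this to return 0 */
    int will_overflow(int x) {
        return x + 1 < x;
    }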

But it's easier for a programmer who got their UB-reliant checks optimized out to attribute malice to the compiler than to understand optimization fundamentals, and thus we get people complaining to high heaven and back.


>But at no point did the compiler say "oh, you wrote to this variable without synchronization! Now I'll show you, hehehe!".

That's because the compiler doesn't detect the situation. If it knew, it could, say, remove all code leading to that point - it's legal and the resulting code would be faster.

>Compilers can optimize many loops only if they assume signed integer overflow never happens.

I don't think this happens. Most loops iterate over arrays, and at least in C tradition unsigned integers are used as array index with no signed integers in sight.


You are strawmanning the parent; he did not claim that compilers will go "oh, you wrote to this variable without synchronization! Now I'll show you, hehehe!". Your description of the compiler's behavior matches his description: the compilers detect a case where the program might exhibit behavior the ISO C spec leaves undefined (signed integer overflow), see it as a potential for optimization because they are allowed to do anything (certain passes can be made if overflow is assumed not to happen), then do that optimization.


The parent said

> UB problems happen only when compilers detect it, understand it is an opportunity for optimization, because they are allowed to do anything, then do that optimization.

Perhaps I'm misunderstanding the comment, but I took "detect it, understand it is an opportunity for optimization, because they are allowed to do anything, ..." to mean "detect that undefined behavior occurs and then ..."

Instead, it should be "detect that an optimization can be applied and that the optimized code will do the right thing in all cases that undefined behavior does not occur, and apply the optimization."

I've met programmers who seem to believe that compilers intentionally apply weird transformations when they detect undefined behavior. But really, the compilers are applying sensible transformations for well-defined code and ignoring whether those transformations are sensible for code that isn't well-defined.


Here is an actual example of taking advantage of UB behavior in a compiler.

The compiler sees a variable x that's allocated on the stack. It looks at all the uses of the address of x, and sees that they are all loads and stores. It therefore decides to promote x into a register variable.

Where's the UB, you ask. Well, since the address of x was never leaked, it is UB to compute another pointer to that address (say, via an integer-to-pointer conversion). The compiler never checked for that possibility to affirm that it was UB; it knew that any code that did that would be UB and simply ignored the possibility.

This makes arithmetic overflow a very poor vehicle for UB because it's unusual in that you can't really take advantage of the UB without pointing to the specific operation that caused the UB to occur. This is why I believe that arithmetic overflow UB is gratuitous, and people objecting to UB because of what happens specifically with arithmetic overflow go on to make suggestions that are completely untenable because they don't have familiarity with how most UB works.


Undefined signed overflow isn’t great (and it doesn’t help that C’s implicit integer promotion rules are bad), but it’s important because it allows loop optimizations, which really are important cross-platform, eg Intel is fine with loops counting up but PPC would like them to count down.
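
A sketch of the kind of loop in question (purely illustrative):

    /* with signed-overflow UB the compiler may assume i never wraps, so the
       trip count is exactly n + 1 and the loop can be reversed, unrolled, or
       vectorized freely; with defined wrapping it must also cope with
       n == INT_MAX, where i++ wraps and the loop never terminates */
    void zero(float *a, int n) {
        for (int i = 0; i <= n; i++)
            a[i] = 0.0f;
    }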

The problem causing this is that C loops are overspecified. If there was a “for…in” loop that didn’t make you declare and increment a variable, then manual increments could trap on overflow without causing so many problems.


I don't understand.

If you can turn the loop from upward to downward then you do have a bound for it, and then you can tell if it overflows or not.

Also why is signed overflow a problem and not unsigned?

Surely you want unsigned loop optimized also?


Unsigned overflow is not UB (it wraps), so it has to be preserved more often, which means you have a loop bound less often, which means those loops can't be optimized.

Typically not a problem for i=0...n loops but eg `for (i=0;i<n;i+=2)` could overflow.


You can cast literally any optimization into this shape. For starters, virtually any optimization pass requires sane assumptions about (absence of) data races and/or out of bounds writes. The point you are arguing (or claiming OP argued) boils down to "compilers behave as I want when I turn all optimization passes off".

No, the OP wrote specifically that "the compiler detects [UB]" and then does optimizations exploiting that. Not "detects potential UB". The former is a common misconception, the latter is basically a truism.


> You can cast literally any optimization into this shape

Tell me how these use UB: integer constant folding, unused load removal, optimal use of GP registers to minimize stack spilling

>sane assumptions

"sane" here is doing a lot of work. Assuming overflow won't happen is not a sane assumption, assuming some other thread won't randomly write into your stack memory is.


> unused load removal

Oh come on, this is like the most UB-centric thing in the entire compiler. The compiler uses UB to know that it has enumerated all uses of a memory location, as any pointer that didn't get its value from the data dependency tree that the compiler used had to use UB to compute the address. (Be it from buffer overflow, use after end-of-lifetime, random integer converted to pointer, etc.)


I might be using the wrong terminology, what I'm talking about is removing e.g. `int x = *ptr` where x is then not used in scope (more realistically, `(*struct_inst).member` shouldn't bother to load the other members of the struct). What you're doing sounds like removing a global it detects is never used, which I agree relies too much on UB and should not be done.


Aside from the possible trap ("segfault") mention in the sibling comment, the first example relies on absence of UB because with UB you could enumerate the contents of the stack to find where `x` is living, and discover it isn't there, contra the spec. Even allocating variables directly into registers and never onto the stack relies on UB.


Why is it UB to use registers to store variables? This seems like an implementation detail. The ISO C spec doesn't even have the word "stack" as far as ctrl-f can show me.


> Why is it UB to use registers to store variables?

It isn't; that sentence is even a type error. UB is not a property of compiler output. It's a property of compiler input.

The use of registers to store variables between uses (when those variables never have their address taken) relies on the lack of any defined behavior which would allow the program to determine this is being done. The fact you can't find this in the spec is precisely because it is not defined.


If any object can potentially be reached by forging pointers or enumerating all memory locations, you can't even keep local variables in registers around potential stores to unknown pointers (or ever, in a multithreaded program).


Removing a load is visible behaviour if the load would have trapped, for example.

Everybody agrees that compilers should only do sane optimizations. Nobody agrees on which subset is sane.


That's fair, I'll concede that I make some implicit assumptions. But the magnitude of people who hit problems and accidentally hit UB should give a strong indication that a lot of things done now are not close to sane.


Lobbying the standard body to reduce the amount of UB in the C standard is a perfectly reasonable position. Requiring compilers to DWIM is not.


> integer constant folding

    int a = 10;
    int var1;
    int b = 20;

    foo(&var1);

    int c = a + b;
You'd like to constant fold c? Better assume no UB:

    void foo(int* x) { *(x-1) = 0; }
> unused load removal

Same idea as above.

> Assuming overflow won't happen is not a sane assumption

If the alternative is loops being much slower when you request optimizations, then maybe it is. Consult your compiler manuals if this default irks you.


Okay, yes, you are technically right. What I mean is, making an assumption _other than_ that the code won't alter stack/heap/instruction memory that the implementation is using (e.g. aliasing, overflow, use of NULL)


Sorry, I'm tired of playing whack-a-mole with what are obviously sane assumptions to you. I literally mentioned out-of-bounds writes as a "universal UB" example, you asked me to demonstrate how UB is relevant in a few optimizations, I did, and now you shift goalposts.

My original point was that compilers don't specifically abuse UB to slight you. You're trying hard to draw a line in the sand between different degrees of UB so that you can label one side as "not sane", with (what appears to be) the sole intent of concluding malice on the compiler author side. Please accept that the related tradeoffs have been discussed, and that a default was chosen (if you are curious about a recent instance of this, see [0]).

If you are ok with leaving performance on the table by disabling overflow being UB, you are literally free to do that. You're also free to lobby the standard bodies to alter their carefully considered choices. But you're not going to get anywhere by labeling any of this "insane".

[0]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p090...


Isn't TBAA exactly the assumption that the program won't alter the stack/heap/instruction memory in an illegal way?


That's not true at all. Compilers can detect some instances of UB, and they will happily warn when they do.

Most instances of UB are ones that the compiler could not detect. Your compiler isn't going to detect that your integer overflows, it assumes that it won't. Shit blows up if that assumption was wrong.

If you want to detect UB, you should run ubsan and the like. The compiler is not running it.


In some sense, the problem isn't that the compiler "can't" detect that an integer addition will overflow. The problem is that with some exceptions (a loop that is clearly indexing along an array, for example), the compiler will have to flag every integer addition as UB. UB notifications aren't that useful if you're getting a dozen for every five line function in your entire codebase.

Programmers do not, in general, want to do the amount of work it takes to safely add two numbers together. You need something "weird". Like an error path for every addition. Or a type system for ranges, where you actually have to specify ranges (because adding two bytes together that you casually left as 0-255 will create a new ranged value 0-510; multiplication is even more fun). Or some other exotic idea. So we just let them overflow and the chips fall where they may.


> In some sense, the problem isn't that the compiler "can't" detect that an integer addition will overflow.

It absolutely can't if it doesn't know what the values are going to be.

> the compiler will have to flag every integer addition as UB

No, I disagree. If it has no knowledge of my runtime values, it can't flag addition as UB because it isn't. It's UB only if I'm using values that would overflow, and the compiler in general can't know that. If I've done my program right, it will never overflow. There is no UB, any flag would be just absolutely wrong. The compiler couldn't detect UB. There's nothing to flag.

If it knows the values, then of course it can detect it and flag it:

    x.c:3:18: warning: integer overflow in expression of type ‘int’ results in ‘-2’ [-Woverflow]
        3 |  int a=2147483647+2147483647;
> Programmers do not, in general, want to do the amount of work it takes to safely add two numbers together.

I agree that overflow-checking is needlessly sucky in C but that's not "the problem." I usually make sure my arithmetic is safe but the compiler won't know it.


It blows up, but not as violently as when the compiler adds fuel to it.


UB is closer to "we assume this can never happen and make decisions accordingly" than "Ha ha, I caught you making a mistake, let's actively try to mess up your life now."


They know they mess up; they just have a defense for it.


The problem is that this doesn't mean anything, at least in terms of the C programming language. You say "When integers overflow, they do that and nothing else," but there is no definition of signed integer overflow, so there is no "that" for them to do. That is what it means for something to be undefined behavior. You have some definition in your mind, which might happen to match what some versions of some compilers do at some optimization levels — but that isn't actually the definition, because there isn't one.


Wrapping arithmetic is supported by gcc and clang.


Reading uninitialized memory is not UB... but anyway. Here's an example of how compilers treat null dereferencing as UB:

    void func(struct x *x, struct x *y) {
        x->a = 5;
        y->a = 10;
        x->a = 10;
    }
This is a contrived example but you can imagine writing code that makes a sequence of assignments like this. An optimizing compiler can remove the "dead store" which assigns 5 to x->a. However, this optimization is valid only because writing through a null pointer is undefined... if writing through a null pointer trapped (SIGSEGV), which is what it actually does at runtime, then the optimization would be incorrect... because you could observe the intermediate value of x->a = 5.

The compiler is not really "detecting that your program has undefined behavior". Instead, it is assuming that undefined behavior doesn't happen, and optimizes your program using that assumption. It's unclear what "detecting undefined behavior" would mean here... what kind of diagnostic the compiler should emit, or how to suppress the diagnostic. It's clear that this simple code is pervasive, and the preferred approach for dealing with it is somewhat outside the scope of a normal C compiler... something like static analysis, runtime instrumentation, or some kind of formal methods.


> However, this optimization is valid because writing to a null pointer is undefined... if writing to a null pointer trapped (SIGFAULT), which is what it actually does at runtime, then the optimization would be incorrect... because you could observe the intermediate value of x->a = 5.

I'm not sure this is a good example, because there's no way for you to read them from the same thread (and reading from another thread would be a race). Reading them from a signal handler would yield unspecified values (unless you're using lock-free atomic types, in which case we're probably not worried about optimizing these assignments).

> When the processing of the abstract machine is interrupted by receipt of a signal, the values of objects that are neither lock-free atomic objects nor of type volatile sig_atomic_t are unspecified, as is the state of the floating-point environment.


That's a good point, however I think it's not hard to come up with a different example that shows how compilers assume that pointers aren't NULL. Like,

    void function(struct x *ptr) {
        ptr->x = 1;
        if (ptr == NULL) {
            some_big_chunk_of_code();
        }
    }
The idea is that this could be the result of inlining, where the inlined code does a null check, but in the context it's being inlined into, we already know ptr is not null.


The compiler should help the programmer optimize instead.

"Warning: dead store detected (file):line clobbered by (file):line. Use keyword volatile to always store or remove (file):line."

This way the optimization is promoted from the resulting binary to the source code, and bugs / typos / etc can be corrected.


Optimizations often deal with inlined code. The code in the inlined function might be fine in the general sense, but once taken in a particular context, there's dead or redundant code because the compiler can deduce values from context. It might even all fold into a constant.

Telling the developers to optimize these would be equivalent to telling them to start inlining their code (or creating variants of every function that might be inlined, to optimize it for the specific context where it will be inlined).


That's a completely unworkable solution.

The above code isn't necessarily code as the programmer wrote it, but the code as it appears after macro expansion and various optimization passes. If you were writing C++, the code may be templated.


You would get literally millions of such warnings on your average C codebase.


So basically you are saying that in the average C codebase there are millions of landmines waiting to silently fail on you. This is not a ringing endorsement of the language; it's more like a ringing indictment.


I certainly do not think that each dead store which has been eliminated is a landmine; this is such an extreme position that it isn't even worth considering. In the very rare cases where I needed a store to actually happen, I will use inline asm or some form of volatile.


Which coming back to the main subject isn't ISO C, rather a compiler extension.


Compilers should have a -do-what-i-mean-not-what-i-say flag. The required mind reading is implementation defined.


This is a pretty dismissive response to something that's a real problem.

Sure, the "Why are you deleting my checks for if *this is null" is a little silly - but there are definitely sharp edges where UB conflicts with actually useful things you might want to do. Did you know seqlocks are undefined (benign race conditions)? Ever run into padding concerns playing poorly with trying to do atomic CAS?

It's not unreasonable for the standard to say 'padding is undefined', 'data-races are undefined' - but having no way to say "hey, trust me, please un-poison this thing you don't like" is pretty unfortunate.


> Did you know seqlocks are undefined (benign race conditions)?

It wouldn't be undefined behavior if you used atomic variables--data races involving atomics aren't undefined behavior.


That is true in theory, but in practice it's not feasible. Even assuming you restrict to trivially copyable types, you either have to have a purely-atomic duplicate of anything that you want to store, or play games with trying to get a fast atomic memcpy.

Unless things have changed a lot in the past two years neither LLVM or GCC do that much optimisation around atomics so this comes with disastrous performance implications as well as overhead battling the standard.


Yes, a fast and safe seqlock is still elusive. We will hopefully get the misleadingly named std::atomic_memcpy at some point in the future.


I know this is not relevant to your point... but padding in C is not "undefined". The amount of padding is implementation-defined, but it can only occur in specific places. The value of padding bits is unspecified. "Unspecified" in the standard means that it can be various different things, but it has to actually be one of those things. As opposed to undefined behavior, which is more of a "poison value" that trims paths from your CFG.


Ah yes, that's correct - I believe that reading and comparing the padding bytes results in undefined behavior - the bytes might not actually be initialized depending on the object's storage duration. I also don't know if you can actually use the result of comparing an unspecified/implementation defined/indeterminate value - that's where poison is generated.

In practice (UB aside), this is basically fine in the context of a read, compute, CAS loop. Those bytes do have some value in the machine and if that memory isn't written they won't mysteriously change. It's playing games with the optimisers and UB however. You might be able to get around this by first initialising the bytes to zero, then in-place copy constructing whatever you want? I wouldn't bet anything serious on that being defined though.


The poison is only generated with indeterminate values in objects with automatic storage duration (local variables on the stack). Padding bytes in other circumstances will have specific, valid values and can be compared. (An "indeterminate value" is an unspecified value or a trap representation, unsigned char does not permit trap representations, and unspecified values are "valid values".) Use of other unspecified values results in "unspecified behavior" which does not result in trimming the CFG.

I think the spec is written this way to codify how compilers are allowed to do liveness analysis... a local variable is marked as live when it is initialized, and if a variable is not live, its storage (registers or stack) may be used for other variables or temporary values. You then get UB because, for example, two comparisons may result in contradictory outcomes, because the value has been overwritten with garbage between the two comparisons. Or in a more modern compiler, you read from an uninitialized local variable, and the compiler trims the CFG.

Here's an example of what I'm thinking of:

    int x, y;
    // point A
    if (condition) {
        x = 5;
    } else {
        y = 10;
    }
    // point B
A compiler can realize that x and y can't both be live at the same time, and assign them to the same register or stack location. However, this means that if you read the value of x at point A and point B, but x is uninitialized, it will naturally have a different value each time you use it... because the value of Y is overwriting it!


UB is only UB when it comes to ISO C++ guarantees. The implementations can always make their own guarantees above and beyond that; they don't have to restrict themselves to those parts that are explicitly "unspecified" or "implementation-defined" in the Standard.


Nothing in this article convinces me the problem is contained to optimized C. The strict type aliasing is a general problem, right?


In practice GCC won't enable any TBAA optimizations at -O0. There is always -fno-strict-aliasing anyway.


It seems to me a flaw in a language and in a compiler if the average programmer has to avoid higher levels of optimization because it cannot be predicted what they do.


What if the only languages/compilers without that "flaw" had similar performance to lower optimization levels?

(I don't know if that's the case, but it's what I thought the GP was implying.)


There is a saying that you can make any program run fast if you don't care about correctness.

We have reached the point where C programmers cannot understand the language/compiler anymore.

Given that this has been going on for a long time, my hope is that Rust will be the next systems programming language.


Rust code in unsafe blocks has lots of similar issues with fuzziness around aliasing rules:

https://github.com/rust-lang/unsafe-code-guidelines/blob/mas...


Do you think people understand Rust?


The safe subset of Rust does not permit any UB. This greatly simplifies understanding of UB in the majority of the codebase — because there isn't any. Only when there's an `unsafe` keyword in Rust, you put your "C" hat on.

(it was and continues to be a challenge to uphold this guarantee, because UB in LLVM is so ubiquitous, e.g. Rust had to patch up float to int conversions and eliminate no-op loops)


Rust can be hard to write, but by and large is not hard to read.

Rust does have more abstraction mechanisms than C. So it is possible to go overboard in that area. But that is more a coding style issue. C programs that nest multiple levels of functions pointers can also be hard to understand.


It's easier to understand production Rust than production C


It doesn't seem like anyone is taking the position here that C is flawless. Rather, the question is, given both its flaws and its ubiquity, is it acceptable to trade off developer head-wall interactions for the possibility of high performance.

The only winning move is not to play. (with C)


I don't see it that way. Isn't that precisely the point of higher levels? Don't go there if you don't know what you're doing. Since we can't magically install knowledge into average programmers' heads, the only other choice would be for those higher levels not to exist, because the average programmer will never understand them; and the compiler can't mind-read the average programmer to figure out that what they want isn't exactly what they said they want.


I think the general problem is that if so much is going to hinge on this, it should be much easier to tell whether or not you know what you're doing. Undefined behavior really wouldn't be a problem if you could spot it reliably (possibly using tools, like the compiler toolchain itself).

This loose mapping between four optimization levels and a variety of obscure language situations is invisible and situated only in a handful of top engineers' memories. That's not a great way to run a language if you actually care about the correctness of programs written in that language.


> Isn't that precisely the point of higher levels?

I used to think that higher levels are for giving the compiler more time to think so it can come up with a better result (and for somewhat specifying the desired space-performance tradeoff). In my mind the "don't do it if you don't know what you are doing" things were separate flags, like -ffast-math.

Obviously I eventually learned that bad things can happen, but "just build prod with -O2" still seems to be what almost everyone does.


-O3 mostly is “try harder” but in practice it is more about using more experimental/dangerous techniques than thinking harder. That’s because, basically, compiler optimizations aren’t as important as people think and there aren’t amazing insights you can come to by thinking harder.

If there were, it’d be better for the compiler to tell the programmer about them rather than rediscover them every compilation.


I always thought the main point of the levels was simply compilation speed. That and debuggability.


For the most part yes. O3 also enables (or used to) some optimizations that are not always profitable.


Nope.

That optimisations must not change the visible behavior (except for running time) of a program was the cardinal rule of optimisation.

Then it got flushed down the toilet. (With the trick of retroactively redefining the behavior to be undefined in the first place and thus all bets being off).


Optimizations aren't supposed to change the meaning of your code. And specifically for C, unsafe optimizations are supposed to only apply with -O3. Level 2 is supposed to be safe.


This is definitely untrue. The difference between -O1, -O2, -O3, and -Os is as follows:

- O1 includes optimizations that only require a small amount of additional compilation time. (Fast compilation, code is much faster than -O0.)

- O2 includes O1, and additional optimizations that use a medium amount of compilation time. (Medium compilation, reasonably fast code.)

- O3 includes O2, and additional optimizations that take more compilation time. (Slower compilation, faster, larger code. Unrolled loops and the like.) It often makes sense to compile hot inner loops with O3 because the compiler can do some fairly aggressive vectorization, if you know what you are doing... and the code remains correct in either case, if it was correct in the first place. It's basically "free" vectorization, when it works.

- Os includes O2, except flags that increase code size. (Medium compilation, code is slightly slower and smaller than O2.)

The only optimization level that actually changes the meaning of correct code is -Ofast, which is rarely used. It's described in the manual with the phrase "Disregard strict standards compliance".

ALL levels of optimization will change the meaning of your code if your code relies on undefined behavior. For example,

https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

> -faggressive-loop-optimizations

> This option tells the loop optimizer to use language constraints to derive bounds for the number of iterations of a loop. This assumes that loop code does not invoke undefined behavior by for example causing signed integer overflows or out-of-bound array accesses. The bounds for the number of iterations of a loop are used to guide loop unrolling and peeling and loop exit test optimizations. This option is enabled by default.

It's always enabled.


> And specifically for C, unsafe optimizations are supposed to only apply with -O3. Level 2 is supposed to be safe.

Citation needed? AIUI optimization levels are entirely up to the compiler. GCC's man page says nothing about the safety of these levels.

> Optimizations aren't supposed to change the meaning of your code.

And they generally won't. The trouble is when the meaning of your code wasn't well defined to begin with (or you thought it meant something it didn't).


> GCC's man page says nothing about the safety of these levels.

Well, it used to.


Imagine you and the compiler are parties to a contract. Your obligation is to not let UB happen. The compiler's obligation is to not change the meaning of your code. Insisting that the compiler should uphold its end of the deal when you don't uphold yours is like saying if you sign a contract to buy widgets, you should have to pay even if the vendor fails to deliver them.


This is the best expression of the idea I have heard!

"Do what I mean!" "Well, if your code does UB then it doesn't mean anything"

In fact, the behaviour of code is only limited to defined behaviours... if you stray... then it's undefined.


> Your obligation is to not let UB happen.

Sorry, no can do. If that's the contract then C is a completely useless language.


> Sorry, no can do. If that's the contract then C is a completely useless language.

That is the contract. Your inference may be correct, but it does no good to insist that C should behave differently than it does. There are better options.


Nope, that's not the contract.


I have always been told that, if a program contains UB, then the computer is justified in doing anything at all. This would naturally imply I am required to avoid any UB.

Do you disagree with this?


> I have always been told that, if a program contains UB, then the computer is justified in doing anything at all

You have been told wrong.

That's the point.

1. The C standard actually contains language that says what actions are permitted. These do not include "anything at all".

Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

This was initially a binding part of the standard. Later versions of the standard still include this language, but made it non-binding. Which certain people interpreted as "the standard now says the exact opposite of what is literally in the text of the standard".

2. The C standard is and always was a compromise document, not a complete definition, in order to make it possible for the standard to be applicable to a wide variety of compilers on a very wide variety of different kinds of machines. It is thus by necessity and intention incomplete. Hence IDB and UB. That is, being compliant with the C standard is not sufficient to be a useful implementation of C, something that is (or at least was) stated explicitly in the standard. This language might have been purged from newer versions of the standard, but it was there originally.


Why do you think "unpredictable results" doesn't mean "anything at all", especially when the sentence before that one, that you conveniently didn't quote, says UB is behavior "for which the Standard imposes no requirement"? And that's what it's said all the way since C89.


> Why do you think "unpredictable results" doesn't mean "anything at all"

Why do you think it does??

If I roll dice, the exact number that will come up is unpredictable.

However, the result will not be pink elephants parachuting through the roof.

Once again, the standard then goes on to clarify what it means by "unpredictable":

Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

If that isn't clear enough, there is also the rationale:

http://port70.net/~nsz/c/c89/rationale/a.html#1-5

"The terms unspecified behavior, undefined behavior, and implementation-defined behavior are used to categorize the result of writing programs whose properties the Standard does not, or cannot, completely describe."

Hmm...how can that be?

Let's dig further:

"Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior. "

Why doesn't this say "UB gives the implementor license to do anything at all"? Maybe because that wasn't the intention behind or purpose of UB?

Let's dig further to "Compliance"

"The three-fold definition of compliance is used to broaden the population of conforming programs and distinguish between conforming programs using a single implementation and portable conforming programs. "

Hmm...according to the "standard purists", there is only one level of conformance: full conformance to the standard, everything else means the compiler can do anything it wants. Strangely, the standard itself disagrees.

Let's look some more:

"Existing code is important, existing implementations are not. A large body of C code exists of considerable commercial value. Every attempt has been made to ensure that the bulk of this code will be acceptable to any implementation conforming to the Standard. The Committee did not want to force most programmers to modify their C programs just to have them accepted by a conforming translator. "

Again, the exact opposite of what the "standard extremists" claim.

There are those who pooh-pooh the idea that C is a "portable assembler". Here is what the rationale document says:

"C code can be non-portable. Although it strove to give programmers the opportunity to write truly portable programs, the Committee did not want to force programmers into writing portably, to preclude the use of C as a ``high-level assembler'': the ability to write machine-specific code is one of the strengths of C. It is this principle which largely motivates drawing the distinction between strictly conforming program and conforming program"

So the C standards committee explicitly sought C to be usable as a "high-level assembler". Who would have thunk? Apart from everybody who actually knows about the spirit of C. And what is this "spirit"? Well, the rationale says this:

"The Committee kept as a major goal to preserve the traditional spirit of C. There are many facets of the spirit of C, but the essence is a community sentiment of the underlying principles upon which the C language is based. Some of the facets of the spirit of C can be summarized in phrases like

- Trust the programmer.

- Don't prevent the programmer from doing what needs to be done.

- Keep the language small and simple.

- Provide only one way to do an operation.

- Make it fast, even if it is not guaranteed to be portable.

The last proverb needs a little explanation. The potential for efficient code generation is one of the most important strengths of C. To help ensure that no code explosion occurs for what appears to be a very simple operation, many operations are defined to be how the target machine's hardware does it rather than by a general abstract rule. "

Hmm...


Again, UB is defined as behavior "for which the Standard imposes no requirement". No requirement. If your interpretation of the other parts of the standard were correct and there were requirements for what UB could be, then the standard would be self-contradictory.


There's no contract. Nobody ever shared a piece of paper saying "your software must not use UB".

The closest thing to a "contract" a language has is training material and reference books. And guess what, nearly all C material mostly ignores UB, because enumerating it all is completely impractical even for reference books.

Most of the C software in use in the wild was created before the hyper-optimization trend of C compilers, when your program did mostly what you said, and if what you said was undefined, it would still behave in some way that is close to a literal reading of the code. For most of the C software that exists, the modern treatment of UB is a huge post-facto unilateral contract change.


The C specification is the contract between you and the compiler.


Yep. The creators of the standard spoke to this:

"The three-fold definition of compliance is used to broaden the population of conforming programs and distinguish between conforming programs using a single implementation and portable conforming programs.

A strictly conforming program is another term for a maximally portable program. The goal is to give the programmer a fighting chance to make powerful C programs that are also highly portable, without demeaning perfectly useful C programs that happen not to be portable. Thus the adverb strictly.

By defining conforming implementations in terms of the programs they accept, the Standard leaves open the door for a broad class of extensions as part of a conforming implementation. By defining both conforming hosted and conforming freestanding implementations, the Standard recognizes the use of C to write such programs as operating systems and ROM-based applications, as well as more conventional hosted applications. Beyond this two-level scheme, no additional subsetting is defined for C, since the Committee felt strongly that too many levels dilutes the effectiveness of a standard.

Conforming program is thus the most tolerant of all categories, since only one conforming implementation need accept a program to rule it conforming. The primary limitation on this license is §2.1.1.3. "

It turns out it's not an all-or-nothing thing. Not being "strictly conforming" is not a "breach of the contract" by the developer, and thus not a license for the compiler to do anything it wants.

See also:

"Existing code is important, existing implementations are not. A large body of C code exists of considerable commercial value. Every attempt has been made to ensure that the bulk of this code will be acceptable to any implementation conforming to the Standard. The Committee did not want to force most programmers to modify their C programs just to have them accepted by a conforming translator. "

http://port70.net/~nsz/c/c89/rationale/a.html


It's not a contract if you can't expect one of the parties to have read it.

Compiler developers do not need a contract with the users. No compiler team works with that impression. No compiler user expects one either.

But if you want to use it to justify some impractical behavior, well, your justification is a complete fabrication, and justifying the behavior is meaningless anyway. It doesn't become practical just because you have some sociological theory.


You seem to be using the legal definition of 'contract'.

We're using the computer science definition, which is simply "assuming it's correctly implemented, if you feed this program Foo, it'll do Bar".


You haven’t read the C standard? It’s not that long and fairly understandable.

Plus you can go around telling people true things like “malloc() can’t be implemented in standards-compliant C” and they’ll get mad at you.


Which compiler team doesn't take the C standard seriously?

And the C standard is by definition a contract between the implementation and its users.


If you actually read the standard, you will find out that this turns out not to be the case.

The C standard is (or maybe was, quite possible that the compiler mafia removed that bit as well) very clear that standards compliance is not sufficient for a compiler to be "fit for purpose".


So the contract includes a "this contract does not guarantee that the product is fit for purpose" clause. So what? Many contracts do that; i.e. it's a fairly typical trait of contracts.


But they really meant it.

See also the different conformance levels and the rationale given for them.

http://port70.net/~nsz/c/c89/rationale/a.html


I have read the standard, and I don't believe your comment to be true. Can you point out the exact section that backs up what you're saying? (And if you say the "compiler mafia" did remove it, then can you point to an older version that says it?)


Alas, I can no longer find it. :-(

However, the rationale is quite clear on most of this.

http://port70.net/~nsz/c/c89/rationale/a.html


I didn't say it's the contract; I said it's a contract.


That's absolutely the contract compiler implementors are following. If you don't want to be a part of that contract you need to implement your own compiler from the ground up with optimisations based on your interpretation of UB.


I didn't sign a contract though, the compilers just do dumb things and I am at their mercy. If I were negotiating a contract then I wouldn't allow all the cases of UB.


Writing a C compiler that does a trivial translation to ASM is not very hard, in fact there are already quite a few. I don't understand why people don't use that if that's what they want.


Ironically, it's because the code they generate is too slow.


"I didn't sign a contract though, Linux just does dumb things and I am at its mercy."

The license is still the closest thing you've got to a contract with the makers of Linux.

Same with a language specification.


I recall reading recently that gcc moved to making overflow undefined at -O0


Is it even possible to have zero undefined behavior in languages that allow user-defined pointers? It seems like allowing even just one degree of memory indirection creates a singularity beyond which any kind of formal guarantees become impossible. Seems like you'd have to allow only structures which hide memory implementation details if you truly want to avoid all UB. Same goes for any arithmetic which could overflow.

That would require kernel devs to radically rethink how they interact with I/O, which would probably require specific architectures.

In other words, writing a kernel portable on any of the existing ISAs that is also performant is basically impossible, barring some humongous breakthrough in compiler technology.

Seems to me that when it comes to brass tacks, UB is kind of the "we are all adults here" engineering tradeoff that enables shipping fast and useful software, but is technically not strictly defined and thus usually does what you want, but can result in bugs.


The original C committee wrote, as part of its guiding principle to “Keep the spirit of C”, that “To help ensure that no code explosion occurs for what appears to be a very simple operation, many operations are defined to be how the target machine’s hardware does it rather than by a general abstract rule.”¹ That is, if you write `a = b + c` you expect the compiler to generate an `add a, b, c` instruction, and if that happens to trap and burn your house down, well, that's not C's problem.

I'm convinced that the original UB rule was intended to capture this, and the wording was an error later seized by compiler developers. As evidence, consider Dennis Ritchie's rejection of `noalias` as “a license for the compiler to undertake aggressive optimizations that are completely legal by the committee's rules, but make hash of apparently safe programs”². If anyone at the time had realized that this is what the definition of UB implied, it would have been called out and rejected as well.

¹ https://www.lysator.liu.se/c/rat/a.html#1-1

² https://www.lysator.liu.se/c/dmr-on-noalias.html


> if you write `a = b + c` you expect the compiler to generate an `add a, b, c`

this could be violated even by simple optimizing compilers that do constant propagation or strength reductions.

Again, if you do not want optimizations -O0 is always available.


The statement, in context, is a simple illustration of the principle that C operator semantics were defined by the target hardware. Not a treatise on compiler construction.


It would be nice if optimizing compilers' -O0 weren't completely insane on the assumption that the insanity would get optimized out.


Hmmm...

"To help ensure that no code explosion occurs for what appears to be a very simple operation"

How does constant propagation cause code explosion?


Nowhere does anyone guarantee that you won't be hit with nasal demons if you use -O0


Well, it is hard to guarantee anything about a bit of code with no defined semantics, you'll have to settle for what O0 gives you.


I think expecting the compiler to generate code that makes 'a' hold the sum of 'b' and 'c', assuming there is code that "observes" the value in some way, is a more useful model.


> Is it even possible to have zero undefined behavior in languages that allow user-defined pointers?

It kinda boils down to what exactly you mean by defined behavior. A C programmer's take might be that you can run a conforming program in an emulated abstract machine and get defined results out of it. And then you can run the same thing on real hardware and expect to get the same result (modulo implementation defined behavior). This definition leaves some things out (e.g. performance, observable effects in the "real world") but it captures the computational semantics.

Another programmer's take might be more akin to a portable assembler. In that case, you certainly could define reads and writes for arbitrary pointers, in the sense that they must cause corresponding (attempted) loads and stores at the machine level. However, the definition wouldn't be complete since it inevitably leaves much to the underlying implementation. Thus you could have "defined" C programs that show completely different behaviors depending on which implementation and hardware you used. It would be impossible to say what the program's output must be "in the abstract." For someone who just wants to output assembly, maybe that's fine. I'm not sure other people would be too satisfied with it. An out of bounds write could still blow up your program and be remotely exploitable; practically the same thing as undefined behavior, except that now your compiler is also barred from optimizing.

There's quite a bit of tension between these two camps.

Alternatively, you could fully define it at a great runtime cost and potential exclusion of real hardware implementations.


Undefined behavior means something specific in the standard, it's not just an operation that might do different things on different compilers and machines. It means that if it happens, the program is allowed to do anything and everything, even before the UB is reached. Undefined behavior is breaking an assumption that the compiler is allowed to make.

It is probably impossible to make a low-level language with no implementation-defined behavior, but it is certainly possible to make one with no undefined behavior. For example, you can put in your spec that overflowing an unsigned integer can give any value; that is different from putting in your spec that it doesn't happen and if you write it the variable might have no value, multiple values, or burn your socks off.

https://en.wikipedia.org/wiki/Undefined_behavior


It's actually impossible as far as I can see as long as you have unsafe memory access via pointers and memory is used to hold information about control flow and local variables, because overwriting memory can then do weird and wonderful things to the state of the program. E.g. local variables can magically change values, but worse, the control flow of your program can be hijacked in pretty arbitrary ways.

It would be great if this wasn't possible, because then you wouldn't have a whole class of security vulnerabilities.

On older systems without memory protection, the situation is even worse - you could scribble all over the OS memory as well and who knows what happens then.


This happens in C because compilers are allowed a wide range of optimisations. I'm just saying it's probably possible to build a low-level language without UB. Of course no one would use it because optimizing compilers are desirable and skipping optimizations just so broken code is handled to spec won't happen.


It's been attempted. Intel tried doing this with the Intel 432, but it never got beyond the prototype stage as it was very slow, even by the standards of the day (1982).


If by user-defined pointers you mean arbitrary integer-to-pointer casts, then this is a kryptonite for static analysis, and I don't think you can have a language that is both fast and fully predictable (UB-free) in their presence. It breaks pointer provenance and aliasing analysis, and existing compilers already struggle with such casts in C.

But apart from that, you can have pointers, with many levels of indirection, as long as there are rules that prevent use-after-free, unsynchronized concurrent access, and other UB-worthy problems. Rust's borrow checker with rules for no mutable aliasing and Send/Sync markers for concurrent access comes close, but it has to give up on generality for safety (e.g. it can't reason about circular data structures).


> Is it even possible to have zero undefined behavior in languages that allow user-defined pointers? It seems like allowing even just one degree of memory indirection creates a singularity beyond which any kind of formal guarantees become impossible.

With untyped pointers, yes. But it seems to me that if you have strong typing for function pointers you could mostly avoid that.

> UB is kind of the "we are all adults here" engineering tradeoff that enables shipping fast and useful software, but is technically not strictly defined

Well, no, of course it isn't -- the clue is probably in the first half of the name, "Undefined Behaviour"... ;-)


Even kernels only interact with memory in reasonably predictable ways. I think they could all be hidden behind such abstractions, BUT it will make the language a lot more complex.


Without concurrency you don't have to have UB to have user-defined pointers. x86 assembly language has no UB and has user-defined pointers.


CPUs can and do have UB in the instruction specifications. x86 definitely does.


In ring zero there are some, but no instructions that userspace can use (or else security would be impossible)


Indeed, for example the BSR instruction:

> If the content source operand is 0, the content of the destination operand is undefined.

[0] https://www.felixcloutier.com/x86/bsr
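
The closest C-level analogue I know of (a hedged sketch, assuming 32-bit unsigned int): GCC's __builtin_clz typically lowers to BSR/LZCNT, and its result is likewise documented as undefined when the argument is 0.

    /* The result of __builtin_clz(0) is documented as undefined, mirroring
       the hardware note above; the caller must guarantee x != 0. */
    unsigned highest_set_bit(unsigned x) {
        return 31u - (unsigned)__builtin_clz(x);
    }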


No, this is not the same thing as "undefined behavior" in the sense the ISO C spec defines it. The content of the output being unspecified would not, for example, allow the processor to write to some part of memory if the source is zero, or change unrelated registers. x86 is doing "undefined" the right way, unlike the ISO C spec.


It's exactly the same as "undefined behavior" in C. The only difference is in the consequences that follow from it when optimizers get ahold of it.

If it's undefined behavior if BSR is given 0, then therefore the compiler can assume the value passed to BSR isn't 0 (because it's defined that a well-formed program doesn't encounter undefined behavior). Therefore a check if the value is == 0 can be skipped, because it can't be zero since BSR didn't allow it. And since the code inside the if isn't reachable, that probably means now other things are no longer reachable and can similarly be removed. And etc...

That's all the "undefined behavior allows the compiler to murder my cat!" really is - it's just the chain of consequences that result from the compiler following the rules it was told. It was given a rule that a parameter couldn't be null, and so it listened & did what it was told including deleting redundant null checks (since after all, the value was already specified to be not null!).

To demonstrate the problem in a different language, let's say I have this code:

    float asFloat(@Nullable Integer i) {
        return i == null ? Float.NaN : i.floatValue();
    }
You'd expect the null check, right? But let's say I call it from this function:

    void sayHi(@NonNull Integer i) {
        println("Hello " + asFloat(i));
    }
The optimizer comes along and inlines the two together and now sees:

    void sayHi(@NonNull Integer i) {
        float temp = i == null ? Float.NaN : i.floatValue();
        println("Hello " + temp);
    }
Does it still need to keep the null check? After all, I defined that 'i' isn't null, so surely the null check is just unreachable code & can be eliminated, right? If it doesn't, it's obviously wasting performance. If it does, the internet gets angry about the compiler "abusing undefined behavior." Even though as far as the compiler is concerned it isn't undefined behavior! 'i' is defined to be not-null!


What "compiler"? We're talking about the x86 instruction specification. The microcode translator certainly isn't allowed make the sort of optimization you're describing, that would be a violation of the spec that GP linked.


> No, this is not the same thing as "undefined behavior" in the sense the ISO C spec defines it.

Well, it's not UB the way optimiser-extremists (mis-)interpret the ISO C standard.


What would be, for example, the expected behaviour of accessing an out of scope variable?


You will get whatever bits have been written at that location


What location? The variable might only have been allocated in a register.


There's no register allocator in X86 assembly


This was discussed somewhat recently: https://news.ycombinator.com/item?id=28779036


Wow, this was an eye-opening read; it really shakes my trust in ISO C.

It makes it much more understandable why the Linux codebase is riddled with compiler extensions: ISO C is simply not reliable anymore.

The issue is bigger than what shows on the surface. Just as Dennis Ritchie said, it is a time bomb; soon enough these nuances will burst into a big issue in the Linux kernel, or worse yet, in some essential system like avionics.


I think undefined behavior (as a general concept) gets an unfair share of the blame here. It's notable that almost all criticism of undefined behavior in C tends to focus on just two sources of UB: signed integer overflow and strict aliasing; other sources of UB just don't generate anywhere near the same vitriol [1]. Furthermore, it's notable that people don't complain about UB in Rust... which arguably has a worse issue with UB in that a) there's not even a proper documentation of what is UB in Rust, and b) the requirement that &mut x be the sole reference to x (and therefore is UB if it is not) is far more stringent than anything in C (bar maybe restrict), and I'm sure that most Rust programmers, especially newbies starting out with unsafe, don't realize that that's actually a requirement.

There is a necessity for some form of UB in a C-like language, and that has to deal with pointer provenance. You see, in C, everything lives in memory, but on real hardware, you want as much to live in a register as possible. So a compiler needs to be able to have reasonable guarantees that, say, any value whose address is never taken can never be accessed with a pointer, and so can be promoted to a register. As a corollary, this requires that things like out-of-bound memory accesses, or worse, converting integers to pointers (implicating pointer provenance here) need to have UB in at least some cases, since these could in principle "accidentally" compute an address which is the same as a memory location whose address was never taken.

That suggests that the problem isn't UB per se. If we look at the two canonical examples, arithmetic overflow and strict aliasing, we can see that one of the features of these things is that they have a pretty obvious well-defined semantics [2] that can be given for them, and furthermore, there's no way to access these well-defined semantics, even by avoiding the feature altogether. And I think it's the lack of this ability to work around UB that is the real issue with C, not UB itself.

[1] For example, it is UB to pass in rand as the comparison function to qsort. I'm sure many people will not realize that before I wrote this, and even parsing the C specification to find out that this is UB is not trivial. For an interesting challenge, try giving a definition of what the behavior should be were it not UB--and no, you can't just say it's impl-defined, since that still requires you to document what the behavior is.
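
To make that footnote concrete, a minimal sketch (names are made up):

    #include <stdlib.h>
    /* A comparator with no consistent ordering: calling qsort with it is UB,
       since the standard requires comparison results to be consistent for
       the same pair of elements. */
    static int random_cmp(const void *a, const void *b) {
        (void)a; (void)b;
        return (rand() % 3) - 1;   /* -1, 0, or 1, regardless of the inputs */
    }
    /* qsort(values, count, sizeof values[0], random_cmp);  <- undefined */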

[2] I will point out that, for arithmetic overflow, this semantics is usually wrong. There are very few times where you want <large positive number> + <large positive number> = <negative number>, and so you're mostly just swapping out an unpredictably wrong program for a predictably wrong program, which isn't really any better. However, the most common time you do want the wrapping semantics is when you want to check if the overflow happened, and this is where C's lack of any overflow-checked arithmetic option is really, really painful.


I fully agree with all you said.

Some comments:

- we have a proposal for provenance rules (N2676), which would clarify the rules and make sure that pointer-integer round trips work (currently broken on some compilers due to an "interesting" interpretation of provenance). Pointer-to-integer round trips could then be used to work around provenance-based optimizations if you need to (and to fix existing code with those casts).

- for signed overflow, you can get run-time traps with the right compiler flags and this is IMHO far better than defined wrap-around for unsigned.

- C23 will get checked integer functions based on the GCC builtins

- you can get (relatively cheap) run-time bounds checking by using VLAs (some people think VLAs are generally bad, but pointers to VLAs are a bounded pointer type!)

There is of course much left to be done. But I like to stress that UB gives compilers room to do what they want, and this does not have to be optimization. If programmers do not like what they get, they need to complain to their vendors! The standard has to follow existing practice, so this has to be driven by the compiler vendors based on the feedback of their customers. So please, if you want to see changes, tell this to your compiler vendor and do not accept "we do this because the standard allows this" as an answer.
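
For reference, a hedged sketch of the checked addition the GCC builtins provide today (the wrapper name is made up; C23's ckd_add in <stdckdint.h> is modeled on these):

    #include <stdbool.h>
    /* __builtin_add_overflow is a GCC/Clang extension: it stores the wrapped
       sum in *out and returns true if the mathematical result overflowed. */
    static bool checked_add(int a, int b, int *out) {
        return __builtin_add_overflow(a, b, out);
    }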


For [1] I would say that when the compiler does not control the standard library (e.g. most Linux systems), the compiler should treat qsort like any normal non-libc function call; it shouldn't be able to detect that I might be relying on implementation-specific guarantees. From a libc perspective, I would probably have a safe version of qsort, used either on request or when NDEBUG is undefined, in which the implementation must either detect the inconsistency and abort in an implementation-defined manner, or select an ordering such that a<b in the output implies the existence of a finite chain a<x_1<x_2<...<x_n<b of comparison results, where each x_i is the equivalence class in which the compare function said two entries were equal. I think that should be consistent with most sorting algorithms. (Whether the normal qsort should be the same as the safe version would only matter if there were a significant-enough performance improvement.)

For [2] I disagree, for two reasons. For one, merely having an answer at all, rather than invoking undefined behavior, is valuable; having x+y=pseudorand(x,y) would still be better than complete UB because the damage that can be done is limited. The other reason is that addition/subtraction/multiplication actually does work modulo 2^n, so if I write e.g. (a+b-c) where b can be large but b-c is known to be small, it works even though the expression is technically left-associative and computing a+b first would be UB.

I pretty much agree with your point that the main problem is there is no way to work around UB; if there were a workaround there would be much less reason to complain. I would go a bit further and say such a workaround needs to be in the code itself to be effective, not merely a command line arg like fwrapv, so that it gets kept around when doing header inlining, LTO, copy-pasting code to other projects, etc.


> For [1] I would say that when the compiler does not control the standard library (e.g. most linux systems) then the compiler should treat qsort like any normal non-libc function call, it shouldn't be able to detect that I might be relying on implementation-specific guarantees.

I am guessing, for example, that you are not aware that every compiler will optimize `printf("Hello, world!\n");` into `puts("Hello, world!");`. memcpy and memset are even more heavily-optimized function calls by the compiler: for example, they don't prevent memory-to-register promotion.


I am aware of these things; the rant by Ulrich Drepper on this breaking glibc guarantees [0] was exactly what I had in mind when writing that.

[0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25609


> I pretty much agree with your point that the main problem is there is no way to work around UB; if there were a workaround there would be much less reason to complain.

There is a canonical workaround: cast the operands to unsigned, do the operation, and cast the result back to signed. Converting an out-of-range value back to a signed type is implementation-defined behavior instead of UB.
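
A minimal sketch of that idiom (helper name is mine):

    /* Unsigned arithmetic is defined to wrap modulo 2^N; converting the
       out-of-range result back to int is implementation-defined (in practice,
       wrapping on two's-complement targets) rather than undefined. */
    static int wrapping_add(int a, int b) {
        return (int)((unsigned)a + (unsigned)b);
    }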


> [2] I will point out that, for arithmetic overflow, this semantics is usually wrong. There are very few times where you want <large positive number> + <large positive number> = <negative number>, and so you're mostly just swapping out an unpredictably wrong program for a predictably wrong program, which isn't really any better. However, the most common time you do want the wrapping semantics is when you want to check if the overflow happened, and this is where C's lack of any overflow-checked arithmetic option is really, really painful.

Being _predictably_ wrong is actually much better than being _unpredictably_ wrong; in that case, it's not UB (unpredictable) anymore, is it?

If the semantics were exactly what you're saying, you could just convert the predictably wrong value to a correct one, and you'd have overflow checking.


I think the undefined behavior in C and C++ is even less defensible today than when they started because of the convergence of architectures.

Pretty much every non-legacy architecture does IEEE floating point. Pretty much all of them do a flat address space. The word size is some power of 2(32 bit, 64 bit, maybe 128 bit in the future). They are almost always little endian. The memory models are converging towards the C++ memory model.

Given that, I think simplifying the language and getting rid of foot guns could be done without losing any significant performance or actual flexibility/portability.


You could also add that they all do 2s complement math.

This is what the OpenBSD team did to OpenSSL. If the code has some complexity that is only necessary because it might have been run on a VAX or AIX or early Cray architecture then it is time to excise that complexity. They deleted thousands and thousands of lines of support for architectures that are only seen in museums and landfills today.


C does require 2's complement math.

However architectures of the future aren't going to have flat pointers (PAC/CHERI etc), and even current architectures have some non-IEEE floats (https://en.wikipedia.org/wiki/Bfloat16_floating-point_format / https://en.wikipedia.org/wiki/Unum_(number_format)).


It could be called 'C--' unless that's been taken.


There is a language C-- created by Microsoft Research (Simon Peyton Jones of Haskell and GHC fame)

https://www.microsoft.com/en-us/research/wp-content/uploads/...


ISO C is mostly concerned about making sure that stuff is portable; operating systems on the other hand are intrinsically platform-specific to a degree. So it is not really surprising that pure ISO C is not enough for OS development


You can still be portable between compilers; Linux builds with GCC and Clang, and used to build with tcc (although that probably required patches)


Linux builds on clang after a decade of dedicated effort to make it happen, and that is with clang overall being comparatively similar to gcc (e.g clang implements many gcc extensions): https://github.com/ClangBuiltLinux/linux/wiki/Project-histor...


D got around the overflow-is-UB problem by declaring that 2s-complement arithmetic will be used with wraparound semantics.

Is there any reason for modern C to still support anything else?


Guaranteed two's-complement representation for signed integers was recently added to C++. Wraparound is still UB though, because defining it apparently caused regressions in some significant code bases.

Guaranteed order of evaluation of arguments almost made it into the standard, but because of regressions, we didn't quite get the full benefits; for example: i = i++ + 2;

is now fully defined, while this:

   f(++i, ++i);
is no longer UB, but the order of evaluation is unspecified.

Ideally for every UB taken away we would get one or more pragmas to get the optimization back like ivdep. The issue is that that doesn't help old code bases.


D also guarantees left to right evaluation order.

> The issue is that that doesn't help old code bases.

D has a flag called -revert=issue where old behaviors will be supported for a time.


Integer overflow is usually a logic error, so a reasonable default behavior would be a trap instead of silent overflow (regardless of how signed integers are stored in memory). Some architectures support that (e.g. MIPS).


The article mentions cases where integer overflow is expected behavior.


C already offers 'unsigned' type variants that provide defined overflow (together with a non-negative range).

It would be useful to have another type variant that offers defined overflow (like unsigned) together with a signed range for such cases. But it still makes sense for the basic integer types to have overflow as UB, as in most cases it is not expected behavior.

Note that in current C, if one needs defined overflow on signed integers, one can cast them to unsigned, do the operation, and cast the result back to int. That makes it implementation-defined instead of undefined.


> as in most cases it is not expected behavior

Yeah, but the gotcha happens when it is expected behavior.


-fsanitize-undefined-trap-on-error


(first, I am not a D user but I really like what I have seen. I wish WG14 would take more inspiration from D than from C++)

C23 will require 2s-complement.

Signed overflow is still UB in C. I think this is a better choice than wrap around, because overflow is often a bug. With UB, you can use static analysis (to some degree) or run-time traps (if the compiler supports this) and then fix those bugs. If it were defined to wrap around, those bugs are much harder to find.


There are still processors that don't use two's complement, although I'm not sure that should really stop them if they wanted to declare all C implementations must be t-c.


Those processors can use a dialect of C that embraces ones-complement. Mandating such support in the Standard has little practical effect on code, I expect most programs would have to be recoded to work on ones-complement anyway.


It may be unusable for “modern” operating systems (you could write an early UNIX clone with just ISO C because early UNIX didn’t support much).

But that’s basically always been the case. I doubt you could stay within the first ISO C standard and write a modern operating system.


Assuming that an early Unix clone would be written in C plus some amount of assembler, what constructs of modern Unix systems make it impossible to write in the first ISO C standard plus some assembler?

For example, 4.4BSD and early Solaris have just about everything you would expect in a modern operating system. And given the age of those systems, they were written in early versions of the C standard.


(Duplicating an answer to a similar question):

The C standard has notes about a freestanding implementation, mainly for developing kernels. The requirements C has for freestanding environments are very limited ( http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf , you get float.h, iso646.h, limits.h, stdalign.h, stdarg.h, stdbool.h, stddef.h, stdint.h, and stdnoreturn.h) and the starting point for the program is implementation defined; you might get _Atomic, but that’s optional). For the record, C++ goes a little overboard and promises all sorts of things in its freestanding environment.

https://groups.google.com/g/comp.std.c/c/zzyii-DlMiU/m/RnH5i...


Before 1985 there was no C standard, and until 1989 it was only a draft. So I doubt anything was written in standard C in the 1970s or early 1980s.


C89 is mostly a superset of K&R C. So if you can write an OS in K&R C you can write it in C89.

The issue was, do modern operating systems require newer versions of the C standard? Or can you write everything in C89.


Inline assembly is not part of C89, so a pure C89 compiler needs to rely on an external assembler for systems programming, for example.

The same applies to using C on hardware that lacks memory-mapped IO.


K&R wasn't standardized either though.


This is true for any language. An operating system can be written in Java plus some amount of assembler.


> I doubt you could stay within the first ISO C standard and write a modern operating system.

Do you take that view because of some intrinsic deficiency in the language, or the difficulty of memory safety, or something else?


The C standard has notes about a freestanding implementation, mainly for developing kernels. The requirements C has for freestanding environments are very limited ( http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf , you get float.h, iso646.h, limits.h, stdalign.h, stdarg.h, stdbool.h, stddef.h, stdint.h, and stdnoreturn.h) and the starting point for the program is implementation defined; you might get _Atomic, but that’s optional). For the record, C++ goes a little overboard and promises all sorts of things in its freestanding environment.

https://groups.google.com/g/comp.std.c/c/zzyii-DlMiU/m/RnH5i...


Can you still compile code with gcc -O0 (the option that turns optimization off) to get completely defined behavior? When doing so, does it actually still turn off all optimizations? Also, does -Os (optimization for size) still produce defined behavior?


> Can you still compile gcc code with -O0 (gcc option to turn optimization off) to get completely defined behavior?

No. The standard specifies what's undefined, optimization levels don't change it (though there are compiler flags such as -fwrapv which make undefined things defined).

However, turning off optimizations will make behavior easier to predict.


It used to be passed around that you should never use -O0 because it was much less used/tested and would generate wrong code more often. Not sure if that was folklore or true.


-O0 is "well tested" because everyone uses it for debug builds. If it produced incorrect results people would be fairly upset (and they are when it does…but it's not particularly common, because incorrect optimizations are generally the problem when codegen is incorrect.)


I still retch when told memcpy and memset cannot take NULL and 0 length.

Try asserting that they're not NULL in glibc and try to boot your machine! Oops... bad compiler people, bad!
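
To make the complaint concrete (a hedged sketch; the function is made up):

    #include <string.h>
    /* Natural-looking code that is technically UB: the standard requires
       memcpy's pointer arguments to be valid even when n == 0, so a NULL
       src here lets the compiler assume src != NULL elsewhere. */
    static void copy_maybe_empty(char *dst, const char *src, size_t n) {
        memcpy(dst, src, n);
    }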


Similarly, standard C++ is not usable to implement virtual machines like Hotspot, and yet all virtual machines are implemented using C++ compilers.


Not all, https://www.jikesrvm.org.

Just one example, I can't be bothered to post others.


what's the history of having undefined behavior? why would a language designer not specify?


Because the code should be somewhat portable, and not all processors act the same. This is why left and right shifts are weird with overflow: the behavior is not consistent between x86 and some other instruction sets. Similarly for accessing the null address.

Besides that, how would you specify the behavior of reading from an address the user randomly generated himself? What is stored at an address not generated by the compiler is out of scope for the compiler. If you try to define something like 'always trap' or 'always return 0', you incur massive overhead.

Besides that, the fact that signed integer over/underflow is undefined behavior actually unlocks a lot of reasonable optimizations that treat integers like actual integers. Things like (a+1 > a) always being true.
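
A minimal sketch of that last point:

    /* Because signed overflow is UB, a compiler may fold this to 'return 1',
       even though with wrapping semantics it would be false for a == INT_MAX. */
    int plus_one_is_bigger(int a) {
        return a + 1 > a;
    }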


Sanitizers attempt to define the behaviour of undefined constructs, but expect an integer-multiple slowdown compared even with unoptimized builds.

It is very hard to specify the behaviour of, for example, use-after-free, or accessing an automatic variable after it goes out of scope, or data races, in a C-like language without significant runtime cost.


So we can switch to rust now?


Sure; we look forward to your patches (keep in mind that they must preserve compatibility and portability to all currently supported architectures).


You forgot the main requirement: it must be performant, which is the reason UB was "created" in ISO C in the first place.


I think my girlfriend can figure something out.

She does embedded systems at Anduril using Rust.


Clickbait on arxiv? Et tu, Brute?

No, most operating systems that people actually use are written in ISO C, so the headline is by definition wrong.


They are written in non-ISO C


From the paper:

"For example, a well-known security issue in the Linux kernel was produced by a compiler incorrectly assuming a pointer null check was unnecessary ([40] fig. 6) and deleting it as an optimization. Or consider this (simplified) patch report for Linux [25]: The test [for termination] in this loop: [...] was getting completely compiled out by my gcc, 7.0.0 20160520. The result was that the loop was going beyond the end of the [...] array and giving me a page fault [...] I strongly suspect it’s because __start_fw and __end_fw are both declared as (separate) arrays, and so gcc concludes that __start_fw can never point to __end_fw. By changing these variables from arrays to pointers, gcc can no longer assume that these are separate arrays."

Both of those sound like simple bugs in the compiler optimizer implementation, in which case they could hardly be used as examples of how the current C standard is bad.


No, they're definitely features. The removal of null pointer checks in gcc is enabled by -fdelete-null-pointer-checks and is often enabled by default.
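
The pattern behind the Linux case looks roughly like this (a simplified sketch, not the actual kernel code):

    #include <stddef.h>
    struct device { int flags; };
    int get_flags(struct device *dev) {
        int flags = dev->flags;   /* dereference: UB if dev == NULL         */
        if (dev == NULL)          /* compiler may now assume dev != NULL... */
            return -1;            /* ...and -fdelete-null-pointer-checks    */
        return flags;             /*    removes this branch entirely        */
    }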


They aren't bugs. The C standard considers those cases to be undefined.



