Porting to GCC 14: C language issues (gcc.gnu.org)
154 points by fweimer 7 months ago | 136 comments



We've done the work for Fedora now. There was a long thread on the subject at the beginning of the effort which is interesting: https://lists.fedoraproject.org/archives/list/devel@lists.fe...


As someone who doesn't know a lot about distro work, what exactly did you have to do for Fedora? Fix packages that weren't building anymore with this change?


Packages fail to build after a while all the time. The maintainer fixes them (preferably using an upstream patch), and we move on.

The issue with this set of compiler changes was that a lot of software builds successfully, but the new build suddenly has expected features missing. This happened with the previous attempt:

https://lists.fedoraproject.org/archives/list/devel@lists.fe...

That happened in 2016. In 2019, we undertook another attempt, this time with config.log/config.h diffing, but for various reasons stopped after fixing a few dozen packages, so it never went to the Fedora/upstream proposal stage.

After Xcode and Clang changed the defaults, it became somewhat easier to convince people that we should get Fedora & upstreams ready for a future GCC change. It also helped that Gentoo has been pushing fixes for Clang compatibility upstream for a while (https://bugs.gentoo.org/408963, with a broader scope in https://bugs.gentoo.org/870412).


As I understand it, the main problem is autoconf-generated stuff with old versions of autoconf.


Yep, errors result in autoconf checks failing, leading to features being disabled, symbols being removed, and more, all while the build still succeeds.
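For illustration, a hypothetical probe in the old style (it still compiles under `gcc -std=gnu89`, but GCC 14's defaults reject it, so configure silently records the feature as missing):

    main ()
    {
      char buf[8];
      strlcpy (buf, "hi", sizeof buf);  /* implicit int main and implicit
                                           declaration: errors by default now */
      return 0;
    }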


This was the danger, but I think Florian did something where he compared autoconf output before and after the change and identified problems that way.


This is a long-needed, and gratefully accepted, improvement to GCC. I've seen in the wild just how many people see a 'warning' and consider it ignorable, even though many warnings in GCC are a certain indication of broken code in pretty much anything new written in the last 35 years.

It's important to bear in mind the main change here is to stop wild C programmers from trying to run or release code that's almost certainly broken, because the diagnostic they got wasn't explicit enough. People can still force the old behaviour if they need to.


> It is no longer possible to call a function that has not been declared.

Heh. Surprisingly, I've seen some people claiming that it's impossible for a language without forward declarations to have a one-pass compiler that could handle calls of not-yet-defined functions, yet pre-ANSI C managed to do exactly that, in a pretty obvious (now that you know that it's in fact possible) way.
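A minimal sketch of how it worked (hypothetical function name, assuming pre-C99 rules): at the call site the compiler simply assumes `int twice();`, emits the call, and moves on; the later definition is never checked against that guess.

    int main(void)
    {
        return twice(21);   /* no declaration in scope: implicitly int twice() */
    }

    int twice(int x)        /* a one-pass compiler never revisits the call site */
    {
        return x * 2;
    }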


In the C compiler I wrote a couple years ago (ImportC), since it was re-using the D semantic routines, forward referencing functions "just works".

In fact, it works so well you don't even notice it working. If the C committee wants to improve C, they should make this an official feature.

P.S. Because of the lack of forward referencing, C code tends to be organized as leaves first, and the entry point at the end. This is simply backwards, the entry point should be at the beginning.


Are there any compilers or languages that parse source files in reverse order? Then you could have a one-pass compiler without forward declarations (though you would need “backward” declarations). :)

Someone shared with me their idea for a parallelized parser, where threads would parse different segments of the source file and then stitch their incomplete ASTs together.


> Are there any compilers or languages that parse source files in reverse order?

I like the cut of your jib.


But prototype-less functions in C don't actually work all that well. In particular, the inferred return type is going to be an int, so if you're trying to return a 64-bit pointer via a prototype-less function, you're going to have a very bad time.
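A sketch of the classic failure (assuming an LP64 target and pre-C99 rules): without <stdlib.h> in scope the compiler assumes `int malloc();`, and the cast that old code used to silence the mismatch preserves the bug:

    int main(void)
    {
        char *p = (char *) malloc(100);  /* int return: upper 32 bits of the
                                            pointer are already gone */
        p[0] = 'x';                      /* may crash */
        return 0;
    }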


Historically they worked well enough (on 16-bit and 32-bit machines) because sizeof(pointer) == sizeof(int) == sizeof(general register) on the architectures where C flourished in the pre-ANSI C era.

But with the migration to 64-bit machines, typically int stayed put at 32 bits.

I guess nobody wanted to introduce a new integral type between short and int; they had enough trouble dealing with code which assumed sizeof(long) == 4. I recall stumbling across a comment where the word "beint32_t" appeared where "belong" would have made sense in context.


I love naïve search and replace errors. In the November 1996 version of the Defense Incident-Based Report System codes definitions in DoD Manual 7730.47, the code 092-C2 refers to "shallful" dereliction of duty.


Also known as a clbuttic mistake.


Why not just make short 32-bits? Yeah, you lose the type for the 16-bit wide integers but x64 doesn't natively support it all too well anyway, unlike the 32- and 64-bit wide integers. And that is what the C integer types are about, right, about being efficiently represented by the underlying hardware, not their exact bitwidth? Right?


> And that is what the C integer types are about, right, about being efficiently represented by the underlying hardware, not their exact bitwidth?

If you have one `short` argument to your function maybe. But if you have a `short[]` array, you probably do care about the memory layout of that array. You might need it to be compatible with some particular data format that you're trying to read/write. Same with a field of a struct, if that struct is used for parsing. A lot of C code does parsing like this.
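A minimal sketch of what I mean (names hypothetical): when a struct overlays a wire format, the field widths are part of the contract, which is exactly what the fixed-width types pin down:

    #include <stdint.h>
    #include <stdio.h>

    struct header {
        uint16_t type;      /* exactly 16 bits, no matter what short is */
        uint16_t length;
        uint32_t checksum;  /* exactly 32 bits */
    };

    int main(void)
    {
        printf("%zu\n", sizeof(struct header));  /* 8, barring padding surprises */
        return 0;
    }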


No. It’s for tightly packing data in data structures. Bitwidth is exactly what’s important here.


Well, that's a shame, because the bitwidth of the standard integer types is quite uncertain. CHAR_BIT can be (and is, on some platforms) 16 or 32, long was never guaranteed to be 64 bits (it quite often was 32 bits on platforms with 16-bit ints), etc., not to mention that if that is what the standard integer types were for, then they'd probably have names like int8/uint8/int16/int32/etc.

It's almost as if they were not, in fact, intended for precise control of bitwidths in a portable manner...


It doesn’t matter what someone in the past thought they were for; that’s what they are used for in practice. The names are irrelevant here (yes, they are quite bad). But the ones in stdint.h are just typedefs for those, so that’s what we are left with.


You can always use "unsigned char[N]" for that, you know, which is more reliable. You can even union it with an integer type for more convenient access, although please use static_assert to verify that the overlap is exact.

All in all, "it's very easy and straightforward to control the size and padding of a struct's fields in a portable manner" is yet another of C's imaginary advantages: it's not that straightforward or simple. The padding especially has always been a thorny issue.
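A minimal sketch of that union trick, with the static_assert included (C11 or later for the assert.h macro):

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    union word {
        unsigned char bytes[4];
        uint32_t      value;
    };
    static_assert(sizeof(union word) == sizeof(uint32_t),
                  "byte array and integer must overlap exactly");

    int main(void)
    {
        union word w = { .bytes = { 1, 0, 0, 0 } };
        printf("%u\n", w.value);  /* 1 on little-endian, 16777216 on big-endian */
        return 0;
    }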


The efficient hardware types are handled by int_fast*_t. The legacy types can't be redefined outside their established ranges because that would break things that depend on them fitting into a known amount of memory.
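A small sketch of that split: <stdint.h> distinguishes "at least this wide, but fast" from "exactly this wide":

    #include <stdint.h>

    int_fast16_t counter;  /* at least 16 bits, whatever the target handles fastest */
    int16_t      on_wire;  /* exactly 16 bits, for layout-sensitive data */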


It would still break on struct and floating point type returns though.


Well, there is an argument to be had that, for C, a plain unadorned "int" should probably be the native word size for that architecture. On 64-bit systems, "int" would therefore also be 64 bits.


I think there are two reasons int has not gone from 32 to 64 bits on 64-bit systems.

Part of it is backward compatibility: code written to assume 32-bit int could break. (Arguably such code is badly written, but breaking it would still be inconvenient.)

Another part is that C has a limited number of predefined integer types: char, short, int, long, long long (plus unsigned variants and signed char). If char is 8 bits and int is 64 bits, then you can't have both a 16-bit and a 32-bit integer type. Extended integer types (introduced in C99) could address this, but I don't know of any compilers that provide them.


If you can have "long long", why not "short short"?

In that alternate universe, char could be 8 bits, short short 16, short 32, and int 64.


And “long short” and “short long” types. :)


For 24 bits?


Since the extended integer types are just aliases for the other types, they wouldn't solve the problem. Also, in C++ these aliases create a problem with overload sets when you mix the two worlds and try to produce portable code. For example, long is 32 bits on some platforms and 64 bits on others, and platforms use inconsistent fundamental types for the 32- and 64-bit aliases. All in all, if you want to produce portable code, you use neither those extended integer types nor long. You assume char, short, int, long long are 8, 16, 32, 64 bits respectively, which holds on all relevant platforms.
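A sketch of making that assumption explicit, so the build fails loudly on any platform where it doesn't hold:

    #include <assert.h>
    #include <limits.h>

    static_assert(CHAR_BIT == 8,          "char is assumed to be 8 bits");
    static_assert(sizeof(short) == 2,     "short is assumed to be 16 bits");
    static_assert(sizeof(int) == 4,       "int is assumed to be 32 bits");
    static_assert(sizeof(long long) == 8, "long long is assumed to be 64 bits");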


Extended integer types are decidedly not just aliases of other types: the C standard distinguishes the "standard integer types", which are the regular char/short/int/long/long long, from "extended integer types", which are any additional implementation-specific types. The stdint.h-defined types can be in either category (and on regular clang/gcc they're all standard integer types, not extended ones). So you could have a system with char/short/int/long/long long being 8/16/64/64/64-bit respectively and still provide an int32_t that's none of those; it's just that no one does.


What really sucks about this in C++ is that it prevents you from knowing whether you can overload based on those types.


> Extended integer types (introduced in C99) could address this, but I don't know of any compilers that provide them.

What environment are you working in? Because I don't know a single half-recent compiler that does not provide stdint (uint8_t, ..., int64_t), but I mostly work with GCC/LLVM toolchains.


Some embedded compilers will provide stdint. And if the compiler doesn't, I've found that one of the first headers written for a project ends up being an equivalent to stdint.

It's pretty common to develop part of an embedded C program under Linux or similar host environment. Better debuggers, better profiling tools, etc. And uint8_t and friends are particularly important when you're working cross-platform.


Extended integer types aren't necessarily related to stdint.h - in the vast majority if not every one of those "half-decent compilers" the stdint.h types are just typedefs over plain old char/short/int/long/long long, which are not extended integer types. Extended integer types is a mechanism to allow having types other than those.


> in the vast majority if not every one of those "half-decent compilers" the stdint.h types are just typedefs over plain old char/short/int/long/long long, which are not extended integer types.

Sure, but isn't that just an implementation detail? Because I really don't care if my int64_t is internally typedef'd to "long long int" or "__m64", as long as there is a standardized interface to ask for it.


Point being, _kst_'s comment of "I don't know of any compilers that provide them" is correct: there are few if any compilers that actually have extended integer types, and thus introducing them might be non-trivial, and plenty of code may exist under the assumption that they don't exist and thus could break (things like integer promotion rules, _Generic, varargs; intmax_t is also worth mentioning, as it must be at least as wide as any supported standard or extended integer type, which is also why clang/gcc's __int128 doesn't qualify as an integer type as per the standard).


Yeah, except that ints in your data structures will unnecessarily consume far, far too much memory.


If 32-bit ints didn't make your structures consume far, far too much memory in the early '90s, when consumer-grade computers came with 4-8MiB of RAM and 256MiB disks, then I don't see how 64-bit ints could have done so in the mid-'00s when they came with 1GiB of RAM and 256GiB disks.


They still come with 64k (or so) of L1 cache.


I am constantly amazed at how much memory programs uselessly consume.


OTOH, the explicitly sized integers (int32_t, int16_t, etc.) have been available since C99 (i.e. 25 years ago), and should be used when a specific memory layout of structs is desired.


If you're going to use those, then there's no reason to have ints be 64 bit.

Personally, I find using int32_t in general to be an uglification of the code. I never use `long` in C code anymore, as it's sometimes 32 bits and sometimes 64 bits. I use `int` and `long long`.

Do I care about 16 bit code anymore? No. Very few programs would port between 16/32 these days anyway, no matter what the Standard says or how hard you try to write code portably.


Always thought it was odd how "long long" is one of the only common types that's two words with an implicit int at the end.

Almost makes me want to add "typedef long long longer;" to some code that I don't intend anyone to maintain.


Is it the numbers you do not like? If you use “long long”, it must not be the length.

I don’t love long long. As an amateur compiler writer, it hurts me. “long long” makes “long” both a modifier ( like unsigned is ) and a type. Yuck.

I wish it was i8, i16, i32, and i64 ( with u versions of each ). f32 and f64 for floats. Those are easy to understand and fairly easy on the eyes.

If those numbers are too noisy, the CIL ( .NET ) types could work. For example, i4 and i8 instead of i32 and i64. I do not love the look of i1 either though. I guess you could special-case sbyte and byte as aliases.


> Is it the numbers you do not like?

Correct.

> fairly easy on the eyes

Not for me. It's a personal thing, I just don't like it. When I removed them all from my code, it was like I'd scraped the barnacles off my boat.


"long" is always a modifier, just potentially applied twice, and potentially to nothing. A more written out version of "long long" is "long long int", and similarly "long" is really "long int".


Google’s C++ style guide similarly recommends using ‘int’ in general and never using ‘int32_t’, though it recommends using ‘int64_t’ for bigger numbers instead of ‘long’:

https://google.github.io/styleguide/cppguide.html#Integer_Ty...


I essentially do not use int short, long, long long at all. Frankly I think those were a terrible mistake and people should avoid using them.


Anybody who uses "int short" should be keel-hauled. That nonsense did not make its way into D!


The lack of memory safe casts drives me a bit batty.

You would think casting something as signed or unsigned wouldn't promote to an int / unsigned int. Ditto for const and unconst.


When the first C Standard (C89) was being created, about half the compilers implemented "sign preserving" semantics, which is what you're advocating, and the other half implemented "value preserving" semantics.

A great battle ensued, and many champions were slain.

The value preserving folks carried the field, and the sign preserving folks changed their compilers.


I was always a little surprised C never had an integer type the size of a native word. Probably the closest thing would be intptr_t since pointers presumably use a single word to represent addressable memory.


AFAIK until the switch to 64-bit architectures, int actually was the natural word size. Keeping int at 32-bits was probably done to simplify porting code to 64-bits (since all structs with ints in them would change their memory layout - but that's what the fixed-width integer types are for anyway, e.g. int32_t).

In hindsight it would probably have been better to bite the bullet and make int 64 bits wide.


> AFAIK until the switch to 64-bit architectures, int actually was the natural word size

32-bit int is still arguably the native word size on x64. 32-bit is the fastest integer type there. 64-bit at the very least often consumes an extra prefix byte in the instruction. And that prefix is literally called an "extension" prefix... very much the opposite of native!


Aren't 32-bit registers/operations also called "extensions" of their 16-bit counterparts on the x86 line, due to the ISA's 16-bit 8086/80286 lineage?

So could one make the argument that a 16-bit int ought to be the native word size on x64?


No. This isn't about dictionary pedantry. 16-bit is actually frequently more expensive than 32-bit on x86.


32-bit also often has the prefix byte (if one of the operands is r8-r15 or, for extending moves from 8-bit registers, r4-r15)


int being originally native word size is the reason for weird integer promotion rules.
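A small example of those promotion rules in action: operands narrower than int are promoted to int before any arithmetic happens:

    #include <stdio.h>

    int main(void)
    {
        unsigned char a = 200, b = 100;
        int r = a + b;       /* both promoted to int, so r is 300, no wraparound */
        printf("%d\n", r);
        return 0;
    }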


IMHO it's only weird because the promotion is to 32-bit integers on 64-bit platforms. If all math would happen on the natural word size (e.g. 64-bits), and specific integer widths would only matter for memory loads/stores, it would be fine.


int is the smallest type you can do ALU ops on, so as long as x64 can still do 32-bit arithmetic, it's "natural" for it to remain 32 bits.


> pointers presumably use a single word to represent addressable memory.

Only in flat address spaces, which excludes platforms like old 16-bit x86 or modern CHERI. There, "pointer difference within a single object" need not be the same size as a pointer itself.


It did on the PDP-11


size_t and ptrdiff_t work for me.


I would argue that the native word size is still 32 bits on x86-64, though. With many instructions, using 64-bit registers needs a REX prefix. Some RISC architectures do not even have 32-bit zero-extending integer instructions, so for them, 64-bit as the native word size makes sense. On the other hand, I'm not sure if <stdint.h> and uint16_t had been invented at the time, and “short short int“ is not valid syntax (even today), so there wasn't an obvious way to denote a 16-bit integer type.


If the programmer didn’t specify the size of an int, it should mean “dealer’s choice.” Let the compiler pick a default, better yet make it a compiler option.

Fortran is, as always, ahead of the game.


It is a compiler setting.

`-Dint=__INT64_TYPE__`


“as always”?


I think you accidentally put a question mark where you meant to put an exclamation point.


Yes, they don't; it would be possible for a compiler to gather info about such call sites into a list and then, when it's finished compiling a compilation unit, to check this list against the now-complete symbol table and fail if some called functions have mismatched definitions or are still undefined... that apparently was too much work back when C was designed, so we have a "hope for the best" design instead.


This doesn't work in C because the function is not necessarily defined in the same translation unit.

You'd need to do this at link time instead, which would require completely overhauling the format of object files, dynamic libraries, static libraries so they carry information about the types of functions instead of just the symbol names. It's not an easy fix.


> This doesn't work in C because the function is not necessarily defined in the same translation unit.

It does work in C ― that's what the include files are for (among other things), after all. So it would be possible to require forward declarations only for functions outside the translation unit (extern-declare?); those inside the translation unit wouldn't need them even if the compiler works in a single pass. And those external declarations could still be introduced at the very end of the translation unit and would still count. I dunno, seems like a pretty reasonable idea.


That's what DWARF is for, and similar formats on other platforms. Also Pascal-lineage languages commonly use this kind of embedded type information to provide module interfaces, it works quite well.


There is no information about the function's prototype in the object file, so the linker can't determine that. The compiler can't either since it's just compiling the compilation unit and has no knowledge of the outside world except what we tell it by prototypes (injected via header files or otherwise), but we aren't doing that here.


> There is no information about the function's prototype in the object file, so the linker can't determine that.

That that information isn’t there is an implementation choice.

For example, they could have hacked it in like C++ did by mangling names (https://en.wikipedia.org/wiki/Name_mangling). That probably would have required supporting longer identifiers (IIRC archives limited them to 14 characters), but that’s doable.


I think that would by definition be a two-pass compiler.


No, you emit all the code in one go, then after you've done that you have some residual pieces of data left on the side: the symbol table, and the list of all the calls of (hopefully)-forward-declared functions. At this point you could run the check on that list against the symbol table, no additional codegen or re-reading of the source text needed.

Granted, you can call that a second pass although that's not that different from emitting a function's epilogue IMHO.


There is no need for the second phase. You can record the implicit declaration when you encounter it, and if there is a subsequent declaration, you can check that it's consistent or error out immediately.

This is what C compilers already do, in fact, to produce warnings when an implicit declaration doesn't match a later explicit declaration. But this is a best-effort warning only; it doesn't work if there is no declaration because the function is defined in a different translation unit, as I pointed out above.
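A minimal runnable sketch (all names hypothetical) of that bookkeeping: remember the arity seen at an implicit call site and check it against any later declaration in the same translation unit:

    #include <stdio.h>
    #include <string.h>

    struct sym { char name[32]; int nargs; };  /* nargs == -1: nothing recorded yet */
    static struct sym table[64];
    static int nsyms;

    static struct sym *intern(const char *name) {
        for (int i = 0; i < nsyms; i++)
            if (strcmp(table[i].name, name) == 0)
                return &table[i];
        struct sym *s = &table[nsyms++];
        strcpy(s->name, name);
        s->nargs = -1;
        return s;
    }

    /* called when the parser sees a call to an undeclared function */
    static void record_implicit_call(const char *name, int nargs) {
        struct sym *s = intern(name);
        if (s->nargs == -1)
            s->nargs = nargs;
        else if (s->nargs != nargs)
            printf("error: %s called with %d args, earlier %d\n",
                   name, nargs, s->nargs);
    }

    /* called when a later declaration or definition is seen */
    static void record_declaration(const char *name, int nargs) {
        struct sym *s = intern(name);
        if (s->nargs != -1 && s->nargs != nargs)
            printf("error: %s declared with %d args, called with %d\n",
                   name, nargs, s->nargs);
        s->nargs = nargs;
    }

    int main(void) {
        record_implicit_call("frobnicate", 2);  /* use before declaration */
        record_declaration("frobnicate", 3);    /* mismatch caught here */
        return 0;
    }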


Usually, the second pass in a compiler does not re-parse source files. Rather, it operates on another data structure, like an AST, intermediate representation, or the list mentioned in the original comment. At least, that’s my understanding of multi-pass compilation.


Well, this "list mentioned in the original comment" is not an AST or an immediate representation of the program in any reasonable sense just as a symbol table is not. Otherwise, setting the exact values of the text/data/bss size fields in the executable file header's at the end of the compilation would count as the second pass as well which IMO it should not.


The difference is that you need to make a complete second iteration (or pass) over the entire list to correctly check all of the callsites after all function type information is collected. The same is not true for symbol table usage in a single-pass C compiler.


Oh this makes me feel old. When I first learned C declaring functions was very much an optional feature. A good idea, but optional.


Why surprisingly? In pre-ANSI C it is only possible because there was no type safety. In other languages it could work by deriving the type from the call, but this does not work in C.


Well it depends on the calling convention. They're right in the sense that this can't be done for arbitrary calling conventions.


>The reason for that is that C does not offer a generic function pointer type, and standard C disallows casts between function pointer types and object pointer types.

I've wondered why this hasn't been relaxed a bit when void* is used. Say you have these functions (modified from the link)

  /* alternative 1: typed parameters, the version I'd like to see blessed */
  int compare (const char *a, const char *b) {
          return strcmp(a, b);
  }
  
  /* alternative 2: the status-quo void * version */
  int compare (const void *a1, const void *b1) {
          const char *a = a1;
          const char *b = b1;
          return strcmp (a, b);
  }
And then you have some FP of type int(*compare)(const void *a1, const void *b1) somewhere...

If you have correct pointer const-ness, why isn't the first method the PREFERRED way of doing this? It's shorter, clearer, and safer (calling compare directly, not through a FP, still gets you proper type checking; you wouldn't call either of these example functions directly, but for others you might), and IDE suggestions can better explain what the function is. I've thought several times about suggesting to compiler writers/the C committee to bless the first method. Is there some obscure hardware on which the first would be compiled incorrectly (i.e. the calling convention for void* and foo* is different)?


C11:

    A pointer to void shall have the same representation and alignment requirements as a pointer to a character type. [48]
    <blah blah pointers to qualified vs unqualified, structs, and unions also have the same representation between themselves>.
    Pointers to other types need not have the same representation or alignment requirements.
    
    [48]: The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.
The thing I've heard for systems where e.g. an int* and char* aren't equal is where the "int*" is the "primary" pointer type, and a char* has to add back the would-be-leading-bits or something. Though as per the above quote, char* vs void* would be safe? Doesn't help anything other than char* though.


There used to be hardware that supported pointers to individual bits or complete words, rather than the eight-bit bytes we see today.

https://en.wikipedia.org/wiki/Word_addressing


I am not sure I understand what you're saying (despite having read your message several times), but pointers aren't all the same size. Not always, at least. (void* is supposedly big enough to contain any type of pointer, though I think I remember reading that it's not necessarily true for function pointers.)

And if the size of "void" and "char" aren't the same you cannot push two void* on the stack and pop two char*.

But, like I said: maybe I didn't understand what you said.


I was wondering if pointer size is still a problem. Hopefully this explanation is clearer:

  int compare_char (const char *a, const char *b) {
          return strcmp(a, b);
  }
  
  int compare_void (const void *a1, const void *b1) {
          const char *a = a1;
          const char *b = b1;
          return strcmp (a, b);
  }
  
  int (*call_void)(const void *a1, const void *b1) = &compare_char; //not valid in current C standard
Why can't the standard be changed so that it is valid to set call_void to &compare_char, so we don't need to write the longer compare_void? Are there architectures still in active use where not all pointers are the same size in plain C (not worrying about C++) that would disallow this?

AFAIK, x86, ARM, MIPS, SPARC, Alpha, and SuperH would all work fine calling compare_char through call_void. There could be some other issues, like near/far on 16-bit x86, but that would be orthogonal to implicit function pointer void* casting, and could still occur if call_void were set to compare_void. Do any of the more obscure embedded CPUs still in use have varying pointer sizes?


Tbf, all of those changes should not be surprising to any C programmer for at least the last two decades or so. I'm actually surprised that GCC was so lenient for so long.


It wouldn't have been surprising to make this the default in the 90's even. The things here are pretty much never done intentionally post-standardisation. And it's easy to force these back to being just warnings to make old builds work if needed.


The only really annoying thing is the second strict aliasing example, which requires the temporary variable. It just seems really inelegant. But it has always been a pitfall, since using an explicit cast removes the warning (now an error) but still breaks strict aliasing.

I think the most pragmatic solution is to just compile everything with no strict aliasing. You still get errors for accidental incompatible assignments but won't get bitten by the optimizer.
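A sketch of the pitfall (not the article's exact example, and assuming sizeof(float) == sizeof(unsigned)): the cast silences the incompatible-pointer diagnostic, but the access still violates strict aliasing, whereas memcpy is well defined:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        float f = 1.0f;

        unsigned u_bad = *(unsigned *) &f;  /* compiles silently, but undefined
                                               behaviour under strict aliasing */
        unsigned u_ok;
        memcpy(&u_ok, &f, sizeof u_ok);     /* well-defined way to reinterpret
                                               the bytes */

        printf("%08x %08x\n", u_bad, u_ok);
        return 0;
    }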


I like how delightfully stable GCC and C are, when things from C99 are becoming mandatory only now, a quarter of a century later.


And still not even mandatory if you use -fpermissive or -std=c89.

I don't envy how much tech debt they have to deal with, while still trying to make it possible for new code to have a more modern experience. For all the excitement around new languages, we're still going to have a lot of C code for a long time, and difficult work like this should be appreciated.


I can't help but wonder how many bugs could have been detected and fixed a quarter of a century earlier if those lints had been adopted more diligently by the C ecosystem.


My perception is that more and better warnings, and common use of -Wall, came into vogue around the time of GCC 4, in the mid 2000s. It would have been pretty abrupt to immediately enforce C99 in 1999-2000.


Yet in 2022 this change from warn by default to error by default was consequential enough for Fedora:

>> So what is the current status? How many packages are going to be affected by this change? How are we going to track the progress?

> I see an unaudited rebuild failure rate of about 10% for rebuilds of source packages that produce arch-full binary packages. This number does not include packages which fail to build in rawhide without the compiler change. It includes packages which configure checks for something that we really do not support (like setproctitle or strlcpy). After the first pass completes, I'll have to do a second pass with the expected-to-be-undeclared functions gathered from the first pass filtered out. That should give us better numbers.


My impression is that a lot of the fallout was related to checks in configure scripts and equivalent. That kind of code was traditionally written a lot more sloppily and was much less likely to have the "standard" set of warnings or the make-warnings-into-errors flag enabled. So, yes, quite a lot of packages failed to build, but the offending code was typically not code that was run when somebody was actually using the program and not code where bugs were consequential. And of course Fedora's a big collection of software with a pretty long tail -- 10% of packages likely works out to a lot less than 10% of user-executing-a-program hours.


I would guess that the main cause of failures is autoconf, which has a habit of relying on extremely dubious C code to check for feature availability. On top of that, the compiler runs are done in a way to guarantee that no one will see any warnings, even errors can tend to be silently ignored (!), and a particular version is often baked into a source package, so upgrading autoconf to a more sane version tends to take eons.


If this was intended to be responsive to my comment, I'm missing the connection.


The other day I tried to compile a really old program, the Power Crust and its predecessor. It just worked, for some reason I was expecting it not to. I didn't even have the filetype it was supposed to accept, but it was fine, it was just a list of points. One per line.

Sanity is such a rare thing these days.


> I like how delightfully stable GCC and C is when things from C99 is mandatory only now, a quarter of a century later.

This is how stable it is - they were still supporting C constructs from prior to C89 standardisation (which is when, I believe, void pointers were introduced).

"maybe consider using void * in more places (particularly for old programs that predate the introduction of void * into the C language). "


Didn't some parts of C99 get rolled back in C11 and later versions?


Variable Length Arrays were mandatory in C99, optional in C11, and are mandatory again in C23.

I'm curious how these dynamics play out in less popular compilers. If a compiler implemented VLAs in C99, they almost certainly still have that feature for backwards compatibility even after they support C11.

Is there any compiler which appeared on the scene between C11 and C23, and during that window, chose not to support VLAs and thus C99? It's not like C11 itself was very widely adopted, precisely because of the long implementation and industry rollout windows.


Uh, are they mandatory again?

    For these reasons, we propose to make variably-modified types
    mandatory in C23. VLAs with automatic storage duration remain an optional language
    feature due to their higher implementation overhead and security concerns on some
    implementations (i.e. when allocated on the stack and not using stack probing).
from N2778 (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2778.pdf)


Thanks for the link. As I understood it, it's just the support for the syntax that is mandatory.

> Variable length arrays with automatic storage duration are a conditional feature that implementations need not support


The syntax and also the semantics. For instance, you can take the sizeof() a variably-modified type, or the offsetof() one of its fields, and the compiler has to do all the layout calculations implied by the type declaration at runtime. These features are partially what motivated the mandatory support. The only part that is still optional is using such types by value as stack variables (i.e., variables with automatic storage duration).
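A small sketch of the mandatory part: sizeof of a variably-modified type is evaluated at run time, and no automatic VLA object is ever created:

    #include <stdio.h>

    void show(int n, int m)
    {
        printf("%zu\n", sizeof(int[n][m]));  /* computed from n and m at run time */
        int (*p)[m] = 0;                     /* pointer to VLA: always supported in C23 */
        printf("%zu\n", sizeof(*p));         /* m * sizeof(int) */
    }

    int main(void)
    {
        show(3, 4);  /* prints 48 then 16 where int is 4 bytes */
        return 0;
    }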


How do you create a variable array on the heap?

confused


With `malloc`, and converting the pointer to the correct type.

    void function(int n) {
      int (*arr)[n] = malloc(sizeof(*arr));  /* sizeof(*arr) is n * sizeof(int),
                                                computed at run time */
      free(arr);
    }


wow, that's wild


That is exactly how you create any other type of object on the heap.


Snarky, or do you just know a lot? It changes quite a bit how the compiler works. It has to know to make sure malloc gets the array element size argument multiplied at run time by n. To a mere user it broke my mental shorthand of how a C compiler works.


> It has to know to make sure malloc gets the array element size argument multiplied at run time by n

Um, C compilers already do that with arrays with compile-time lengths.

    #include <stdio.h>

    int main(void) {
        char x[20][30];
        printf("%zu\n%zu\n", sizeof(x), sizeof(x[0]));
    }
prints

    600
    30
so you can have "char *y = malloc(sizeof(x)); memcpy(y, x, sizeof(x));" and it must work, since C89 at least. The main problem with VLAs is that they make the exact stack frame size unknown until runtime, which complicates function prologues/epilogues, but that's a problem in the codegen part of the backend; the semantics machinery is mostly in place already.

P.S. And yes, uecker is a member of the ISO C WG14 and GCC contributor, according to his profile.


I think there is a real difference: in the static case, the compiler can just recurse into the type definition at any point, compute fully-baked sizes and offsets, and cache them for the rest of the compilation. But in the dynamic case, you end up with the novel dataflow of types that depend on runtime values, and more machinery is necessary to track those dependencies.

Of course, this runtime tracking has always been necessary for C99 VLA support, but I can easily see how it would be surprising for someone not deeply familiar with VLAs, especially given how the naive mental model of "T array[n]; is just syntax sugar for T *array = alloca(n * sizeof(T));" is sufficient for almost all of their uses in existing code. (In any case, it's obviously not "creating an object on the heap" that's the unusual part here!)


> more machinery is necessary to track those dependencies.

Well, is it much more machinery? IIRC doing

    void f(size_t n) {
        int x[n];
        n += 1;
        ...
does not resize x, so there is no dataflow dependency; or rather, x depends on a hidden const copy of n, so no additional dataflow analysis is necessary.


Yet

  void f(size_t n, int cond) {
      if (cond) { n += 1; }
      int x[n];
      ...
does resize x depending on the value of cond, so the size can't necessarily be known until the point where the type (int[n] in this case) is named.

Also, the compiler has to make sure it keeps around implicit locals to store the variable layouts, so that code like

  void f(size_t n) {
      typedef int array[n];
      n += 1;
      array x;
      ...
functions as specified. This kind of pattern is one of the bigger things setting the feature apart from just "syntax sugar for alloca()".


It only works at the same level; the moment they get passed as arguments, they decay into pointers even if using [].
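A small sketch of that decay: an array parameter is adjusted to a pointer, so the size information is not carried along:

    #include <stdio.h>

    void f(int n, int a[n])             /* adjusted to: void f(int n, int *a) */
    {
        printf("%zu\n", sizeof(a));     /* sizeof(int *), not n * sizeof(int) */
    }

    int main(void)
    {
        int x[8] = {0};
        printf("%zu\n", sizeof(x));     /* 32 where int is 4 bytes: same level */
        f(8, x);                        /* typically prints 8 on LP64 */
        return 0;
    }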


I'm not tuned in to the nuances here but I note that whoever felt they understood it well enough to update Wikipedia summarized this as "The C23 standard makes VLA types mandatory again. Only creation of VLA objects with automatic storage duration is optional."

https://en.wikipedia.org/wiki/Variable-length_array#C99

If that phrasing isn't accurate I'm sure they'd appreciate an edit.


VLAs were optional in C11 and compilers that supported them in C99 also supported them in C11. The only important compiler not supporting VLAs is MSVC, but this compiler also did not support other features from C99. People using MSVC were stuck using an obsolete version of C for a long time. Recently this changed and MSVC started to support C11 and C17 - skipping C99.


Portable code couldn't use VLAs anyway, it was never supported in MSVC.


Portable C code could not use MSVC until recently, because MSVC was stuck with C89 because MS wanted everybody to switch to C++.


Among others.


VLA support is mandatory in C23? I'd like to know the rationale behind this decision. Can you provide any references? Thanks.


If I understand right, this is mandatory:

  void example(int n) { printf("%zu", sizeof(int[n])); }
  void example2(int n, int (*a)[n]) { }
This is optional:

  void example3(int n) { int x[n]; }


  > The corrected standard C source code might look like this (still disregarding error handling and short writes):
  >
  >  void
  >  write_string (int fd, const char *s)
  >  {
  >    write (1, s, strlen (s));
  >  }
And disregarding the passed file descriptor! :)


Hah! Thanks, fixed.


I really am shocked that GCC took K&R, pre-ANSI C for this long. For Pete’s sake, I was ANSI-fying Aztec C libraries in the late 1980’s!


Maybe because they got bundled into distros early on, as a critical dependency. For a long time as the internet ramped up, portability meant you could pretty quickly edit a few header files or the Makefile to make a program compile for your particular hardware/OS. But then, as the perl build scripts showed and autoconf spread, it became expected that code would download and compile with no changes. In the context of a distro, maybe that put pressure on providing quick workarounds to keep old code compiling with minimal changes to the actual source?


K&R still exists in lots of the GNU userland apps that see little to no updates.


GCC never cared about Pete.


Implicit int becoming an error is going to break certain IOCCC entries, including some more recent entries that might have used it due to size constraints, for example:

https://www.ioccc.org/years.html#2019_burton

https://www.ioccc.org/years.html#2019_duble


You can still compile these by downgrading the errors to warnings. They are just errors-by-default now.
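A hypothetical minimal case, and the escape hatches: this was accepted (with a warning) for decades, and is an error by default in GCC 14; per the porting guide it builds again with -fpermissive, -Wno-error=implicit-int, or an older -std such as gnu89:

    main()        /* implicit return type int: error by default in GCC 14 */
    {
        return 0;
    }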


Not hostile toward it, but -- if the IOCCC ceased tomorrow, the world would be neither better nor worse off for the loss (in the grand scheme).


I think it best we keep the folks who like to write that sort of C code busy with the IOCCC rather than out coding in the real world!

/s


It's really sad how the C standards committee is pushing this anti-c89 thing because it breaks one of the most important use cases for C which is writing programs that work under both GCC and JavaScript. See https://justine.lol/sectorlisp2/#evaluation and https://justine.lol/sectorlisp2/#polyglots I've yet to see any explanation of what we stand to gain, by disallowing implicit int. The whole thing just comes across as religious in nature.


In C++ this is simply legal code. But in C, the bogus warning has now escalated to a bogus error.

   void fun110 (char const * const *a) {}

   void caller (void) {
       char **a = 0;
       fun110(a);    /* accepted in C++; a constraint violation in C,
                        now an error by default */
   }


I agree with making most of these warnings errors by default; actually, they are the same warnings that I had previously thought should be errors by default (the only exception is incompatible-pointer-types, although that makes sense as an error too, so it is OK anyway). I think this is good



