C2x: the next real revision of the C standard (gustedt.wordpress.com)
175 points by ingve on Nov 12, 2018 | 184 comments



They should add something to run code when leaving a scope, no matter how you leave the scope (break, return, ...).

That would allow cleanup and other such actions without resorting to such hacks as: "goto cleanup" pattern, separate functions, fake for loop to break out of, ...
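For readers who haven't run into it, the "goto cleanup" pattern looks roughly like the following. This is a minimal sketch: `copy_file` and its return convention are hypothetical, chosen only to show the single exit label that a scope-exit feature would replace.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical example of the "goto cleanup" idiom: every failure path
 * jumps to one label that releases whatever was acquired so far.
 * Returns 0 on success, -1 on failure. */
int copy_file(const char *src_path, const char *dst_path)
{
    int rc = -1;
    FILE *in = NULL, *out = NULL;
    char *buf = NULL;
    size_t n;

    in = fopen(src_path, "rb");
    if (!in)
        goto cleanup;
    out = fopen(dst_path, "wb");
    if (!out)
        goto cleanup;
    buf = malloc(4096);
    if (!buf)
        goto cleanup;

    while ((n = fread(buf, 1, 4096, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            goto cleanup;
    }
    if (ferror(in))
        goto cleanup;
    rc = 0;

cleanup:
    /* Release in reverse order of acquisition; free(NULL) is a no-op. */
    free(buf);
    if (out && fclose(out) != 0)
        rc = -1;
    if (in)
        fclose(in);
    return rc;
}
```

Every early exit has to remember to jump to the label instead of returning, which is exactly the bookkeeping a scope-exit construct would take over.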


You can do this with GCC/Clang via the cleanup attribute: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attribute...

It would be nice to have this be a part of the standard.


It's very useful. I use it for scoping the holding of locks. It replaces the idiom where you make a separate function to hold the 'when locked' code and pass all required state back and forth with nice clean inline blocks of code.
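A minimal sketch of that lock-scoping idiom, assuming GCC or Clang (the cleanup attribute is not standard C) and POSIX threads. `SCOPED_LOCK`, `add_to_counter` and the `lock_held` flag are hypothetical names; the flag exists only so the example can show that the lock is released on every return path.

```c
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;
static int lock_held;   /* bookkeeping for this example only: 1 while locked */

/* Cleanup handler: GCC/Clang pass a pointer to the guard variable. */
static void unlock_guard(pthread_mutex_t **m)
{
    lock_held = 0;
    pthread_mutex_unlock(*m);
}

static pthread_mutex_t *lock_guard(pthread_mutex_t *m)
{
    pthread_mutex_lock(m);
    lock_held = 1;
    return m;
}

/* Lock now; unlock automatically when the guard leaves scope. */
#define SCOPED_LOCK(m) \
    pthread_mutex_t *scoped_guard_ __attribute__((cleanup(unlock_guard))) = lock_guard(m)

/* Adds delta unless the counter would go negative. Both returns release
 * the lock; no separate "locked" helper function is needed.
 * Returns the new value, or -1 if the update was refused. */
long add_to_counter(long delta)
{
    SCOPED_LOCK(&counter_lock);
    if (counter + delta < 0)
        return -1;
    counter += delta;
    return counter;   /* return value is computed before the unlock runs */
}
```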


You can do it with GNU C. It would be nice if it was standard.


You can do it in Bjarne Stroustrup C, which is standard.

You can do it in the 20-year-old version of it, and draft versions many years before that.


C++ has about as much to do with C as Java with JavaScript these days, though. It's just not an option in many cases where C is fine. I've gone back and forth the last 20 years or so, and have plenty of experience with both languages. Even the experts are beginning to lose faith; it's just getting too damn complex. I'll take variants as an example: the hoops people are willing to jump through to get something as basic as unions working properly is just insane. It looks more like religion than programming from here; Rust has the same cult-like behaviors going on all over the place.

Edit: Ah, I see; you're more worried about them turning C into the next C++ :) I might even agree some days.


Please, C and C++ share a large subset, and, most importantly a memory model. Java and JS share, what, a couple of parentheses?


Sure, the interop is unmatched.

What they don't share is mindset; C is all about keeping it simple, I don't even know what C++ is about any more, but it sure isn't simple. Many fundamental tricks that are routinely pulled in C require jumping through plenty of hoops to keep a modern C++ compiler happy.

Speaking of interop, calling into C from other languages is very different from calling into C++. Yes, you can write C wrappers, but it's rarely done. And the further the language drifts into template lala land, the more tricky it gets.


> Many fundamental tricks that are routinely pulled in C require jumping through plenty of hoops to keep a modern C++ compiler happy.

Not in my experience. Currently my main open source project consists of some 70,000 lines of C which also compiles as C++. Maintaining C++ compatibility takes very little effort. I tend to switch to C++ before a release, to catch regressions. Often there is nothing. The most I spend is maybe five to ten minutes fixing some minor things. Some of those minor things have nothing to do with C versus C++, like signed/unsigned warnings.

Note that these signed/unsigned warnings from GNU C++ actually found a real problem: I had code which assumed WEOF was negative. (It was originally char based code that was switched to wchar_t; of course EOF constant in the narrow character world is negative.) The GNU C++ compiler also identified the situation in the same code that a variable that should have been wint_t was mistakenly int.
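The bug class described above is easy to reproduce. A minimal sketch (the function name `count_wide_chars` is hypothetical): the result of `fgetwc` belongs in a `wint_t` and must be compared against `WEOF` for equality. `WEOF` need not be negative; on glibc, for instance, `wint_t` is an unsigned type and `WEOF` is `0xFFFFFFFFu`, so a `c < 0` test carried over from `char`-based `EOF` code never fires.

```c
#include <stdio.h>
#include <wchar.h>

/* Counts wide characters in a stream. `c` is a wint_t (not int, not
 * wchar_t) and is compared with WEOF using !=, never with < 0. */
long count_wide_chars(FILE *f)
{
    long n = 0;
    wint_t c;

    while ((c = fgetwc(f)) != WEOF)
        n++;
    return n;
}
```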


I'll repeat this here. Automatic void casts, flexible struct fields and calculating offsets to find the containing struct given a field are three things that I use all the time in C. Last time I checked, C++ wasn't very impressed by any of them.


The classic C90 version of the flexible array member (called the "struct hack") works fine in C++. So does the standard offsetof macro as well as containerof.

The classic struct hack looks like:

  struct header {
    size_t size;
    char data[1];  // not [0]
  };
The size of the structure up to just before data is offsetof(struct header, data).
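For comparison, C99 and later spell the struct hack as a flexible array member. A minimal sketch (`header_new` is a hypothetical constructor); note that flexible array members are not part of standard C++, which is presumably why code meant to compile as both sticks with `data[1]`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* C99 flexible array member: the standard form of the struct hack. */
struct header {
    size_t size;
    char data[];    /* must be the last member */
};

/* Allocates a header followed by `size` payload bytes copied from src.
 * offsetof() gives the size up to `data`, which can be smaller than
 * sizeof(struct header) because of trailing padding. */
struct header *header_new(const void *src, size_t size)
{
    size_t bytes = offsetof(struct header, data) + size;
    struct header *h;

    if (bytes < sizeof(struct header))
        bytes = sizeof(struct header);  /* never allocate less than the struct */
    h = malloc(bytes);
    if (!h)
        return NULL;
    h->size = size;
    memcpy(h->data, src, size);
    return h;
}
```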

Automatic void casts are a misfeature in both languages; more so in C. In code that I control, I use a pointer to a character type as an "any memory" type, so that it requires a cast in both directions:

E.g.

  typedef unsigned char mem_t;
A custom allocator or allocator wrapper will return a pointer to mem_t.

I use macros for casts, which compile to the classic C cast in C, and the more restricted casts in C++:

  #ifdef __cplusplus
  #define strip_qual(TYPE, EXPR) (const_cast<TYPE>(EXPR))
  #define convert(TYPE, EXPR) (static_cast<TYPE>(EXPR))
  #define coerce(TYPE, EXPR) (reinterpret_cast<TYPE>(EXPR))
  #else
  #define strip_qual(TYPE, EXPR) ((TYPE) (EXPR))
  #define convert(TYPE, EXPR) ((TYPE) (EXPR))
  #define coerce(TYPE, EXPR) ((TYPE) (EXPR))
  #endif
The GNU C++ compiler has a cool feature: -Wold-style-cast. With this I can pinpoint uses of the casting notation, and then replace them with these macros.
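Putting the two ideas together, a caller might look like the sketch below. `mem_alloc` and `point_new` are hypothetical; only `mem_t` and the `convert`/`coerce` macros come from the comment above. Because the allocator returns `mem_t *` rather than `void *`, the cast to the target type is required, and visible, in C as well as C++.

```c
#include <stdlib.h>

typedef unsigned char mem_t;   /* "any memory" type */

#ifdef __cplusplus
#define convert(TYPE, EXPR) (static_cast<TYPE>(EXPR))
#define coerce(TYPE, EXPR) (reinterpret_cast<TYPE>(EXPR))
#else
#define convert(TYPE, EXPR) ((TYPE) (EXPR))
#define coerce(TYPE, EXPR) ((TYPE) (EXPR))
#endif

/* Hypothetical allocator wrapper returning mem_t * instead of void *. */
mem_t *mem_alloc(size_t size)
{
    return convert(mem_t *, malloc(size));
}

struct point { int x, y; };

/* Every use site needs an explicit, greppable cast. */
struct point *point_new(int x, int y)
{
    struct point *p = coerce(struct point *, mem_alloc(sizeof *p));
    if (p) {
        p->x = x;
        p->y = y;
    }
    return p;
}
```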

The stupid void star thing and its conversion rules should never have been invented. C++'s treatment of it is more sane, at least, by working without a cast in only one direction.

What's really awful is official APIs that use void star for opaque handles. The Khronos Group's OpenMAX is one of these. You can mix up different kinds of handles for different objects and the calls will compile.
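The usual C alternative to `void *` handles is a distinct incomplete struct type per kind of object. A minimal sketch with a hypothetical decoder/renderer API: callers still cannot see inside the objects, but passing a decoder handle to a renderer function is an incompatible-pointer constraint violation the compiler must diagnose, instead of compiling silently.

```c
#include <stdlib.h>

/* Public header: the struct layouts stay private to the library. */
typedef struct decoder decoder;
typedef struct renderer renderer;

decoder *decoder_open(void);
void decoder_close(decoder *d);
renderer *renderer_open(void);
void renderer_close(renderer *r);

/* Library side (normally in a separate .c file). */
struct decoder { int frames_decoded; };
struct renderer { int frames_shown; };

decoder *decoder_open(void) { return calloc(1, sizeof(decoder)); }
void decoder_close(decoder *d) { free(d); }
renderer *renderer_open(void) { return calloc(1, sizeof(renderer)); }
void renderer_close(renderer *r) { free(r); }

/* renderer_close(decoder_open());   <- diagnosed: incompatible pointer type */
```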


It's only a hack in C++ these days; since C99 it's standardized in C.

That's like your opinion, plenty of C programmers worth their salt would disagree with you.

Look, I'm not saying it's impossible to write code that compiles both as C and C++. The same is true of many other languages. The point is that you can't write C and expect it to be valid C++, which was the only thing I claimed.


As someone who's been paid in the past to write shellcode, stuff like that paid my bills.


I'm pretty sure it was the same old regular buffer overflows as always. Which begs the question, what are you aiming for here?


A lot of the time, simply overflowing a buffer like you would in the 90s doesn't cut it, thanks to all of the mitigations built in these days. Finding code that completely forgets the shape of its data really helps get the job done, though.


When I start doing anything in C, it is always refreshingly simple.

Then my project grows to the point where I need to do some simple string operations.

At which point I remember the wonderful simplicity of garbage collected, high level languages...


I use C directly when I have to, my preferred setup these days is embedding a more convenient scripting language in C.


Most C is inducted into C++ eventually. The point is that it would be nice to have something like RAII without subscribing to the full C++ perspective.


Well, compile with a C++ compiler and just use RAII?


It's not that easy.

A modern C++ compiler will have all sorts of issues with fundamental C idioms, and depending on C++ is a very different story from depending on C.


C++ compilers do not have issues with fundamental C idioms.

I maintain a significant body of code which compiles as C or C++. The executable size and performance are about the same.

A C++ compiler will, of course, have "issues" with C99 and C11 features, that's for sure.


Running without exceptions is one C idiom. RAII, operator-new, and many other C++ features break down without them.


Obviously, I do not use RAII, operator new or exceptions in a code base that compiles as C or C++.

Here let me note that I have not gone out of the way to disable anything in C++. I have not disabled EH in the compiler, or RTTI or anything. Yet, the executable size is close to the C one, and the performance is basically the same.


> Running without exceptions is one C idiom

well, did you know that for instance the windows C library was implemented in C++ ? and yet we don't see it throwing exceptions left and right.


Which idioms are you referring to?


Automatic void casts, flexible struct fields and calculating offsets to find the containing struct given a field are three that I use all the time. Last time I checked, C++ wasn't very impressed by any of them.
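The third idiom, finding the containing struct from a pointer to one of its fields, is usually wrapped in a `container_of` macro. A minimal sketch of the portable core (the Linux kernel's version adds a type check using GCC extensions); `struct task` and `task_id_from_node` are hypothetical.

```c
#include <stddef.h>

/* Step back from the member's address by the member's offset. */
#define container_of(ptr, type, member) \
    ((type *) ((char *) (ptr) - offsetof(type, member)))

struct list_node { struct list_node *next; };

/* Hypothetical intrusive-list element: the node is embedded in the task. */
struct task {
    int id;
    struct list_node node;
};

int task_id_from_node(struct list_node *n)
{
    return container_of(n, struct task, node)->id;
}
```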



C, not C++.


  /**
   * GCC and Clang extension: when the annotated variable goes out of
   * scope, call the given function with a pointer to that variable.
   */
  #define defer(func) __attribute__((cleanup(func)))

  #ifdef _CRUST_TESTS
  
  // Sample destructor, to test defer
  static void _tst_charp_destroy(char * * value) {
    if(*value) {
      free(*value);
      *value = NULL;
    }
  }
  
  // Test case
  it(defer, "must deallocate object at the end of the function") {
    defer(_tst_charp_destroy) char * s = calloc(1, sizeof(char));
    (void)s;
  }
  
  #endif


They probably don't want to because then they'd have to specify how they would interact with the setjmp/longjmp functions (which are, in fact, specified as part of the ISO C standard, section 7.13 in C11).


setjmp/longjmp already interacts poorly with other commonly used language features (e.g. values of non-volatile locals upon longjmp).


In case anyone is wondering, GCC and other compilers warn about non-volatile locals that may be clobbered by longjmp when you compile with -Wextra.
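For reference, the rule in question (C11 7.13.2.1): a local that is changed between `setjmp` and `longjmp` has an indeterminate value after the jump unless it is `volatile`. A minimal sketch; `steps_before_failure` and `fail` are hypothetical names. In GCC the relevant diagnostic is `-Wclobbered`, which `-Wextra` turns on.

```c
#include <setjmp.h>

static jmp_buf env;

static void fail(void)
{
    longjmp(env, 1);
}

/* `steps` changes after setjmp() and is read after longjmp(), so it
 * must be volatile for its value to be reliable. */
int steps_before_failure(void)
{
    volatile int steps = 0;

    if (setjmp(env) != 0)
        return steps;

    steps = 1;
    steps = 2;
    fail();
    return -1;   /* not reached */
}
```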


I completely agree, I think this is something that C desperately needs.

D has scope(exit){...} which is similar to constructs like defer {...}

Many things can be done in some form with macros and the C11 _Generic keyword if they have to be, but deferring statements to run on scope exit is still elusive as something that can be relied upon.

https://tour.dlang.org/tour/en/gems/scope-guards


C desperately needs to be left alone.


You've made a few comments like this; care to explain your thoughts rather than just say a bunch of variations on "nooooo!"?


The C language is finished and great as it is.

Many programmers like myself have no problem with it whatsoever.

Change in a key infrastructural component like this is only destabilizing.

The ISO C people should have the decency to disband and work on their pet language research projects in private.


The C language (and the C++ language, for that matter) basically _never_ make backwards incompatible changes. This is a first class priority for the standards committees.

Adding simple abstractions which don't hide complexity in the underlying assembly is a huge productivity and correctness boost. Not that it matters anyways; modern optimizers are always going to do things that absolutely bend your brain.

Wouldn't that be nice if you could use a `scope(exit) {}` instead of accidentally missing two or three `goto cleanup_7;` which result in an RCE?

I would absolutely love a standards-compliant way to scope-guard sections of code. It'd make huge swaths of C code simpler; it's even in enough demand that Clang and GCC implement it as their own extensions.


Better go through the standards then, both of them have introduced breaking changes in their revisions.

I can gladly provide a couple of examples.


Please do provide examples. Revisions to the C and C++ standards have almost never broken meaningful backwards compatibility, at least at the language semantics level.


Besides what nrclark has written in his/her comment, C11 dropped gets(), and the introduction of the new memory model in C11 might break code that was relying on CPU-specific semantics not in sync with it.
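The usual replacement for `gets()` is `fgets()`, which takes the buffer size but keeps the trailing newline. A minimal sketch of a wrapper that strips it; the name `read_line` is hypothetical. Lines longer than the buffer come back in pieces rather than overflowing it.

```c
#include <stdio.h>
#include <string.h>

/* Reads at most size - 1 characters into buf and removes the newline,
 * if one was read. Returns buf, or NULL on end-of-file or error. */
char *read_line(char *buf, int size, FILE *f)
{
    if (fgets(buf, size, f) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```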

Regarding C++:

- auto changed its meaning on C++11

- export templates were removed in C++11

- exception specifications were deprecated in C++11, removed in C++17, and might make a comeback in C++23 with value-based exceptions

- gets() was deprecated in C++11 and removed in C++14

- decltype and auto had a small semantic difference, settled in C++17

- initializer lists introduced in C++11 changed their auto deduction type in C++17

- the required implementation semantics for std::string in C++11 broke COW in compilers like GCC.

Just a small list, there are a couple more.


C's variable-length arrays are one example. They were introduced in C99, and downgraded to 'optional' in C11. So that's one place where a newer C compiler could refuse old C code and still be language compliant.
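Since C11, code that wants to stay portable can test for that: an implementation without VLAs defines `__STDC_NO_VLA__`. A minimal sketch with a hypothetical `sum_of_squares` helper that uses a VLA scratch buffer when available and falls back to `malloc` otherwise.

```c
#include <stdlib.h>

/* Sums i*i for i in [0, n) via a scratch buffer. Returns -1 only if
 * the malloc fallback fails. */
long sum_of_squares(int n)
{
    long total = 0;

    if (n <= 0)
        return 0;
#ifdef __STDC_NO_VLA__
    long *tmp = malloc((size_t) n * sizeof *tmp);
    if (!tmp)
        return -1;
#else
    long tmp[n];   /* VLA on the stack; unsuitable for large n */
#endif
    for (int i = 0; i < n; i++)
        tmp[i] = (long) i * i;
    for (int i = 0; i < n; i++)
        total += tmp[i];
#ifdef __STDC_NO_VLA__
    free(tmp);
#endif
    return total;
}
```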

Also the draft C2x standard does away with K&R declarations, which will be another compatibility break.


Here is a counterpoint: The C language is not finished, and many programmers like myself would like to see the language continue to evolve and grow in reasonable ways, especially given the fact we'll keep on writing it for a while to come. The ISO C people should be commended for doing their hard work.

Alright. Glad we cleared this whole thing up.


> we'll keep on writing it for a while to come

Nope; I will be writing in "it". What you obviously want is to be writing in something other than "it". You basically want to be able to write Rust, Go or D into a file that has a .c suffix and is passed through a preprocessor.


> The C language is finished

There's no evidence of this. There's just existing code. Ok. There's existing code in lots of other languages with different syntax (large and small differences). So what?

The latest video about Oden was a fantastic primer on QoL changes that should be standard, but there are people who always think what they learned is optimal. These are the same people who trivialize evidence to the contrary, in defense of their particular viewpoint.


You are free to stay on C89 or C99. They are not going away anytime soon, too much code relies on it.


If you think standard-driven changes to the compiler will never break your C89 code because you're using the -std=c89 switch or whatever, you are pretty naive.

The only guaranteed way your compiler won't break your code is if it's left alone.


That can happen due to changes unrelated to new standards too. So if you are worried about compiler changes, then definitely freeze the compiler version you use. Alternatively, invest in quality assurance to verify that your code still works with a new compiler version.


The standard is versioned. You can always use a modern compiler targeting an older standard. You don't have to use C11. Or C21.


And then there is this activity called "working with other people".

Also, gratuitous changes to the compiler can break things that affect C90 operation. There is a risk.


Or go back to K&R, for that matter.


Once you do that, you more or less recreate C++. IMO C vs. C++ is fundamentally an issue of RAII and arguably templates. This proposal would add RAII to C.


You seem to have left C++ in 1999. C++ has expanded so much that someone could create their own operating system just by adding drivers around its standard library.


Someone should submit a proposal.


The hacks you have listed have the advantage to be visible. Reading the code, you know whether they are called or not.

The issue with cleanup on scope exit is that this introduces ambiguous code-flows. You would need to demonstrate a strong advantage, beyond making life easier for the writer of the code.


The idiomatic way to do that is with destructors.

EDIT: Sorry for the confusion, I misread the article and thought this was about C++2x.


Yep, talking about C here, not C++.

If they added that feature, you could use macros to get constructor- and destructor-like behavior within a single scope: a macro could construct your object and also register something like "at scope exit: run this function". That would be super useful, and as far as I'm aware it isn't very complex for the compiler (all it has to do is run some code whenever a scope is exited), nor very far outside the spirit of C (it's still very low level, just a small convenience feature with interesting implications, especially when combined with macros).


C++ is a dialect of C.

It is the result of starting with C, and then accommodating a massive, sustained sequence of feature requests, exactly like ones we are seeing in this thread.

Everyone thinks that if just their pet feature request is added to C and everyone else's ideas are rejected, it will still be C.


C++ is more than that; it's a set of additional features that create something of a C-flavored multiparadigm language. Which inevitably does turn it into a kitchen sink language, since it's trying to be all things to all people.

This idea doesn't seem to be in that spirit to me. It's not trying to do something radical like, say, adding OOP semantics to C. It's suggesting that C, a structured programming language, get a feature designed to improve the ergonomics for people doing structured programming in C.


C doesn't have destructors, does it?


You can use C++ as some kind of C with destructors though :P Scoped cleanup is really a nice thing..

But then you also want templates of course to get rid of your void*. And oh, wait, what's that? A cross-platform thread implementation (well, sort of, depends on platform)? A string thing which I can search without strstr? Anonymous functions which capture? Auto? Gimme gimme :]


> You can use C++ as some kind of C with destructors though :P Scoped cleanup is really a nice thing..

C++ is not a strict superset of C. You actually have to cast void pointers, for example.

> But then you also want templates of course to get rid of your void*.

No, actually, I think I can live without that, but thank you.

> And oh, wait, what's that? A cross-platform thread implementation (well, sort of, depends on platform)?

C11, threads.h. Been there, got that.

> A string thing which I can search without strstr?

Okay, I'll give you that, string handling is hell.

> Anonymous functions which capture?

Have their uses, but I'm not sure if C is really the right place for it.

> Auto?

auto (as used in C++11 and later) is a solution in need of a problem in C. When you don't have piles of templates and iterators, you also tend not to need to automatically derive types. In a world where signed/unsigned integer comparison can have fatal consequences, you probably do want to be very sure about your types.
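A concrete instance of the signed/unsigned hazard mentioned above: a bounds check done in signed arithmetic, followed by a call that takes `size_t`. A minimal sketch with a hypothetical `copy_checked`; without the `len < 0` test, `len == -1` would pass the check and reach `memcpy` as an enormous size. The same conversion is why `-1 < 1u` is false in C.

```c
#include <string.h>

/* Copies len bytes into dst if they fit. Returns 0 on success, -1 if
 * rejected. The signed check must reject negative lengths explicitly,
 * because memcpy() would see them as huge size_t values. */
int copy_checked(char *dst, int dst_size, const void *src, int len)
{
    if (len < 0 || len > dst_size)
        return -1;
    memcpy(dst, src, (size_t) len);
    return 0;
}
```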


> C++ is not a strict superset of C.

I know that. I use both C and C++ practically daily. The post was partly joking. But only partly (for example, of course I could live without templates if I really had to, but seriously, they make some jobs a lot better and easier). Hence I deliberately phrased it as "some kind of C", hoping (idly, it seems) no one would trip over it, because it's obviously not C.


>C++ is not a strict subset of C.

I think you flipped this :) ?


I mean, it’s not wrong…


Oops. Edited a fix in.


Sometimes I wonder if a C++ without inheritance would be the right thing. All the crazy design abuses I have seen were around inheritance. Other things like auto and anonymous functions are just super useful and I wouldn't want to miss them.


I don't think those other features are half as compelling as you think they are.


Using both C and C++, each in their place, I actually do think they are.


Then why ever use C if they're that compelling? I'm not really understanding how wanting one little thing from C++ implies diving all the way in.


> Then why ever use C

Because some platforms I've used it on didn't have proper C++ support. Or because a project I contribute to is already written in C. Or because of limited resources. Etc.

> I'm not really understanding how wanting one little thing from C++ implies diving all the way in.

It's just an example of a possible train of thought. And I wouldn't call something like templates 'one little thing'. So suppose there's a choice between the two, with no limits on resources etc., and I could do everything in C but templates would make everything easier: then just going for C++ instead is an option, even without going 'all in', writing what looks like standard C plus templates.


> And I wouldn't call something like templates 'one little thing'.

Agreed. But templates aren't the one little thing, they're something you said would be 'next'. The 'one little thing' is a way to replace the GOTOs used for releasing resources. I think it's an easy argument that when people are repeatedly building a pattern out of GOTO, there's a gap in the control flow tools that a language gives you.


Auto is already taken in C. It's the default, rather than register or static, so you hardly ever see it.


auto was taken in exactly the same way in C++ until fairly recently. They re-purposed it because it was obsolescent.

auto was already in vanishing disuse in C at the time ANSI C came out in 1989.

The C++ use of auto resurrects an aspect of the way auto was used in the C predecessor languages B and NB.

In these pre-C languages, everything was an integer cell. A declaration of local variables looked like this:

   auto x, y;
The auto meant automatic storage (stack), but the real purpose here is syntactic: the auto is a declaration specifier, which makes the above construct a declaration. There is no indication of type, because it is implicit. Without the auto it would look like a comma expression with no effect.

This implicit typing survived into C for "implicit int" return values, like "main () {...}". But type specifiers were introduced and became mandatory in declarations, and so auto became unnecessary verbiage since all it did now was specify a storage class that was already the default one.

C++ resurrects the syntactic use of auto in a way: once again, it provides declaration syntax that allows the type specifier to be omitted. This time, of course, the type is inferred rather than assumed to be int.


This is a great thoughtful reply to my terse, shallow comment. Thank you. It’s insightful


Not the language, but some compilers have something similar:

https://stackoverflow.com/questions/2053029/how-exactly-does...


Yeah... Actually adding something like defer {...clean up code...} would be really nice.

But who are we to ask..? Just programmers, i.e. mere mortals nobody really cares about. :-)


Brief aside: Boost has a facility for doing this in C++ without declaring a new class. (It boils down to a destructor, but it's more compact.)

https://www.boost.org/doc/libs/release/libs/scope_exit/doc/h...


Do they have a new version based on lambdas? The syntax of that is painful. I splatter my code with this:

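  // WTFPL
  #include <functional>

  class Finally {
    std::function<void()> f;
  public:
    Finally(std::function<void()> f) : f{f} {}
    ~Finally() { f(); }
    Finally(const Finally&) = delete;             // a copy would run f twice
    Finally& operator=(const Finally&) = delete;
    void disable() { f = [](){}; }                // cancel the cleanup
  };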
Which looks much prettier:

  socket* s = open_socket(); // or whatever
  Finally cleanup_socket{[s](){ close_socket(s); }};  // capture s by value

  // optionally, to keep it alive:
  cleanup_socket.disable();
Am typing on mobile. No guarantees about syntax errors. I still need to look into where the right places to put rvalue references and/or std::move are, but for the most part I never put anything other than a pointer in the closure, so I'm not worried about copy constructor cost.


However, your implementation based on std::function might allocate, which is then a non-zero cost compared to normal exit statements. Better to store the lambda directly inside the class, which requires the class to be generic (Finally<LambdaType>). That can be hidden by using the class via a templated function which uses type inference:

    auto cleanup_socket = make_scope_guard([&]{ close_socket(s); });
Searching for scope_guard yields lots of alternative implementations.


make_* functions are no longer necessary in C++17 which adds "Class template argument deduction" [1]. You can now do

  std::pair p{"aaa", 123};
and

  Finally guard{[]{cleanup();}};
without specifying template arguments.

[1] https://en.cppreference.com/w/cpp/language/class_template_ar...


Thx, I wasn't aware of this feature yet!


Boost.ScopeExit does declare a new class via macros.


Yes, and I hinted at this, but it's more compact.


> They should add something

I disagree.

> to run code when leaving a scope, no matter how you leave

This, and various other features, were already added to C long ago. The resulting dialect was called C++.


I'm not sure I understand what you're getting at here.

Are you implying that people who want new features in C should just migrate to C++ because it has even more features? Or that C is perfect as-is and anyone who wants to add new features to it should bug off to another language? Or that this particular feature is somehow equivalent to (or a slippery slope toward) adding a whole set of OOP features to C, so we might as well take that to its logical conclusion?


Pretty much all of the above.


I'd like to hear you elaborate. You seem to be very invested in your concerns, but it's hard to understand exactly what they are.

A single, clearly articulated response would do a lot more than a fusillade of terse barbs to help others understand why you think this is a bad idea.


Request for C++14-style constexprs. It would be a great way to spice things up, and I think would be very compatible with the C philosophy.

C is a great language, and I love working with it. But constexpr is something I really feel would improve the language (by eliminating the need for complicated #define macros). C++14's constexpr was a breath of fresh air when I first used it. It basically lets me write real code which evaluates at compile-time and can generate constants. I'd love to see constexpr ported over to C.


That is one thing I would like to see too. That does seem to be one place where the newer challengers to C, are able to have a significant advantage over C.


Already works with clang. Just gcc cannot do it, i.e. it throws an error on non-constants within if expressions, instead of returning 0.


I think having a primitive templating system would also reduce the usage of macros. While we're at it throw in AST macros too.


Your second request seems considerably outside the style and philosophy of C. Or was that sarcasm?


Yep, it was a joke, trying to make a point about the slippery slope of 'bloat'. But it didn't go over that well.


Out of curiosity, would you consider constexpr to be bloat?

I feel like it has tons of applications, especially for embedded C programmers. One example that comes to mind: CRC calculation. Most microcontroller CRC libraries are lookup-based, or a mixture of a lookup table and other methods.

Currently, a lookup table can be done only 3 ways in C: pre-compute it and store 'magic numbers' (either while developing, or with some custom pre-processing), compute at program initialization (which costs code-space and startup time), or a compile-time computation based on some very ugly macros.

With constexpr support, I could just write a C function that calculates the table values. Then I could use it to populate a constexpr table at compile time, or even use the function at runtime if I needed to do some debugging.
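For comparison, this is roughly what the "compute at program initialization" option looks like today, using the standard reflected CRC-32 (polynomial `0xEDB88320`). It is a minimal sketch; `crc32_init` and `crc32` are hypothetical names. `crc32_init` is the function one would like the compiler to evaluate ahead of time instead.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];

/* Runs at startup today; with constexpr-style evaluation it could fill
 * a const table at compile time instead. */
void crc32_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc_table[i] = c;
    }
}

/* Table-driven CRC-32 (init and final XOR 0xFFFFFFFF). */
uint32_t crc32(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--)
        crc = crc_table[(crc ^ *p++) & 0xFFu] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}
```

The standard check value applies: after `crc32_init()`, `crc32("123456789", 9)` is `0xCBF43926`.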

C already has sizeof, which is a compile-time computation. Why not let users write their own compile-time functions too?

I recently wrote a command-lookup library in C++14. Using constexpr, I could precompute the hashes of each string and populate a switch() statement. Plus I could use the exact same function to hash my incoming strings, even if they're of unknown length. That's not possible in C at all, not even with loop-based macro trickery.


I think constexpr is a great feature. But without templates (or metaprogramming in general) I feel like it's severely limited in its uses (although you cite a great use case). Adding templates is another whole can of worms (for templates to be really useful, add operator overloading too? and so on...)


It went all right, once I adjusted my sarcasm detector...



> Improve array bound propagation and checks

Looking forward to this one, especially since Annex K (Bounds Checking Interfaces) has proven not to have solved anything and is now scheduled for possible removal.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1969.htm#ad...

These other ones are also interesting from security point of view.

> Add a new calling conventions with error return for library functions that avoids the use of errno

> Extend specifications for enum types

> Add a simple specification of nullptr, similar to C++
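To illustrate the errno point, an errno-free convention typically returns a status and delivers the result through an out parameter. A minimal sketch; `parse_long` and `enum parse_status` are hypothetical, not a proposed API. Internally it still has to consult `errno` because `strtol` reports overflow that way, but it saves and restores it so callers never see it.

```c
#include <errno.h>
#include <stdlib.h>

enum parse_status { PARSE_OK = 0, PARSE_EMPTY, PARSE_TRAILING, PARSE_RANGE };

/* Parses a base-10 long. The status is the return value; *out is only
 * written on success. The caller's errno is left unchanged. */
enum parse_status parse_long(const char *s, long *out)
{
    char *end;
    int saved = errno;

    errno = 0;
    long v = strtol(s, &end, 10);
    int overflow = (errno == ERANGE);
    errno = saved;

    if (end == s)
        return PARSE_EMPTY;
    if (*end != '\0')
        return PARSE_TRAILING;
    if (overflow)
        return PARSE_RANGE;
    *out = v;
    return PARSE_OK;
}
```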


> Looking forward to this one, especially since Annex K (Bounds Checking Interfaces) has proven not to have solved anything and is now scheduled for possible removal.

It has solved the usual bounds errors for those who use it, i.e. MS, Embarcadero and safeclib. Since glibc, musl and FreeBSD libc refuse to adopt Annex K, the blame lies with them, not with the safe extensions, which do serve their goal.

With safeclib you even get compile-time checks, more than with the simple _FORTIFY_SOURCE=2 checks. Only Android Bionic has gotten a bit better lately. safeclib is as fast as, or even faster than, the fragile assembler bits in glibc.

> Add a new calling conventions with error return for library functions that avoids the use of errno

That's of course part of Annex K.

What's urgently missing is still a proper string API. ASCII str* goes nowhere, wide char wcs* is not widely used, 80% use utf8. wcslwr, wcsfc, wcsnorm are even missing. The whole u8* string API is missing, and not even discussed. ICU, libutf8 and libunistring don't get you far. coreutils still cannot handle non-ASCII strings.


Not at all, because it doesn't fix out of bounds errors caused by copy-paste errors where the given buffer size doesn't match the actual size of the declared buffer or string.

A situation which is described on the field report I linked to.

Only C++ compilers like Visual C++ which provide overloaded versions of Annex K without size parameter thanks to the improved C++ type system, do actually provide a real usable version of Annex K.

But then it isn't C any longer.
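Plain C can get part of the way there with a macro that takes the size from the declared array, so it cannot drift from the declaration through copy-paste. A minimal sketch; `COPY_TO_ARRAY` is hypothetical. It only works when `dst` is a real array: applied to a pointer, `sizeof` yields the pointer size, which is the same hole the C++ overloads close through the type system.

```c
#include <stdio.h>

/* Copies src into the array dst, truncating if needed. The size comes
 * from sizeof(dst), never from a hand-written constant. Evaluates to
 * nonzero if src was truncated. */
#define COPY_TO_ARRAY(dst, src) \
    (snprintf((dst), sizeof(dst), "%s", (src)) >= (int) sizeof(dst))
```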


> Not at all, because it doesn't fix out of bounds errors caused by copy-paste errors where the given buffer size doesn't match the actual size of the declared buffer or string.

That's wrong, check the implementation.

C++ only has advantages because there g++ is not as broken as gcc with constexpr, but clang since 5.0 is doing fine.

https://github.com/rurban/safeclib


As you mention under "Compile-time constraints", these extensions to Annex K require a supporting compiler.

It is perfectly viable to have a 100% ISO C99/C11 compliant compiler that will happily overwrite the target buffer, read more bytes than actually exist or search the whole memory block for a '\0' terminator, because Annex K does not require the existence of the compiler extensions used by your safeclib.


Right. Microsoft does it wrong.

But neither ISO nor POSIX requires _FORTIFY_SOURCE either, which uses the same object-size (bos) checks. The alloc_size attribute and the CPU-supported bounds checks are not used at all by the major libcs.

Problem is not the Annex K, but the refusal to implement and use it. There's no problem with state as in errno or locale. The criticism is easily debunkable. It's outright NIH bias.


What I would like to see would be tagged memory like the SPARC has, which was put to good effect on Solaris, to become common across all major platforms.

Apparently at least Android might adopt it on ARM.

And I do concede that maybe the extensions you have used should actually become part of Annex K as well, instead of Annex K being deprecated.

Because despite my dislike for C, I am fully aware that UNIX derived platforms will stay with us for the years to come, so we need to improve the current state of affairs somehow.


The linked discussion on the memory model is particularly interesting to me as it covers a number of issues and ambiguities with the current specification for strict aliasing and what kind of accesses should/shouldn't be allowed, including for example the problem that there's no way to obtain untyped space on the stack (Q75).

Since this as a whole is among the least consistently implemented and (arguably based on the number of questions it generates) least well understood aspects of the standard it's nice to see some authoritative efforts to clarify the intended behaviors.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2294.htm


> the problem that there's no way to obtain untyped space on the stack (Q75)

As in a standard alloca?


Fun fact: alloca is in neither (any) C standard nor POSIX. It is just a very old extension, implemented in many compilers with somewhat compatible, underspecified semantics.


I am aware of that, which is why I said “a standard alloca” instead of “the standard alloca”. I’m asking if they’re proposing adding a version of alloca to the standard.


Very old indeed, Turbo C 2.0 for MS-DOS already had it.

Not sure about the Small C dialect from the early '80s.


While you can't get untyped space on the stack, it is explicitly permitted for objects of character type to alias objects of other types. I don't see how effective_type_3.c has undefined behavior.


The aliasing rule isn’t symmetric: you can access objects originally declared as types other than char using char pointers, but you can’t access objects originally declared as char using non-char pointers. (And malloc()ed buffers are neither of the above and have their own rule.) Here’s a blog post explaining it in more detail:

https://gustedt.wordpress.com/2016/08/17/effective-types-and...
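The usual way to stay inside those rules when you need to reinterpret bytes is to copy them with memcpy instead of casting the pointer. A small sketch (load_u32 and store_u32 are illustrative names, not a library API). Compilers typically turn a fixed-size memcpy like this into a single load or store, so it usually costs nothing.

```c
#include <stdint.h>
#include <string.h>

/* Dereferencing (uint32_t *)buf would access a declared char object
   through a non-char lvalue, which the effective-type rules forbid.
   Copying the bytes with memcpy is well-defined instead. */
static uint32_t load_u32(const unsigned char *buf)
{
    uint32_t v;
    memcpy(&v, buf, sizeof v);
    return v;
}

static void store_u32(unsigned char *buf, uint32_t v)
{
    memcpy(buf, &v, sizeof v);
}
```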


Wishlist: the arc4random family (would also put up with it being in POSIX), strlcat and strlcpy (yes, glibc, just for you), getline (fgets sucks), reallocarray and recallocarray, that signed integer overflow becomes at least implementation-defined rather than pure undefined behavior, ideally fix the notion of locales and guarantee (u)int8_t == unsigned char if (u)int8_t exists (lots of code actually relies on that).
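For anyone unfamiliar with reallocarray: the point is that the nmemb * size multiplication is checked, so an overflow fails cleanly instead of allocating a buffer that is too small. A minimal sketch with a hypothetical my_reallocarray (it uses POSIX's ENOMEM, as OpenBSD does; ISO C itself only defines EDOM, EILSEQ and ERANGE):

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical my_reallocarray: refuse nmemb * size when the product
   would overflow size_t; otherwise behave like realloc. On failure
   the original block is left untouched, as with realloc. */
static void *my_reallocarray(void *ptr, size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > SIZE_MAX / size) {
        errno = ENOMEM;
        return NULL;
    }
    return realloc(ptr, nmemb * size);
}
```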


getline is POSIX. If you want to see arc4random in POSIX, you should write to the Austin Group mailing list: austin-group-l@opengroup.org. They also have public biweekly meetings you can attend and present your case at.


getline is POSIX, but not C. I feel like the functionality is so fundamental to writing C as securely as the language allows that it ought to be promoted to being in the C standard.

arc4random is in complete POSIX limbo as far as I can tell. It first started as a rename to posix_random[1], then OpenBSD slammed the door shut by withdrawing it, and now we're at risk of getting the worst of both worlds with Linux's getrandom[2].

[1] http://austingroupbugs.net/view.php?id=859

[2] http://austingroupbugs.net/view.php?id=1134


arc4random() and getrandom() are orthogonal. The former generates cryptographically secure pseudorandom data in-process, periodically reseeding from an OS entropy source. The latter is an interface to an OS entropy source.


Yes, they are orthogonal. However, chances are POSIX can only be convinced to add one. And I'd rather they add the one that can't be misused by a programmer.


How does getline work when you don't have FILE?


FILE is part of the standard, see 7.21.3 and e.g. 7.21.5.3. Section numbers as per C11 latest public draft; I don't have two hundred bucks to spend on an actual copy of the standard.


> (u)int8_t == unsigned char if (u)int8_t exists

This alone will make C incompatible with its original platform, the PDP/11, where char is either 7 or 9 bits.


No, char was 8 bits on the PDP-11, which was a 16-bit little-endian machine. (not middle-endian unless you count the typical compiler being weird with 32-bit values on the 16-bit little-endian hardware)

C didn't run on the PDP-7.

Being either 7-bit or 9-bit would be a choice for 36-bit hardware. Wikipedia's list of 36-bit hardware is: MIT Lincoln Laboratory TX-2, the IBM 701/704/709/7090/7094, the UNIVAC 1103/1103A/1105 and 1100/2200 series, the General Electric GE-600/Honeywell 6000, the Digital Equipment Corporation PDP-6/PDP-10 (as used in the DECsystem-10/DECSYSTEM-20), and the Symbolics 3600 series.


But the current C standard no longer allows a 7-bit byte anyway, as it mandates that CHAR_BIT >= 8.


Sorry, it should have been PDP-10. The point stays the same: it's still one of the founding architectures of the original UNIX. The PDP-11 used 16-bit words and 8-bit bytes.


For once I agree with the glibc developers: strlcat and strlcpy are bad interfaces.

Truncation can cause security bugs, and it is generally awful behavior. There is no silver bullet here: the programmer must not be a dummy about buffer sizes. Pretending otherwise is unhelpful.
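To make the objection concrete, here is a sketch of strlcpy's contract as a hypothetical my_strlcpy. It always NUL-terminates and returns strlen(src), so truncation is only detected if the caller compares the return value against the buffer size. Nothing forces that check.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical my_strlcpy with strlcpy-style semantics: copies at most
   size - 1 bytes, NUL-terminates when size > 0, and returns
   strlen(src). A return value >= size means the copy was truncated. */
static size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t srclen = strlen(src);
    if (size != 0) {
        size_t n = srclen < size - 1 ? srclen : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
```

Correct use looks like `if (my_strlcpy(buf, src, sizeof buf) >= sizeof buf) { /* handle truncation */ }`; forgetting that branch silently keeps a shortened string.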


> rather than pure undefined behavior

Even better (someone please correct me if necessary): why isn't the very concept of "undefined behavior" removed completely? What good does it accomplish in a language?


Allows the compiler to assume certain cases where undefined behavior would appear are unreachable and make optimizations based on trusting that those cases will not occur.
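The classic illustration: because signed overflow is undefined, an optimizing compiler may assume x + 1 never wraps and reduce plus_one_is_bigger below to a constant `true` (GCC and Clang generally do at -O2). Code that actually needs to detect overflow has to test the bounds before doing the arithmetic. Both function names are illustrative.

```c
#include <limits.h>
#include <stdbool.h>

/* A compiler may fold this to "return true": the only input where it
   could be false (x == INT_MAX) would overflow, i.e. be undefined. */
static bool plus_one_is_bigger(int x)
{
    return x + 1 > x;
}

/* Portable overflow check: compare against the limits first, so no
   signed overflow ever happens. */
static bool add_overflows(int a, int b)
{
    return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}
```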


> guarantee (u)int8_t == unsigned char

You mean CHAR_BIT==8 as mandated by POSIX?


If uint8_t exists, then I believe the following ought to be true:

1. CHAR_BIT == 8, and

2. the compiler is required to make uint8_t an alias of unsigned char.

Point 2 is there so that you can cast pointers to uint8_t * safely, which some cryptographic code tends to rely on. If uint8_t is just any integer type, there's no safe cast to uint8_t due to strict aliasing rules[1]. That's what I want to fix. CHAR_BIT == 8 if uint8_t exists is just a side effect.

[1] https://cellperformance.beyond3d.com/articles/2006/06/unders...
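Until that is guaranteed, the portable pattern for this kind of crypto code is to read bytes through unsigned char, which may alias anything, and assemble wider values with shifts. A minimal sketch (load32_le is an illustrative name); the #if simply makes the 8-bit-byte assumption explicit.

```c
#include <limits.h>
#include <stdint.h>

#if CHAR_BIT != 8
#error "this sketch assumes 8-bit bytes"
#endif

/* Load a 32-bit little-endian value byte by byte. Access through
   unsigned char is always permitted by the aliasing rules, and the
   result does not depend on the host's endianness. */
static uint32_t load32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```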


It follows from sizeof (char) == 1 and CHAR_BIT >= 8 that sizeof (char) == sizeof (uint8_t), but strictly speaking I don't know that the standard guarantees that uint8_t is an alias of unsigned char; it could just as well be a distinct type.


I will just pray for the better error handling proposals.

Both are fine. For modern code with no global state, especially the multithreaded kind, anything is better than the errno.h mess.


Unlikely to happen. This isn't "how do we rethink C", there are plenty of other, better places for that to happen. It's how do we make incremental improvements that benefit us, but remain in the spirit of C and compatible with the enormous existing body of code.

In fact there's a Charter describing it better than I can:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2086.htm


I read the Charter, yes.

Now, how do those error-related proposals violate it? Such as:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2289.pdf

It seems to be very delicate about being backwards-compatible.


In fact, I think WG14 was in favour of the N2289 proposal.


I don't think "error handling" should be a thing in C. Errors are just data, not a special case.

errno.h isn't so much built into C. It's just part of the standard library, and even more so of POSIX. It's only used for interfacing with the OS. And it's not that bad, since errno is a thread-local variable.


C is not a holy cow, is it? It's just a very, very popular language. Error handling is important in programming in general, and having ways to implement it conveniently helps a lot.

Yes, errors are just data. And everything is data, including the code itself, right? It's a question of the abstractions introduced to the language and the program in question.

errno.h is bad. It's inconvenient, it's global (even if it's thread-local), and it's cumbersome to use.

AFAIK, there are 2 error-related proposals for inclusion. One is making errors special, the other is a bit more generic. Both would improve on the current situation.


You don't need to change the language. You need to add new functions to the standard library. There are already quite a few versions of older functions, with an "_r" suffix in their names (off the top of my head, strtok_r(), although that is not a great example), that take an explicit pointer to local state, eliminating global state. You could just add more of them if you want to eliminate errno.

> It's a question of the abstractions introduced to the language and the program in question.

I don't think these kind of abstractions are in the spirit of C. As it says in the Charter linked here, "provide only one way to do an operation".

A more acceptable way to me would be allowing multiple return values in general. But maybe there are technical reasons why this is not done. Or it has to do with the complexities that such a mechanism would add to the expression syntax.
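For reference, this is what the "_r" style mentioned above looks like in practice. strtok() keeps its position in hidden global state, while POSIX strtok_r() takes that state as an explicit char ** argument, so separate tokenizations can be interleaved or run in different threads. A small sketch (count_tokens is an illustrative helper; strtok_r is POSIX, hence the feature macro):

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Count the tokens in s, using a caller-owned save pointer instead of
   strtok's hidden static state. Note that strtok_r writes into s. */
static int count_tokens(char *s, const char *delims)
{
    char *save = NULL;
    int n = 0;
    for (char *tok = strtok_r(s, delims, &save); tok != NULL;
         tok = strtok_r(NULL, delims, &save))
        n++;
    return n;
}
```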


The error handling proposal adds multiple return values, not as records/tuples, but as a sum type, so only one can be returned from any given call.

This could certainly be made more general, or just done via tagged unions (the way multiple return values can be done via structs), but there are massive advantages to be had by building it into the language and calling convention.

The implementation can be far more efficient than any of a) the standard `int f(ret *out)` idiom, b) using errno, or c) some sort of `struct my_tagged_union f()` (which nobody uses anyway because it's a huge pain syntactically). In addition to being cheaper, the proposed built-in version of (c) is also syntactically simple enough to become a standard, shared mechanism.
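For comparison, option (c) written by hand looks roughly like this (struct int_result and checked_div are hypothetical names). Whether such a struct comes back in registers or through a hidden pointer depends on the platform ABI and the struct's size, which is the calling-convention question being argued here.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* A hand-rolled result type: a tag plus a union of value or error. */
struct int_result {
    bool ok;                 /* tag: which union member is valid */
    union {
        int value;           /* valid when ok */
        int err;             /* errno-style code when !ok */
    } u;
};

static struct int_result checked_div(int a, int b)
{
    if (b == 0)
        return (struct int_result){ .ok = false, .u.err = EDOM };
    if (a == INT_MIN && b == -1)     /* quotient not representable */
        return (struct int_result){ .ok = false, .u.err = ERANGE };
    return (struct int_result){ .ok = true, .u.value = a / b };
}
```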


> The implementation can be far more efficient

Really, really hard to believe, not least since returning unions is already legal (isn't it?). What function do you have in mind where an explicit error-return-pointer argument is not close to maximally efficient?


The first problem is that the error-return-pointer-argument approach must go through memory, because the caller can request that the return value go anywhere at all.

The second problem is that existing calling conventions for `struct { bool tag; union { .. }; }` put everything in memory anyway, using a hidden pointer argument. Further, there's no way to put this type in the standard library because C doesn't have generics.

The new implementation can put that single-bit tag in the CPU's carry flag where it has dedicated branch instructions and doesn't interfere with other values. It can leave the actual return value in a register without any kind of union aggregate lowering.

So far this is all just calling convention tweaks, and could be done by pattern-matching user-defined tagged unions, but building it into the language a) makes it possible to standardize its semantics and connect it to platforms' C ABIs so other languages can also participate and b) makes it far simpler to implement and use so it's actually likely to be adopted.


Why would it matter that the return-pointer write goes through memory? It's a handful of cycles. I'm curious what kind of function you envision that is so short-lived AND needs tricky error handling AND is extremely performance-sensitive.


All of them; it adds up. Calling conventions are a perfect place for this kind of micro-optimization, because they apply pervasively and (assuming you want better error handling support) without additional changes to program source.

The same reasoning applies to putting effort into register allocators, or switching from setjmp/longjmp to table-based unwinding, etc.


Small micro optimizations do not add up, especially not for something like error handling that does not concern most operations to begin with. Something like this error handling strategy clearly has its own cost in complexity of implementation, such that the whole thing will collapse under its own weight before you even notice a speed up.

You need to make sure that you keep the size and complexity of the language and its specification within reasonable limits. So you can't just add "all of them" with a blanket statement that they will add up.


I absolutely agree about the multiple return values in general.

Unfortunately the discussion around N2289 is more about providing something a lot like exceptions for both C and C++ but without all the exception problems.

The proposal really has some ugly corners related to how one actually handles the errors. You can read the discussion here:

https://www.reddit.com/r/cpp/comments/9owiju/exceptions_may_...


I'm being a bit nit-picky, but in C, code is not much like data.

You cannot take sizeof a function, and you can't copy functions around, for instance. The address of a function is a value ("data"), but not the function itself.


Yeah... C is not Lisp, okay. :-)

But that only strengthens the point: language features are just abstractions we use to reason about data. One can always strip all the abstractions and end up working with raw bits.

If there's a useful way to think about certain kinds of data - it might be useful to codify that way as a language feature. Such as specialised error handling.


In Lisps, a function is also just a reference, and not the object itself. You can't take its size or copy it. (You could copy it if your dialect provided a copy-function function, of course. I don't think I've ever seen one. Such a thing could be useful if it provided a frozen copy of a closure's lexical environment, that would be unaffected by mutations when the original copy of the function is invoked.)


Okay, I will try to explain it to the downvoter. Suppose we have a lambda like this:

  (let ((counter 0))
    (lambda () (incf counter)))
When we evaluate this we get a function. It contains the captured lexical environment. If we call that function, the captured counter variable mutates.

Now suppose we had a copy-function library function. I would expect it that if we apply it to this function, we get an object which carries a duplicate of the lexical environment. This means it has its own instance of counter. If we call the original function, the copied function's counter stays the same and vice versa.

I don't remember seeing such a copy-function in any Lisp dialect; there isn't one in ANSI CL. It seems it might be useful, same as the ability to copy a structure or OOP object.


If you could sizeof or copy a function by a simple assignment, how would you port it into a machine with separate code and data spaces?


With a crapload of special case code, nests of #ifdefs and a dollop of "we should have thought of that possibility at this level of abstraction" refactoring.


The ISO/IEC C committee does not think like that. The people who considered C a portable assembler have long since left that committee. The current members consider it a mid-to-high-level programming language with defined abstract semantics.

Conditional compilation is just a compatibility workaround to them.


You cannot do these things while remaining in the realm of well-defined behavior emanating from requirements given by ISO C.


I haven’t read the proposals but I will comment anyways :)

I do like how Go handles the POSIX interface: errors come back as an extra return value, and cleanup is done with defer.


Error handling is best done with asserts which check for off-domain inputs and off-range outputs, and explode the running program if triggered.

I mean, security trumps convenience and uptime, right guys? Y'all do build your functions with domain and range in mind, right guys?


> I mean, security trumps convenience and uptime, right guys?

Not in a heart-lung machine, no.


errno is a̶n̶ ̶a̶b̶o̶m̶i̶n̶a̶t̶i̶o̶n̶ ̶t̶h̶a̶t̶ ̶s̶h̶o̶u̶l̶d̶ ̶b̶e̶ ̶k̶i̶l̶l̶e̶d̶ ̶w̶i̶t̶h̶ ̶f̶i̶r̶e̶ a failed experiment. That kind of error-reporting-via-side-effect not only is incredibly error prone, it is also problematic for optimizations.


You might want to familiarize yourself with how errno works.

It is not used for reporting. Rather, various functions in C and POSIX return an indication that some exceptional situation has occurred. Then errno holds a value which classifies the cause of that situation. It doesn't report the error.

(In the case of portable ISO C, we can't even depend on that; we must set errno to zero before calling most library functions. If that function fails, then if errno is nonzero, it has a value pertaining to that function.)

Some newer functions in POSIX return the errno value, like pthread_mutex_lock. That makes sense when the return value has no other purpose like "bytes processed" or whatever. The errno values are positive, so they combine poorly with in-band signaling.

Anyway, optimizing is possible based on the return value: like putting the expected success case into the hot code path or whatever.
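The "set errno to zero first" rule matters most for functions like strtol, whose return values are ambiguous on their own: LONG_MAX is both a legitimate result and the overflow result. A sketch of the usual ISO C pattern (parse_long is an illustrative helper name):

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse a whole string as a base-10 long. Returns false on empty
   input, trailing junk, or out-of-range values. */
static bool parse_long(const char *s, long *out)
{
    char *end;
    errno = 0;                     /* strtol only sets errno on error */
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0')  /* no digits, or trailing characters */
        return false;
    if (errno == ERANGE)           /* value does not fit in long */
        return false;
    *out = v;
    return true;
}
```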


Thank you, I'm aware how errno works. C has been able to return structs for the last 30 years, so returning a pair of value+error code is no big issue even in C.

Optimizing errno is so easy that many compilers by default do not even update errno for many math functions, because doing so hopelessly destroys any chance of vectorization and aggressive optimization.


Your idea would seriously uglify code. If we want the function's result and the error, we have to capture the return value in an object. If we just do

  if (library_function(arg).result < 0) ..
we have lost the errno. For the complete analysis, we must now do:

  struct error error;

  error = library_function(arg);

  // test error.result; if bad, work with error.num
Yes, we have had struct returns (proper, on-stack ones) for more than 30 years, which means we could have had APIs like this for 30 years.

I agree that errno for math functions isn't such a great fit. It's much better suited for I/O and resource management.


To be fair, the issue of ergonomics is an artifact of C only. In a language with proper sum types (or as I like to think of them, "better enums"), the pattern would look more like this:

    match library_function(arg) {
      Ok(num) => // success
      Err(errno) => // failure
    }
But since C doesn't have proper sum types, we have to approximate this pattern with things like errno or returned structs.


Yes, the syntax is not great, but it is not really worse than errno. Especially because you can initialize the error object at the point of declaration and, as you pointed out, you would otherwise need to zero errno:

  struct error res = library_function(arg);
  if (res.result < 0)
  {
     // use res.num (a member can't be called errno, since errno is a macro)
     ...
  }
Also remember the whole thread is about improving the syntax for this sort of stuff. C could take (another) page from C++:

  auto [result, err] = library_function(arg);
  if (result == -1)
  {
     // use err ...
  }


errno is thread-local.

It's just a macro for a function like:

  #define errno (*__get_errno_location())
This is equivalent to having getter-setter functions, like in Microsoft's Win32 API, which has GetLastError and SetLastError.
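A self-contained sketch of that mechanism using C11 _Thread_local: per-thread storage, a function that returns its address, and a macro so it reads like a plain variable. my_errno and my_errno_location are hypothetical; real libcs use their own internal names.

```c
/* Each thread gets its own copy of this object. */
static _Thread_local int my_errno_value;

static int *my_errno_location(void)
{
    return &my_errno_value;
}

/* Reads and writes go through the current thread's storage. */
#define my_errno (*my_errno_location())
```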


And it's terrible for performance, so much so that it can't be used everywhere errors may need to be returned.


>Add a type aware print utility with format specification syntax similar to python

Does anyone have more information about this proposal?
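No details on the proposal itself here, but for comparison, C11 _Generic already gets you part of the way: the format specifier can be chosen from the static type of the argument. A sketch with hypothetical names (FMT, fmt_int and friends). It only picks a specifier per type and does not attempt Python-style `{}` placeholders.

```c
#include <stdio.h>

static int fmt_int(char *buf, size_t n, int v)         { return snprintf(buf, n, "%d", v); }
static int fmt_long(char *buf, size_t n, long v)       { return snprintf(buf, n, "%ld", v); }
static int fmt_double(char *buf, size_t n, double v)   { return snprintf(buf, n, "%g", v); }
static int fmt_str(char *buf, size_t n, const char *v) { return snprintf(buf, n, "%s", v); }

/* Select a formatter from the static type of x, then call it.
   Returns snprintf's result (the length of the formatted text). */
#define FMT(buf, n, x) _Generic((x),     \
        int:          fmt_int,           \
        long:         fmt_long,          \
        float:        fmt_double,        \
        double:       fmt_double,        \
        char *:       fmt_str,           \
        const char *: fmt_str)((buf), (n), (x))
```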


Will anything be done about this problem? https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f02...


This reads like a problem with the POSIX libraries and not with the C language itself, so it's probably not up to the language to fix it.

C is actually in a pretty nice position already, with its default 8-bit char data type (which is perfect for UTF-8).

I agree, however, that the POSIX standard is in serious need of an overhaul of its locale features, and you're probably better off ignoring it completely and just going with ICU these days anyway. Then, though, you have to go the extra length of still using the OS-provided means (the LC_* environment under Unix) to get the user's preferred locale.

But once you have that, yeah, ICU is probably the way to go for actual string formatting.
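For the "get the user's preferred locale" part, the standard entry point is setlocale with an empty string, which consults LC_ALL, the LC_* categories and LANG. One common workaround for number-formatting surprises is to then pin LC_NUMERIC back to "C". The sketch below shows that pattern; it is not necessarily what mpv does, and init_locale is an illustrative name.

```c
#include <locale.h>
#include <stdio.h>

/* Adopt the user's locale from the environment, but keep numeric
   formatting (printf/strtod decimal point) in the "C" locale. */
static void init_locale(void)
{
    if (setlocale(LC_ALL, "") == NULL)  /* environment names an unknown locale */
        setlocale(LC_ALL, "C");
    setlocale(LC_NUMERIC, "C");
}
```

Since setlocale changes process-wide state, it is best called once at startup, before other threads exist.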


Locales and the most basic functions around them are defined in the C ISO standards. (e.g. section 7.11 of http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf)


I hope they make real progress on improving the C language, not just slap on yet another library/header to the standard lib (cough.. <complex.h> ..cough).


C is a good language, but why does the standard library need to be in the language standard? It's perfectly possible for there to be competing modern libraries which implement everything from the stdlib and can be built completely from pure C source. Suppose I build libvortico, which has proper string handling, threading, filesystem, etc. There would then be no need to deal with stdlib quirks. Just use that as an alternative.


You're perfectly welcome to build your own library and put whatever you want in it.

The reason there is a "standard library" is to make portability possible.


But portability has nothing to do with whether the library is standardized. One could take any API, whether from the standard or from a simpleton like me, and port it in pure C and OS calls to any environment. One could even write a better API that wraps the stdlib calls, to support at least what the standard offers.


Your argument goes through just as well for the base language itself. What's the point of having a C standard at all? Since you could just port your compiler to any platform...


The point of a language standard is to have multiple compilers and to be able to write code to some standard. The difference between these points is that with a "stdlib" alternative library, you can still have multiple compilers and write library function calls without following the standard, but simply the library's API. The reason I believe the stdlib shouldn't be a standard is for the same reason that libpng, libjpg, libzip, etc. aren't part of the C standard. It should be the software vendor's choice of which standard library to use.


> char8_t: A type for UTF-8 characters and strings, see N2231

What does this mean? C11 already brought the u8 type.


char8_t is the type; u8 is just a prefix for constants, like u8"xxx". And it's only in C11++, not C11, AFAIK.


You are right, this is only for string literals (so read-only). But Wikipedia's C11 article mentions the feature; what do you mean by C11++? By the way, C17 added no new features.


u8 exists for string constants, but not character constants.

They're proposing to add u8 as a character constant prefix.


Next revision: Please stop kowtowing to recalcitrant compiler vendors too lazy to invest in updating their code.


What about concurrency?


There was a lot of work done on that in the C11 release, with updates to the memory model and threads.h. I have heard some criticisms of the threads.h APIs which could be addressed, such as the lack of a way to specify the stack size.

There are C++2x proposals to add a green thread/goroutine style mechanism. I'm not convinced that it's something which belongs in the C core language or C standard library though.
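For readers who haven't used the C11 additions, here is a minimal threads.h plus stdatomic.h sketch. It assumes the C library ships threads.h (glibc does from 2.28; some platforms don't). Note that thrd_create has no stack-size parameter, which is the criticism mentioned above. worker and run_workers are illustrative names.

```c
#include <stdatomic.h>
#include <threads.h>

static atomic_int counter;

/* Each worker adds iters to the shared atomic counter. */
static int worker(void *arg)
{
    int iters = *(const int *)arg;
    for (int i = 0; i < iters; i++)
        atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    return 0;
}

/* Run nthreads workers (at most 8). Returns the final count, or -1 if
   nthreads is out of range or a thread could not be started. */
static int run_workers(int nthreads, int iters)
{
    thrd_t threads[8];
    int started = 0;

    if (nthreads < 0 || nthreads > 8)
        return -1;
    atomic_store(&counter, 0);
    while (started < nthreads &&
           thrd_create(&threads[started], worker, &iters) == thrd_success)
        started++;
    for (int i = 0; i < started; i++)
        thrd_join(threads[i], NULL);   /* wait for each worker to finish */
    return started == nthreads ? atomic_load(&counter) : -1;
}
```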


C is older than me, but it still has so many things to improve. And I am very glad that it is still being actively improved.


Previous attempts at doing this have introduced unnecessary ugliness to the language, such as the possibility to mix declarations with code. I can only see the language getting worse by introducing new pointless features. Unnecessarily bloated C already exists, and it is called C++.


By "mix declarations with code", do you mean the ability to put local variables like `int x;` in locations other than immediately at the start of a function? I don't understand why this is a bad thing.


(In C you can have local variables inside any block, e.g. 'if' or 'for'.) It's subjective, of course, but I find it much clearer to have a dedicated place to look for variables.
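To make the two styles being argued about concrete, here is the same loop written with C89-style declarations at the top of the block and with a C99 declaration in the for clause. In the C99 version, i is scoped to the loop alone. The function names are illustrative.

```c
/* C89 style: all declarations at the start of the block. */
static int sum_c89(const int *a, int n)
{
    int i;
    int total = 0;

    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* C99 style: i is declared in the for clause and exists only there. */
static int sum_c99(const int *a, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}
```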


This, and especially in the first clause of 'for'. Not backward compatible, very inelegant, and luckily discouraged in many projects' coding guidelines.


Actually it is block, not function. It makes it very difficult to read the code if you are looking for the scope of the variables, or investigating stack usage.



