This just goes to show how hard it is to create an API to last 30 years. So much has changed since then, but libc needs to keep the same API. Things like Unicode and the internet (which brought the need for security to the forefront) have come into popularity since then, but you can't fix the old functions.
So one should focus on creating versioned APIs, IMO. Nobody does that for some reason. I should be able to declare that my program uses the v2023 API and still consume libraries which use the v1975 API.
No, one should focus on creating smaller independent modules and not mix unrelated stuff like networking fundamentals with string convenience functions.
Are you familiar with linker hell? I agree in theory with your suggestion, but not for C or C-like languages which have such a poor story for linking compilation units.
That doesn't work in languages with nominal types, and even structurally-typed languages can have a hard time.
You call a function that returns a v1975/SomeStruct and pass it to a function expecting a v2022/SomeStruct. What does the new function do when some of the struct fields are set to invalid values?
What if the type that got passed was a "Some1975Struct"? Then you'd get a big, blaring error, because it's not the same type as a "Some2022Struct".
The naming convention isn't important. If the structs are from different versions of the library then you either have a type error (nominal types) or have to figure out the semantics of arbitrarily uninitialized fields.
I think you may have lost track of what this thread is discussing.
To be more concrete, let's say you have the following code (pseudo-Go):
package main
import "example.com/http_server" // v2
import "example.com/utils" // transitively depends on http_server v1
func main() {
var srv *http_server.Server = http_server.New()
utils.ServeHttp(srv)
}
You're going to have a problem, because ServeHttp expects an HTTP server struct from the v1 API, and you're passing in a struct from the v2 API.
If you tried to write an adapter for this then (1) you'd need to somehow construct a concrete proxy of a type you don't control, and (2) there's no way to import both the v1 and v2 APIs at the same time in your code, so you don't have a way to declare a conversion function.
That's why APIs with versions declared by dependency edges are impractical.
... where "build system" means a bunch of preprocessor macros, at which point you just use the oldest version of everything available and never update anything that works.
1. Every source file must be annotated with language version.
2. Every language version can remove things from previous language versions. More like "hide", I guess, but you can't compile a source file that uses removed features or APIs. So to migrate to a new version you're supposed to change your sources, preferably using some migration tools, but that's out of scope.
3. You can still access the old API indirectly, for example if you're using a library which uses the old API and returns a struct from it.
4. There should be some well-thought-out rules for the situation you describe, to make structs forward- and backward-compatible as much as possible. Maybe even some implicitly run migrations to convert between structs of different versions.
5. If there's no way to automatically convert a v1975/SomeStruct to a v2022/SomeStruct, then you can't do that implicitly and need to convert it manually.
This is a hard problem and must be thought through at every level: data layout compatibility, ABI compatibility, type system compatibility. But I'm not convinced that it's an unsolvable problem. And if solved, it would provide great benefit, allowing a lot of freedom and agility for language development.
It really doesn't, though. In principle Editions could provide an escape hatch, but using them that way would be so drastic as to be largely unthinkable.
Because of Rust's commitment to long-term stability, all of the standard library is there forever, even if it's a mistake and thus deprecated. std::u8::MAX will be in the library forever, even though you can write u8::MAX to get the same constant; name.trim_left_matches(remove) will exist forever even though you ought to write name.trim_start_matches(remove), because "left" assumes an LTR writing system.
If NonZeroI64 is a bad idea, too bad it's in the standard library forever. If AddAssign is a bad idea, too bad it's in the standard library forever. The language syntax is allowed to evolve via Editions, but the standard library never breaks backward compatibility.
Whereas C++ versions are each distinct languages that aren't necessarily interoperable, Rust Editions promise to interoperate, and this is used all over the place. As a result, Editions can only "really" touch syntax (how you, in some sense, spell Rust), not what it means.
Essentially, Rust 2015 Edition and Rust 2021 Edition are exactly the same language except that the spelling is different, and as a result there are some things you can spell in one that you can't spell in the other. In Rust 2015 Edition I can name my function async, because I want to; in Rust 2021 Edition that's fine too, but it's written "r#async". Same function, same name in a sense, but different spelling. On the other hand, if I have an actual async function in Rust 2021 Edition, I can't write it in the 2015 Edition at all: there's no way to spell the async keyword in that edition.
The library however, is mostly semantics, not syntax. We care about what it does, not how to spell things on the whole.
But the point is that there's no reason why editions couldn't touch semantics, too.
There's no reason why they couldn't add a new attribute
#[available_in_editions(2018, 2021)]
that could allow them to actually remove deprecated functions, structs, traits, macros, whatever in newer editions. Code written for edition X would still compile and work, but code written for edition Y wouldn't be able to use things marked as unavailable.
So, with this hypothetical attribute, the thing isn't gone, but it doesn't compile any more, whereas with the current situation the thing also isn't gone, and you get a warning (unless you told the compiler to forbid this rather than warning, in which case it doesn't compile).
This is identical to a [really_deprecated] attribute. We know it's deprecated but kelnos wants to force us to use a separate crate to use it for some reason.
Hey, don't get me started on semantics... what if you trim from the left or right, but are using RTL. Imagine your surprise when the exact opposite happens.
Well, that and also a showcase of just how BAD the design of the C stdlib string functions was, and continued to be with every iteration where someone introduced a new bad-and-poorly-thought-out API to replace the old bad-and-poorly-thought-out API.
It still boggles the mind how a function that leaves a C string potentially unterminated ever even made it into the standard...
> Easy to say with 50 years of hindsight. These are the pioneers we are talking about here.
Eh, well kinda. Languages with decent string handling predate C, so it's not like there wasn't precedent to follow. The creators of C were pioneers, but they were also people who favoured a quick, hacky approach over a clean careful one. That has certain advantages, but it's rather unfortunate that C has become so foundational and we've been stuck with those hacks.
What would those languages be? I'm only used to relatively modern languages with "decent string handling", but they almost always treat strings as opaque high level objects with dynamic memory allocation under the hood. Such an approach wouldn't exactly fit into the C philosophy.
Also, once Unicode is added to the mix (which involves a lot more than just the text encoding), a decent string processing library isn't exactly trivial: it either needs a very big chunk of the stdlib, or a handful of specialized 3rd party libs.
Even in Zig, which has a very decent modern low-level approach for handling string data, people used to high level string objects would probably be shocked at how 'inconvenient' it is to work with string data (which can be solved with specialized libraries though).
Not in the way that is described here. I took "always treat strings as opaque high level objects with dynamic memory allocation under the hood" to mean that the string would have operations, like append, that would reallocate/resize the string if needed. C strings are definitely not that.
While strcpy is forgivable considering it came from the primordial soup, there is no excuse for strncpy since it was designed to solve the problems with strcpy. And the string library is full of gaffes like this.
No, strncpy is not "designed to solve the problems".
strncpy is a function for filling out fixed-size data structures, such as some on-disk data structures: given a variable-length string, it right-pads the structure with null bytes.
It does exactly what it was supposed to do, something that was pretty useful in 1970s UNIX programming but rarely if ever what you need today.
The "needs to" is debatable. Compared to the C++ stdlib, the C stdlib is so small that adding a modernized and incompatible "v2" next to "v1" is realistic. The effort could start as a 3rd party implementation similar to MUSL.
The old headers with the old APIs would still exist for "legacy code" but would generate "deprecated" warnings.
Once that new "3rd party stdlib" has proven itself in the real world, the C committee might consider it for inclusion in the standard.
No, at the very least I would add a (reserved) stdc_ prefix to all stdlib functions (and defines, and header filenames...), and keep options open for API versioning (e.g. stdc2_...).
That way we could also get rid of all the random reserved identifiers we need to (theoretically) adhere to now.
> As with volatile, C is using the type system to indirectly achieve a goal. Types are not atomic, loads and stores are atomic
I don't get this argument. The same wording could be used to make pointers untyped: "pointers are not inherently typed; loads and stores are typed" (in fact, LLVM IR recently made pointers untyped too). And yet there's clearly value in having them typed at the language level. Similarly here, there's clear value in making atomicity part of the type system, if only to prevent the footgun of someone forgetting to mark a load as atomic/volatile.
These type qualifiers cause the opposite footgun. The programmer confidently writes compound assignments, which look like they're atomic/volatile thanks to the type qualifier, but of course they are not.
With intrinsics you don't have that problem: you just can't write the compound assignment, because it doesn't exist, whereas in C it is quietly compiled to two separate operations.
C++20 deprecated this nonsense, but at almost the last opportunity WG21 voted to un-deprecate it for C++23.
I'm not a big fan of restarting the discussion from the big Reddit argument on the same topic[1], but as I understand it: from embedded (or at least some developers') POV, this is a non-concern. Volatile never implied atomic in regards to interrupts, and pretending it should is wrong. On some platforms, single volatile load/store (whether `*ptr = 1234`, or `volatile_store(ptr, 1234)`) already can compile to "two separate operations" that can be interrupted in the middle. With that in mind, you are already supposed to be aware of that and execute such operations in no-interrupt contexts, and having this also apply to compound assignments is no more of a footgun than any other operation on such memory.
(and if you do not need to care about the above for your platform/use-case, then I don't see why you would care about whether compound assignment compiles to one, two or more discrete opcodes)
Not to mention, `REG |= 0x4` is just too entrenched (and seen as idiomatic) on some platforms.
> With that in mind, you are already supposed to be aware of that and execute such operations in no-interrupt contexts
It's certainly possible embedded C programmers have told you this, but, imagine if this was actually true - you mustn't touch these MMIO registers unless interrupts are disabled. But, wait, how do we turn off the interrupts? That's an MMIO write, which supposedly we mustn't do until the interrupts are switched off...
The paper mentioned in that Reddit post isn't actually what ended up happening at Kona, by the way, though I assume the Redditor didn't know that. The paper's authors weren't able to produce any evidence at all that this is used correctly in practice (e.g. a survey of 100 microcontroller C++ projects which use compound assignment showing that yup, no correctness bugs here), and they could only explain how it might be used correctly for some bit ops, so their paper just un-deprecates the bit ops. As a result x /= 23; would have remained deprecated on volatile x, a small piece of sanity.
After all your micro-controller might (some do, some don't) have a single CPU instruction which atomically clears bit six of I/O register 0x3B but it's fair to say it definitely doesn't have a CPU instruction which somehow atomically divides that register by 23. Because nobody needs that.
However, at Kona WG21 voted (though by the smallest margin at the event) to undo the whole deprecation. So you can write x /= 23 in C++ 23 without even a warning that this isn't sane.
At Kona the committee also was very exercised about EU and US agencies pointing out that writing more C or C++ is a terrible idea because these languages are unsafe. Surely, several prominent WG21 members harrumphed, it's wrong to treat C++ the same as C on this issue. And yet, on this relatively trivial issue of volatile compound assignment, keeping C++ consistent with obsolete C that might not even exist was considered to trump safety considerations at the same meeting, with the same people.
As to REG |= 0x4 what you probably should look for are actual intrinsics for your platform, which do only and exactly what the platform can actually implement, rather than offering the "Eh, we'll just muddle along and maybe it'll work" approach these C SDKs have today. This is less error-prone, and can often be more efficient.
It's also very wrong for volatile: the compiler has to handle a volatile type differently from regular types: you cannot 'cache' the value in a register, and memory must be updated on every store.
I understand that wm4 was probably a pretty difficult person to work with, but he got stuff done well and from the safe distance of my chair he was a rather entertaining fellow. The other mpv devs can't miss him but I kinda do.
When Go showed up on the scene, I was just about losing my patience with libc, and one of the things I found miraculous about Go was "no more strtok and friends".
> If you would like to see interesting innovation, check out what Cosmopolitan Libc is up to. It’s what I imagine C could be if it continued evolving along practical dimensions.
I found that sentence weird because when I checked Cosmopolitan, it included most if not all of the functions this article was claiming shouldn't be used.
It's not as strange as you might think. Back in the classic Macintosh era (pre-OS X), it was quite common to write programs in C which essentially ignored libc. It was not a Unix system, so C's filesystem API made for an awkward fit, and the system's native callback-based IO was more efficient anyway. For similar reasons Mac programmers had little use for C's string functions or its allocator. Nor was there any terminal, so printf had little value.
I never spent any time in the early DOS/Windows world, but from what I heard similar coding patterns were found there.
I learned C on the Amiga, and similar patterns were often followed. Example: we generally used the platform-specific AllocMem instead of malloc, even though a mostly complete C library implementation was available. One reason was AllocMem gave you finer control of what sort of memory was allocated (Amiga had "chip" memory, "fast" memory, and other oddities I now forget.)
For many things there are much better versions of the functions available in the operating system. For example on macOS you really should be using Core Foundation for strings, dates, etc.
In a cross-platform application you can create a platform layer that wraps OS API usage and provides a POSIX fallback for the platforms you don't directly implement.
> Without libc you don't have to use this global, hopefully thread-local, pseudo-variable. Good riddance. Return your errors, and use a struct if necessary.
On the contrary, you should absolutely use errno, if only to report your IO errors.
Or maybe he never does IO. After all, a program that doesn't output anything is likely safer, since those are all evil side effects. I can just see that new programming paradigm, "side-effect-free programming", taking the world by storm.
Just based on that gaffe I'm not sure I can take anything else in that post seriously. It's clearly not based on any practical experience.
It's not a gaffe, it's moving C in a direction that's safer and more like modern languages by removing mutable global state. That's what this means: "Return your errors, and use a struct if necessary."
If you don't understand why errno is bad then it'll be hard to explain it.
Errno is fine if you're accessing it right after a stdlib function that sets it, because it's the equivalent of a function's error code. Once you get out of that context errno becomes less and less useful and probably shouldn't be used, because without that context you won't know who/what actually set it.
“Previously both POSIX and X/Open documents were more restrictive than the ISO C standard in that they required errno to be defined as an external variable, whereas the ISO C standard required only that errno be defined as a modifiable lvalue with type int.”
Even now that it is guaranteed to be thread-local, it still is a bad API because of (same page)
“A program that uses errno for error checking should set it to 0 before a function call”
That library functions don't reset errno is considered a feature/convenience: it allows calling a bunch of them and then checking whether the entire sequence succeeded or failed.
Obviously this assumes the failure of one function does not trigger UB down the line, and that you care about general but not specific failure. And also obviously, this is easy to replicate with an API which returns error objects/codes.
IIRC this is most(ly) convenient in graphics APIs, I don’t know how common leveraging this in the libc actually is.
It also requires code to explicitly set errno many, many times. I think there are many errors lurking in C code there, with code calling foo, not checking errno or checking it but not resetting it, later calling bar, and assuming errno was set by bar.
> It also requires code to explicitly set errno many, many times.
Yes, errno should be cleared every time you want to check it. Although you can check it multiple times with a single clear, as long as a set errno leads to an exit.
A more likely cause of the issue you outline is that errno is not context-local, though: if you call something which causes errno to be set, you might get an errno from an unexpected error. This is similar to overly broad exception contexts, but much harder to diagnose.
You won't find that on either ISO C nor POSIX specifications.
It only happens to be implemented that way in some environments, as a macro expanding to either a thread-local variable or some function call that retrieves the right errno.
Still, even that doesn't take care of signal handling.
That's a total strawman. Errno has nothing to do with signal handling at all. While errno is archaic, it is a completely sound design.
That you need to be very careful in signal handlers is common knowledge, and obvious from what they do; signal handlers, as a first approximation, behave as if executed on a separate thread, but it's even worse than that, since they hijack a running thread and thus block the hijacked thread from executing the previously running code. (So you could say they're like fibers / cooperative multitasking?)
The bottom line is, you can't touch errno in your signal handler, obviously -- just like you can't printf() or any other thing that has "critical sections". And since you don't do that, you're safe.
BTW2: I noticed on that page that errno is explicitly mentioned, but obviously only as an example of modifying thread-local data in a signal handler. As the docs point out, any such modified data must be restored before leaving the signal handler, so as not to break code running on that thread. So in that sense, thread-local variables like errno are "safer" than functions like printf() that take locks: you _can_ touch them in a signal handler, but you need to restore them.
So, what does matter? What is your actual statement? That you can't use errno in a multi-threaded environment? That you can't use errno in the presence of signal handlers?
What exactly is your complaint about the GNU link? As far as I can see it more or less regurgitates what POSIX has to say about async-signal-safety.
You made some vague statements that a mechanism used on billions of devices is fundamentally unsound, while providing no evidence other than maintaining that it would break in conjunction with signal handlers, when signal handlers are expressly to be used with caution, no matter what you intend to touch.
If you worry that you can't safely touch errno in a signal handler (provided that you reset it) on some broken platform from decades ago, because a spec like this [0] is too much to swallow (understandably so, since that spec is a futile attempt to reconcile all the relevant history into a useful document), then I have a solution: don't touch errno in a signal handler, which is probably solid practical advice anyway.
Oh, and how do you reconcile the fact that many syscalls which may themselves set errno are declared async-signal-safe by POSIX?
"Operations which obtain the value of errno and operations which assign a value to errno shall be async-signal-safe, provided that the signal-catching function saves the value of errno upon entry and restores it before it returns."
Where "shall" is usually understood as equivalent to "must". Are you going to cite one reliable source for the adventurous claims you keep making, or are you just going to continue putting out blatant falsehoods and strawmen?
> As for soundness, it is C we are talking about here.
Well the only thing we're doing here is trolling, nothing of substance has been put on the plate so far.
Still doesn't cover signals, only works if C11 threads are being used (it's an open question whether OS threads follow the same TLS mechanism), and those C89 and C99 code bases get nothing from it anyway.
You can call any system function that is marked as async-signal-safe. Whether that's a great idea is a different question.
> Don't expect a compiler error, this is C, where the programmer knows best.
As for this snark, these considerations are pretty much OS and concurrency issues that transcend any implementation language [0]. No one prevents you from switching back to single-threaded, cushy, OOP-y JavaScript; just note that someone has to implement and run that for you.
[0] yes, signals aren't beautiful, and in 2023 a beautifully designed OS might be better off delivering asynchronous messages using event handling and/or dedicated threads. However, that's computing history for you, and it can be worked with. If you don't like it, find a different platform. C doesn't really require signals, and if you use Unix with a different language, signals won't just go away.
The main issues with errno are that you need to remember to reset it before you enter a context whose failability you care about, so any code which relies on errno must be:
errno = 0;
// code which may error here
if (errno) { … }
And that errno is notably not scoped, so the code which may error should only be composed of calls to libc, or code whose interaction with libc you understand perfectly, otherwise you need to save and restore errno around uncontrolled calls.
> The main issues with errno are you need to remember to reset it before you enter a context whose failability you care about, so any code which relies on errno must be:
> errno = 0;
> // code which may error here
> if (errno) { … }
Which are the functions that set errno, and only errno, on failure?
In practice, you will clear errno once, and then repeatedly check for failure after every call to a libc function.
> And that errno is notably not scoped, so the code which may error should only be composed of calls to libc, or code whose interaction with libc you understand perfectly, otherwise you need to save and restore errno around uncontrolled calls.
Consider you are trying to do a foreign function call from an interpreted language. You make the call and then want to check errno to see what the error was, but you don't know what precise C calls the runtime may have made in the meantime, or whether it might have reset errno. The only reliable thing to do is to mark library functions explicitly as modifying errno, and storing that somewhere else so that you can reliably retrieve that value.
It's a pain, and if it's not done correctly by everyone then it leaves you with subtle intermittent bugs.
> Lots of APIs allow reporting of I/O errors without use of an errno-like construct.
libc included: the C11 threads API returns some errors directly, using the constants thrd_busy, thrd_nomem, and thrd_timedout. Unfortunately, it also uses a catch-all constant, thrd_error, for other errors.
The pthreads API returns errno values directly as return values, but that's POSIX, not C.