> Any C alternative will be expected to be on par with C in performance. The problem is that C has practically no checks, so any safety checks put into the competing language will have a runtime cost, which is often unacceptable. This leads to a strategy of only having checks in "safe" mode, where the "fast" mode is just as "unsafe" as C.
I don't think this is true, in the general case: Rust has shown that languages can be safe in ways that improve runtime performance.
In particular, languages like Rust allow programmers to express stronger compile-time constraints on runtime behavior, meaning that the compiler can safely omit bounds and other checks that an ordinary C program would require for safety. Similarly, Rust's (lack of) mutable aliasing opens up entire classes of optimizations that are extremely difficult on C programs (to the extent that Rust regularly exposes bugs in LLVM's alias analysis, due to a lack of exercise on C/C++ inputs).
Edit: Other examples include ergonomic static dispatch (Rust makes things like `foo: impl Trait` look dynamic, but they're really static under the hood) and the entire notion of a "zero-cost abstraction" (Rust's abstractions are no worse than their "as if" equivalent, meaning that the programmer is restricted in their ability to create suboptimal implementations).
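To make the static-dispatch point concrete, here's a minimal sketch (the function and names are made up for illustration): `impl Trait` in argument position reads like it could be dynamic, but the compiler stamps out a separate copy per concrete type, so the calls resolve statically and can be inlined.

    use std::fmt::Display;

    // Looks like it could be dynamic dispatch, but each call site below gets
    // its own monomorphized instance of this function.
    fn print_twice(x: impl Display) {
        println!("{x} {x}");
    }

    fn main() {
        print_twice(42);      // instance for i32
        print_twice("hello"); // instance for &str
    }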
Not to mention that both C++ and Rust can specialise algorithms and containers for specific types, whereas in C most developers resort to void* and function pointers. It's not unusual to see C programs written in a "typical" C style become dramatically faster when rewritten in a more modern language.
For example, typical C programs also don't use hashtables even when this makes the most sense, causing weird performance cliffs due to O(n^2) algorithms all over the place. Why not hashtables? Because they're not generic, so they're a pain to use. Not impossible of course, it's just that C developers avoid them.
Similarly, "strongly typed" containers full of simple struct types enable compiler auto-vectorisation that's often unavailable in C for the same kind of reason.
Last but not least, you would have to be a masochist to write heavily multi-threaded code in C... so hardly anybody does. These days, that's throwing away over 90% of the computer power in a typical PC, let alone a server.
> It's not unusual to see C programs written in a "typical" C style become dramatically faster when rewritten in a more modern language.
This was equally true back when C vs Fortran was the big debate, and something not easily captured in benchmarks. C, as written by an expert in high performance C, was equally fast as Fortran written by an expert in high performance Fortran. C, as written by a domain expert with limited programming skills, was often very much slower than Fortran written by a domain expert with limited programming skills.
This actually reminds me a bit of an old competition between two Microsoft MVPs comparing C++ and C#, where they went back and forth optimizing their respective versions of a model program, and discussing the optimizations they made.
The gist, as I recall it, was: the initial, idiomatic, written-for-maintainability version of the C# program was significantly faster than the C++ equivalent. Up until the end, the C# version also generally needed to go through fewer heroics to keep up with the C++ version. Eventually, the final C++ version did end up being faster than the fastest C# one, but, considering what needed to be done to get there, it was a decidedly Pyrrhic victory.
One huge mitigating factor, though, is that the model program was doing something business-y. I doubt C++ would have had such a hard time of it if it had been a number crunching or systems program.
So, one of the things they discovered as part of the back and forth was that C#'s generational garbage collector was actually an advantage, because it made finding memory for a new object allocation O(1), while for C++ it was O(N).
That observation was actually key to the C++ version ultimately producing the fastest version. Chen replaced malloc() with an implementation that was tailored to the problem in question.
I guess the thing that I always find lacking in these discussions is a cost/benefit analysis. Yes, C++ will let you do things like that, and they will absolutely allow you to wring every last drop of performance out of what you're doing.
But, if you aren't in a situation where optimizing to that extent is cost-effective, and you're working in a business domain where frequent heap allocation of short-lived objects is what you do, so that idiomatic, standard C++'s default way of doing things is known to generally be not significantly better, and often slower, than some of the faster GC languages, then it's just possible that you should go for the pragmatic option.
Precisely. This is perhaps the strangest part of the original post: C++ has the same performance advantages as Rust! It has them not because it's more safe (although it is, in some regards), but because it allows programmers to express behaviors that the compiler can reason about statically.
Rust's assumption is that it's the compiler's job to reject all wrong programs (usually with a helpful diagnostic). In C++ the assumption is that it's the compiler's job to permit all correct programs.
You obviously ideally want both, but that's not actually possible when you have a language this powerful. So, Rust's choice means sometimes (more rarely these days, but it can happen) you will write a program that is correct, but the compiler doesn't believe you and rejects it. You will need to alter it; perhaps after the alterations it's actually nicer, but equally perhaps you feel this made it uglier or slower. Nevertheless you have no choice in Rust (well, you could try waiting a few years, the compiler gets smarter).
However the C++ choice means sometimes (maybe even often) you will write a program that isn't correct and the compiler gives you no indication whatsoever that there's a problem, you get an executable or object file or whatever out, but what it does is completely arbitrary. Maybe it works how you expected... until it doesn't.
The magic phrase in the C++ standard is "Ill-formed, no diagnostic required". For example, suppose you try to sort some floats in C++20. That's ill-formed (floats aren't in fact Totally Ordered, but the function signature says you promise they are) and no diagnostic is required for... whatever it is your program now does. Maybe it crashes, maybe it works fine; not their problem, good luck with that.
Now, probably if all your floats are like boring normal finite reals like -2.5 or something this will work fine, there's no practical reason it wouldn't, but who knows, the C++ language denies all responsibility. So it gets to be very "optimal" here since it can do whatever it wants and it's your fault.
To expand on your float sorting example, sorting a slice[1] in Rust requires the element type to implement the Ord trait, i.e. be totally ordered. Trying to sort a slice of floats will result in a compiler error, even though it might be totally fine as long as all your floats are "ordinary".
Instead, to sort a slice of floats, you have to explicitly specify what would happen for the non-ordinary cases; e.g. by using `.sort_by(f32::total_cmp)`, where f32::total_cmp()[2] is one possible interpretation of a total ordering of floats. This requires writing more code even for cases where it would be completely unnecessary.
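A minimal sketch of both cases (assuming a toolchain where f32::total_cmp is stable):

    fn main() {
        let mut xs = vec![2.5_f32, f32::NAN, -1.0, 0.0];
        // xs.sort(); // rejected at compile time: `Ord` is not implemented for `f32`
        xs.sort_by(f32::total_cmp); // compiles: NaN is given an explicit place in the order
        println!("{:?}", xs); // [-1.0, 0.0, 2.5, NaN]
    }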
So rather than introducing a hard-to-detect bug (with NaN, Inf, -Inf), Rust makes me think about it and not just let whoever worked on the compiler decide.
How is this a negative? I'd rather a program fail at compile time than at runtime, and rather it fail loudly than quietly.
Also, Rust doesn't prevent you from making an optimal ordering; it's just a tinge more verbose.
I also like this priority in Rust, which constantly makes me wonder why the developers allowed shadowing. It has already caused runtime bugs for me while the compiler didn't even throw a warning about it, and as Rust is otherwise so strict about making possible mistakes like this explicit it's definitely not the first cause I consider when debugging.
While I think shadowing is great for code readability and I've never encountered a bug caused by it, you can always make sure clippy doesn't let you do it by putting a `#![deny(clippy::shadow_reuse, clippy::shadow_same, clippy::shadow_unrelated)]` at the top level of your crate.
Like proto, I've never had this happen, even though I was initially sceptical until I found myself writing stuff like this (the real examples are more complicated, hence the decision to break them down):
let geese = something(lots_of_birds).bunch().of_chained().functions();
let geese = geese.somehow().just().count_them(); // We don't actually need geese, just #
Could you name that first variable something else? Yeah. But, it's geese, it's not the number of geese, it's a different type, but it is just geese, that's the right name for it. OK, maybe rename the second variable? But number_of_geese is a stupid variable name, I would push back on a patch which tried to name a variable that because it's stupid. n_geese isn't stupid, but it is ugly and Rust is OK with me just naming it geese, so, geese it is.
However, if you do run into trouble those Clippy rules can save you. You probably will find you don't want them all (or perhaps any of them) at deny, but Rust is content for you to decide you only want a warning (which you can then suppress where appropriate) and importantly these are three rules, you might well decide you only hate shadow_same or shadow_reuse or something. Here's the link specifically for shadow_reuse as an example:
Safety features. The committee are, perhaps unconsciously, biased against safety on the presumption (seen in many comments here on HN) that safer has to mean lower performance.
But part of the impetus for Carbon is that WG21 (the C++ Standards Committee) rejected proposals that C++ should focus on better performance and safety. So maybe performance is no longer important either. What's left?
Where they've taken things which might appear on the surface to be modelled on a safer Rust feature, the committee usually insists they be made unsafe. For example, suppose I call a Rust function which might return a char or might not: it returns Option<char>, and if I'm an idiot and try to treat that as a char, it doesn't type check because it isn't one; I need to say what I'm going to do when it isn't there, or else it won't compile.
You can write that in modern C++... except it can automatically try to take the char (which isn't there) out of the empty optional structure and that's Undefined Behaviour. So whereas the Rust prevents programmers from making easy mistakes, the C++ turns those into unexploded bombs throughout your code.
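A minimal sketch of that unexploded bomb (illustrative names only; the Rust equivalent simply refuses to compile until the None case is handled):

    #include <iostream>
    #include <optional>

    std::optional<char> maybe_char(bool have_one) {
        if (have_one) return 'x';
        return std::nullopt;
    }

    int main() {
        std::optional<char> c = maybe_char(false);
        std::cout << *c << '\n'; // compiles silently; dereferencing an empty optional is UB
        // c.value() would at least throw std::bad_optional_access instead.
    }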
Many on the C++ committee are interested in the borrow checking, but are not sure how to make it work in C++. The hard part is they cannot break compatibility with code that is legal with previous versions of C++. If there is even one pathological case where the borrow checker will reject code that doesn't have a memory leak then they will not accept it, and require whoever proposes this borrow checker to prove the absence of such a thing. (note if it rejects code that worked until the leaks mean you run out of memory they will accept that). I don't know if such a thing even exists, but if it does I'm confident that in Rust it is new code that you can write differently to avoid the bug, while with C++ that may be a very massive effort to figure out 25 year old code nobody understands anymore before you can rewrite it.
One obvious corner case: It is very common to allocate a buffer at startup and let the system clean it up when the program exits (often in embedded cases where the only way for the program to exit is power off). I don't know how you do this in Rust (if you can - I'm not a Rust expert).
> allocate a buffer at startup and let the system clean it up when the program exits
This is possible with the lazy_static library or (not yet in stable Rust) OnceCell. It allows you to allocate & initialize any data structure once at runtime and get global read-only access to it.
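For example, a minimal sketch using the lazy_static crate (the buffer name and size are made up; once_cell works much the same way):

    use lazy_static::lazy_static;

    lazy_static! {
        // Allocated on first access and never explicitly freed:
        // the OS reclaims it when the process exits.
        static ref BUFFER: Vec<u8> = vec![0u8; 64 * 1024 * 1024];
    }

    fn main() {
        println!("buffer holds {} bytes", BUFFER.len());
    }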
And C++ has the potential to be faster than C, mostly thanks to metaprogramming (templates, ...). It is horrible if you have to do it yourself, but if you are just using the standard library, you don't have to feel the pain and still take advantage of it. That's how the standard algorithms are implemented. Because so much is known at compile time, optimizers can do a lot.
The reason C++ is generally regarded as slower is that C++ programmers tend to create objects on the heap all the time because constructors and destructors make it easy. Modern C++ also discourages raw pointers, and so you get reference counters all over the place, essentially turning C++ into a garbage collected language. I am not saying it is bad, but it certainly impacts performance.
But if you manage your memory in C++ just as you do in C, keeping track of all your buffers, reusing them, and not using more than necessary, I can easily see C++ beating C.
> Modern C++ also discourages raw pointers, and so you get reference counters all over the place, essentially turning C++ into a garbage collected language.
This doesn't match my experience. It's true that modern C++ discourages owning raw pointers, but the solution is usually unique_ptr, not shared_ptr. Truly shared ownership is actually pretty uncommon IME, usually you can have one entity which obviously "owns" the object and then any other reference to it can be non-owning.
It's also worth noting that with std::move, actually changing the refcount of a shared_ptr can be pretty rare even if you do have shared ownership.
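A minimal sketch of that last point (names are illustrative):

    #include <memory>
    #include <utility>

    void store(std::shared_ptr<int> p) { (void)p; } // takes over one reference

    int main() {
        auto p = std::make_shared<int>(42);
        store(std::move(p)); // ownership handed over; no atomic refcount bump
        // store(p) would copy instead: an atomic increment here and a decrement in store().
    }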
This is not my experience. Most developers are just not very good at what they do, and the go-to smart pointer for not-very-good C++ developers is std::shared_ptr<T>.
This has been my experience as well - especially when C++11 came out. I have seen codebases where it has been "use std::shared_ptr for everything, because it is safer if/when we use threads". I know that doesn't make sense, but it just was the attitude back then.
Tbh, back then I didn't see a problem with it. Once I started chasing down weird bugs where objects weren't freed properly because no one knew which object owned what, I became very cautious.
Hmm, that might be. Most of the C++ I've seen has been in LLVM, Google projects, projects where I'm the only developer or projects where I laid the groundwork which other people build upon, so I'm probably not mainly looking at the kind of code bases you're talking about.
unique_ptr is pretty bad for performance as well. It is more complicated to use compared to raw pointers and encourages an OOP, object-per-object piecemeal code and data architecture. I've never seen a C++ program making use of unique_ptr that didn't give off a strong smell of enterprise programming.
There's nothing more complicated about using unique_ptr than a raw pointer, it just expresses who's responsible for calling `delete` explicitly in code rather than implicitly through program flow.
There's nothing complicated? You have to:
1) #include <memory>.
2) Write "std::unique_ptr<My_Foo_Type> foo" instead of just "My_Foo_Type *foo" in every definition.
3) Define My_Foo_Type as a class with a separate deleter, or provide a deleter template argument at each declaration.
4a) Write "foo.get()" in various places instead of just "foo", or 4b) lend around the unique_ptr in various places, breaking modularization and increasing build times.
5) Be stuck with a non-POD type that you can't just memcpy() around.
6) Enjoy the worse runtime because your program has just been artificially compartmentalized even more!
Sometimes you C++ guys are just blinded by the tale of "zero-cost abstractions".
unique_ptr, like the idea of RAII in general, binds together what should be separate. Data schemas and physical layout on the one hand, and memory and lifetime management on the other hand.
What you get as a result is what you deserve: The idea of "more safe and maintainable" where the "more" isn't added to the equivalent non-RAII program. No, it is added to the more convoluted, less understandable, and thus inherently less safe and maintainable program. Who knows what the bottom line is (in my experience often safety is a bit better but I pray for you if you need to debug a problem, and maintainability is much worse), but out of interest in my own sanity I know my preference.
I really don’t see what the big deal is? Generally the only time you should be returning or passing around a unique_ptr is when you’re actually transferring ownership of the referenced object. Otherwise just dereference it and pass around a reference to the underlying object.
I'm not following, what is Rust doing exactly? Coupling schema / layout with lifetime management? If that's what you mean I would like to disagree about the "great results" because of a gut feeling, and possibly the disagreement could in theory be justified with build times, or viewpoints on maintainability or whatever. But unfortunately I have no basis for doing so. I don't understand Rust well, and have very little experience, except failing to compile some projects and their 500 dependencies a couple of times...
Used correctly, std::unique_ptr<T> has no measurable impact on performance compared with the equivalent non-smart-pointer code. You use std::unique_ptr<T> to indicate ownership, and pass raw pointers around to indicate non-ownership. That approach has the strong smell of a good programmer using the right tool for the job, especially considering that the job is to communicate intent to the future reader.
It's like the classic argument against using exceptions: compared with the traditional C method of completely ignoring error conditions and not checking status, they're much slower.
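A minimal sketch of that ownership convention (the types and names are made up for illustration):

    #include <memory>
    #include <utility>

    struct Widget { int id = 0; };

    // Non-owning access: the caller keeps ownership, so a raw pointer (or reference) is fine.
    void inspect(const Widget* w) { (void)w; }

    // Ownership transfer: the signature itself says the callee takes over the lifetime.
    void adopt(std::unique_ptr<Widget> w) { (void)w; } // Widget destroyed when w goes out of scope

    int main() {
        auto w = std::make_unique<Widget>();
        inspect(w.get());    // lend it out without giving it up
        adopt(std::move(w)); // hand it over; w is now empty
    }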
> Used correctly, std::unique_ptr<T> has no measurable impact on performance compared with the equivalent non-smart-pointer code.
One wart of unique_ptr (and other smart pointers) is that it cannot be passed in a register when used as a function parameter, at least with the System V ABI used on Linux.
Also, the caller is responsible for destruction and there is no way to specify that a function always "consumes" a unique_ptr so the compiler cannot eliminate the destructor code: https://godbolt.org/z/sz79GoETv
Of course if the compiler can inline the call or at least controls both and can clone the function with a custom calling convention then that doesn't have to be a problem. But it still sucks that even something as seemingly simple as a tiny wrapper around a pointer does come with a cost.
That's the point. As a rule of thumb, fine-grained ownership is a very bad idea. It makes your program into a mess, which will be slow and make your program hard to understand. The slow part applies in any case, whether you have to suffer it in code (as you do with C) or not (as in many other languages that allow you to make even more of a mess).
As a C programmer, I try to avoid tracking ownership in separate struct member fields. I try to make central data structures that take care of the tracking. Cleaning up shouldn't happen pointer-by-pointer. Usually a much bigger context has a shared lifetime, so there is no point in splitting stuff up into individually tracked "objects". Instead you just track a bigger block of memory.
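A minimal sketch of that style, using a toy bump arena (all names invented; error handling and alignment omitted for brevity):

    #include <stdlib.h>
    #include <stddef.h>

    // One lifetime for a whole parse/request/frame: allocate piecemeal,
    // release everything with a single call instead of tracking each pointer.
    typedef struct { char *base; size_t used, cap; } Arena;

    static void *arena_alloc(Arena *a, size_t n) {
        if (a->used + n > a->cap) return NULL;
        void *p = a->base + a->used;
        a->used += n;
        return p;
    }

    int main(void) {
        Arena a = { malloc(1 << 20), 0, 1 << 20 };
        int *xs = arena_alloc(&a, 100 * sizeof(int)); // no individual ownership to track
        char *name = arena_alloc(&a, 32);
        (void)xs; (void)name;
        free(a.base); // one release for the whole context
        return 0;
    }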
> unique_ptr is pretty bad for performance as well.
Do you mean in terms of cache locality because it's heap-allocated instead of stack-allocated, or are you actually commenting on the overhead of copying some extra ints and invoking the destructor?
Because it's certainly correct that just because you can use a unique_ptr, doesn't mean you should. ("A std::unique_ptr is used for expressing transfer of ownership. If you never pass ownership elsewhere, the std::unique_ptr abstraction is rarely necessary or appropriate." - https://abseil.io/tips/187)
Safety is a good reason. I like protection against leaks and use after free. If I’m already allocating I’m not going to worry about the little bit of extra performance cost the abstraction might have.
To be clear: I'm not advocating for the use of `new` / `delete` over unique_ptr. But if you're creating a local object that never has to leave the current scope (or a member variable that's never moved in or out), there's no benefit to using a unique_ptr instead of creating the object directly on the stack or inline as part of your class, where the object's lifetime is bound to the scope, and your destructor is automatically run when its containing scope is cleaned up.
As an added bonus, you don't actually have to do a separate heap allocation.
I agree! You should use a regular object if possible, I’d never suggest otherwise. The rare exceptions I’ve run into are annoying initialization order issues (usually code that I didn’t have the time/knowledge/political credits to refactor) and large arrays that blow the stack.
As of C++17 it's not so horrible, and the C++2x versions even less so, unless one has some strange fetish for SFINAE and tag dispatch.
Since 1993 I have never seen any need to keep bothering with C other than having it imposed on me; C++ has enough of a C89 subset in it if I ever miss coding like C, warts and all.
> Nowadays that compatibility is up to C11 subset.
Not true unfortunately, the "C subset" is still stuck at something that can at best be called a fork of "C95" which was then developed into a "bastard language" that resembled C on the surface, but isn't actually C (e.g. the incomplete designated init support in C++20 is the best example of this half-assed "looks like C, but isn't actually C" philosophy).
> It's not unusual to see C programs written in a "typical" C style become dramatically faster when rewritten in a more modern language.
On the other hand, empirically, it is not unusual to see straightforward C programs being dramatically faster than comparable C++ programs written in enterprise style, and to also build much faster.
> Last but not least, you would have to be a masochist to write heavily multi-threaded code in C
You have to be a masochist to write heavily multi-threaded code that uses a lot of ad-hoc synchronization with mutexes and atomics. As it turns out, for many many tasks, it's also a spectacularly bad way to go about parallelization, because mutexes are the _opposite_ of parallelization.
As a rule of thumb, do coarse-grained concurrency. Install a few queues, come up with a job system, and it won't be hard to get parallelization right in plain C at all. Writing in C is often a good idea because what's a bad idea to do on hardware coincides pretty well with what is painful to write.
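A minimal sketch of that coarse-grained style in plain C with pthreads (array size and thread count are arbitrary): carve the work into a few big chunks up front, give each worker its own chunk, and only combine results after the join, so no mutexes are needed on the hot path.

    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    #define NTHREADS 4

    static double data[N];

    typedef struct { size_t begin, end; double sum; } Job;

    static void *worker(void *arg) {
        Job *j = arg;
        for (size_t i = j->begin; i < j->end; i++)
            j->sum += data[i]; // each thread touches only its own slice
        return NULL;
    }

    int main(void) {
        for (size_t i = 0; i < N; i++) data[i] = 1.0;

        pthread_t tid[NTHREADS];
        Job jobs[NTHREADS];
        for (int t = 0; t < NTHREADS; t++) {
            jobs[t] = (Job){ t * (N / NTHREADS), (t + 1) * (N / NTHREADS), 0.0 };
            pthread_create(&tid[t], NULL, worker, &jobs[t]);
        }

        double total = 0.0;
        for (int t = 0; t < NTHREADS; t++) {
            pthread_join(tid[t], NULL);
            total += jobs[t].sum; // results combined after the join
        }
        printf("%f\n", total); // build with -pthread
        return 0;
    }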
> On the other hand, empirically, it is not unusual to see straightforward C programs being dramatically faster than comparable C++ programs written in enterprise style, and to also build much faster.
Your only comparison cases are cases where the code in question was re-written in C. This most likely means that everyone already knew it was slow and so the re-write also fixed the fundamental problems. If the code had been rewritten in C++ it would also be faster - and since C++ allows some optimizations C doesn't it would be even faster. (it is known that if you switch from gcc to g++ your code often will run faster if it compiles)
There is a reason for enterprise style C++. Most of the time it is still fast enough, and it is a lot more maintainable.
> it is known that if you switch from gcc to g++ your code often will run faster if it compiles
I've never heard such a claim, can you back it up? And what does it say about the language?
> and it is a lot more maintainable
If you equate "maintainable" = readable, I've never once seen maintainable enterprise code. Everything is a convoluted mess that never gets anything done. Probably I haven't worked at the best shops, but then again, where are those? And why doesn't the language help mediocre programmers to write maintainable code?
I suspect that maintainability is almost exclusively a function of experience, not the programming language used. Experienced programmers do seem to agree that C-style C++ or even plain C is the way to go.
https://www.codeproject.com/questions/445035/c-vs-cplusplus-... has a long discussion. The short answer is C++ has stricter aliasing rules, and so the compiler can apply more optimizations. This of course assumes that your C code is also valid C++ code (C is not a pure subset of C++) and that you don't have those aliases - assumptions that apply to a lot of C programs, but not all.
> And what does it say about the language?
C++ has a stronger type system. This is already known. You avoid a few bugs in C++ because of this. The type system isn't nearly as strong as Haskell.
> I've never once seen maintainable enterprise code. Everything is a convoluted mess that never gets anything done
Two sides of the same coin. While the code is convoluted, it often is doing a lot of things in a generic way. More straightforward code is possible, but only by creating a lot more code, and quantity is itself a form of convolution.
> And why doesn't the language help mediocre programmers to write maintainable code?
It does. However, you have to be careful here. C++ is often used for very large problems that are also complex. I would never use Python for something that is over 100,000 lines of code, as you can't change anything anymore for fear that some case isn't covered in the tests and you won't see that syntax error until months later. I maintain 15 million lines of C++ (and this isn't the largest C++ codebase I know of).
Note, I'm not arguing that C++ is a great language. It has a lot of inconsistencies and foot guns. However, it is still the best language I know for very large, very complex programs. (Note that I do not know Ada or Rust, two that often come up in the context of very large, very complex programs. I would not be surprised if they are better. That C++ is better known than the others is itself an advantage to C++.)
> I suspect that maintainability is almost exclusively a function of experience, not the programming language used.
Sort of. As I said before, languages like Python are out of the running for very large programs because they are not compiled and so you can get runtime errors. There are also intentionally-impossible-to-write languages that we can throw out even sooner. However, there are for sure other languages that can play in the very large program space. So long as we limit ourselves to languages that play in that space, experience is the largest factor.
> become dramatically faster when rewritten in a more modern language
IME that's mostly a myth though. A C compiler will stamp out a specialized version just as well if it can see all the relevant function bodies (either via inlining or LTO).
"Zero cost abstraction" isn't just a C++ thing, it happens mostly in the language agnostic optimizer passes. For instance the reason why std::sort() shows up faster in benchmarks than C's qsort() is simply because std::sort() implementation is all inline template code, not because of some magic performance-enhancing qualities of the C++ template system.
Inlining only goes so far. You won't get all of qsort inlined, and if it's not inlined, it needs to be at least cloned to be on par with std::sort, so that the comparator function can get const-propagated.
AFAIK out of the major compilers, gcc has the most aggressive cloning, but it's still nowhere near able to const-propagate the comparator from qsort. With std::sort and a stateless comparator function object (such as std::less, which is the default), you get this for free*.
* of course this is not entirely free, as this is more prone to code bloat. But, you can always type-erase the comparator, and use a function pointer, or std::function, if this ever becomes a problem. But you can't convince a C compiler to const propagate the comparator in qsort all the way through, if the optimizer chooses that it doesn't worth it.
glibc qsort's implementation is in libc.so, not in the header. GCC doesn't have anything to work with.
It's also an apples-to-oranges comparison, since std::sort and qsort implement different algorithms.
A lot of std::sort's performance is actually from using the version without any callbacks. If you pass a comparator function which just compares two integers the obvious way, it gets much slower. So one of std::sort's biggest advantages is actually not that it uses templates, but that it's specialized for the common case of not needing a custom callback. Theoretically the compiler should make the two cases the same, but apparently GCC is too dumb (that's not a slight on GCC; I think people expect too much from compilers):
external_sort is just std::sort hidden behind an extern function implemented in a separate .o file. Those benchmarks are from sorting 1MB of random and already-sorted data (as indicated in the names). I think it's important to test such cases, because often online people benchmark code which is written all in a single file, whereas real-life C++ projects are usually organized in such a way that every little class is in its own little file, which gets compiled into a separate object file, and then it all gets linked together without LTO. And then those same people go on to claim performance benefits of their language without actually using the setup which enables those benefits, which IMO is a bit dishonest.
When I drill further down into everything I want to drill into, maybe I'll publish the source for the benchmarks somewhere.
> If you pass a comparator function which just compares two integers the obvious way, it gets much slower. So one of std::sort's biggest advantages is actually not that it uses templates, but that it's specialized for the common case of not needing a custom callback.
This is not true. `std::sort`'s default comparator is a `std::less` object. The advantage comes from using a stateless callback functor object. If you pass a capture-less lambda instead of a function pointer, you can reap the same benefits as using the default comparator. Even if that capture-less lambda just forwards to a specific free function anyway.
In short, `std::sort(beg, end, [](auto x, auto y) { return foo(x,y); })` can be faster than `std::sort(beg, end, &foo)`.
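A runnable sketch of that difference (the element type and foo are placeholders):

    #include <algorithm>
    #include <vector>

    bool foo(int a, int b) { return a < b; }

    int main() {
        std::vector<int> v = {3, 1, 2};

        // Comparator is a unique, stateless closure type: the call inlines away.
        std::sort(v.begin(), v.end(), [](int a, int b) { return foo(a, b); });

        // Comparator is a bool(*)(int, int): an indirect call per comparison,
        // unless the optimizer manages to prove which function it points at.
        std::sort(v.begin(), v.end(), &foo);
    }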
Interesting but I'm not sure about the relevancy to the above comment.
On a sidenote, it has weird claims:
> obviating the need for ownership type systems or other compiler approaches to fixing the type-safety of use-after-frees. This means that we need one heap per type, and be 100% strict about it.
C lets you do most of what C++ can if you rely on always_inline. This didn't used to be the case, but modern C compilers will meat-grind the code with repeated application of the following things:
- Inlining any always_inline call except if it's recursive or the function uses some very weird features that libpas doesn't use (like goto pointer).
- Copy-propagating the values from the callsite into the function that uses the value.
Consequently, passing a function pointer (or struct of function pointers), where the pointer points to an always_inline function and the callee is always_inline results in specialization akin to template monomorphization.
This works to any depth; the compiler won't be satisfied until there are no more always_inline function calls. This fortuitous development in compilers allowed me to write very nice template code in C. Libpas achieves templates in C using config structs that contain function pointers -- sometimes to always_inline functions (when we want specialization and inlining) and sometimes to out-of-line functions (when we want specialization but not inlining). Additionally, the C template style allows us to have true polymorphic functions. Lots of libpas slow paths are huge and not at all hot. We don't want that code specialized for every config. Luckily, this works just fine in C templates -- those polymorphic functions just pass around a pointer to the config they are using, and dynamically load and call things in that config, almost exactly the same way that the specialized code would do. This saves a lot of code size versus C++ templates.
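A minimal sketch of that pattern with a toy 'max' rather than anything from libpas (names are invented; relies on GCC/Clang's always_inline attribute): once max_by is inlined at the call site, cfg.compare is a known constant and the indirect call collapses into an inlined compare_ints, much like a template instantiation.

    #include <stdio.h>

    #define ALWAYS_INLINE static inline __attribute__((always_inline))

    typedef struct {
        int (*compare)(int a, int b); // specialization point, like a template parameter
    } config;

    ALWAYS_INLINE int compare_ints(int a, int b) { return (a > b) - (a < b); }

    ALWAYS_INLINE int max_by(config cfg, int a, int b) {
        // After inlining and copy propagation, this indirect call becomes direct
        // and is itself inlined, so each config gets its own specialized code.
        return cfg.compare(a, b) >= 0 ? a : b;
    }

    int main(void) {
        config ints = { compare_ints };
        printf("%d\n", max_by(ints, 3, 7));
        return 0;
    }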
qsort only isn't inline, because libcs don't supply an inline definition. If you write your own qsort, then you'll see it getting inlined and/or function cloned for different types.
The only real difference between qsort and std::sort in terms of code generation is that for std::sort the default assumption is to function-clone, and for qsort it is to generate the full slow function. Now, the compiler will in most cases detect that qsort can be cloned or inlined, but sometimes it might decide not to, and the fallback is, in most cases, slower than the C++ fallback.
PS.: I'm just annoyed that my generic C hashtable that is written in a qsort style doesn't get function copied/inlined when it's used for more than one type.
Gonna beat a dead horse here, but >50% of PCs that are surveyed by Steam have 12 threads or more.
That’s PCs that have steam installed at all.
Intel’s bare minimum current-gen i3 processor has 12 threads. That’s the absolute cheapest desktop-level processor you can get.
Your phone probably has 6 cores (though not 12 threads).
So yes, if you’re writing code for desktop hardware, it’s safe to assume you have at least 8 threads. Maybe you don’t want to consume all of them, but it’s better to let the OS handle scheduling.
Gaming is very much not representative. There's roughly 120M active steam users, vs. ~1.4 billion windows installs.
If I look around me, for instance in my whole family we're two with Steam installed, but every household has a desktop or a laptop (and generally a 7-8 year old cheap entry-level 350€ one; you'd be hard-pressed to find even a quad-core in there).
It's half past 2022 and the most sold laptop here in France, 7th-ranked in GDP, has 8 gigabytes of RAM and 4 cores. This is what the real world looks like. (and just a year ago it was still 4GB of RAM iirc)
That does not mean not making use of multiple cores, of course, but software should still be able to work on a single core. Right now we only have certifications such as https://www.blauer-engel.de/en/productworld/resources-and-en... (see https://www.umwelt-campus.de/en/research/projekte/green-soft... for the methodology), but hopefully in a few years we can start making it first heavily discouraged and over time less and less viable to create resource-wasting software - in any case this is a thing I am asking of the people whom I vote for :-)
Thank you! Please keep pushing such certifications until they become regulations that, like GDPR, even we American developers cannot ignore. Then I can make a strong business case to move away from Electron in the product I'm currently working on.
Edit to add:
Related to your links to best-selling computers, I've been thinking about downgrading to a low-spec PC as my daily driver, and using a remote machine for the times that I truly need something powerful for a big compile or the like. That would force me to feel the users' pain. But how far should I go? Taken to the extreme, I could use a machine with a spinning rust hard drive (not SSD) and the bare minimum system requirements for Windows 10 or 11, and keep all the crapware on it to more accurately reflect the typical user's environment. But then, maybe I'd just be hurting myself for no benefit, since the pressure to value developer productivity over runtime efficiency would not actually go away in the absence of regulations.
I’m not advocating making software multithreaded only, since obviously that doesn’t make sense.
But, in many modern languages (including C++), multithreading:
1. Doesn’t significantly detract from the performance of single core systems
2. Can massively improve the performance of multi core systems, even with 2 cores or more.
For appropriate applications, the memory overhead and the cost of the bootstrapping code for instantiating a worker thread should be dwarfed by the time of actually computing the task (we’re talking about actions 100ms or longer). Not using multiple threads when you could reasonably halve or quarter that time (without needing to drop support for single-core systems) is just foolish. If you’re that worried about single core performance then maintain two code paths, but at least recognize that the majority of commodity systems sold today, including the ones you listed, have multiple threads available to them to do the work that has the most painful wait times.
> Related to your links to best-selling computers, I've been thinking about downgrading to a low-spec PC as my daily driver,
my rule of thumb for the software I develop is - on my desktop computer (2016 intel 6900k, still plenty powerful) - there mustn't be any slowness / lag in any user interaction when built at -O0 with -fsanitize=address. This has ensured so far that said software had correct performance on optimized builds on a Raspberry Pi 3 in ARMv7 mode.
> Article 3(2), a new feature of the GDPR, creates extraterritorial jurisdiction over companies that have nothing but an internet presence in the EU and offer goods or services to EU residents[1]. While the GDPR requires these companies[2] to follow its data processing rules, it leaves the question of enforcement unanswered. Regulations that cannot be enforced do little to protect the personal data of EU citizens.
> This article discusses how U.S. law affects the enforcement of Article 3(2). In reality, enforcing the GDPR on U.S. companies may be almost impossible. First, the U.S. prohibits enforcing of foreign-country fines. Thus, the EU enforcement power of fines for noncompliance is negligible. Second, enforcing the GDPR through the designated representative can be easily circumvented. Finally, a private lawsuit brought in the EU may be impossible to enforce under U.S. law.
[snip]
> Currently, there is a hole in the GDPR wall that protects European Union personal data. Even with extraterritorial jurisdiction over U.S. companies with only an internet presence in the EU, the GDPR gives little in the way of tools to enforce it. Fines from supervisory authorities would be stopped by the prohibition on enforcing foreign fines. The company can evade enforcement through a representative simply by not designating one. Finally, private actions may be stalled on issues of personal jurisdiction. If a U.S. company completely disregards the GDPR while targeting customers in the EU, it can use the personal data of EU citizens without much fear of the consequences. While the extraterritorial jurisdiction created by Article 3(2) may have seemed like a good way to solve the problem of foreign companies who do not have a physical presence in the EU, it turns out to be practically useless.
"Patching" that hole seems to require either action on the American side or, perhaps, a return to old-fashioned impressment or similar projection of Majestic European Power to Benighted Lands Beyond the Ocean Sea. /s
The EU can fine US companies the same as it can fine most other extraterritorial companies, that is only if the other country allows it. The EU is not going to start an armed invasion over a GDPR violation.
Still big multinational companies will have international branches (Google, Amazon, Microsoft, ...) that can easily be fined in their host countries.
The EU can also prevent companies from doing business in the EU if they don't follow the local laws. No need for an armed invasion if the EU can block all transfers from EU banks for anything related to your company.
I think GP was referring to enforcing GDPR against companies that do not do business in the EU (no employment, no sales, no bank account, no revenue, no taxes, etc.).
For example, a company like Digital Ocean might have no assets of any kind in the EU (assuming that they don't own their European datacenters), so the EU cannot force them to pay a fine nor seize their assets; the EU could technically sanction them by stopping EU datacenter providers (like AWS-Germany) from renting compute to Digital Ocean, but maybe not for something like a GDPR violation.
You should always write software for the worst performers, unless you have a very good reason not to. Writing for the top performers is how we got into the silly mess where computers from 30 years ago had a much better UX than computers do now.
If we were arguing about designing vehicle safety testing suites for the worst performers (a very real problem that we have right now) we wouldn’t even be having this conversation.
Writing multithreaded applications increases the performance ceiling. If an application can’t make use of multiple threads, but is written in a multi-threaded way, there’s no harm done. It simply runs the multithreaded code in a single-threaded way (think of ParArray) with a bit of overhead incurred for “becoming multithreaded”.
Arguing against adding multithreaded support for long-running actions because “most systems can’t make use of the extra threads” is just irrational, especially since most modern commodity systems could see a linear improvement with the additional threads.
The single core systems are barely hurt by the memory overhead involved with provisioning CORE_NUM of worker threads. But the multi core systems can take massive advantages from it.
I don't disagree with your specific point here; it's easy to dynamically allocate threads based on system cores. But I disagree that you should write your code for a medium-specced system.
That’s what the debate’s about. I do recognize that caring about single-threaded workloads and performance does contribute to snappier UI (and backwards compatibility).
This article doesn't say that, what it actually says is "Over 70% of Steam users have a CPU with 4 or more cores."
Steam doesn't even measure or publicize information about threads in the survey, which makes it near impossible to check, because not that long ago Intel locked out hyperthreading/SMT on their low/mid-grade CPUs.
Additionally, and more importantly: the Steam hardware survey _obviously_ doesn't represent the average consumer PC.
The fact remains that virtually all systems except perhaps old low-end phones now have more than one thread. Not going multi-thread for anything that makes the user wait leaves significant performance on the table.
Low end systems (4 threads or less) have less potential, but they also have the most need for speed, making multi-threading quite important. And high-end systems have more threads, so going multi-thread makes a bigger difference.
I'm about to buy a PC with 16 cores and 32 threads for "normal" money.
The AMD EPYC server CPUs scale to dual sockets with 64-cores each, for a whopping 256 hardware threads in a single box. That's not some sort of esoteric hyper-scale configuration, but completely ordinary off-the-shelf stuff you can get in large quantities from mainstream vendors.
A single-threaded application on a server like that will use between 0.5% to about 50% of the total available performance, depending on where its bottleneck is. It will never reach 100%!
This matters to things like CLI tools, batch jobs, and the like, many of which are written in C, especially in the Linux world. A case-in-point that demonstrates how much performance has been left on the table is ripgrep, which is a multi-threaded Rust replacement for grep.
Today, it's debatable, but if we're talking about programming languages for the future then the future is what's relevant. I don't think it will be long before 50+ thread CPUs are common. Multithreading won't be a nice-to-have feature, it will be a necessity.
They're mixing parallelism and concurrency. (nb: I might be abusing these terms too)
Parallelism aka CPU-bound tasks are limited by the number of cores you have. Concurrency aka IO-bound tasks are not, because they're usually not all runnable at once. It can be faster to go concurrent even on a single core because you can overlap IOs, but it'll use more memory and other resources.
Also, "going faster" isn't always a good thing. If you're a low priority system task, you don't want to consume all the system resources because the user's apps might need them. Or the the user doesn't want the fans to turn on, or it's a passive cooled system that shouldn't get too hot, etc.
And for both of them, it not only makes it easier to write bugs in unsafe languages, but in safe languages you can easily accidentally make things slower instead of faster just because it's complicated.
Using his distinction, concurrency isn't about IO-boundedness (though that's a common use-case for it), but instead is about composing multiple processes (generic sense). They may or may not be running in parallel (truly running at the same time).
On a unix shell this would be an example of concurrency, which may or may not be parallel:
$ cat a-file | sort | uniq | wc
Each process may run at the literal same time (parallelism), but they don't have to, and on a single core machine would not be executing simultaneously.
A succinct way to distinguish both is to focus on what problem they solve:
> Concurrency is concerned about correctness, parallelism concerned about performance.
Concurrency is concerned about keeping things correct[1] when multiple things are happening at once and sharing resources. The reason why those problems arise might be for performance reasons, e.g. multiplexing IO over different threads. As such, performance is still a concern. But, your solution space still involves the thread and IO resources, and how they interleave.
Parallelism is in a different solution space: you are looking at the work space (e.g. iteration space) of the problem domain and designing your algorithm to be logically sub-dividable to get the maximum parallel speedup (T_1 / T_inf).
Now, a runtime or scheduler will have to do the dirty work of mapping the logical subdivisions to hardware execution units, and that scheduler program is of course full of concurrency concerns.
[1] For the sake of pedantry: yes, parallelism is sometimes also used to deal with correctness concerns: e.g. do the calculation on three systems and see if the results agree.
I'm not sure it's fair to say C developers avoid hash tables - I've worked on several projects with hash-table implementations in them.
The 'problem' if there is one, is that such things are rarely picked up from any sort of standard library, and are instead implemented in each project.
I'm also not really sure what the problem is with 'resorting' to void*, it's part of the language. It's not 'safe' in that the compiler won't catch your errors if you stuff any old thing in there, but that's C.
> you would have to be a masochist to write heavily multi-threaded code in C
pthreads makes it relatively straightforward. I've seen (and written) fairly sophisticated thread-pool implementations around them.
C noob here.
Why isn't a hash table implementation merged into the c standard library? Is it because the stdlib has to be as thin as possible for some performance reason or something?
Yeah C doesn't really go in for that sort of thing. The standard library tends to be much more about some minimal support for strings and interfaces to OS features like files, signals, memory allocation etc. It doesn't really provide much in the way of building blocks to be reused by application developers.
The recommendation out there on the net seems to be to look at GLib, which is used by GTK, for that sort of thing.
I used this way back in 2001-3 for a multi-platform project because it provides some good platform abstractions, and it looks like it has a hash-table implementation in amongst its other features.
How was doing C - is it a rewarding career? What did you move to?
Sorry for randomly asking this I'm contemplating moving from Ruby/Go to C because doing web for so long gets old...I'm not feeling like I'm deepening my knowledge anymore.
Honestly I'm happier where I am now, which is generally writing http APIs and cryptography related code in Java (with bits of Go and python thrown in).
Development in C is slow, fraught with unexpected pitfalls and you end up writing an awful lot of stuff from scratch. While this is satisfying in some ways, I find the modern paradigms of the languages I work in now to be more fulfilling - yes you throw a lot of existing components together, but you also get to deliver functionality much more frequently.
There are also a lot of very old-school C shops out there, that don't believe in modern things like CI/CD, git, even automated testing. I'm sure there are good ones too, but there are a lot of dinosaurs in that arena. One of the last ones I contracted for (for about three weeks until I quit) responded to my usual first-day question of "OK, how do we build this product?" with "Oh don't worry about that, we can sort that out later" and later never came.
That all said - I really enjoyed working on a couple of embedded devices. Seeing what you can achieve with 128kB of SRAM and 256kB of flash is challenging, and since I was a kid I've enjoyed making the computer do stuff. With embedded devices that have buzzers, LEDs, little displays etc., you get immediate feedback. And having to think more about things like memory allocation strategies does influence your thinking in (I think) a good way. You can definitely gain some deep knowledge!
Do you think experience holds better the lower you go down the stack?
Part of my frustration with web development - especially front end, is knowledge decays very fast there. I'm fine with learning new stuff but relearning the same thing all the time and losing my edge is a big annoyance.
So part of my wanting to move lower down the stack is my belief that my knowledge and experience will hold up better there. So I'm considering either that or moving to backend work writing something like Java which I also perceive to be a very good investment.
"void* and function pointers" behaves essentially the same as templates, assuming the compiler inlines or function clones the function called with constant expression arguments.
It depends on what you're trying to do. In general, marshaling everything through void pointers is possible, but it'll cost you in terms of both bug surface (it's much easier to make mistakes when you've opted out of C's weak types) and performance (you now have at least one mandatory pointer indirection, which is especially egregious when your underlying value type is something simple like an integer).
Anything you can do in C++, you can do in C. But C++ compilers will generally optimize semantically equivalent code better than C compilers will, because C++ gives the compiler more freedom.
Another perfectly good solution is to treat a C hashtable as a sort of acceleration index. That C hashtable then, rather than holding pointers simply holds `hashkey -> i` where i is the position into a typed array.
I.e. your data structure is like this:
struct foo_index {
    generic_hashtable hash; // maps hashkey -> i (an int index)
    Foo *values;            // the typed array that i points into
};
Using void* means the compiler (almost certainly?) can’t see through it to optimize. More importantly, it loses you type safety and the self-documentation that flat_hash_map<K, V> gives you.
My favorite example of a safety feature in Rust which also improves runtime performance is string slices, which are implemented as a pair of a pointer to the first character and the length (used for bounds checking). Not only does this avoid having to scan for the NUL terminator to find the string length (that is, O(1) instead of O(N), which can make the difference between an O(N) and O(N^2) algorithm), but also it allows taking substrings without copying or modifying the original string (also helped by the borrow checker allowing the programmer to safely omit defensive copies of strings).
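A minimal sketch of both properties (O(1) length and zero-copy substrings):

    fn main() {
        let owned = String::from("hello, world");
        let hello: &str = &owned[..5]; // just a (pointer, length) pair into the same bytes
        println!("{hello} has length {}", hello.len()); // len() reads the stored length, no scan
    }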
C has a huge performance burden from 0 terminated strings. Programs are constantly running strlen() or equivalent to get the length. Length-delineated strings, like what D has, are an order of magnitude faster.
Exactly! And because you can't create a substring without mutating the original value, you see that C code often needs to resort to unnecessary copying as well.
No. You can't create a zero-terminated substring other than a proper suffix or a buffer copy. But that's not really a surprise right?
Well then, don't use zero-terminated strings for proper string processing. You don't have to use zero-termination, even when some programmers in the 70s and 80s were convinced enough of it that abominations like strtok() landed in the standard.
Idiomatic is the wrong word here, because it's certainly not idiomatic to do extra allocations when unneeded. Most APIs let you give the length explicitly if it makes any sense. A not very well-known fact is that even printf format strings let you do printf("%.*s\n", 3, "Hello"), which only prints "Hel\n". This is standard C (the '*' precision specifier goes back to C89), not just POSIX.
_There are_ programs that are constantly running strlen(). C strings are the default builtin string representation that has an acceptable tradeoff for performance vs space and simplicity for where they are used: Mostly in string literals, which are expected to be small strings. Mostly for printf() and friends. Zero-terminated strings are space efficient and don't allow bike shedding like length-prefixed strings do. And don't get us started about allocation strategies.
"A magnitude faster" for doing what? Typical usages of zero-terminated strings are performance-uncritical. And note that zero-terminated doesn't preclude using separate length fields.
Sane programs store the length of strings explicitly where strings get longer and/or performance is a concern, just as is the case with other types of arrays.
> that has an acceptable tradeoff for performance vs space and simplicity for where they are used
Is it? I've been programming strings for 45 years now. Including on 8 and 10 bit machines. All that space efficiency goes out the window when one wants a subset of a string that isn't a common tail.
The simplicity goes out the window as soon as you want a substring that isn't a common tail. Now you have memory allocation to deal with.
The performance goes out the window because now the entire string contents has to be loaded into the cache to determine its length.
> length-prefixed
Are worse. Which is why I didn't mention them.
> Sane programs store the length
Meaning they become length-delineated programs, except it's done manually, tediously, and error-prone.
Whenever I review C code, the first thing I look at are the strlen/strncpy/str* sequences. It's almost always got a bug in it, an off-by-one error.
Again, I'm not saying you should represent substrings, or strings in general for that matter, as zero terminated strings, and I'm not saying use zero terminated strings for anything longer than a couple bytes.
No, I recommend everyone to use whatever fits the situation best. It might be a 2-byte start index and a 1-byte length field that expresses the length as a multiple of 12 bytes. It might be a rope data structure. Or it might be whatever. "String" is not a super well defined thing, and I don't understand why everybody is so super concerned about a canonical string data type. String data types are for scripting languages. 99% of my usage of strings (literals) is just printf and opening files, and C does these just fine.
Zero terminated strings are only a default thing for string literals that does indeed bring a little bit of simplicity and convenience (no need for a builtin string type and the associated bike shedding, and only need to pass a single pointer to functions like printf).
> Meaning they become length-delineated programs, except it's done manually, tediously, and error-prone.
I'm not sure when the last time was that I found it "manually, tediously, and error-prone". There are very rare cases where I have to construct zero-terminated strings from code, or need to strlen() something because of an API. And even when these cases occur they don't bother me at all. Stuff just works for me generally and I'm moving on. I have probably 500 stupid bugs unrelated to string handling before I once forget a zero terminator, and when that one time happens I just fix it and move on. On the plus side, given that we're in C where there are no slice types, zero-terminated strings spare me from passing extra length values for format strings or filepaths.
Sometimes I envision being able to use slices but I have some concerns if that would be an actual improvement. Importantly it should be about arrays and not just about strings. Strings are arrays, they aren't special.
I think a good design for slices could be one whose length can never be accessed by the programmer, but which can be used for automated bounds checks. Keeping size/capacity/offset and 43 cursors into whatever buffers separate is actually correct in my view from a modularization standpoint, because "String <-> Index/Size/Offset etc." isn't a 1:1 relationship.
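Roughly what I have in mind, as a minimal sketch (names made up; in C you cannot truly hide the length field without an opaque type, so this only shows the checked-access shape):

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    const char *ptr;
    size_t      len;   /* consulted by slice_at() for the check; ideally never read directly */
} slice;

static char slice_at(slice s, size_t i) {
    if (i >= s.len) {                  /* the automated bounds check */
        fprintf(stderr, "slice index %zu out of bounds\n", i);
        abort();
    }
    return s.ptr[i];
}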
> Whenever I review C code, the first thing I look at is the strlen/strncpy/str* sequences. It's almost always got a bug in it, an off-by-one error.
You will have to look quite a bit to find strlen() or strncpy() in my code. I'm not advocating for them, and not advocating to build serious string processing on top of zero-terminated strings.
D doesn't have a builtin string type. A string in D is an array of characters. All arrays are length delineated.
> You will have to look quite a bit to find strlen() or strncpy() in my code. I'm not advocating for them, and not advocating to build serious string processing on top of zero-terminated strings.
Rolling your own string mechanism is simply not a strength of C. The downside of rolling your own is it is incompatible with everyone else's notion of how to avoid using 0 termination.
I haven't even suggested to roll your own "string" type. Not more than rolling any other type of array or slice. In my programs I normally do not define a "string" type. Not a central one at least. Zero-terminated strings work just fine for the quick printf() or fopen().
Instead, I might have many string-ish types. A type to hold strings in the UI (may include layout information!), a type of string slice that points into some binary buffer, a rope string type to use in my editor, a fixed-size string as part of some message payload, a string-builder string that tries to be fast without imposing a fixed length... Again, there is little point in an "optimized" generic string type for systems programming, because... generic and optimized is a contradiction.
Any length delineated string you're using, and you did say you were using length delineation, suffers from the problem of not being compatible with any other C code. There's a good reason operating system API calls tend to use 0 terminated strings.
If you want to do a quick debug printf() on it, well, you could use %.*s, but it's awkward and ugly (I speak from lots of experience). Otherwise, you gotta append the zero.
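For anyone who hasn't had the pleasure, the dance looks something like this (a minimal sketch; the buffer and offsets are made up purely for illustration):

#include <stdio.h>

int main(void) {
    const char buf[] = "path/to/file.txt";
    const char *name = buf + 8;        /* "file.txt" as a substring: no copy, no extra 0 */
    int len = 4;                       /* just "file" */
    printf("%.*s\n", len, name);       /* the precision eats an int argument before the pointer */
    return 0;
}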
I'm not a C newbie. I've been programming C for 40 years now. I've written 2 professional C compilers, the most recent one I finished this year. When I started D, a major priority was doing strings a better way, as C ranks among the most inconvenient string processing languages :-)
Sure, I know who you are but I hold opinions too :-)
I don't care about having to provide zero-terminated strings to OS and POSIX APIs, because somehow I almost always have the zero already. Maybe I'm a magician.
Sometimes I have not, but >99% of what I give to printf is actually "text", and that pretty much always has the zero anyway. It's a C convention, you might not like it, but I don't sweat it.
If I want to "print", or rather "write", something other than a zero-terminated string, which is normally "binary data", I use... fwrite() or something analogous.
> C ranks among the most inconvenient string processing languages
I've written my share of parsers and interpreters (including a dysfunctional toy compiler with an x64 assembler backend, but that doesn't matter here), so I'm not entirely a stranger to this game either.
I find parsing strings in C is extremely _easy_, and I find it in fact easier than say in Python where going through a stream of characters one-by-one feels surprisingly unpythonic.
Writing a robust, human-friendly parser with good error reporting and some nice recovery attributes is on the harder side, but that has nothing to do with C strings. A string input for the average parser isn't even required, you just read char by char; frankly I don't understand what you're doing that makes it hard. It doesn't matter one bit if there's a zero at the end or not.
The inconvenience and inefficiency are apparent when building functions to do things like break up a path & filename & extension into components and reassemble them. You wind up, for each function, dealing with 0 termination or length, separately allocated or not, tracking who owns the memory, etc. There's just no satisfying set of choices. Maybe you've found an elegant solution that never does a defensive copy, never leaks memory, etc., but I never have, and I've never seen anyone else manage it, either.
I agree filepath related tasks are ugly. But there are a number of reasons for that that aren't related to zero termination. First, there is syntax & semantics of filepaths. Strings (whatever kind, just thinking about their monoidic structure) are a convenient user interface for specifying filepath constants, but they're annoying to construct from, and disassemble into, filepath components programmatically (relative to how easy I think it should be). Because of complicated syntax and especially semantics of components and paths, there are a lot of pitfalls. Filepath handling is most conveniently done in the shell, where also nobody has any illusion about it being fragile.
Second, you're talking about memory allocation, and this is arguably orthogonal to the string representations we're discussing here. Whether you make a copy or not for example totally depends on your specific situation. The same considerations arise for any array or slice type.
Third, again, you're free to make substrings using pointer + length or whatever, and this is in many cases the best solution. I could even agree that format strings should have better standardized support for explicit length, but it's really not a pain point for me. I'm only stating that zero-terminated is an acceptable default for string literals, and I want to stress this with another example: Last time you were looking at a binary using your editor or pager, how much better has your experience been thanks to NUL terminators? This argument can also extend to runtime debugging somewhat.
> memory allocation, and this is arguably orthogonal to the string representations
A substringz cannot be produced from a stringz without doing an allocation.
> you're free to make substrings using pointer + length or whatever, and this is in many cases the best solution
Right, I can. And it's an ongoing nuisance in C to do so, because it doesn't have proper abstractions to build new types with. Even worse, if I switch my stringz to length delimited, and then pass it to fopen() which wants a stringz, I have to convert my length delimited string to stringz even though it is already a stringz. Because my length delimited API has no mechanism to say it also is 0 terminated.
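A minimal sketch of that nuisance (the str_view type and helper are made up for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *ptr;
    size_t      len;
} str_view;

static FILE *open_view(str_view path, const char *mode) {
    /* Even if path.ptr happens to be followed by a 0, the type can't promise it,
       so a temporary 0-terminated copy is made just to satisfy fopen(). */
    char *tmp = malloc(path.len + 1);
    if (!tmp)
        return NULL;
    memcpy(tmp, path.ptr, path.len);
    tmp[path.len] = '\0';
    FILE *f = fopen(tmp, mode);
    free(tmp);
    return f;
}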
You wind up with two string representations in your code, and then what? Have each string function come in a pair?
Believe me, I've done this stuff, I've thought about it a lot, and there is no happy solution. It annoys me enough that C is just not a tool I want to reach for anymore. I'm just tired of ugly, buggy C string code.
The good news is there is a fix, and I've proposed it, but it gets zero traction:
> You wind up with two string representations in your code, and then what? Have each string function come in a pair?
As said, I don't think this is the end of the world, and I'm likely to add a number of other string representations. While it happens rarely, I don't worry about formatting a string for an API into a temporary buffer before calling it. Because most "string" things are small and dispensable. Zero-terminated strings are the cheap plastic solution that just works for submitting string literals to printf, and that just works to view directly in a binary. And they're compatible with length delineated in the sense that you can supply a (cheap plastic) zero-terminated string to a (more serious) length delineated API. It also works the other way: many length delineated APIs are designed to work with both - supply -1 as length, and you can happily put a string literal as argument, don't even have to macro your way with sizeof then to supply the right length.
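A sketch of that -1 convention (function name made up for illustration):

#include <stdio.h>
#include <string.h>

/* A length-delineated API that also accepts plain 0-terminated strings:
   a length of -1 means "measure it yourself". */
static void draw_text(const char *s, int len) {
    if (len < 0)
        len = (int)strlen(s);          /* caller passed a string literal or other stringz */
    printf("%.*s\n", len, s);          /* stand-in for the real work */
}

/* Usage: draw_text("hello", -1); or draw_text(buf, buf_len); */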
> The good news is there is a fix, and I've proposed it, but it gets zero traction
I'm aware of this and I like it ("fat pointers") but I wouldn't like it if the APIs would miss the explicit length argument because there's a size field glued to the slice.
> many length delineated APIs are designed to work with both - supply -1 as length, and you can happily put a string literal as argument, don't even have to macro your way with sizeof then to supply the right length.
I'm sorry, I just have to say "no thanks" to that. I don't really want each string function to test the length and run strlen if it isn't there.
By now, the D community has 20 years experience with length as part of the string type. Nobody wants to go back to the C way. It's probably the most unambiguously successful and undisputed feature of D. C code that gets converted to D gets scrubbed of the stringz code, and the result is cleaner and faster.
D still interfaces with C and C strings. The conversion is done as the last step before calling the C function. (There's a clever way to add a 0 that only rarely requires an allocation.) Any C strings returned get immediately converted with the slice idiom:
string s = p[0 .. strlen(p)];
> I wouldn't like it if the APIs would miss the explicit length argument because there's a size field glued to the slice.
I bet you would like it! (Another problem with a separate length field is there's no obvious connection between it and the string - which is another source of bugs.)
> Last time you were looking at a binary using your editor or pager, how much better has your experience been thanks to NUL terminators?
Not perceptibly better. And yeah, I do look at binary dumps now and then, after all, I wrote the code that generates ELF, OMF, MachO, and MSCOFF object file formats, and librarians for them :-)
I wrote simple ELF and PE/COFF writers too, but independently of that, zero terminators are what lets you find strings in a binary. And what allows the "strings" program to function. It simply couldn't work without those terminators.
Similarly, the text we're exchanging consists of words and sentences that are terminated using not zero bytes, but other terminators. I'm very happy that they're not length delineated.
I use "grep -w foo" (or something like "grep '\<foo\>'"), because when I look for "foo" I don't want "bazfoobar". grep -w only works because the end of words is signaled in-band (surrounding / terminating words with whitespace).
Zero-terminated strings were a bad decision even back then, let alone now. They make vectorization very painful, and you just needlessly have to iterate over strings at every use-site.
Except nobody cares about vectorization of your printf("Hello, World\n") or other 12-character strings. Vectorization here would in fact be a waste of build time as well as object output size, and the runtime performance would be not measurably different, possibly even slower in some cases. It's a total waste.
When you're processing actual buffers full of text or binary data, and performance matters, of course you are not advised to use an in-band sentinel like the zero terminator. Use an explicit length for those cases.
Additionally, back in the 8 and 16 bit home computer days, C programs were full of inline assembly, because C compilers generated garbage machine code.
It took 40 years of throwing UB-based optimizations at it to make its current performance come true.
This is nothing special, any language can eventually reach that reality with similar investment.
Rust does runtime bounds checks on array access, which the compiler can't elide in non-trivial cases (e.g. binary search), so if you want to write such algorithms as fast as in C, you need to use a lot of unsafe array indexing.
I would be interested in how big of a difference it makes — these branches will always go one way, so they are trivial to branch-predict. Other than very slightly increasing the code size (which can have a big effect if the I-cache is no longer sufficient), I fail to see it doing too badly as is.
Rustc may still need to do runtime bounds check if it can’t conclude that a dynamically computed index is in-bounds.
The length is known statically, but whether the index is in-bounds may not be.
The binary search GP talks about is exactly one such case: go on Godbolt, write up a simple binary search (with a static size so you get code), and you'll see that the compiler checks against the literal array size on every iteration.
This pretty much summarises my opinion. One nitpick: I assume you meant "omit bounds and other checks", not "emit bounds and other checks", which seems to mean the opposite of what you're intending.
Rust does emit bounds and other checks, though. Optimization passes can usually clear some of them away, but you'd need to check the assembly output to be sure.
- trying to access an arbitrary element in a slice, the compiler will emit bounds checks (`if index >= len: panic()`) to avoid an uncontrolled out-of-bounds memory access — https://godbolt.org/z/cbY5ebzvK (note how if you comment out the assert, the code barely changes, because the compiler is adding an invisible assert of its own)
- if the compiler can infer that `index` will always be less than `len`, then it will omit the bounds check — https://godbolt.org/z/TTashYnjd
Would you please stop trolling HN? You've been here for 15 years. You have a distinguished history of writing good articles. You didn't use to post crap comments like this or https://news.ycombinator.com/item?id=32387218 and we need you to stop it. If you don't we will have to ban you, which I would hate to do.
If you have some substantive critique to make about Rust in appropriate contexts, that's fine of course. But this sort of programming language flamewar is definitely not fine. It leads to lame, dumb threads that we're trying to avoid here.
Dang, I apologize for these comments. Every time I tried Rust, I just couldn’t get anything done in a reasonable amount of time, and I started hating this language. I will refrain from commenting on Rust posts from now on.
Why do you keep posting this? One would think by the 10th heavily downvoted comment in the same vein you would have figured out by now that the flaming is neither clever nor appreciated.
> any safety checks put into the competing language will have a runtime cost, which often is unacceptable.
And what is the runtime cost of all the mitigations put in place because we don't use a memory safe language? Stack canaries, safe stacks, ASLR, control flow integrity, code pointer integrity, runtime attestation, library re-linking and randomization. Not to mention sandboxing techniques and other system level mitigations.
I suppose I should thank the C language for job security as a security engineer.
And who says that faster runtime trumps all other considerations? I would much rather have my computer run slower than have to continuously deal with security vulnerabilities. My time is much more valuable than CPU time.
> I would much rather have my computer run slower than have to continuously deal with security vulnerabilities. My time is much more valuable than CPU time.
To be devil’s advocate and take the other side of this argument: I would rather the CPU I paid for spend its cycles performing actual application logic. Every cycle spent executing all this “safety code” overhead feels like me having to pay for a developer’s sloppiness.
I feel end users pay a high cost for all this runtime checking and all these frameworks and levels of abstraction.
First, not all language safety features have to manifest themselves as run-time checks. A properly designed language will allow many safety checks to be done at compile-time.
And second, would you really rather deal with security holes on an on-going basis?
The problem with C is not that you can write unsafe code in it, it is that the design of the language -- and pointer aliasing in particular -- makes it impossible to perform even basic sanity checks at compile time [EDIT: or even at run-time for that matter]. For example:
int x[10];
...
int y = x[11];
In any sane language, that would be a trivially-checkable compile-time error. But in C it is not because there are all kinds of things that could legally happen where the ellipses are that would make this code not violate any of C's safety constraints. That is the problem with C, and it is a fundamental problem with C's design, that it conflates pointers and arrays. C was designed for a totally different world, and it is broken beyond repair.
I'm not here to defend C, but to point out a problem that many advocates make when they cherry pick examples of how their language is both safer and more performant than C. Simply put, that example is irrelevant when the size of the array is not known at compile time. The moment that you have a dynamically allocated array, you are either dropping safety (e.g. expecting the developer to perform bounds checks when necessary) or performance (e.g. the compiler inserts bounds checks at runtime).
It is also worth considering that there is nothing preventing C compilers from inserting compile time checks to address examples such as yours. I just tried to compile a similar example in gcc and it does catch the unsafe code with a warning, provided the optimizer is turned on.
> The moment that you have a dynamically allocated array, you are either dropping safety (e.g. expecting the developer to perform bounds checks when necessary) or performance (e.g. the compiler inserts bounds checks at runtime).
Yes, that's true. So? The original claim is that it is self-evident that this tradeoff should be resolved in favor of performance in the design of the language, and it just isn't (self-evident). If anything, it is self-evident to me that the tradeoff should be resolved in favor of safety in today's world.
There are patterns, though, that can help. C compilers are capable of recognising and optimising many forms of iteration, but being able to tell the compiler explicitly that you're iterating over a collection gives you that much more confidence that the compiler will do the right thing.
Especially when to get the compiler to do the right thing safely you need to add manual bounds checking with the expectation that the compiler will optimise it away, but without any mechanism to ensure that it actually happens.
It depends greatly on the problem at hand, but there are definitely cases where even with a dynamic array size we can unroll our loop such that we check bounds less than once per iteration.
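For example, a minimal sketch of checking the range once per block instead of once per element (helper name made up; whether a given compiler would have hoisted a per-element check anyway is exactly what you'd otherwise have to verify by hand):

#include <stddef.h>

static long sum_unrolled(const int *a, size_t n) {
    long total = 0;
    size_t i = 0;
    while (i + 4 <= n) {               /* one range check covers four accesses */
        total += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        i += 4;
    }
    while (i < n)                      /* scalar tail for the remaining 0-3 elements */
        total += a[i++];
    return total;
}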
Bad example. There's nothing[0] you could put in the ellipsis to make that code valid, and both gcc and clang will warn about it (clang on defaults, gcc with -Wall).
[0] Ok, I guess you could #define x something, but that's not interesting from a static analysis perspective.
Warning is one thing, but crashing is better. That's possible to do in a C compiler too of course, because in this example the array hasn't decayed to a pointer and its size can be recovered.
The issue is when you pass the array to another function, it can't track the size without changing the ABI.
If the choice is between a compile time warning and a runtime crash, I will take the warning every single time: much closer to the actual error. You're probably asking for a compile time error instead.
Indeed the `-Werror` option is often a good thing to have (though I don't set it by default on my free software projects, because other people might use other compilers with different warnings, and I don't want to block them outright).
-Werror is an interesting case -- it's an example of a key difference between C and Rust.
Rust's compiler will reject programs unless it can prove them to be valid. C compilers will accept programs unless they can prove them to be invalid. But then C warnings can lead to an indeterminate state: code that looks iffy may be rejected, but we've not necessarily proven that the code is wrong. We're still trusting the programmers' claim that code which may exhibit undefined behaviour with certain inputs won't ever receive those inputs.
I meant a crash. Obviously both at once is best, but you can detect the possibility of the crash at compile time (disassemble your program and see the bounds check), and it turns a possible security issue into a predictable crash so that’s safer.
I don’t really love forcing errors; when a program is “under construction” you should be able to act like it is and not have to clean up all the incomplete parts. It also annoys people testing new compilers against your code.
> Indeed the `-Werror` option is often a good thing to have (though I don't set it by default on my free software projects, because other people might use other compilers with different warnings, and I don't want to block them outright).
Another problem with C. There's way too much implementation-dependent behavior.
Not with Monocypher. That project has one implementation defined behaviour (right shifts of negative integers), and it's one where all platforms behave exactly the same (they propagate the sign bit). In over 5 years, I haven't got a single report of a platform behaving differently (which would result in public key crypto not working at all).
However I do get spurious warnings, such as mixing arithmetic and bitwise operations even in cases where that's intended. Pleasing every compiler is not trivial.
What do you suppose you could put in the ellipsis that would make your second statement defined behavior other than preprocessor stuff or creating a new scope with a different variable named x? (Neither of which would frustrate a compiler wanting to give you a warning)
The real overhead is not safety code but inefficient design and unnecessary (and often detrimental to the user) features. The added 10% of safety checks is nothing compared to the orders of magnitude of bloat modern software has. Moreover, a lot of C codebases have the same safety checks manually implemented in them, sacrificing performance in order to have fewer bugs. And when it comes to designing a modern C alternative, it's not just a performance/safety tradeoff; there are parts of C that are simply bad, like null-terminated strings, and are long overdue to be replaced with something better.
I suspect that people arguing that safety checks make things slow are basing that on the now very stale idea that CPUs work sequentially. Which isn't true for superscalar machines at all. They're also forgetting that compilers will optimize away a lot of those checks.
There is an asymmetrical risk here. On one hand the compiler might emit checks that aren't needed. On the other hand, the programmer might fail to insert a check that is needed.
Both your and the parent's arguments are strong and reasonable. Compared with other overhead that developers deliberately add to their programs, language safety checks are probably a drop in the bucket. We're in a world where developers think it's reasonable to run a chat app on top of an entire browser framework on top of the platform SDK on top of the OS. The runtime cost of checking an array's bounds is the least of our concerns.
>I would rather the CPU I paid for spend its cycles performing actual application logic.
The problem is that, to the CPU, programming errors are application logic too, and they get run; cf. protected memory. You don't run everything in ring 0, do you?
That's not the point. The point the parent is making is that yes, languages that include safety features are slower, but you will actually need those safety features in C as well, in the code. So your program will also be slower in C.
There is something backwards about how safety checks are done in C.
First it's up to the programmer to put them in.
Then hopefully the compiler will optimize away the unneeded ones. If the programmer biffs and forgets one, though, then oops. And of course, classically, we blame the programmer, not the system he's been forced to work under.
That seems like a reasonable thing in 1982. Which is 40 years ago.
It would be better if the compiler implemented the checks automagically and removed ones it knows it doesn't need. And bonus, if the programmer puts one in, leave it alone.
Yes, I think we're actually all in agreement here. The common point is that TFA's argument that we need to keep C because it's faster is invalid. The GP's point is that it is not in fact faster because of all the things you need to do to produce code that can actually be deployed in today's world. My point is that even if C were faster (which it actually isn't) that would not matter because C is unsafe even with all the extra stuff that people stick onto it to try to make it safe. So the original claim is really a judgement call that speed matters more than safety, and this is not something about which there is any kind of consensus.
What does slower actually mean? Worrying about being slower in the face of safety, while computers continue to get faster and/or cheaper (depending on how you spend your transistors/cost), is a fool's paradise. Don't be baited into their flawed logic.
If your transistor/cost curve has a doubling time of 3 years (156 weeks), a 5% performance difference is worth approximately 11 weeks of progress (log2(1.05) ≈ 0.07 of a doubling, and 0.07 × 156 ≈ 11 weeks).
If your transistor/cost curve has a doubling time of 2 years, a 5% difference is approximately 8 weeks.
How fast do we have to get before safety is table stakes? Focusing on raw unsafe speed wouldn't be a normalized metric in any other industry. I'll spend 8-11 weeks of performance gains on correctness.
If you remove the mechanically preventable bugs from being a consideration, by definition the only one you now need to focus on are the ones NOT prevented by mechanism.
How is this not a win? We only have so many decisions we can make per day.
So, let's take the most extreme example of pursuing safety, WUFFS.
Unlike general purpose languages WUFFS has a very specific purpose (Wrangling Untrusted File Formats Safely) so it doesn't need to worry that it can't solve some of your problems, which frees it to completely refuse to do stuff that's unsafe, while going real, real fast.
WUFFS gets to completely omit bounds checks, which you probably wouldn't have dared try in C because it's so obviously unsafe, but WUFFS already proved at compile time that it can't ever have bounds misses so it needn't do these checks. WUFFS gets to also omit integer overflow checks, because again it proved at compile time that your code is correct and cannot overflow for any input. And since WUFFS knows how big the data is at all times it gets to automatically generate loop unrolling and suchlike accelerations.
WUFFS isn't a general purpose language, it doesn't have strings, it doesn't have growable arrays (what C++ calls "vectors"), it doesn't have any dynamic memory allocation, but then we weren't talking about how much you love general purpose features, we were talking about safety preventing security bugs. Which is exactly what WUFFS does.
I can't respond to that unless you are more specific about what are "the ones that lead to really bad outcomes". There are a lot of arbitrary-code-execution attacks that are enabled by buffer overflows, and IMHO that's as bad as it gets. There is absolutely no legitimate excuse for a buffer overflow in today's world. Switching to safe(r) language won't prevent all attacks, but it will make a big dent.
It defeats some extremely important classes of exploits. And I'm not sure how they're not ones that lead to really bad outcomes since they lead to fun ones such as arbitrary code execution all the time. I can create a C program with hideous vulnerabilities in about five minutes without doing anything that isn't totally standard and normal (albeit obviously vulnerable so technically buggy). I'd have to actually look up how to make my code vulnerable in languages with more safety features.
And the latest trend is producing C machines with hardware memory tagging, because none of those mitigations are actually managing to prevent all those CVEs from taking place.
You are the security engineer, so you certainly know better than me, but aren't those runtime mitigations aimed at malicious programs? Which is to say, even if a better-C was written that didn't allow people to write a program that would bump into those mitigations, the bad guys could still write their programs in assembly or C or whatever, right?
But the presence of a better language to write innocent programs in wouldn't protect the innocent programs from malicious programs written in C and assembly...
Think about web browsers. We're not really that worried about people running malicious web browsers. It's potentially a problem, but as long as people know not to run software that some random spammer sends them in an email then it's easy to avoid.
On the other hand what computer security people worry about a lot is that the web browsers made by reputable organizations and teams of competent programmers nevertheless contain security flaws that can be exploited by a maliciously-written website to cause those browsers to do unexpected and dangerous things.
Many of those security flaws in otherwise well-regarded software are due to memory management errors that just aren't present in safer languages, or they're due to type errors that wouldn't be present in more type-safe languages.
There are some implementation bugs that could be present in any language no matter how many safety features it has, but many security bugs aren't due to, say, an incorrectly specified algorithm, they're due to the programmer asking the computer to do something that's literally nonsense, like asking for the fourth element of a list that only has three elements, or recording that someone's age is apricot. Programming languages with powerful nonsense filters can remove a lot of those kinds of security bugs. (And powerful type systems often give programmers mechanisms to tell the compiler more about the program so that it can filter out more kinds of nonsense than it would otherwise.)
I don't understand this response. Nothing really stops that, regardless of implementation language. It's why we have an entire bodged and mostly ineffective AV industry, as well as a slightly less bodged and partially effective endpoint monitoring/detection industry.
Runtime mitigations exist to mitigate some of the latent risk associated with programming in unsafe programming languages. We use them because they're our best known approach to continuing to use those languages without letting script kiddies own us like it's 1993.
You don't seem to understand that those malicious programs have no way of running on someone's computer if they can't exploit some other program to get installed on the machine in the first place. If the system software on the target machine is written in a better language and has no exploits, then it doesn't matter what language the attacker uses for their software.
It would prevent or at least mitigate some classes of exploitation. Buffer overflows are very common attack vectors: https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=buffer+over... (386 results, 11 from prior years, so 375 from this year). 23 of those are in kernels, many leading to privilege escalation.
Yes, a better language would protect innocent programs. Take stack canaries. They are to protect stack corruption due to an application bug, e.g. unsafe input handling. Input handling is perfectly safe in many other languages, but C has a lot of footguns.
The language the innocent programs are written in can remove attack vectors that the malicious programs use; I don’t think it matters what the malicious programs are written in?
Sandboxing techniques can also be used for dealing with malicious programs, but most of those things listed are for programs to use to protect themselves, not to stop them from intentionally doing bad things.
I agree. I think of address space context switching overhead as the performance price we pay for not being able to run all our programs in a single address space, which we could safely do if we knew all the programs were emitted by a trusted compiler that disallows unsafe memory access. Imagine if system calls were just ordinary functions that can be called with no more than the normal function call overhead? What if they could even be inlined?
Obviously there's a lot of little details you'd have to work out. Like how to make such a trusted compiler in the first place, and how to sandbox unsafe code and legacy applications that were compiled by an untrusted compiler.
If this seems like it's far-fetched or too much work, consider what lengths high-performance hardware devices like HPC network interfaces go to avoid system calls at all costs, to the point where applications talk to the hardware directly. Is that really a sustainable practice long term? And how can anyone audit the security of such hardware devices?
This is what Microsoft Research's Singularity OS did, with the language being C#. Their argument was that the MMU's address space isolation was a 30% "Unsafe Code Tax", so even if C# was slower than C, as long as they could get the slowdown to less than that, it was still an overall performance win.
I'm pretty sure the discovery of Meltdown/Spectre and similar speculative execution attacks would completely wreck this model. The fix for those exploits has been to make the isolation barriers even stronger but if you don't have them at all you're wide open. If you had such an OS but then had to split it back into separate address spaces you've now lost the performance gains and just have a slower OS that is harder to develop drivers for.
Yeah, speculation attacks are a big problem. It seems like maybe if you're running a specific compiler you might be able to avoid speculation attacks by not emitting dangerous sequences of instructions, but I don't know what the state of the art is when it comes to Spectre mitigations and whether it's possible to have a compiler that can formally verify that a program is immune to any (known) speculation attacks.
> I think of address space context switching overhead as the performance price we pay for not being able to run all our programs in a single address space, which we could safely do if we knew all the programs were emitted by a trusted compiler that disallows unsafe memory access. Imagine if system calls were just ordinary functions that can be called with no more than the normal function call overhead?
That's pretty much how the Amiga worked, and that level of technology achievement is still unsurpassed today.
You already see this in PaaS and serverless for managed runtimes, I don't care if my Java and .NET code runs on bare metal, micro-kernel, unikernel, or whatever.
Rust will not replace anything because it’s impossible to write code in it. Everything you write is a syntax error that requires an exobrain to figure out.
GP has been posting the same thing in many threads when Rust comes up. Apparently they think it's funny, or they're really bad at writing Rust code and feel like venting.
I'm sorry, but the C language toolchain is not great. The only part of the C language toolchain that is good is its platform support. Every platform has a C compiler.
However, that's hardly something most devs will care about. At this point, we are pretty much all targeting Arm or x86 (Sorry PIC and MIPS devs). And every new language that's cropped up at a minimum supports both those platforms and the major OSes for those platforms.
So once you dismiss that point, what are you left with? Tools for catching memory leaks? Well, sorry, but pretty much every language has those tools and better. Particularly GCed languages. Profiling tools? Debug tools? To say the C debug tools are "good" is to say you've not used any other toolchain.
But let's also point out C's horror story that is "dependency management and building". How do you grab dependencies for and build javascript? npm install, npm build. How about C? Well... are you using autotools? CMake? Ninja? SCons? Makefiles? etc... Oh, and how do you get dependencies? Well, hope you like reading readme files and deciphering cryptic "symbol not found xyz" linker errors.
Suffice it to say C's toolchain isn't anything to be proud of. It's a relic of the 70s that to this day gets in the way because the community has practically given up on the idea of fixing the glaring issues (Too hard, everything already depends on X working the way X does).
So why risk switching languages? Well, because the C toolchain is a dumpster fire and you likely aren't trying to write new code for IBM RPG servers.
> I'm sorry, but the C language toolchain is not great. The only part of the C language toolchain that is good is it's platform support. Every platform has a C compiler.
Please take a few days to review John Regehr’s excellent blog. I’m certain that you’ll find dispelling your ignorance rewarding.
> How do you grab dependencies for and build javascript? npm install, npm build.
Npm is the poster child for supply chain attacks.
> Please take a few days to review John Regehr’s excellent blog.
Haven't heard of him but will certainly give it a read. Thanks for the heads up.
> Npm is the poster child for supply chain attacks.
The only reason C doesn't (often) have similar supply chain attacks is because pulling in dependencies is so hard that you aren't likely to end up with a 10k dependency project.
Hard to say that's really a plus.
Other ecosystems have their own problems and certainly aren't perfect. However, the toolchains are generally leaps and bounds ahead of what C currently has.
> you aren't likely to end up with a 10k dependency project.
> Hard to say that's really a plus.
I'll definitely call that a plus. All dependencies introduce risk. Sure, we can't avoid all dependencies, but carefully evaluating them and keeping the list small is a big win for maintainability and security.
If a random library has a 0.1% chance of having malicious code (I'd say it's more like 1%, but let's be generous), a 10K-dependency program is all but guaranteed to be pulling in something malicious: the expected number of compromised dependencies is 10, and the chance of having none at all is 0.999^10000, which is under 0.01%.
>The only reason C doesn't (often) have similar supply chain attacks is because pulling in dependencies is so hard that you aren't likely to end up with a 10k dependency project.
Or the lack of canonical package manager means the developer has to actually spend time to inspect the quality of each dependency instead of `npm install is-odd`ing like there is no tomorrow.
It also acts as a "filter" to prevent adding complexity to the dependency tree.
I do most of my heavy lifting with Git commit hashes via FetchContent and it's been pretty good to be fair, surprising number of well written CMakeLists in popular lib repos.
> Please take a few days to review John Regehr’s excellent blog. I’m certain that you’ll find dispelling your ignorance rewarding.
I regularly read his blog and I found nothing to back you up. He mostly works on a generic compiler toolchain (LLVM) rather than C-specific tools, and his work easily translates to many other languages using LLVM. Even a relatively C-specific tool like C-Reduce is not that hard to adapt for other languages; C-Reduce is actually fairly language-agnostic and works well for many other curly-brace languages as well.
> i might add the blind praise of garbage collection raised an eyebrow for me.
Like it or not, a problem C has is that it's pretty much impossible to determine why a piece of memory is currently being retained. Tools like valgrind and jemalloc can give you good starting points but really can't point to issues where "This memory is being retained because of this linked list over here"
Languages with GCs can, at any point, dump the heap and provide tools for analyzing that heap to point to the exact reason why a bit of memory is currently being retained.
That's not really "blind" praise of GCed languages, simply a fact of how they operate.
You’re not wrong that aliasing presents serious practical problems for building an efficient object graph. The thing is the defining characteristic of a C-like is that it has an ultra lightweight runtime[1]. The way I look at it, every C program has a bespoke garbage collector of variable quality. It’s fair to say the median is pretty dismal sure, but tooling does exist to tame that beast.
I think there’s a wide open research space to work on taming the aliasing problem. There are PhDs to be had, and maybe even Turing awards.
[1] C gets to cheat a little bit here because Unix and friends are its runtime.
> The thing is the defining characteristic of a C-like is that it has an ultra lightweight runtime
Agreed. And with my critique of the toolchain, I didn't mean to imply that C isn't a reasonable choice in many situations. But rather, holding up the C toolchain as a reason to pick C over an alternative isn't reasonable in my opinion.
In particular, because a lot of C alternatives piggy back on the C ecosystem by using LLVM as a backend. So you end up with a superset of tools. The C alternative ecosystem and the C ecosystem.
Pick C if it's all that's available. Pick it if you plan on distributing your app/lib onto that IBM RPG server. Pick it if your software is already written in C. Pick it if your software is a simple app that runs for 1 second then shuts down (and it isn't really mission critical). Pick it if memory safety isn't really a problem. Pick it if that ultra thin runtime is an absolute necessity. Pick it if you have a bunch of C devs that don't want to learn a new language.
But don't pick it because you are afraid new languages won't have something as good as valgrind or gdb. Any semi-popular language born in the last 20 years (that wasn't written yesterday) will almost certainly have better tooling than C.
> But aside from Jai, is any C alternative really looking to pursue having killer features? And if it doesn't have one, how does it prove the switch from C is worth it? It can't.
Zig's `zig cc` is a killer feature that doesn't even require using Zig-the-language at all. `zig cc` is an LLVM-based C compiler that gives you trivial cross-compilation for existing C codebases, adds effective caching, and can be easily installed on most operating systems with `tar -xzf`.
What you're seeing is most likely not bugs (it is clang after all), but the result of the much stricter default compilation settings in "zig cc" which tends to break a lot of C code that hasn't been checked against a static analyzer or ASAN before.
Ah alright, some of those macOS specific problems had been fixed a little while ago when work on the macOS parts of the zld linker happened. I still get an occasional "framework headers not found" on macOS in the current Zig head version, but after the first failure I never seem to be able to reproduce it. It also never happens when running in CI, so I'm not sure if it isn't actually a problem with my Xcode setup.
> First of all, pretty much all languages ever will make vacuous claims of "higher programmer productivity". The problem is that for a business this usually doesn't matter. Why? Because the actual programming is not the main time sink. In a business, what takes time is to actually figure out what the task really is. So something like a 10% or 20% "productivity boost" won't even register. A 100% increase in productivity might show, but even that isn't guaranteed.
I stopped reading here. This is like saying typing speed doesn't matter because most of your time is spent thinking about what to type. If your argument basically evaluates to saying that it doesn't matter how productive you are while doing the actual activity of your job itself... at some point it's like, I dunno, do you actually believe what you're saying?
Also, not to go off on too unrelated/unhinged a tangent, but, I've also started noticing in the last five years this widespread fear of actually sitting down and programming. There's a weird meta-culture where people want to do anything but sit down and write code, because it's hard and scary. I hear arguments made all the time for how important everything is — aligning stakeholders, scoping tasks, communication, mentoring juniors, soft skills, whatever. Everything except writing the actual code, which is somehow a mere technical triviality that anyone can do, or something.
I agree especially with that last point. There is something about programming that is very difficult. It gets _more_ difficult if you write complex things, but that's not the essence of it. I think it has something to do with raw concentration.
We have all of these tools that help us to write code while making few mistakes. From constrained DSLs/configuration languages, schemas, static typing to automated tests etc. But in the end we still have to exert all that energy to sit down, focus and do the coding.
Meetings, design, coordination etc. All of those things are important, and they can be exhausting in their own way. But programming is by far the most challenging, hour per hour (pound for pound). Making all of these micro decisions, think about the impact of your code, the readability, the correctness. It is _at least_ two times harder and more draining than anything else I do, even when I'm in a flow state and the actual work is fun, more fun than all of the other tasks.
And I agree: It is the most important thing. I mean all of the other stuff is basically there to support the core task that solves your problems: producing working software (except if it isn't but then you have political problems). There are seemingly infinite dimensions for improvement as well, tradeoffs such as performance, efficiency, robustness, usability, leverage, creativity...
I’ve programmed in C for 20+ years and then recently switched to Rust. When fellow C programmers ask me how Rust is going, I say “it’s modern” and “it’s consistent”. What I mean by that is that, unlike C, Rust doesn’t have 50 years of baggage: string functions that all handle null termination differently, complicated implicit promotion rules, null-terminated strings that haven’t been a good idea for decades, APIs that were completely busted by POSIX threads but still exist, functions that cannot be used safely but still exist, code snippet examples with horribly broken unsafe code that beginners copy-paste, dubious and bad type-based aliasing, rampant use of UB, etc etc etc
Sadly Rust is not a better C. Maybe it’ll grow into one some day. Today it’s a better C++. Sadly I’d still chose C to do C’s job.
All I want from “better C” is to fix the obviously bad problems from C’s history. Or rather, maybe that’s the committee’s job.
At any rate: it’s embarrassing that we’re still so dependent on a language that’s so broken. And it’s sad that there doesn’t seem to be a path forward.
When I discovered strsep was a thing I was genuinely a bit mad that I'd been "tricked" into using strtok all this time. Since then I never just look up a function in the standard library reference. It's a fair chance its API is far from ideal, especially in string.h
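For anyone else who missed it, a minimal sketch of strsep() (a BSD/glibc extension, not ISO C; note it modifies the buffer in place and, unlike strtok(), keeps its state in the caller's pointer and reports empty fields):

#define _DEFAULT_SOURCE                /* exposes strsep() on glibc */
#include <stdio.h>
#include <string.h>

int main(void) {
    char buf[] = "alpha,,gamma";       /* strsep writes 0s into this buffer */
    char *rest = buf;
    char *field;

    while ((field = strsep(&rest, ",")) != NULL)
        printf("field: \"%s\"\n", field);   /* prints "alpha", "" and "gamma" */

    return 0;
}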
What is C’s job, when would you use it for a green field project? Not trying to start a flame war, genuinely curious as I see less and less need for C, other than maintaining all those existing code bases.
1. C language toolchain -- No specifics (save static analyzers, ... because C needs static analyzers to catch C specific bugs...). What do you actually feel you're missing? Most of these "new C" languages use the same backend as a C compiler. What can't you use?
2., 3., and 4. -- Just chicken and egg FUD.
Not to mention it's "'Better X' doesn't matter" fundamentally doesn't understand the competition.
A sample quote: "That's not just one but eight(!) killer features (re: Java). How many of those unique selling points do the C alternatives have? Less than Java did at least!" WTF?! "But Java had 8!" is laughable.
Another: "[A]ny safety checks put into the competing language will have a runtime cost, which often is unacceptable". We can also produce a car without windows, doors or a chassis that will survive a crash, and it may be faster too. We may even get lucky and live to tell the tale.
Again: "Aside from Jai, is anyone C alternative really looking to pursue having killer features?" Aside from Jai?! Jai. Really? Oh, I remember you just discounted safety and correctness as killer features. ... But Jai. And nothing against Jai. But a language which hasn't been, which may never be, released to the general public. Just, eyes wide open, wow.
I hope this is the peak of anti-Rust/pro-C silliness, but I know it's not.
I would reread the first paragraph. In context, this is clearly not an "anti-Rust/pro-C" essay. It's not a championing of C nor a resignation to its perpetual dominance.
It's a position paper meant to sketch out the negative space that a viable C replacement can't meaningfully compete in, so that its designers can focus attention on things that actually add value. If you don't acknowledge your enemy's strengths, you'll never really be able to fight them.
It explicitly says in the post... for example static analyzers. What is the "Frama-C" equivalent of any language that bills itself as a replacement / competitor of C?
> Most of these "new C" languages use the same backend as a C compiler.
Most of them use LLVM as far as I know, which does not target plenty of obscure platforms.
So the argument is we need the bug finders we've developed for C (because our C is full of bugs)?
> Most of them use LLVM as far as I know, which does not target plenty of obscure platforms.
Yeah, writing software in the old language is more convenient on obscure, rare, niche platforms, because it's older than dirt. More chicken and egg nonsense. It's not impossible to have LLVM add additional targets, right?
> Yeah, writing software in the old language is more convenient because it's older than dirt. More chicken and egg nonsense. It's not impossible to have LLVM add additional targets, right?
It seems like you're being needlessly hostile. There is nothing personal about this discussion, and it's not nonsense. I write embedded software that needs to run on PIC18 microcontrollers. Support for that in LLVM was dropped about 8 years ago. Do you think it's reasonable to say to someone "just add a new LLVM target, it's not literally impossible"?
I'm profoundly disappointed. This article's reasoning is not good.
> Do you think it's reasonable to say to someone "just add a new LLVM target, it's not literally impossible"?
No, but that C is already pervasive because it's been around 50 years isn't a case "against an alternative to C" as much as it is a headwind. That's fair. C has a massive head start and no one should discount that. But I'm not exactly certain that was the argument he was making.
This conversation is about general use of C vs alternatives such as Rust.
If you can't use Rust because of your specific circumstances that's ok - use C! But don't use "my current circumstances prevent me from using Rust" as an argument against Rust in general - which is what you're doing.
> But don't use "my current circumstances prevent me from using Rust" as an argument against Rust in general - which is what you're doing.
First of all I'm just summarizing the article, I'm in no way arguing against people using Rust. The article itself argues against C alternatives, and explicitly describes Rust as a C++ alternative. So the article is not even arguing against Rust! If we can't agree on such a basic reading of the article, it just seems like we're going to pointlessly talk past each other in the comments.
This article is just apologism for the status quo. Nothing new here.
> Memory allocation, array and string handling are often tricky, but with the right libraries and a sound memory strategy, it can be minimized.
Such a handwavy deflection. Parsing strings in C is a joke and also a minefield bursting with vulnerabilities. And if you use null terminated strings (which you are, let’s face it) it’s slow.
Nah, strncpy is pretty bad and doesn't do something anyone really expects to want.
(strncpy doesn't 0-terminate when it hits max length, nor stop writing to the destination when it sees its first 0 in the source buffer; it acts almost entirely like memcpy except that it writes zeroes to the full length of the destination upon seeing a zero in the source. This is allegedly useful for populating fixed-sized buffers in historical Unix, and can plausibly be useful for writing out tar headers, but in practice very little code in the wild does anything observably different if you #define strncpy memcpy.)
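For contrast, a bounded copy that always 0-terminates, in the spirit of BSD's strlcpy, looks roughly like this (a sketch; the helper name is made up):

#include <string.h>

/* Copies at most dstsize-1 bytes, always 0-terminates (if dstsize > 0),
   and returns strlen(src) so the caller can detect truncation. */
static size_t copy_str(char *dst, const char *src, size_t dstsize) {
    size_t srclen = strlen(src);
    if (dstsize != 0) {
        size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;                     /* return >= dstsize means the copy was truncated */
}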
There is A solution. Is it the best solution? Are there other solutions which may offer better performance or safety?
I don't think we should stop trying to improve just because an existing solution exists, especially in the space of systems programming, which has barely moved in decades.
Typical statement coming from people lacking experience in this field (no offense meant).
The truth is, parsing strings in C is as easy as in any other language. You have a string array and a length field, and you scrub through it from left to right with a cursor. Done.
What you do not get in C is creating lots of string objects willy-nilly, and concatenating them with a plus sign, like you do in scripting languages.
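Concretely, a minimal sketch of that cursor style: counting whitespace-separated words over a pointer + length pair, with no reliance on a 0 terminator (helper name made up):

#include <stdio.h>
#include <stddef.h>

static size_t count_words(const char *s, size_t len) {
    size_t i = 0, words = 0;
    while (i < len) {
        while (i < len && (s[i] == ' ' || s[i] == '\t' || s[i] == '\n'))
            i++;                       /* skip separators */
        if (i < len) {
            words++;
            while (i < len && s[i] != ' ' && s[i] != '\t' && s[i] != '\n')
                i++;                   /* consume the word */
        }
    }
    return words;
}

int main(void) {
    const char text[] = "parse me with a cursor";
    printf("%zu\n", count_words(text, sizeof text - 1));   /* prints 5 */
    return 0;
}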
If anything, "replacing C" is a good excuse to stimulate creativity. The scope is known, the land well trodden. I like seeing new things in programming languages and OS design. I like nerds having fun and giving me interesting stuff to wrap my three neurons around in the process. That's why I love computing to begin with. Not increasing productivity, not disrupting who knows what. Just cool, smart shit.
You can strap basically any language on top of the C ABI. Many contender languages either use it natively or offer low-cost/no-cost C ABI bindings, specifically as its the lingua franca ABI.
You can check (5) off your list if your choice of replacement language has easy C ABI interop.
[edit]
(1) isn't particularly relevant either. These days the tooling for detecting memory leaks, data races, bugs, etc, operates at the debug info (DAWRF/dSYM) level. It doesn't matter what compiler or input language generated the debugging information, the tools should work just fine. You can run valgrind on your Rust binary for instance. [1] Is there any such tooling that specifically depends on the input source being C? I guess some static analysis tools?
A platform will define an ABI which, as I understand it, explains how parameters are passed to and returned from a "function", what types are supported and how they're encoded, at the processor register and stack level.
C, being the "portable assembler" [1] that it is, maps its types and functions pretty easily to that platform ABI, but I've been seeing a fair amount of confusion about that ABI being about C rather than any language that can link through that platform ABI. Couldn't it just as easily be called, I dunno, the Ada ABI?
[1] not really, except for the sake of this argument
That's it, no borrow checker, no weird syntax, no nothing
So far D is the answer for me, but I'm worried about its future: will they keep improving the language in that direction, or will they continue with their high-level stuff?
Zig/Jai/Odin are looking interesting so far; the problem is they all depart from the C syntax, which is annoying. The only syntax I'd like to change is the primitive types, i32/f32/size/usize, and that's it, nothing more.
I have a similar list, but I think part the difficulty with C replacements is that the sort of people who want to go out and write a new language generally do so because they have something more radical in mind.
I have pondered forking tinycc and using that as a startpoint for a more conservative set of changes like you describe. It would limit the amount of work to be done and lead to something useable quite quickly.
Whenever a CVE is discussed, even if the code was from Google, Microsoft, or Apple, commenters immediately point out that these companies must have hired some "bad programmers", because a "good programmer" wouldn't have made such mistake. Given that even top software companies continue producing CVEs on a regular basis, there is a shortage of people capable of writing C.
A lot of programming skills are transferable (algorithms, architecture, system APIs, debugging), so you don't really need to hire "language X programmer".
The problem with making a new language is that it seems like people do it because they want to make a new language. Don't make a new language just to make a new language. Do it because you are solving a problem by doing it, and if you are solving a problem, you already win regardless of how many people will use the language. Have the problem before you make the solution.
> It turns out that people don't always agree on what the pain points with C is.
People have different problems so they should probably use different languages. Don't even try to make a language that will solve everyone's problems.
> Also, while the company has experience recruiting for C developers, it doesn't know how to recruit for this new language.
If the language is so hard that people have to have special knowledge to use it and can't learn it quickly then it's probably not a good language.
> For a business it's whether despite the downsides the language can help the bottom line: "Is this valuable enough to outweigh the downsides?"
Exactly, it's solving real problems that matters.
> The "build it and they will come" idea is tempting to believe in
A better idea is "build it and we will come because we need it. Then maybe other people will come if they need it too."
> any safety checks put into the competing language will have a runtime cost, which often is unacceptable
This is completely wrong. The best counterexample is probably ATS (http://www.ats-lang.org), which is compatible with C, yet also features dependent types (allowing us to prove arbitrary statements about our programs and check them at compile time) and linear types (allowing us to precisely track resource usage, similar to Rust).
> It may seem that using cairo functions in ATS is nearly identical to using them in C (modulo syntatical difference). However, what happens at the level of typechecking in ATS is far more sophisticated than in C. In particular, linear types are assigned to cairo objects (such as contexts, surfaces, patterns, font faces, etc.) in ATS to allow them to be tracked statically, that is, at compile-time, preventing potential mismanagement of such objects. For instance, if the following line:
val () = cairo_surface_destroy (sf) // a type error if omitted
> is removed from the program in tutprog_hello.dats, then a type-error message is issued at compile-time to indicate that the resource sf is not properly freed. A message as such can be of great value in practice for correcting potential memory leaks that may otherwise readily go unnoticed. ATS is a programming language that distinguishes itself in its practical and effective support for precise resource management.
I still occasionally write C (and C++), but have come to appreciate the sparseness of Go (although not the intricate abstractions of Rust) and keep wondering if Zig (which does cross-compilation almost as effortlessly as Go) shouldn’t be more popular.
In the end, though, you can accomplish the same things in all of them (from an outcome perspective, if not necessarily an equally satisfying one), so I try not to be overly biased, although convenience vs. correctness seems to be the main issue at play here.
(Edit: why the downvotes? Is Zig out of favor? Is it because this may read as a slight on Rust?)
I like Zig a lot and think it hits precisely the right niche for a C replacement. Rust is a great language, but it's definitely not C-like when it comes to the overall UI and philosophy.
I’d like to use Zig a bit more, but the standard library is still going through some significant changes (it seems) which makes it slightly harder to commit to at the moment.
In general I wonder if some of these languages would actually benefit from Jai's approach of deferring availability until the language is ready. While getting early adopters is great for getting feedback early, it can also dissuade potential users who try the language before it's ready and decide to leave it alone as a result.
> This leads to a strategy of only having checks in "safe" mode. Where the "fast" mode is just as "unsafe" as C.
This is hard to describe without getting deep into Rust details, but I think the ability to encapsulate unsafe code in safe APIs is just as important as what pure safe code can do. (Of course those two things end up being intimately related.)
> And worse, what if the language omits crucial features that are present in C? Features that C advanced programmers rely on?
Seems like this might be a good place to ask: which of the C replacement languages mentioned (or not mentioned) support inline assembly? Do any of them do it better than GCC/Clang? For me, I feel like the ability to drop to assembly in a pinch is a necessary feature. I'm sure I could technically get by linking an external file, but I'd be more interested in a language with inline ASM support than one that doesn't allow it. Thanks!
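For reference, this is the GCC/Clang extended-asm baseline being compared against (a minimal x86-64 sketch; add_u64 is just an illustrative name). As far as I know, several of the replacement languages do support inline assembly: Rust stabilised its asm! macro, and Zig and D have their own asm blocks.

    #include <stdint.h>

    static inline uint64_t add_u64(uint64_t a, uint64_t b) {
        __asm__("add %1, %0"   /* AT&T syntax: a += b */
                : "+r"(a)      /* read/write operand in any GP register */
                : "r"(b)       /* input operand in any GP register */
                : "cc");       /* the addition clobbers the flags */
        return a;
    }

Whether any of them does it "better" than GCC's constraint language is a matter of taste, but the feature itself is not a C exclusive.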
C is great mostly because it's easy to learn, and it has many other advantages.
I dislike all those new languages because they have too many features, and they're non trivial to learn.
I've looked at some rust codebases and I don't think there will be a lot of people who will want to maintain them. Rust is difficult, even though it's an awesome language.
A language that could compete with C would:
* Be fast to compile
* Have some quality of life things you can find in the C++ STL: map, string, etc
* Look a lot like C, meaning it would not try to innovate too much.
To summarize, it would be a lighter version of C++.
What exactly do you mean? Learn the basics? Become productive? Memorize the entire specification? I'd argue C isn't significantly faster at any of these than most other languages.
Actually, yes. Go does a lot of the things everyone seems to want in these C replacements, the problem is that it has GC which, to a lot of people, automatically disqualifies it as an option for writing certain kinds of software you can write with manual memory management. A lot of people also tend to think they need manual memory management for $reasons when GC is actually fine.
A "better C" language doesn't have to "win the popularity race" to be useful, it just needs to (a) be easy to learn coming from C, (b) integrate well into the existing C/C++/ObjC ecosystem, and (c) easy to install and across all supported platforms.
A "better C" cannot and should not replace C for all use cases, but it should at least augment C by fixing its design wart.
The actually depressing thing, though, is the "winner takes all" mentality that seems to be prevalent in the programming world.
Why do you put that quote? I'm not sure anybody "tried to get him to understand it". I suppose the post comes mostly from what the author learned or concluded from his own project. There might be other motivations. I think the post has some genuinely good points that need to get more mindshare.
There is also a more general "case against the self-indulgent narcissism in creating a new language". An insane number of languages with the longevity of a mushroom get mentioned on HN.
An oft-overlooked pain point is that when interacting with C interfaces, it's not good enough to implement a C ABI FFI. There's a world of software out there where the supported interface is a C API, at the source level. Compatibility across different versions, or across different implementations, is assured only to programs which interact with the interfaces in ways that correspond to particular C source patterns.
The biggest culprit I've met on this front is the POSIX stat family of functions.
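A small sketch of why stat is a source-level interface rather than an ABI-level one: the layout of struct stat, and even which symbol ends up being called, depends on how the C source is compiled (feature-test macros), not just on the platform ABI.

    #define _FILE_OFFSET_BITS 64   /* on 32-bit glibc this widens st_size
                                      to 64 bits and redirects the call to
                                      the 64-bit variant of stat */
    #include <sys/stat.h>
    #include <stdio.h>

    int main(void) {
        struct stat st;
        if (stat("/etc/hostname", &st) == 0)
            printf("%lld bytes\n", (long long)st.st_size);
        return 0;
    }

An FFI that only knows the binary calling convention has to replicate exactly this kind of source-level behaviour to stay compatible.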
Like any other technology that wishes to supplant the status-quo, a new programming language cannot be just 'a little bit better'. It has to be substantially better in order to overcome the inertia that the entrenched technology has established.
This is true for both software and hardware. There have been plenty of leaps and bounds such as SSD speeds compared to HDD speeds; but even there, SSDs will not supplant HDDs until there is price parity between the two technologies. Until then, data will be stored on SSD when fast access is critical and on HDD when there is too much of it to justify the cost.
Getting a company to switch operating systems, database vendors, or cloud providers is likewise an uphill battle. They don't call them 'walled gardens' for nothing.
I am building a new data management system that I think is much better than file systems at storing and managing unstructured data and better than RDBMS at managing structured data; but I wouldn't have even dreamed of starting the project if I wasn't confident that it was at least 2x better at a minimum.
It's real but without public access it's largely useless. I understand why Jonathan might want to avoid dealing with a peanut gallery trying to argue with him about what his language should be like, and that's fine, his language his rules and all of that. But until this situation changes it's impossible for anyone to really care about Jai.
You can't say Jai is not real, because there are beta testers using it. However, there are other languages that have taken and implemented various concepts that Jai had to offer. By the time Jai comes out, and is open to the public, there might not be anything compelling about it anymore.
The simplest solution is a preprocessor. The original founders of C made one, probably because they needed it. But adding another lightweight one is simple. C++ has bloat. Compilation time can be slow.
I realize that this is radical, but if people could work with me on this, it could solve all your problems. No really, it could solve all your problems. And if it can't solve all your problems yet, then modify it so it can.
You could write a 100-line Python script that compiles from your custom language to C code and it'd still be better than C. There are lots of ways to re-use existing tooling, and the bar really isn't that high for better languages.
Why do you think more of these projects don't take this approach? I would be 10x more likely to try one in a professional setting if I knew the portion I was taking a chance on was a source translation layer, and I could rely on the well-tested and supported C infrastructure.
I was considering this for my programming language, but I will most likely not use it because it makes optimisations and garbage collection harder, and the generated C would not be human-readable (just look at the output of the Chicken compiler for an example), so there's not a huge benefit. C-- could be an option, but it isn't very active or well used.
In terms of relying on tested and supported infrastructure lots of these projects use llvm.
> Looking at C++ alternatives there are languages like D, Rust, Nim, Crystal, Beef, Carbon and others.
Why do people keep lumping Rust into the category of a "C++ alternative"? Rust is a C alternative and can be used in places where C can go but C++ cannot, e.g. the Linux kernel and bare-metal silicon (C++ can be used in embedded work, but I've never heard of it being used without modification for that environment).
> My language (C3) is fairly recent, there are others: Zig, Odin, Jai and older languages like eC.
Further, he seems to claim that Jai is a C alternative when it's explicitly described by its creator as a C++ alternative.
"Any C alternative will be expected to be on par with C in performance. The problem is that C have practically no checks, so any safety checks put into the competing language will have a runtime cost, which often is unacceptable. This leads to a strategy of only having checks in "safe" mode. Where the "fast" mode is just as "unsafe" as C."
This one is the main point. Highest possible speed while consuming as few resources as possible. Put in checks, and most of those checks will run redundantly, compromising performance.
While true, its lack of restrictions is also a hindrance in terms of performance. Since pointers can alias by default (unless you throw a __restrict on them), a whole class of optimizations cannot be performed. That's why Fortran and Rust can in theory beat C from a performance perspective even though they do more checking as a rule. That's just one example, but it's a pretty big one.
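A minimal sketch of that aliasing point in C (the function is made up for illustration): with restrict the compiler is allowed to assume dst and src never overlap and can vectorise the loop; drop the qualifier and it has to be far more conservative.

    #include <stddef.h>

    void scale_add(float *restrict dst, const float *restrict src,
                   float k, size_t n) {
        for (size_t i = 0; i < n; i++)
            dst[i] += k * src[i];   /* no-alias promise enables SIMD */
    }

Rust effectively gets this guarantee everywhere for free, because a &mut reference cannot alias anything else.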
"We have seriously regressed, since C developed. C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine."
Yep. A few years ago I implemented a skip list based rope library in C[1], and after learning rust I eventually ported it over[2].
The rust implementation was much less code than the C version. It generated a bigger assembly but it ran 20% faster or so. (I don't know why it ran faster than the C version - this was before the noalias analysis was turned on in the compiler).
It's now about 3x faster than C, thanks to some use of clever layered data structures. I could implement those optimizations in C, but I find rust easier to work with.
C has advantages, but performance is a bad reason to choose C over rust. In my experience, the runtime bounds checks it adds are remarkably cheap from a performance perspective. And they're more than offset by the extra optimizations the rust compiler can do thanks to the extra knowledge the compiler has about your program. If my experience is anything to go by, naively porting C programs to rust would result in faster code a lot of the time.
And I find it easier to optimize rust code compared to C code, thanks to generics and the (excellent) crates ecosystem. If I was optimizing for runtime speed, I'd pick rust over C every time.
The valgrind suite is a very effective answer to these needs and it's used by most C programmers. Helgrind for data races. Memcheck for memory ownership. Not to mention sophisticated tools for cache profiling, call graph analysis and heap profiling. No, they're not static tools but being dynamic has a host of advantages too.
Valgrind's memcheck is great for exercising known paths in a program, and determining whether they're likely to contain memory errors. It doesn't help you discover those paths, which is what attackers are doing when they research and then exploit your program.
We've known for decades that compiler mitigations, fuzz testing, and dynamic instrumentation are excellent and necessary components of writing more secure C. But they don't secure C programs, because C itself is fundamentally unsafe.
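As a tiny illustration of the "known paths" limitation (the function and flag are invented for the example): memcheck will flag the overflow below, but only if some test actually drives execution down the rarely-taken branch with a long enough argument.

    #include <stdlib.h>
    #include <string.h>

    void handle(const char *arg, int rare_flag) {
        char *buf = malloc(8);
        if (!buf) return;
        if (rare_flag)
            strcpy(buf, arg);   /* heap overflow if strlen(arg) >= 8 */
        free(buf);
    }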
> No, they're not static tools but being dynamic has a host of advantages too.
I think the main point is that a good alternative C doesn't compete directly with C's strongest points.
Here, you can of course make a safer language without runtime penalty just by supporting richer build-time/static checks, but the modest and meaningful gains you make within that constraint will never be enough to displace C.
You need to be providing something meaningfully more in areas where C hasn't already squeezed out most gains. The OP tastefully avoided making this article directly about C3, but if you go and look at the summary of features, you can see how this philosophy connects with what they're working to build.
C is used because it has to be used, i.e. in low-level environments etc., not because of 'performance' most of the time.
If you could pay in performance a little bit, but have a clean, portable, beautiful debugger experience, nice libraries like 'Java' with a 'stable ABI' ... nobody would use C. We'd all use 'that new thing'. And we would just buy slightly faster hardware for the IoT whatever.
We use C++ smart pointers all over the place, and they come with a 'cost'. We mostly 'happily pay the price' because it's worth it most scenarios.
Engineers' obsession with performance is a bit of a curse. Rarely do we really need that much. Kernel code, sure; IoT and remotely controlled toy cars, mostly not.
Honestly, I hate includes and preprocessor directives so much that any "better C" variant which replaces them with more modern solutions would probably be enough in my book - with the caveat being that I also like having an IDE very much and wouldn't want to revert to coding in something like emacs with plugins. BTW, C++20 has a good replacement for includes (modules), but IDE support is not there yet and thus it's not usable for me.
I wholeheartedly agree, the speed of processing includes annoys me most about C.
The sad part is that the mainstream alternatives that I have on the radar are _even slower_ to build.
And usage of the preprocessor in APIs allows for good things, at least in a compiled systems language like C: macros let you get rid of boilerplate at the syntax level, and ifdefs allow for compatibility. Both points' significance is reduced a long way in other languages, because they have stronger facilities for abstraction, BUT this abstraction comes at a non-negligible price and cannot replace all significant uses of the preprocessor.
So I'm still stuck with C. The solution is designing header files carefully, resulting in build times that are lightning fast, at least in relation to most other mainstream languages. And there is always the option to go for "unity" builds.
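A small sketch of those two preprocessor uses (the MY_* names are made up): an ifdef for cross-platform compatibility and a macro that removes boilerplate at the syntax level.

    #if defined(_WIN32)
      #define MY_EXPORT __declspec(dllexport)
    #else
      #define MY_EXPORT __attribute__((visibility("default")))
    #endif

    #define MY_ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

    MY_EXPORT int my_sum(const int *xs, int n);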
What bothers me most is not even the speed of compilation (although that's a major pain too), but the fact that I need to create a header file for every .c file I want to use from outside its compilation unit, and add function declarations to it. In 2022 it's SO BACKWARDS it hurts.
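The duplication in question, at its smallest (file names invented for illustration): every externally visible function is written twice, once as a declaration and once as a definition.

    /* point.h */
    #ifndef POINT_H
    #define POINT_H
    double point_dist(double x1, double y1, double x2, double y2);
    #endif

    /* point.c */
    #include <math.h>
    #include "point.h"
    double point_dist(double x1, double y1, double x2, double y2) {
        return hypot(x2 - x1, y2 - y1);
    }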
I almost never found that painful. Maybe a couple things that help: 1) Improve modularization. APIs should be small (as a rule). 2) Don't write a new file for each "class". It's not a problem to have files with more than 1 KLOC. But it is problematic to have a lot of really small files, as is (I think) customary with say Java.
I come from the Java world, so I'm very used to every semantically separate bit of code sitting in its own file. I think it's superior to having multi-KLOC source files (as is common in C/C++ world, because it minimizes the header writing and maintenance busywork), because then you can browse code via a file browser, which is typically a panel in your IDE. The C alternative to that would be having the same file opened in multiple tabs, and heavy usage of (IDE) bookmarks? I don't find that as convenient.
As to improving modularization, in order to fix compilation speed issues I was trying out the "program as a single compilation unit" approach advocated by handmade people, and I found that I had to spend a good amount of time moving code between modules and their corresponding headers. So annoying... Perhaps it wouldn't be so infuriating if I hadn't spent the past 15 years coding in languages where this is a problem solved automatically and invisibly by the compiler.
> any safety checks put into the competing language will have a runtime cost, which often is unacceptable.
Suppose you have bounds checks on array accesses, but your program is 100% correct and the panic case is never hit. Doesn't the branch predictor essentially make the bounds check free? Or very low cost at least?
If what comes after the check is a write or any other destructive operation, then the check is not so easy to mask away. I don't have actual numbers, but my gut is a strong 'no'.
That said, a lot of pedestrian code that isn't running particularly hot would probably be better off including runtime checks by default.
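A sketch of the write case described above (set_at is an invented example): the branch itself predicts well, but because a store follows it, the compiler usually cannot delete or hoist the check without extra information about i and len.

    #include <stddef.h>

    int set_at(int *buf, size_t len, size_t i, int v) {
        if (i >= len)       /* the bounds check */
            return -1;
        buf[i] = v;         /* the destructive operation it guards */
        return 0;
    }

In practice the cost is usually small but not always zero: the check can also block vectorisation or reordering even when it never fires.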
It is no different from QuakeC, GOAL, and a myriad of other programming languages that game studios never had as a milestone to release to the wider public, yet which were quite useful to their owners.
Babylonians used it. That's the origin of our time counting system of hours, minutes and seconds and our angle measuring system of degrees, minutes and seconds. Anyway, if we someday change our number system away from 10, I hope the change will be towards a smaller base, like 8 or 6, because when one is a child, it sucks to learn the times tables. Base 12 would increase the number of entries by 44%. Base 8 would decrease them by 36%, and base 6 would have only 6*6=36 very easy to remember entries.
Rust is not only a C++ alternative, it's a C alternative too. You don't have to have methods in your structs or use generics, traits, closures, iterators etc.
You won't be able to use the standard library, but it's doable.
C already has way too rich a syntax. ISO should fix it and remove things from the syntax. Yes, break backward compatibility starting from "c24".
Let's go to a fantasy world where we have a B+ language which should have been C: remove typedef/_generic/typeof/restrict/enum/etc, well, all those horrible things. Only one loop statement, loop {}, no switch ofc. Only sized/signed core types with properly sized/signed literals: ub/sb or u8/s8 with integer literals like 123ub/123sb, sw/sd..., udw/sdw..., uqw/sqw..., fdw/1.23fdw (f32), fqw/1.23fqw (f64) (for some hardware ymm/xmm/zmm with fancy literals; sorry, I am not aware of the fine details of the C/CPP lexer). Ofc, no integer promotion and no implicit casts (void* would still not require any cast). Compile-time and runtime casts (not with that horrible c++ syntax), same thing for consts (even though nowadays it is the compiler which detects real compile-time constants); yes, you may have to "static const" your literals. Oh, and for the C preprocessor, do fix that variadic-args function-like macro for good (I wonder if the gcc way is not better than the c++ ISO way).
Without all that, it should be "reasonable" for a small team of average-skilled developers, or an individual, to write a naive compiler, as long as they don't forget they can hit a goto at any line. And there are those who say "but feature A is cheap to implement"... 1 million cheap features later... well, you get the picture.
The really hard part for ISO was to keep C actually "cheap" to implement, this is where they are failing hard.
As for the gcc extension based C dialect for kernel development, no salvation here: C is really too alien (or even B+). Namely, the little ISA abstraction is not "enough", so either the kernel goes full "slow" with plain C and assembly (_not_ inline assembly ofc), or full assembly on all of its fast paths, which would require properly defined binary specs (I am currently researching how bad this could be).
I do believe the open source communities have the strength to maintain parallel assembly _significant_ code bases (which do the same thing), big monolithic corpos... not so sure...
> [...] Why? Because the actual programming is not the main time sink. In a business, what takes time is to actually figure out what the task really is. So something like a 10% or 20% "productivity boost" won't even register.
This reminds me of Amdahl's Law [1], about how a performance optimization (parallelism) has small returns because the performance improvement only affects a small portion of the overall process.
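To make the connection concrete (the 20% figure for time spent actually programming is just an illustrative assumption), Amdahl's Law says the overall speedup is

    S_overall = 1 / ((1 - p) + p / s)

where p is the fraction of total effort the optimisation touches and s is how much faster that fraction gets. With p = 0.2 and s = 1.2 (a language that makes the programming part 20% faster), S_overall = 1 / (0.8 + 0.2/1.2) ≈ 1.03, i.e. roughly a 3% overall gain, which indeed "won't even register".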
Counterpoint is that a language that allows rapid iteration/prototyping (and thus, increases "programmer productivity") is useful precisely because it helps figuring out what the task really is.
This annoys me so much in Java debates. The argument goes something like "It doesn't matter that Java is verbose and requires hundreds of files because the IDE generates them for you". And this is true... when you write the code. But the real friction is when you must read or change the code in response to new requirements, which is exactly what we as engineers spend the bulk of our time doing.
I've been thinking about this topic for years and OP was finally able to put it in writing. I don't think anyone would claim English is the best, but it just so happens that the most powerful empire on earth (until recently?) uses it as its primary language. It's good enough at what it does, and has a lot of history backing it up.
With this knowledge I choose to align myself with languages and platforms that respect C and complement it (Lua, Objective-C, etc) instead of languages and platforms that try to supplant it (c++, java, swift, rust, etc)
I concur with you - everything I want to do in Rust or Golang, I can do with C+Lua just as easily and with far less tooling hassle. I'm yet to find a good reason to switch from just using C+Lua for everything - but the Rust guys sure are trying. Maybe Rust+Lua makes the most sense, though ..
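For anyone who hasn't seen it, the C+Lua combination really is small; a minimal embedding looks roughly like this (assumes the stock Lua 5.x C API, linked with -llua):

    #include <lua.h>
    #include <lauxlib.h>
    #include <lualib.h>

    static int l_add(lua_State *L) {            /* callable from Lua */
        lua_Number a = luaL_checknumber(L, 1);
        lua_Number b = luaL_checknumber(L, 2);
        lua_pushnumber(L, a + b);
        return 1;                               /* one return value */
    }

    int main(void) {
        lua_State *L = luaL_newstate();
        luaL_openlibs(L);                       /* load standard libs */
        lua_register(L, "add", l_add);          /* expose the C function */
        luaL_dostring(L, "print('1 + 2 = ' .. add(1, 2))");
        lua_close(L);
        return 0;
    }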
I will never use any other language for server apps than C. C is a language for getting things done. It's finished and final. It will not change. There will be no new surprises, no new operators, and no unpublished packages. There are just source files. Write, compile, ship, repeat. You used it in 1990 to ship, you used it in 2000 to ship, you used it in 2010 to ship, you used it in 2020 to ship, and you will use it in 2030 to ship. Learn once, ship forever.
Always nice to see good reasoning. Though I'd veer away from it if the task is too large and the program needs to be updated frequently by multiple people.
Also, do you have any recommendations for a C library for server apps?
Hypothetical: I need a small, fast systems language to use for writing small, portable programs.^1 Using the language the system is written in would be ideal.
1. In other words, the size of a "hello world" program matters, as does the storage space and time I need for the compiler toolchain and libraries.
What language should I use? I am not interested in writing large, complex programs.
Is there a "standard C" llvm target? That would maybe chip away at the "if you are working on some obscure embedded platform, C might be the only supported high level language" advantage.
(4) is extremely important. Labor pool is smaller and there won't be a wealth of soft documentation and people that can provide immediate support for common questions.
It is, but on the other hand it's a lot easier to write correct code in, well, almost every language other than C.
To give a concrete example, I was able to write production-ready code with 2 weeks of Rust experience and nobody to provide support for questions (beyond reading StackOverflow), and that code was more reliable and less buggy than the Python/JavaScript code our company was otherwise writing, even though we were collectively much more experienced in those stacks. I doubt we'd have gotten that with C, even though we had a bunch of developers who were experienced with C.
And I wrote firmware in C for 4-quadrant torque control in about the same time. I do not have any formal proof of how buggy it is or isn't, but the thing has run properly for 10 years already. If that is not "production code" I do not know what is. I am not an inexperienced programmer, rather quite the opposite. But this was after I had not touched C or any microcontrollers for some 10 years.
I feel like C is somewhat manageable on microcontrollers. Programs tend to be smaller (micro!), you can often do static allocation which mitigates a lot of the memory management issues, and you're also often dealing with simple types and bit manipulation which is where C shines.
In larger programs on full-fat operating systems, programs tend to be much larger (esp. if you include libraries - dealing with libraries being one of the most complex things in C), and you have to deal with sophisticated allocation patterns, and complex types (webs of pointers), abstractions and business logic which is where C basically leaves you on your own and provides very little in terms of structure or guard rails.
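A rough sketch of the static-allocation style mentioned above (names and sizes are invented): everything is sized at compile time, so there is simply no malloc/free to get wrong.

    #include <stdint.h>
    #include <stddef.h>

    #define RX_QUEUE_LEN 32

    typedef struct {
        uint8_t  data[64];
        uint16_t len;
    } rx_frame_t;

    static rx_frame_t rx_queue[RX_QUEUE_LEN];   /* statically allocated */
    static size_t     rx_head, rx_tail;

    /* reserve the next slot in the ring, or NULL if the queue is full */
    static rx_frame_t *rx_next_slot(void) {
        size_t next = (rx_head + 1) % RX_QUEUE_LEN;
        if (next == rx_tail)
            return NULL;                        /* full: caller drops the frame */
        rx_frame_t *slot = &rx_queue[rx_head];
        rx_head = next;
        return slot;
    }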
This is how I’ve started to think about things too. If every program was authored by a single individual and was relatively small, C would already be the perfect programming language. All the advanced safety features of competitors really only become relevant once you introduce scale.
I tend to disagree with this notion. I've used C for almost a decade now and I've seen many developers who were far more talented than me make similar mistakes all the time.
Some problems simply cannot be solved in C in a safe way.
The only "better C" that will count is C itself adding some critical features, like a more powerful standard library. I know C++17 has some improvements, but needing third-party alternatives for most things is not good for the ecosystem, and the old "only if you want it" attitude doesn't hold up anymore: it's 2022 and most embedded systems use these tools. We need C to have a reasonable standard library; the whole world depends on it.
> The status of C as the lingua franca of today's computing makes it worthwhile to write tools for it, so there are many tools being written.
The vast, vast majority of programmers aren’t doing their computing by writing C anymore. It’s been the better part of 20 years since you could plausibly call it the Lingua Franca of computing.
Lingua Franca in this case typically refers to the fact that you write a library with at least C ABI support if you want it to have the largest user base. Almost all languages' FFIs support some C ABI for their libraries.