Mazda cars used to have a bug where they used printf(str) instead of printf("%s", str) and their media system would crash if you tried to play the "99% Invisible" podcast in them. All because the "% In" was parsed as a "%n" with some extra modifiers. https://99percentinvisible.org/episode/the-roman-mars-mazda-...
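A minimal sketch of that failure mode (the podcast title stands in for any untrusted string):

    #include <stdio.h>

    int main(void) {
        const char *title = "99% Invisible";
        printf(title);        /* "% I" starts a conversion spec; %n with no pointer arg is undefined behavior */
        printf("%s", title);  /* safe: title is treated as data, not as a format */
        return 0;
    }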
I don't like a lot of things in C++, but one thing worth praising in particular is std::format.
std::format specifically only works for constant† format strings. Not because they can't make it work with a dynamic format, std::vformat is exactly that, but most of the time you don't want and shouldn't use a dynamic format and the choice to refuse dynamic formats in std::format means fewer people are going to end up shooting themselves in the foot.
Because it requires constant formats, std::format also gets to guarantee compile time errors. Too many or not enough arguments? Program won't build. Wrong types? Program won't build. This shifts some nasty errors hard left.
† Not necessarily a literal; any constant expression works, so it just needs to have some concrete value when it's compiled.
The flag is -Wformat-nonliteral or -Wformat=2. -Wformat-security only includes a weaker variant that will warn if you pass a variable and no arguments to printf.
One reason is locale-dependent format strings which are loaded from resource files.
Also, in personal projects, I almost always used custom wrapper functions for printf/fprintf/sprintf for various reasons, so that default wouldn’t be of much use, unless maybe I could enable it for the custom functions.
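For what it's worth, GCC and Clang can extend the same checking to custom wrappers via the format attribute; a sketch, with log_msg being a hypothetical wrapper:

    #include <stdarg.h>
    #include <stdio.h>

    /* Tell the compiler: argument 1 is a printf-style format string,
       and the variadic arguments to check start at position 2. */
    __attribute__((format(printf, 1, 2)))
    static void log_msg(const char *fmt, ...) {
        va_list ap;
        va_start(ap, fmt);
        vfprintf(stderr, fmt, ap);
        va_end(ap);
    }

    /* log_msg("%s", 42); now draws the same -Wformat warning printf would. */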
I've never seen locale-dependent format strings work well. The translators will change the formatting codes, and you can't change the order of the formatted arguments. You are much better off with some other mechanism for this.
(I have no recommendations. When I've seen this stuff done properly, on the occasions I've managed not to avoid doing it, it's always been using some in-house system.)
It (the %n$ positional-argument syntax) is specified by POSIX, but not by ISO C (or C++). So most Unix(-like) systems support it, but the printf in Microsoft's C runtime doesn't. However, Microsoft does define an alternative printf function which does, _printf_p, so `#define printf _printf_p` will get past that.
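For anyone who hasn't seen the notation, a sketch (works with POSIX printf, not ISO C):

    #include <stdio.h>

    int main(void) {
        /* %1$ and %2$ select arguments by position, so a translated
           format string can reorder them freely. */
        printf("%1$s has %2$d items\n", "the cart", 3);
        printf("%2$d items are in %1$s\n", "the cart", 3);
        return 0;
    }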
I think the real reason you rarely see it is that it is only used with internationalisation: the idea being that if you translate the format string, the translator may need to reorder the parameters for a natural translation, given differences in word order between languages. However, a lot of software isn't internationalised, or if it is, the internationalisation is in end-user-facing text, which nowadays usually ends up in a GUI or web UI, so printf has less to do with it. And the kind of lower-level tools/components for which people still often use C are less likely to be internationalised, since they are targeted at a technical audience who are expected to be able to read some level of English.
Which is fine for English, but doesn't work at all for other languages. The problem is not just that the plural ending is something other than `s` – if it was just that, it wouldn't be too hard. The problem is that the `count != 1` bit only works for English. For example, while 0 is plural in English, in French it is singular. Many other languages are much more complex. The GNU gettext manual has a chapter which goes into this in great detail – https://www.gnu.org/software/gettext/manual/html_node/Plural...
printf() has zero hope of coping with this complexity. gettext provides a special function to handle this, ngettext(), which is passed the number as a separate argument so it can select which plural form to use. The translated message files then contain a header defining how many plural forms that language has, and the rules to choose which one to use. And for some languages it is crazy complex; Arabic is the most extreme, with six distinct plural forms in the rule the manual gives.
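A typical call looks like this (a sketch; with no message catalog loaded, ngettext just falls back to the English strings using the n != 1 rule):

    #include <libintl.h>
    #include <stdio.h>

    static void report_removed(unsigned long n) {
        /* The count is passed separately, so the catalog's plural
           rule - not a hardcoded `n != 1` - picks the right form. */
        printf(ngettext("Removed %lu file.\n",
                        "Removed %lu files.\n", n), n);
    }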
That might be possible if the "resource file" was processed at compile time. I've never seen a C toolchain that did it that way, though - I've only seen them read in at runtime. And at that point, the preprocessor can't save you.
Since the locale is only determined at runtime, and might even change at runtime, the format strings are usually dynamically loaded from text files, and are not in the form of string literals seen by the compiler.
In principle, they are not enabled by default because a C compiler must be able to compile standard C by default.
One practical reason I can think of is because not everyone compiles their own code.
You most definitely should look for and enable such flags as they become available in your own projects (e.g. I was rooting for -Wlifetime, but it did not land for various reasons).
But when you compile other people's code, breaking your local build doesn't help anyone. The best you can do is submit a bug report, which may or may not be ignored.
It's a standard library function, meaning the compiler can assume that it follows the standard. Specifically for GCC [0]:
> The ISO C90 functions abort, abs, acos, asin, atan2, atan, calloc, ceil, cosh, cos, exit, exp, fabs, floor, fmod, fprintf, fputs, free, frexp, fscanf, isalnum, isalpha, iscntrl, isdigit, isgraph, islower, isprint, ispunct, isspace, isupper, isxdigit, tolower, toupper, labs, ldexp, log10, log, malloc, memchr, memcmp, memcpy, memset, modf, pow, printf, putchar, puts, realloc, scanf, sinh, sin, snprintf, sprintf, sqrt, sscanf, strcat, strchr, strcmp, strcpy, strcspn, strlen, strncat, strncmp, strncpy, strpbrk, strrchr, strspn, strstr, tanh, tan, vfprintf, vprintf and vsprintf are all recognized as built-in functions unless -fno-builtin is specified (or -fno-builtin-function is specified for an individual function).
Builtin here doesn't mean that GCC won't ever emit calls to library functions, only that it reserves the right not to, and allows itself to make assumptions about how the functions work, including diagnosing misuse.
The library functions themselves might also be marked with __attribute__((format(...))) as the sibling comment notes, but that is not necessarily required for GCC to check the format strings.
The %n functionality also makes printf accidentally Turing-complete even with a well-formed set of arguments. A game of tic-tac-toe written in the format string is a winner of the 27th IOCCC.
- sez wiki.
A not so fun fact:
Because the %n format is inherently insecure, it's disabled by default in some implementations; Microsoft's C runtime, for example, rejects it unless you opt back in with _set_printf_count_output().
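For reference, what %n actually does when it is allowed:

    #include <stdio.h>

    int main(void) {
        int count = 0;
        /* %n writes the number of bytes printed so far (here 6,
           for "hello ") through the matching int* argument. */
        printf("hello %n", &count);
        printf("wrote %d bytes\n", count);
        return 0;
    }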
>The %n functionality also makes printf accidentally Turing-complete
No it doesn't. Printf has no way to loop so it's not Turing complete. Even if you did what the IOCCC entry did with putting it into a loop it still wouldn't be Turing complete as it would not have an infinite memory.
Nothing real is "Turing complete" if it requires infinite memory. That's a property only abstract machines can have. In common parlance, something is Turing complete if it can compute arbitrary programs that can be computed with M bits of memory, where M is arbitrary but finite.
I'd say they're Turing complete (in common parlance) if they can reasonably be viewed as a Turing-complete system that's been hobbled with an arbitrary memory limitation. FSAs generally can't be viewed this way, as you can't just "add more memory" to an FSA. By way of contrast, consider a pushdown automaton with two stacks. While any physically real implementation of such a device will necessarily have some kind of limit on the size of the stacks, you can easily see how the device would behave if this limit were somehow removed.
It's definitely a bit fuzzy. I'm sure lots of philosophy papers have been written on when exactly it is or isn't appropriate to consider a finite computational system as a finite approximation to a Turing-complete system. In realistic everyday cases, however, it's usually clear enough what should and shouldn't count as such.
Yes, but it also changes the state transition logic. You can't just 'add 100 more states' to an FSA in the same way that you can 'add 100 more stack slots' to a bounded pushdown automaton.
As I said previously, these are somewhat fuzzy distinctions, and I'm not saying that they're easy to make mathematically precise. They do however seem clear enough in most cases of practical interest. There are many real-world computing systems that would be Turing-complete if they had unbounded memory. There are others that are not Turing-complete for more fundamental reasons than memory limitations. Again, I acknowledge that 'more fundamental' is not a mathematically precise concept.
Well, I'm pretty sure all existing digital computers are finite state automata, so they are not, strictly speaking, Turing complete. But that doesn't make any sense.
Strictly speaking there are essentially no programming languages that are even theoretically Turing-complete because they can only address a bounded amount of memory. For example, in C `sizeof(void*)` must be a well-defined, finite integer. But that definition is not useful in practical use.
Even if that was true, any definition of Turing completeness that includes JavaScript and excludes C is worse than useless in practice. It's useless for communication, it's useless for education, it's useless for reasoning about capabilities. There's simply no place for such a definition in a civilized society.
Turing machines themselves are a useless concept in our society. Since C is lower level and tied more to the physical hardware it makes sense that it is not Turing complete because Turing machines are not applicable to the real world. Computers do not work anything like an infinite tape. I've never seen a practical program have to implement a Turing machine.
This is one of those annoying little problems that is easily picked up by the vet command (https://pkg.go.dev/cmd/vet) when writing Go code. There are, of course, many linters that do the same thing in C, but it's nice to have an authoritative one built in as part of the official Go toolchain, so everyone's code undergoes the same basic checks.
Very nice collection. My favorite C feature is actually a gcc/clang feature: the __INCLUDE_LEVEL__ predefined macro. It let me code and maintain my C projects exactly twice as fast as before, because the file count dropped to half: https://github.com/milgra/headerlessc .
Is having two files really that much of a bother? I have my editor set to switch between the .c(pp) and the .h with a keyboard shortcut, and that seems easier than scrolling between declaration and definition when you want to change something.
I just include them, everything works as before. If I have to create a library then I create a separate header file for the api functions and only the internals are headerless.
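For readers unfamiliar with the trick, a sketch of the pattern (file and function names are made up):

    /* vec.c - interface and implementation in one file. */
    #ifndef VEC_C
    #define VEC_C

    /* Declarations: seen by every file that does #include "vec.c". */
    int vec_sum(const int *v, int n);

    /* Definitions: only compiled when this file is the translation unit
       itself, i.e. handed to the compiler rather than #included. */
    #if __INCLUDE_LEVEL__ == 0
    int vec_sum(const int *v, int n) {
        int s = 0;
        for (int i = 0; i < n; i++)
            s += v[i];
        return s;
    }
    #endif

    #endif /* VEC_C */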
> This qualifier tells the compiler that a variable may be accessed by other means than the current code (e.g. by code run in another thread, or it's an MMIO device), thus to not optimize away reads and writes to this resource.
It's dangerous to mention cross-thread data access as a use case for volatile. In standard C, modifying any non-atomic value on one thread, while accessing it on another thread without synchronization, is always UB. Volatile variables do not get any exemption from this rule. In practice, the symptoms of such a data race include the modification not being visible on the other thread, or the modified value getting torn between its old and new states.
These days it can be chained with _Atomic to achieve the desired effect. That said, oftentimes you need more serious synchronization mechanisms, which your threading library would provide.
_Atomic is indeed the correct qualifier to use for unsynchronized cross-thread access. The volatile qualifier doesn't add anything useful on top of that. Really, the only things volatile should be used for are MMIO, debugging, performance testing, and certain situations with signal handlers or setjmp within a single thread.
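A sketch of the idiom that replaces the old `volatile bool stop` pattern:

    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_bool stop = false;  /* _Atomic _Bool */

    void worker(void) {
        /* Unlike a volatile read, this load participates in the memory
           model: the store from another thread is seen without a data race. */
        while (!atomic_load(&stop)) {
            /* ... do work ... */
        }
    }

    void request_shutdown(void) {  /* called from another thread */
        atomic_store(&stop, true);
    }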
From what I gather, _Atomic alone will not ensure that the variable's contents are actually loaded every time you read them, and the compiler can optimize loops away as a result. You'll often want both.
Sure, in principle, compilers can combine certain repeated atomic accesses. But in practice, compilers respect the fact that intervening code takes a nonzero amount of time to run, and always try to load the latest value of the variable. (I am entirely unable to coerce a compiler into combining atomic accesses, even with memory_order_relaxed where it would theoretically be permissible.) Volatile accesses are the same in the sense that the compiler can move the rest of the code around it (and are known to have done so in practice): the only difference is that repeated volatile accesses can't be combined even in theory, and they can't be omitted even if the result is discarded.
What use case do you have in mind where this theoretical possibility would cause issues?
"Expert C Programming: Deep C Secrets" is a really good book to learn a lot of C tricks and quirks, plus some history. I read it a few years ago and loved it.
I was a grad when I read it and remember annoying my older coworkers for a few weeks with little gotchas I picked up. "hey what do you think THIS example prints?" "Stop sending me these!"
When I wrote a lot of cross platform performant numerics I loved the bus error thrown by the Sun machines.
In essence it was a RISC machine saying "I can't deal with unaligned data" and a sign that code was (say) storing a 32 or 64 bit int that straddled a word boundary and the hardware was not coping on a MOV.
Why did that matter and why was it a good thing?
It forced programmers to think about data alignment and resulted in code that ran faster on CISC Intel chips.
The dirty secret about Intel chips was they "just did it" without complaint - and it slowed them down significantly on pipelined computations if they were constantly double handling unaligned data to get it from memory (across a word boundary) to bus (aligned for transit) to memory again (across a word boundary).
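The failure in miniature (undefined behavior per ISO C; SPARC raised SIGBUS, x86 quietly did the slow thing):

    #include <stdint.h>
    #include <string.h>

    int main(void) {
        char buf[16];
        uint32_t *p = (uint32_t *)(buf + 1);  /* deliberately misaligned */
        *p = 0xDEADBEEF;  /* SIGBUS on strict-alignment RISC; UB everywhere */

        /* The portable version: memcpy, which the compiler lowers to
           whatever the target handles efficiently. */
        uint32_t v = 0xDEADBEEF;
        memcpy(buf + 1, &v, sizeof v);
        return 0;
    }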
Compound Literals in C are great. They're no surprise to anyone coming from more sophisticated languages, but I've never seen them used in the C codebases I've worked on.
What with C also allowing structures as return values, another rarely-used feature, they're really useful for allowing a richer API than the historical `int foo(...)` that so many people are used to seeing.
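A sketch of the kind of API that means (the names here are invented):

    #include <stdlib.h>

    struct result { int ok; int value; };

    struct result parse_port(const char *s) {
        int v = atoi(s);
        if (v < 1 || v > 65535)
            return (struct result){ .ok = 0 };           /* compound literal */
        return (struct result){ .ok = 1, .value = v };
    }

    /* Usage: struct result r = parse_port("8080"); if (r.ok) ... */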
C has so much legacy that it's really hard for even decades-old (C99!) features to impose themselves. Or perhaps that's MSVC's lagging support that's to blame :p
I remember working on a commercial project in the mid-2000's that still had #ifdefs for K&R C prototypes (meaning, pre-ANSI C.) This was a recent-ish project at the time, started in 2000. Were people going to go back in time and compile it on an old architecture? I doubt it.
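For younger readers, the kind of dual definition those #ifdefs selected between (the macro name here is hypothetical):

    #ifdef KNR_COMPILER          /* hypothetical feature-test macro */
    int process(buf, len)        /* K&R style: parameter types declared separately */
        char *buf;
        int len;
    #else
    int process(char *buf, int len)
    #endif
    {
        return len > 0 ? buf[0] : 0;
    }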
MSVC supports C11 and C17, minus the C99 stuff that was made optional in C11.
Anyway, given the option, one should always favour C++ over C if one cares about secure code; while not perfect, it is much better than anything any C compiler will do.
> Anyway given the option, one should always favour C++ over C
Eh. I work in embedded, where C reigns supreme. C++ has its own issues in the area, namely that you need to construct your own sub-dialect that removes some features of C++ to make it fit embedded constraints. Commonly, it's C++-but-no-exceptions, sometimes C++-but-no-templates, and others.
That said, I'll grant that us embedded developers are effectively "traumatized" and have difficulty accepting new approaches because we're too focused on certain paradigms that are no longer relevant (see my previous comment about returning structures, which has drawn responses like "but then it might do an extra memcpy()!!11").
I can understand exceptions, but what constraints require you to ban templates? If it's just code size, then it seems a bit arbitrary to ban them completely.
AFAIK most users of C++ do ban some features in their projects so I don't see why that specifically is holding embedded back. Disabling exceptions specifically is something that is not unheard of outside embedded either.
The bit about "register" is old enough that I don't think it's meaningful anymore.
The stock verbiage about how modern compilers ignore "register" because they can do better but it may be useful on simpler ones, has been around in this exact form 20 years ago already. And one curious thing is that even back then, such statements would never list specific compilers where "register" still did something useful.
So far as I can tell, "register" was in actual use back when many C compilers were still single-pass, or at least didn't have a full-fledged AST, and thus their ability to do things like escape analysis was limited. With that in mind, "register" was basically a promise to such a compiler to not take the address of a local in the function body (this is the only standard way in which it affects C semantics!). But we haven't had such compilers for a very long time now, even when targeting embedded - the compilers themselves run on full-power hardware, so there's no reason for them to take shortcuts.
It definitely is, but it's that ability to bind it to a specific register that makes all the difference - and, of course, that can't be portable C (although it would be nice if we had non-portable supersets per arch, so that e.g. all x86 compilers would do this the same way).
I think register is closer to const, as in: it's a hint to the programmer not the compiler.
So if you want to make absolutely sure that a variable can always be in a register then you should consider adding the register specifier to stop other programmers from taking the address of that variable.
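The compiler does enforce the promise; a sketch (this deliberately does not compile):

    void count_things(void) {
        register int counter = 0;
        int *p = &counter;  /* error: address of register variable requested */
        (void)p;
    }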
Taking the address of a variable does not prevent the compiler from putting it in a register. Indeed, it can be convenient to have small utility functions which are always inlined, which take pointers to their results; this should not and does not prevent those results from staying in registers.
Correct, but preventing people from taking the address of things actually makes a difference for certain constructs in the standard (for example, when it comes to trap representations).
Yes, you can treat "register" as a hint "no pointer to this ever exists anywhere in this program" for the reader. But you can only use it for locals, and that kind of metadata would be most useful (to other devs) on globals and fields - they can already see if a local ever has & applied to it or not.
OTOH I didn't recall this bit, but apparently you can also apply it to arrays. Which keeps them indexable, but you can't take addresses of elements nor of the whole array anymore. Now I'm not sure if the compiler can do anything useful with this, but it would allow it to play with alignment and padding - e.g. storing "register bool x[10]" as 10 words rather than 10 bytes. Is there any architecture on which that would be beneficial, though?
Never quite understood why compound literals are lvalues, but fine, whatever, I guess, it's so that you can write "&(struct Foo){};" instead of "struct Foo tmp; &tmp;"... which, on a tangential note, reminds me about Go: the proposals to make things like &5 and &true legal in Go were rejected because "the implied semantics would be unclear" even though &structFoo{} is legal and apparently has obvious semantics.
It's useful when a function has an out or in/out struct parameter whose value at the end you're not interested in. Or in functions where the struct is an input parameter, but they return it as a return value too, which you can then assign to a pointer variable or immediately pass to another function.
Note that the struct values thus created have longer lifetimes than temporary C++ objects created directly inside the argument list of a function call.
In C compound literals have a relatively long lifetime compared to C++ temporaries. With these lifetime rules it makes sense that they are lvalues, although I like C++ rvalues (especially prvalues) more.
> If the compound literal occurs outside the body of a function, the object has static storage duration; otherwise, it has automatic storage duration associated with the enclosing block.
Compound literals are anonymous variables, i.e. like variables except that there is no label visible in the C program. If you look at the assembly you'll see that they are declared exactly like a variable except with a compiler generated internal label.
That's extremely useful for invoking a function that takes a pointer to a struct when you don't want to 'taint' the code with a temporary variable that's only needed for the function call:
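For instance (a sketch using POSIX nanosleep):

    #include <time.h>

    void pause_briefly(void) {
        /* The compound literal is the anonymous temporary. */
        nanosleep(&(struct timespec){ .tv_sec = 1 }, NULL);
    }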
The enum idea is interesting. I've previously used an extern with a conditional size of either 1 (valid) or -1 (invalid). This requires no additional boilerplate, and is #define-able into a static assert when built with a recent enough compiler. Something like this, from memory:
#define STATIC_ASSERT(COND) extern char static_assert_cond_[(COND)?1:-1] /* C99 or earlier */
#define STATIC_ASSERT(COND) _Static_assert(COND, #COND) /* C11 or later; C23 allows dropping the message */
As both are declarations, I don't think you'll end up in a situation where one is valid and the other isn't - but I could be wrong, and I suspect it would rarely matter in practice anyway.
The C training course at a popular UK training company (The Instruction Set) had Duff's device on something like page 5 of their C course - expunging it was one of the first things I did when I joined them. There were many others.
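For anyone who hasn't met it, Duff's device interleaves a switch with a do-while to unroll a copy loop; it's legal C, which is exactly why it keeps turning up where it shouldn't (this is the byte-copy variant, not Duff's original output-register version):

    /* Duff's device: count must be greater than 0. */
    void copy_bytes(char *to, const char *from, int count) {
        int n = (count + 7) / 8;
        switch (count % 8) {
        case 0: do { *to++ = *from++;
        case 7:      *to++ = *from++;
        case 6:      *to++ = *from++;
        case 5:      *to++ = *from++;
        case 4:      *to++ = *from++;
        case 3:      *to++ = *from++;
        case 2:      *to++ = *from++;
        case 1:      *to++ = *from++;
                } while (--n > 0);
        }
    }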
Designated init and compound literals were added in C99. I think there are two reasons for those features not being better known:
1) C++ 'forked' their C subset before C99 (ca. "C95"), and while C++20 finally got its own version of designated init, this has so many restrictions compared to C99 that it is basically pointless.
2) MSVC hasn't supported any important C99 features until around 2016
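For reference, the C99 forms in question:

    struct config { int width, height; const char *title; };

    /* Designated initializers: name the members you set; the rest are zeroed. */
    struct config c = { .width = 640, .title = "demo" };

    /* Sparse array designators work too. */
    int lut[256] = { [0] = 1, ['/'] = -1, [255] = 7 };

    /* Compound literal: an unnamed object usable in an expression. */
    struct config d = (struct config){ .height = 480 };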
There is a reason why modern languages use keywords like "func", "def", "fn", "var", "let" to discern between different types of declarations, for example. I don't think many languages are LL(k) (please correct me if I'm wrong), but C is as far away from that as it gets, for small k.
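The classic illustration: whether the tokens `T * x;` form a declaration or an expression depends on what `T` names, which a fixed-lookahead parser can't know without the symbol table (the "lexer hack").

    typedef int T;

    void f(void) {
        T * x;   /* a declaration: x is a pointer to int */
        (void)x;
    }
    /* If T were a variable rather than a type, the same tokens would
       parse as the multiplication expression T * x. */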
I've thought it would be nice to add func to the language. Even if it's just sugar it would help. Type inference + allowing functions to return an anonymous struct or a tuple would be super.
The simplest languages tend to be the most difficult. Brainfuck, Binary Lambda Calculus, Unlambda, and other "Turing tarpits" are all extremely difficult to use for anything even mildly complex.
Stack machines look cool until you realize you could save all the stack mumbling by using registers. At this point, you could try to make a full VM instead.
I always felt like Unlambda and other SKI-calculus-esque esolangs (Iota comes to mind) could have some kind of strange use case in generalised genetic programming. It should be possible to create a binary notation for SKI calculus where arbitrary bitstrings will be valid, and so one could randomly mutate and recombine arbitrary programs. Though I've never delved deeper into genetic algorithms and evolutionary programming, my sense is that genetic algorithms tend to be restricted to parameterised algorithms where the "genes" determine the various parameters. Which can be great for optimisation problems.
It's one of those weird ideas I've had kicking about for years but never did anything about, and yet I keep coming back to it.
I assume that an invalid program would not compile/parse, and so would die and fail to reproduce. The issue is more that if the space of invalid programs is too large compared to the space of valid ones, generating valid offspring by combining two programs would be too rare and the population would die off.
Though if the space is small enough I imagine you could get past that. It's a bit of a gnarly point, hard to tell how this would turn out without trying I suppose.
As for the halting problem there's of course no clever solution there other than limiting CPU time. So I guess pick a reasonable limit that makes sense for whatever you're trying to do.
One non-obvious thing about named function types is that they can also be used to declare (but not define) functions:
typedef void func(int);
func f;
void f(int x) {}  /* a parameter name is required before C23 */
I don't think I've ever seen a practical use for this in C, though. In C++, where this also works, and extends to member functions, this can be very occasionally useful in conjunction with decltype to assert that a function has signature identical to some other function - e.g. when you're intercepting and detouring some shared library calls:
int foo();
decltype(foo) bar;
I suppose with typeof() in C23 this might also become more interesting.
Oh yes! I totally agree with that approach; I find it much clearer to typedef the function type rather than the function pointer type:
typedef int (*CallbackPtr)(int,int);
void foo(..., CallbackPtr callback);
always reads less clearly to me than
typedef int Callback(int,int);
void foo(..., Callback* callback);
I find this especially useful in C++ where the callback type could conceivably be some other type like std::function. Seeing that * helps me know at a glance it's probably just a plain old function pointer.
Though I think maybe clearest of all is to not use a typedef, provided it doesn't cause other readability problems:
void foo(..., int (*callback)(int,int));
(Not meaning to steal your thunder here... just wanted to write out an example in case anyone else was curious.)
At that point you might as well typedef the function pointer type anyway, since it's just another *, and const/volatile variations for functions don't make sense.
int n = 3, m = 4;
/* `n` and `m` are variables with dimensions known at runtime, not compile time */
int (*matrix_NxM)[n][m] = malloc(sizeof *matrix_NxM);
if (matrix_NxM) {
    // (*matrix_NxM)[i][j] = ...;
    free(matrix_NxM);
}
Well, that makes a few things I'm doing atm much easier; really glad I read it.
There are three macros which I find indispensable and which I use in all my C projects, namely LEN, NEW and NEW_ARRAY. I keep them in a file named Util.h:
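(The definitions aren't shown in the comment; the following is a guess at what macros with these names conventionally look like, not the commenter's actual Util.h.)

    #include <stdio.h>
    #include <stdlib.h>

    /* Element count of a (true) array, not a pointer. */
    #define LEN(a) (sizeof (a) / sizeof (a)[0])

    /* Allocate one object / n objects, sized from the pointee type. */
    #define NEW(p) NEW_ARRAY(p, 1)
    #define NEW_ARRAY(p, n)                              \
        do {                                             \
            (p) = malloc((size_t)(n) * sizeof *(p));     \
            if ((p) == NULL) {                           \
                fprintf(stderr, "allocation failed\n");  \
                exit(EXIT_FAILURE);                      \
            }                                            \
        } while (0)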
I found a lot of bugs went away when I switched to STL (Standard Template Library) arrays and ditched managing my own memory. That's C++, I guess it's not available in straight C?
>That's C++, I guess it's not available in straight C?
No, because C doesn't have templates. The best you can do for a "vector" in C is macros like the above that also realloc, or writing an API around structs for each type.
Too bad. STL deques are non-contiguous, and allow for much bigger arrays. I had an application that used vector<> and ran out of contiguous memory; deque<> solved the problem.
I wish there was a language "between" assembly and C: basically assembly with some quality-of-life improvements.
Shortcuts to reduce redundant chores (like those multiple instructions to load one 64-bit number into an ARM register) but minimal "magic" or unintended consequences as in C. Things like maybe a function call syntax like:
I don't believe those are functionally / semantically equivalent - 'couldn't care less' does imply a min() value of care.
In contrast, the author is suggesting a comparative only.
And, on careful re-reading, I suspect the author is having a play on syntax & semantics here -- the context of the quote is:
> You may ask, since when C has such operator and the answer is: since never. --> is not an operator, but two separate operators -- and > written in a way they look like one. It's possible, because C cares less than more about whitespace.
Given that '--' is decrement (kind of 'lessen') and > is greater than (kind of 'more'). Perhaps I am reading too much into that.
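In code:

    #include <stdio.h>

    int main(void) {
        int i = 5;
        while (i --> 0)        /* tokenized as: while ((i--) > 0) */
            printf("%d ", i);  /* prints: 4 3 2 1 0 */
        return 0;
    }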
(I feel 'couldn't care less' is perhaps more common in northern America than elsewhere, and while TFA has a Gabon TLD, appears to be resident in Poland, so automatically receives a lot of leeway in their use of idiomatic English.)
> The 0 width field tells that the following bit fields should be set on the next atomic entity (char).
This isn't correct, since int can't be less than 16 bits. Fields are placed on the nearest natural alignment for the target platform, which might not support unaligned access.
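The zero-width field in question:

    struct reg {
        unsigned a : 3;
        unsigned   : 0;  /* unnamed zero-width field: the next bit-field
                            starts at the next allocation-unit boundary */
        unsigned b : 5;  /* b no longer packs into the same unit as a */
    };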
C is fundamentally confused, because it offers (near) machine-level specifications but then leaves just enough wiggle room for compilers to "optimize" (through alignment and such) while ruining the precision of a specification. You end up not getting exactly what you want at the machine level. It's infuriating.
The bitfield stuff in C would be fantastic if it weren't fundamentally broken. E.g. some Microsoft compilers in the past interpreted bit fields as signed...always. In V8 we had a work around with templates to avoid bitfields altogether. Fail.
I think the problem with this is the C compiler has to find a solution which works with all the architectures it is expected to support. In order to achieve this, it must generalize in some areas and have flexibility in others. C programmers are required to be familiar with both the specifics of the architectures they are building for and the idiosyncrasies of their compiler. I always assumed most other compiled languages were like this since I started with C and moved to x86 assembly from there. However, the more I read about people disliking C for this reason, the more I believe this may not be the case.
> The bitfield stuff in C would be fantastic if it weren't fundamentally broken
Bitfields in C can be manageable. Each compiler has its own set of rules for how it prefers to arrange and pack them. Despite them not always being intuitive, I use them regularly since they are so succinct. If you are concerned about faithful and predictable ordering, you generally just have a test program which uses known values to verify your layout as part of your build configuration or test battery.
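A sketch of such a layout probe; the assertion assumes LSB-first packing within the unit and that raw overlays the unit's first byte (true for GCC on little-endian x86, but implementation-defined in general):

    #include <assert.h>

    union probe {
        struct { unsigned lo : 4, hi : 4; } bits;
        unsigned char raw;
    };

    int main(void) {
        union probe p = { .bits = { .lo = 0xF, .hi = 0x0 } };
        /* Fails at runtime if the compiler packs the fields MSB-first. */
        assert(p.raw == 0x0F);
        return 0;
    }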
> ... some Microsoft compilers ...
I've used many C compilers but I have always avoided Microsoft ones, going so far as to carry a floppy disc with my own when working in the lab at school.
For the use case of specifying a more efficient representation of a fiction confined to the program, no harm, no foul. But for the use case of specifying a hardware-, network-, or ABI-specified data layout, you need those bits in exactly the right spot, and the compiler should have no freedom whatsoever. (I'm thinking of network protocol packets and hardware memory-mapped registers.)
This shouldn't be an issue unless you are attempting to have the bit field span across your type boundary. Bit fields by definition are constituents of a type. That doesn't restrict you from putting the bits where you want them, only how you accomplish it. In this case, you'd either have to split the value across the boundary into two fields or use a combination of structs/unions to create an aliased bit field using a misaligned type (architecture permitting, of course). You either sacrifice some convenience (split value) or some readability (union), but it is still reasonable.
The compiler itself is not taking a liberal approach to bit field management, it is only working within the restriction of the type (I am speaking for GCC here, I can't vouch for others). But if you think of them as an interface to store packed binary data freely without limitations I can understand why they seem frustrating. They are much more intuitive when you consider them as being restricted to the type.
I’ve seen them used for hardware register access a lot. But there was usually a separate set of header files for when you are not using GCC/Clang - I didn’t look at those.
The entire placement of bitfields is implementation defined.
So yeah, the placement in memory might be `xxxxx000 00000000 yyyyyyy0` but it could also be `yyyyyyy0 00000000 xxxxx000` or `yyyyyyy0 00000000 00000000 xxxxx000` or anything else.
Bitfields are very misunderstood and really only safe to treat as an ADT with access through their named API, not their bit placement ABI. People misuse them a lot.
You can just expand your example to use 16-bit values or switch to uint8_t. Bitfields with signed integers are also a minefield so it's best to never attempt it.
Can't say for all of them, but I am reasonably certain C++ does not support designated initializers/sparse array definitions. Some of these features were added in more recent revisions of the C specification, from which C++ has diverged. I would expect most of the differences to become more pronounced starting with C99.
I remember once upon a time I thought C was fairly simple, so I decided to write a program to generate ASTs from C programs. I was very wrong and it was kind of a nightmare. There are so many weird little quirks or lesser-used features that I never saw in the wild even in large production codebases; I feel like you really don't _need_ a lot of these features. I can't imagine doing proper compiler work, especially for something like C++. Nice article.
> I remember once upon a time I thought C was fairly simple, so I decided to write a program to generate ASTs from C programs.
Oh man, I think we all have been this young and naive at some point.
I have spent time working with compilers for this purpose (having realized I did not want to attempt parsing source and generating the AST) and decided it is much easier to let them do the work. That being said, it can still be more than a handful (both GCC and Clang have their eccentricities) and depending on how you are using it you still might be in over your head.
When you start a project like this and end up failing because you simply do not have the depth of knowledge or time to see it to completion, it often feels a bit demoralizing from the loss of investment. Truthfully though, having started many such ventures (emulators for 6502 and 80386 to name a few), you get all the benefit of experience from working on difficult problems without the misery of debugging and model checking until everything is more or less perfect. It's great fun, you learn a lot, and you should never avoid trying simply because it might be too much to handle.
It’s cool to have these, it’s fun to use them for fun.
But please don’t use them in production code.
Also don’t assume most of them will be known by other developers.
Well, it’s either a lesser known trick or it’s something people should be using.
In general, using lesser-known tricks is not a good idea for production code. But I understand there are cases where there is no good alternative, so it’s warranted.
My point is that these aren't "lesser known tricks". They're important language features which solve real problems and which anyone writing production C should be at least aware of, if not actively using for the advantages they provide.
I was taking the storyline for granted, and assumed they are indeed lesser-known tricks. To me it’s never about what I know, it’s about what the others know; my production code is not mine alone, and I also don’t want to be responsible for it forever. So I try to be explicit and use no tricks wherever possible.
I agree with this in principle, but where do you draw the line between language features (good) and obscure tricks (bad)? Is your team really writing only ANSI C89 from the K&R 2nd edition book and ignoring the last ~35 years of language improvements?
Some of those are regular modern C features and definitely should be used, most importantly compound literals and designated init, since they make the code both safer and more readable (and I blame the Visual Studio compiler team for dragging their asses for 16 years to support at least a subset of C99 for those features to be 'lesser known').