A macro is effectively preprocessing facilitated by the language. You could always preprocess externally if you wanted, and there's nothing stopping you from doing that in the "Powerful Language (TM)" either.
Now whether people use macros and preprocessing usefully is another question, but not one to which the answer is "abolish macros for more language features". When used correctly, macros ARE power.
Macros are useful, as long as they're used sparingly. I think that in this case, it's used well - the struct is still perfectly readable, and the sole purpose of it is to make it so that you don't have to manually name the dummy fields. But you could totally just write out dummy1, dummy2, dummy3 etc. yourself if you want to get rid of the macros.
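For illustration, such a macro often looks something like this (a sketch only, since the original isn't shown here; PAD and reg_map are made-up names, and __COUNTER__ is a common gcc/clang/MSVC extension, with __LINE__ as a standard fallback when each use sits on its own line):

#define CONCAT_(a, b) a##b
#define CONCAT(a, b) CONCAT_(a, b)
#define PAD(bytes) unsigned char CONCAT(dummy_, __COUNTER__)[bytes]

struct reg_map {
    unsigned int ctrl;
    PAD(4);    /* expands to something like: unsigned char dummy_0[4]; */
    unsigned int status;
    PAD(8);
};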
> Macros are useful, as long as they're used sparingly.
Everybody says that. Everybody believes it. And everybody goes to town making a rat's nest with macros, just like that snarl of cables under my desk that resists all attempts to make it nice.
Myself included. I've even written an article about clever C macros. Look, ma! I was so proud of myself.
But then I got older. I started replacing the macros in my C code with regular code. It turns out they weren't that necessary at all. I liked the C code a lot better when it didn't have a single # in it other than #include.
I want to be clear about your meaning, because I don’t know if I’m reading your comment correctly. Are you referring explicitly to syntax-based, preprocessor macros? Or does your comment extend to other metaprogramming techniques? I am inclined to think you mean the first, considering the amount of emphasis on generic programming in D. Just curious.
I'm referring to both syntax based (AST) macros, and text based (preprocessor) macros. The latter, of course, are much worse.
An example of the former is so-called "expression templates" in C++. I've seen them used to create a regular expression language using C++ expression templates. The author was quite proud of them, and indeed they were very clever.
However nice the execution, the concept was terrible. There was no way to visually tell that some ordinary code was actually doing regular expressions.
C++ expression templates had their day in the sun, but fortunately they seem to have been thrown onto the trash pile of sounds-like-a-good-idea-but-oops.
(I wrote an article showing how to do expression templates in D, mainly to answer criticisms that D couldn't do it, not because it was a good idea.)
> Anytime macros are used for metaprogramming, it's time to reach for a more powerful language.
> I'm referring to both syntax based (AST) macros ...
This surprises me greatly. Various lisps are among the most powerful languages I know of and a large part of the reason is macros coupled with their ability to execute arbitrary code at compile time (which itself uses additional macros, which in turn invoke more code, and so on). What's your take on this?
I'm a noob when it comes to Lithp. But I'm told what happens with their macros is the language is fairly unusable until you write lots of macros. The macros then become your personal undocumented wacky language, which nobody else is able to use.
I've seen this happen with assembler macro languages, too.
> most powerful
It's like putting a 1000 hp motor in a car. Its main use is to wreck the car and kill the driver.
BTW, D is the first language of its type (curly brace static compilation) to be able to execute arbitrary code at compile time. It started as kind of "let's see what happens if I implement this", and it spawned an explosion of creativity. It has since been adopted by other languages.
> the language is fairly unusable until you write lots of macros
This is not (typically) the case. It would be like saying that you need to write lots of templates to get things done in D. Metaprogramming is certainly very nice to have but it's not a requirement for the vast majority of tasks.
It's important to note that Lisps are an entire family of languages; some implementations are batteries included while others are extremely minimal. Where things can get a bit confusing is that many macro implementations are so seamless that significant pieces of core language functionality are built in them. Schemes tend to take this to an extreme, with many constructs that I would consider essential to productive use of the language provided as SRFIs.
> macros then become your personal undocumented wacky language
That's Doing It Wrong™. You could as well argue to remove goto from a language because sometimes people abuse it and write spaghetti. C++ has operator overloading. D has alias this. If (for example) a DSL is the appropriate tool then being able to use macros to integrate it seamlessly into the host language is a good thing.
> it's not a requirement for the vast majority of tasks.
Right, but the temptation to do it is irresistible.
> That's Doing It Wrong™
Of course it's doing it wrong. The point is, that seems to always happen because the temptation is irresistible.
> goto
I rarely see a goto anymore. It just doesn't have the temptation that macros do.
> alias this
Has turned out to be a mistake.
> integrate it seamlessly into the host language is a good thing
Supporting the creation of embedded DSLs is a good thing. Hijacking the syntax of the language to create your own language is a bad thing. I've seen it over and over, it never works out very well. It's one of those things you just have to experience to realize it.
D's support for DSLs comes from its ability to manipulate string literals at compile time, generate new strings, and mixin those strings into the code. This is clearly distinguishable in the source code from ASTs.
Fair enough, it seems we have very different philosophies on this particular issue. Without oft-maligned features such as alias this and operator overloading I never would have given D a try as an alternative to C++.
I'd like to suggest that I think you might be missing some perspective here. You say that misuse of macros always happens and that you've seen it over and over. Yet if you explore the Scheme ecosystem you might notice that significant parts of any given implementation often take the form of macros. Racket in particular fully embraces the idea of the programmer mixing customized languages together and while examples of bad code certainly exist it seems to work out quite well on the whole.
To be clear, I do appreciate having easy access to tools that are simple and safe. I just also like having seamless access to and interop with a set of powerful ones that don't try to protect me from my own poor decisions. I shouldn't need to do extra work to make use of an alternative more powerful tool for a small part of a project. At that point it becomes very tempting to drop the safer tool altogether in favor of the more powerful one just to avoid the obviously needless and therefore particularly irritating overhead.
I much prefer the approach of providing limited language subsets that can be opted into and out of in a targeted manner. Having the compiler enforce a simple one by default provides a set of guard rails without getting in the way when it matters.
If I could write the majority of my code in something resembling Go and just a small bit of it in an alternative dialect with expressive power comparable to Common Lisp that would be ideal. To that end, I'm a huge fan of features like @system, @safe, and @nogc in D while very much disliking the need to use string mixins to write a DSL, the various restrictions placed on CTFE behavior, and other similar things.
> BTW, D is the first language of its type (curly brace static compilation) to be able to execute arbitrary code at compile time. It started as kind of "let's see what happens if I implement this", and it spawned an explosion of creativity. It has since been adopted by other languages.
I don't know much about D compile time evaluation. How is it better than macros/templates?
Also, what do you think about Haskell's and Rust's approach of generics with typeclass/trait bounds?
> D compile time evaluation. How is it better than macros/templates?
CTFE isn't better than templates, it's a completely different tool. CTFE computes a result at compile time as though you had written a literal in the source code. Templates generate blocks of specialized code on the fly based on various parameters (typically types). They solve different problems.
I think that Walter was trying to say that D gets by doing the same things that people use macros for, but without macros. I interpreted this as meaning that D uses CTFE in those same situations. Am I wrong?
I think he was saying that sufficiently powerful languages provide enough features that you won't need macros in the first place. If you feel that you do for some reason, go find a more powerful language instead.
> What does D do, then?
Most mainstream languages, D included, very intentionally don't provide any features that could potentially be used to extend the language itself on the fly. (At least not in a straightforward manner. Obviously the C preprocessor kind of sort of facilitates a bit of this.)
As a counterexample, Rust does provide some of this in the form of procedural macros but doesn't provide (to the best of my knowledge) an equivalent to Lisp reader macros.
I don't know what code Walter is referring to but I had similar experiences during a presentation of the Boost Spirit library (at our Silicon Valley C++ Meeting; must be about a decade ago now).
The speaker was proud and beaming for showing how powerful C++ is and the audience was in awe.
I was incredulous! Jaw open! The "solution" was horrible with a bunch of workarounds for a bunch of shortcomings. It was a "the emperor does not have clothes" moment for me.
Boost Spirit may be better today with newer C++ features; I don't know.
Oh man, you sure know how to drive a stake in my heart! What have I done! Some curses should just not be uttered. I'm not sure where it is on my backups.
But it was done just like you'd do it in C++. The same thing, just with D syntax.
We're making a numerical computing library in which expression templates are used for compile-time evaluation of expressions. It's a very niche HPC library, so it might be one of the few places where it's appropriate.
Doesn’t work in a lot of cases unfortunately. If you’re writing a library designed to be consumed by other languages, you’re stuck with writing C-ABI-compatible code, which other languages can “extern”, but that puts limits on what’s possible in those libraries.
It is a terrible thing. It's possible to do without them, and you'll like your code better. Your symbolic debugger and syntax directed editor will work properly. The poor schlub who has to fix the bugs in your code after you leave will be grateful. Your spouse will be happy and your children will prosper.
For example,
#define foo(x) ((x) + 1)
Replace with:
int foo(int x) { return x + 1; }
The compiler will inline it for you. There's no penalty.
#define FOO 3
Replace with:
enum { FOO = 3 };
or:
const int FOO = 3;
Replace:
#if FOO == 3
bar();
#endif
with:
if (FOO == 3) bar();
The optimizer will delete bar() if FOO is a manifest constant that is not 3.
Having made a significant contribution to Simon Tatham's PuTTY, I have to respectfully disagree. At the time the entire SSHv2 key exchange and user authentication protocols were implemented in a single function using a massive Duff's device (for asynchrony) implemented with C metaprogramming macros. It was a surprisingly pleasant experience.
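For the curious, the core of the technique looks roughly like this; a minimal sketch in the style of Tatham's published coroutine macros (crBegin/crReturn/crFinish), not the actual PuTTY code:

#define crBegin     static int crLine = 0; switch (crLine) { case 0:
#define crReturn(x) do { crLine = __LINE__; return (x); case __LINE__:; } while (0)
#define crFinish    }

int next_value(void) {
    static int i;             /* locals must be static to survive the returns */
    crBegin;
    for (i = 0; i < 3; i++)
        crReturn(i);          /* yields i; the next call resumes right here */
    crFinish;
    return -1;
}

The switch jumps back into the middle of the loop on re-entry, which is exactly the Duff's device trick; it falls apart if two crReturns share a line, but it lets a protocol state machine read as straight-line code.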
I think it was RMS who argued that conditional compilation should be considered harmful in general. Code that you put in an inactive #if(def) block will not be maintained, and is basically guaranteed to rot. If it's needed in the future, it'll likely have to be rewritten from scratch.
According to this stance, any code that's suppressed by the C preprocessor should either be written in an if {} statement so that it will at least continue to compile as the surrounding code changes, or be replaced with comments describing what it does (or did), if it's important enough to keep track of.
Can't really think of many good counterarguments to this. Machine dependence might be one, but then you could argue that the preprocessor is being used to cover up for an inadequate HAL.
Making it clear in the code itself that something is platform specific is useful as well. Also, I'd say such stubs and extra files would clutter the project.
I tried it your way for years. It's common practice in the industry. I have written and seen plenty of rats' nests of #if and #ifdef snarls so bad one cannot figure out what is being compiled and what isn't without running a just-the-preprocessor pass. It will often #if on the wrong condition, like operating system when it's a CPU dependency.
Rewriting it the way I suggest tends to force a cleanup of all that. You'll like the results.
BTW, if you want to try transitioning to the next level, try getting rid of all the `if` statements, too!
They can be made to work by introducing stubs for every os/arch/compiler/dependency I support, but that's way more work for dubious gain.
And who is to say I'm doing it wrong? Just because conditional compilation CAN lead to a rat's nest doesn't mean it MUST. And if it happens to be well-organized and works as is, there is no problem.
Oh, have I heard that before. Here's what happens, over the years, to such code:
1. #ifdef's on the wrong feature. For example, #ifdef on operating system for a CPU feature.
2. Overly complex #if expressions that steadily get worse.
3. Fixing support for X that subtly breaks F and Q support. It goes unnoticed because your build/test machines are not F and Q.
4. People having no idea what #defines to set on the command line, nor what the ones that are set are doing (if anything).
5. People having no idea how to fold in support for new configurations.
6. Code will #ifdef on predefined macros that have little or no documentation on what they're for or under exactly what circumstances they are set. If the writer even bothered to research that. gcc predefines what, 400 macros?
> Just because conditional compilation CAN lead to a rat's nest doesn't mean it MUST.
> Here's what happens, over the years, to such code:
Over how many years? Some of my projects are 30+ years old and support a few dozen combinations of OSes, compilers, and architectures, and the method I described hasn't led to a rat's nest yet.
> #ifdef's on the wrong feature. For example, #ifdef on operating system for a CPU feature
Sure, same thing happens when you decide whether to compile in your stubs or not. But if I make a mistake, the code usually won't compile on the configuration in question; if you make a similar mistake, it'll compile with your stub, it might even link, but you'll get odd runtime behavior. Sounds worse.
> Overly complex #if expressions, that steadily get worse
All code gets worse over time and bit rots if you don't maintain it, I don't see things necessarily worse in this area.
> Fixing support for X that subtly breaks F and Q support. It goes unnoticed because your build/test machines are not F and Q
If you aren't testing on tier one configurations, they shouldn't be tier one. And if a tier three configuration breaks when you do finally get around to testing it, it's tier three, so it gets fixed when it can be and there's no problem. Similar to normal code, I'd say.
> People having no idea what #defines to set on the command line, nor what the ones that are set are doing (if anything)
> People having no idea how to fold in support for new configurations
These problems exist for normal code as well. Someone has to own the build configurations, and they have to be maintained, like anything else.
> Code will #ifdef on predefined macros that have little or no documentation on what they're for or under exactly what circumstances they are set. If the writer even bothered to research that. gcc predefines what, 400 macros?
We only use a handful, and they are fairly well documented. But even if they weren't, since the code works on all tier one platforms, we'd know if something was changed out from underneath us because it was underspecified or accidentally relied upon.
But again the same can happen with normal code and system headers, compiler extensions, etc - I don't see why pre-processor macros have to be singled out here.
One of the very few things from C that I miss in C++ is anonymous structs and enums. I really don’t understand why they are not allowed.
That is, C style enums don’t have to have a name but “type safe” (enum class) ones do. One classic use is to name an otherwise boolean option in a function signature; there’s typically no need to otherwise name it.
C++ incompatibly requires a name for all struct and class declarations, again a waste when you will only have a single object of a given type.
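For reference, the C idiom in question looks like this (a small sketch, with illustrative names):

enum { MAX_RETRIES = 5 };   /* unnamed enum used as a plain constant */

struct {
    int x, y;
} origin = { 0, 0 };        /* a single object of a type that's never named */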
> I really don’t understand why they are not allowed.
I don't, either. They've been in D since 2000 or so.
I also don't understand why `class` in C++ sits in the tag name space. I wrote Bjarne in the 1980s asking him to remove it from the tag name space, as the tag name space is an abomination. He replied that there was too much water under that bridge.
D doesn't have the tag name space, and in 20 years not a single person has asked for it.
This did cause some trouble for me with ImportC to support things like:
struct S { ... };
int S;
but I found a way. Although such code is an abomination. I've only seen it in the wild in system .h files.
The only explanation I saw was that C++ standards guys were horrified by the idea of unpredictable side effects as a result of initialization of a struct.
I think C++ is adding them, though.
What I'd like in C is designated function parameters.
// these are the same
bar(.a = 10, .b = 12);
bar(.b = 12, .a = 10);
The main drawback is that all parameters are now optional: it will not complain if you forget to assign some of them; it will silently set them to 0 :-/
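For what it's worth, this can be approximated in standard C99 with a small struct and designated initializers (a sketch; bar_args and bar_impl are made-up names), and the silent zeroing mentioned above falls straight out of compound-literal initialization rules:

struct bar_args { int a; int b; };

void bar_impl(struct bar_args args) { /* use args.a, args.b */ }

#define bar(...) bar_impl((struct bar_args){ __VA_ARGS__ })

int main(void) {
    bar(.a = 10, .b = 12);   /* same call as... */
    bar(.b = 12, .a = 10);   /* ...this one; unset fields become 0 */
    return 0;
}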
> The only explanation I saw was that C++ standards guys were horrified by the idea of unpredictable side effects as a result of initialization of a struct.
I don't understand. How would struct or class initialization be any different from simply doing, say, `for (auto& a : { x, y, z }) frob (a);` which is perfectly legal?
I didn't mention it. I think the concern was: with designated initializers, what is the order of initialization? The order of the elements in the struct, or the order in which they're written? In C it probably matters little, as side effects are usually blatant; in C++, I think cryptic side effects are common.
No, they are forbidden by the standard (take a look at cppreference). Some compilers implement the C behavior as an extension, so tell your compiler to follow the standard strictly.
I don’t use extensions, even convenient ones, as I have to be able to run my code on a variety of compilers. If you don’t have to do that, some extensions (like this one) are really handy.
From the standard, the enum-name is marked as optional in the enum grammar in [dcl.enum](https://eel.is/c++draft/dcl.enum#11); it is also referenced e.g. in [dcl.dcl]:
> An unnamed enumeration that does not have a typedef name for linkage purposes ([dcl.typedef]) and that has a first enumerator is denoted, for linkage purposes ([basic.link]), by its underlying type and its first enumerator; such an enumeration is said to have an enumerator as a name for linkage purposes.
> A class-specifier whose class-head omits the class-head-name defines an unnamed class.
So both are entirely fine (and likewise, unions are too).
Note that my links are for the current draft, but I just checked and this was already the case as far back as C++11. So I wonder where this persistent myth seems to come from.
This is awesome. I referenced cppreference, but that is not authoritative. Unfortunately, in the final draft, [class.pre] grammar makes the name mandatory even though the language you quote remains in the first textual paragraph following the grammar specification!
The part of enums you quoted was C-compatible enums; anonymous scoped enums are explicitly forbidden: "The optional enum-head-name shall not be omitted in the declaration of a scoped enumeration" (dcl.enum 2).
Sigh. I will send in a clarification at least on the class/struct/union side. Ideally the grammar would be fixed rather than that paragraph.
> Unfortunately, in the final draft, [class.pre] grammar makes the name mandatory even though the language you quote remains in the first textual paragraph following the grammar specification!
Each instance of "enum struct { abandon, save }" would denote a different type, yes? How would you write a compatible definition to go with your prototype?
I don’t care that they are different types; if anything that would be a feature.
The point is to prevent the “mysterious bool arguments” class of error.
The question is if ADL could infer the scope of the enum, as template instantiation can now infer the right thing and doesn't always need the <T> notation.
I agree that you would want them to be different types; otherwise you would just use an unscoped enum. But that implies that if you write…
/* example.h */
void f(enum struct { x, y } arg);
/* example.c */
void f(enum struct { x, y } arg) {
    /* do something with arg */
}
…then you've just created a function with two different overloads based on distinct anonymous types which just happen to be spelled the same way. Without a type name I don't see any way you could define a function whose prototype would be compatible with the forward declaration. You also have conflicting definitions of "x" and "y" with the same names and scope but different types. Perhaps with GNU extensions you could use typeof(x) for the argument and avoid the conflict, but that isn't standard C++.
IMO you can still be explicit about field offsets by writing the struct in the usual way and using static assertions to ensure the offsets match the intended layout.
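Something like this, as a C11 sketch (header is a made-up example; static_assert lives in assert.h, offsetof in stddef.h):

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct header {
    uint8_t  tag;    /* intended offset 0 */
    uint8_t  flags;  /* intended offset 1 */
    uint16_t len;    /* intended offset 2 */
    uint32_t crc;    /* intended offset 4 */
};

static_assert(offsetof(struct header, len) == 2, "len must sit at offset 2");
static_assert(offsetof(struct header, crc) == 4, "crc must sit at offset 4");
static_assert(sizeof(struct header) == 8, "no unexpected padding");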
All undefined behaviour is "well defined" for each compiler; what it really means is implementation-defined and subject to change without notice or documentation with every compiler version, host OS, set of enabled flags, or a thousand other things.
Why use an approach which strictly relies on specific versions of specific compilers, rather than a completely portable and standards-compliant struct with a char array and a few char pointers? Or, if you want a convenient interface and aren't explicitly writing for the kernel, switch to a restricted subset of C++ and do it right?
What you're saying is very close to the common fallacious idea about UB, which is that the compiler and computer are machines, and therefore must have some deterministic behaviour when they encounter any given piece of code. But the point and meaning of the term "undefined behaviour" is that it describes the result of operations which are off limits for legal programs.
There is nothing good to be gained from trying to claim that GCC extensions are actually a kind of undefined behaviour. If you know what is meant by UB in the C standard, you won't muddy the waters like this.
"Somebody once told me that in basketball you can't hold the ball and run. I got a basketball and tried it and it worked just fine. He obviously didn't understand basketball."
https://blog.regehr.org/archives/213
GCC explicitly states and documents that type punning through unions will work as expected.
This doesn't work by accident. It works very explicitly by design of GCC.
Different GCC versions aren't randomly going to change documented behavior. And when they accidentally do, they will consider it a bug.
Regardless, type punning through a union is not undefined behavior in C in the first place [1] [2]. On the other hand, it is undefined in C++, but GCC allows it there too.
What is referred to as implementation-defined behaviour is a different class, yes, obviously. I guess nasal daemons was too complicated a piece of irony for you.
There are two kinds of undefined behaviour being invoked in using this. It's a horrible idea and a horrible code smell; get rid of it if you ever see something like this.
I don't see any undefined behavior here. As I mentioned below, gcc explicitly documents type punning via unions as being well defined. But yes, this is compiler specific and is not guaranteed to work elsewhere.
Accessing packed struct members works fine on x86, but will blow up at runtime or do weird things on platforms which don't support unaligned loads or stores.
The correct way to access packed structs is through memcpy, just like you'd access any other potentially unaligned object.
For architectures where unaligned accesses are illegal, gcc will generate multiple load/store instructions when accessing packed struct fields by name. The main caveat to look out for is taking the address of a packed struct member and then dereferencing it.
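A sketch of both points (msg is a made-up example; __attribute__((packed)) is the gcc/clang spelling):

#include <stdint.h>
#include <string.h>

struct __attribute__((packed)) msg {
    uint8_t  tag;
    uint32_t value;   /* sits at offset 1, so it's unaligned */
};

uint32_t read_value(const struct msg *m) {
    uint32_t v;
    /* m->value by name is fine: the compiler knows the field is packed and
       emits byte-wise loads where needed. memcpy is the fully portable route.
       The trap is stashing &m->value in a plain uint32_t * and dereferencing. */
    memcpy(&v, &m->value, sizeof v);
    return v;
}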
There is absolutely undefined behaviour there.
Undefined behaviour is defined not as nasal daemons but as:
The compiler implementer does not guarantee that this behaviour will be consistent across hardware, circumstances, compiler versions, or OSes, nor that we will warn if we change it.
Packed is technically not undefined behaviour, but it is certainly a trap. Especially because the compiler-detection macros lead people to write defines which select packed per compiler automatically. Then the "unrecognized compiler" case is just left empty, meaning the code still compiles but no longer does what you think.
You don't get to decide what UB means. It really does mean nasal demons are a possibility: all bets are off when you run that executable. Use of the term "undefined behaviour" to mean something else may be on the increase, unfortunately (https://mars.nasa.gov/technology/helicopter/status/298/what-...), but if we're talking about C, its meaning is fixed.
I'm using anonymous nested structs extensively for grouping related items, but I consider the extra field name a feature, not something that should be hidden:
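(A sketch of the shape I mean, with illustrative names:)

struct sprite {
    struct { float x, y; } pos;    /* the inner struct types are anonymous... */
    struct { int w, h; } size;     /* ...but each group keeps its field name */
};

Call sites then read s.pos.x or s.size.w, and the extra name documents intent.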
The result is uglier and less maintainable than a pair of macros. Or just stop trying to hide syntax. This is ultimately on the same level as typedefing pointers.
You're right; thanks for noticing and I've updated the first example. My C is a bit rusty these days and I didn't check it with a compiler the way I should have.
Doesn’t matter for C, but in C++ this could make your constexpr functions UB since you can only use one member of a union in constexpr contexts (the “active” member).
Hmm. How do C++ namespaces help with the structure naming problem in this example? They seem completely orthogonal.
C++ namespaces are a way to avoid library A's symbol "cow" clashing with library B's symbol "cow" without everything being named library_a_cow and library_b_cow all over the place which is annoying. I agree C would be nicer with such a namespace feature.
However this technique is about what happens when you realise your structure members x and y should be inside a sub-structure position, and you want both:
d = calculate_distance(s.x, s.y); // Old code
and
d = calculate_distance(s.position.x, s.position.y); // New
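Concretely, the union trick gives you both spellings at once; a minimal C11 sketch, assuming the two layouts match exactly:

struct s {
    union {
        struct { float x, y; };           /* anonymous: old s.x, s.y still compile */
        struct { float x, y; } position;  /* same layout, nameable as a unit */
    };
};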
First of all, C++11 may feel like thirty years ago, and certainly some of its proponents look thirty years older than they did at the time, but it was only ten years ago. C++ namespaces date to standardisation work (so after the 1985 C++ but before the 1998 standard C++) but they don't get this job done. Inline namespaces are a newer feature.
Secondly this technique does something different. The C hack doesn't touch the old code. But this "inline namespace" trick means old code has to explicitly opt into this backward compatibility fix or else it might blow up.
Lastly, I didn't try this, but presumably you did. Are the two separately namespaced classes the "same thing" as far as type checking is concerned? A vital feature of this union trick is that it's just one structure; it type checks as the same structure because it is the same structure. At a glance, I think the C++ solution results in two types with similar names, so that would fail type checking.
Ah, another of those threads. OK, let's set the years straight.
Yes, inline namespaces were only introduced in C++11, about 10 years ago. Now let's dive into the article.
"Learning that you can use unions in C for grouping things into namespaces"
Grouping into namespaces, so when did C++ get said feature?
ANSI/ISO C++98, released to the world in September 1998, which makes around 23 years, or 24 years if we count the release of C++ compilers already supporting it the year before, like Borland C++.
This C hack definitely does touch old code, as it requires the code to be written to take advantage of the technique, and it is also touched again when changes to the structs are required.
And naturally recompilation.
With inline namespaces, assuming recompilation, you can naturally also change which set of identifiers and type aliases are visible by default.
Nope. C considers that a.field2 is still a perfectly reasonable name for well... a.field2, even though a.sub.field2 is now also a name for that same field. The old code that cares about a.field2 works, no changes. You can recompile it, or not, either way, still a.field2
New code (or, sure, rewritten code if you have the budget to go around rewriting all your code) gets to talk about the new-fangled a.sub and it all works together.
Whereas with the C++ namespace hack that doesn't work.
Which is fine -- no reason your C++ namespace hack isn't great for whatever you wanted that for, and this hack is great for what it wanted to do. But where we end up with this thread is your claim that since Bjarne was pestered into adding the namespace feature to C++ in about 1990 this C hack isn't necessary for C++ even though the two are orthogonal.
Yes C++ has namespaces. Yes that's a good feature. No it doesn't help you solve this problem even with the later "inline namespace" feature.
> Grouping into namespaces, so when did C++ get said feature?
What you've done there, and perhaps in this whole thread, is assume that your context is the only context. There is some irony in the fact that this is the sort of problem namespace features in programming languages often aim to prevent. std::cmp::Ordering is very different from std::sync::atomic::Ordering
In your context "namespace" means the C++ feature. But in the author's context, as a C programmer, it just meant the plain fact that C considers a.foo to be a field in a, while a.b.foo is a field in b, which is in turn a field in a, and these names are separate, they don't shadow, they don't clash. The same way member names from different classes are in separate namespaces.
Based on the writeup, this technique isn't really about enabling you to start writing `s.position.x` where the old code would have written `s.x`. If that were all you wanted, you'd just keep writing `s.x`. It's about enabling you to write `s.x` everywhere, in old code and new code, while also being able to pass `s.position` to memcpy calls. You're never supposed to write `s.position.x`.
That's nice, but the blog post doesn't say anything about migrating libraries across versions? Looking at the comment thread, I see tialaramex's sibling comment suggested the blog post was about migration, but it's not.
I suppose migration is another possible use case for the union trick, and for that case C++ inline namespaces can be used as part of an implementation that achieves a broadly similar goal, but in a completely different way. As tialaramex notes, with inline namespaces you still end up with two different types.
Constexpr unions are the sane/safe way to use them. It's great, because accessing a member which isn't the last one written is something constexpr evaluation will explicitly reject at compile time. Whereas all the other examples here are explicitly undefined behaviour!
This is also known as the most common invocation of undefined behaviour in game programming. If you do this, write to y, then read from [1], you are invoking undefined behaviour, and compilers doing different things here between Windows, Linux, and Mac, and between compiler versions, is a common cause of "why isn't my game working right on XXX, it works fine on YYY" questions.
Type punning is not undefined, it's implementation defined in C. In practice, every major C compiler will be fine with type punning, though it may disable some optimizations.
The story is different in C++, but in practice many compilers support it the same as in C. Especially for games, where VC++ (PC, Xbox) and Clang (PS4/PS5) are the most commonly used compilers, it also works as expected. The trick is to only use type punning for trivial structs that don't invoke complications like con/de-structors or operators. The GP's example of a Vec3 struct that puns float x,y,z with float[3] is a very common one in games.
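The pun in question, roughly (a sketch; Vec3 and the values are illustrative):

#include <stdio.h>

typedef union {
    struct { float x, y, z; };   /* C11 anonymous struct member */
    float v[3];
} Vec3;

int main(void) {
    Vec3 p = { { 1.0f, 2.0f, 3.0f } };  /* fills the x, y, z view */
    printf("%f\n", p.v[1]);             /* reads the same bytes as p.y */
    return 0;
}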
Something being very common and being a very common source of portability issues aren't exactly contradictory. It's a bad idea, and it is outright taught in modern game programming courses that it's a bad idea, but it is common in older guides, specifically because it caused so many problems. I'm pissed at this specific construct because I got it handed to me in a huge game library and had to spend a long time figuring out why it wasn't working in rare but important cases.
But my point is that on the platforms that matter, it's not really a source of portability issues, and not a problem. For gamedev, anything outside of VC++ and Clang are niche and thus largely ignored.
OP probably means C++, where it is indeed undefined behavior; not sure about C. I doubt that this would cause a "my game does not work on XXX" though. Is there really a compiler out there that will handle such abuse differently?
Yes, it's undefined behaviour in both C and C++. Yes, a number of compilers treat this differently; it's also poorly supported on custom hardware using standard compilers like gcc. So compiling for some mobile device with slightly custom ... good luck.
Bleurgh. I have a deep soft spot for C, and I'm known to get twisted pleasure from using obscure language features in new ways to annoy people, but this is a level of abuse that even I can't get behind. If you need namespacing, use C++. As much as I love C, it's terrible for large projects.
The Linux kernel is a large project and clearly C is sufficient for it, given that migrating to C++ would probably be very easy (not using all C++ features, but just selected ones), yet it did not happen.
I think that C++ is better than C, but C is not that bad, even for large projects.
> The Linux kernel is a large project and clearly C is sufficient for it
Sure, and operating systems have been written in assembly too. The question is whether it would be better than just sufficient if Linux were written in C++, today (ie C++17 or 20, not something old). Switching now probably wouldn't be feasible (even ignoring technical reasons, the kernel developer community is familiar with the C codebase and code standards and bought into it), but if Linux were started today, would it be a better choice?
Maybe the answer is still no and C would still be chosen, but the choice today is very different than it was when Linux was started. Of course, maybe Rust or something would be chosen today instead.
Cantrill did a talk in which he touches on C, C++ and Rust for systems programming [1].
His tl;dr being that Rust feels very much like a proper systems programming language, and more of a « better C » than C++.
I don't entirely know what to make of it, but my instinct is that something like C++ with such an opportunity space for baroque concoctions (leading to an obsession with design patterns) is just playing with fire.
Yeah, they should have upgraded to some restricted subset of C++ or a new restrictive language ages ago. I mostly buy the arguments against having exceptions, perhaps even against polymorphism in general, but the argument against destructors, or atomics... hell no.
C has polymorphism. Inheritance-based virtual dispatch is just one kind of polymorphism. It's common to wire up polymorphism in C with bespoke data structures using tagged unions and function pointers. Changing an implementation at link time is even a form of polymorphism.
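A minimal sketch of the function-pointer flavour (names are illustrative):

struct shape {
    double (*area)(const struct shape *self);   /* a one-slot hand-rolled vtable */
};

struct circle {
    struct shape base;   /* base as first member, so a circle can be used as a shape */
    double r;
};

static double circle_area(const struct shape *self) {
    const struct circle *c = (const struct circle *)self;
    return 3.141592653589793 * c->r * c->r;
}

/* usage: struct circle c = { { circle_area }, 2.0 };
          double a = c.base.area(&c.base);              */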
And the Kernel devs would probably get really annoyed if you try to push this kind of name-spacing.
> C++ would probably be very easy
Not necessarily. Besides some small(?) problems due to C++ allowing "more magic optimizations" than C, they would have to switch to a subset of C++, and you would need to communicate to all contributors that a lot of C++ things are not allowed. It might be easier to simply not use C++. I mean, if it were that easy, the kernel likely would have switched.
A big issue with introducing C++ into a codebase is that it's incredibly hard to stick to a particular subset or standard. There's always a well-justified argument for the next standard or "just this one additional feature". Eventually you end up with the whole kitchen sink, regardless of where you started.
I've had far more success hard-firewalling C++ into its own box where programmers can use whatever they can get running than trying to limit people to subsets.
This is probably a terrible idea. Remember that if you have written one member of a union, all other members remain public, yet accessing any of them in any way is undefined behaviour. This is made way worse by most compilers mostly choosing to let you do what you expect. They just don't guarantee that they always will, or in all cases.
I believe you are mistaken. The C11 standard, section 6.5.2.3 "Structure and union members" pgf 6, says "One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members." And that seems to be what's being used here.
The union is only as big as necessary to hold its largest data member. The other data members are allocated in the same bytes as part of that largest member. The details of that allocation are implementation-defined but all non-static data members will have the same address (since C++14). It's undefined behavior to read from the member of the union that wasn't most recently written. Many compilers implement, as a non-standard language extension, the ability to read inactive members of a union.
What 6.5.2.3 simplifies is the use of unions of the type:
struct A { int type; DataA a; };
struct B { int type; DataB b; };
union U { struct A a; struct B b; };
union U u;
switch (u.a.type) ...
It's not what is being used here.
std::variant is designed to deprecate all legitimate uses of union
The post is about C, not C++. My comment stands, as the original post has two structs in a union, and they start the same way, so it’s exactly the case covered in the C11 Standard.
It's actually weirder than that. The C standard allows type punning through unions, but not because of the clause you mentioned. It allows it because of footnote 95:
> If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’)
This is broader than the common initial subsequence clause, and allows punning between completely different types, e.g. int, char[4], and float.
You might ask, what is the point of the "common initial subsequence" rule then? It's to allow certain accesses that don't go directly through the union, so the compiler doesn't know for sure whether there's a union involved. Only problem is that all major compilers completely ignore this rule. [1] (But they do implement the first clause I mentioned, where the accesses do go through the union.)
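To see the well-supported direct-access case concretely, a small sketch (assumes float and unsigned are both 4 bytes; the printed values depend on the representation, e.g. 3f800000 for 1.0f under IEEE-754):

#include <stdio.h>

union pun {
    float f;
    unsigned u;
    unsigned char b[sizeof(float)];
};

int main(void) {
    union pun p;
    p.f = 1.0f;                 /* write one member... */
    printf("%08x\n", p.u);      /* ...read another: the bytes are reinterpreted */
    printf("%02x\n", p.b[0]);   /* byte view; which byte is first depends on endianness */
    return 0;
}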
Your response to GP is based on the C++ reference and his is explicitly based on the C standard. Your assertion that "[t]he details of that allocation are implementation-defined but all non-static data members will have the same address (since C++14)" seems to directly conflict with the C11 standard. Also, your closing comment about std::variant is clearly only applicable to C++. I am just curious why you are using C++ when the article and GP are specifically addressing C?
You've mentioned this several times on this page, but this is still incorrect.
The C standard references "struct or union" all over the place because the two are so similar. The distinction is of course made clear in multiple places, but one that seems relevant here is:
> As discussed in 6.2.5, a structure is a type consisting of a sequence of members, whose storage is allocated in an ordered sequence, and a union is a type consisting of a sequence of members whose storage overlap. (ISO/IEC 9899:201x, §6.7.2.1, #6)
That's it. There's nothing about undefined behavior if you access one member and then another later. In fact there's even a paragraph which mentions doing just that:
> The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bitfield, then to the unit in which it resides), and vice versa. (ISO/IEC 9899:201x, §6.7.2.1, #16)
A pointer to the union points to each of its members, and can be dereferenced to access it.
std::variant is not used in C; C and C++ are two different languages.
> The size of a union is sufficient to contain the largest of its members.
Correct me if I'm wrong, but there is no part of the C spec that says this:
When initializing a union member that is smaller than the largest member, the remaining bytes will always automatically be initialized to zero.
If I'm right then the following caveat must be added to your statement:
> A pointer to the union points to each of its members, and can be dereferenced to access it.
... if and only if the member which was originally initialized is at least as large as the other member being accessed.
In other words, if you write your program in a way that ensures it will only compile when all union members are exactly the same size, and you have mandatory tooling to make sure that any changes to said union follow the same rule by force of compilation errors, then and only then can you claim what you claimed without the threat of undefined behavior.
Like the parent poster, when I read the article I assumed that there was no conceivable reason to ever use this feature in a real C program. Let me just say that I'm pleasantly surprised to be proven wrong!