Learning that you can use unions in C for grouping things into namespaces (utcc.utoronto.ca)
167 points by deafcalculus on Aug 1, 2021 | 147 comments



Anonymous nested structs are also quite useful for creating struct fields with explicit offsets:

    #include <stdio.h>
    #include <stddef.h>
    #include <stdint.h>
    
    #define YDUMMY(suffix, size) char dummy##suffix[size]
    #define XDUMMY(suffix, size) YDUMMY(suffix, size)
    #define PAD(size) XDUMMY(__COUNTER__, size)
    
    struct ExplicitLayoutStruct {
        union {
            struct __attribute__((packed)) { PAD(3); uint32_t foo; };
            struct __attribute__((packed)) { PAD(5); uint16_t bar; };
            struct __attribute__((packed)) { PAD(13); uint64_t baz; };
        };
    };
    
    int main(void) {
        // offset foo = 3
        // offset bar = 5
        // offset baz = 13
        printf("offset foo = %d\n", offsetof(struct ExplicitLayoutStruct, foo));
        printf("offset bar = %d\n", offsetof(struct ExplicitLayoutStruct, bar));
        printf("offset baz = %d\n", offsetof(struct ExplicitLayoutStruct, baz));
        return 0;
    }


Anytime macros are used for metaprogramming, it's time to reach for a more powerful language.


I hear your quote, but it's only a quote.

A macro is effectively preprocessing facilitated by the language. You could always preprocess externally if you wanted, and there's nothing stopping you from doing that in the "Powerful Language (TM)" either.

Now whether people use macros and preprocessing usefully is another question, but not one to which the answer is "abolish macros for more language features". When used correctly, macros ARE power.


Macros are useful, as long as they're used sparingly. I think that in this case, it's used well - the struct is still perfectly readable, and the sole purpose of it is to make it so that you don't have to manually name the dummy fields. But you could totally just write out dummy1, dummy2, dummy3 etc. yourself if you want to get rid of the macros.
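
For illustration, the macro-free version might look roughly like this (an untested sketch; the struct name here is made up, and the field names and offsets match the example above):

    #include <stdint.h>

    struct ExplicitLayoutStructNoMacros {
        union {
            struct __attribute__((packed)) { char dummy1[3];  uint32_t foo; };
            struct __attribute__((packed)) { char dummy2[5];  uint16_t bar; };
            struct __attribute__((packed)) { char dummy3[13]; uint64_t baz; };
        };
    };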


> Macros are useful, as long as they're used sparingly.

Everybody says that. Everybody believes it. And everybody goes to town making a rat's nest with macros, just like that snarl of cables under my desk that resist all attempts to make it nice.

Myself included. I've even written an article about clever C macros. Look, ma! I was so proud of myself.

But then I got older. I started replacing the macros in my C code with regular code. It turns out they weren't that necessary at all. I liked the C code a lot better when it didn't have a single # in it other than #include.


I want to be clear about your meaning, because I don’t know if I’m reading your comment correctly. Are you referring explicitly to syntax based, preprocessor macros? Or does your comment extend to other metaprogramming techniques? I am inclined to think you mean the first considering the amount of emphasis on generic programming in D? Just curious.


I'm referring to both syntax based (AST) macros, and text based (preprocessor) macros. The latter, of course, are much worse.

An example of the former is so-called "expression templates" in C++. I've seen them used to create a regular expression language using C++ expression templates. The author was quite proud of them, and indeed they were very clever.

However nice the execution, the concept was terrible. There was no way to visually tell that some ordinary code was actually doing regular expressions.

C++ expression templates had their day in the sun, but fortunately they seem to have been thrown onto the trash pile of sounds-like-a-good-idea-but-oops.

(I wrote an article showing how to do expression templates in D, mainly to answer criticisms that D couldn't do it, not because it was a good idea.)


> Anytime macros are used for metaprogramming, it's time to reach for a more powerful language.

> I'm referring to both syntax based (AST) macros ...

This surprises me greatly. Various lisps are among the most powerful languages I know of and a large part of the reason is macros coupled with their ability to execute arbitrary code at compile time (which itself uses additional macros, which in turn invoke more code, and so on). What's your take on this?

(Continuations are also pretty nice ...)


I'm a noob when it comes to Lithp. But I'm told what happens with their macros is the language is fairly unusable until you write lots of macros. The macros then become your personal undocumented wacky language, which nobody else is able to use.

I've seen this happen with assembler macro languages, too.

> most powerful

It's like putting a 1000 hp motor in a car. Its main use is to wreck the car and kill the driver.

BTW, D is the first language of its type (curly brace static compilation) to be able to execute arbitrary code at compile time. It started as kind of "let's see what happens if I implement this", and it spawned an explosion of creativity. It has since been adopted by other languages.


> the language is fairly unusable until you write lots of macros

This is not (typically) the case. It would be like saying that you need to write lots of templates to get things done in D. Metaprogramming is certainly very nice to have but it's not a requirement for the vast majority of tasks.

It's important to note that Lisps are an entire family of languages; some implementations are batteries included while others are extremely minimal. Where things can get a bit confusing is that many macro implementations are so seamless that significant pieces of core language functionality are built in them. Schemes tend to take this to an extreme, with many constructs that I would consider essential to productive use of the language provided as SRFIs.

> macros then become your personal undocumented wacky language

That's Doing It Wrong™. You could as well argue to remove goto from a language because sometimes people abuse it and write spaghetti. C++ has operator overloading. D has alias this. If (for example) a DSL is the appropriate tool then being able to use macros to integrate it seamlessly into the host language is a good thing.


> it's not a requirement for the vast majority of tasks.

Right, but the temptation to do it is irresistible.

> That's Doing It Wrong™

Of course it's doing it wrong. The point is, that seems to always happen because the temptation is irresistible.

> goto

I rarely see a goto anymore. It just doesn't have the temptation that macros do.

> alias this

Has turned out to be a mistake.

> integrate it seamlessly into the host language is a good thing

Supporting the creation of embedded DSLs is a good thing. Hijacking the syntax of the language to create your own language is a bad thing. I've seen it over and over, it never works out very well. It's one of those things you just have to experience to realize it.

D's support for DSLs comes from its ability to manipulate string literals at compile time, generate new strings, and mixin those strings into the code. This is clearly distinguishable in the source code from ASTs.


Fair enough, it seems we have very different philosophies on this particular issue. Without oft maligned features such as alias this and operator overloading I never would have given D a try as an alternative to C++.

I'd like to suggest that I think you might be missing some perspective here. You say that misuse of macros always happens and that you've seen it over and over. Yet if you explore the Scheme ecosystem you might notice that significant parts of any given implementation often take the form of macros. Racket in particular fully embraces the idea of the programmer mixing customized languages together and while examples of bad code certainly exist it seems to work out quite well on the whole.

To be clear, I do appreciate having easy access to tools that are simple and safe. I just also like having seamless access to and interop with a set of powerful ones that don't try to protect me from my own poor decisions. I shouldn't need to do extra work to make use of an alternative more powerful tool for a small part of a project. At that point it becomes very tempting to drop the safer tool altogether in favor of the more powerful one just to avoid the obviously needless and therefore particularly irritating overhead.

I much prefer the approach of providing limited language subsets that can be opted into and out of in a targeted manner. Having the compiler enforce a simple one by default provides a set of guard rails without getting in the way when it matters.

If I could write the majority of my code in something resembling Go and just a small bit of it in an alternative dialect with expressive power comparable to Common Lisp that would be ideal. To that end, I'm a huge fan of features like @system, @safe, and @nogc in D while very much disliking the need to use string mixins to write a DSL, the various restrictions placed on CTFE behavior, and other similar things.


> BTW, D is the first language of its type (curly brace static compilation) to be able to execute arbitrary code at compile time. It started as kind of "let's see what happens if I implement this", and it spawned an explosion of creativity. It has since been adopted by other languages.

I don't know much about D compile time evaluation. How is it better than macros/templates?

Also, what do you think about Haskell's and Rust's approach of generics with typeclass/trait bounds?


> D compile time evaluation. How is it better than macros/templates?

CTFE isn't better than templates, it's a completely different tool. CTFE computes a result at compile time as though you had written a literal in the source code. Templates generate blocks of specialized code on the fly based on various parameters (typically types). They solve different problems.


I think that Walter was trying to say that D gets by doing the same things that people use macros for, but without macros. I interpreted this as meaning that D uses CTFE in those same situations. Am I wrong?

What does D do, then?


I think he was saying that sufficiently powerful languages provide enough features that you won't need macros in the first place. If you feel that you do for some reason, go find a more powerful language instead.

> What does D do, then?

Most mainstream languages, D included, very intentionally don't provide any features that could potentially be used to extend the language itself on the fly. (At least not in a straightforward manner. Obviously the C preprocessor kind of sort of facilitates a bit of this.)

As a counterexample, Rust does provide some of this in the form of procedural macros but doesn't provide (to the best of my knowledge) an equivalent to Lisp reader macros.


I wish you luck with it. If you send me an email, when I find what I wrote about it I'll pass it along to you.


I think you replied to the wrong comment


Oops, indeed I did. Replied to the right one!


Can you link to it? We're using expression templates on a new library and I find it useful.


I don't know what code Walter is referring to but I had similar experiences during a presentation of the Boost Spirit library (at our Silicon Valley C++ Meeting; must be about a decade ago now).

The speaker was proud and beaming for showing how powerful C++ is and the audience was in awe.

I was incredulous! Jaw open! The "solution" was horrible, with a bunch of workarounds for a bunch of shortcomings. It was a "the emperor does not have clothes" moment for me.

Boost Spirit may be better today with newer C++ features; I don't know.


That's kind of the key. We require c++17. Without that it gets extremely ugly and more verbose.


Oh man, you sure know how to drive a stake in my heart! What have I done! Some curses should just not be uttered. I'm not sure where it is on my backups.

But it was done just like you'd do it in C++. The same thing, just with D syntax.


We're making a numerical computing library in which expression templates are used for compile-time evaluation of expressions. It's a very niche HPC library, so it might be one of the few places where it's appropriate.


I wish you luck with it. If you send me an email, when I find what I wrote about it I'll pass it along to you.


They probably aren't the only people interested in that...


Doesn’t work in a lot of cases, unfortunately. If you’re writing a library designed to be consumed by other languages, you’re stuck writing C-ABI-compatible code. It can be written in other languages that can "extern" their functions, but that puts limits on what’s possible in those libraries.


One thing I do is to make my own interface to the 3rd-party library with the ugly interface. Then that interface file is the only one that sees it.

It also may be true that one can't replace every use of the preprocessor. That shouldn't stop replacing what one can.


You might as well have written that "any time you're reaching for C, it's time to reach for a more powerful language".

But if -sadly- you must use C, metaprogramming using macros is not a terrible thing.


> is not a terrible thing

It is a terrible thing. It's possible to do without them, and you'll like your code better. Your symbolic debugger and syntax directed editor will work properly. The poor schlub who has to fix the bugs in your code after you leave will be grateful. Your spouse will be happy and your children will prosper.

For example,

    #define foo(x) ((x) + 1)
replace with:

    int foo(int x) { return x + 1; }
The compiler will inline it for you. There's no penalty.

    #define FOO 3
Replace with:

    enum { FOO = 3 };
or:

    const int FOO = 3;
Replace:

    #if FOO == 3
    bar();
    #endif
with:

    if (FOO == 3) bar();
The optimizer will delete bar() if FOO is a manifest constant that is not 3.


Having made a significant contribution to Simon Tatham's PuTTY, I have to respectfully disagree. At the time the entire SSHv2 key exchange and user authentication protocols were implemented in a single function using a massive Duff's device (for asynchrony) implemented with C metaprogramming macros. It was a surprisingly pleasant experience.


There is a reason why such cleverness is forbidden by security standards like MISRA-C.


Agree except the last example - the clarity of intent that you want a conditional compilation is lost. Also, non-optimized debug builds are affected.


I think it was RMS who argued that conditional compilation should be considered harmful in general. Code that you put in an inactive #if(def) block will not be maintained, and is basically guaranteed to rot. If it's needed in the future, it'll likely have to be rewritten from scratch.

According to this stance, any code that's suppressed by the C preprocessor should either be written in an if {} statement so that it will at least continue to compile as the surrounding code changes, or be replaced with comments describing what it does (or did), if it's important enough to keep track of.

Can't really think of many good counterarguments to this. Machine dependence might be one, but then you could argue that the preprocessor is being used to cover up for an inadequate HAL.


When it comes to cross-platform code, you build for several platforms, so the block is active (for particular platforms) and maintained.


> clarity of intent

Conditional compilation is an optimization, not a semantic intent.

> non-optimized debug builds are affected

The code size will be larger. Doesn't matter.


> Conditional compilation is an optimization, not a semantic intent.

Why can't it have semantic intent? I frequently use it for cross-platform adjustments, and those don't usually compile on the defined-away platform.


> those don't usually compile on the defined-away platform

Then you're doing it wrong :-/

All these can be made to work.

P.S. compiling successfully is not the same thing as linking successfully. Think stubs and deciding which files to link together.


Making it clear in the code itself that something is platform specific is useful as well. Also, I'd say such stubs and extra files would clutter the project.


I tried it your way for years. It's common practice in the industry. I have written and seen plenty of rats' nests of #if and #ifdef snarls so bad one cannot figure out what is being compiled and what isn't without running a just-the-preprocessor pass. It will often #if on the wrong condition, like operating system when it's a CPU dependency.

Rewriting it the way I suggest tends to force a cleanup of all that. You'll like the results.

BTW, if you want to try transitioning to the next level, try getting rid of all the `if` statements, too!


They can be made to work by introducing stubs for every os/arch/compiler/dependency I support, but that's way more work for dubious gain.

And who is to say I'm doing it wrong? Just because conditional compilation CAN lead to a rat nest doesn't mean it MUST. And if it happens to be well-organized and works as is, there is no problem.


> that's way more work for dubious gain

Oh, have I heard that before. Here's what happens, over the years, to such code:

1. #ifdef's on the wrong feature. For example, #ifdef on operating system for a CPU feature.

2. Overly complex #if expressions, that steadily get worse.

3. Fixing support for X that subtly breaks F and Q support. It goes unnoticed because your build/test machines are not F and Q.

4. People having no idea what #defines to set on the command line, nor what the ones that are set are doing (if anything).

5. People having no idea how to fold in support for new configurations.

6. Code will #ifdef on predefined macros that have little or no documentation on what they're for or under exactly what circumstances they are set. If the writer even bothered to research that. gcc predefines what, 400 macros?

> Just because conditional compilation CAN lead to a rat nest doesn't mean it MUST.

Jyust yew wite, 'enry 'iggins, jyust yew wite!


> Here's what happens, over the years, to such code:

Over how many years? Some of my projects are 30+ years old and support a few dozen combinations of OSes, compilers, and architectures, and the method I described hasn't led to a rat's nest yet.

> #ifdef's on the wrong feature. For example, #ifdef on operating system for a CPU feature

Sure, same thing happens when you decide whether to compile in your stubs or not. But if I make a mistake, the code usually won't compile on the configuration in question; if you make a similar mistake, it'll compile with your stub, it might even link, but you'll get odd runtime behavior. Sounds worse.

> Overly complex #if expressions, that steadily get worse

All code gets worse over time and bit rots if you don't maintain it, I don't see things necessarily worse in this area.

> Fixing support for X that subtly breaks F and Q support. It goes unnoticed because your build/test machines are not F and Q

If you aren't testing on tier one configurations, they shouldn't be tier one. And if a tier three configuration breaks when you do finally get around to testing it, it's tier three, so it gets fixed when it can be and there's no problem. Similar to normal code, I'd say.

> People having no idea what #defines to set on the command line, nor what the ones that are set are doing (if anything)

> People having no idea how to fold in support for new configurations

These problems exist for normal code as well. Someone has to own the build configurations, and they have to be maintained, like anything else.

> Code will #ifdef on predefined macros that have little or no documentation on what they're for or under exactly what circumstances they are set. If the writer even bothered to research that. gcc predefines what, 400 macros?

We only use a handful, and they are fairly well documented. But even if they weren't, since the code works on all tier one platforms, we'd know if something was changed out from underneath us because it was underspecified or accidentally relied upon.

But again the same can happen with normal code and system headers, compiler extensions, etc - I don't see why pre-processor macros have to be singled out here.

> Jyust yew wite, 'enry 'iggins, jyust yew wite!

How long?


   The optimizer will delete bar() if FOO is a manifest constant that is not 3. 
Yes, but the compiler won't delete it, and it will complain if bar is not declared. `if constexpr` is not a direct replacement for an `#if` macro.


> the compiler won't delete it, and it will complain if bar is not declared

    extern void bar();
will declare it to the satisfaction of the compiler. The linker won't complain if the compiler removes the call to it.


And then you have the issue of not having types. Consider this:

  void my_bar_class_type::bar(my_bar_type<my_bar_type_2>::my_bar_inside_type parameter);
and you are calling something like

   myObject->bar(mySecondObject.getWhatever());
You have to mock up practically everything.


Or go up an abstraction layer so the type details are not needed.


One of the very few things from C that I miss in C++ is anonymous structs and enums. I really don’t understand why they are not allowed.

That is, C style enums don’t have to have a name but “type safe” (enum class) ones do. One classic use is to name an otherwise boolean option in a function signature; there’s typically no need to otherwise name it.

C++ incompatibly requires a name for all struct and class declarations, again a waste when you will only have a single object of a given type.


> I really don’t understand why they are not allowed.

I don't, either. Such were in D from 2000 or so.

I also don't understand why `class` in C++ sits in the tag name space. I wrote Bjarne in the 1980s asking him to remove it from the tag name space, as the tag name space is an abomination. He replied that there was too much water under that bridge.

D doesn't have the tag name space, and in 20 years not a single person has asked for it.

This did cause some trouble for me with ImportC to support things like:

    struct S { ... };
    int S;
but I found a way. Although such code is an abomination. I've only seen it in the wild in system .h files.


The only explanation I saw was that C++ standards guys were horrified by the idea of unpredictable side effects as a result of initialization of a struct.

I think C++ though is adding them.

What I'd like in C is designated function parameters.

  // these are the same
  bar(.a = 10, .b = 12);
  bar(.b = 12, .a = 10);


I suspect that to many C++ programmers, most initializations of structs have unpredictable side effects because of how complex they are ;)


You can somewhat fake it by replacing your function's parameter list with a single struct parameter.

    struct bar_arguments {
       int a, b;
    };
    int bar(struct bar_arguments args) { return 2*args.a + args.b;}

    #define bar(...) bar((struct bar_arguments) {__VA_ARGS__})

    // usage (will print 32 three times)
    printf("%d\n", bar(10, 12));
    printf("%d\n", bar(.a = 10, .b = 12));
    printf("%d\n", bar(.b = 12, .a = 10));

The main drawback is that all parameters are now optional: it will not complain if you forget to assign all parameters, it will silently set them to 0 :-/

    printf("%d\n", bar(10));
    printf("%d\n", bar(.a = 10));
    printf("%d\n", bar(.b = 12));
will print 20, 20 and 12.

You can change those "default values", but then calling the function with regular positional parameters is impaired :-/


> The only explanation I saw was that C++ standards guys were horrified by the idea of unpredictable side effects as a result of initialization of a struct.

I don't understand. How would struct or class initialization be any different from simply doing, say, `for (auto& a : { x, y, z }) frob (a);` which is perfectly legal?


I didn't mention it. I think the thought was: with designated initializers, what is the order of initialization? The order of the elements in the struct, or the order in which they appear in the initializer? In C it probably matters little, as side effects are usually blatant. In C++ I think cryptic side effects are common.


> C++ incompatibly requires a name for all struct and class declarations

You're right about "enum class", but anonymous classes and structs are perfectly valid in C++:

https://godbolt.org/z/7MbcqhnoK


Try

  struct S { struct { int x; }; };
under -pedantic and you'll get

  warning: ISO C++ prohibits anonymous structs [-Wpedantic]


Pedantic is for the older C++ standard; it's not pedantic for later ones, e.g. C++11. I think this changed.


No, pedantic is for disabling compiler extensions. You still need to explicitly specify a standard.


gcc pedantic ignores the language flag, and clang and intel state they mirror gcc. So pedantic would be not C++11 even if you added that.


Well that blows my mind, I never realized pedantic ignores the language setting. Is this the only case where it does that?


... ? That's definitely not true, both anonymous structs and enums work fine in c++.

https://wandbox.org/permlink/ICaQJXCaVOt9mXdP


No, they are forbidden by the standard (take a look at cppreference). Some compilers implement the C behavior as an extension, so tell your compiler to follow the standard strictly.

I don’t use extensions, even convenient ones, as I have to be able to run my code on a variety of compilers. If you don’t have to do that, some extensions (like this one) are really handy.


From the standard, the enum-name is marked as optional in the enum grammar: in [dcl.enum](https://eel.is/c++draft/dcl.enum#11) ; it is also referenced e.g. in [dcl.dcl]:

> An unnamed enumeration that does not have a typedef name for linkage purposes ([dcl.typedef]) and that has a first enumerator is denoted, for linkage purposes ([basic.link]), by its underlying type and its first enumerator; such an enumeration is said to have an enumerator as a name for linkage purposes.

And for classes/structs, [class.pre](https://eel.is/c++draft/class.pre#def:class,unnamed) has explicit wording:

> A class-specifier whose class-head omits the class-head-name defines an unnamed class.

So both are entirely fine (and likewise, unions are too).

Note that my links are for the current draft, but I just checked and this was already the case as far back as C++11. So I wonder where this persistent myth seems to come from.


This is awesome. I referenced cppreference, but that is not authoritative. Unfortunately, in the final draft, [class.pre] grammar makes the name mandatory even though the language you quote remains in the first textual paragraph following the grammar specification!

The part of enums you quoted was C-compatible enums; anonymous scoped enums are explicitly forbidden: "The optional enum-head-name shall not be omitted in the declaration of a scoped enumeration" (dcl.enum 2).

Sigh. I will send in a clarification at least on the class/struct/union side. Ideally the grammar would be fixed rather than that paragraph.

The draft I looked at is https://timsong-cpp.github.io/cppwp/n4868/ (2020-10-18, shortly after the standard was approved).


> Unfortunately, in the final draft, [class.pre] grammar makes the name mandatory even though the language you quote remains in the first textual paragraph following the grammar specification!

That's pretty strange, considering e.g. this paper from quite some time ago: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p022... which is written as if anonymous structs always were a thing.

I wonder if there isn't a deep confusion somewhere where "anonymous" and "unnamed" mean different things to different persons.


Use an enum in a namespace, or an anonymous namespace


This is an example of the desired use case:

    static obj& some_call (obj& o, enum struct { abandon, save } disposition) { ... };
This is a common case (and should be more common) to avoid using an obscure boolean flag, which can lead to bugs. It shouldn't need a name.

An anonymous namespace just means the name itself won't leak out; under C++ rules I need the name even to specify the enum tag, which is absurd.


Each instance of "enum struct { abandon, save }" would denote a different type, yes? How would you write a compatible definition to go with your prototype?


I don’t care that they are different types; if anything that would be a feature.

The point is to prevent the “mysterious bool arguments” class of error.

The question is if ADL could infer the scope of the enum, as template instantiation can now infer the right thing and doesn't always need the <T> notation.


I agree that you would want them to be different types; otherwise you would just use an unscoped enum. But that implies that if you write…

  /* example.h */
  void f(enum struct { x, y } arg);

  /* example.c */
  void f(enum struct { x, y } arg) {
    /* do something with arg */
  }
…then you've just created a function with two different overloadings based on distinct anonymous types which just happen to be spelled the same way. Without a type name I don't see any way you could define a function whose prototype would be compatible with the forward declaration. You also have conflicting definitions of "x" and "y" with the same names and scope but different types. Perhaps with GNU extensions you could use typeof(x) for the argument and avoid the conflict, but that isn't standard C++.


IMO you can still be explicit about field offsets by writing the struct in a usual way, and using static assertions to ensure offsets match the intended layout.
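
For example, something like this (a sketch; the struct and field names are made up, and the asserted offsets assume a typical ABI where uint32_t is 4-aligned and uint16_t is 2-aligned):

    #include <stddef.h>
    #include <stdint.h>

    struct UsualLayout {
        uint32_t a;   /* offset 0 */
        uint16_t b;   /* offset 4 */
        uint16_t c;   /* offset 6 */
    };

    /* compilation fails if the compiler ever lays this out differently */
    _Static_assert(offsetof(struct UsualLayout, b) == 4, "b expected at offset 4");
    _Static_assert(offsetof(struct UsualLayout, c) == 6, "c expected at offset 6");
    _Static_assert(sizeof(struct UsualLayout) == 8, "no unexpected padding");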


Do foo and bar deliberately overlap?


Yes, I was looking to demonstrate the flexibility of the approach by including overlapping fields.


If you write to either, accessing the other, even on the overlap, is undefined behaviour


Type punning/aliasing with unions is well defined in gcc. Linus even has a humorous rant on the topic:

https://www.yodaiken.com/2018/06/07/torvalds-on-aliasing/

Sure, it's compiler-specific, but I'm already using `__attribute__((packed))` anyways.


All undefined behaviour is well defined for each compiler; what it really means is implementation defined, and subject to change without notice or documentation with every compiler version, host OS, set of flags enabled, or a thousand other things. Why use an approach which strictly relies on specific versions of specific compilers, rather than a completely portable and standard-compliant struct with a char array and a few char pointers? Or, if you want a convenient interface and aren't explicitly writing for the kernel, switch to a restricted subset of C++ and do it right?


What you're saying is very close to the common fallacious idea about UB, which is that the compiler and computer are machines, and therefore must have some deterministic behaviour when they encounter any given piece of code. But the point and meaning of the term "undefined behaviour" is that it describes the result of operations which are off limits for legal reasons. There is nothing good to be gained from trying to claim that GCC extensions are actually a kind of undefined behaviour. If you know what is meant by UB in the C standard, you won't muddy the waters like this.

"Somebody once told me that in basketball you can't hold the ball and run. I got a basketball and tried it and it worked just fine. He obviously didn't understand basketball." https://blog.regehr.org/archives/213


GCC explicitly states and documents that type punning through unions will work as expected. This doesn't work by accident. It works very explicitly by design of GCC.

Different GCC versions aren't randomly going to change documented behavior. And when they accidentally do, they will consider it a bug.


Regardless, type punning through a union is not undefined behavior in C in the first place [1] [2]. On the other hand, it is undefined in C++, but GCC allows it there too.

[1] https://stackoverflow.com/questions/11639947/is-type-punning...

[2] https://stackoverflow.com/questions/25664848/unions-and-type...


> All undefined behaviour is well defined for each compiler

This is not true.

> what it really means is implementation defined

Implementation-defined behavior is a thing in the standard and is separate from undefined behavior.


What is referred to as implementation-defined behaviour is a different class, yes, obviously. I guess nasal demons were too complicated a piece of irony for you.


There are two kinds of undefined behaviour being invoked in using this. It's a horrible idea and a horrible code smell; get rid of it if you ever see something like this.


I don't see any undefined behavior here. As I mentioned below, gcc explicitly documents type punning via unions as being well defined. But yes, this is compiler specific and is not guaranteed to work elsewhere.


Accessing packed struct members works fine on x86, but will blow up at runtime or do weird things on platforms which don't support unaligned loads or stores.

The correct way to access packed structs is through memcpy, just like you'd access any other potentially unaligned object.
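
Something along these lines (a sketch; the struct and function names are made up):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    struct __attribute__((packed)) wire_header {
        uint8_t  type;
        uint32_t length;   /* sits at offset 1, i.e. unaligned */
    };

    static uint32_t read_length(const struct wire_header *h) {
        uint32_t len;
        /* byte-wise copy instead of a possibly unaligned direct load */
        memcpy(&len, (const char *)h + offsetof(struct wire_header, length), sizeof len);
        return len;
    }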


For architectures where unaligned accesses are illegal, gcc will generate multiple load/store instructions when accessing packed struct fields by name. The main caveat to look out for is taking the address of a packed struct member and then dereferencing it.


Ah, right. Thanks for the correction.


There is absolutely undefined behaviour there. Undefined behaviour is defined not as nasal demons but as: the compiler implementer does not guarantee that this behaviour will be consistent across hardware, circumstances, compiler versions, or OSes, nor that we will warn if we change it.

Packed is technically not undefined behaviour, but it is certainly a trap. Especially because the compiler macros lead people to make defines which select packed per compiler automatically. Then the fallback case for an unrecognized compiler is just left empty, meaning it compiles but no longer does what you think.


You don't get to decide what UB means. It really does mean nasal demons are a possibility: all bets are off when you run that executable. Use of the term "undefined behaviour" to mean something else may be on the increase, unfortunately (https://mars.nasa.gov/technology/helicopter/status/298/what-...), but if we're talking about C, its meaning is fixed.


I didn't; the deterministic nature of computer programs guarantees that what I wrote is the actual outcome.


I'm using anonymous nested structs extensively for grouping related items, but I consider the extra field name a feature, not something that should be hidden:

https://github.com/floooh/sokol-samples/blob/bfb30ea00b5948f...

(also note the 'inplace initialization' which follows the state struct definition using C99's designated initialization)
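
In spirit it's something like this (a made-up sketch, not the actual code behind the link):

    static struct {
        struct {
            float r, g, b;
        } clear_color;
        struct {
            int width, height;
        } window;
    } state = {
        /* C99 designated initialization, following the struct definition */
        .clear_color = { .r = 0.2f, .g = 0.3f, .b = 0.4f },
        .window      = { .width = 800, .height = 600 },
    };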


The result is uglier and less maintainable than a pair of macros. Or just stop trying to hide syntax. This is ultimately on the same level as typedefing pointers.


The first example seems wrong, instead of `struct sub { ... };` what is meant is `struct { ... } sub;`


You're right; thanks for noticing and I've updated the first example. My C is a bit rusty these days and I didn't check it with a compiler the way I should have.

(I'm the author of the linked-to article.)


Doesn’t matter for C, but in C++ this could make your constexpr functions UB, since you can only use one member of a union in constexpr contexts (the “active” member).


In other words: please always be wary of differences in C and C++, for instance type punning [0].

[0] https://stackoverflow.com/a/25672839


Triggering UB is a compiler error in constexpr code.

https://shafik.github.io/c++/undefined%20behavior/2019/05/11...


True, you’ll hopefully get a compiler error.


In C++ we've had namespaces for 30 years now, no need for such tricks.


Hmm. How do C++ namespaces help with the structure naming problem in this example? They seem completely orthogonal.

C++ namespaces are a way to avoid library A's symbol "cow" clashing with library B's symbol "cow" without everything being named library_a_cow and library_b_cow all over the place which is annoying. I agree C would be nicer with such a namespace feature.

However this technique is about what happens when you realise your structure members x and y should be inside a sub-structure position, and you want both:

d = calculate_distance(s.x, s.y); // Old code

and

d = calculate_distance(s.position.x, s.position.y); // New

... to work while you transition to this naming.
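
Concretely, something along these lines (a sketch; whether reading through the other union member is strictly kosher is exactly the common-initial-sequence / type-punning debate elsewhere in this thread):

    struct sprite {
        union {
            struct {                /* old spelling keeps working: s.x, s.y */
                double x, y;
            };
            struct {                /* new grouped spelling: s.position.x, s.position.y */
                double x, y;
            } position;
        };
    };

    /* both spellings name the same storage */
    double old_style(struct sprite *s) { return s->x + s->y; }
    double new_style(struct sprite *s) { return s->position.x + s->position.y; }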


You can use inline namespaces for versioning symbols.

https://www.foonathan.net/2018/11/inline-namespaces/


First of all, C++11 may feel like thirty years ago, and certainly some of its proponents look thirty years older than they did at the time, but it was only ten years ago. C++ namespaces date to standardisation work (so after the 1985 C++ but before the 1998 standard C++), but they don't get this job done. Inline namespaces are a newer feature.

Secondly this technique does something different. The C hack doesn't touch the old code. But this "inline namespace" trick means old code has to explicitly opt into this backward compatibility fix or else it might blow up.

Lastly, I didn't try this, but presumably you did. Are the two separately namespaces classes the "same thing" as far as type checking is concerned? A vital feature of this union trick is that it's just one structure, it type checks as the same structure because it is the same structure. At a glance, I think the C++ solution results in two types with similar names, so that would fail type checking.


Ah, another of those threads. OK, let's set the years straight.

Yes, inline namespaces were only introduced in C++11, about 10 years ago. Now let's dive into the article.

"Learning that you can use unions in C for grouping things into namespaces"

Grouping into namespaces, so when did C++ get said feature?

ANSI/ISO C++98, released to the world in September 1998, which makes it around 23 years, or 24 years if we count the release of C++ compilers already supporting it the year before, like Borland C++.

This C hack definitely does touch old code, as it requires the code to be written to take advantage of the technique, and it is also touched again when changes to the structs are required.

And naturally recompilation.

With inline namespaces, assuming recompilation, you can naturally also change which set of identifiers and type aliases are visible by default.


> This C hack definitely does touch old code

Nope. C considers that a.field2 is still a perfectly reasonable name for well... a.field2, even though a.sub.field2 is now also a name for that same field. The old code that cares about a.field2 works, no changes. You can recompile it, or not, either way, still a.field2

New code (or, sure, rewritten code if you have the budget to go around rewriting all your code) gets to talk about the new-fangled a.sub and it all works together.

Whereas with the C++ namespace hack that doesn't work.

Which is fine -- no reason your C++ namespace hack isn't great for whatever you wanted that for, and this hack is great for what it wanted to do. But where we end up with this thread is your claim that since Bjarne was pestered into adding the namespace feature to C++ in about 1990 this C hack isn't necessary for C++ even though the two are orthogonal.

Yes C++ has namespaces. Yes that's a good feature. No it doesn't help you solve this problem even with the later "inline namespace" feature.

> Grouping into namespaces, so when did C++ get said feature?

What you've done there, and perhaps in this whole thread, is assume that your context is the only context. There is some irony in the fact that this is the sort of problem namespace features in programming languages often aim to prevent. std::cmp::Ordering is very different from std::sync::atomic::Ordering

In your context "namespace" means the C++ feature. But in the author's context, as a C programmer, it just meant the plain fact that C considers a.foo to be a field in a, while a.b.foo is a field in b, which is in turn a field in a, and these names are separate, they don't shadow, they don't clash. The same way member names from different classes are in separate namespaces.


Based on the writeup, this technique isn't really about enabling you to start writing `s.position.x` where the old code would have written `s.x`. If that were all you wanted, you'd just keep writing `s.x`. It's about enabling you to write `s.x` everywhere, in old code and new code, while also being able to pass `s.position` to memcpy calls. You're never supposed to write `s.position.x`.


C++ namespaces are unrelated to this. They don’t accomplish the same thing.


The goal of inline namespaces is exactly to allow for migrating libraries across versions.


That's nice, but the blog post doesn't say anything about migrating libraries across versions? Looking at the comment thread, I see tialaramex's sibling comment suggested the blog post was about migration, but it's not.

I suppose migration is another possible use case for the union trick, and for that case C++ inline namespaces can be used as part of an implementation that achieves a broadly similar goal, but in a completely different way. As tialaramex notes, with inline namespaces you still end up with two different types.


Constexpr unions are the sane/safe way to use them. It's great, because if you access a member which isn't the last one written, constexpr will explicitly prevent it at compile time. Whereas all the other examples here are explicitly undefined behaviour!


Imo this is not “perverse”. In my vector library I alias a vec3 as float x,y,z and float[3] using this technique.
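
i.e. roughly this (a sketch of the idea, not the actual library code):

    typedef union {
        struct { float x, y, z; };   /* named access: a.x, a.y, a.z */
        float v[3];                  /* indexed access: a.v[0..2]   */
    } vec3;

    static float sum(vec3 a) {
        /* mixes both views of the same storage */
        return a.v[0] + a.y + a.v[2];
    }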


This is also known as the most common invocation of undefined behaviour in game programming. If you do this, write to y, then read from element [1] of the array, you are invoking undefined behaviour, and compilers doing different things here between Windows, Linux, and Mac, and between compiler versions, is a common cause of "why isn't my game working right on XXX, it works fine on YYY" questions.


Type punning is not undefined, it's implementation defined in C. In practice, every major C compiler will be fine with type punning, though it may disable some optimizations.

The story is different in C++, but in practice many compilers support it the same as in C. Especially for games, where VC++ (PC, Xbox) and Clang (PS4/PS5) are the most commonly used compilers, it also works as expected. The trick is to only use type punning for trivial structs that don't invoke complications like con/de-structors or operators. The GP's example of a Vec3 struct that puns float x,y,z with float[3] is a very common one in games.


Something being very common and being a very common source of portability issues aren't exactly contradictory. It's a bad idea, and it is outright taught in modern game programming courses that it's a bad idea (though it's common in older guides), specifically because it caused so many problems. I'm pissed at this specific construct because I got it handed to me in a huge game library and had to spend a long time figuring out why it wasn't working in rare but important cases.


But my point is that on the platforms that matter, it's not really a source of portability issues, and not a problem. For gamedev, anything outside of VC++ and Clang are niche and thus largely ignored.


I don’t see the undefined behavior here?


OP probably means C++, where it is indeed undefined behavior; not sure about C. I doubt this would cause a "my game does not work on XXX" issue, though. Is there really a compiler out there that will handle such abuse differently?


Yes, it's undefined behaviour in both C and C++. Yes, a number of compilers treat this differently; it's also poorly supported on custom hardware using standard compilers like gcc. So compiling for some mobile device with slightly custom ... good luck.


It is not undefined behavior in C. And the OP was about C.


I don't mean C++


The C programming language, brought to you by Cthulhu.

You don't need eval(), you've got strcpy()!


I don't regard this as a "perverse" hack. If I ever do embedded memory mapped stuff in C11 this is way too tempting.
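
e.g. a register block with datasheet-mandated offsets, using plain named pad fields rather than the PAD() macro above (purely hypothetical device and base address):

    #include <stdint.h>

    union uart_regs {
        struct { volatile uint32_t data; };                          /* offset 0x00 */
        struct { uint8_t _pad1[0x08]; volatile uint32_t status; };   /* offset 0x08 */
        struct { uint8_t _pad2[0x10]; volatile uint32_t control; };  /* offset 0x10 */
    };

    #define UART0 ((union uart_regs *)0x40001000u)   /* made-up base address */
    /* usage: UART0->data = 'x'; while ((UART0->status & 1) == 0) { } */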


You are practically guaranteed to invoke undefined behaviour if you do. Just use a map on a std::array of e.g. std::byte


they said C11 tho


Bleurgh. I have a deep soft spot for C, and I'm known to get twisted pleasure from using obscure language features in new ways to annoy people, but this is a level of abuse that even I can't get behind. If you need namespacing, use C++. As much as I love C, it's terrible for large projects.


The Linux kernel is a large project and clearly C is sufficient for it, given the fact that migrating to C++ would probably be very easy (not using all C++ features, just selected ones), yet it did not happen.

I think that C++ is better than C, but C is not that bad, even for large projects.


> Linux kernel is large project and clearly C is sufficient for it

Sure, and operating systems have been written in assembly too. The question is whether it would be better than just sufficient if Linux were written in C++, today (i.e. C++17 or 20, not something old). Switching now probably wouldn't be feasible (even ignoring technical reasons, the kernel developer community is familiar with the C codebase and code standards and bought into it), but if Linux were started today, would it be a better choice?

Maybe the answer is still no and C would still be chosen, but the choice today is very different than it was when Linux was started. Of course, maybe Rust or something would be chosen today instead.


Cantrill did a talk in which he touches on C, C++, and Rust for systems programming [1].

His tl;dr being that Rust feels very much like a proper systems programming language, and more of a « better C » than C++. I don't entirely know what to make of it, but my instinct is that something like C++ with such an opportunity space for baroque concoctions (leading to an obsession with design patterns) is just playing with fire.

[1] https://www.youtube.com/watch?v=LjFM8vw3pbU


the kernel has to live with the choices it made in the 90s, you don't


Yeah they should have upgraded to some restricted subset of C++ or new restrictive language ages ago. I mostly buy the arguments against having exceptions, perhaps even against polymorphism in general, but the argument against destructors, or atomics... hell no.


> ...against polymorphism...

C has polymorphism. Inheritance-based virtual dispatch is just one kind of polymorphism. It's common to wire up polymorphism in C with bespoke data structures using tagged unions and function pointers. Changing an implementation at link time is even a form of polymorphism.
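
A tiny sketch of each (hand-rolled, not from any particular codebase):

    /* tagged union */
    struct shape {
        enum { CIRCLE, RECT } tag;
        union {
            struct { double r; } circle;
            struct { double w, h; } rect;
        };
    };

    double area(const struct shape *s) {
        switch (s->tag) {
        case CIRCLE: return 3.141592653589793 * s->circle.r * s->circle.r;
        case RECT:   return s->rect.w * s->rect.h;
        }
        return 0.0;
    }

    /* function-pointer "vtable": any writer that fills this in can be swapped in */
    struct writer {
        int  (*write)(void *ctx, const char *buf, int len);
        void *ctx;
    };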


> Changing an implementation at link time is even a form of polymorphism.

I never truly appreciated how polymorphism can take so many different forms!


yes, and that is generally worse


And the Kernel devs would probably get really annoyed if you try to push this kind of name-spacing.

> C++ would probably be very easy

Not necessarily. Besides some small(?) problems due to C++ allowing "more magic optimizations" than C, they would have to switch to a subset of C++, and you would need to communicate to all contributors that a lot of C++ things are not allowed. It might be easier to simply not use C++. I mean, if it were that easy, the kernel likely would have switched.


> And the Kernel devs would probably get really annoyed if you try to push this kind of name-spacing.

Actually, they use it themselves. [0]

[0] https://lwn.net/SubscriberLink/864521/d704bdcced0c5c60/


Yes, it's scary.

But it's also not used <to have namespacing> but to <improve on cross-field memory operation>.


A big issue with introducing C++ into a codebase is that it's incredibly hard to stick to a particular subset or standard. There's always a well-justified argument for the next standard or "just this one additional feature". Eventually you end up with the whole kitchen sink, regardless of where you started.

I've had far more success hard-firewalling C++ into its own box where programmers can use whatever they can get running than trying to limit people to subsets.


People will make a mess of a large project regardless of the language.


This is probably a terrible idea. Remember that if you have written one member of a union, all other members remain public, yet accessing any of them in any way is undefined behaviour. This is made way worse by most compilers mostly choosing to let you do what you think they will. They just don't guarantee that they always will, or in all cases.


I believe you are mistaken. The C11 standard, section 6.5.2.3 "Structure and union members" pgf 6, says "One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members." And that seems to be what's being used here.


No: from https://en.cppreference.com/w/cpp/language/union.

The union is only as big as necessary to hold its largest data member. The other data members are allocated in the same bytes as part of that largest member. The details of that allocation are implementation-defined but all non-static data members will have the same address (since C++14). It's undefined behavior to read from the member of the union that wasn't most recently written. Many compilers implement, as a non-standard language extension, the ability to read inactive members of a union.

What 6.5.2.3 simplifies is the use of unions of the type:

struct A{int type; DataA a;}

struct B{int type; DataB b;}

union U{A a; B b;};

U u;

switch(u.type)...

It's not what is being used here.

std::variant is designed to deprecate all legitimate uses of union


The post is about C, not C++. My comment stands, as the original post has two structs in a union, and they start the same way, so it’s exactly the case covered in the C11 Standard.


It's actually weirder than that. The C standard allows type punning through unions, but not because of the clause you mentioned. It allows it because of footnote 95:

> If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’)

This is broader than the common initial subsequence clause, and allows punning between completely different types, e.g. int, char[4], and float.
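
For example (a sketch; the value you read back depends on endianness and representation):

    #include <stdint.h>

    union pun { float f; uint32_t u; unsigned char bytes[4]; };

    static uint32_t float_bits(float f) {
        union pun p;
        p.f = f;
        return p.u;   /* reinterprets the object representation of f */
    }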

You might ask, what is the point of the "common initial subsequence" rule then? It's to allow certain accesses that don't go directly through the union, so the compiler doesn't know for sure whether there's a union involved. Only problem is that all major compilers completely ignore this rule. [1] (But they do implement the first clause I mentioned, where the accesses do go through the union.)

[1] https://stackoverflow.com/questions/34616086/union-punning-s...


Your response to GP is based on the C++ reference, while his is explicitly based on the C standard. Your assertion that ‘[t]he details of that allocation are implementation-defined but all non-static data members will have the same address (since C++14)’ seems to directly conflict with the C11 standard. Also, your closing comment about std::variant is clearly only applicable to C++. I am just curious why you are using C++ when the article and GP are specifically addressing C?


You've mentioned this several times on this page, but this is still incorrect.

The C standard references "struct or union" all over the place because the two are so similar. The distinction is of course made clear in multiple places, but one that seems relevant here is:

> As discussed in 6.2.5, a structure is a type consisting of a sequence of members, whose storage is allocated in an ordered sequence, and a union is a type consisting of a sequence of members whose storage overlap. (ISO/IEC 9899:201x, §6.7.2.1, #6)

That's it. There's nothing about undefined behavior if you access one member and then another later. In fact there's even a paragraph which mentions doing just that:

> The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bitfield, then to the unit in which it resides), and vice versa. (ISO/IEC 9899:201x, §6.7.2.1, #16)

A pointer to the union points to each of its members, and can be dereferenced to access it.

std::variant is not used in C; C and C++ are two different languages.


> The size of a union is sufficient to contain the largest of its members.

Correct me if I'm wrong, but there is no part of the C spec that says this:

When initializing a union member that is smaller than the largest member, the remaining bytes will always automatically be initialized to zero.

If I'm right then the following caveat must be added to your statement:

> A pointer to the union points to each of its members, and can be dereferenced to access it.

... if and only if the member which was originally initialized is at least as large as the other member being accessed.

In other words, if you write your program in a way that ensures it will only compile when all union members are exactly the same size, and you have mandatory tooling to make sure that any changes to said union follow the same rule by force of compilation errors, then and only then can you claim what you claimed without the threat of undefined behavior.


Don't actually do this.


The Linux kernel is using this for bounds checking. https://news.ycombinator.com/item?id=28015263


Like the parent poster, when I read the article I assumed that there was no conceivable reason to ever use this feature in a real C program. Let me just say that I'm pleasantly surprised to be proven wrong!



