I understand the use case for this, but whenever I see it I wonder whether I'd prefer an external code generation step, like a script, over falling back on preprocessor macros.
Now you have an additional stage in your build, a bunch of new code to maintain, and either a bespoke language embedded in your standard C++ or a bunch of code emitting C++ separately from the code it logically belongs with.
Compare with a solution that's 100% standard C++, integrates into your build with zero work, can be immediately understood by anyone reasonably skilled in the language, and puts the "generated" code right where it belongs.
CMake makes this pretty painless. My codegen targets have only two additional instructions to handle the generation itself and dependencies: add_custom_command to call the codegen exec, and then add_custom_target to wrap my outputs in a "virtual" target I can then make the rest of my program depend on, but this is just for tidying up.
And I'll dispute the fact that any complex C prepro task "can be immediately understood by anyone reasonably skilled in the language". Besides, code should ideally be understood by "anyone reasonably likely to look at this code to work in it", not "reasonably skilled".
This isn't complex. It's a bit unusual, but not hard to understand if you understand the basics of how #include and #define work.
If you're working on the sort of C++ codebase that would benefit from this sort of code generation, and you're not reasonably skilled in C++, then god help you.
Are you talking about the X macro itself, or more generally?
I may be the obtuse one here, but for a more complex example, it took me a few hours to manage to make nested loops using Boost PP (for explicit instantiations). Even so, I avoid having to write a new one that's not a quick copy-paste because it's quite different from usual C++ programming, so my painfully acquired understanding quickly evaporated... as I suspect is the case of anyone who doesn't particularly focus on the C prepro.
In the end, it's just simpler to get some Python script or C++ program to write a string and dump that to a file than to write something illegible with the C preprocessor, if doing something at all complicated (in my opinion).
I'm talking about X-macros. There's a wide range of preprocessor shenanigans, from "everybody needs to know this" to "oh my god why." Each construct needs to be evaluated on its merits. IMO X-macros are closer to the simpler side of that spectrum. Consider writing things out by hand if you just have a few, but if you have a lot of things repeating like this, they're a fine tool to use. Boost PP is a whole different level of ridiculousness and I don't see ever using that sort of thing for anything serious.
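For anyone who hasn't seen the pattern, here is a minimal sketch of an X-macro. The names (COLOR_LIST, Color, to_string, kColorCount) are made up for illustration and assume C++17; the list could just as well live in a separate file pulled in with #include.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical list of items. Each X(...) entry is one row of the "table".
#define COLOR_LIST(X) \
    X(Red)            \
    X(Green)          \
    X(Blue)

// Expansion 1: the enumerators.
#define AS_ENUMERATOR(name) name,
enum class Color { COLOR_LIST(AS_ENUMERATOR) };
#undef AS_ENUMERATOR

// Expansion 2: the number of entries (expands to 0 +1 +1 +1).
#define AS_ONE(name) +1
constexpr std::size_t kColorCount = 0 COLOR_LIST(AS_ONE);
#undef AS_ONE

// Expansion 3: one switch case per entry, using # to stringify the name.
#define AS_CASE(name) case Color::name: return #name;
constexpr std::string_view to_string(Color c) {
    switch (c) { COLOR_LIST(AS_CASE) }
    return "?";
}
#undef AS_CASE
```

Adding a colour means touching only COLOR_LIST; the enum, the count and to_string stay in sync on the next compile.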
> Now you have an additional stage in your build, a bunch of new code to maintain, and either a bespoke language embedded in your standard C++ or a bunch of code emitting C++ separately from the code it logically belongs with.
The preprocessor is already a bespoke language embedded in your C++, and code written in it is generally harder to maintain than, like, Python.
The cost of doing something non-standard is real, but not infinite; at some point the benefit in code maintainability and sanity is worth it.
I agree that you can go too far with it and it becomes better to do it a different way, but the X-macros technique is straightforward and easy to understand.
I've done this in C with the C preprocessor and Java with m4[0].
The upside of doing it natively is that it keeps the build simpler. And everybody at least knows about the existence of the C preprocessor, even if they don't know it well. And it's fairly limited, which prevents you from getting too clever.
The big downside of doing it with the C preprocessor is that the resulting code looks like vomit if it's more than a line or two because of the lack of line breaks in the generated code. Debugging it is unenjoyable. I'd recommend against doing anything super clever.
The upside of doing it out of band is that your generated source files look decent. m4 tends to introduce a little extra whitespace, but it's nothing objectionable. Plus you get more power if you really need it.
The downside is that almost nobody knows m4[1]. If you choose something else, it becomes a question of what, does anyone else know it, and is it available everywhere you need to build.
Honestly, integrating m4 into the build in ant really wasn't too bad. We were building on one OS on two different architectures. For anything truly cross-platform, you'll likely run into all the usual issues.
ETA: Getting an IDE to understand the out of band generation might be a hassle, as other folks have mentioned. I'm a vim kinda guy for most coding, and doing it either way was pretty frictionless. The generated Java code was read-only and trivial, so there wasn't a lot of reason to ever look at it. By the time you get to debugging, it would be entirely transparent because you're just looking at another set of Java files.
[0] This was so long ago, I no longer remember why it seemed like a good idea. I think there was an interface, a trivial implementation, and some other thing? Maybe something JNI-related? At least at first, things were changing often enough that I didn't want to have to keep three things in sync by hand.
[1] Including me. I re-learn just enough to get done with the job at hand every time I need it.
This is what I do, these days. Whenever I would previously have reached for X-macros or some other macro hack, I tend to use Cog [1] now instead.
It's quite a clever design; you write Python to generate your C++ code and put it inside a comment. Then when you run the Cog tool on your source file, it writes the generated code directly into your C++ file right after your comment (and before a matching "end" comment).
This is great because you don't need Cog itself to build your project, and your IDE still understands your C++ code. I've also got used to being able to see the results of my code generation, and going back to normal macros feels a bit like fiddling around in the dark now.
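Roughly what that looks like in a source file, as a sketch: the Python between the `[[[cog` and `]]]` markers sits in a C++ comment, and everything between the closing `]]]*/` and the `[[[end]]]` line is what Cog writes back. The enum, the names list and the file name colors.cpp are assumptions for illustration; the output would be refreshed with something like `cog -r colors.cpp`.

```cpp
#include <string_view>

/*[[[cog
import cog
# Hypothetical list of names, purely for illustration.
names = ["Red", "Green", "Blue"]
cog.outl("enum class Color { %s };" % ", ".join(names))
cog.outl("constexpr std::string_view to_string(Color c) {")
cog.outl("    switch (c) {")
for n in names:
    cog.outl('        case Color::%s: return "%s";' % (n, n))
cog.outl("    }")
cog.outl('    return "?";')
cog.outl("}")
]]]*/
enum class Color { Red, Green, Blue };
constexpr std::string_view to_string(Color c) {
    switch (c) {
        case Color::Red: return "Red";
        case Color::Green: return "Green";
        case Color::Blue: return "Blue";
    }
    return "?";
}
//[[[end]]]
```

The file compiles as ordinary C++ whether or not Cog is installed, since the generator is only a comment as far as the compiler is concerned.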
IDEs understand preprocessor macros, so IDE features (jump2def, etc) work with this. IDEs also can expand the macro invocations. So, I prefer the macros when possible :-).
The C# "source generator" approach is a good compromise. It runs within the build chain, so it has the ease of use of macros in that respect, but the generators don't need to be written in a weird macro language (they are C#, or can call an external tool). When you debug your program, you step through the generated source and can see it, which is more accessible than macros. Not sure if there is something similar in C/C++ integrated with the common toolchains.
But when working outside C/C++ I've found myself missing the flexibility of macros more times than I can count.
> But when working outside C/C++ I've found myself missing the flexibility of macros more times than I can count.
Me too, and that's even in Lisp!
Preprocessor macros are hard and bugprone because they share the failings of Unix philosophy of "text as universal interface" - you're playing with unstructured (or semi-structured) pieces of text, devoid of all semantics. And this is also what makes them occasionally useful - some code transformations are much, much easier to do when you can manipulate the text form directly, ignoring syntax and grammar and everything.
Only the final value must be correct code - starting point and intermediary values can be anything, and you don't need to make sure you can get from here to there through valid data transformations. This is a really powerful capability to have.
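A small sketch of what that looks like in practice, using token pasting. DEFINE_PROPERTY and Config are hypothetical names; the point is that fragments like `get_` and `width` mean nothing on their own, and only the pasted result has to be valid C++.

```cpp
// Hypothetical macro: pastes a field name into a member, a getter and a setter.
#define DEFINE_PROPERTY(type, name)                    \
    type name##_ = type{};                             \
    type get_##name() const { return name##_; }        \
    void set_##name(type value) { name##_ = value; }

struct Config {
    DEFINE_PROPERTY(int, width)     // width_, get_width(), set_width()
    DEFINE_PROPERTY(double, scale)  // scale_, get_scale(), set_scale()
};
#undef DEFINE_PROPERTY
```

Doing the same with templates alone is awkward precisely because identifiers can't be assembled from pieces there.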
(I also explicitly compared preprocessor macros to "plaintext everything" experience that's seen as divine wisdom, to say: y'all are slinging unstructured text down the pipes way too much, and using preprocessor way too little.)
The C preprocessor is standard, available and compatible, and the major usage patterns are "known". In a lot of cases, macros are way easier to reason about than learning how an external generation tool is used to generate the code. In order to understand these macros, all I need is to read the source code where they're used.
Nothing C++ related in the pattern though. This C preprocessor trickery is practically so classic you couldn't necessarily even call it a "trick".
After trying to wrangle Boost PP and other advertised compile-time libraries such as Boost Hana (which still has some runtime overhead compared to the same logic with hardcoded values), I've finally converged on simply writing C++ files that write other C++ files. Could be Python, but I'd rather keep the build simple in my C++ project. Code generation is painless with CMake, no idea with other build configuration utilities.
CMake has a particularly irritating flaw here, though, in that it makes no distinction between host and target when cross-compiling, which makes it really difficult to do this kind of code generation when supporting this use case (which is becoming more and more common).
Right, I hadn't thought of that, to be honest. If I understand correctly, you're saying the codegen targets will be compiled to the target arch, and then can't be run on the machine doing the compiling?
I think one solution might be to use target_compile_options() which lets you specify flags per target (instead of globally), assuming you're passing flags to specify the target architecture.
That only works if it's mostly the same compiler, unfortunately. They could be completely different executables, calling conventions, etc. I don't know why CMake still has such a huge hole in its feature set, but it's quite unfortunate.
One case I benchmarked was Bernstein/Bézier and Lagrange element evaluation. This is: given a degree d triangle or tetrahedron, given some barycentric coordinates, get the physical coordinate and the Jacobian matrix of the mapping.
"Runtime" here means everything is done using runtime loops, "Hana" using Boost Hana to make loops compile-time and use some constexpr ordering arrays, "hardcoded" is a very Fortran-looking function with all hardcoded indices and operations all unrolled.
As you see, using Boost Hana does bring about some improvement, but there is still a factor 2x between that and hardcoded. This is all compiled with Release optimization flags. Technically, the Hana implementation is doing the same operations in the same order as the hardcoded version, all indices known at compile time, which is why I say there must be some runtime overhead to using hana::while.
In the case of Bernstein elements, the best solution is to use de Casteljau's recursive algorithm using templates (10x to 15x speedup over the runtime recursive version, depending on degree). But not everything recasts itself nicely as a recursive algorithm, or at least I didn't find a way for Lagrange. I did enable -flto since, from my understanding (looking at call stacks), hana::while creates lambda functions, so perhaps a simple function optimization becomes a cross-unit affair if it calls hana::while. (speculating)
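To give an idea of the template version, here is a sketch of de Casteljau's recursion with the degree as a template parameter. It is a 1D Bézier curve rather than the triangle/tetrahedron case described above, it is the textbook recursion with no reuse of intermediate values, and the function name and signature are assumptions (C++17).

```cpp
#include <array>
#include <cstddef>

// Evaluates the Bezier polynomial of the given Degree over control values
// c[First] .. c[First + Degree] at parameter t, via de Casteljau's recursion:
//   B(First, Degree) = (1 - t) * B(First, Degree - 1) + t * B(First + 1, Degree - 1)
template <std::size_t Degree, std::size_t First = 0, std::size_t N>
constexpr double de_casteljau(const std::array<double, N>& c, double t) {
    static_assert(First + Degree < N, "not enough control values");
    if constexpr (Degree == 0) {
        return c[First];  // recursion bottoms out at a single control value
    } else {
        return (1.0 - t) * de_casteljau<Degree - 1, First>(c, t)
             + t * de_casteljau<Degree - 1, First + 1>(c, t);
    }
}
```

Since the degree is known at compile time, each call like `de_casteljau<2>(c, t)` instantiates a fixed tree of functions that the compiler is free to inline and flatten.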
Similar results to compute Bernstein coefficients of the Jacobian matrix determinant of a Q2 tetrahedron, factor 5x from "runtime" to "hana" (only difference is for loops become hana::whiles), factor 3x from "hana" to "hardcoded" (the loops are unrolled). So a factor 15x between naive C++ and code generated files. In the case of this function in particular, we have 4 nested loops, it's branching hell where continues are hit very often.
It would be fairly interesting to look at the actual code you used, and at the generated code. By any chance, is it viable for you to open-source it? I'd guess it would be of real interest to the Hana author(s).
What compiler/version did you use? For example, MSVC isn't (or at least wasn't) good at always evaluating `constexpr` at compile time...
> hana::while creates lambda functions, so perhaps a simple function optimization becomes a cross-unit affair if it calls hana::while. (speculating)
Hmm, I'd say LTO shouldn't make a difference here, as these lambdas are already fully visible to the compiler.
I never thought to contact them, but I might do that, thanks for the suggestion. This is something I tested almost two years ago, I have these benchmarks written down but I've since deleted the code I've used, save for the optimal implementations (though it wouldn't take too long to rewrite it).
I tested with clang on my Mac laptop and gcc on a Linux workstation. Version, not sure. If I test this again to contact the Hana people, I'll try and give all this information. I did test the constexpr ordering arrays by making sure I can pass, say, arr[0] as a template parameter. This is only possible if the value is known at compile time. Though it's also possible the compiler could be lazy in other contexts, as in not actually evaluating at compile time if it figures out the result is not necessary to be known at compile time.
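The check described here could look something like the following sketch. The table kOrder and its values are made up; the point is only that a template argument must be a constant expression, so the alias fails to compile otherwise.

```cpp
#include <array>

// Hypothetical ordering table; the values are placeholders.
constexpr std::array<int, 3> kOrder = {2, 0, 1};

template <int I>
struct Index {
    static constexpr int value = I;
};

// Compiles only if kOrder[0] is usable as a constant expression.
using FirstIndex = Index<kOrder[0]>;
static_assert(FirstIndex::value == 2, "kOrder[0] is known at compile time");
```

This proves the value can be computed at compile time, not that it is in every use: a constexpr function called in an ordinary runtime context may still be evaluated at run time.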
Oh yeah, you're right, I was confusing translation unit and function scope.
Yeah, it's all done automatically when you build, and dependencies are properly taken into account: if you modify one of the code generating sources, its outputs are regenerated, and everything that depends on them is correctly recompiled. This doesn't take much CMake logic at all to make work.
In my case, no, it's dumb old code writing strings and dumping that to files. You could do whatever you want in there, it's just a program that writes source files.
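As a rough idea of what such a generator can look like, here is a sketch that builds the text of an unrolled dot product and writes it to a file. The function names and the generated dotN function are assumptions for illustration, not the code described above, and n is assumed to be at least 1.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Builds the source text of a fully unrolled dot product of length n (n >= 1).
std::string generate_unrolled_dot(int n) {
    std::ostringstream out;
    out << "// Generated file, do not edit by hand.\n";
    out << "inline double dot" << n << "(const double* a, const double* b) {\n";
    out << "    return ";
    for (int i = 0; i < n; ++i) {
        if (i > 0) out << " + ";
        out << "a[" << i << "] * b[" << i << "]";
    }
    out << ";\n}\n";
    return out.str();
}

// Writes text to path; returns false if the file could not be written.
bool write_file(const std::string& path, const std::string& text) {
    std::ofstream file(path);
    file << text;
    file.flush();
    return static_cast<bool>(file);
}
```

A main() in the generator executable would then call something like `write_file("dot3.hpp", generate_unrolled_dot(3))`, and the build system treats dot3.hpp as an output of that step.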
I do use some template metaprogramming where it's practical versus code generation, and Boost Hana provides some algorithmic facilities at compile time but those incur some runtime cost. For instance, you can write a while loop with bounds evaluated at compile time, that lets you use its index as a template parameter or evaluate constexpr functions on. But sometimes the best solution has been (for me, performance/complexity wise) to just write dumb files that hardcode things for different cases.
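For readers without Boost at hand, the "loop whose index is usable as a template parameter" idea can be sketched with the standard library instead. This uses std::index_sequence and a fold expression, not Hana's API, and weight and sum_weights are hypothetical names (C++17).

```cpp
#include <cstddef>
#include <utility>

// Hypothetical per-index function; I is a template parameter, not a runtime value.
template <std::size_t I>
constexpr int weight() {
    return static_cast<int>(I * I);
}

// Expands to 0 + weight<0>() + weight<1>() + ... for each index in the pack.
template <std::size_t... Is>
constexpr int sum_weights_impl(std::index_sequence<Is...>) {
    return (0 + ... + weight<Is>());
}

// "Loops" over 0 .. N-1 at compile time.
template <std::size_t N>
constexpr int sum_weights() {
    return sum_weights_impl(std::make_index_sequence<N>{});
}
```

Here `sum_weights<4>()` is a constant expression, so it can itself feed a template argument or a static_assert.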
External codegen introduces a lot of friction in random places. Like how your editor can no longer understand the file before you start building. Or how it can go out of date with respect to the rest of your code until you build. If you can do it with a macro it tends to work better than codegen in some ways.