For those who are confused: this allows you to associate a type with a number. You can always retrieve the type you previously stored via decltype(loophole(tag<N>{}))
I was aware that the C++ type checker had memory and even that it was Turing complete. This is still non-intuitive to me. Only an Uber C++ expert who knows exactly what constructs cause the type checker to retain memory would be able to design this.
Maybe I've just been away from C++ for too long (~2 years) but I wasn't able to follow what was going on there. What does `sizeof` have to do with anything (doesn't work without it, though)? I would have enjoyed an article that talked me through this in a bit more detail.
The sizeof() forces instantiation of the template.
The trick is that the template declares a friend function with a body. Being a "friend" declaration, it is not scoped within the template. However, the body is defined within the template and is allowed to use the template's parameters.
The result is that the friend function's definition depends on how the template was instantiated, even though the friend function is not scoped within the template. If you instantiate the template two different ways, the compiler actually complains that the friend function has two different definitions.
What this all means is that you can write some code that instantiates a template, and then later on, you can observe how the template was instantiated. Instantiating a template has a programmatically-observable side effect.
And you can use that to trick the compiler into giving you more information about other declarations. SFINAE on steroids basically.
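To make the "observable side effect" concrete, here is a minimal sketch of the same mechanism without the `auto` part. The names `key`, `stored` and `writer` are made up for illustration and do not come from the article; the sketch assumes a C++14 compiler.

```cpp
// A tag type used only to give the friend function a distinctive signature.
struct key {};

// Namespace-scope declaration, so ordinary lookup can find the function later.
// No definition exists yet.
constexpr int stored(key);

template <int Value>
struct writer {
    // Friend *definition*: the function lives at namespace scope,
    // but its body uses the template parameter.
    friend constexpr int stored(key) { return Value; }
};

// Taking sizeof a class type requires a complete type, so this instantiates
// writer<42>, which in turn supplies the definition of stored(key).
static_assert(sizeof(writer<42>) > 0, "forces instantiation of writer<42>");

// The instantiation is now observable from outside the template.
static_assert(stored(key{}) == 42, "stored(key) was defined by writer<42>");

// Instantiating a second way, e.g. sizeof(writer<7>), would try to define
// stored(key) again and should be rejected as a redefinition.
```

Everything here is checked at compile time, so compiling the translation unit is the whole test.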
Let's break the program down into pieces, although I'm not going to go in program order because it makes the exposition a little easier if I don't.
The first point of interest is this line:
auto loophole(tag<0>);
C++ parsing is a complex thing, and this is actually one of those cases where I had to stare at it for a while to figure out what it actually classifies as. At first glance, it can be a variable declaration or a function declaration, but tag<0> names a type rather than a value, so (much as with the Most Vexing Parse) this is read as a function declaration.
What threw me here was that this looks like a trailing return type declaration of a function where the trailing return type is omitted, and I wasn't sure at first if this is legal. It is: since C++14, a bare auto makes this a function whose return type is deduced from its body. But it is only a function declaration, which means we can't validly use it until we have a definition.
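A small sketch of that rule in isolation, assuming C++14 or later. The function name `answer` is hypothetical and only there for illustration.

```cpp
#include <type_traits>

// Declaration only: the return type is a placeholder and cannot be
// deduced yet, because there is no body to deduce it from.
auto answer(int);

// Using the function here, e.g. decltype(answer(0)), would be ill-formed:
// a function with a deduced return type cannot be used before it is defined.

// Definition: int * double yields double, so the return type is deduced as double.
auto answer(int x) { return x * 1.5; }

// From this point on the return type is known.
static_assert(std::is_same<decltype(answer(0)), double>::value,
              "return type deduced from the definition");
```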
Now the next thing you might be interested in is that the struct tag is a template. But, no, that doesn't matter one iota [0]. So we'll ignore it. Instead, let's move on to the template definition:
template<typename T, int N>
struct loophole_t {
friend auto loophole(tag<N>) { return T{}; };
};
There are two separate things to talk about here. Again eschewing the textual order, let's focus on the friend line first. We have a definition of a function that is a friend--this causes the friend definition to be added as a member of the enclosing namespace of the class, as if it had been defined outside of the class, so it's as if we had this code:
auto loophole(tag<N>) { return T{}; }
template <typename T, int N>
struct loophole_t {
friend auto loophole(tag<N>);
};
But wait, you notice. N and T are undefined when we pull out of the template! Well, this is the second point to bring up. Templates in C++ are really macros of a fashion. When you declare, or even define, a template, nothing happens. At least, not yet. The compiler basically saves the state of the template AST internally. When the template is instantiated, it creates an entire copy of the template body as if it were defined at that point, with the values of the template parameters substituted in.
I'll use the tag<0> bit to give an example of what the code looks like at the point of our actual auto loophole declaration:
// internal compiler reference to a templatable tag struct
// internal compiler reference to a templatable loophole_t struct
// this is autogenerated template instantiation
struct tag<0> {};
auto loophole(tag<0>);
Now we get to the kicker line, the sizeof, reproduced here:
sizeof( loophole_t<std::string, 0> );
What does sizeof do? Well, nothing. It's a constant expression--even if you drop an expression in the argument of sizeof, that argument is not evaluated. But that's an expression context. Instead, however, we passed in a type name. This type name is a templated type--which means we instantiate that template. So let's add that to our running AST example:
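The two properties of sizeof in play here can be checked separately. This is only a sketch; `never_defined`, `buffer`, `needs_member` and `pointer_only` are hypothetical names that do not appear in the article.

```cpp
#include <cstddef>

// Declared but never defined anywhere in the program.
int never_defined();

// The operand of sizeof is unevaluated: no call happens and no definition
// is needed, only the type of the expression (int) is inspected.
static_assert(sizeof(never_defined()) == sizeof(int),
              "sizeof only looks at the type of its operand");

template <std::size_t N>
struct buffer {
    char bytes[N];
};

// sizeof applied to a class type needs a complete type, so naming
// buffer<16> here implicitly instantiates the template.
static_assert(sizeof(buffer<16>) == 16, "buffer<16> holds exactly 16 chars");

// By contrast, a use that does not need a complete type does not instantiate.
template <typename T>
struct needs_member {
    typename T::type value;  // would be an error for T = int if instantiated
};

// Forming a pointer type is fine: needs_member<int> is never instantiated.
using pointer_only = needs_member<int>*;
```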
// Autogenerate the template expansion
auto loophole(tag<0>) { return std::string{}; }
struct loophole_t<std::string, 0> {
friend auto loophole(tag<0>);
};
sizeof( loophole_t<std::string, 0> );
Because of the friend body definition, by instantiating the template, we created the function definition for the auto declaration that we provided. It is as if we had defined the function specifically at the point of the instantiation of the template, the first use of the template with specific arguments [1]. So the instantiation of loophole_t<std::string, 0> provides the body for auto loophole(tag<0>) that lets us deduce that its return type is actually std::string.
So decltype looks up what the return type of calling the function named loophole with a prvalue of type tag<0> would be. There is only one such candidate function--the one we have a declaration and instantiation-generated definition for--and so the proper type is the return type of said function, which is deduced from its body (in the instantiation-generated definition) as std::string.
If we omit the sizeof line, there is no instantiation of loophole_t anywhere. And without that instantiation, there is no definition of the loophole itself provided.
[0] Not for explaining what it does. It's for what it's for--you can easily generate several numbers with the template parameter and associate each number with a particular type.
[1] Okay, I'm shortcutting a lot of complexity here. [temp.inject]¶1 provides that "When a [class] template is instantiated, the names of its friends are treated as if the specialization had been explicitly declared at its point of instantiation."--in other words, precisely as I described it here. But this is specifically only true for friends. For member functions or regular templated functions, a lot of the name lookup actually happens at the point of declaration, not instantiation. But some of it doesn't, based on whether or not it's based (directly or indirectly) on a template parameter. It's complicated!
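Putting the pieces of the walkthrough back together gives roughly the following. Treat it as a sketch rather than the article's exact program: the static_assert wrapper around sizeof, the second entry for tag<1> (illustrating footnote [0]) and the GCC warning pragma are additions made here so the snippet compiles as one translation unit. As the rest of the thread notes, compiler support has varied between versions.

```cpp
#include <string>
#include <type_traits>

// GCC warns that the friend below declares a non-template function;
// that is exactly what the trick relies on, so silence it (GCC only).
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic ignored "-Wnon-template-friend"
#endif

template <int N>
struct tag {};

template <typename T, int N>
struct loophole_t {
    // Friend definition: injected into the enclosing namespace when
    // loophole_t<T, N> is instantiated, with T and N substituted.
    friend auto loophole(tag<N>) { return T{}; }
};

// Declarations only: return types are not deducible yet.
auto loophole(tag<0>);
auto loophole(tag<1>);

// sizeof needs complete types, so these instantiate the templates and
// thereby provide the definitions of loophole(tag<0>) and loophole(tag<1>).
static_assert(sizeof(loophole_t<std::string, 0>) > 0, "store std::string under 0");
static_assert(sizeof(loophole_t<int, 1>) > 0, "store int under 1");

// Retrieve the stored types through the deduced return types.
static_assert(std::is_same<decltype(loophole(tag<0>{})), std::string>::value,
              "number 0 maps to std::string");
static_assert(std::is_same<decltype(loophole(tag<1>{})), int>::value,
              "number 1 maps to int");
```

Removing either sizeof line should make the corresponding decltype fail, since the function would then be used before its return type can be deduced.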
For those who are looking to understand what is going on, I would recommend the blog series I wrote -almost- more than half a decade ago at https://b.atch.se
Compiler intrinsics have changed a lot since then, but the wording in the standard remains quite stable in this department.
Disclaimer; I did not read your post in full (on set, working atm).
Edit: time flies, it has been more than half a decade.
I have been able to replicate this technique--using the example linked to godbolt from the blog post--working on clang 7 and failing on clang 8-11 (with the following error)... but it works again on clang 12 ;P. (I am bisecting overnight to see exactly what fixed it; if anyone sees this and is interested, come back to this thread tomorrow for a follow-up.)
note: candidate template ignored: substitution failure [with N = 0]: function 'loophole' with deduced return type cannot be used before it is defined
template <int N, typename T = decltype(loophole(A::tag<N>{}))> T get_type();
~~~~~~~~ ^
commit dd8297b0669f8e69b03ba40171b195b5acf0f963
Author: Richard Smith <richard@metafoo.co.uk>
Date: Fri Oct 30 18:30:56 2020 -0700
PR42513: Fix handling of function definitions lazily instantiated from
friends.
When determining whether a function has a template instantiation
pattern, look for other declarations of that function that were
instantiated from a friend function definition, rather than assuming
that checking for member specialization information on whichever
declaration name lookup found will be sufficient.
clang/include/clang/AST/Decl.h | 14 +++-
clang/lib/AST/Decl.cpp | 61 ++++++++++++--
clang/lib/Sema/SemaDecl.cpp | 84 +++++--------------
clang/lib/Sema/SemaDeclCXX.cpp | 11 +++
clang/lib/Sema/SemaTemplateInstantiateDecl.cpp | 110 ++++++++++++-------------
clang/test/SemaTemplate/friend.cpp | 7 ++
6 files changed, 159 insertions(+), 128 deletions(-)
Sadly I never finished that blog series due to severe suicidal depression (ended up being hospitalized for several months), but I do have a toy implementation of that laying around somewhere.
Currently abroad for my primary profession (fashion model), but as soon as this current job is finished and I'm back home I could look for it and ping you.
Any preferred channel where I can reach you, or do you prefer me replying here?
The more I'm forced to work with C++, the more I'm convinced it's some sort of long-running joke we mistook for sincerity. Any time you want to do something interesting someone has a magic template that "solves the issue", yet nobody else understands how it works.
We should be using languages actually designed for these uses, not trying to force those languages' features back into C++.
This is just like training your dog to walk on its hind legs balancing a ball on its nose; it is an achievement, of sorts, but tells you nothing about dogkeeping. Nobody actually coding C++ does this, and a working programmer has no reason to pay it any attention beyond its novelty value. Anyway, Clang 8 and up do not allow it.
You can do overwhelmingly harder-to-understand-or-justify things in Haskell, and most are about as important. But some of those are essential.