A "monostate" in the design-patterns lingo of 20 years ago was a class with only static member variables, basically where all state is shared state. It was supposed to be an alternative to the Singleton pattern where you don't need all those .getInstance() calls and can instead just default-construct an instance, any instance, to get access to the shared state. It fell out of favor because usage was fairly error-prone and surprising to programmers who are not familiar with the pattern. Most people expect that when you create a new instance, you are actually creating new state, but monostate intentionally makes each new instance just a window onto the shared global state.
I would've thought that the C++ template class would be just a marker interface to use on a monostate, so that users of the class know that it has shared state. But it seems like usage patterns in the article are very different from that, and all the comments here are ignorant of the history of the monostate pattern and befuddled at its intended usage. Maybe it was added to the standard by someone familiar with the design pattern, but they didn't do a good job with education and documentation to explain to everyone else what it was for?
That’s the GP’s point. It is strange that std::monostate was chosen as the name, given that the different Monostate pattern [0] was fairly well established in C++ circles.
I considered that, but a lot of their comment seems to be about changing the class's behaviour to reflect the functionality they were expecting, rather than just giving it a different name. Especially this bit:
> I would've thought that the C++ template class would be just a marker interface to use on a monostate, so that users of the class know that it has shared state.
Also, if that's what they really meant then they surely could've written a far shorter comment that simply says this name is already taken and it should be called something else. They don't seem to be saying that at all.
(Not that it affects my original point, but FWIW that linked meaning of monostate isn't common in my experience, and it sounds like a truly awful idea: if your state is really global then be honest about it and use free functions. So it hardly seems worth reserving a useful word for it over the concept std::monostate is actually about.)
BTW I saw your deleted reply about the point of monostate, thanks. For something that has to implement an existing interface, I can see the possible benefit.
Yeah I actually haven’t used the pattern much myself, so I started to doubt the relevance. Once you start having multiple separate monostates implementing the same interface, and also limit their creation/instantiation in the sense that the code that uses the monostate objects is separate from the code that creates them, the difference to regular objects/classes becomes a bit murky.
Yeah. The design pattern sounds like "there is one state, and it is shared." What the STL has sounds like "all instances look the same, hence only one state is possible." They are homonyms with slightly different etymologies.
DonHopkins 85 days ago | on: What if null was an Object in Java?
Why stop at null, when you can have both null and undefined? Throw in unknown, and you've got a hat trick, a holy trinity of nothingness!
Of course the Rumsfeld Matrix further breaks down the three different types of unknowns.
>"Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don't know we don't know. And if one looks throughout the history of our country and other free countries, it is the latter category that tends to be the difficult ones." -Donald Rumsfeld
1) Known knowns: These are the things we know that we know. They represent the clear, confirmed knowledge that can be easily communicated and utilized in decision-making.
2) Known unknowns: These are the things we know we do not know. This category acknowledges the presence of uncertainties or gaps in our knowledge that are recognized and can be specifically identified.
3) Unknown unknowns: These are the things we do not know we do not know. This category represents unforeseen challenges and surprises, indicating a deeper level of ignorance where we are unaware of our lack of knowledge.
And Microsoft COM hinges on the IUnknown interface.
>Speaking at a software conference in 2009, Tony Hoare apologized for inventing the null reference:
>"I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years." -Tony Hoare
>"My favorite is always the Billion-Dollar Mistake of having null in the language. And since JavaScript has both null and undefined, it's the Two-Billion-Dollar Mistake." -Anders Hejlsberg
>"It is by far the most problematic part of language design. And it's a single value that -- ha ha ha ha -- that if only that wasn't there, imagine all the problems we wouldn't have, right? If type systems were designed that way. And some type systems are, and some type systems are getting there, but boy, trying to retrofit that on top of a type system that has null in the first place is quite an undertaking." -Anders Hejlsberg
>"My favorite is always the Billion-Dollar Mistake of having null in the language. And since JavaScript has both null and undefined, it's the Two-Billion-Dollar Mistake." -Anders Hejlsberg
Assuming these mistakes are additive and not multiplicative.
Each equality function, at least in Common Lisp, has a distinct goal.
Numbers use = if you don't care about type. For instance, the real number 0 and the complex number 0 + 0i would be treated equal by this function.
Do you care about strict pointer equality? eq.
Do you want to distinguish 0 and 0d0? eql.
Do you care about isomorphism? equal.
The only weird function to me would be equalp, given that it performs case-insensitive comparison of characters etc. Overall, I find complaints about the equality predicates to be based on a misunderstanding of the definition of "equality".
Clojure goes the opposite direction and has a single = function that performs all sorts of reflection and deep comparisons; it's convenient in practice, but it's also easy to burn CPU cycles if you are e.g. comparing large collections and only care about pointer equality.
> Overall, I find complaints about the equality predicates to be based on a misunderstanding of the definition of "equality".
It's more that LISP is the only language I know that exposes how complex the concept of equality actually is. All the other ones I use more-or-less make an implicit assumption that equality means one of the things that LISP offers but if you want the other definitions of equality you either have to build support for them by hand or call out to a library function that's going to use (static or dynamic) reflection to reduce the problem to the one kind of equality the language implements.
(Apart from that, the only other complaint is that, LISP being as old as it is, the default names for those equalities are too terse to be self-describing. You just have to memorize which one is 'eq' and which one is 'eql'; there's no way to guess from the names themselves.)
Common Lisp doesn't do a very good job, unfortunately.
[1]> (equal '(1 2 3) '(1 2 3))
T
[2]> (equal "abc" "abc")
T
[3]> (equal #(1 2 3) #(1 2 3))
NIL
[4]> (subtypep 'string 'vector)
T ;
T
So a string is formally a kind of vector according to the type system itself, but vectors don't use equal comparison whereas strings do. We can reach for equalp:
[1]> (equalp #(1 2 3) #(1 2 3))
T
But then we have to accept something we might not necessarily want:
[2]> (equalp #(1 2 3) #(1 2 3.0))
T
which will not be true under the equal function that we really wanted to Just Work for simple vectors:
[3]> (equal '(1 2 3) '(1 2 3.0))
NIL
eql being the default equality function throughout the library is also suboptimal.
The specification allows (eq 1 1) to be false, which caters to unrealistically poor implementations such as ones that heap-allocate all integers. This, in spite of the fact that fixnum is required to be at least 16 bits wide: most-positive-fixnum must be at least 32767, and the fixnum type must be a supertype of (signed-byte 16).
What this means is that if Common Lisp were fit into a 16 bit machine (good luck with that at all), it would have to provide 16 bit fixnums (easily doable, but not unboxed ones) which defeats the fixnum concept of fixnums being the unboxed range of integers, beyond which we have boxed bignums.
Two arrays are equal only if they are eq, with one exception: strings and bit vectors are compared element-by-element (using eql).
What makes strings and bit vectors special, in my opinion, is that they are homogeneous arrays. Heterogeneous arrays aren't something that I often use outside of Clojure (usually because they are awkward to use in statically typed languages).
> The specification allows (eq 1 1) to be false
Because the specification doesn't limit the magnitude of numbers nor state how they should be stored in memory. The scenario you've described requires an implementation that intentionally makes every number a bignum or an instance of a class (like Smalltalk).
(To contextualize and clarify: It's a classic old JavaScript joke about how Brendan Eich hates gay people getting married so much that he donated money to a political campaign against marriage equality, even though he enjoys the human right of marriage himself. Some bigots think they are more equal than others.)
That always chafed me about Scheme. I see the utility of `eq?` and `eqv?`, but I'd prefer that there were only `equal?` and functions were defined to get an object's "id" or "numeric equivalency class," or whatever, instead of having different flavors of data structures that differ only for certain values.
What's really weird to me is not that C++ has a unit type and picked a weird name for it (that's just C++). The weird thing is how many unit types it has: std::monostate, std::tuple<>, std::nullptr_t, and so on.
The distinct types are the whole point. You wouldn't want a std::tuple<> to be implicitly convertible to a std::optional<T> (for arbitrary T), and std::nullptr_t exists to be the type of nullptr, which captures the conversion behaviours appropriate for null pointer literals and has nothing to do with the variant use case std::monostate exists to serve.
If there was a std::unit_t and it was implicitly convertible to optional, tuple and pointer, I don't think that would be worse in terms of usability at all (maybe worse in readability for people who haven't heard of a 'unit' type).
As for the std::variant use case, using std::monostate is only a matter of convention there. You could use any of the other unit types just the same.
std::monostate is explicitly provided for use with std::variant. It's in the <variant> header. Sometimes people use it for other things, but that's really an abuse, especially given defining your own type suitable for such cases is typically as simple as `struct mytype{};`.
Using one type to represent empty literals for optional, tuple and pointer types, implicitly convertible to all of them, would make the compiler accept many obviously accidental constructs. In a world where the maintainers of C++ are trying their hardest to make the language safer what conceivable benefit would there be?
Why wouldn't you want std::tuple<> to be the same as std::monostate, though? In many languages with a proper unit type such as Haskell and Rust, the zero tuple is the unit type.
void is the unit type. The fact that it is not constructible is a wart of the language, inherited from C. It would be easy to fix and would simplify a significant amount of generic code.
A function returning bottom cannot return, yet void foo() {} can. In fact it can even return the result of calling other void functions:
void bar() { }
void foo() { return bar(); }
In generic code void is usually internally replaced by a proper, regular void_t unit type and converted back to void at boundaries for backward compatibility.
[[noreturn]] void bar();
would be a candidate for a bottom-returning function, except that [[noreturn]] isn't really part of the type system.
> void is the unit type. The fact that it is not constructible is a wart of the language, inherited from C.
> A function returning bottom cannot return, yet void foo() {} can.
Or you could say it the other way, that it is the bottom type, and the fact that it can be used as the unit type for returned values is a wart of the language. Furthermore, void* isn't a pointer of the unit type, it's a type for pointers to undefined/unspecified value types.
nothing is gained by making void a proper bottom type. It would only break existing code. OTOH make void a proper unit type would be backward compatible and actually make the language simpler both from a specification point of view and in practical terms.
I didn't mean that it should be made into a bottom type, even though it sort of looks like it on the surface.
I think the original idea was that void meant "unknown", not "empty" or "non-existent": it was all about whether values could be allocated or not (the wart mentioned above). A plain void variable cannot be allocated, but a void pointer can. For functions, they just reused the keyword to mean "no return value" or "no arguments".
A pointer to the bottom type would make even less sense as an interpretation for void *, since such a pointer couldn't possibly point to any initialized value.
Sorry, I wasn't clear enough: I meant that void is not just a mix of the unit and bottom types, it is also used for unspecified (unknown, actually) types.
I ran into this recently writing some C++20 coroutines. The protocol for delivering values from a coroutine that was previously suspended has two flavors: one for values and one for void. My initial draft just implemented the value version and used a struct VoidTODO {} where void should be.
It's too late now. void pointers are used as a pun to mean "type wildcard." If void were a real thing that could have a size and address, that wouldn't work anymore.
Void means "it is a syntax error to construct a value of this type". This is not a type that exists in category theory or Haskell. (But similar to the "bottom" type.)
Hence, "void*" - a pointer to something, but it would be a syntax error to dereference this pointer.
In that regard void behaves like any incomplete type. In C and C++ you cannot construct objects of incomplete types, nor can you assign through pointers to incomplete types. But you can construct pointers to void and other incomplete types.
Differently to other incomplete types void has some special behaviour: you can declare a function as returning void and return with no arguments is also special cased. You can also cast to void.
A void with proper unit semantics would simply be a complete type instead. The only special case would be return with no arguments implicitly returning a void instance, but that would be pure sugar.
That's a good point. Maybe one could argue that rather than the unit type not being constructible, the wart of C is that functions that return "bot" can still "finish executing without returning".
I would almost rather argue that void is indeed the "bot" type, and a function marked with a void return type shouldn't be said to "return void;" rather we should say that it's an overloaded syntax that means the function has no return value at all. Same for "return bar()" there, that's just a false-friend of the syntax for returning a value, just syntactic sugar for "bar(); return;".
C and C++ don't have a type spindle, where void would be at the bottom. Only C++ has the concept of subtype, only in the class system, and the C++ class system doesn't have a bottom type; there is no bottom class that is a base for all the others.
void is not a proper type; it's just a hack shoehorned into a convenient spot in the type system.
Which is why the C++ people have to invent this whole zoo of other things.
If void were a type, then, for starters, "return x;" would be syntactically valid in a function returning void. (Only, no possible x would satisfy the type system, so there would have to be a diagnosable rule violation in that regard.)
A function returning void does not return a type. It doesn't return anything; it is a procedure invoked for side effects.
The same situation could be achieved in other ways, like having a procedure keyword instead of void.
The (void) parameter list is another example of void just being a hack. It was introduced in ISO C, and then C++ adopted it for compatibility.
The 2023 draft of ISO C finally made () equivalent to (void), though it will probably take many decades for (void) to disappear.
> C and C++ don't have a type spindle, where void would be at the bottom. Only C++ has the concept of subtype, only in the class system, and the C++ class system doesn't have a bottom type; there is no bottom class that is a base for all the others.
A bottom type is not the base of all other types.
> void is not a proper type; it's just a hack shoehorned into a convenient spot in the type system.
It is a type, but it is not Regular and it is incomplete. 'return x;' is invalid in a void-returning function because it doesn't type check. 'return void()' or 'return (void)0;' or 'return void_returning_function();' are all valid because they type check.
Making void regular has been proposed multiple times [1]. It is a relatively simple extension but nobody that cares has the time to carry it through standardization.
1 A return statement with an expression shall not appear in a function whose return type is void.
It's a constraint violation. It doesn't matter what the type of the expression is.
It looks as if C++ made a small improvement here.
Yes, the bottom type is at the bottom of the type derivation hierarchy. That's why the word bottom is there; that's what it's at the bottom of. It's also why it can't have any instances. Since every other type is a supertype, then if the bottom type contained some value V, that value would be imposed into every other type! V would be a valid String, Widget, Integer, Stream, Array ... what have you.
To clarify, it could be that the bottom type is not the base of all types, if the language has a split between some types which participate in that sort of thing and others that don't (e.g. class versus basic types or whatever).
But void is not the base of anything in C and C++.
You could argue that void is in some category of types where it is at the bottom; but no other types are in that category.
There is another problem: a bottom type should be the subtype of all types in that category. That includes being its own subtype. There we have a problem: C and C++ void is not a subtype of void in any sense.
That doesn't seem right to me. I can define a function returning "void", and it can terminate. I would expect that a function returning an uninhabited type can never complete.
So it appears this follows the terrible C++ "naming things with the wrong name on purpose" trend, the biggest example being calling a growable array (a list, collection, etc.) `std::vector`, even though "vector" was already a well-known word with a very different meaning.
I'd argue that unit would be just as cryptic as monostate, if you don't know what either is.
Like, if we assume someone looking at code with `std::unit`, what might they think this is? If one is not aware of its use in ML or similar, it could just as easily be assumed that it could be something to do with units like meters or kilograms or whatnot. After all, the C++ standard library is vast so it wouldn't necessarily be all that far-fetched.
Then the only question would be to ask why it would be default-constructible. At which point you'd have to read the docs for the type anyway.
It’s called “unit” because it is a unit of the operation of multiplication / constructing a tuple (up to a unique isomorphism): for any type 't, the tuple types unit * 't and 't * unit are isomorphic to 't. When your type system doesn’t have tuple construction as a fundamental operation, the way ML does, the name can be confusing.
sure, but I would be surprised if a significant fraction of programmers knew that. When we hear the word "unit", that's not the first definition that comes to mind.
> I'd argue that unit would be just as cryptic as monostate, if you don't know what either is.
And if my grandmother had wheels she'd be a bicycle. Of course if there wasn't a standard name for this concept that had been used in the industry for 50+ years (and in theoretical work for over 100) then it wouldn't make any difference what name you used for it. But given that there is a standard name for this concept that has been used in the industry for 50+ years, making up a different name is pretty unfortunate.
In C++ lingo they are called function objects or callables, not functors. You might stumble on code here and there that suffixes a class name with Functor, but it's not that common.
I would say it was widespread, but it's now fallen out of favor. Alexandrescu got the name wrong 25 years ago† and that got embedded into early boost, but by the time that blog post was written I feel like I'd mostly stopped seeing it being used by C++ experts.
† I don't know if the original mistake was his, but he certainly did a lot to spread it.
A simple Google search shows no shortage of recent websites and documents, including fairly authoritative references, using the term "functor" to describe a type that overloads operator().
Heck even the author of this article, Raymond Chen, uses the term functor as recently as 2020 to describe such a construct:
> In C++ lingo they are called function objects or callables, not functor.
The word "functor" has a long and glorious history in C++. Try entering "C++ Andrei Alexandrescu functors" into your internet search engine of choice. For bonus points, try "c++ Scott Meyers functor" as well.
Coming from other languages I noticed this about C++ as well. I can't give examples right now but I recall multiple times being like: "Oh that is just a weird name for $KnownComputingConcept".
The term "vector" represents an ordered collection of elements (e.g. a 3-dim vector is [x, y, z]). A "list" is flat out incorrect because it is used to refer to a linked list (std::list), and a "collection" means a group of objects, which could be anything like a map, set, etc., so it's too generic.
Before C++ popularized the term, other programming languages and libraries had already used "vector" to describe similar data structures. For example, Common Lisp has a vector type that represents a one-dimensional array.
Stepanov introduced the term in C++ and he fully acknowledges that it was a bad name and that he regrets it. If he could redo it, he would have renamed it array or array_list.
Void is different from this type though, as a variable of type void can't be occupied.
In ML and friends monostate is called unit (and gets used a lot because void returns aren't allowed by the languages). Some have empty types too, which can never be occupied. A function returning Empty can't return, for example, though there are other use cases
You are equivocating on the word "void". Your statement that "a variable of type void can't be occupied" is true in functional languages where "void/Void" is often used as the name of a type that isn't inhabited (assuming the language is sound/normalizing/whatever).
But here we are talking about C++, where "void" is a pseudotype that is absolutely inhabited, in some conceptual sense. Any function that is declared to return void and which returns is returning a thing that conceptually inhabits void. In this sense, std::monostate indeed captures the same concept as void, but in a much better way, because it's properly a type, not a pseudotype.
Note: Java does the same thing, effectively, with "Void" which is inhabited by exactly one value: null.
I think it's not correct to say that void is a monotype in C++, because the compiler won't allow you to assign the result of a function marked void to a variable, and you cannot declare a variable of type void.
I'd accept that it's not the same as the empty type though, given that void* can be occupied and functions marked void can return. Probably someone with more type theory than me can name this properly
> Probably someone with more type theory than me can name this properly
Probably “garbage”. C’s void is not a type and does not behave consistently, it’s a keyword associated with arbitrary convenient behaviours for that case.
Which is really annoying, and makes a ton of templated code in C++ have to bifurcate on void unnecessarily. They already let me do return f(); in a void function if f also returns void... they should let me declare a variable of type void, and the language would become a lot more pleasant.
Not that familiar with C++, but I used to have this thought about both Java and C#. Think I've changed my stance on it now though.
If following something like CQS the bifurcation can be thought of allowing “pure” functions and excluding code with a temporal / side-effecting component from higher order code.
Not saying bifurcating on void is the best approach to handle that, but in languages where side effects are a thing something is needed to make sure higher order code and side effecting code mix properly.
And this doesn’t come anywhere near to properly making that distinction anyway: a non-void function can have all the side effects, and a void function can have no side-effects.
It also does not “make sure higher order code and side effecting code mix properly”, it just makes a subset of likely side-effecting code not mix with higher order code at all.
> I think it's not correct to say that void is a monotype in C++, because the compiler won't allow you to assign the result of a function marked void to a variable, and you cannot declare a variable of type void.
The compiler does not allow you to do that particular operation out of an arbitrary restriction, but that does not make `void` a true void type. It still holds a monotype value!
#include <iostream>

int bar() {
    std::cout << "Bar" << std::endl;
    return 0;
}

void baz() {
    std::cout << "Baz" << std::endl;
}

template <typename T> T foo(T (*f)()) {
    std::cout << "Foo" << std::endl;
    return f();
}

template <typename T> T varfoo(T (*f)()) {
    std::cout << "Varfoo" << std::endl;
    T a = f();
    return a;
}

int main()
{
    foo(bar);       // Valid (returning an int).
    foo(baz);       // Valid (returning a void).
    varfoo(bar);    // Valid (assigning an int).
    // varfoo(baz); // Invalid (assigning a void (why???)).
}
I'm not that familiar with C, or C++. My impression is that void is a special case that doesn't need to be special, some accidental complexity that came from mapping machine instructions to a higher level language.
It's kind of baked into C grammar. And there's absolutely no compelling use-case to fix it in C.
In C++, there's a very compelling case for making void an actual type, because you can't use void as a templated type, which means that templates involving functions that potentially have void return types require unpleasant amounts of template metaprogramming.
Now that C++ standards committees are considering basic usability fixes (e.g. the long overdue ability to do `namespace com::microsoft::directx { }`), there's a vague possibility that somebody might look into actually fixing this some time before 2040.
Incidentally, classic C did not have 'void'; instead, any function was assumed by default to return 'int', in the form of whatever value happened to be in the accumulator, so the "value" of a 'void'-like function was effectively random garbage. The 'void' introduced explicitly in a later version of C weakened that original meaning of "unknown value" by allowing pointers to 'void', and thus not requiring that the pointed-to value always be thought of as meaningless (since you could cast a pointer to void to a pointer to something else).
“Nothing” can easily be interpreted as an uninhabited type (regardless of its use in haskell).
> a fancy way to say void?
Less fancy and more workable. Had void been a proper type in the first place it would not have been needed (but also… void had the same issue as nothing, it sounds like an uninhabited type more than a unit type).
Despite that, they could have called it Void, even if the standard library normally uses all lowercase.
I think there is exactly one equivalence class of instances of std::monostate, whereas there are exactly zero equivalence classes of instances of void.
In category theory terms, I believe void is the initial type (there is exactly one morphism from void to any other type), whereas monostate is the terminal type (there is exactly one morphism from any other type to monostate).
I greatly prefer tag classes for this. You can define 'em in a single line and a downstream user can't accidentally plug the unset value in to the template.
You could say the same thing about std::monostate, which is not a dummy type. If you need a unique sentinel type you have to make one for that purpose.
There's a fun thing like this in Swift too! `Void` is an empty tuple, and has all of the related constraints (can't conform to protocols, being the most salient one). If you have a type that has to conform to, say, `Equatable` or `Codable` you should instead use `Never?` which you can conform to most protocols via throwing `fatalError` on an extension of `Never` to the protocol.
Good thing they made this instead of expanding the standard library to be more like Java's or Python's. It still only contains the most basic functions, and std::monostate ;)
It's a poor name. If it has no members, it holds no bits, and therefore it has no state. It's not enough for an object to exist in order to hold state: it must be capable of distinguishing between at least two values, like true and false. If something doesn't hold state, the word "state" has no business in its name.
This thing is just a counterpart to void that is a class. A better name would be voidclass or something along those lines.
(There is a Monostate pattern, but that involves a class with state: just all the state is static. It's basically like a module with global variables. Completely different thing.)
Yes, exactly like 640K sounds like a power of two, if you're a MS-DOS user.
But in a way, this is right, since

    2^(no. of bits) = no. of states

When the number of bits is 0, 2^0 = 1: one state. A state machine with one state is certainly possible.
Problem is we need 2 or more states to do anything useful with state. We can draw an initial state bubble in a state diagram and not add any states; it can even have transitions back to itself.
So maybe monostate is not exactly a misnomer; it's just weird to mention state about something that is not useful for working with state.
Raymond is a very smart and productive person, and is not maligning C++ at all in this article. It makes me want to reassess my bittersweet perspective on the language.
On the flip side, it shows why so many people are excited about Rust. You can pick up literally any valuable thing off the ground that made the mistake of being built around C++, like CUDA, slap it on Rust, and people will both adopt it and be excited to contribute to it.
So... Isn't the fact that you can't default-construct that variant without `monostate` working-as-intended?
Not everything can or should be default-constructed or default-constructible. That does complicate initialization sometimes (i.e. there are practical reasons to "suspend" construction until you have all the needed data), but you're not avoiding breaking type safety by adding a `monostate`, you're just giving yourself the "could be null" headache.
Seems almost intentionally confusingly written. Why not use void if monostate is like void? Ah, because monostate is actually entirely unlike void: void has zero values, monostate exactly one.
void is kind of strange in C, because it looks like an empty type, but it still behaves as though it has a single instance (a unit type). For example, a function with an empty return type can't return (it'd have to supply a value of it); a void function can. You can't cast things to an empty type (otherwise you'd get a value of it); you can cast things to void. void is a unit type, not an empty type, it's just a bad one.
It would be somewhat more cohesive and less weird if we'd argued that the "void" return type means that a return value cannot be constructed which is taken to mean that the function has no return value, and simply returns without providing a value.
Try as I may, I can’t make sense of that. I’ve read something like it in books on C, but I still can’t. Maybe I’m infected with set theory too deeply.
In my mind, a computation (a “function”) must either return a value or hang/crash. If it appears as though it returns a member of ∅, it must hang/crash, because there are no members of ∅. If it returns a member of the single-element set, [1], it can return one, there’s just no use inspecting it afterwards (you know what it is already).
(For what it’s worth, if you use a prover-adjacent language such as Agda or Idris, this is exactly how things are going to work there.)
C functions (and in fact the "functions" of most mainstream programming languages) are not computations, they are algorithms. An algorithm doesn't necessarily have a result of any kind, at least not in the way that a (mathematical) function has. The result of the algorithm can be the state in which it leaves the World (e.g. an algorithm for cleaning a house doesn't have a return value, it changes the state of the house).
In fact the traditional programming name for what we mostly call functions today was "(sub)routine" - you call a subroutine, and when it finishes, it returns to where it originally started.
Consider also that at the assembly level (and below it), subroutines don't have return values, nor arguments. The program counter simply jumps to the beginning address of the subroutine, and the `return` instruction jumps back to the address right after that jump. The subroutine may read values from certain locations in memory, and possibly write some others back, but none of this is necessary or enforced in any way. C functions, and the corresponding keywords, are much closer to this conception of assembly subroutines than they are to the mathematical notions of functions or computations.
The calling convention does say where to look for the return value. So in a sense the return value always exists, but would not be meaningful if the function has a void return type.
The calling convention is part of the abstraction, it's not part of the processor's logic. Different languages often have different calling conventions, on the same OS and processor family. Different OSs have different calling conventions for their system calls.
> a computation (a “function”) must either return a value or hang/crash.
It can also simply return control, without returning anything. It's equivalent to invoking a continuation with zero arguments. Do you allow for zero-argument functions, at least?
Of course, if a function can return no value whatsoever, you suddenly need new syntactic categories to support that: you need to prohibit using such functions in expressions (only call statements are allowed), you need a way to return from such a function (a naked "return", which is prohibited from taking any expression), and it also severely strains your generics/templates because you can't treat such functions uniformly, etc.
> It can also simply return control, without returning anything. It's equivalent to invoking a continuation with zero arguments. Do you allow for zero-argument functions, at least?
There's nothing weird about a function taking 0 arguments, any more than there's anything weird about a function taking 3 or 5 arguments. A function can't take an argument of type void, so functions shouldn't be allowed to return void either.
> Of course, if a function can return no value whatsoever, you suddenly need new syntactic categories to support that: you need to prohibit using such functions in expressions (only call statements are allowed), you need a way to return from such a function (a naked "return", which is prohibited from taking any expression), and it also severely strains your generics/templates because you can't treat such functions uniformly, etc.
Or you do what sensible languages do: all functions return a value (if they return at all), functions that don't want to return anything in particular can return a unit value but that value is just a normal value that behaves normally in expressions etc., you don't make naked return a special case (you can make it syntax sugar for "return unit" if you really want), and you can treat those functions uniformly in generics, much more easily than with C++ templates where you have to make special cases for void functions.
I don't argue with any of that, you know. But we do have a historical (stupid in retrospect) split between functions and procedures, dating a-a-a-all the way back to ALGOL 60 at least.
You encounter the same problem with zero argument functions.
A function on a zero type would be unable to return any value, since there's no value you could apply it to. To have a function of zero arguments you should use the unit type (which means the function effectively picks out a single value).
This is also related to how a function with multiple arguments is a function of the product type, and an empty product is 1 not 0.
No, you just invoke the function and pass it zero arguments, that's it.
Sure, you can build your whole theoretical framework of computation with only the functions of exactly one argument, and then deal with tuples to fake multi-valued arguments/multiple return values — but you don't have to do that. You may as well start from the functions with arbitrary (natural) number of arguments/return values, it's not that hard.
Sure, and passing it zero arguments is exactly what it means to evaluate it on the single value of the unit set.
I mean surely we can agree that a pure function of 0 arguments picks out exactly 1 value, and that a function that accepts n different values (values not arguments) as input returns at most n different results? Why make an exception for n=0?
Your definition of a function of 0 arguments and that of a function over the unit set are identical. Or at least equivalent.
They're equivalent, but only up to whatever computational substrate one is actually using. You can build functions out of small-step operational semantics of, say, a simplistic imperative register machine with a stack. In this case, a function of 0 arguments and a function of 1 trivial unit argument are visibly different even though their total effect on the state is the same. After all, we're talking about theory of computation and so it better be able to handle computations as they are actually performed at the low level, too.
It's yet another example of "in theory, the theory and the practice are the same; in practice, they're different": I have written a toy functional language that compiles down to C, and unit-removal (e.g. transforming int*()*int into int*int, lowering "fun f() -> whatever = ..." into a "whatever f(void) {...}" etc.) is a genuine optimization. The same, I imagine, would apply to generating raw assembly: you want to special-case handling of unit so that passing it as an argument would not touch %rdi, and assigning a unit to a value should not write any registers, and "case unit_producing_function(...) of -> ... end" actually has no data dependency on the unit_producing_function etc.
> In this case, a function of 0 arguments and a function of 1 trivial unit argument are visibly different even though their total effect on the state is the same. After all, we're talking about theory of computation and so it better be able to handle computations as they are actually performed at the low level, too.
You don't have to push anything for the unit values on the stack though, their representation doesn't take up any space. Just like if you're passing a big argument (like a large integer, or a struct passed by value) it might take multiple registers or a lot of space in your stack frame, there's no 1:1 correspondence between arguments and stack space.
> you want to special-case handling of unit so that passing it as an argument would not touch %rdi, and assigning a unit to a value should not write any registers
This shouldn't need to be a special case though. For each datatype you have a way of mapping it to/from some number of registers, and for the unit type that number of registers is 0 and that mapping is a no-op.
One of the ways to represent them doesn't take up any space. But if you want to e.g. have a generic function of (T,U,V) -> V type (that is, with no monomorphization in your compiler), then either your units have to take space (otherwise the layout of (unit,int,bool) is different from that of (int,int,bool)), or you'll have to pass around type descriptors when invoking generic functions. Which many, many implementors of static languages would rather not do unless absolutely necessary.
> if you want to e.g. have a generic function of (T,U,V) -> V type (that is, with no monomorphization in your compiler), then either your units have to take space (otherwise the layout of (unit,int,bool) is different from that of (int,int,bool)), or you'll have to pass around type descriptors when invoking generic functions.
You have that problem already though surely, as T might be bigger than an int, or want to be passed in a floating-point register, or both.
Yes, that's the problem you face when you're dealing strictly with functions of a single argument. Still, there are two options: first, the unit value is arguably already written into the register, so you don't need to do anything.
Alternatively, you may instead represent () as a full, 64-bit wide machine word and then map every 64-bit value to mean () so, again, you don't actually need to write anything: all registers contain a valid representation of () at all times. This is similar to how we usually represent booleans: 0 is mapped to mean False, and everything else is mapped to mean True, although in this case we sometimes do need to rematerialize some definite value into the register of choice.
In any case, it's mostly just a matter of correctly writing the constant materializer; but if you adopt multi-argument/multi-valued functions you simply never encounter this problem:
    for arg, place in zip(arg_list, arg_places):
        load(arg, place)
    invoke(fun, kont)

    for val, place in zip(ret_values, ret_places):
        load(val, place)
    kontinue(kont)
Notice how degenerate loops simply disappear with no additional handling.
> if you adopt multi-argument/multi-valued functions you simply never encounter this problem
Sure. If you have multiple return then unit values become a lot less important because you can just have a function that returns 0 values. But most languages, especially C-like languages, do not have multiple return.
The word "function" in mathematics and the word "function" in C (and C++) are homonyms. Two words spelled the same and pronounced the same but with entirely different meanings. Any effort to conflate the two will just end in tragedy.
It doesn't make sense from a strict type and set theory point of view because it doesn't make sense from a strict type and set theory point of view. Neither C nor C++ are rigorous languages.
We also have "void foo(void)", where void takes on two entirely different meanings; if we read both as the empty type, type theory would suggest this is a function that diverges if it were called, except it could never be called in the first place (there's no value of the empty type to pass it).
"If it appears as though it returns a member of ∅, it must hang/crash, because there are no members of ∅."
If you want to go with math, think more group theory. I'm specifically thinking about how you can always create a monoid if you have an associative binary operation, because even if your associative binary operation doesn't have an identity element, you can just declare one. Similarly, if you have "functions" that "return nothing", you just declare that nothing right into existence. Then you can just think of the C language layer basically erasing away any attempt to examine that value returned behind the scenes, because as you say, why?
It's not a function. A C function "returning" void is just C's syntax for writing a procedure. C doesn't call it a procedure, but that's what it is semantically.
The distinction between "function" and "procedure" doesn't map very well to whether the return type is void in C's syntax:
- On one hand, a lot of "functions" are actually procedures that just happen to return a value: think for example `write(2)`, which is clearly used for its side effect, not to compute how many bytes could be written -- even though that's what it returns.
- On the other hand, you can have a "procedure" (i.e., a function "returning" void) that actually has no side effects other than storing a computed value in a specified location (e.g. void square(int x, int *ret) { *ret = x*x; }). That's clearly a function in the mathematical sense, even though it "returns void".
The difference between a function and a procedure is strictly whether it can be used as an expression in the syntax of the language, not the semantics. A function is a subroutine that can be used as either an expression or a statement; a procedure is a subroutine that can only be used as a statement. That is, if x = foo(); is valid syntax, then foo() is by definition a function. If x = foo(); is invalid syntax, then foo() is a procedure.
> (For what it’s worth, if you use a prover-adjacent language such as Agda or Idris, this is exactly how things are going to work there.)
You don't even have to go that far, Rust supports this concept. The built-in empty type is called `!`, and cannot be constructed. It's partially unstable, and there's a bunch of things you can't do yet, but you can use it as a return type.
Because you can't use void in templates uniformly: you can't have "T x;" when T = void, you can't do "return T{};", you can't form types like "(*R)(int, T)", etc.
Having access to the unit type is useful; I use it maybe once every couple of months in F# even if we restrict solely to the use-case "instantiate a generic with the unit type to indicate that no data is held here". (Of course, since F# also doesn't have a `void` type - a truly non-constructible data type is indeed very rarely useful! - F# uses `unit` in many places where C++ and C# use `void`.)
The size of an optional<T> is at least `sizeof(T) + 1`, because it needs a boolean for the set/unset flag (often more, once alignment padding is accounted for).
The size of a variant is at least the same, because it needs to store a discriminator integer (apparently both gcc and clang optimise it down to a byte when there are fewer than 256 alternatives).
Or do you mean adding a monostate to an existing variant rather than wrap that variant into an `optional`? In which case you should fix your example (and provide a version wrapped in an optional) because they’re very confusing.
I don't understand what's confusing with my example aha.
Current situation: you have

    std::variant<int, whatever> x;

you now want to discriminate on whether x has been initialized explicitly or not, the two cases I posted:

    // case 1
    std::variant<int, whatever> x;
    // case 2
    std::variant<int, whatever> x = 0;

you have two options.

Option A: does not needlessly increase sizeof, does not add an indirection penalty upon access:

    std::variant<std::monostate, int, whatever> x;

Option B: needlessly increases sizeof, adds an indirection penalty upon access (mostly relevant when compiling in debug mode without inlining, if you still want to have a semblance of performance):

    std::optional<std::variant<int, whatever>> x;
Pardon my naiveté, but if you wanted a value that's either a thing or not-a-thing, why wouldn't you express that with a std::optional? What advantage does std::monostate have over option types?
Brevity, I guess? I suppose the most brief way to express absence or presence of value in C++ is a pointer, but I could be tried at the Hague for that take.
None of this is to be facetious, I'm really trying to learn here. I'm not a C++ guy by trade. C and Rust are my day-to-day languages.
You're not wrong, if you just want "a T or nothing", optional is the way to go. But what if you want "a T, a U, a V, or nothing"? Then you do
std::variant<std::monostate, T, U, V>
Or
std::optional<std::variant<T, U, V>>
But then the "none" state is a bit more "special", it's not just one of the options. The sizeof of the type will also probably be a bit bigger, because it has to contain both the bool for the optional, as well as the tag for the variant.
The other obvious use is what the article states, that it allows the variant to be default constructed even if none of its members are. Though you can do that with optional as well. It's mostly a matter of style. I mostly avoid std::monostate because the name is so confusing, I really agree with the other users that something like std::unit or std::none would be better.
> it allows the variant to be default constructed even if none of its members are
Ah, okay! So it's serving the same purpose as the "None" in this Rust snippet:
enum Foo {
None, // <-- impl core::Default and return this
T(T),
U(U),
V(V),
}
That makes sense, though it really shows how the legacy of C++'s type system limits stdlib design in the present day. It'd be awfully nice to be able to just write std::variant<void, T, U, V>.
Exactly. The difference here is that std::variant doesn’t “name” the options, you just get the type (and an index, so variant can contain duplicate types). So you have to have some marker type that’s like “this contains nothing”.
Fundamentally, Rust’s enums are a much better way of doing this thing compared to std::variant, but C++ did the best it could without changing the core language. Which arguably they should have done.
That is the opposite of sense. monostate is a much more general concept; if you want to argue about the need for a new type, the answer is to remove optional, because it's just a `variant<T, monostate>`. It doesn't work the other way around.
That's putting the cart before the horse. Languages are designed for developers to use, not compilers to optimize. The intention and use of std::optional is clear.
Edit: My point, if not clear, is that the compiler should add the extra bit of code for when an optional is empty automatically, rather than requiring that an optional be defined (it's optional!) unless it is explicitly typed as monostate.
That makes even less sense: what "extra bit of code" are you talking about? monostate is not designed to be used with optional. Although there's the odd case where it's useful, an optional<monostate> is bijective to a boolean, and because C++ does not have zero-sized types it's less efficient (it takes two bytes).
It has other uses for end-user developers. For example, when you need a class member to be conditionally elided based on template parameters. You can swap the normal type with non-zero size with std::monostate such that it has zero size.