> But all those other languages include explicit support for null pointers for a good reason: they’re extremely useful. [....] The problem with null pointers is that it’s easy to forget to check for them.
There is no inherent reason for that; just that mainstream languages which support `null` haven't been checking `null` usage. Some of the recent ones do. For example, Kotlin has non-nullable types by default, and null types have to be explicitly marked and checked for null-ness.
Even for Java, null analysis is built into Eclipse with the help of annotations. Though, ofcourse, it would be far nicer to have it baked into the language.
And indeed, it's the representation Rust uses. When you have the type Option<T>, if Rust knows that a value of type T can never be zero (as is the case with pointer types), then the compiler uses zero as the machine-level representation for None, which indicates the absence of a value. I get into that a few paragraphs later.
What's important is that you can't use a value of type Option<T> as if it were T; you have to check it first. This is helpful for non-pointer types as well; I include an example of that.
It's different from a traditional Option type, though, in that Option<Option<T>> is inexpressible, and that the distinction between map and flat_map is reduced.
In case anyone thinks that's an academic point, this is exactly the reason who so many dynamic languages can't tell the difference between 'key is not in hashtable' and 'key is in hashtable with value null'. I constantly get bitten by this and similar confusions when writing javascript or clojure.
Removing a layer of indirection when you had a reference type anyway is an optimization that runs into the issue that you describe. The obvious solution is to not apply it when null is already a valid inhabitant of that reference type.
Without collapsing that indirection, Option<Option<T>> is perfectly expressible:
Just (Just foo): <ptr> -> <ptr> -> foo
Just Nothing: <ptr> -> <0>
Nothing: <0>
It's true that some languages expose "nullable references" as a distinct type and not a detail of representation, and it's true (... and sometimes obnoxious) that this doesn't layer as nicely (in particular, it's not a functor).
It may be worth pointing out that the `Option<T>` type in Rust doesn't inherently involve any pointers at all, assuming `T` is not a pointer type itself.
For example, `Option<i32>` is probably (the compiler gets to choose) going to be represented as two four-byte values: the discriminant, which distinguishes the `Some` and `None` cases, and then a space for the value `v`, for when the discriminant says we have `Some(v)`. Since zero is a perfectly fine value for an `i32`, we have to store the discriminant separately.
But note that this is just a flat eight-byte value. There's no heap allocation involved. It's just as if you'd written in C:
struct O { enum { Some, None } discriminant; int32_t value };
I compiled a program that uses `Option<Option<i32>>`, and looked at the DWARF debugging info to see what the compiler did with it. It seems to represent this as a twelve-byte value: four bytes for the discriminant for the outer `Option`, followed an eight-byte `Option<i32>` value laid out as before. Since you can get the address of a value held by an enum, I guess this makes sense; the compiler can't combine the discriminants or do anything clever like that.
Option<i32> could have discriminant values 0 or 1, and Option<Option<i32>> could use discriminant value 2 for None, and 0 or 1 mean the object's an Option<i32>.
Yeah, while the ability to provide a pointer seems meaningful, it might be illusory. If our value is None, then a pointer to the inside Option<i32> is... what?
Edited to add: With your proposed encoding, the memory contents of an Option<Option<i32>> is identical to an Option<i32> precisely when there is an Option<i32> to speak about. And when there is no Option<i32>, you can tell that with one comparison, rather than by backing out an unknown number of levels. I like it.
If one alternative is larger than the others and it has an enum tag or pointer inside of it, such that you can squeeze the other alternatives before/after that tag word, you're good to go. If you had multiple alternatives of maximum size, they'd need some word in them, at the same offset/size, such that they don't have conflicting representations. An enum tag could overlap with a non-null pointer, and two enum tags with user-declared tag values (such that they do not overlap) could work too. Or if you're crazy you could discard having each type's representation be computed solely as a function of what the type is made of, choosing their representations based on how well they pack into other types. (Or worse yet, if you've got a borrow checker and your types are memcpyable, you could pack things and rejigger bytes however you wanted and then unfurl them into a temporary whenever something uses the interior value, unless I'm overlooking something. So if you had Either<Either<u32, u32>, Either<u32, u32>>, you could represent that with a single tag whose value is 0, 1, 2, or 3, and when if the tag's 2 or 3 and you make a reference to the interior value, either fix the tag in-place and fix it back when you're done (if you're the sole owner of the Either<Either...>), or copy the value out and let the borrower borrow that (and copy it back in when it's done, if it was a mutable borrow).)
The `value` you are returning has type T. T is not the same type as Optional<T>.
With AnimalMuppet's approach, you would need to return a pointer to a T (so, `&value` in something Cish). Note that this is not the same thing as a nullable reference in, for instance, C#.
`null or T` is a valid representation only when T is a non-nullable reference. It doesn't work when T is optional, and it also doesn't work when T is a primitive type, a struct, &c.
That said, there's obvious reasons it might be wanted an optimization where it's possible. For that specific case, `Just value` could be specialized to `value`, but that can be done by the compiler (akin to automatic unboxing, elsewhere).
Yeah, I'm curious about this too. It seems like (continuing to use Rust parlance) your three examples are (respectively) `None`, `Some(None)`, and `Some(Some(some value of interest))`. There is no `None(None)` case, so there's no problem.
This is fair. I was thinking of this in the context of GC'd languages, where one instead has nullable types (or sentinel null values). Eg. Java, C#, Python, Ruby, Scala.
The C++ case where a distinction between `foo = null` and `*foo = null` exists is indeed much closer to a an option type. You're right to point it out.
> But all those other languages include explicit support for null pointers for a good reason: they’re extremely useful. [....] The problem with null pointers is that it’s easy to forget to check for them.
There is no inherent reason for that; just that mainstream languages which support `null` haven't been checking `null` usage. Some of the recent ones do. For example, Kotlin has non-nullable types by default, and null types have to be explicitly marked and checked for null-ness.
Even for Java, null analysis is built into Eclipse with the help of annotations. Though, ofcourse, it would be far nicer to have it baked into the language.