From page 19: > But all those other languages include explicit support for null ...

dllthomas · on Sept 9, 2015

A null pointer is a perfectly sensible implementation of an optional type.

jblandy · on Sept 9, 2015

And indeed, it's the representation Rust uses. When you have the type Option<T>, if Rust knows that a value of type T can never be zero (as is the case with pointer types), then the compiler uses zero as the machine-level representation for None, which indicates the absence of a value. I get into that a few paragraphs later.

What's important is that you can't use a value of type Option<T> as if it were T; you have to check it first. This is helpful for non-pointer types as well; I include an example of that.
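Both points can be checked in current Rust. This is only a small sketch, and the helper name `describe` is made up for illustration:

```rust
use std::mem::size_of;

/// Hypothetical helper: an Option<&i32> must be checked before the
/// i32 behind it can be used.
fn describe(maybe: Option<&i32>) -> String {
    match maybe {
        Some(n) => format!("got {}", n),
        None => String::from("nothing"),
    }
}

fn main() {
    // References and boxes are never null, so None can reuse the null
    // pattern and the Option takes no extra space.
    assert_eq!(size_of::<Option<&i32>>(), size_of::<&i32>());
    assert_eq!(size_of::<Option<Box<i32>>>(), size_of::<Box<i32>>());

    let x = 7;
    println!("{}", describe(Some(&x)));
    println!("{}", describe(None));
}
```

Passing an `Option<&i32>` where an `&i32` is expected is a compile-time type error, which is the "you have to check it first" part.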

Veedrac · on Sept 9, 2015

It's different from a traditional Option type, though, in that Option<Option<T>> is inexpressible, and that the distinction between map and flat_map is reduced.

jamii · on Sept 9, 2015

In case anyone thinks that's an academic point, this is exactly the reason why so many dynamic languages can't tell the difference between 'key is not in hashtable' and 'key is in hashtable with value null'. I constantly get bitten by this and similar confusions when writing JavaScript or Clojure.
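For contrast, here is roughly how the two cases stay separate when options nest. It is a sketch using Rust's standard `HashMap`; the helper names `lookup` and `sample_table` are invented for the example:

```rust
use std::collections::HashMap;

/// Hypothetical helper that spells out the three possible lookup results.
fn lookup(table: &HashMap<&str, Option<i32>>, key: &str) -> &'static str {
    // `get` returns Option<&Option<i32>>: the outer layer is about the key,
    // the inner layer is about the stored value.
    match table.get(key) {
        None => "key is not in the table",
        Some(None) => "key is in the table with no value",
        Some(Some(_)) => "key is in the table with a value",
    }
}

/// Hypothetical sample data: "a" has a value, "b" is present but empty.
fn sample_table() -> HashMap<&'static str, Option<i32>> {
    let mut table = HashMap::new();
    table.insert("a", Some(1));
    table.insert("b", None);
    table
}

fn main() {
    let table = sample_table();
    for key in ["a", "b", "c"] {
        println!("{}: {}", key, lookup(&table, key));
    }
}
```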

dllthomas · on Sept 9, 2015

Removing a layer of indirection when you had a reference type anyway is an optimization that runs into the issue that you describe. The obvious solution is to not apply it when null is already a valid inhabitant of that reference type.

Without collapsing that indirection, Option<Option<T>> is perfectly expressible:

    Just (Just foo): <ptr> -> <ptr> -> foo
    Just Nothing: <ptr> -> <0>
    Nothing: <0>

It's true that some languages expose "nullable references" as a distinct type and not a detail of representation, and it's true (... and sometimes obnoxious) that this doesn't layer as nicely (in particular, it's not a functor).
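To make the layering point concrete, here is a small Rust sketch of what a real option type keeps and a flat nullable reference loses. `and_then` is Rust's name for flat_map on `Option`; the helper `parse_nat` is made up for the example:

```rust
/// Hypothetical helper: parse a string as a non-negative number.
fn parse_nat(s: &str) -> Option<u32> {
    s.parse().ok()
}

fn main() {
    let input: Option<&str> = Some("not a number");

    // map keeps both layers: the outer Some says there was input,
    // the inner None says it did not parse.
    let nested: Option<Option<u32>> = input.map(parse_nat);
    assert_eq!(nested, Some(None));

    // and_then collapses the two layers into one.
    let flat: Option<u32> = input.and_then(parse_nat);
    assert_eq!(flat, None);
}
```

With a single nullable reference, both results would be plain null, so the two operations could not be told apart.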

jblandy · on Sept 9, 2015

It may be worth pointing out that the `Option<T>` type in Rust doesn't inherently involve any pointers at all, assuming `T` is not a pointer type itself.

For example, `Option<i32>` is probably (the compiler gets to choose) going to be represented as two four-byte values: the discriminant, which distinguishes the `Some` and `None` cases, and then a space for the value `v`, for when the discriminant says we have `Some(v)`. Since zero is a perfectly fine value for an `i32`, we have to store the discriminant separately.

But note that this is just a flat eight-byte value. There's no heap allocation involved. It's just as if you'd written in C:

    struct O { enum { Some, None } discriminant; int32_t value; };

I compiled a program that uses `Option<Option<i32>>`, and looked at the DWARF debugging info to see what the compiler did with it. It seems to represent this as a twelve-byte value: four bytes for the discriminant for the outer `Option`, followed by an eight-byte `Option<i32>` value laid out as before. Since you can get the address of a value held by an enum, I guess this makes sense; the compiler can't combine the discriminants or do anything clever like that.
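A less laborious way to look at this than reading DWARF is to ask the compiler directly. The sketch below prints the sizes instead of assuming them, since the layout is the compiler's choice and a different toolchain may not report the twelve bytes described here. The helper name `sizes` is made up:

```rust
use std::mem::size_of;

/// Hypothetical helper: sizes of i32, Option<i32>, and Option<Option<i32>>.
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<i32>(),
        size_of::<Option<i32>>(),
        size_of::<Option<Option<i32>>>(),
    )
}

fn main() {
    let (plain, single, nested) = sizes();
    println!("i32:                 {} bytes", plain);
    println!("Option<i32>:         {} bytes", single);
    println!("Option<Option<i32>>: {} bytes", nested);

    // What can be relied on: every bit pattern of an i32 is a valid value,
    // so Option<i32> needs extra room for its discriminant.
    assert!(single > plain);
}
```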

SamReidHughes · on Sept 10, 2015

It could combine the discriminants, I think:

Option<i32> could have discriminant values 0 or 1, and Option<Option<i32>> could use discriminant value 2 for None, and 0 or 1 mean the object's an Option<i32>.
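Written out by hand, that encoding might look like the sketch below. The struct `Packed`, the functions `pack` and `unpack`, and the choice of which of 0 and 1 means "inner None" are all assumptions made for illustration; this is not a description of what the compiler does:

```rust
/// Hand-rolled layout for Option<Option<i32>> with a single tag word.
/// Tags 0 and 1 describe the inner Option<i32>; tag 2 is the outer None.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Packed {
    tag: u32,
    value: i32,
}

const INNER_NONE: u32 = 0; // Some(None)
const INNER_SOME: u32 = 1; // Some(Some(value))
const OUTER_NONE: u32 = 2; // None

fn pack(x: Option<Option<i32>>) -> Packed {
    match x {
        None => Packed { tag: OUTER_NONE, value: 0 },
        Some(None) => Packed { tag: INNER_NONE, value: 0 },
        Some(Some(v)) => Packed { tag: INNER_SOME, value: v },
    }
}

fn unpack(p: Packed) -> Option<Option<i32>> {
    match p.tag {
        OUTER_NONE => None,
        INNER_NONE => Some(None),
        _ => Some(Some(p.value)),
    }
}

fn main() {
    // Every case survives a round trip through the single-tag layout.
    for x in [None, Some(None), Some(Some(0)), Some(Some(-5))] {
        assert_eq!(unpack(pack(x)), x);
    }
}
```

Whenever the tag is 0 or 1, the tag and value words read exactly like an `Option<i32>` under the same tag convention.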

dllthomas · on Sept 10, 2015

Yeah, while the ability to provide a pointer seems meaningful, it might be illusory. If our value is None, then a pointer to the inside Option<i32> is... what?

Edited to add: With your proposed encoding, the memory contents of an Option<Option<i32>> is identical to an Option<i32> precisely when there is an Option<i32> to speak about. And when there is no Option<i32>, you can tell that with one comparison, rather than by backing out an unknown number of levels. I like it.

jblandy · on Sept 10, 2015

Yeah, that's a nice trick. I wonder how general it is, though.

SamReidHughes · on Sept 10, 2015

If one alternative is larger than the others and it has an enum tag or pointer inside of it, such that you can squeeze the other alternatives before/after that tag word, you're good to go. If you had multiple alternatives of maximum size, they'd need some word in them, at the same offset/size, such that they don't have conflicting representations. An enum tag could overlap with a non-null pointer, and two enum tags with user-declared tag values (such that they do not overlap) could work too. Or if you're crazy you could discard having each type's representation be computed solely as a function of what the type is made of, choosing their representations based on how well they pack into other types.

(Or worse yet, if you've got a borrow checker and your types are memcpyable, you could pack things and rejigger bytes however you wanted and then unfurl them into a temporary whenever something uses the interior value, unless I'm overlooking something. So if you had Either<Either<u32, u32>, Either<u32, u32>>, you could represent that with a single tag whose value is 0, 1, 2, or 3. When the tag's 2 or 3 and you make a reference to the interior value, either fix the tag in-place and fix it back when you're done (if you're the sole owner of the Either<Either...>), or copy the value out and let the borrower borrow that (and copy it back in when it's done, if it was a mutable borrow).)

AnimalMuppet · on Sept 9, 2015

> It's different from a traditional Option type, though, in that Option<Option<T>> is inexpressible.

I don't think so. It's just a pointer to a pointer, rather than a single pointer. That is,

    int **value;

and either

    value == null

or

    *value == null

or

    **value == some value of interest

tomp · on Sept 9, 2015

Hm... not really.

Say you have a function, `f : (bool, T) -> Option<T>`.

    function f<T>(cond : bool, value : T) : Option<T> {
      if cond {
        return value                 // equivalent: Some(value)
      } else {
        return null                  // equivalent: None
      }
    }

    let x = f(false, f(false, 1))    // equivalent: None
    let y = f(true, f(false, 1))     // equivalent: Some(None)

There is no way to tell apart `x` and `y`.

dllthomas · on Sept 9, 2015

You are confusing types and representations, and your function is not well typed.

tomp · on Sept 9, 2015

> You are confusing types and representations

Hm... not sure if I am. Or, at least, `null or T` is not a valid representation for type `Option<T>`. It's all very confusing.

> and your function is not well typed.

How so?

dllthomas · on Sept 9, 2015

The `value` you are returning has type T. T is not the same type as Option<T>.

With AnimalMuppet's approach, you would need to return a pointer to a T (so, `&value` in something Cish). Note that this is not the same thing as a nullable reference in, for instance, C#.

`null or T` is a valid representation only when T is a non-nullable reference. It doesn't work when T is optional, and it also doesn't work when T is a primitive type, a struct, &c.

That said, there are obvious reasons it might be wanted as an optimization where it's possible. For that specific case, `Just value` could be specialized to `value`, but that can be done by the compiler (akin to automatic unboxing, elsewhere).
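For reference, here is tomp's `f` with the wrapping made explicit so that it type-checks, sketched in Rust. Once `value` is wrapped in `Some`, `x` and `y` are different values:

```rust
/// tomp's function, with `value` wrapped so the return type is Option<T>.
fn f<T>(cond: bool, value: T) -> Option<T> {
    if cond {
        Some(value)
    } else {
        None
    }
}

fn main() {
    let x: Option<Option<i32>> = f(false, f(false, 1)); // None
    let y: Option<Option<i32>> = f(true, f(false, 1)); // Some(None)

    assert_eq!(x, None);
    assert_eq!(y, Some(None));
    assert_ne!(x, y);
}
```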

sanderjd · on Sept 9, 2015

Yeah, I'm curious about this too. It seems like (continuing to use Rust parlance) your three examples are (respectively) `None`, `Some(None)`, and `Some(Some(some value of interest))`. There is no `None(None)` case, so there's no problem.

Veedrac · on Sept 9, 2015

This is fair. I was thinking of this in the context of GC'd languages, where one instead has nullable types (or sentinel null values). Eg. Java, C#, Python, Ruby, Scala.

The C++ case where a distinction between `foo = null` and `*foo = null` exists is indeed much closer to an option type. You're right to point it out.

steveklabnik · on Sept 9, 2015

And is in fact how Rust represents Option<T>. There's also a struct, http://doc.rust-lang.org/stable/core/nonzero/struct.NonZero.... , which can be used to ensure a structure you build also has said representation.
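As a rough illustration, the sketch below uses `std::num::NonZeroU32` from today's standard library as a stand-in for the type linked above (that substitution is an assumption on my part). The `Handle` type and `make_handle` function are invented for the example:

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Hypothetical handle type: an ID that is never zero.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Handle(NonZeroU32);

/// Hypothetical constructor: zero is rejected, anything else is wrapped.
fn make_handle(raw: u32) -> Option<Handle> {
    // NonZeroU32::new returns None when given zero.
    NonZeroU32::new(raw).map(Handle)
}

fn main() {
    // Zero is free to stand for None, so the Option adds no space.
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<u32>());

    // The wrapper struct's layout is up to the compiler, so just print it.
    println!("Option<Handle>: {} bytes", size_of::<Option<Handle>>());

    assert!(make_handle(0).is_none());
    assert_eq!(make_handle(5).map(|h| h.0.get()), Some(5));
}
```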