Type-Level Programming in Rust

couchand · on Oct 5, 2020

I'm immediately turned off by the entirely gratuitous use of unsafe here. I clicked through to the linked Reddit discussion and I'm not convinced the author has really thought it through. Writing, "pedagogically, that's irrelevant to the post, so I didn't highlight it" misunderstands the pedagogy here.

However, there's some really neat stuff later, so it's wise to grit your teeth and get past it.

In case anyone else had the same reaction, here's an entirely-safe version of the first snippet that you can pretend replaced the original:

    // Each state is a unique type
    struct Receiving;
    struct Sending;

    // The state machine is parameterized by the state
    struct Channel<State> {
      chan: ...,
      _state: PhantomData<State>
    }

    // Methods for the state are uniquely associated with only the state
    impl Channel<Receiving> {
      // recv consumes ownership, ensuring old state is invalidated
      fn recv(mut self) -> (Channel<Sending>, String) {
        let Channel { chan, .. } = self;
        let msg = chan.recv();
        // The state type changes after executing a transition
        let next = Channel { chan, _state: PhantomData };
        (next, msg)
      }
    }

    impl Channel<Sending> {
      fn send(mut self, msg: String) -> Channel<Receiving> {
        let Channel { chan, .. } = self;
        chan.send(msg);
        Channel { chan, _state: PhantomData }
      }
    }

    #[test]
    fn channel_test() {
      let c: Channel<Sending> = Channel::new();
      let c: Channel<Receiving> = c.send("hi");
      let (c, msg) = c.recv();
      // and so on
    }

chrismorgan · on Oct 5, 2020

Further to this: transmute is really pretty dangerous, and you’ve got to be very careful if you use it, or you’ll invoke undefined behaviour. And I believe the transmutes in the original article do invoke undefined behaviour, since the layout of the struct is undefined (you’d need to use something like `#[repr(C)]` on the struct to make it be defined). In practice, these particular examples are at least 99.9999% sure to do the right thing in all versions of Rust ever, but you just shouldn’t ever do this—you’re playing with delayed-action napalm. With his background, I feel that Will should have known better than to use unsafe here in this way: because it’s wrong.

Admittedly the SecureVec example is more understandable: because the generic is passed through to Item as well, you’ll need to convert the Vec<Item<T, ItemLevel>> to a Vec<Item<T, MaxLevel<ItemLevel, VecLevel>>>, which… well, it can be done readily enough without unsafe code, by consuming the Vec, mapping each item and collecting to a new vec, but short of a kind of specialisation magic that I don’t think will kick in in this case, that’s going to entail allocating a new vector, so that I would be inclined to use unsafe code here for efficiency; but care should still be taken that it doesn’t invoke undefined behaviour.

But one more word in defence of the transmutes: it makes it obvious that it’s either zero-cost or just a copy, whereas more involved things can become subject to compiler optimisations, so that you don’t know that the whole supposedly-noop transformation will be being optimised out of existence. That confidence and clearly-stated intent is valuable.

evincarofautumn · on Oct 5, 2020

Has there been any work in Rust on something like Haskell’s ‘Coercible’? Namely, it defines the precise criteria for safe zero-cost coercions between types that provably have the same representation, and derives the coercions automatically. It guarantees that ‘coerce’ is identical to ‘fmap coerce’, which saves the cost of that exact traverse+reconstruct pattern to change out a phantom type parameter.

There are some deficiencies, most prominently that the “role” system is aggressively monomorphic, so you can’t always prove that two things are coercible—the compiler must conservatively assume that type parameters with unknown roles cannot be coerced—but overall it’s much better than what we had before, and I expect it’ll continue to improve.

steveklabnik · on Oct 5, 2020

We have a project group working on "safe transmute", here's their first RFC https://github.com/rust-lang/rfcs/pull/2981

It cites the Haskell work: https://github.com/jswrenn/rfcs/blob/safer-transmute/text/00...

evincarofautumn · on Oct 5, 2020

Cool, thanks!

I’m thinking about a similar sort of coercion system for a language project, and it’s good to have a slightly broader view of the design space. This area hasn’t been very well explored yet, not just in terms of semantics, but also developer ergonomics: most of the time you want to hide representations for convenience, but when considering safe coercions, you really want to make it possible to talk about them explicitly & precisely.

This feature is also very subtle, and right at the very edge of the type system, so it’s extremely easy to break soundness here, usually by failing to consider things like variance. The reason “roles” were added to GHC in the first place isn’t just because of the issue with type families / associated types mentioned in that RFC; it’s also because anything that provides a safe API atop unsafe internals could also be used to violate soundness, especially in conjunction with mutation. The classic examples are: 1. coercing “container of A” to “container of B” using the coercion from A to B, then inserting a B that is not a valid A, either in its value or just because A and B have incompatible trait instances (like ordering or hashing). You need to be able to disallow conversions between things even if they have identical representations.

kd5bjo · on Oct 5, 2020

> In practice, these particular examples are at least 99.9999% sure to do the right thing in all versions of Rust ever

I’m pretty sure that declaring both `Item` and `SecureVec` as `#[repr(transparent)]` would cover that last 0.0001% chance: it protects against the extra fields becoming non-zero-size by accident and forces the binary representation to be identical to the first field.

At that point, he’s transmuting between things with the same representation; all that’s left to verify is semantics.

chrismorgan · on Oct 5, 2020

I knew there was another #[repr(…)] value, just couldn’t remember its name and didn’t look it up. Thanks for pointing it out. Yep, #[repr(transparent)] is a correct way of fixing the undefined behaviour—and if it won’t compile, then #[repr(C)] will do.

wcrichton · on Oct 5, 2020

I didn't know about #[repr(transparent)], I will add that to the blog post. Thank you!

wcrichton · on Oct 5, 2020

Hi, author of the code here. Why do I use mem::transmute? To highlight the fact that when using typestate, it's useful to understand operations that change the the value of an object as distinct from those that change the type of an object. It's also a matter of efficiency -- you don't have to generate a new struct every time you change state.

You can avoid the overhead without transmute, but it's still unsafe and not easily explained: https://github.com/Munksgaard/session-types/commit/3a310d205...

kevincox · on Oct 5, 2020

Generating new structs is free in Rust. There is only cost if you need to send it somewhere and the layout differs from the original.

Since `transmute` will be undefined behaviour if the layout differs you will only be saving cost if you would be invoking undefined behaviour.

I recommend checking the generated asm and reporting a missed-optimization bug if it differs.

couchand · on Oct 5, 2020

In response to a challenge about the use of unsafe in the very first example of a blog post that you acknowledge doesn't need it, you respond with a defense of the code in some library. The differences between those contexts is massive; this seems central to the misunderstanding here.

By applying a pattern that should rightly raise alarm bells for any seasoned Roast coder, you've distracted from your goal. By linking to a Reddit discussion rather than editing your post to clarify it, you've done your readers a disservice by drawing them into this muck.

lilyball · on Oct 5, 2020

Why can’t that just be written

  fn cast<E2, P2>(self) -> Chan<E2, P2> {
    Chan(self.0, self.1, PhantomData)
  }

guitarbill · on Oct 5, 2020

Thank you for going to the trouble of posting the code. I kind of agree with your point here, especially because the safe version is hardly more arduous.

jzoch · on Oct 5, 2020

Im not disagreeing with you but fyi will crichton definitely knows what he is talking about. He teaches Programming Language courses at Stanford and his brother is the all-time top contributor to Rust. So, pedagogy + rust is sort of his thing.

In general, this series of posts has been more about types than about rust so I don't blame him for ignoring unsafe. It isnt relevant (its also not a best practice!)

couchand · on Oct 5, 2020

Teachers are just like everyone else: they're fallible and half of them are below average.

db48x · on Oct 6, 2020

Well, teachers are a subset of humanity and thus in principle they could all be above the overall average for humanity. It's even intended that there be a selection bias, so hopefully most of them are at least above average at teaching if nothing else. Of course half of them are worse than the average teacher, but that's not very useful information.

monadic2 · on Oct 5, 2020

> his brother is the all-time top contributor to Rust.

Say no more! Any brother of such renowned talent is sure to know what they're talking about.

jswrenn · on Oct 5, 2020

A few months later, Will released Tyrade, an impressive macro-based DSL for type-level programming in Rust: https://github.com/willcrichton/tyrade

The README uses some of the same motivating examples as the blog post.

Type-level programming absolutely has its place. Rust's `typenum` and `generic_array` crates enabled projects like `nalgebra` long before const generics. I recently used type-level programming to prototype an extension to Rust's type system — that's now an RFC (#2981)!

mateus_amin · on Oct 5, 2020

That is really cool. Would "Use types as data primitives" be a good way to describe Tyrade?

dathinab · on Oct 5, 2020

Just please don't overdo type-level programming, especially not if it involves trait bounds and wildcard impl.

Finding the right balance between using type-level programming to ensure invariants and not using it is a important skill.

ragnese · on Oct 6, 2020

Especially so with a language like Rust, IMO. Its type system is powerful enough that you kind of can do it, but it gets awkward pretty quickly. In some ways that's a dangerous combo!

Arnavion · on Oct 5, 2020

See also this comment where I showed a way to doing such a thing using constants for the type parameters instead of types, using the currently-unstable "const generics" feature.

https://news.ycombinator.com/item?id=22747405

kevincox · on Oct 5, 2020

One minor note is that I don't think PhantomData is required here.

  _state: PhantomData<State>

Since State is zero-sized it is free to store it in the struct.

  state: State

IIUC PhantomData is only required when you need to indicate that you are logically storing something of that type but don't want to actually store it. Since storing is free in this case you may as well just store it.

lilyball · on Oct 5, 2020

This sounds reasonable, but personally I prefer the PhantomData as it makes it immediately obvious to the reader that this is a zero-sized type.

charlieflowers · on Oct 5, 2020

FYI if the author is around. All the code blocks render as one continuous line for me (no line breaks)

Tried it in both Firefox and Chrome.

zanellato19 · on Oct 5, 2020

It was the same for me and then rendered correctly after a refresh. Weird

seanmceligot · on Oct 5, 2020

The SecureVec is nice. I've been waiting to see this become more common. Reminds me of this talk at ICFP 2010 on compile time Authorization checks.

http://jamiemorgenstern.com/papers/ml10sectyp.pdf