I'm immediately turned off by the entirely gratuitous use of unsafe here. I clicked through to the linked Reddit discussion and I'm not convinced the author has really thought it through. Writing, "pedagogically, that's irrelevant to the post, so I didn't highlight it" misunderstands the pedagogy here.
However, there's some really neat stuff later, so it's wise to grit your teeth and get past it.
In case anyone else had the same reaction, here's an entirely-safe version of the first snippet that you can pretend replaced the original:
// Each state is a unique type
struct Receiving;
struct Sending;
// The state machine is parameterized by the state
struct Channel<State> {
chan: ...,
_state: PhantomData<State>
}
// Methods for the state are uniquely associated with only the state
impl Channel<Receiving> {
// recv consumes ownership, ensuring old state is invalidated
fn recv(mut self) -> (Channel<Sending>, String) {
let Channel { chan, .. } = self;
let msg = chan.recv();
// The state type changes after executing a transition
let next = Channel { chan, _state: PhantomData };
(next, msg)
}
}
impl Channel<Sending> {
fn send(mut self, msg: String) -> Channel<Receiving> {
let Channel { chan, .. } = self;
chan.send(msg);
Channel { chan, _state: PhantomData }
}
}
#[test]
fn channel_test() {
let c: Channel<Sending> = Channel::new();
let c: Channel<Receiving> = c.send("hi");
let (c, msg) = c.recv();
// and so on
}
Further to this: transmute is really pretty dangerous, and you’ve got to be very careful if you use it, or you’ll invoke undefined behaviour. And I believe the transmutes in the original article do invoke undefined behaviour, since the layout of the struct is undefined (you’d need to use something like `#[repr(C)]` on the struct to make it be defined). In practice, these particular examples are at least 99.9999% sure to do the right thing in all versions of Rust ever, but you just shouldn’t ever do this—you’re playing with delayed-action napalm. With his background, I feel that Will should have known better than to use unsafe here in this way: because it’s wrong.
Admittedly the SecureVec example is more understandable: because the generic is passed through to Item as well, you’ll need to convert the Vec<Item<T, ItemLevel>> to a Vec<Item<T, MaxLevel<ItemLevel, VecLevel>>>, which… well, it can be done readily enough without unsafe code, by consuming the Vec, mapping each item and collecting to a new vec, but short of a kind of specialisation magic that I don’t think will kick in in this case, that’s going to entail allocating a new vector, so that I would be inclined to use unsafe code here for efficiency; but care should still be taken that it doesn’t invoke undefined behaviour.
But one more word in defence of the transmutes: it makes it obvious that it’s either zero-cost or just a copy, whereas more involved things can become subject to compiler optimisations, so that you don’t know that the whole supposedly-noop transformation will be being optimised out of existence. That confidence and clearly-stated intent is valuable.
Has there been any work in Rust on something like Haskell’s ‘Coercible’? Namely, it defines the precise criteria for safe zero-cost coercions between types that provably have the same representation, and derives the coercions automatically. It guarantees that ‘coerce’ is identical to ‘fmap coerce’, which saves the cost of that exact traverse+reconstruct pattern to change out a phantom type parameter.
There are some deficiencies, most prominently that the “role” system is aggressively monomorphic, so you can’t always prove that two things are coercible—the compiler must conservatively assume that type parameters with unknown roles cannot be coerced—but overall it’s much better than what we had before, and I expect it’ll continue to improve.
I’m thinking about a similar sort of coercion system for a language project, and it’s good to have a slightly broader view of the design space. This area hasn’t been very well explored yet, not just in terms of semantics, but also developer ergonomics: most of the time you want to hide representations for convenience, but when considering safe coercions, you really want to make it possible to talk about them explicitly & precisely.
This feature is also very subtle, and right at the very edge of the type system, so it’s extremely easy to break soundness here, usually by failing to consider things like variance. The reason “roles” were added to GHC in the first place isn’t just because of the issue with type families / associated types mentioned in that RFC; it’s also because anything that provides a safe API atop unsafe internals could also be used to violate soundness, especially in conjunction with mutation. The classic examples are: 1. coercing “container of A” to “container of B” using the coercion from A to B, then inserting a B that is not a valid A, either in its value or just because A and B have incompatible trait instances (like ordering or hashing). You need to be able to disallow conversions between things even if they have identical representations.
> In practice, these particular examples are at least 99.9999% sure to do the right thing in all versions of Rust ever
I’m pretty sure that declaring both `Item` and `SecureVec` as `#[repr(transparent)]` would cover that last 0.0001% chance: it protects against the extra fields becoming non-zero-size by accident and forces the binary representation to be identical to the first field.
At that point, he’s transmuting between things with the same representation; all that’s left to verify is semantics.
I knew there was another #[repr(…)] value, just couldn’t remember its name and didn’t look it up. Thanks for pointing it out. Yep, #[repr(transparent)] is a correct way of fixing the undefined behaviour—and if it won’t compile, then #[repr(C)] will do.
Hi, author of the code here. Why do I use mem::transmute? To highlight the fact that when using typestate, it's useful to understand operations that change the the value of an object as distinct from those that change the type of an object. It's also a matter of efficiency -- you don't have to generate a new struct every time you change state.
In response to a challenge about the use of unsafe in the very first example of a blog post that you acknowledge doesn't need it, you respond with a defense of the code in some library. The differences between those contexts is massive; this seems central to the misunderstanding here.
By applying a pattern that should rightly raise alarm bells for any seasoned Roast coder, you've distracted from your goal. By linking to a Reddit discussion rather than editing your post to clarify it, you've done your readers a disservice by drawing them into this muck.
Thank you for going to the trouble of posting the code. I kind of agree with your point here, especially because the safe version is hardly more arduous.
Im not disagreeing with you but fyi will crichton definitely knows what he is talking about. He teaches Programming Language courses at Stanford and his brother is the all-time top contributor to Rust. So, pedagogy + rust is sort of his thing.
In general, this series of posts has been more about types than about rust so I don't blame him for ignoring unsafe. It isnt relevant (its also not a best practice!)
Well, teachers are a subset of humanity and thus in principle they could all be above the overall average for humanity. It's even intended that there be a selection bias, so hopefully most of them are at least above average at teaching if nothing else. Of course half of them are worse than the average teacher, but that's not very useful information.
The README uses some of the same motivating examples as the blog post.
Type-level programming absolutely has its place. Rust's
`typenum` and `generic_array` crates enabled projects like `nalgebra` long before const generics. I recently used type-level programming to prototype an extension to Rust's type system — that's now an RFC (#2981)!
Especially so with a language like Rust, IMO. Its type system is powerful enough that you kind of can do it, but it gets awkward pretty quickly. In some ways that's a dangerous combo!
See also this comment where I showed a way to doing such a thing using constants for the type parameters instead of types, using the currently-unstable "const generics" feature.
One minor note is that I don't think PhantomData is required here.
_state: PhantomData<State>
Since State is zero-sized it is free to store it in the struct.
state: State
IIUC PhantomData is only required when you need to indicate that you are logically storing something of that type but don't want to actually store it. Since storing is free in this case you may as well just store it.
However, there's some really neat stuff later, so it's wise to grit your teeth and get past it.
In case anyone else had the same reaction, here's an entirely-safe version of the first snippet that you can pretend replaced the original: