What's in the Box? (fasterthanli.me)
169 points by milliams on April 19, 2021 | 74 comments



Preemptive comment: the following output towards the second half:

    help: function arguments must have a statically known size, borrowed types always have a known size
      |
    3 | fn f<T>(&t: T)
      |         ^
is a bug that already has a PR to fix. It's supposed to point at the T and suggest the following instead:

    help: function arguments must have a statically known size, borrowed types always have a known size
      |
    3 | fn f<T>(t: &T)
      |            ^


I've added a link to the PR directly in the article to clear up the confusion!


You changed the function signature between your two code snippets, that makes it hard to understand what you mean.


Check out the context from TFA: this is a help message prescribing a change you could make (inserting a &). The signature is different because it's prescribing a different signature to fix the issue.


This managed to capture a common occurrence when I've tried to learn rust. The compiler spits out a short but seemingly helpful error like this:

  warning: trait objects without an explicit `dyn` are deprecated
    help: use `dyn`: `dyn Error`
You make the code change it suggests and then you get a longer error message that says what it just told you to do is invalid.


The longer error message would have been emitted regardless of the warning:

    warning: trait objects without an explicit `dyn` are deprecated
     --> file.rs:3:13
      |
    3 | fn foo() -> Trait {
      |             ^^^^^ help: use `dyn`: `dyn Trait`
      |
      = note: `#[warn(bare_trait_objects)]` on by default
    
    error[E0746]: return type cannot have an unboxed trait object
     --> file.rs:3:13
      |
    3 | fn foo() -> Trait {
      |             ^^^^^ doesn't have a size known at compile-time
      |
    help: use some type `T` that is `T: Sized` as the return type if all return paths have the same type
      |
    3 | fn foo() -> T {
      |             ^
    help: use `impl Trait` as the return type if all return paths have the same type but you want to expose only the trait in the signature
      |
    3 | fn foo() -> impl Trait {
      |             ^^^^^^^^^^
    help: use a boxed trait object if all return paths implement trait `Trait`
      |
    3 | fn foo() -> Box<dyn Trait> {
      |             ^^^^^^^^^^^^^^
https://play.rust-lang.org/?version=nightly&mode=debug&editi...

Edit: and for some more realistic cases where the compiler can actually look at what you wrote, instead of just giving up because you used `todo!()`:

    warning: trait objects without an explicit `dyn` are deprecated
     --> file.rs:7:20
      |
    7 | fn foo(x: bool) -> Trait {
      |                    ^^^^^ help: use `dyn`: `dyn Trait`
      |
      = note: `#[warn(bare_trait_objects)]` on by default
    
    error[E0308]: `if` and `else` have incompatible types
      --> file.rs:11:9
       |
    8  | /     if x {
    9  | |         S
       | |         - expected because of this
    10 | |     } else {
    11 | |         D
       | |         ^ expected struct `S`, found struct `D`
    12 | |     }
       | |_____- `if` and `else` have incompatible types
    
    error[E0746]: return type cannot have an unboxed trait object
     --> file.rs:7:20
      |
    7 | fn foo(x: bool) -> Trait {
      |                    ^^^^^ doesn't have a size known at compile-time
      |
      = note: for information on trait objects, see <https://doc.rust-lang.org/book/ch17-02-trait-objects.html#using-trait-objects-that-allow-for-values-of-different-types>
      = note: if all the returned values were of the same type you could use `impl Trait` as the return type
      = note: for information on `impl Trait`, see <https://doc.rust-lang.org/book/ch10-02-traits.html#returning-types-that-implement-traits>
      = note: you can create a new `enum` with a variant for each returned type
    help: return a boxed trait object instead
      |
    7 | fn foo(x: bool) -> Box<dyn Trait> {
    8 |     if x {
    9 |         Box::new(S)
    10|     } else {
    11|         Box::new(D)
      |
and

    warning: trait objects without an explicit `dyn` are deprecated
     --> file.rs:7:20
      |
    7 | fn foo(x: bool) -> Trait {
      |                    ^^^^^ help: use `dyn`: `dyn Trait`
      |
      = note: `#[warn(bare_trait_objects)]` on by default
    
    error[E0746]: return type cannot have an unboxed trait object
     --> file.rs:7:20
      |
    7 | fn foo(x: bool) -> Trait {
      |                    ^^^^^ doesn't have a size known at compile-time
      |
      = note: for information on `impl Trait`, see <https://doc.rust-lang.org/book/ch10-02-traits.html#returning-types-that-implement-traits>
    help: use `impl Trait` as the return type, as all return paths are of type `S`, which implements `Trait`
      |
    7 | fn foo(x: bool) -> impl Trait {
      |                    ^^^^^^^^^^
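
For anyone skimming the diagnostics, here is a minimal sketch of the two suggested return types side by side. The trait `Trait` and the structs `S` and `D` only mirror the names in the compiler output; the `name` method and the function names are made up for illustration.

```rust
// Hypothetical trait and types, named after `Trait`, `S` and `D` in the diagnostics.
trait Trait {
    fn name(&self) -> &'static str;
}

struct S;
struct D;

impl Trait for S {
    fn name(&self) -> &'static str {
        "S"
    }
}

impl Trait for D {
    fn name(&self) -> &'static str {
        "D"
    }
}

// Every return path yields `S`, so `impl Trait` is enough and dispatch stays static.
fn always_s(_x: bool) -> impl Trait {
    S
}

// The branches yield different types, so each value is boxed behind a trait object.
fn s_or_d(x: bool) -> Box<dyn Trait> {
    if x {
        Box::new(S)
    } else {
        Box::new(D)
    }
}

fn main() {
    println!("{}", always_s(true).name());
    println!("{} {}", s_or_d(true).name(), s_or_d(false).name());
}
```

The `impl Trait` version only compiles while all return paths share one concrete type; as soon as one branch returns `D`, you are back to the boxed form or an enum.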


ah yeah I've gotten that

  error[E0308]: `if` and `else` have incompatible types
before.

I was trying to write a simple little program that would output data and optionally reversed (like sort -r). Nothing I could do with iterators would work because it kept trying to tell me that a reverse iterator was not compatible with an iterator. It would work if I hardcoded it, but any code like

  output = if (reverse) { result.rev() } else { result };
would fail to compile with a completely nonsense error message. I think I eventually got it to work by collecting it into a vector first and reversing that. Haven't really touched rust since. Honestly the compiler might as well just tell me to go fuck myself.


> would fail to compile with a completely nonsense error message.

How long ago was this? We spend a lot of time trying to make the error messages easy to follow and informative. If it wasn't that long ago, I would love to see the exact cases you had trouble with in order to improve them.

> I think I eventually got it to work by collecting it into a vector first and reversing that.

Collecting and reversing is indeed what I would do if I couldn't procure a reversible iterator (DoubleEndedIterator), but if you have one, you can write the code you wanted by spending some cost on a fat pointer by boxing the alternatives[1].
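
A minimal sketch of what boxing the alternatives could look like, assuming the data starts out as a `Vec<i32>` (the function name `ordered` is made up for this example):

```rust
// Both branches are coerced to the same type, `Box<dyn Iterator<Item = i32>>`,
// so the `if`/`else` type-checks even though `IntoIter` and `Rev<IntoIter>` differ.
fn ordered(values: Vec<i32>, reverse: bool) -> Box<dyn Iterator<Item = i32>> {
    let iter = values.into_iter();
    if reverse {
        Box::new(iter.rev())
    } else {
        Box::new(iter)
    }
}

fn main() {
    let forward: Vec<i32> = ordered(vec![1, 2, 3], false).collect();
    let backward: Vec<i32> = ordered(vec![1, 2, 3], true).collect();
    println!("{:?} {:?}", forward, backward);
}
```

The cost is one heap allocation for the box plus a vtable lookup per `next` call, which is usually negligible for something like printing output.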

> Haven't really touched rust since. Honestly the compiler might as well just tell me to go fuck myself.

I'm sad to hear that, both because you bounced off (which is understandable) and because that experience goes counter to what we aim for. We dedicate a lot of effort to making errors readable, pedagogic and actionable (with varying levels of success). We really don't want the compiler to come across as antagonistic or patronizing.

[1]: https://play.rust-lang.org/?version=stable&mode=debug&editio...

Edit: For what it's worth, type mismatch errors do try to give you appropriate suggestions, but in the case of two types that both implement the same trait (like the one you mention), the compiler does not look for traits that are implemented for both:

    error[E0308]: `if` and `else` have incompatible types
      --> file.rs:10:9
       |
    7  |       let x = if true {
       |  _____________-
    8  | |         A
       | |         - expected because of this
    9  | |     } else {
    10 | |         B
       | |         ^ expected struct `A`, found struct `B`
    11 | |     };
       | |_____- `if` and `else` have incompatible types
This could be done, but that has the potential to give you a lot of output for all possible traits that could satisfy this code, and in general we would be sending you in the wrong direction. When we can't be 90% sure that the suggestion is not harmful, we just "say nothing", like in this case. On a real case of the example above, you'd be more likely to want to create a new enum.
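
As a rough sketch of the enum route for the `A`/`B` example: the `describe` methods and the names `AOrB` and `pick` are hypothetical, added only so there is something to call on the result.

```rust
struct A;
struct B;

impl A {
    fn describe(&self) -> String {
        "this is A".to_string()
    }
}

impl B {
    fn describe(&self) -> String {
        "this is B".to_string()
    }
}

// One variant per possible type, so both arms of the `if` have the same type.
enum AOrB {
    First(A),
    Second(B),
}

impl AOrB {
    // Forward the call to whichever value is actually stored.
    fn describe(&self) -> String {
        match self {
            AOrB::First(a) => a.describe(),
            AOrB::Second(b) => b.describe(),
        }
    }
}

fn pick(flag: bool) -> AOrB {
    if flag {
        AOrB::First(A)
    } else {
        AOrB::Second(B)
    }
}

fn main() {
    println!("{}", pick(true).describe());
    println!("{}", pick(false).describe());
}
```

Compared with a boxed trait object, this needs no heap allocation and no vtable, at the price of a `match` for every forwarded method.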


Per the sibling conversations: instead of having the compiler tell users about traits that are implemented by both arm types, maybe it would be more productive to tell users how the issue arises from static dispatch considerations?

Maybe if there's an attempt to invoke a method on the result later, like in this case, the compiler could point to it in a "note" and say "Would not be able to determine statically which `impl` of this method to invoke", or something.

Users with experience in languages like Java and Python will have a reasonable expectation that code like this should work, because "they both implement iteration" [0]. It's definitely not obvious that dynamic dispatch is why that can work, and how Rust's static dispatching default impacts idioms like this.

It's singularly frustrating to try to express yourself in a language where familiar idioms just don't carry anymore -- as anyone who's gone from Haskell to Java can attest. I think it's valuable to recognize the idiom and gently teach users why Rust differs.

[0] https://news.ycombinator.com/item?id=26867490


The problem for the `if/else` case is that we need to identify the trait that both arms should be coerced to. That set could potentially be huge, and any suggestion should skip things like std::fmt::Display. That's why the suggestion I showed earlier only works on tail expressions: we can get the type from the return type and work from there, and even account for someone writing `-> Trait` and suggest all the appropriate changes.

I just filed https://github.com/rust-lang/rust/issues/84346 to account for some cases brought up and I completely agree with your last sentence. It is something that others and I have been doing piecemeal for a while now and would encourage you (and anyone else reading this) to file tickets for things like these at https://github.com/rust-lang/rust/issues/


Sure, that makes sense.

In some of the cases in this discussion, isn't the problem the compiler has to solve a little bit simpler? From (a) the indeterminate type of the result and (b) the invocation of a method on that value within the same scope, we should be able to infer the trait that the user is relying on. (Assuming the trait itself is in scope, which, if it isn't, is already an error that hunts for appropriate traits to suggest, I think?)

In some other cases we've discussed here, the actual trait we want is named in the return type, which also fills the role of (b) above. I think this is the case you outlined.

I guess my point is, it seems like we already have enough local information to avoid doing a type-driven trait search. In one case, we have a method and two non-unifiable types, and in the other, we have a trait and two non-unifiable types. I can see how the more general case of "just two non-unifiable types" would be hard, but I'm not sure we have to solve that to cover a meaningful chunk of the problem space.


I would almost say iterators could/should be treated as a special case here (in terms of informing the compiler message). It's extremely unintuitive that adding a transform to an iterator gives you a totally different concrete type. I understand why this is, but most people won't, and it's one of the most prominent cases of this general problem in my experience.


Boxing isn't necessary. It's just not very pretty.

https://play.rust-lang.org/?version=stable&mode=debug&editio...


That snippet gives a warning: "trait objects without an explicit `dyn` are deprecated". Adding the `dyn` in the right place (`&mut dyn Iterator<Item=i32>`) makes it a little more clear that you're still paying the costs of a fat pointer (half for the trait object pointer, half for the instance pointer), even if the instance is indeed stack-allocated and not heap-allocated ("Box").

If you're returning the `dyn Iterator` from this function, you'd likely need to Box it anyway, since it will go out of scope on return. (Of course, you inlined the function to account for this ;) )

None of which is to say you're wrong; only that different solutions will be appropriate for different cases. "Box" will probably work more consistently in more cases, but it's definitely valuable to have the stack-allocated approach when it's applicable.
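
For reference, a sketch of the stack-allocated variant with the explicit `dyn`. This is my own reconstruction under the assumption that the input is a `Vec<i32>`, not necessarily what the linked playground contains, and `in_order` is a made-up name.

```rust
fn in_order(values: Vec<i32>, reverse: bool) -> Vec<i32> {
    // Declared up front so whichever one gets initialized outlives the borrow below.
    let mut forward;
    let mut backward;

    // A fat pointer to a stack value: no `Box`, but still dynamic dispatch.
    let iter: &mut dyn Iterator<Item = i32> = if reverse {
        backward = values.into_iter().rev();
        &mut backward
    } else {
        forward = values.into_iter();
        &mut forward
    };

    iter.collect()
}

fn main() {
    println!("{:?}", in_order(vec![1, 2, 3], true));
    println!("{:?}", in_order(vec![1, 2, 3], false));
}
```

Note that the trait object cannot leave the function here, because it borrows locals; that is the case where `Box` becomes the more practical choice.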


yeah.. that's basically what i was trying to do:

https://play.rust-lang.org/?version=stable&mode=debug&editio...

so where it says

   = note: expected type `Rev<std::vec::IntoIter<_>>`
            found struct `std::vec::IntoIter<_>`
it could not be more unhelpful. I know those things are different, however the following "for i in output" works for both of those things individually, so why does it matter that the types are different since they both implement iteration?


I think the key idea here is that Rust consistently uses static dispatch by default. When you invoke some method defined by a trait, it needs to look up at compile-time which actual implementation it should dispatch to. Since the if-else expression isn't returning the same type, it doesn't matter if they both implement Iterator -- Rust still doesn't know which actual type will be produced, so it won't be able to tell which method implementations should actually be dispatched to.

Dynamic dispatch, which solves the issue you faced, needs to be explicitly opted into using the `dyn Trait` syntax, since it introduces a hidden indirection through a fat pointer.
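
A small sketch of the two dispatch styles, with made-up function names. The size check at the end reflects how trait object references are laid out in current rustc (data pointer plus vtable pointer).

```rust
use std::mem::size_of;

// Static dispatch: the compiler generates a separate copy per concrete iterator type.
fn sum_static<I: Iterator<Item = i32>>(iter: I) -> i32 {
    iter.sum()
}

// Dynamic dispatch: a single copy; `next` is looked up through the vtable at run time.
fn sum_dynamic(iter: &mut dyn Iterator<Item = i32>) -> i32 {
    iter.sum()
}

fn main() {
    let values = vec![1, 2, 3];

    // Two different concrete types, two monomorphized instances of `sum_static`.
    println!("{}", sum_static(values.iter().copied()));
    println!("{}", sum_static(values.iter().copied().rev()));

    // Either type can be passed to `sum_dynamic` behind the same fat pointer type.
    let mut reversed = values.into_iter().rev();
    println!("{}", sum_dynamic(&mut reversed));

    // A plain reference is one word; a trait object reference is two.
    assert_eq!(size_of::<&mut std::vec::IntoIter<i32>>(), size_of::<usize>());
    assert_eq!(
        size_of::<&mut dyn Iterator<Item = i32>>(),
        2 * size_of::<usize>()
    );
}
```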

This is definitely a difference from languages like Java or Python, where dynamic dispatch is the default (and sometimes there's no way to explicitly use static dispatch). On the other hand, languages like C and C++ also use static dispatch by default, with mechanisms like function pointers or `virtual` to provide dynamic dispatch as needed.

You would very likely have faced a similar problem in C++, had you used `auto x = (..) ? ... : ...`; (If you used `T x = ...; if (...) { x = ... } ...`, you'd have been faced immediately with the issue of "what type should T be" anyway, I think.)


That actually makes perfect sense to me.

I ran into that issue trying to port something else to rust, just didn't realize it was this same issue. In go you just define an interface and then make a slice of that interface and put things in it. In rust I ended up having to do

  Vec<Box<dyn Checker>>
I think initially I tried just doing a

  Vec<Checker>
and when that failed I ended up putting something like "How do I make a vec of an impl in rust" and found a code sample.

That's where the compiler just saying "The types don't match" is not very helpful.
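
For anyone landing here with the same question, a minimal sketch of the Go-style "slice of an interface" in Rust. Only the name `Checker` comes from my code; the `check` method and the two implementations are invented for the example.

```rust
// Hypothetical trait standing in for the real `Checker`.
trait Checker {
    fn check(&self, input: &str) -> bool;
}

struct NonEmpty;
struct MaxLen(usize);

impl Checker for NonEmpty {
    fn check(&self, input: &str) -> bool {
        !input.is_empty()
    }
}

impl Checker for MaxLen {
    fn check(&self, input: &str) -> bool {
        input.len() <= self.0
    }
}

// `Vec<Checker>` cannot work because the elements would have different sizes;
// boxing gives every element the same size (one fat pointer).
fn default_checkers() -> Vec<Box<dyn Checker>> {
    vec![Box::new(NonEmpty), Box::new(MaxLen(5))]
}

fn all_pass(checkers: &[Box<dyn Checker>], input: &str) -> bool {
    checkers.iter().all(|c| c.check(input))
}

fn main() {
    let checkers = default_checkers();
    println!("{}", all_pass(&checkers, "abc"));
    println!("{}", all_pass(&checkers, "much too long"));
}
```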


I think the fundamental issue is that most languages these days use dynamic-dispatch invisibly and by default. Rust seeks to empower users to be more efficient by default (which is good), but sometimes, especially on this iterators case, it creates deeply confusing/frustrating barriers that have to be explicitly stepped around.


If I'm following correctly, you had a situation like the following, right?

https://play.rust-lang.org/?version=stable&mode=debug&editio...

    error[E0277]: the size for values of type `dyn std::fmt::Display` cannot be known at compilation time
       --> src/main.rs:5:12
        |
    5   |     let y: Vec<dyn Display> = x.into_iter().collect();
        |            ^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
        |
        = help: the trait `Sized` is not implemented for `dyn std::fmt::Display`
    
    error[E0277]: a value of type `Vec<dyn std::fmt::Display>` cannot be built from an iterator over elements of type `{integer}`
     --> src/main.rs:5:45
      |
    5 |     let y: Vec<dyn Display> = x.into_iter().collect();
      |                                             ^^^^^^^ value of type `Vec<dyn std::fmt::Display>` cannot be built from `std::iter::Iterator<Item={integer}>`
      |
      = help: the trait `FromIterator<{integer}>` is not implemented for `Vec<dyn std::fmt::Display>`
    
    error[E0277]: the size for values of type `dyn std::fmt::Display` cannot be known at compilation time
       --> src/main.rs:5:31
        |
    5   |     let y: Vec<dyn Display> = x.into_iter().collect();
        |                               ^^^^^^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
        |
        = help: the trait `Sized` is not implemented for `dyn std::fmt::Display`
I can see a bunch of places where we could improve the output that we haven't gotten to yet:

- The third error shouldn't have been emitted in the first place, the first two are more than enough

- The first error has a note, but that note with a little bit of work could be turned into a structured suggestion for boxing or borrowing

- For the second suggestion we could detect this case in particular where the result would be !Sized and also suggest Boxing.

Edit: filed https://github.com/rust-lang/rust/issues/84346

It is also somewhat unfortunate that `impl Trait` in locals isn't yet in stable. Once it is, it would let you write `let z: Vec<impl Display> = x.into_iter().collect();`, but as you can see here, that doesn't currently work even on nightly: https://play.rust-lang.org/?version=nightly&mode=debug&editi...
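
Going back to the `Vec<dyn Display>` errors above, here is a sketch of what a boxing fix could look like today, assuming `x` is a `Vec<i32>` (the helper name `boxed_displays` is made up):

```rust
use std::fmt::Display;

// Each element is boxed and cast to a trait object, so the `Vec` holds
// same-sized `Box<dyn Display>` values instead of unsized `dyn Display`.
fn boxed_displays(x: Vec<i32>) -> Vec<Box<dyn Display>> {
    x.into_iter()
        .map(|v| Box::new(v) as Box<dyn Display>)
        .collect()
}

fn main() {
    let y = boxed_displays(vec![1, 2, 3]);
    let rendered: Vec<String> = y.iter().map(|d| d.to_string()).collect();
    println!("{:?}", rendered);
}
```

The explicit `as Box<dyn Display>` is there because `collect` alone gives the closure no hint that a trait object is wanted.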


Listing all possible examples would indeed cause far too much output, but maybe the error should mention the possibility? Like: "If A and B are implementations of the same trait, you can use ...".


Good read. Amos used to be the most visible/active developer on itch and his twitter was interesting to read. He also has a lot of other pieces like this that really dig into how and why you go wrong when working with something that is unfamiliar to you.

I really enjoyed this one:

https://fasterthanli.me/articles/i-am-a-java-csharp-c-or-cpl...


> Because in JavaScript, if something goes wrong, we just throw!

A bit off topic: it's somewhat ironic that in the JS/TS-based projects at my workplace we ended up using return values and a Result<T, E>-like struct, backed by TypeScript's union types.

Throwing is only for never-scenarios and unhandled cases.

The Rust-like approach has been very successful at squashing runtime bugs, especially in big, complex products.


I really like this pattern in TS:

    const x: Data | Error = await fetchData().catch(e => e);
    if (x instanceof Error) {
      return 'too bad'
    }
    // proceed with x as Data
Basically you add `.catch(e => e)` to any I/O which may fail, and handle the error in an `if` instead of a `try`.

Many of the benefits of Go's error handling, but without the awkward extra variable and without a chance of using your result without ensuring it isn't an error.


This was a great read! I'm starting to pick up rust and this is helpful.

I've also been watching the Crust of Rust series from Jon Gjengset and enjoying those as well.

https://www.youtube.com/playlist?list=PLqbS7AVVErFiWDOAVrPt7...


Jon is amazing. I've been watching his videos while doing the dishes. Unfortunately I'm essentially caught up. Any other recommendations?


Have you looked at Ryan Levick's videos on youtube? He also does a pretty good job covering the language.

edit for link: https://www.youtube.com/channel/UCpeX4D-ArTrsqvhLapAHprQ


Thanks, I'm vaguely familiar with them from the Rustacean Station discord. Somehow their style doesn't quite work for me.


Thanks for the recc!


David Pedersen's streams/videos are also great: https://www.youtube.com/channel/UCDmSWx6SK0zCU2NqPJ0VmDQ/vid...


As someone trying to move into systems programming with rust, thanks for the resources guys!


This blog post is really difficult to navigate. It's about Box, so... ^Fbox, right? No, because the author uses a directory called "whatbox" and so their examples are littered with "whatbox", so... ok, ^Fbox and match word, but now the section titled "What the heck is a Box?"... says nothing about Box.


The moment I read "As a Java developer, you may be wondering if we're trying to turn numbers into objects (we are not)" without an explanation of what we _were_ trying to do, I came to the comments to see whether it was worth persevering.

Is there a common name - ideally (since I love them) an eponymous law? - for the precept that, when writing, you shouldn't introduce a question in your reader's mind without at least _acknowledging_ it? Ideally, of course, it should be answered immediately, but sometimes that's impractical. To prevent your reader from having too many "open loops" in their brain, though (shout out to [GTD](https://gettingthingsdone.com/)), if the answer isn't essential to what follows (which is not the case here), you should at least reassure the reader that they don't need to know the answer to that question yet.


This. TFA is too long and doesn't show even a glimpse of getting to the point.


While I like the idea of Rust, having to (re)-learn all these subtleties seems daunting (just look at the table comparing dyn vs. box vs. arc vs. references etc.)

When I moved to Java 1.0, coming from C/C++, I hated the performance loss but happily traded that for a garbage collector. Typically, when the code compiled, it ran.

Now with Rust I'm wondering how much time practitioners spend on analyzing compiler errors (which is a good deal better than analyzing heap dumps with gdb). And do you get to a place where your coding naturally avoids gotchas around borrowing?


> Now with Rust I'm wondering how much time practitioners spend on analyzing compiler errors

Almost zero. Seriously. Because the rules got internalized for me pretty quickly.

If you asked me, "how much time did you spend when just starting Rust," then it would be a lot more than zero. Enough to noticeably slow me down. But it got down to near-zero pretty quickly. I'd say maybe a month or so with ~daily programming.


I'll caveat this by saying that to have this kind of experience, it's incredibly important to understand why you're getting these types of errors in the first place.

Rust programs require you to consider some things upfront in your design that you don't have to think about in other languages. If you internalize these requirements, designing programs in Rust can quickly become just as easy and natural as developing in other languages. But it can feel arbitrary and impossible if you just try and force your way forward by `Box`ing and `clone()`ing everything endlessly because it seems to make the annoying compiler errors go away.

If you're the type of engineer who learns a new language and just ends up writing programs in the style of your old language (but with different syntax), Rust is going to feel a lot harder and you may never "get" it. The difficulty curve of Rust is—I think—much steeper than other languages for this type of engineer. You can be productive writing C-style programs in golang. You can be productive writing Java-style programs in Ruby. But Rust is going to fight you much harder than other languages if you try to approach things this way.

If you're the type of engineer who strives to build idiomatic software in whatever language you're using, you'll have a much faster ramp-up to proficiency.


Personally, I find the overhead of dealing with rust memory management to be 100% worth it when writing embedded code with no dynamic allocation. It can really help to prevent bad memory management practices and, not so much catch, but rather structurally prevent, bugs. If you’re really experienced with embedded C you were probably doing things mostly the same way anyway.

For writing code on an operating system, I’m in the same boat as you; I would rather have GC. Haskell and Rust are spiritually actually pretty similar with the former simplifying memory management and enriching the type system (at the cost of needing to worry about things like memory leaks), and I tend to go to Haskell for non-embedded applications most of the time.


> While I like the idea of Rust, having to (re)-learn all these subtleties seems daunting

It's not like Java is any simpler. Rust gives you the equivalent of a GC, except at compile time. And the compiler tells you when you're getting it wrong.


I've heard before the idea that Rust has a "compile time GC" or "static GC", and while I can sympathize with wanting to leverage that term, it already has a fairly well understood meaning and it's not what Rust provides. The only GC that comes built-in to Rust is reference counting via Rc and Arc.

With an actual GC, there is no notion of getting it wrong; the whole point of a GC is that it automates the handling of memory. A useful metaphor for a GC is that it's a system that simulates an infinite amount of memory. With a GC, at least conceptually, you don't allocate and deallocate memory, rather you declare what object you want and it stays alive forever. The GC works behind the scenes to maintain this illusion, although since it's only a simulation there are certain imperfections.

There are some languages that can do this at compile time, such as Mercury and I believe Rust took some inspiration from Mercury... but Rust does not have a compile-time GC the same way that Mercury does.


> With an actual GC, there is no notion of getting it wrong; the whole point of a GC is that it automates the handling of memory.

Rust is memory safe, so Rust programs don't go "wrong" either, barring use of unsafe{} or bugs in the underlying implementation.

You only need Rc<> or Arc<> for objects that may have more than a single "owner" controlling their extent. That's a comparatively uncommon case.
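
To illustrate the shared-ownership case, a tiny sketch using `Rc` from the standard library (the function name `shared_owner_counts` is invented for the example):

```rust
use std::rc::Rc;

// Returns the strong count while two owners exist, and again after one is dropped.
fn shared_owner_counts() -> (usize, usize) {
    let first = Rc::new(String::from("shared value"));
    let second = Rc::clone(&first); // bumps the count, does not copy the String
    let with_both = Rc::strong_count(&first);
    drop(second);
    let with_one = Rc::strong_count(&first);
    (with_both, with_one)
}

fn main() {
    let (with_both, with_one) = shared_owner_counts();
    println!("owners: {} then {}", with_both, with_one);
}
```

With a single owner you would skip `Rc` entirely and the value is freed when that owner goes out of scope, with no counting at run time.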


You're switching context here and doing so in a way that's fairly pedantic and not really useful.

You mentioned that Rust informs you when you're "getting it wrong" and most people who aren't being pedantic can understand the meaning of that; that there's something that would otherwise go wrong if not for a compile time check that prevents it.

In most GC'd languages, there is no notion of something that would have otherwise gone wrong if not for a compile time check (with respect to memory safety). In most GC'd languages, that very concept doesn't exist.

Another way to put it is that there's nothing for a Java compiler to tell a user about "getting it wrong" because there's nothing to get wrong in the first place (with respect to memory safety, since we're being overly pedantic now).


So, I think you've gravely misunderstood the concepts at work here.

You know NullPointerException? Does that feel like maybe "getting it wrong" ? But it does still happen to Java programmers. That can't happen in (safe) Rust. If you write a program that could try to dereference a null pointer it won't compile. You'd be getting it wrong.

Or let's try something a bit more sophisticated. Many Java data structures can be subject to a Data Race in threaded software. So you may be "getting it wrong" in the sense that on John's 16 core monster server the output is incorrect, but on your cheap 10 year old laptop it works (much more slowly, but correctly). Both outcomes were valid meanings of your Java program, and Java provides some tools you could use to protect yourself, but it won't even warn you that you were "getting it wrong"; the results are just incorrect, too bad.

In (safe) Rust, Data Races can't happen, the compiler will reject your program. Some other types of Race Condition can happen, but no Data Races.
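
A minimal sketch of what that looks like in practice, with a made-up function `count_in_parallel`. Handing a plain `&mut u32` to several spawned threads would be rejected at compile time; wrapping the counter in `Arc<Mutex<..>>` is one of the forms the compiler accepts.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn count_in_parallel(threads: usize, per_thread: u32) -> u32 {
    // Shared ownership (`Arc`) plus exclusive access while the lock is held (`Mutex`).
    let counter = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Copy the value out so the lock guard is released before `counter` is dropped.
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", count_in_parallel(4, 1000));
}
```

This rules out the data race, but not every race condition: the order in which the threads take the lock is still unspecified.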


I literally state in my post that I am referring to garbage collection and memory safety. I literally even state this specifically because I knew if I didn't someone would bring up completely irrelevant details for the sake of argument. And yet... here we are.

In Rust, data races are undefined behavior, and so safe Rust mostly prevents them even though there are still to this day subtle issues about where the boundary between safe and unsafe Rust is. That said, this is a great thing that Rust provides, it's a genuinely incredible step forward, but it has very little to do with this topic.

However in Java, data races are not undefined behavior, they have well specified semantics and do not result in memory errors the same way they do in Rust or C++:

https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.htm...

Calling a NullPointerException a memory safety violation is like calling a panic on unwrap or panic on array out of bounds a memory safety violation (they're not). Both are well defined operations with specified semantics.

Are they likely bugs? Yes absolutely, but neither Java nor Rust prevents developers from writing bugs, and the fact that you're confusing program correctness with memory safety only indicates that it's you who gravely misunderstands the concepts being discussed.


NullPointerException isn't a memory safety violation but it is getting it wrong.

That's what the original comment claimed, that Rust's compiler "tells you when you're getting it wrong".

Since you brought it up - I'd actually say the existence of unwrap() shows this trend elsewhere in Rust. Java is one of many C-style languages in which silently discarding the important result of your call is a common mistake. In some cases Java tried to mitigate this with a Checked Exception, but now you're just adding a bunch of boilerplate everywhere, it doesn't do much to encourage a better way forward. Rust's Result and Option force a programmer to explicitly decide to discard unhandled cases (Errors and None respectively) each time if that's what they meant. Yet another case where the Rust compiler will tell you if you're getting it wrong.
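
A short sketch of what I mean, using only the standard library; the function names and the fallback port are invented for the example.

```rust
use std::num::ParseIntError;

// The failure case is part of the return type, so callers cannot ignore it by accident.
fn parse_port(text: &str) -> Result<u16, ParseIntError> {
    text.trim().parse::<u16>()
}

// Handling the error explicitly with a fallback value.
fn port_or_default(text: &str) -> u16 {
    match parse_port(text) {
        Ok(port) => port,
        Err(_) => 8080,
    }
}

// `Option` plays the same role for "there may be nothing here".
fn first_word(text: &str) -> Option<&str> {
    text.split_whitespace().next()
}

fn main() {
    // `let port: u16 = parse_port("80");` would not compile: the types do not match.
    println!("{}", port_or_default("80"));
    println!("{}", port_or_default("not a port"));

    // Discarding the error case has to be spelled out, for example with `unwrap()`.
    println!("{}", parse_port("443").unwrap());
    println!("{:?}", first_word("hello world"));
}
```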


The original comment was speaking about getting it wrong with respect to memory safety, not NullPointerExceptions, not array out of bounds accesses, not division by zero, but about memory safety.

This discussion isn't about program correctness as a general and broad concept, Rust and Java both have various strategies to eliminate many classes of errors and both languages leave the door open to many other classes of errors.

This discussion is about whether Rust uses a compile time garbage collector in order to ensure memory safety. It does no such thing, Rust has a borrow checker which ensures that syntactically valid expressions referencing memory have a correspondingly valid semantic interpretation. C++ does not have such a thing, syntactically valid expressions referencing memory may not have any valid semantic interpretation, what is referred to as undefined behavior. This is not what a garbage collector does in any sense of the word. A garbage collector is a system that computes an upper bound on object lifetime and when an object exceeds that upper bound, reclaims the memory associated with the object. Rust does no such thing at compile time.

Rust's system of enforcing memory safety is great, it's a step forward in language design, by all means give it the praise it deserves... just don't refer to it by a concept that already has a well defined meaning and an active area of research. Compile time garbage collection is a separate concept from how Rust enforces memory safety and there's not much utility in reusing that term, all it does is create confusion.


You're clutching at unrelated straws. Rather than comparing to Java, try comparing to OCaml, which is a language that's much closer to "Rust with GC". There's pretty much no safety gain from using Rust over OCaml. But if you use OCaml you don't have to worry about borrow checking.


> There's pretty much no safety gain from using Rust over OCaml.

> But if you use OCaml you don't have to worry about borrow checking.

I've never written any OCaml, when you choose not "to worry about borrow checking" how does OCaml arrange to ensure your program is free from data races in concurrent code anyway? Or do you consider that "pretty much no safety gain" ?


OCaml's memory model specifies bounded space-time SC-DRF.

What this comes down to in simple terms is that data races have well specified semantics and their effects are bounded both in terms of what is affected by a data race, and when it's affected.

Using C as a starting point, a data race can modify any region of memory, not just the memory involved in the read/write, and the modification can be observed at any time: it might be observed after the write operation of the data race executes, or it can be observed before the write operation executes (due to instruction reordering).

In Java, data races are well specified using bounded space SC-DRF. This means that unlike C, data races are NOT undefined behavior. A data race is limited to only modify the specific primitive value that was written to. However it does not specify bounded time, so when the modification of that primitive value is observed is not specified by the Java memory model, it could happen before or after the write operation.

OCaml's memory model specifies both bounded space and time SC-DRF. When a data race occurs, it can only modify the primitive value that was written to, and the modification must be observed no sooner than the beginning of the write operation and no later than the end of the write operation.


That was a very long-winded non-answer, but I think I understood it to be essentially "Yes".

I'm definitely not an expert, but to me this memory model sounds like a more circumspect attempt to carve out a set of benign data races which we believe are OK. Now, perhaps it will work this time, but on each previous occasion it has failed, exactly as illustrated by Java.

Indeed the Sivaramakrishnan slides I'm looking at about this are almost eerily reminiscent of the optimism for Java's memory model when I was younger (and more optimistic myself). We'll provide programmers with this new model, which is simpler to reason about, and so the problem will evaporate.

Some experts, some of the time, were able to correctly reason about both the old and new models, but too many programmers too often got even the new model wrong.

So that leads me to think Rust made the right choice here. Let's have a compiler diagnostic (from the borrow checker) when a programmer tries to introduce a data race, rather than contort ourselves trying to come up with a model in which we can believe any races are benign and can't really have any consequences we haven't reasoned about.

Of course, unsafe Rust might truly benefit from nicer models anyway (they could hardly be worse than the existing C++11 model), but that's a different story.
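
A minimal sketch of the borrow-checker diagnostic described above, assuming only the standard library; the function name `parallel_count` is hypothetical. Two threads writing to one plain `u32` would be a data race, and safe Rust rejects that at compile time, so the version that compiles has to route the writes through a `Mutex` (or an atomic).

```rust
use std::sync::Mutex;
use std::thread;

// Hypothetical helper: two threads each add `per_thread` to one shared counter.
fn parallel_count(per_thread: u32) -> u32 {
    let counter = Mutex::new(0u32);

    // Scoped threads may borrow `counter` because they are all joined
    // before `thread::scope` returns.
    thread::scope(|s| {
        for _ in 0..2 {
            s.spawn(|| {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            });
        }
    });

    // Swapping the Mutex for `let mut counter = 0u32;` and writing
    // `counter += 1` inside both closures is rejected by the compiler,
    // because each closure would need its own mutable borrow of `counter`.
    counter.into_inner().unwrap()
}

fn main() {
    println!("{}", parallel_count(1000));
}
```

The point of the sketch is only that the racy variant never reaches runtime; it says nothing about which memory model is preferable for the races that other languages allow.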


>I'm definitely not an expert,

I think that's something we can both agree on.


> It's not like Java is any simpler.

Can attest. It's easier to learn the rules of Java the language, but way harder to learn how to write good Java software. To some extent, Rust forces you to begin learning both at the same time, which is of course more difficult.

What always surprises me is how much "good Rust software" actually coheres with "good software". I'm not saying that you should write software in any language as though it were Rust -- every language has its own effective praxis. Rather, since Rust forces you to pick up some of the rules of good design as you learn, those rules can transfer to other ecosystems, forming part of a language-agnostic basis of engineering. I think that's really cool.

A good example is handles over pointers [0]: recognizing that pointers/references embody two orthogonal concerns, address and capability, lets you see how to separate them when it benefits the design. Rust's extremely strict single-ownership model often forces you to build separate addressing systems, allowing disparate entities to address and relate to each other in complex patterns, while consolidating all capability to affect those entities into a single ownership root.

The mental model of single-ownership itself is valuable for managing the complexity of a large network of entities, and knowing when you can or should break it in other languages has been really valuable to me.

[0] https://floooh.github.io/2018/06/17/handles-vs-pointers.html
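
A rough sketch of the handle idea above, not taken from the linked post: `Handle`, `Entity`, and `World` are hypothetical names, and the sketch assumes entities are never removed (a real design would likely add generation counters to detect stale handles). A handle is only an address; every read or write has to go through the single owner.

```rust
// A handle is just an index: holding one grants no access by itself.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handle(usize);

struct Entity {
    name: String,
    // Entities refer to each other by handle, so cycles are unproblematic.
    target: Option<Handle>,
}

// The single ownership root: all capability to read or mutate lives here.
struct World {
    entities: Vec<Entity>,
}

impl World {
    fn new() -> Self {
        World { entities: Vec::new() }
    }

    fn spawn(&mut self, name: &str) -> Handle {
        self.entities.push(Entity { name: name.to_string(), target: None });
        Handle(self.entities.len() - 1)
    }

    fn set_target(&mut self, from: Handle, to: Handle) {
        self.entities[from.0].target = Some(to);
    }

    fn target_name(&self, of: Handle) -> Option<&str> {
        let target = self.entities[of.0].target?;
        Some(self.entities[target.0].name.as_str())
    }
}

fn main() {
    let mut world = World::new();
    let a = world.spawn("a");
    let b = world.spawn("b");
    world.set_target(a, b);
    world.set_target(b, a); // a cycle, with no Rc or RefCell involved
    println!("{:?}", world.target_name(a));
}
```

Mutation requires `&mut World`, so the borrow checker only ever has to reason about one owner, however tangled the relationships between entities become.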


>As a Java developer, you may be wondering if we're trying to turn numbers into objects (we are not).

Funny that he answered that so early on, my line of thought coming from C# was "Are they trying to turn a value type into a reference type? I thought Rust doesn't have quite the same distinction so is this some weird way of working with the heap instead of the stack?"


You're basically correct. Everything in Rust is value-typed by default. A Box<> puts a value type on the heap (makes it a reference type). The difference from a normal reference type is that a Box is owned (in Rust ownership terms), where other references aren't. Both normal references and boxes can be typed as dyn Trait which means they dynamic-dispatch method calls based on some trait, instead of statically dispatching them to the concrete type.
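
A small sketch of those distinctions, with hypothetical types (`Shape`, `Square`, `Rect`): a plain value, a `Box` that owns a heap allocation, and both `&dyn Trait` and `Box<dyn Trait>` dispatching dynamically.

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// Static dispatch: compiled separately for each concrete `T`.
fn area_static<T: Shape>(shape: &T) -> f64 {
    shape.area()
}

// Dynamic dispatch through a borrowed trait object.
fn area_dyn(shape: &dyn Shape) -> f64 {
    shape.area()
}

// An owned, heap-allocated trait object; the concrete type is chosen at runtime.
fn make_shape(square: bool) -> Box<dyn Shape> {
    if square {
        Box::new(Square(2.0))
    } else {
        Box::new(Rect(2.0, 3.0))
    }
}

fn main() {
    let by_value = Square(3.0); // lives inline, no allocation
    let boxed: Box<Square> = Box::new(Square(3.0)); // owned heap allocation
    println!("{}", area_static(&by_value));
    println!("{}", area_dyn(&*boxed));
    println!("{}", make_shape(false).area());
} // `boxed` is dropped here and its heap memory is freed
```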


That is just a lot of fun.


Agreed. It's a wonderful post that goes into a lot of depth about some fascinating language internals.


> Because in JavaScript, if something goes wrong, we just throw!

Not if you're using `neverthrow` >:)

https://www.npmjs.com/package/neverthrow


https://github.com/gcanti/fp-ts https://github.com/gcanti/io-ts

If you're using TS, these are pretty much footgun eliminators.


Oh, why have they redefined the word "enum" to mean a "union"?

    enum Result<T, E> {
        Ok(T),
        Err(E),
    }


To expand on what Steve mentioned, `union` in Rust is a C-style union, but because the type carries no information on what variant the underlying bits represent, accessing them is `unsafe`[1]. On the other hand, `enum`s in Rust are tagged unions: they reserve some data for the type system to determine at runtime which variant any given instance corresponds to. Because of that you can't interpret the T as an E with an enum, but you can with a union.

[1]: https://doc.rust-lang.org/reference/items/unions.html
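
A short sketch of that difference, using hypothetical types `Bits` and `Number`: reading a union field needs `unsafe` because nothing records which field was last written, while an enum's tag lets `match` hand back only the payload that is actually there.

```rust
// Untagged: both fields share the same four bytes.
union Bits {
    int: u32,
    float: f32,
}

// Tagged: the compiler stores which variant is live.
enum Number {
    Int(u32),
    Float(f32),
}

// Reinterprets the bytes of an f32 as a u32 by reading the "wrong" field.
fn float_bits_via_union(x: f32) -> u32 {
    let b = Bits { float: x };
    // Unsafe is required: the compiler cannot know which field is valid.
    unsafe { b.int }
}

// No unsafe needed: the tag decides which arm runs.
fn describe(n: &Number) -> String {
    match n {
        Number::Int(i) => format!("int {i}"),
        Number::Float(f) => format!("float {f}"),
    }
}

fn main() {
    println!("{:#x}", float_bits_via_union(1.0));
    println!("{}", describe(&Number::Int(7)));
    println!("{}", describe(&Number::Float(1.5)));
}
```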


And to add onto that, you can also use a Rust enum like a traditional enum:

    enum Foo {
        OptionA,
        OptionB,
        OptionC,
    }

    enum Bar {
        OptionA = 1,
        OptionB = 2,
        OptionC = 4,
    }
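
As a sketch of how such data-free enums are typically used, the `as` cast below turns a variant into its integer discriminant; the `mask` helper is hypothetical and only illustrates combining the power-of-two values as bit flags.

```rust
// Without explicit values, discriminants count up from 0.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Foo {
    OptionA,
    OptionB,
    OptionC,
}

// Explicit discriminants, here chosen as distinct bits.
#[derive(Clone, Copy)]
enum Bar {
    OptionA = 1,
    OptionB = 2,
    OptionC = 4,
}

// Hypothetical helper: OR the discriminants together into one bit mask.
fn mask(flags: &[Bar]) -> u32 {
    flags.iter().fold(0, |acc, &f| acc | f as u32)
}

fn main() {
    println!("{}", Foo::OptionC as i32);
    println!("{}", Bar::OptionB as i32);
    println!("{}", mask(&[Bar::OptionA, Bar::OptionC]));
}
```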


> to determine at runtime

I didn't know this. How is it determined? Is there any code inserted by the compiler to do this?

What's the overhead of using Result<T, E>?


TL;DR: the overhead for enums that don't have an inordinate number of variants is about 8 bytes, but can be zero in special cases.

> What's the overhead of using Result<T, E>?

Enough bits (+ padding) to discriminate between all the variants, as any tagged union would, modulo some tricks for niche optimization. For example, Result<i64, u64> is 16 bytes in memory because it needs 8 bytes to represent either an i64 or a u64, plus another 8 bytes to determine whether it is an Ok or an Err. You can think of it as:

    struct Enum {
        variant: usize,
        bytes: [u8; N],
    }
The niche optimization I mentioned applies to things like Option<NonZeroI64>, where NonZeroI64 fits in 8 bytes but the compiler can ensure that the bit pattern corresponding to the 0 value will never be used, so it can use that to differentiate between Some and None.

https://play.rust-lang.org/?version=stable&mode=debug&editio...
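
A minimal way to check these numbers yourself with `std::mem::size_of` (not necessarily what the playground link above contains). The helper function names are hypothetical, and the 16-byte figure assumes a typical 64-bit target where `i64` is 8-byte aligned.

```rust
use std::mem::size_of;
use std::num::NonZeroI64;

// 8 bytes of payload plus a tag padded out to the payload's alignment.
fn result_size() -> usize {
    size_of::<Result<i64, u64>>()
}

// Every bit pattern of i64 is a valid value, so the tag needs extra space.
fn option_i64_size() -> usize {
    size_of::<Option<i64>>()
}

// The all-zero bit pattern is never a valid NonZeroI64, so it encodes None.
fn option_nonzero_size() -> usize {
    size_of::<Option<NonZeroI64>>()
}

fn main() {
    println!("Result<i64, u64>:   {} bytes", result_size());
    println!("Option<i64>:        {} bytes", option_i64_size());
    println!("Option<NonZeroI64>: {} bytes", option_nonzero_size());
}
```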


Thank you. Regarding size, from an embedded perspective, do you think all these enums, in a very deep call stack can, ultimately, overflow my stack? Because for every function call there should be 8, or 16 or 24 or N bytes of stack-allocated space to hold these enums. Is this something embedded Rust developers keep in mind? Or perhaps worry about the extra copies it has to make when functions return?

Regarding code: because of the "to determine at runtime" of your parent comment, I thought there could be some runtime checks (extra code) for handling enums.


> Regarding size, from an embedded perspective, do you think all these enums, in a very deep call stack can, ultimately, overflow my stack? Because for every function call there should be 8, or 16 or 24 or N bytes of stack-allocated space to hold these enums. Is this something embedded Rust developers keep in mind? Or perhaps worry about the extra copies it has to make when functions return?

I don't think I have come across any conversation by embedded developers forgoing enums because of that, but anecdotally that space does seem to rely on type state machines a bit more than others, which have no runtime memory cost. A bigger cost consideration is if you have an enum with variants that have wildly different sizes, making the enum big for all cases, whether it is necessary or not. We have a lint to encourage you to Box wildly diverging variants, but that might not be possible if you don't have an allocator.
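
A sketch of the variant-size point, with hypothetical enums `Inline` and `Boxed`. Every value of an enum is as large as its largest variant, so boxing the big payload shrinks all of them at the cost of a heap allocation. The lint referred to is presumably Clippy's `large_enum_variant`; that identification is an assumption.

```rust
use std::mem::size_of;

// Every `Inline` value reserves room for the 1 KiB payload, even `Small`.
#[allow(dead_code)]
enum Inline {
    Small(u8),
    Large([u8; 1024]),
}

// Boxing moves the payload to the heap; the enum only holds a pointer.
#[allow(dead_code)]
enum Boxed {
    Small(u8),
    Large(Box<[u8; 1024]>),
}

fn inline_size() -> usize {
    size_of::<Inline>()
}

fn boxed_size() -> usize {
    size_of::<Boxed>()
}

fn main() {
    println!("Inline: {} bytes", inline_size());
    println!("Boxed:  {} bytes", boxed_size());
}
```

On a target without an allocator, `Box` is not available, which is the limitation mentioned above.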

> Regarding code: because of the "to determine at runtime" of your parent comment, I thought there could be some runtime checks (extra code) for handling enums.

Yeah, I figured that's what happened and I should be more careful with my wording.


Logically speaking, Rust's "enums" are neither enums (in the traditional sense) nor unions. They are tagged unions / disjoint unions / variants / coproducts / algebraic data types. We have many names for this concept, but "union" is not one of them, because a union allows for a non-empty intersection.

Under the hood, of course, they are implemented with C-style unions with fields sharing memory. But to conflate Rust's enums with how they are implemented is to disregard the extra safety that they provide.


From my perspective, these really are more like enums than unions. Rust enums cover C-style enums as a limiting case:

    enum Result {
      Ok,
      Err,
    }
If this was all you got, you'd need to return something like `(status, value, err)` to model fallible functions. This is not unlike Go's `value, err` convention, except that `status` is additionally inlined into `value` as a sentinel `nil`. This does nothing to prevent you from using the `value` if an error has occurred, or vice versa.

Instead, we like to carry data _alongside_ the enum tags:

    // using named fields instead of tuple structs, for clarity
    enum Result<T, E> {
        Ok {value: T},
        Err {err: E},
    }
Now we can return a single `Result`, and you can only use `value` if the function succeeded, and you can only use `err` if the function failed.
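
A small usage sketch of that guarantee. The enum is renamed to the hypothetical `Outcome` only to avoid shadowing the standard library's `Result`, and `checked_div` and `render` are made-up helpers. The only way to reach `value` or `err` is to match on the variant that carries it.

```rust
// Same shape as the named-field enum above, under a hypothetical name.
enum Outcome<T, E> {
    Ok { value: T },
    Err { err: E },
}

fn checked_div(a: i32, b: i32) -> Outcome<i32, String> {
    if b == 0 {
        Outcome::Err { err: "division by zero".to_string() }
    } else {
        Outcome::Ok { value: a / b }
    }
}

fn render(r: Outcome<i32, String>) -> String {
    match r {
        // `value` only exists inside the Ok arm...
        Outcome::Ok { value } => format!("ok: {value}"),
        // ...and `err` only exists inside the Err arm.
        Outcome::Err { err } => format!("error: {err}"),
    }
}

fn main() {
    println!("{}", render(checked_div(6, 3)));
    println!("{}", render(checked_div(1, 0)));
}
```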


It has a fixed enumerated list of cases (here `Ok` and `Err`). Some of those cases may have additional payloads, but I don't see how that makes them any less like an enumerated type.


They're not only unions, they're closer to discriminated unions. Rust also has unions, with the "union" keyword.


I'm saying this informatively rather than as a nit-pick but

    Because Result<T, E> is an enum, that can represent two things
is an incorrect use of that terminology - Result here is a tuple (it might possibly be a union but I hope not - those are really weird). Tuples are a really good data structure to get more familiar with since they do a lot of stuff that may be inobvious from the outside. A lot of folks will equate them to arrays and arrays can usually be used to represent them but a type-safe tuple (or n-ple pronounced EN-pull) is, essentially, the useful part of structs that aren't classes.

If you're not yet familiar with tuples I'd suggest spending some time reading up on them since they're a very strong tool in the developer toolbox.


Sorry, but you're not right, at least in the way that Rust uses the terms "enum" and "tuple." In Rust, an enum is a sum type (also called a "discriminated union" in some other languages; we don't use this term for Reasons), and a tuple is a product type (like arrays). Result is an enum.


Oh interesting - I had no idea that Rust called such a thing enum. Wanting to avoid the term discriminated union makes a lot of sense - the untagged union (what you'd get in C for instance) is a train wreck in that it is not a full statement of data. It's pretty hard to talk about tagged unions without people shortening it to unions - there are some alternatives out there like sum type, but I haven't seen them gain much traction linguistically. Within untagged unions the type is generally constrained to a certain extent by compiler checks, but the true type of the value is always unknown unless communicated by a separate piece of data. Usually good uses of unions occur in places where that second piece of data is carried alongside the first piece (tuple style) in a struct - generally the type should be inferable from some piece of related data unless you want some serious headaches.


It's all good! Yeah, Rust has C-style unions as well, though they're unsafe, and largely for C interop.

Rust's enums give you the full power of putting other data structures inside of them; the variants can hold data of any kind; single values, structs, tuples, and even other enums.
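
For illustration, a sketch with hypothetical types (`Point`, `Color`, `Event`) where each variant carries a different kind of payload, including none at all:

```rust
struct Point {
    x: i32,
    y: i32,
}

enum Color {
    Red,
    Custom(u8, u8, u8),
}

enum Event {
    Quit,             // no data
    Key(char),        // a single value
    Resize(u32, u32), // tuple-like payload
    Click(Point),     // a struct
    Paint(Color),     // another enum
}

fn describe(e: &Event) -> String {
    match e {
        Event::Quit => "quit".to_string(),
        Event::Key(c) => format!("key {c}"),
        Event::Resize(w, h) => format!("resize {w}x{h}"),
        Event::Click(Point { x, y }) => format!("click at ({x}, {y})"),
        Event::Paint(Color::Red) => "paint red".to_string(),
        Event::Paint(Color::Custom(r, g, b)) => format!("paint rgb({r}, {g}, {b})"),
    }
}

fn main() {
    let events = [
        Event::Key('q'),
        Event::Resize(80, 24),
        Event::Click(Point { x: 1, y: 2 }),
        Event::Paint(Color::Red),
        Event::Quit,
    ];
    for e in &events {
        println!("{}", describe(e));
    }
}
```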


Coming from C/C++, "enums with data" sound more interesting to me than "unions with a type", even though they describe the same thing. I used C/C++ enums far more often than unions, and often wished I could add extra data to them.

Another thing is that adding data to Rust enums is optional, and so you can have an enum of variants with no data. The union equivalent to that would be a type-only union which sounds kind of odd.



