The recommended solution uses "scoped_threadpool". But "scoped_threadpool" uses ...

herbstein · on May 24, 2021

> But "scoped_threadpool" uses "unsafe".[1] They could not actually do this in Rust. They had to cheat.

Using unsafe isn't "cheating". The whole point of using Rust is that you can encapsulate very small amounts of "unsafe" code within a safe abstraction, and know that there is no way to abuse the safe APIs to produce undefined behavior. The formal verification project "RustBelt" [0] has proven exactly this -- safe Rust code composes arbitrarily to produce safe behavior.

There is still a burden of proof on anyone writing a safe function that uses unsafe functionality internally. Keep in mind that "unsafe" Rust still poses more restrictions on the programmer compared to raw C/C++.

> The language has difficulty expressing that items in an array can be accessed in parallel.

The Rust borrow semantics is based on `exclusive XOR shared` references. At the first order of approximation it's not possible to mutate a value through a shared reference. Interior mutability complicates this image somewhat, but it's enough to explain why what you wrote is wrong.

Spawning `n` threads and giving each thread a mutable reference to the vector `v` goes against the above borrow semantics because that would result in `n` exclusive references. Only through exclusive references is mutation allowed. Since this is not allowed it becomes a compile-time error.

In other words, it's not that the language has "difficulty expressing" that scenario. It was explicitly designed to not allow it.

  [0]: http://plv.mpi-sws.org/rustbelt/

Animats · on May 25, 2021

Using unsafe isn't "cheating". The whole point of using Rust is that you can encapsulate very small amounts of "unsafe" code within a safe abstraction, and know that there is no way to abuse the safe APIs to produce undefined behavior. The formal verification project "RustBelt" [0] has proven exactly this -- safe Rust code composes arbitrarily to produce safe behavior.

"Within a safe abstraction" is the question. The question is whether a piece of encapsulated code is always memory-safe for all uses. There's no guarantee of that from the language. The RustBelt people are working on tools for that, but it will probably require proof work in Coq for each bit of code containing "unsafe" to get there.

Look at the scoped_threadpools example again [2]

    use scoped_threadpool::Pool;

    fn main() {
        let mut pool = Pool::new(3);
        let mut v = vec![1, 2, 3];
        
        pool.scoped(|scope| {
            for i in &mut v {
                scope.execute(move ||{
                    *i += 1;       
                });
            }
        });
        
        println!("v: {:?}", v);
    }

There's an implicit assumption here that all the values returned from the iterator at

    for i in &mut v

are disjoint. But iterators, in general, do not guarantee disjoint outputs. An iterator which returned each reference value twice, for example, is a valid iterator. But used in the context above, two threads would receive mutable access to the same element of a vector. That seems to violate a core Rust safety assumption.

This would be a nice test to run through the version of the Miri interpreter that implements the dynamic "stacked borrow" checks from the RustBelt group.[2] That tool should catch this.

[1] https://news.ycombinator.com/item?id=24992747

[2] https://plv.mpi-sws.org/rustbelt/stacked-borrows/paper.pdf

herbstein · on May 25, 2021

> There's an implicit assumption here that all the values returned from the iterator at

> for i in &mut v

> are disjoint. But iterators, in general, do not guarantee disjoint outputs. An iterator which returned each reference value twice, for example, is a valid iterator. But used in the context above, two threads would receive mutable access to the same element of a vector. That seems to violate a core Rust safety assumption.

Your assumptions here are wrong. "An iterator which returned each reference value twice" could only be implemented using unsafe Rust, and code emitting multiple co-occuring mutable references to the same value are instant-UB. In other words, a "safe abstraction" being able to do this is actually not a safe abstraction at all. This would be a bug in the implementation of the custom iterator.

In fact, if you tried to prove such an implementation using the formal tools (Iris) from RustBelt you wouldn't be able to do it. I know, because I've done it as part of my course work under Lars Birkedal.

I'm not sure if you misunderstanding how `&mut v` turns into an iterator, which values it spits out etc, or what is happening. Let's look at how that code is translated into a program and typechecked. First, let's look at the for-loop itself. Anything on the form of

  for x in y

must have an implementation of the `IntoIterator` trait for the type of the expression `y`. Whatever that type might be. In this case `y` is actually `&mut v`, where `v`'s type is `Vec<i32>`. Luckily enough, an implementation of `IntoIterator` exists for any `&mut Vec<T>`. [0]

Looking at the expanded signature we can see that the associated type `Item` is `&mut T`. This tells us that the types the resulting iterator will produce are `&mut i32`. Keep in mind that this reference is into the backing memory store of the `Vec<i32>` of `v`. This is where Rust's semantic help us.

You say the following:

> But iterators, in general, do not guarantee disjoint outputs.

The issue with that statement is that iterators don't have to. Rust guarantees that condition. Safe Rust can't return multiple mutable references to the same value. Further, any unsafe implementation, even when exposed through a safe interface, giving this behavior is a bug. It would be great if the compiler could statically verify unsafe code too, but if it could there wouldn't be a need to call it unsafe.

> The question is whether a piece of encapsulated code is always memory-safe for all uses. There's no guarantee of that from the language.

That is ultimately impossible to prove completely and statically. Turing and Gödel made sure of that. However, some things that the compiler can't prove we can still prove as outside observers. Often these things will boil down to controlling and checking the states of several variables before performing an unsafe operation, but knowing that in a given context the operation is safe.

The first part of the above quote is partially true. However, if we assume a given unsafe implementation is indeed safe then the rules of Rust's semantics make the composition of any number of safe APIs a completely safe composition. This is a strong, useful statement because it means the very few places where unsafe is required can be easily checked, and every other piece of code doesn't have to spend mental energy for the programmer to consider the safety of their implementation.

  [0] https://doc.rust-lang.org/std/iter/trait.IntoIterator.html#impl-IntoIterator-19

Animats · on May 25, 2021

The issue with that statement is that iterators don't have to. Rust guarantees that condition. Safe Rust can't return multiple mutable references to the same value.

So how does Vec return mutable references to elements? Unsafe code, apparently. You can't write your own safe collection class with an iterator and return mutable references. You'd need a proof of disjointness system to do that.

The question is whether a piece of encapsulated code is always memory-safe for all uses. There's no guarantee of that from the language. That is ultimately impossible to prove completely and statically.

One can construct code for which memory safety is not decidable. That's a good reason to reject it. The Microsoft Static Driver Verifier has a simple solution - if symbolic execution can't verify safety within some time limit, it rejects the driver.

herbstein · on May 25, 2021

> So how does Vec return mutable references to elements? Unsafe code, apparently. You can't write your own safe collection class with an iterator and return mutable references. You'd need a proof of disjointness system to do that.

If you have a mutable binding to a vector `v` you can get a single mutable reference to one of the elements of that vector at a time. Because getting a single mutable reference to an element requires taking a mutable reference to the vector the disjointedness invariant is upheld. This is the core of the Rust borrow checker. If you're unsure about the semantics you should be reading about it, not trying to sus them out from a fairly surface level discussion on HackerNews.

This code:

  for x in &mut v {
     *x = // ...
  }

is perfectly valid. For every iteration of the loop only a single mutable reference to an element exists, and the reference is dropped before the next mutable reference is taken. This following piece of code is a full example that showcases how taking two mutable references is not allowed.

  struct ComplexNonCopyClone {
      a: u32,
  }

  fn main() {
      let mut v = vec![
          ComplexNonCopyClone { a: 1 },
          ComplexNonCopyClone { a: 2 },
          ComplexNonCopyClone { a: 3 },
      ];

      let mut_one = v.get_mut(0).unwrap();
      let mut_two = v.get_mut(0).unwrap();

      mut_two.a = 4;

      println!("{}", mut_one.a);
  }

Do note that using `mut_one` after `mut_two` is important, as otherwise the compiler will infer that `mut_one` can be dropped before `mut_two` is create, removing the clash.

> So how does Vec return mutable references to elements? Unsafe code, apparently.

The implementation can be found quite easily[0]. It's essentially invariants upheld by runtime checks and knowledge about the state of the given references/pointers. Notice the comments starting with "SAFETY". They explain the assumptions/conditions/reasons that make the code inside the `unsafe` block safe. If these assumptions can never be violated with malicious input or calling patterns then the code is actually safe and it can be a safe wrapper.

  [0]: https://doc.rust-lang.org/src/core/slice/index.rs.html#149-190

Animats · on May 25, 2021

We're talking about Vec's iterator, not simple slice access. That "SAFETY" stuff in slice.rs is just a subscript check. Vec's iterator is in [1]. The iterator which returns a mutable reference is unsafe code, because you can't express that concept in safe code.

To recap: the interesting issue is the thread pool example which spins off a new thread with a mutable reference to each element of an array. That's a rather unusual thing to do. It is only safe if each iteration returns a reference disjoint from all other references. Because Rust at the language level does not understand disjointness within an array, it needs a hack using "unsafe" to make this work. The borrow checker would not allow this in safe code.

chunks_mut uses the same trick, with more unsafe code.

My point in all this is that because the language doesn't have syntax for talking about disjointness of parts of an array, each time this comes up, another hack in unsafe code is required. But enough here; I may say more on the Rust forums.

[1] https://doc.rust-lang.org/src/core/slice/iter.rs.html

jzoch · on May 24, 2021

> They could not actually do this in Rust. They had to cheat.

That isn't true at all. The implementation of scoped_threadpool being in rust is all the evidence you need that they did not cheat. unsafe is a perfectly valid construct in rust that does not subvert its benefits. It just is a reversal of the defaults in other languages (unsafe by default - opt-in to safer constructs like smart_ptr).

People freak out too much over this stuff. If you are learning - feel free to use escape hatches from "idiomatic rust" but recognize that you aren't being idiomatic (and therefore may be handicapping your learning!). Its okay to use unsafe despite idiomatic rust trying to avoid it unless necessary (and even then providing safe wrappers over it). Its also okay to use Arc<Mutex> and the like if you need to just recognize there may be a better way.

steveklabnik · on May 24, 2021

unsafe is not cheating. It is part of the language for a reason, there to be used when appropriate. It is literally impossible to make a useful program without any unsafe code anywhere, because there aren’t Rust CPUs. By this definition, “Rust” does not exist.

thread::spawn uses unsafe code. sleep uses unsafe code. Is your own example not in Rust? Are you cheating?

(And yes, the spawn version isn’t optimally concurrent. The comment was long enough without getting into that. People struggle to get examples to compile before they worry about things like this. That would be next steps.)

Animats · on May 25, 2021

If you need "unsafe" in pure Rust code, not to talk to something external, the language has failed you at some point.

There are a relatively small number of trouble spots. They include, at least:

- No way to talk about partially initialized arrays in Rust, which leads to Vec needing "unsafe".

- Back references, which makes some kinds of trees, and of course doubly-linked lists, difficult.

Those I've mentioned before. From the discussion above, add:

- Problems around interior mutability within arrays, where you need some way to say, and prove, "a is disjoint from b", before working separately on a and b.

These are classic proof of correctness areas. It's fairly straightforward to state the conditions that have to be proven, and usually not that hard to prove them. The RustBelt crowd seems to be working on this.

Maybe they'll come up with a way to prove out conditions like that, short of grinding away by hand in Coq. Most of those problems are within reach of a fully automatic prover, like a SAT solver.

zozbot234 · on May 25, 2021

> These are classic proof of correctness areas. It's fairly straightforward to state the conditions that have to be proven, and usually not that hard to prove them.

You're right, but current Rust cannot express logical preconditions in code, much less compiler-checkable proofs. Many groups are working on this via a variety of approaches, but it will take some time before there's a common standard for expressing these things as part of the Rust language itself.

pjmlp · on May 25, 2021

From a language like Modula-3 or Ada point of view, unsafe should be used only for low level system programing stuff, like passing data into DMA buffers, OS APIs, Assembly and similar.

I agree with Animats here, to work around typesystem issues means that the type system still isn't expressive enough to define certain proprieties in a safe way.