Mutexes etc ... exist in Java.

josephcsible · on July 12, 2023

Yes, but Java will happily accept code that doesn't use them where it needs to, leading to bugs like this one. Rust catches that mistake at compile time instead.

pjmlp · on July 12, 2023

Not if the memory has been allocated on a shared memory segment; Rust has no control over what other processes might do.

rocqua · on July 12, 2023

Sound Rust code would either make functions touching the shared memory marked unsafe, or would do a defensive copy out of shared memory.
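A minimal sketch of the "defensive copy" option, assuming the shared region can be viewed as a slice of `AtomicU8`. The function name `snapshot_utf8` is hypothetical, and a plain `Vec` stands in for a real shared memory mapping:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Copy bytes out of a region that other code may modify at any time,
/// then validate the private copy. `region` is a hypothetical stand-in
/// for a real shared memory mapping.
fn snapshot_utf8(region: &[AtomicU8]) -> Option<String> {
    // Each byte is read exactly once, so later writes cannot affect `copy`.
    let copy: Vec<u8> = region.iter().map(|b| b.load(Ordering::Relaxed)).collect();
    // Validation runs on the private copy, not on the shared region.
    String::from_utf8(copy).ok()
}

fn main() {
    let region: Vec<AtomicU8> = b"hello".iter().map(|&b| AtomicU8::new(b)).collect();
    let s = snapshot_utf8(&region).expect("valid UTF-8");
    // Simulate another writer scribbling over the region afterwards.
    region[0].store(0xFF, Ordering::Relaxed);
    // The string taken earlier is unaffected by the later write.
    assert_eq!(s, "hello");
    // A fresh snapshot sees the invalid byte and is rejected.
    assert!(snapshot_utf8(&region).is_none());
}
```

The copy may still capture a half-written update from another writer, but whatever it captures is validated once and cannot change afterwards.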

pjmlp · on July 12, 2023

That safe layer around unsafe still has no way to validate the consistency of the data.

Dylan16807 · on July 12, 2023

It can't proactively validate the data while it's in the shared memory.

If you do your validation during accesses it's fine. If you copy the data out of the shared memory it's fine.

Or you could use a mutex to protect the data between validation and use.

If you're worried about another process editing the memory without taking the mutex, that's equivalent to worrying about other unsafe code editing the memory without taking the mutex. The solution is the same in both places: don't share memory with completely arbitrary code. When people compare languages and techniques, they (rightfully) assume you're not doing that.

jjnoakes · on July 12, 2023

Right, but in Rust, not using one is a compile-time error. In Java (as you can see from the article), not using one is a silent bug at runtime.

pjmlp · on July 12, 2023

Only for in-memory data structures under Rust's control; if OS IPC is involved, Rust cannot do anything.

jjnoakes · on July 12, 2023

This isn't true.

kaba0 · on July 12, 2023

This is a heavily optimized system library - you don’t use mutexes here. Rust wouldn’t help here: if mutexes were acceptable, they would have been used. Especially since this behavior results from C++ and Java code working together.

Hell, it’s probably one area where rust’s benefits are a “hard sell” — you would have to constantly be in unsafe rust manipulating pointers manually as the compiler can’t reason statically about what a layer built on top does without a huge runtime cost (huge, as in you really don’t want to lock/unlock, or even refcount in these hot paths).

Arnavion · on July 12, 2023

No idea why Thaxll and the other comments are mentioning mutexes.

The equivalent (*) API to this Java API in Rust does exist; it's `String::from_utf8(Vec<u8>) -> String`. And the bug in TFA does not exist there. Since the signature consumes the `Vec<u8>` it's impossible for the caller or any other code to still have access to it to be able to modify it concurrently.
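A minimal sketch of the consuming case, using only the standard library. The function name `build_greeting` is made up for illustration:

```rust
fn build_greeting() -> String {
    let mut bytes: Vec<u8> = Vec::new();
    bytes.extend_from_slice(b"hi");
    // `bytes` is moved into the call; the caller gives up ownership here.
    let s = String::from_utf8(bytes).expect("valid UTF-8");
    // Uncommenting the next line is rejected by the compiler, because
    // `bytes` was moved and is no longer accessible:
    // bytes[0] = b'H';
    s
}

fn main() {
    println!("{}", build_greeting());
}
```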

Also consider the similar API `str::from_utf8(&[u8]) -> &str`. The bug in TFA does not exist here either. Since the signature takes a `&` borrow of the slice, it is not possible for anything else in the program to have a `&mut` borrow of that slice to be able to modify it concurrently. After the function returns, other parts of the program could mutate the slice, but they would only be able to do so after the `&str` that is derived from the slice is dropped. So once again nothing would be able to mutate the slice and observe the effects in the `&str` itself.
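A short sketch of the borrowing case. The function name `view_then_mutate` is hypothetical; in current Rust the borrow ends at the last use of the `&str`, not necessarily at the end of its scope:

```rust
use std::str;

fn view_then_mutate() -> (String, String) {
    let mut buf: Vec<u8> = b"hello".to_vec();
    // `s` borrows `buf` immutably for as long as `s` is in use.
    let s: &str = str::from_utf8(&buf).expect("valid UTF-8");
    // Uncommenting the next line fails to compile, because `buf` cannot be
    // borrowed mutably while the shared borrow behind `s` is still live:
    // buf[0] = 0xFF;
    let before = s.to_owned(); // last use of `s`; the shared borrow ends here
    // Mutation is allowed again now that nothing derived from `buf` is in use.
    buf[0] = b'j';
    let after = str::from_utf8(&buf).expect("still valid UTF-8").to_owned();
    (before, after)
}

fn main() {
    let (before, after) = view_then_mutate();
    println!("{before} -> {after}");
}
```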

All these "unable to do" are enforced at compile-time, because "consuming a value makes it unavailable to other parts of the program" and "cannot get a `&mut` to a value as long as a `&` from that value is still in scope" are all typesystem concepts. No mutexes or other runtime checks are involved.

(*) "Equivalent" in that it's an API to convert a sequence of bytes into a string. The Rust API doesn't have the encoding thing of the Java API because the Rust String / str are required to be utf-8 internally. But if an exact equivalent of the Java API did exist in Rust, the signature would still be the same wrt consuming `Vec<u8>` / borrowing `[u8]`, so it doesn't change the overall point re: concurrent modification. Furthermore, concurrent modification would cause problems even with Rust Strings if it was possible, because it would allow a String / str to become invalid utf-8 after they'd already been checked to be valid utf-8, which Rust considers to be UB.

Someone · on July 12, 2023

> No idea why Thaxll and the other comments are mentioning mutexes.

Thaxll mentioned mutexes in a reply to the statement

Java has no way to express the concept of "something that nothing else can modify while I'm looking at it"

Even ignoring the performance aspect that is not the perfect answer, though. AFAIK, the JVM doesn’t have a notion of “you can only modify foo if you hold mutex bar”. That remains something the programmer must enforce.

On the other hand, tooling exists to help them, for example https://www.javadoc.io/doc/com.google.code.findbugs/annotati...

kaba0 · on July 12, 2023

The scenario I was imagining and commenting on was about “implementing a JVM with Java’s semantics in Rust”. Of course if we limit the language itself to safe Rust, we get data race freedom, but at a quite significant price for a high level language (it rules out a lot of possibly correct programs). But Rust would not help with regard to the primitives here at all (implemented in C++/Java).

TheDong · on July 12, 2023

"Rust wouldn't help"

"This bug can't be implemented in rust"

"I meant that Rust doesn't fix the bug in Java. Even if you write rust code, you can also write buggy java code too so rust didn't fix the java code"

You're the only one here who thought "rust" meant "java semantics implemented in rust" in this context.

kaba0 · on July 12, 2023

Because in the case of the problem at hand, this is a complex interplay between the Java code in Java's standard library and the underlying JVM. There is not much to discuss regarding "rust would make the code safe", because so does JS, as it is single threaded. That's hardly interesting.

If we put Java on top of Rust, then no, Rust no longer can help about this. That was my whole point.

TheDong · on July 12, 2023

> That's hardly interesting

Rust and javascript having differences which prevent this class of bugs might not be very interesting, but it's more interesting than your point.

Unless I'm misunderstanding, your point is that a bug in Java cannot be avoided by switching languages to Java.

kaba0 · on July 12, 2023

No, my point is that changing the implementation language of java wouldn't have helped here.

invalidname · on July 12, 2023

The problem here is that we don't want a mutex. Once you have it, the performance cost would apply at runtime. In fact, to write this code in Rust you would need to write unsafe code to get around the problem where Rust forces you to write correct but inefficient code.

This code is intentionally not thread-safe. This isn't so much a bug but an interesting thought experiment.

Sharlin · on July 12, 2023

Rust absolutely helps here because in Rust it’s simply impossible for someone else to mutate something concurrently to you holding a reference to it. Code equivalent to that in the article simply won’t compile in Rust. This is, like, the very point of Rust’s borrow system. You can share, xor you can mutate, but not both at the same time. This holds equally for single and multi-threaded code.
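A rough sketch of "share xor mutate" across threads, using `std::thread::scope` from the standard library. The function name `read_then_push` is hypothetical:

```rust
use std::thread;

fn read_then_push(mut data: Vec<u8>) -> (u32, usize, Vec<u8>) {
    let (sum, len) = thread::scope(|s| {
        // Two reader threads share `data` immutably at the same time.
        let a = s.spawn(|| data.iter().map(|&b| u32::from(b)).sum::<u32>());
        let b = s.spawn(|| data.len());
        // A writer alongside the readers is rejected at compile time:
        // s.spawn(|| data.push(0));
        (a.join().unwrap(), b.join().unwrap())
    });
    // The scope has ended, so no shared borrows remain and mutation is fine.
    data.push(9);
    (sum, len, data)
}

fn main() {
    let (sum, len, data) = read_then_push(vec![1, 2, 3]);
    println!("sum={sum} len={len} data={data:?}");
}
```

No lock or other runtime check is involved; the commented-out writer is refused by the borrow checker.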

ironmagma · on July 12, 2023

In safe Rust, that is. For unsafe Rust, I don't know exactly which bets are off but it's more than none.

masklinn · on July 12, 2023

In unsafe Rust this is a concurrent modification of an object with shared references, which is UB.

SpaghettiCthulu · on July 12, 2023

Unless everyone is just holding pointers

lostmsu · on July 12, 2023

Huh? This is exactly where Rust would help. In Rust the caller of the constructor would either have to add mutex if they needed concurrency, or just use the constructor without mutex overhead if they did not.

ironmagma · on July 12, 2023

It's a compile-time cost instead of a runtime cost.

gizmo686 · on July 12, 2023

99% of the time, the calling code trivially owns the array. If you are in a situation where the compiler cannot figure that out, then you need to deal with it regardless of what String does, because the exact same problem exists by the caller itself having a reference to the object.

j16sdiz · on July 12, 2023

The Java compiler has a (not too bad) escape analysis engine. For something as low level as String intern/optimisation, it can be done at compile time.

e-dant · on July 12, 2023

What about Rust’s borrow checker (affine types) enforces the use of mutexes (or other sync prims) here?

ekimekim · on July 12, 2023

As sibling comments point out, mutexes aren't needed here. But to answer your direct question, Rust's type system enforces the use of mutexes to access protected values (if you're using the stdlib Mutex implementation) by only allowing access to protected values through a MutexGuard object which is created by locking the mutex. The borrow checker enforces you can't access the MutexGuard concurrently, so therefore you can't access the protected value concurrently.
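A small sketch of that guard-based access with the standard library `Mutex`. The function name `append_from_threads` is made up for illustration:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn append_from_threads(n: u8) -> Vec<u8> {
    // The Vec lives inside the Mutex; there is no other path to it.
    let shared = Arc::new(Mutex::new(Vec::<u8>::new()));
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                // `lock()` returns a MutexGuard; the Vec is only reachable through it.
                let mut guard = shared.lock().unwrap();
                guard.push(i);
                // The guard is dropped at the end of the closure, unlocking the mutex.
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Threads may have pushed in any order, so sort before returning.
    let mut out = shared.lock().unwrap().clone();
    out.sort();
    out
}

fn main() {
    println!("{:?}", append_from_threads(4));
}
```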

chowells · on July 12, 2023

Why would it need to? Rust's borrow checker makes it a compile-time error to share a mutable array between threads. No need for run time synchronization.

e-dant · on July 12, 2023

Right, that’s my understanding. But OP and a sibling thread here seem pretty sure about the mutex thing.

I think there’s some nuance, but not in the general case.

Shared memory, lazy statics in async blocks, and asynchronous constructors might have different initialization order mechanics that would require synchronization — but even then, the borrow checker would at least point it out

Sharlin · on July 12, 2023

Without a mutex, you can’t even write code equivalent to that in the article because you can’t mutably share, as you pointed out. With a mutex you could – and the mutex would prevent data races (but not race conditions in general) – but indeed mutexes are a red herring here (at least in the specific sense of a runtime synchronization primitive).

In Java you can’t synchronize defensively because synchronization requires that everybody who has access to the shared resource cooperates with you. And even if you could, you wouldn’t want to, not in this sort of a case.

In Rust mutexes own the data they protect, and make it impossible for anyone to access the data without locking the mutex first, but again, an API like this would clearly not bother with dealing with mutexes but rather take a normal compile-time-checked borrow.

kaba0 · on July 12, 2023

Unfortunately I can no longer edit my comment, please have a look at my reply: https://news.ycombinator.com/item?id=36690710