That's true, but Strings in particular are generally considered to be thread-safe by virtue of being immutable (and the Javadocs themselves say this in many places). Concurrently modifying the input character array may seem like willful abuse in this case, but I suppose there might be some carelessly written code out there which does it, and the post shows how it would create some weird and very hard-to-debug behavior. And I can see the argument going both ways as far as the Javadoc: on the one hand, it only promises to return a String representing what is currently in the input character array, which warns against concurrent/subsequent modification at least obliquely. On the other hand, it does seem to commit to an up-front defensive copy of the input character array, which doesn't quite take place, and without which things go a bit haywire if the encoding of the string changes.
I don't think the correct answer is to sacrifice any performance to guard against this weird edge case, though.
As the article points out, the only thing the Javadoc guarantees is that subsequent modifications of the character array have no effect. It says nothing about concurrent modifications.
The type whose thread safety is in question here is not actually String, but char[]. I'm not going to say it is always wrong to share a char[] between threads (as a primitive array of something other than double or long, the Java Spec does make some guarantees about char[]), but it is almost always wrong to be sharing a char[] between threads.
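For what it's worth, here's a minimal sketch of the kind of abuse under discussion (assuming a JDK with compact strings; the code is mine, not from the article): one thread keeps flipping a shared char[] element between a Latin-1 and a non-Latin-1 value while another keeps constructing Strings from it. Because the constructor may read the shared array more than once (once to attempt the Latin-1 encoding, again if it falls back to UTF-16), you can occasionally end up with a String whose internal encoding no longer matches its contents:
char[] shared = "hello".toCharArray();
Thread writer = new Thread(() -> {
    while (true) {
        shared[0] = 'h';       // fits in Latin-1
        shared[0] = '\u0100';  // does not fit in Latin-1
    }
});
writer.setDaemon(true);
writer.start();

for (int i = 0; i < 10_000_000; i++) {
    String s = new String(shared);
    // Rebuilding s from its own (now private) chars should always yield an
    // equal String; if the race corrupted s's encoding, it won't.
    if (!s.equals(new String(s.toCharArray()))) {
        System.out.println("haywire string after " + i + " iterations");
        break;
    }
}
Being a race, it may take a while to trigger, or not trigger at all in a given run.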
If you want a language that protects you from this type of mistake, you simply need something more advanced than Java.
This isn't even the biggest hole in Java's safety. Java has a type system, which is at least a nominal claim to type safety. Yet, you can do things like:
Integer[] foo = new Integer[1];
Object[] bar = foo;
bar[0] = new Object();
And the compiler will let you (although the runtime won't).
It's even worse with generics:
List<Integer> foo = new ArrayList<>();
List bar = foo;                   // raw type; only an unchecked warning
bar.add(new Object());            // heap pollution, but no exception here
System.out.println(foo.get(0));   // prints the Object just fine
Integer x = foo.get(0);           // ClassCastException from the implicit cast
This will compile, and won't even throw an exception until the very last line, past the point where you have already retrieved and used a non-Integer from a List<Integer>.
In fairness to Java on the last one, it is at least a warning, and a deliberate compromise to support backwards compatibility when they introduced generics.
While you are technically right in all your points, I think you exaggerate the problems regarding Java’s type system. Arrays are indeed a pain point from a different time (the mistake of their covariance even “infected” C#), but generics are not problematic at all. In fact, code that compiles without warning is guaranteed to never get a ClassCastException.
Every language’s type system gives you escape hatches, so that last example is similar to how one might implement an optimization library in Rust where you have uninitialized elements. Hell, there you don’t even get runtime errors if you get it wrong; it will just segfault. So I really disagree with this notion of “it’s even worse with generics”.
> how one might implement an optimization library in Rust where you have uninitialized elements
The correct way to do this is to use the MaybeUninit<T> type. Then you're responsible for correctly initializing a T before you call MaybeUninit<T>::assume_init() to get a T.
And that MaybeUninit type uses an unsafe call under the hood as an escape hatch from the "draconian" compiler's rules. Every practical language has these.
(nonetheless, thanks for mentioning the MaybeUninit, that was what I was thinking of, but didn't remember the name -- I haven't programmed in Rust for a long time)
MaybeUninit<T>::assume_init() is the unsafe call. The promise that this is actually a T and not just a T-sized blob of uninitialized memory is made by that unsafe call, and since it's unsafe it's your responsibility as the programmer to obey the constraint, that this is, in fact, an initialized T.
Depending on how long "a long time" is, you might well not actually have been thinking of MaybeUninit. The prior solution (until mid-2019) was as you describe, and it was in fact unsound.
The point I was trying to convey (badly) with this example is that in both the assign-to-raw-generic and assume_init cases, the developer overrides the compiler's knowledge with his/her own, demonstrating that most languages have similar escape hatches, because they are sometimes simply needed.
Yes, what you've described is usually unsound†, which is why it's deprecated (but it's offered in the standard library, and Rust policy is not to remove it, since there are at least in theory sound ways it could have been used and it would be a shame to break them).
MaybeUninit<T> doesn't override the compiler's knowledge. It's using one very, very clever trick. Take a look at how MaybeUninit is defined. It's a union! The compiler can see this might be a T, but it can't ever see whether it's actually a T right now. As a result all the mis-optimisations which occur with the unsound approaches can't happen. MaybeUninit<T>::assume_init() is just a union read, which, sure enough, is an unsafe operation in Rust (writing to a union is safe, but reading from one is not) and that's why the function is unsafe.
† This is technically sound for Zero Size Types, because then we're basically saying e.g. "You see nothing? Well, I promise it's actually no Spoons, like, an array of 0 Spoons". So that can't cause the compiler to emit code that we didn't expect - the Rust compiler isn't going to emit code to read or write no Spoons, whether or not they "exist" according to type analysis.
I don't think this is a fair attack on Java's type system. In both examples you do what is effectively a cast to a void pointer, escaping the type system's guarantees.
The thread-safety guarantees can only be obeyed if the arguments are thread-safe as well.
For example, take a java.util.concurrent.ConcurrentHashMap, which is thread-safe. If your value object is mutated concurrently, and is not thread-safe, you can't expect anything sensible to happen if you call containsValue().
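For instance (a contrived sketch; Box is a made-up mutable value type, not something from the comment above):
class Box {
    int n;
    Box(int n) { this.n = n; }
    @Override public boolean equals(Object o) { return o instanceof Box && ((Box) o).n == n; }
    @Override public int hashCode() { return n; }
}

ConcurrentHashMap<String, Box> map = new ConcurrentHashMap<>();
Box stored = new Box(1);
map.put("k", stored);
// If another thread sets stored.n = 2 while this call is scanning the map,
// containsValue(new Box(1)) can return false even though an "equal" value was
// just put in, and containsValue(new Box(2)) can return true.
boolean found = map.containsValue(new Box(1));
The map does its part correctly; containsValue() can only compare against whatever the value's equals() happens to see at that instant.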
Likewise, String's char[] constructor can only be thread safe if there are no concurrent changes to the argument.
> The issue is that the ctor checks for a property (all chars are under 256) then assumes that holds indefinitely.
You could say that this is a classical TOCTOU bug as the ctor assumes that a mutable argument does not mutate.
The core questions here are whether this assumption is in the contract, and whether it should be.
Protection against these kinds of bugs is only possible with deep copies of mutable arguments or COW semantics. So you either implement the callee defensively and slow down the general case (remember, this is the stdlib), or you leave concurrency control to the caller. Probably every sane general-purpose language does the latter.
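The defensive variant would look something like this (a sketch, not the JDK's actual code): snapshot the argument once, up front, and only ever work on the snapshot. It closes the race, but costs an extra copy on every call:
static String defensiveString(char[] value) {
    char[] snapshot = java.util.Arrays.copyOf(value, value.length); // the defensive copy
    return new String(snapshot); // snapshot is private to us, so nothing can race with it
}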
> You could say that this is a classical TOCTOU bug as the ctor assumes that a mutable argument does not mutate.
Indeed.
> Protection against these kinds of bugs is only possible with deep copies of mutable arguments or COW semantics.
In the general case yes, but in this case no: you can solve the issue and improve performance by implementing the thing correctly, in a single pass over the input. That way you don’t get a split-brain situation.
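Something along these lines (a sketch of the idea, not the JDK's actual code; the method name is mine) reads each input char exactly once, so whatever interleaving a concurrent writer produces, the result is at least internally consistent:
static String buildSinglePass(char[] input) {
    byte[] latin1 = new byte[input.length];
    for (int i = 0; i < input.length; i++) {
        char c = input[i];                 // the only read of input[i]
        if (c <= 0xFF) {
            latin1[i] = (byte) c;
            continue;
        }
        // Found a char that doesn't fit in Latin-1: inflate the prefix from
        // our own buffer (not from the shared input) and finish as UTF-16.
        char[] utf16 = new char[input.length];
        for (int j = 0; j < i; j++) utf16[j] = (char) (latin1[j] & 0xFF);
        utf16[i] = c;
        for (int j = i + 1; j < input.length; j++) utf16[j] = input[j];  // still exactly one read each
        return new String(utf16);
    }
    return new String(latin1, java.nio.charset.StandardCharsets.ISO_8859_1);
}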