I didn't think it was a bug either till I got to the "Spooky action at a distance" section. The fact that the following code:
"hello world".startsWith("hello")
can return false in any circumstance, is a bug in the language. The fact that some other code in an entirely unrelated part of the codebase can intern a broken string and thus break string-comparisons for the entire codebase, is mind boggling.
Fortunately, I don't think the fix needs to be too expensive. After a defensive copy of the byte-array is made, just before returning the newly constructed string object, verify once again that the chosen coder matches the string contents. If it doesn't, that would imply a race condition, and the broken string instance should not be returned
> After a defensive copy of the byte-array is made, just before returning the newly constructed string object, verify once again that the chosen coder matches the string contents.
Allocating new String objects might very well be the most frequent memory allocation operation in the JVM. IMO it would be a mistake to do anything to slow this down to protect against this weird instantiation behavior, unless it implies a security vulnerability which can't be fixed by guards at the interning step.
The language maintainers have already decided to slow down string-construction by calling StringUTF16.compress(value) which checks if the input can be converted into LATIN1 and if so, creating a LATIN1 byte-array representation from the ground up.
Compared to this, I wager that the incremental cost is small to verify that the coder matches the content.
Also, depending on the implementation of string-interning and StringUTF16.compress, it's possible that the verification step is only needed for non-LATIN1 strings. Which according to the JEP is only a small minority of strings seen.
If there is a cheaper solution, I'm certainly all for it. I'm just spitballing ideas here. But I don't think it is acceptable to allow "hello world".startsWith("hello") to return false.
A defensive copy probably doesn't help here. The JVM is permitted to optimize it away, since it has no externally observable effects and doesn't have a constructor.
This copy has no external observable effects (it's only temporarily used in the constructor and never stored anywhere), and since the JVM is (under the JMM) permitted work under the assumption the original array doesn't change unless the thread itself changes it or there is some happens-before relationship with something that does, and is permitted to re-order same-thread execution as-if-serial, it's fully permitted to assume the original array is identical to the copied array, and it may elide the copy, since the original array isn't stored or mutated in any way.
Whether the JVM actually does this for arraycopy is an implementation detail. The JVM is permitted to perform this type of escape analysis and elision, and definitely does for some allocations.
In general arrays and concurrency doesn't mix very well in Java, and is something I'd avoid mixing on principle. You avoid most of these types of problems by having threads talk by passing around immutable objects via concurrency primitives, volatile references, synchronized-blocks, etc.
You are incorrect - it is allowed to behave as if the original didn't or did change, that is true and in the memory model. But once the copy is done, then the state is settled, and is no longed allowed to change at any point. There are no more conflicting accesses.
It is "fully permitted to assume the original array is identical to the copied array" only to the extent that affects reading from the original array, not the copy (ie, it can pretend that the original didn't change, even it did).
Anything else would break causality requirements (JLS§17.4.8.).
> In general arrays and concurrency doesn't mix very well in Java, and is something I'd avoid mixing on principle.
Sound sensible. I think what you've said is any semi-malicious code can give you a primitive array, then modify it from under you in another thread. That's not completely surprising but the clincher is, you can't even take a defensive copy.
Well I mean there are things you can do prevent the JVM from this particular optimization. But you need to use the right tools to tell Java to expect spooky action at a distance, and arraycopy just isn't sufficient.
This would probably do
byte[] b;
// A
synchronized (a) {
b = System.arraycopy(a);
}
// B
Here A happens-before B, and you can assume a read fence at the start of synchronized block and a store fence at end of it, so consumers of b should get a consistent view of the array, and the synchronized block tells Java that you expect a to be modified by another thread so it can't assume it stays the same throughout the execution; and elision of the arraycopy isn't possible; so while the array may be in a weird state, it will at least stay consistent throughout the execution.
"hello world".startsWith("hello")
can return false in any circumstance, is a bug in the language. The fact that some other code in an entirely unrelated part of the codebase can intern a broken string and thus break string-comparisons for the entire codebase, is mind boggling.
Fortunately, I don't think the fix needs to be too expensive. After a defensive copy of the byte-array is made, just before returning the newly constructed string object, verify once again that the chosen coder matches the string contents. If it doesn't, that would imply a race condition, and the broken string instance should not be returned