I didn't think it was a bug either till I got to the "Spooky action at a distanc...

cempaka · on July 12, 2023

> After a defensive copy of the byte-array is made, just before returning the newly constructed string object, verify once again that the chosen coder matches the string contents.

Allocating new String objects might very well be the most frequent memory allocation operation in the JVM. IMO it would be a mistake to do anything to slow this down to protect against this weird instantiation behavior, unless it implies a security vulnerability which can't be fixed by guards at the interning step.

whack · on July 12, 2023

The language maintainers have already decided to slow down string-construction by calling StringUTF16.compress(value) which checks if the input can be converted into LATIN1 and if so, creating a LATIN1 byte-array representation from the ground up.

Compared to this, I wager that the incremental cost is small to verify that the coder matches the content.

Also, depending on the implementation of string-interning and StringUTF16.compress, it's possible that the verification step is only needed for non-LATIN1 strings. Which according to the JEP is only a small minority of strings seen.

If there is a cheaper solution, I'm certainly all for it. I'm just spitballing ideas here. But I don't think it is acceptable to allow "hello world".startsWith("hello") to return false.

masklinn · on July 12, 2023

I don’t think you need any defensive copy, just to have compress switch to utf16 mode when it encounters a non-latin1 char and go on from there.

It would likely be faster since currently the implementation restarts the entire conversion from 0 in utf16 mode.

Genbox · on July 12, 2023

Making the 99% case faster at the cost of making the 1% case slower, is a speedup.

marginalia_nu · on July 12, 2023

A defensive copy probably doesn't help here. The JVM is permitted to optimize it away, since it has no externally observable effects and doesn't have a constructor.

orra · on July 12, 2023

Wait, really? It's very externally observable in the presence of multithreading. Surely there must be a way to say "no, really, please make a copy".

dtech · on July 12, 2023

afaik the person you replied to is wrong, using System.arrayCopy would definitely make a copy and not be optimized away.

Maybe poster meant only copying the reference with `char[] copy = original` which indeed does nothing, but that's not a defensive copy.

marginalia_nu · on July 12, 2023

This copy has no external observable effects (it's only temporarily used in the constructor and never stored anywhere), and since the JVM is (under the JMM) permitted work under the assumption the original array doesn't change unless the thread itself changes it or there is some happens-before relationship with something that does, and is permitted to re-order same-thread execution as-if-serial, it's fully permitted to assume the original array is identical to the copied array, and it may elide the copy, since the original array isn't stored or mutated in any way.

Whether the JVM actually does this for arraycopy is an implementation detail. The JVM is permitted to perform this type of escape analysis and elision, and definitely does for some allocations.

In general arrays and concurrency doesn't mix very well in Java, and is something I'd avoid mixing on principle. You avoid most of these types of problems by having threads talk by passing around immutable objects via concurrency primitives, volatile references, synchronized-blocks, etc.

guipsp · on July 14, 2023

You are incorrect - it is allowed to behave as if the original didn't or did change, that is true and in the memory model. But once the copy is done, then the state is settled, and is no longed allowed to change at any point. There are no more conflicting accesses.

It is "fully permitted to assume the original array is identical to the copied array" only to the extent that affects reading from the original array, not the copy (ie, it can pretend that the original didn't change, even it did).

Anything else would break causality requirements (JLS§17.4.8.).

orra · on July 12, 2023

> In general arrays and concurrency doesn't mix very well in Java, and is something I'd avoid mixing on principle.

Sound sensible. I think what you've said is any semi-malicious code can give you a primitive array, then modify it from under you in another thread. That's not completely surprising but the clincher is, you can't even take a defensive copy.

marginalia_nu · on July 12, 2023

Well I mean there are things you can do prevent the JVM from this particular optimization. But you need to use the right tools to tell Java to expect spooky action at a distance, and arraycopy just isn't sufficient.

This would probably do

  byte[] b;

  // A
  synchronized (a) { 
    b = System.arraycopy(a);
  }
  // B

Here A happens-before B, and you can assume a read fence at the start of synchronized block and a store fence at end of it, so consumers of b should get a consistent view of the array, and the synchronized block tells Java that you expect a to be modified by another thread so it can't assume it stays the same throughout the execution; and elision of the arraycopy isn't possible; so while the array may be in a weird state, it will at least stay consistent throughout the execution.