Another way of fixing it would be, whichever character caused 'StringUTF16.compr...

marvy · on July 12, 2023

Clever! I like this! I feel a bit jealous that I didn't think of it :)

It's not quite as easy as it seems, because these methods are "intrinsics". That is, the pure Java code you see is not always the one that gets used; instead, the JIT compiler can use a faster implementation that uses vectorized assembly or whatever. (That's why you see "@IntrinsicCandidate" on compress() and toBytes() in StringUTF16.java)

But I think your idea would also be possible to implement in vectorized assembly, so it still works!

The rule for most things (such as ArrayList) in the JDK is: "if you use race conditions to break this thing, that's your problem, not ours". But String is different: it's one of the few things meant to be "rock-solid, can't break it even if you try", so I think this bug in String would qualify as a potential security issue in their mind: there are many places in Java that trust Strings not to act weird, and some of them are even in native code deep in the guts of the VM.

On the other hand, String is used all over the place so having to introduce a performance regression to fix this bug would be quite painful. All of the other proposed solutions in this thread introduce an extra copy, and an extra pass over the string. Your fix is basically no extra cost, and as a bonus, can be tweaked so each char in the array is read only once.

Which means that the bug can now be fixed "guilt-free", if anyone from the JDK team is reading this thread. Though they have some pretty clever people there too, so they might have thought of it eventually for all I know.