> Native UTF-8 in memory makes character indexing a non-constant time operation ...

remexre · on March 22, 2022

UTF-32 isn't really a solution either, unless you consider a Unicode scalar value to be a character; I bet almost nobody wants U+0308 (COMBINING DIAERESIS) to be "a character"...
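
To make that concrete, here is a rough Java sketch (the class and method names are made up for illustration). It counts the same string two ways: by code point, which is what indexing into a UTF-32 array would give you, and by the boundaries that `java.text.BreakIterator` reports, which is only the JDK's approximation of a user-perceived character and not a full grapheme cluster implementation.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class ScalarVsGrapheme {
    // Number of Unicode code points, i.e. what a UTF-32 index would count.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of "characters" as segmented by the JDK's BreakIterator.
    // This approximates user-perceived characters; it is an assumption
    // that this is close enough for the point being made here.
    static int perceivedChars(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance(Locale.ROOT);
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a' followed by U+0308 COMBINING DIAERESIS, rendered as one glyph
        String decomposed = "a\u0308";
        System.out.println("code points:     " + codePoints(decomposed));
        System.out.println("perceived chars: " + perceivedChars(decomposed));
    }
}
```

So even with fixed-width code points, "give me the Nth character" still needs a segmentation pass if "character" means what a user sees.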

native_samples · on March 24, 2022

But in practice Java's definition of a character (a 16-bit UTF-16 code unit) basically always works, because characters outside the BMP are vanishingly rare in real software apart from emoji, and of course Java long predates emoji.
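
For anyone who hasn't run into it, a small sketch of where the `char` model and the code point model agree and where they split (class and method names are hypothetical, and U+1F600 is just an arbitrary non-BMP example):

```java
public class BmpVsSupplementary {
    // Length in Java chars, i.e. UTF-16 code units.
    static int codeUnits(String s) {
        return s.length();
    }

    // Length in Unicode code points.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // All BMP: one char per code point, so the two counts agree.
        String bmp = "na\u00EFve";
        // U+1F600 lies outside the BMP and is stored as a surrogate pair.
        String emoji = new String(Character.toChars(0x1F600));

        System.out.println(codeUnits(bmp) + " vs " + codePoints(bmp));
        System.out.println(codeUnits(emoji) + " vs " + codePoints(emoji));
        // charAt(0) on the emoji returns half of the pair, not a character.
        System.out.println(Character.isHighSurrogate(emoji.charAt(0)));
    }
}
```

For BMP-only text, `charAt` and `length` behave like character operations; the mismatch only shows up once a surrogate pair is present.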