This is an internal representation. JS strings did, and continue to, behave as sequences of 16-bit integers.
This change takes advantage of the fact that most JS strings fit into an 8-bit charspace, so for those that do, it uses a more compact representation internally.
This optimization is simply: if we have a string and we know that all of the uint16_ts in the string are <= 255, then just store it as a sequence of uint8_ts.
1, 2. Grapheme clusters are not normative in Unicode; they can be tailored for specific languages. There's a default cluster-finding algorithm, but it's not suitable in all cases. There's no "one size fits all" approach.
3. Forward and backward are likewise language- and tailoring-dependent, because they depend on graphemes. There may also be application-specific tailoring, such as the handling of combining marks. In some scripts, "forward" and "backward" are not clearly defined.
Be careful what you wish for. Unicode strings are fucking complex. UTF-8 doubly so.
For example, which of the four Unicode normalization forms interests you most? Or do you need grapheme clusters? Or code points? Or byte values?
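A small sketch of how far those answers can diverge for the same string. `measure` is a hypothetical helper, and the grapheme count assumes a runtime that ships `Intl.Segmenter` (recent Node.js and browsers) and uses its default, untailored segmentation for the `"en"` locale.

```javascript
// Hypothetical helper: report the "length" of a string in four different units.
function measure(str) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  return {
    codeUnits: str.length,                           // UTF-16 code units
    codePoints: [...str].length,                     // Unicode code points
    utf8Bytes: new TextEncoder().encode(str).length, // bytes once encoded as UTF-8
    graphemes: [...segmenter.segment(str)].length,   // default grapheme clusters
  };
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT: one visible character.
const decomposed = "e\u0301";
console.log(measure(decomposed));
// { codeUnits: 2, codePoints: 2, utf8Bytes: 3, graphemes: 1 }

// A code point outside the Basic Multilingual Plane.
console.log(measure("\u{1F44D}"));
// { codeUnits: 2, codePoints: 1, utf8Bytes: 4, graphemes: 1 }

// The four normalization forms can also change the answer.
for (const form of ["NFC", "NFD", "NFKC", "NFKD"]) {
  console.log(form, decomposed.normalize(form).length);
}
// NFC 1, NFD 2, NFKC 1, NFKD 2
```

None of these numbers is "the" length; which one you want depends on whether you are allocating storage, indexing, or moving a cursor.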