This is an internal representation. JS strings do and continue to behave as sequ...

angersock · on July 21, 2014

ES6.1 wishlist: UTF8 strings, full stop.

pjscott · on July 22, 2014

I want Unicode strings that support

1. Opaque cursors pointing somewhere in the conceptual sequence of code points, with constant-time dereferencing,

2. Ranges, defined by starting and ending cursors, and

3. The ability to move cursors forward or backward by either code points or composed grapheme clusters.

This would be a saner interface than any other I've seen, and it puts very few constraints on the underlying encoding.
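To make the idea concrete, here is a minimal sketch of what such an interface might look like, assuming a UTF-8 backing store. The names `UStr`, `Cursor`, `next`, `prev`, `deref`, and `slice` are all hypothetical, and only code point movement is covered; grapheme cluster movement would need a segmentation algorithm on top of this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cursor:
    """Opaque position in a UStr (hypothetical); callers never interpret it."""
    _offset: int


class UStr:
    """Hypothetical string type that happens to store UTF-8 internally."""

    def __init__(self, text: str) -> None:
        self._buf = text.encode("utf-8")

    def begin(self) -> Cursor:
        return Cursor(0)

    def end(self) -> Cursor:
        return Cursor(len(self._buf))

    def next(self, c: Cursor) -> Cursor:
        # Step over the lead byte, then over continuation bytes (0b10xxxxxx).
        i = c._offset + 1
        while i < len(self._buf) and (self._buf[i] & 0xC0) == 0x80:
            i += 1
        return Cursor(min(i, len(self._buf)))

    def prev(self, c: Cursor) -> Cursor:
        # Walk back until we land on a byte that is not a continuation byte.
        i = c._offset - 1
        while i > 0 and (self._buf[i] & 0xC0) == 0x80:
            i -= 1
        return Cursor(max(i, 0))

    def deref(self, c: Cursor) -> str:
        # Bounded work: a UTF-8 code point is at most 4 bytes long.
        stop = self.next(c)._offset
        return self._buf[c._offset:stop].decode("utf-8")

    def slice(self, start: Cursor, stop: Cursor) -> str:
        # A range is just a pair of cursors.
        return self._buf[start._offset:stop._offset].decode("utf-8")
```

Nothing in the public surface exposes the byte offset, so the same interface could presumably sit on top of UTF-16 or UTF-32 storage with different `next` and `prev` bodies.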

jahewson · on July 22, 2014

1, 2. Grapheme clusters are not normative in Unicode; they can be tailored for specific languages. There's a default cluster-finding algorithm, but it's not suitable in all cases. There's no "one size fits all" approach.

3. Forward and backward are likewise language- and tailoring-dependent because they depend on graphemes. There may also be application-specific tailoring, such as the handling of combining marks, and in some scripts "forward" and "backward" are not clearly defined.

angersock · on July 22, 2014

That's great stuff...that should be done after standardizing on UTF8.

Ygg2 · on July 21, 2014

Be careful what you wish for. Unicode strings are fucking complex. UTF8 doubly so.

For example, which of the four Unicode normalization forms interests you most? Or do you need grapheme clusters? Or code points? Or byte values?
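As a rough illustration of how those answers diverge, the sketch below runs one short string through the four normalization forms using Python's standard `unicodedata` module and reports the code point count and UTF-8 byte count for each. The `measure` helper and the sample string are made up for the example.

```python
import unicodedata

# U+FB01 LATIN SMALL LIGATURE FI, then 'e' plus U+0301 COMBINING ACUTE ACCENT.
SAMPLE = "\ufb01e\u0301"


def measure(text: str, form: str) -> tuple[int, int]:
    """Return (code points, UTF-8 bytes) of text after normalizing to form."""
    normalized = unicodedata.normalize(form, text)
    return len(normalized), len(normalized.encode("utf-8"))


for form in ("NFC", "NFD", "NFKC", "NFKD"):
    code_points, utf8_bytes = measure(SAMPLE, form)
    print(f"{form}: {code_points} code points, {utf8_bytes} UTF-8 bytes")
```

All four forms give a different pair of numbers for this input, so "the length of the string" is not well defined until you pick a form and a unit.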

iopq · on July 22, 2014

You already have UTF16, which is both complex and inefficient (because of the two-byte representation of Latin characters).
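A quick way to see the size difference, sketched in Python. The `encoded_sizes` helper is hypothetical, and the `-le` codec variants are used only so that a byte order mark is not counted.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return the encoded length in bytes of text under each encoding."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # explicit byte order, no BOM
        "utf-32": len(text.encode("utf-32-le")),  # explicit byte order, no BOM
    }


for sample in ("hello", "日本語", "😀"):
    print(sample, encoded_sizes(sample))
```

ASCII text doubles in size under UTF-16, CJK text is smaller in UTF-16 than in UTF-8, and characters outside the Basic Multilingual Plane need a surrogate pair, which is where the variable-width complexity comes back in.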

kzrdude · on July 21, 2014

Those are general Unicode issues, not UTF-8 issues.

angersock · on July 21, 2014

I don't want UTF32, UCS-2, UTF16, or endian issues--that much I know for sure. :)
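A small sketch of the endian problem in Python, using only the standard codecs. The variable names are illustrative.

```python
text = "A"

# UTF-16 has two possible byte orders for the same code unit.
little = text.encode("utf-16-le")  # low byte first
big = text.encode("utf-16-be")     # high byte first

# UTF-8 is defined as a byte sequence, so there is only one serialization.
utf8 = text.encode("utf-8")

# Decoding with the wrong byte order silently yields a different character.
misread = big.decode("utf-16-le")

print(little, big, utf8, hex(ord(misread)))
```

The wrong-order decode does not raise an error here; it produces a valid but unrelated code point, which is why UTF-16 and UTF-32 streams need a byte order mark or out-of-band agreement.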