
This is an internal representation. JS strings still behave, and will continue to behave, as sequences of 16-bit integers.
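
To make "sequences of 16-bit integers" concrete, here is a small illustration using only standard string methods. It shows what script code can observe, whatever the engine happens to store underneath.

```javascript
// length and charCodeAt count 16-bit code units, not characters.
const latin = "abc";
const astral = "\u{1F600}"; // one code point outside the BMP (an emoji)

console.log(latin.length);                        // 3
console.log(latin.charCodeAt(0));                 // 97
console.log(astral.length);                       // 2, a surrogate pair
console.log(astral.charCodeAt(0).toString(16));   // d83d
console.log(astral.charCodeAt(1).toString(16));   // de00
console.log(astral.codePointAt(0).toString(16));  // 1f600
```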

This change takes advantage of the fact that most JS strings fit into an 8-bit charspace, so for those that do, it uses a more compact representation internally.

This optimization is simply: if we have a string and we know that all of the uint16_ts in the string are <= 255, then just store it as a sequence of uint8_ts.
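
A rough sketch of that idea in plain JS, using typed arrays to stand in for the engine's storage. `packString` and `unpackString` are made-up names for illustration; the real implementation lives in the engine's native code and surely differs.

```javascript
// Hypothetical helper: pick the narrowest backing store that holds every code unit.
function packString(s) {
  let fitsInOneByte = true;
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > 0xff) {
      fitsInOneByte = false;
      break;
    }
  }
  const units = fitsInOneByte ? new Uint8Array(s.length) : new Uint16Array(s.length);
  for (let i = 0; i < s.length; i++) units[i] = s.charCodeAt(i);
  return units;
}

// Reading back yields the same 16-bit values from either store,
// so callers cannot tell which representation was chosen.
function unpackString(units) {
  let out = "";
  for (let i = 0; i < units.length; i++) out += String.fromCharCode(units[i]);
  return out;
}

const compact = packString("caf\u00e9");  // every unit <= 255
const wide = packString("\u4f60\u597d");  // units above 255

console.log(compact.BYTES_PER_ELEMENT, compact.byteLength); // 1 4
console.log(wide.BYTES_PER_ELEMENT, wide.byteLength);       // 2 4
console.log(unpackString(compact) === "caf\u00e9");         // true
```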




ES6.1 wishlist: UTF8 strings, full stop.


I want Unicode strings that support

1. Opaque cursors pointing somewhere in the conceptual sequence of code points, with constant-time dereferencing,

2. Ranges, defined by starting and ending cursors, and

3. The ability to move cursors forward or backward by either code points or composed grapheme clusters.

This would be a saner interface than any other I've seen, and it puts very few constraints on the underlying encoding.
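
For what it's worth, a minimal sketch of that interface on top of today's UTF-16 strings. `CursorString` and its method names are invented here, the cursors are only opaque by convention, and it moves by code points only; grapheme movement is left out.

```javascript
// Hypothetical sketch: a cursor is a frozen object wrapping a code-unit offset.
class CursorString {
  constructor(text) {
    this.text = text;
  }
  start() {
    return Object.freeze({ offset: 0 });
  }
  end() {
    return Object.freeze({ offset: this.text.length });
  }
  // Constant time: decodes at most one surrogate pair at the offset.
  // Returns undefined when the cursor is at the end.
  deref(cursor) {
    return this.text.codePointAt(cursor.offset);
  }
  // Move forward by one code point (one or two code units).
  next(cursor) {
    if (cursor.offset >= this.text.length) return cursor;
    const cp = this.text.codePointAt(cursor.offset);
    return Object.freeze({ offset: cursor.offset + (cp > 0xffff ? 2 : 1) });
  }
  // Move backward by one code point, stepping over a full surrogate pair.
  prev(cursor) {
    if (cursor.offset === 0) return cursor;
    let offset = cursor.offset - 1;
    const unit = this.text.charCodeAt(offset);
    if (unit >= 0xdc00 && unit <= 0xdfff && offset > 0) {
      const before = this.text.charCodeAt(offset - 1);
      if (before >= 0xd800 && before <= 0xdbff) offset -= 1;
    }
    return Object.freeze({ offset });
  }
  // A range is the text between two cursors.
  range(from, to) {
    return this.text.slice(from.offset, to.offset);
  }
}

const s = new CursorString("a\u{1F600}b");
let c = s.next(s.start());                    // now at the emoji
console.log(s.deref(c).toString(16));         // 1f600
c = s.next(c);                                // now at "b"
console.log(String.fromCodePoint(s.deref(c))); // b
console.log(s.range(s.start(), c) === "a\u{1F600}"); // true
```

Nothing outside the class needs to look at `offset`, which is the point: the same interface could sit on a UTF-8 buffer with byte offsets instead.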


1, 2. Grapheme clusters are not normative in Unicode; they can be tailored for specific languages. There's a default cluster-finding algorithm, but it's not suitable in all cases. There's no "one size fits all" approach.

3. Forward and backward are likewise language- and tailoring-dependent because they depend on graphemes. There may also be application-specific tailoring, such as the handling of combining marks. In some scripts, "forward" and "backward" are not clearly defined.
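
As a small illustration of the default algorithm, here is a sketch using `Intl.Segmenter`, assuming a runtime that ships it (recent Node versions and browsers do). `graphemes` is a made-up helper name. The segmenter takes a locale argument; whether a given locale actually changes the result depends on the runtime's locale data.

```javascript
// Split text into grapheme clusters using the runtime's segmenter for a locale.
function graphemes(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (part) => part.segment);
}

const word = "e\u0301a"; // "e" + combining acute accent, then "a"

console.log(Array.from(word).length);       // 3 code points
console.log(graphemes(word, "en").length);  // 2 grapheme clusters
```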


That's great stuff...that should be done after standardizing on UTF8.


Be careful what you wish for. Unicode strings are fucking complex. UTF8 doubly so.

For example, which of the four Unicode normalization forms interests you most? Or do you need grapheme clusters? Or code points? Or byte values?
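
To show why the choice matters, a quick sketch with the standard `String.prototype.normalize`, which accepts the four forms NFC, NFD, NFKC and NFKD. `measure` is a made-up helper for illustration.

```javascript
const composed = "\u00e9";     // "é" as a single code point
const decomposed = "e\u0301";  // "e" followed by a combining acute accent

// Same text to a reader, different strings to the language.
console.log(composed === decomposed);                   // false
console.log(composed.normalize("NFD") === decomposed);  // true
console.log(decomposed.normalize("NFC") === composed);  // true

// The compatibility forms also fold things like ligatures.
console.log("\ufb01".normalize("NFC").length);   // 1, the "fi" ligature is kept
console.log("\ufb01".normalize("NFKC"));         // fi

// Code units and code points are different counts again.
function measure(text) {
  return { codeUnits: text.length, codePoints: Array.from(text).length };
}
console.log(measure("\u{1F600}")); // { codeUnits: 2, codePoints: 1 }
```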


You already have UTF16, which is both complex and inefficient (because of the two-byte representation of Latin characters).
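
A quick way to check the size claim, as a sketch: `encodedSizes` is a made-up helper that compares the UTF-8 byte count from the standard `TextEncoder` with two bytes per UTF-16 code unit.

```javascript
// Compare encoded sizes in bytes; UTF-16 uses two bytes per code unit.
function encodedSizes(text) {
  return {
    utf8: new TextEncoder().encode(text).length,
    utf16: text.length * 2,
  };
}

console.log(encodedSizes("hello"));        // { utf8: 5, utf16: 10 }
console.log(encodedSizes("\u4f60\u597d")); // { utf8: 6, utf16: 4 }
```

So UTF-16 doubles the size of ASCII text, though for some scripts (CJK here) it comes out smaller than UTF-8.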


Those are general Unicode issues, not UTF-8 issues.


I don't want UTF32, UCS-2, UTF16, or endian issues--that much I know for sure. :)



