
That's a bad interface: it lets you split a string in the middle of a code point (between the two halves of a surrogate pair) and get an ill-formed UTF-16 string as the result.
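To make the complaint concrete, here is a minimal sketch of a split that lands inside a surrogate pair. The variable names are made up for illustration; `slice`, `length`, and `charCodeAt` are the standard String methods, and all of them index by 16-bit code unit.

```javascript
// "🍣" is U+1F363, stored as the surrogate pair 0xD83C 0xDF63.
const sushi = "🍣";

// slice() indexes by code unit, so this cuts between the two surrogates.
const half = sushi.slice(0, 1);

const sushiLength = sushi.length;    // 2 code units for a single code point
const halfUnit = half.charCodeAt(0); // 0xD83C, a high surrogate on its own

// True when `half` is exactly one unpaired high surrogate.
const halfIsLoneSurrogate =
  half.length === 1 && halfUnit >= 0xd800 && halfUnit <= 0xdbff;

console.log(sushiLength, halfUnit.toString(16), halfIsLoneSurrogate);
```

Nothing throws here: the result is an ordinary String value that just isn't well-formed UTF-16.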



It's the historical interface, which websites now rely on; changing it would be like writing a libc with strcmp operating on Pascal strings.

In any case, a JavaScript String is not actually designed to be UTF-16; it is essentially just a `uint16_t[]`. Even textual strings just store UTF-16 code units, not necessarily well-formed UTF-16 data. Relevant snippets from the standard:

The String type is the set of all finite ordered sequences of zero or more 16-bit unsigned integer values ("elements").

When a String contains actual textual data, each element is considered to be a single UTF-16 code unit. [...] All operations on Strings (except as otherwise stated) treat them as sequences of undifferentiated 16-bit unsigned integers; they do not ensure the resulting String is in normalised form, nor do they ensure language-sensitive results.

See also:

- Section 8.4 http://www.ecma-international.org/publications/files/ECMA-ST...

- http://mathiasbynens.be/notes/javascript-encoding


> Although the standard does state that Strings with textual data are supposed to be UTF-16.

No, it doesn't. It states that they're UTF-16 code units, a term defined in Unicode (see D77; essentially an unsigned 16-bit integer), which is not the same as UTF-16. A sequence of 16-bit code units can therefore include lone surrogates, which well-formed UTF-16 cannot.
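A rough sketch of the distinction, assuming nothing beyond `String.fromCharCode` and `charCodeAt`. The helper `isWellFormedUtf16` is a hypothetical name written for this example, not something from the standard:

```javascript
// Hypothetical helper: true if every surrogate in `s` is part of a proper pair.
function isWellFormedUtf16(s) {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: it must be followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) return false;
      i++; // skip the low half of the pair
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      return false; // low surrogate with no preceding high surrogate
    }
  }
  return true;
}

// Both are legal String values; only the second is well-formed UTF-16.
const lone = String.fromCharCode(0xd800);
const pair = String.fromCharCode(0xd83c, 0xdf63); // "🍣"

console.log(isWellFormedUtf16(lone), isWellFormedUtf16(pair));
```

The language happily stores `lone`; it is only a UTF-16 encoder or decoder that would have to reject or replace it.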


Oh, yes; I just skimmed the 'code unit' bit without actually reading it. (I've now removed the misinformation from my previous comment.)


I think JS may date from the time when UCS-2 was all there was and Unicode had room for only 65,536 code points.


It's needed for compatibility with the Web, unfortunately.


Definitely. Thankfully, ES6 will introduce

    "🍣".codePointAt(0)
and iterators will iterate code points, not code units.

    for (var c of "🍣"){}
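For comparison, a small sketch of how the code-point-aware ES6 additions differ from the older code-unit methods. The variable names are illustrative only; `Array.from` is used here because it consumes the same string iterator as `for...of`.

```javascript
const s = "a🍣b";

const units = s.length;           // counts code units: 4
const codePoints = Array.from(s); // the string iterator yields code points

const first = "🍣".charCodeAt(0);  // 0xD83C, just the high surrogate
const full = "🍣".codePointAt(0);  // 0x1F363, the whole code point

// The index passed to codePointAt is still a code unit index, so
// pointing it at the second half of a pair returns the lone low surrogate.
const tail = "🍣".codePointAt(1);  // 0xDF63

console.log(units, codePoints.length, full.toString(16), tail.toString(16));
```

Note that `length` and indexing are unchanged, so code that slices by index can still cut a pair in half.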



