String classes rarely use UTF-16 because it doesn't have fixed length code point... | Hacker News


ceronman on July 22, 2014 | on: Slimmer and faster JavaScript strings in Firefox

String classes rarely use UTF-16 because it doesn't have a fixed-length code point representation: code points outside the Basic Multilingual Plane (BMP) take two 16-bit code units (a surrogate pair) instead of one. UCS-2 is often used instead, which uses exactly two bytes per code point but can only represent the code points in the BMP. That is enough for the vast majority of use cases.
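
To make the fixed-length point concrete, here is a minimal Python 3 sketch that counts how many 16-bit code units UTF-16 needs for a character. The helper name `utf16_code_units` and the two sample characters are just illustrative choices, not anything from the Firefox article.

```python
def utf16_code_units(text: str) -> int:
    """Return the number of 16-bit code units needed to encode text as UTF-16."""
    # "utf-16-le" emits no byte order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2


bmp_char = "\u20ac"          # EURO SIGN, inside the BMP
astral_char = "\U0001F600"   # GRINNING FACE, outside the BMP

print(utf16_code_units(bmp_char))     # 1
print(utf16_code_units(astral_char))  # 2 (a surrogate pair)
```

With UCS-2 the second character simply cannot be represented, which is the trade-off for getting one code unit per code point.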

One example of this is Python, which used UCS-2 until version 3.3. There was a compile-time option to use UCS-4, but UCS-2 was enough for most cases because the BMP contains nearly all the characters of the languages currently in use.

chadzawistowski on July 22, 2014

Which encoding does Python use now?

ceronman on July 22, 2014

PEP 393, implemented in Python 3.3, introduced a flexible string representation that uses 1, 2, or 4 bytes per code point, depending on the largest code point in the string: http://legacy.python.org/dev/peps/pep-0393/
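
You can observe this from Python itself. The sketch below is a rough check that assumes CPython 3.3 or later: it relies on `sys.getsizeof`, whose numbers are an implementation detail, and the helper name `bytes_per_char` is made up for this example. It compares the sizes of two strings built from the same character to estimate the storage used per character.

```python
import sys


def bytes_per_char(sample: str) -> int:
    """Estimate storage per character for strings made only of `sample`.

    `sample` is assumed to be a single character. The fixed object header
    cancels out when the sizes of a short and a long string are subtracted.
    """
    short = sample * 10
    long = sample * 110
    return (sys.getsizeof(long) - sys.getsizeof(short)) // 100


print(bytes_per_char("a"))           # 1 on CPython 3.3+
print(bytes_per_char("\u20ac"))      # 2 on CPython 3.3+
print(bytes_per_char("\U0001F600"))  # 4 on CPython 3.3+
```

Note that the width is chosen per string, so a single character outside the BMP makes the whole string use 4 bytes per character.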
