Hacker News new | past | comments | ask | show | jobs | submit login

String classes rarely use UTF-16 because it doesn't have fixed length code point representation. UCS-2 is often used instead, which uses two bytes to represent all the unicode points in the Basic Multilingual Plane (BMP), which is enough for 99.99% of the use cases.

One example of this is Python, which used UCS-2 until version 3.3. There was a compile time option to use UCS-4, but UCS-2 was enough for most cases because the BMP contains all the characters of all the languages currently in use.




Which encoding does Python use now?


PEP 393 introduced flexible string representation which can use 1, 2 or 4 bytes depending on the type of the string: http://legacy.python.org/dev/peps/pep-0393/




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: