
There is no UTF-21. The number after "UTF" signifies the width of the code unit, which in UTF-32 is 32 bits. That code points are only defined up to U+10FFFF is irrelevant here; the code unit is still 32 bits.



Yeah, I was joking. UTF-32 wastes 11 perfectly good bits[1], so you could just as well write each code point in 21 bits (0o0000000–0o7777777) and call it UTF-21. 24 bits (three bytes each in 0x00–0x7F) would work too, but 21 bits is the smallest possible fixed-width Unicode encoding, since U+10FFFF needs exactly 21 bits.

[1]: https://github.com/evincarofautumn/protodata
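A minimal sketch of what such a hypothetical "UTF-21" could look like: pack each code point into 21 bits and concatenate the bits into bytes, zero-padding the final byte. The names `utf21_encode`/`utf21_decode` are made up for illustration; this is not a real encoding.

```python
def utf21_encode(text: str) -> bytes:
    """Pack each code point into a fixed 21-bit field (hypothetical UTF-21)."""
    bits, nbits, out = 0, 0, bytearray()
    for ch in text:
        bits = (bits << 21) | ord(ch)  # every code point fits in 21 bits
        nbits += 21
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
    if nbits:
        out.append((bits << (8 - nbits)) & 0xFF)  # zero-pad the last byte
    return bytes(out)

def utf21_decode(data: bytes) -> str:
    """Unpack consecutive 21-bit fields back into code points."""
    bits, nbits, chars = 0, 0, []
    for b in data:
        bits = (bits << 8) | b
        nbits += 8
        while nbits >= 21:
            nbits -= 21
            chars.append(chr((bits >> nbits) & 0x1FFFFF))
    return "".join(chars)  # trailing padding (< 21 bits) is discarded
```

Eight characters take exactly 8 × 21 = 168 bits = 21 bytes, versus 32 bytes in UTF-32.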





