
There is a Unicode encoding, UTF-32, which has the advantage of being fixed width. It is not popular for the obvious reason that even ASCII characters are expanded to 4 bytes. Additionally, the Windows APIs, among other interfaces, are not equipped to handle 4-byte encodings.
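To put rough numbers on the size cost, here is a minimal sketch using Python's built-in codecs. The helper name `encoded_sizes` and the sample string are made up for illustration; the "-le" codec variants are used only so that no byte order mark is counted.

```python
# Compare the encoded size of the same text in UTF-8, UTF-16, and UTF-32.
# encoded_sizes is a hypothetical helper, not part of any library.
def encoded_sizes(text):
    return {
        "utf-8": len(text.encode("utf-8")),
        # The little-endian variants skip the byte order mark.
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

ascii_sizes = encoded_sizes("hello")
print(ascii_sizes)  # {'utf-8': 5, 'utf-16': 10, 'utf-32': 20}
```

For pure ASCII text, UTF-32 takes four times the space of UTF-8.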



Being fixed width is not an advantage. Code points aren't a very useful unit of text outside of the implementation of algorithms defined by Unicode. All of these algorithms generally require iteration anyway. O(1) code point indexing is nearly useless.

http://manishearth.github.io/blog/2017/01/14/stop-ascribing-...


It's fixed width with respect to code points, but not with respect to any of the other things mentioned in the linked article. For example, the black heart with emoji variation selector (which makes it render red) is two code points.
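The heart example is easy to check. A small sketch in Python, assuming the sequence meant here is U+2764 followed by U+FE0F (the variable names are only illustrative):

```python
import unicodedata

# One visible "red heart", built from two code points.
red_heart = "\u2764\ufe0f"

code_points = [f"U+{ord(ch):04X}" for ch in red_heart]
names = [unicodedata.name(ch) for ch in red_heart]
utf32_bytes = len(red_heart.encode("utf-32-le"))  # no byte order mark

print(len(red_heart))  # 2
print(code_points)     # ['U+2764', 'U+FE0F']
print(names)           # ['HEAVY BLACK HEART', 'VARIATION SELECTOR-16']
print(utf32_bytes)     # 8
```

So even in UTF-32, this one user-visible character takes two fixed-width units, and indexing by code point can land in the middle of it.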


> "UTF-32" which has the advantage of being fixed width

It's fixed width for now. It cannot hold all the currently available code points, so it will probably meet the same fate as UTF-16 (though it will probably take a long time).

I'd stay away from it.


There are currently 17 × 65536 code points (U+0000..U+10FFFF) in Unicode. UTF-32 could theoretically encode up to a hypothetical U+FFFFFFFF and still be fixed-width.
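The arithmetic can be checked in a few lines. This is only a sketch of the numbers above; the constant names are made up, and the 2**32 figure is the raw capacity of a 32-bit unit, not something the Unicode standard allows today.

```python
PLANES = 17
PLANE_SIZE = 0x10000  # 65536 code points per plane

total_code_points = PLANES * PLANE_SIZE
max_code_point = total_code_points - 1

# Raw number of values a single 32-bit code unit can represent.
utf32_raw_capacity = 2**32

print(total_code_points)    # 1114112
print(hex(max_code_point))  # 0x10ffff
print(utf32_raw_capacity // total_code_points)  # 3855

# Python itself enforces the current upper limit of U+10FFFF.
try:
    chr(max_code_point + 1)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```

In other words, a 32-bit unit has room for several thousand times the current code space before fixed width would be at risk.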

Note that, at present, only 4 of the 17 planes have defined characters (Planes 0, 1, 2, and 14), two are reserved for private use (Planes 15 and 16), and one more is unused but expected to be needed (Plane 3, the TIP, for historic predecessors of Chinese script). Four planes appear to be sufficient to support every script ever written on Earth, as it's doubtful there are unidentified scripts with an ideographic repertoire as massive as the Unified CJK ideographs database.

We are very unlikely to ever fill up the current space of Unicode, let alone the plausible maximum space permissible by UTF-8, let alone the plausible maximum space permissible by UTF-32.


The bummer is when you want to create a font that supports all the characters. Ugh. Talk about a lot of work.



