gudzpoz's favorites | Hacker News

Please understand that Han unification is _the_ problem. It is clear that Unicode needs to realize that Han unification is wrong and accept what the native writers of those languages think about their scripts.

To make the problem more understandable to people who are used to alphabetic scripts, suppose that tomorrow an Asian committee starts creating Uniword, a repertoire that maps complete words to numerical IDs. At a certain point they get to "colour".
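To make the thought experiment concrete, here is a minimal sketch of such a repertoire in Python. Everything in it is made up for illustration (the IDs, the table names, the `encode`/`decode` helpers), and it assumes the committee ends up putting "colour" on the ID already assigned to "color", with the spelling chosen by a language tag that lives outside the encoded text.

```python
# Hypothetical Uniword repertoire: every ID and name here is invented.
WORD_TO_ID = {
    "color": 0x1001,
    "colour": 0x1001,  # unified with "color": same meaning, same origin
}

# One ID can be rendered several ways; which spelling is picked depends on
# a language tag that is NOT part of the encoded data.
ID_TO_SPELLING = {
    0x1001: {"en-US": "color", "en-GB": "colour"},
}

DEFAULT_LANG = "en-US"


def encode(words):
    """Map each word to its Uniword ID (the spelling distinction is lost)."""
    return [WORD_TO_ID[w] for w in words]


def decode(ids, lang=DEFAULT_LANG):
    """Map IDs back to words, falling back to the default spelling."""
    out = []
    for word_id in ids:
        forms = ID_TO_SPELLING[word_id]
        out.append(forms.get(lang, forms[DEFAULT_LANG]))
    return out


stored = encode(["colour"])           # what ends up in the DB field
print(decode(stored))                 # language tag lost along the way
print(decode(stored, lang="en-GB"))   # only correct if the tag is carried too
```

In this sketch the round trip only preserves "colour" when the tag is carried along with the IDs; copy the IDs alone and the original spelling cannot be recovered.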

Uniword committee: Well, that word shares meaning and origin with the other word "color", for which we already have a codepoint, so we will encode them under the same codepoint.

GB, Australia and Canada: Hey! No! To us those are different words; in particular, we do not want Mr. Colours to appear as Mr. Color.

Uniword committee: No problem, just add some out-of-band information like "nationality" or "<span lang='en-GB'>"

"colour"-people: that will not work, there are so many cases in which this can go wrong. Whenever I copy a field from a DB I also have to extract this extra information?

Uniword: yes, what is the problem? C'mon!

"colour"-people: but do you need to do that in your applications?

Uniword: no, we have one code for every single word in our languages, including codes for very old languages that exist only in two palimpsests.

"colour"-people: and why cannot we have the same level of granularity?

Uniword: because you have too many words!!! And when we started we had only 100k available integers.

"colour"-people: and now?

Uniword: now we have 2^32. But, yeah, that is not the point; just do as we suggest. This dialog is getting too long.

"colour"-people: "dialogue", please.