
They are the same character, though. They do not use the same glyph in different language contexts, but Unicode is a character encoding, not a font standard.





They're not. Readers native to one version can't read the other, and more than a handful of characters got duplicated in multiple forms, so they're not the same, just similar.

You know, the obvious presumption underlying Han Unification is that the CJK languages must form a dialect continuum, as if villagers living in the middle of the East China Sea between Shanghai, Nagasaki, and Gwangju spoke some blend of Chinese, Japanese, and Korean, and the technical distinctions only exist because of rivalry or something.

Alas, people don't really build houses on the surface of an ocean, and Chinese, Japanese, and Korean belong to separate language families with no demonstrated shared ancestry, so "it's gotta be all the same" thinking really doesn't work.

I know it's not very intuitive that Chinese and Japanese have ZERO syntactic similarity or mutual intelligibility, given the relatively tiny mental share each of them occupies, but that's just how things are.


You're making the same mistake: the languages are different, but the script is the same (or trivially derived from the Han script). The Ideographic Research Group was well aware of this, having consisted of native speakers of the languages in question.

That's not a "mistake", that's the reality. They aren't interchangeable, and they're not the same. "Same or trivially derived" is just a completely false statement that exists solely to justify Han Unification, or maybe something that made sense in the 80s; it doesn't make literal sense.

> "Same or trivially derived" is just a completely false statement

You'd have to ignore a lot of reality to believe this. It's even in the names of the writing systems: Kanji, Hanja, Chữ Hán. Of course they aren't interchangeable, because they don't carry the same meaning, just as the word "chat" means completely different things in French and English. But it is literally the same script, albeit with numerous stylistic differences and simplified forms.


CJK native speakers can't read or write other "trivially derived" versions of Hanzi. I don't understand why this has to be reiterated ad infinitum.

As native Japanese readers, we can't actually read Simplified Chinese, just as French speakers can't exactly read Cyrillic, only recognize some of it. Therefore those are different character sets. Simple as that.

The "trivially derived different styles" justification assumes the opposite: that native users of all three major styles of Hanzi can write, or at least read, the other two styles without issues. That is not true.

Итъс а реал проблем то бе cонстантлй пресентед wитҳ чараcтерс тҳат И жуст cанът реад он тҳе гроунд тҳат тҳейъре "саме".

I hope you don't get offended by the line before this, because that's the "same" Latin, isn't it?


Yes, but the same is true for overlapping characters in Cyrillic and Latin. A and А are the same glyph, and so are т, к, і and t, k, i, and you can even see the difference between some of those.

The duplication there is mostly to remain compatible or trivially transformable with existing encodings. Ironically, the two versions of your example "A" do look different on my device (Android), with a slightly lower x-height for the Cyrillic version.
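For what it's worth, here is a quick sketch of what the encoding does with those look-alikes, using Python's standard `unicodedata` module (the variable names are only for illustration). Latin A and Cyrillic А get separate code points, and normalization does not merge them.

```python
import unicodedata

latin_a = "\u0041"      # LATIN CAPITAL LETTER A
cyrillic_a = "\u0410"   # CYRILLIC CAPITAL LETTER A

# Print the code point and the official character name for each look-alike.
for ch in (latin_a, cyrillic_a):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# The two are distinct characters as far as the encoding is concerned.
print(latin_a == cyrillic_a)

# Compatibility normalization leaves the Cyrillic letter alone;
# it is not folded into its Latin look-alike.
print(unicodedata.normalize("NFKC", cyrillic_a) == cyrillic_a)
```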

The irony is you calling it irony. CJK "same or trivially derived" characters are nowhere near that close, yet they are given the same code points. CJK Unified Ideographs is just broken.

So when are we getting UniPhoenician?

This is a bullshit argument that never gets applied to any other live language. The characters are different, people who actually use them in daily life recognise them as conveying different things. If a thumbs up with a different skin tone is a different character then a different pattern of lines is definitely a different character.

> If a thumbs up with a different skin tone is a different character

Is it? The skin tone modifier serves the same purpose that a variation selector would for a CJK code point.
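To make the comparison concrete, a minimal sketch in Python. The emoji sequence is thumbs-up plus a Fitzpatrick modifier. The CJK sequence pairs U+845B with U+E0100 (VARIATION SELECTOR-17) purely as an illustration; whether a given font renders that particular sequence differently is an assumption here, not something the code checks.

```python
thumbs_up = "\U0001F44D"    # THUMBS UP SIGN
skin_tone = "\U0001F3FD"    # EMOJI MODIFIER FITZPATRICK TYPE-4
toned = thumbs_up + skin_tone

base_han = "\u845B"         # a unified CJK ideograph (illustrative choice)
selector = "\U000E0100"     # VARIATION SELECTOR-17
han_variant = base_han + selector

# Both sequences are a base code point followed by one extra code point.
for label, seq in (("emoji + modifier", toned), ("han + selector", han_variant)):
    print(label, [f"U+{ord(c):04X}" for c in seq])

# In both cases the base character is unchanged; the trailing code point
# only asks the renderer for a different appearance.
print(toned[0] == thumbs_up, han_variant[0] == base_han)
```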


The underlying implementation mechanism is not the issue. If Unicode had actual support for Japanese characters, so that when you converted text from Shift-JIS (in the default, supported way) you could be confident that your characters would not change into different characters, I wouldn't be complaining, whether the implementation mechanism involved variation selectors or something else.
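A minimal sketch of that concern, assuming Python's built-in `shift_jis` codec and using 直 (U+76F4) as an illustrative character. The round trip preserves the code point, but nothing in the decoded string records that the text was Japanese, so the choice of glyph is left to the font or the surrounding markup.

```python
# 直 as it would sit in a Shift-JIS file (illustrative character choice).
original_bytes = "\u76f4".encode("shift_jis")
decoded = original_bytes.decode("shift_jis")

# The code point survives the conversion unchanged.
print(f"U+{ord(decoded):04X}")

# Re-encoding gives back the same bytes, so the conversion is lossless
# at the code point level.
print(decoded.encode("shift_jis") == original_bytes)

# What does not survive is the implicit "this is Japanese" context of the
# source encoding: a Python str carries no language tag, so a renderer has
# to guess which regional glyph to draw.
print(type(decoded).__name__, len(decoded))
```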

Okay, that's fair. The support for the selectors is very half-assed and there's no other good mechanism.

It doesn't matter to me what bullshit theoretical, semantic excuse there is; for practical purposes it means that UTF-8 is insufficient for displaying any human language, especially if you want Chinese and Japanese in the same document or context without switching fonts (like, say, on a website).
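A rough sketch of what that means in practice, in Python. The `tag_lang` helper is hypothetical, and whether a browser actually picks a different glyph per `lang` value depends on the fonts available, which this code does not check. It only shows that the bytes are identical and the language has to travel separately.

```python
from html import escape

def tag_lang(text: str, lang: str) -> str:
    """Wrap text in a span carrying a language tag (hypothetical helper)."""
    return f'<span lang="{escape(lang, quote=True)}">{escape(text)}</span>'

# One unified code point (直), used here only as an illustration.
ch = "\u76f4"

# The UTF-8 bytes are the same whichever language the writer intended.
print(ch.encode("utf-8"))

# The language has to be supplied out of band, for example as markup.
print(tag_lang(ch, "ja"))
print(tag_lang(ch, "zh-Hans"))
```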


