
>Fewer than a quarter of the characters it contains are now in common use

12K characters in common use is equally impressive to me as a non-Asian.




It's actually way fewer than that IRL. Japan's official list of commonly used kanji only has 2136 characters. Taiwan's list has 4808, and the PRC's list has 3500 "frequent" characters with another 3000 supplementary "common" ones. Digitization has made it even easier to use these characters without recognizing their exact form or knowing how to write them.


The 常用漢字 (Japanese Common Use Kanji) list does not include many kanji that native speakers can read, and newspapers don't always follow the rule that they should only use characters from the list. In addition, you need to include the 人名用漢字 (Personal Name Use Kanji) in the list because basically all of those characters are also used in fairly common words.

Native speakers can probably recognise at least 3-4k kanji if not more but can probably only write around 2k from memory, depending on how well-read they are.

嘘 (lie) is the best example of an incredibly common word whose kanji form (which is used fairly often) is not in any official government list.


If you look at a frequency list of Chinese characters,[0] the top 4800 characters make up about 99.9% of modern texts.

That means that if you know the top 4800 characters and you read a text that is 1000 characters long (equivalent to around 700 words), you can expect about one character you won't recognize.
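The estimate above is just text length times the uncovered fraction. Here is a minimal sketch of that arithmetic, assuming each character in the text is drawn independently from the overall frequency distribution (a simplification; real texts cluster rare characters by topic). The function names are made up for illustration.

```python
def expected_unknown(text_length: int, coverage: float) -> float:
    """Expected number of characters in the text that fall outside the known set."""
    return text_length * (1.0 - coverage)


def prob_at_least_one_unknown(text_length: int, coverage: float) -> float:
    """Chance of hitting at least one unknown character, assuming independence."""
    return 1.0 - coverage ** text_length


# 99.9% coverage, 1000-character text (the figures from the comment above)
print(expected_unknown(1000, 0.999))           # about 1
print(prob_at_least_one_unknown(1000, 0.999))  # about 0.63
```

So "one unknown character per 1000" is an average: under the independence assumption, roughly a third of such texts would contain no unknown character at all.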

The funny thing is, if you recognize only the top six characters, you already know 10% of the characters in a typical text. The distribution is very top-heavy, but with a long tail that you do have to learn to become literate.
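If you want to check this kind of coverage figure against your own corpus, a rough sketch might look like the following. The helper name and the sample string are hypothetical; the sample is a toy string, not real frequency data, so its numbers say nothing about actual Chinese text.

```python
from collections import Counter


def coverage_of_top(text: str, k: int) -> float:
    """Fraction of character occurrences in `text` covered by its k most frequent characters."""
    # Ignore whitespace so line breaks and spaces don't count as characters.
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(k))
    return top / total


# Toy input for illustration only: 12 characters, 6 distinct.
sample = "的的的的一一一是是不了人"
print(coverage_of_top(sample, 2))  # share covered by the two most frequent characters
```

Run over a large corpus, increasing k should show the same shape the comment describes: coverage climbs quickly at first and then flattens into a long tail.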

0. https://lingua.mtsu.edu/chinese-computing/statistics/char/li...


A now-vanished Chinese restaurant near us was named 'The Good Earth' in English, but in Chinese even I, with near-zero knowledge, could read 'Three Big <somethings>'. I never found out what that last character was and couldn't imagine what would make sense in context either!


大三元. It's a Mahjong reference.[0]

By the way, those are the 17th, 125th and 370th most common characters in modern written Chinese.

0. https://zh.m.wikipedia.org/zh/%E5%A4%A7%E4%B8%89%E5%85%83


More like 12k characters currently in use at all. Common use characters are a much smaller set than that. (3k or so?)



