Four Types of Kanji (2019) (learnjapanesebest.wordpress.com)
101 points by sova on Dec 3, 2020 | 80 comments



From my experience with Chinese, the classification in the article follows the established convention but seems to use non-standard terminology for each category. Arguably, it's much more straightforward to categorize the characters as:

1. Pictograms: 人 'person', 木 'tree', 日 'sun, day', 月 'moon, month'

2. (Simple) Ideograms: 上 'up', 三 'three'

3. Compound Ideograms, or Meaning-Meaning Compounds: 休 'rest' (人 'person' next to a 木 'tree'), 明 'bright' (日 'sun' and 月 'moon') - note that the pronunciation of the compound character is unrelated to either component

4. Phonetic-Meaning Compounds: 晴 /qíng/ 'clear (weather)' (日 'sun' as a meaning cue, and 青 /qīng/ as a pronunciation cue) - the vast majority of Chinese characters belong in this last category

More categories can be (and historically were) distinguished, such as when an additional component was added to a character to clarify its meaning, even though the original character already had that meaning, but that primary meaning became overshadowed by a secondary meaning it acquired later; however such distinctions are mostly useful for understanding the evolution of Chinese characters rather than the way they currently are.

Of course the above classification, even with only the basic four categories, is still largely subjective, for example 娶 /qǔ/ 'to marry (a woman)' can be thought of either as a meaning-meaning compound (取 'to take' a 女 'woman'), or as a phonetic-meaning compound with the phonetic component 取 'to take' pronounced /qǔ/ as well.

It also has to be said the pronunciation cue is often not very helpful anymore, as the sounds in Mandarin Chinese have shifted, and guessing the pronunciation from similar characters can end up being hilariously wrong. Besides, sometimes it is not even clear which component of a character is supposed to be the phonetic cue. Now I don't know about Japanese but I imagine these pronunciation cues must be even less helpful than in Chinese.


> Now I don't know about Japanese but I imagine these pronunciation cues must be even less helpful than in Chinese.

You'd be right about that. Kanji were also imported over time, not all at once, so Japanese bases its onyomi (the "Chinese" reading of a kanji, as opposed to the kunyomi, or Japanese reading) on the Chinese of different centuries. Oftentimes Chinese people will complain that the Japanese should just use the same reading as the Chinese, when actually the Chinese aren't using it anymore themselves.


Chinese also has the additional problem that character simplification in the PRC broke quite a few pronunciation cues. With 廣, you can correctly guess it'll sound similar to 黄. Simplified to 广, you have no idea.


It's a wash. It broke some and restored others, often by following variant characters already in use, e.g. 闆, which has no pronunciation similarity to 品, became 板 (following cues of e.g. 版).

Chinese hasn't had consistent phonetic components for a long long long time.


闆 does have a pronunciation similarity to 品 in ancient Chinese. You can see that in Cantonese, which is close to the official language of some ancient periods.


That's Simplified Chinese

And I recommend that people who want to learn Chinese characters learn Traditional Chinese: it is less confusing and easier to remember in the long run.


The Chinese I have asked who learnt Simplified Chinese characters in school attest to being able to read Traditional Chinese characters with very little additional effort; or even just on the spot (say, when shopping or dining). Those who learnt Traditional Chinese characters seem to have a harder time going in the other direction. (No studies to cite; I have asked a good 30+ people at work, social circles, etc. over the years.)


Even if you completely obstruct some characters from view, people will still be able to figure out the meaning from context but that doesn't necessarily mean they learned the other variant. And on top of that, they might be reluctant to admit it.

If you really wanted to know, you'd have to show them the characters in isolation. Try for example 發. I bet it's not going to be so straightforward.

In any case, most people don't really get to have the choice but if you're planning to learn both, going with traditional characters first is easier because the simplification is a many-to-one mapping: if you've only known simplified, you won't be aware there are supposed to be different characters in place of the same one you're familiar with. In other words, you don't only have to learn something new but also unlearn what you know and got used to, and the latter part is always more difficult.

For instance (and to stay within the same example), both 發 and 髮, unrelated to each other, were simplified to the same 发. A person who only knows simplified is likely to incorrectly write *頭發 for 'hair' (instead of the correct 頭髮), and there are 2.5M Google hits for this keyword (to be fair, many probably due to the automated conversion, and some might be an intentional pun).
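To make the automated-conversion failure concrete, here is a minimal sketch of how a character-by-character converter ends up writing *頭發. The two tables are hypothetical toy data made up for this illustration (real converters use far larger tables), and the function names are my own, not those of any actual tool.

```python
# Hypothetical toy tables, for illustration only.
CHAR_TABLE = {"头": "頭", "发": "發"}  # naive: one traditional form per character
PHRASE_TABLE = {"头发": "頭髮"}          # word-level override where 发 means 'hair'


def naive_convert(text: str) -> str:
    """Convert character by character, ignoring context."""
    return "".join(CHAR_TABLE.get(ch, ch) for ch in text)


def phrase_aware_convert(text: str) -> str:
    """Try the longest known phrase first, then fall back to single characters."""
    out = []
    i = 0
    max_len = max(len(p) for p in PHRASE_TABLE)
    while i < len(text):
        # Look for a multi-character phrase starting at position i.
        for size in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in PHRASE_TABLE:
                out.append(PHRASE_TABLE[chunk])
                i += size
                break
        else:
            # No phrase matched here: convert the single character.
            out.append(CHAR_TABLE.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(naive_convert("头发"))         # the incorrect form discussed above
print(phrase_aware_convert("头发"))  # picks 髮 because the whole word is known
```

The naive version has to commit to a single traditional form for 发, so it gets either 'hair' or 'to send out' wrong; the phrase table is one simple way around that, assuming someone bothers to fill it in.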


> the simplification is a many-to-one mapping

Simplification is a many-to-many mapping. There are many characters that were either merged in the process of standardizing traditional Chinese characters or have multiple meanings which are split by standard simplified Chinese.

Here's a very small sample of a bunch of Traditional -> Simplified examples.

乾 -> 乾 (qian2)、干 (gan1) (干 is itself a merger of multiple traditional characters)

夥 -> 夥 (huo3)、伙 (huo3)

兒 -> 儿 (er2)、兒 (ni2)

祇 -> 祇 (qi2)、只 (zhi3) (只 is also itself a merger of multiple traditional characters)
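As a rough sketch of what "many-to-many" means here, the snippet below takes only the pairs mentioned so far in this thread and checks which characters are ambiguous in each direction. The pair list is a tiny hand-picked sample and `ambiguous` is a helper name I made up, so treat it as an illustration rather than a measurement.

```python
from collections import defaultdict

# (traditional, simplified) pairs taken from the examples in this thread;
# a real table would be far larger.
PAIRS = [
    ("發", "发"), ("髮", "发"),
    ("乾", "乾"), ("乾", "干"),
    ("夥", "夥"), ("夥", "伙"),
    ("兒", "儿"), ("兒", "兒"),
    ("祇", "祇"), ("祇", "只"),
]


def ambiguous(pairs, reverse=False):
    """Return source characters that map to more than one target."""
    table = defaultdict(set)
    for trad, simp in pairs:
        src, dst = (simp, trad) if reverse else (trad, simp)
        table[src].add(dst)
    return {src: dsts for src, dsts in table.items() if len(dsts) > 1}


trad_to_simp = ambiguous(PAIRS)                # traditional chars with several simplified forms
simp_to_trad = ambiguous(PAIRS, reverse=True)  # simplified chars with several traditional forms

print(sorted(trad_to_simp))
print(sorted(simp_to_trad))
```

Both dictionaries come out non-empty on this sample, which is all the many-to-many claim needs; how the two directions compare in frequency is a separate question that this toy list cannot answer.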

> Try for example 發. I bet it's not going to be so straightforward.

I'd be pretty shocked if any literate Chinese speaker in the PRC did not recognize that character.

I think in general you're overestimating the difficulty of learning traditional characters when educated in a system using solely simplified characters.

PRC undergraduate courses will sometimes just flat-out assume knowledge of traditional characters. Indeed I have a textbook from a history of Chinese medicine class that's entirely in traditional characters. The professor (not unreasonably) just assumed we all already knew traditional characters or would pick it up in a week or two. All subsequent course material (e.g. test materials) was also in traditional characters, although we were allowed to write in simplified characters (and I think everyone in the class did so).

I'm sure there's some corner cases you could use to stump people, but PRC kids consume quite a lot of traditional Chinese material even if they're not formally taught it in school. It's more mentally taxing than just reading simplified characters, but it's still fairly straightforward.

In general it's pretty easy to pick up either character set if you're a fluent reader in the other, at least for reading. Learning how to handwrite both sets is trickier and probably requires dedicated training, but even then only on the order of weeks or at most months.

> (to be fair, many probably due to the automated conversion, and some might be an intentional pun)

I'm fairly certain the overwhelming majority are due to automated conversions.


> Simplification is a many-to-many mapping.

Technically you are correct but this is just pedantry. Of the 4 examples you listed, 3 are corner cases, and 1 is also a variant traditional character. You know way too much not to realize this (by the way, that's a compliment!), and that you couldn't come up with anything reasonably common like the 發/髮 → 发 pair (because there isn't anything comparable going the other way) indirectly proves the point I was making.

Unless at a very advanced level, a person switching from traditional to simplified can simply afford to ignore any of these relatively rarely-encountered one-to-many mappings, whereas the same cannot be said about switching from simplified to traditional. It is then one extra thing to study if going one direction but not the other. That might be good to know for somebody who is actually making that decision, although in reality it's probably moot, since people will learn either one or the other first depending on their individual circumstances.

> I'd be pretty shocked if any literate Chinese speaker in the PRC did not recognize that character.

That might very well be true but is beside the point.

The claim in the parent post was that somebody who only ever learned simplified characters would be able to read traditional characters "on the spot" having never seen them before, to which I provided this counterexample of a commonly-encountered traditional character that will not be recognized without context, unless the person is already familiar with it.

All I'm saying is nobody is likely to figure out this character seeing it for the first time without context. (For the record, this applies both ways, just the example would have to be different.)

> I'm fairly certain the overwhelming majority are due to automated conversions.

But in that case it also says something that the software didn't handle the conversion well (implying that perhaps whoever wrote it didn't know about it), and that people posting this stuff also didn't correct the obvious mistake (suggesting they didn't know either).

*

You wrote a lot in response, and I generally don't disagree, I just don't see how most of it relates to my post. For the record, I'm not claiming it's difficult or easy going one way or another. In fact I was only pointing out the following 3 things:

1. If you only learned one set (whether simplified or traditional), you don't know the other automatically.

2. If people can read characters in context, they are likely just guessing - nothing wrong about it by the way, that's how people read in any language - but it doesn't mean they would really "know" the same characters in isolation.

3. Some common, unrelated traditional characters were merged to become the same simplified character. This is an extra layer of difficulty if you don't already know which these are (as you would if you learned the traditional characters first). (I concede it's also the case the other way, but in my opinion none of it is common enough to worry about at this level of granularity, so the burden is not comparable.)

I'm getting the impression you're looking for something to disagree with among the things I wrote, and I'm sure you'll be able to find it but I'd rather what I wrote be helpful even at the cost of having to simplify things a bit, and thus not being rigorously correct.

In other words, I'm trying to describe the forest, not the individual trees. My audience are the people who are looking at the forest from a distance: if someone is already in the forest, they either know what's inside or can look around on their own. That the forest insiders won't benefit from my description is to be expected. I hope you can recognize that.


Oof well that's rough.

> that you couldn't come up with anything reasonably common like the 發/髮 → 发 pair (because there isn't anything comparable going the other way) indirectly proves the point I was making.

Nah I just wasn't thinking straight.

著 -> 著(著作)、着(说着) (this is a common automated translation error from traditional to simplified, search for e.g. "说著")

覆 -> 覆(覆盖)、复(答复) (复 then aggregates other traditional characters)

帳 -> 帐(帐号)、账(账单) (technically 賬 exists but if we're excluding 祇 as a common but non-standard variant, the same holds true for 賬)

and probably some more I'm missing again.

Though the number of common one-to-many mappings in either direction is quite small (I'd guess maybe 10ish clusters), which brings me to my second point.

> although in reality it's probably moot, since people will learn either one or the other first depending on their individual circumstances.

It's moot because it is trivial to go from one to the other (for reading! It's harder for writing) regardless of direction. You give me a native speaker who has only ever seen one character set in their whole life (which is actually quite rare) and I can have them reading slowly in the other character set by the end of the day, and fluently by the end of a week. There's a set of rules for translating between radicals and then maybe in the ballpark of 100 exceptions (in either direction) on top of that and that gets you effectively full fluency.

> But in that case it also says something that the software didn't handle the conversion well (implying that perhaps whoever wrote it didn't know about it), and that people posting this stuff also didn't correct the obvious mistake (suggesting they didn't know either).

The automated translation error is usually an artifact of people being lazy and not checking the translation (see e.g. my 著 example). The fact the software doesn't get it is because differentiating the different cases is a hard NLP problem but an easy human-solvable problem so the investment isn't usually made. But probably part of it too is that this is starting to get into writing in a different character set, which is harder than just reading it.

But to get back up out of the weeds, I was mainly reacting to support mrslave's assertion that

> The Chinese I have asked who learnt Simplified Chinese characters in school attest to being able to read Traditional Chinese characters with very little additional effort

which is true and holds up even if you strip context as you mention.

> If you really wanted to know, you'd have to show them the characters in isolation. Try for example 發. I bet it's not going to be so straightforward.

I've actually played this game with friends (all PRC speakers) once before, i.e. quizzing each other on individual traditional characters. There was only one that threw even one of us for a loop (叢), and it would've been obvious given context. Everyone got everything else out of maybe 30 irregular characters (i.e. ones that don't just follow standard simplified radicals).

I think "on the spot" was supposed to mean "without having seen the character before but seeing its context," I don't think mrslave meant literally ab initio but I dunno at that point I'm putting words in mrslave's mouth.

While I agree with your three points in the second half of your reply, I don't think that was the initial thrust of your parent comment.

I interpreted your initial comment to mean that switching from simplified characters to traditional characters is both non-trivial and harder than going from traditional to simplified. Moreover there was the added implication that PRC speakers don't generally know individual traditional characters, but are only able to guess at them when given additional contextual clues. Perhaps I'm completely distorting your claim, but it is regardless a common claim I see on English language forums, so I don't think I'm pedantically attacking a strawman.

I disagree with those points.

It is trivial (relative to most other things language-learning-related) to go from simplified characters to traditional characters (and indeed to go the opposite direction) and it is not appreciably more difficult in either direction. Moreover I think you're drastically underestimating the proficiency of the average literate PRC speaker in understanding traditional characters. Keep in mind there's still quite a lot of exposure to traditional characters despite their absence from the educational system. Signs, foreign film subtitles, decorations, trademarks, etc.: traditional characters pop up all over. There are also a ton of cultural imports from Taiwan that preserve their traditional characters. Because it's so trivial, this passive osmosis is enough to attain essentially full fluency in reading traditional characters.

I have such a strong reaction to this not because of insider nit-picking, but because I've seen so many examples of Chinese language learners hemming and hawing over whether they should go for traditional characters or simplified characters when there's effectively no difference. Fluency in one makes fluency in the other trivial, at least if you don't care about being able to handwrite both. Choose whichever has the better teachers/resources/etc. or just flip a coin. There's no advantage to one over the other. Barring your environment, the only time I could find even a shred of advantage to choose traditional characters over simplified characters is if you're exclusively studying Classical Chinese and not any modern variety. Even then the advantage is far far less than a lot of people proclaim.


Your examples are excellent in pointing out the many-to-many mapping during the simplification process. Many people assume Chinese characters are somehow "static" and that the modern traditional characters are not themselves aggregated from older characters. Your last point is also very important, as one of the top questions I've come across on language learning forums is traditional vs simplified. This issue, seemingly trivial to people fluent in Chinese, often gets exaggerated and politicized to the detriment of Chinese learners. The best analogy I can think of between traditional vs simplified is akin to American English vs British English, except English learners don't make a big fuss about which version they should learn first.


> The best analogy I can think of between traditional vs simplified is akin to American English vs British English, except English learners don't make a big fuss about which version they should learn first.

There isn't really much of an analogy here, and on a general note, trying to approach a new problem by seeking an analogy to something we already know isn't really the best way to gain insight: simply put, not everything we are about to see is going to be like something we have already seen, or even close to it.

With that in mind, if you really wanted to compare this to the varieties of English, I can offer this analogy:

The word "curb" in American English has a number of meanings related to 'restraint', or it can also mean 'a raised edge of pavement'.

However, in British English, when used to mean the latter, the word is spelled "kerb."

So, if you are familiar with British English and want to learn American English, all you have to do is remember to always spell it "curb" and never "kerb."

However, if you are going the other way, you have to remember the new spelling of "kerb," but also only to use it with a certain meaning, and not the others.

In this example, going from British to American English is easier than the other way round, just as going from traditional to simplified characters is easier than the opposite.

Yet another example could relate to the distinction between "shall" and "will," which some people make, and many (if not most) don't. If you are used to making the distinction, and want to stop (go "simplified"), it is effortless: all you have to do is to start using "will" all the time. However, going the other way you'd have to learn the rules of using it, or face the risk of drowning:

(a) "I shall drown, no one will save me!"

(b) "I will drown, no one shall save me!"

Which one do you choose to be rescued?

Answer here: https://en.wikipedia.org/wiki/Shall_and_will#Uses_of_shall_a...


> There isn't really much of an analogy here, and on a general note, trying to approach a new problem by seeking an analogy to something we already know isn't really the best way to gain insight: simply put, not everything we are about to see is going to be like something we have already seen, or even close to it.

My analogy isn't meant for people fluent in both Chinese and English who understand the nuances between variants of languages. The analogy is much more informative for people deciding to learn Chinese than any of your 發/髮 → 发 examples and analysis. Clearly there are distinctions between traditional Chinese and simplified Chinese; your earlier posts contain many valid examples and analyses, which are suited to a Chinese etymology audience rather than the general English-speaking audience who are curious about Chinese. FWIW I'd teach my kids traditional Chinese, but overexaggerating the difference between the two variants is not helpful for Chinese learners.


> overexaggerating the difference between the two variants is not helpful for Chinese learners

I wholeheartedly agree. In the grand scheme of learning Chinese, this is a relatively minor consideration. It wasn't my intention to play it up (and in fact I don't think I did), it's just how the discussion progressed from the original comment.


> I don't think I'm pedantically attacking a strawman.

Unfortunately I think you are. Note I never wrote anything about the "PRC" that you keep bringing up all the time. It's almost as if you were here to defend the feelings of "PRC speakers" from the perceived slight that I failed to give them enough recognition.

I stand by my opinion that it's easier to switch from traditional to simplified than the other way round. This isn't really controversial, it's well-established in literature, and follows directly from the fact that simplification was a lossy process: to claim otherwise is tantamount to stating that there wasn't really any simplification.

However, there was a simplification as a matter of fact, and it did make things simpler as it was supposed to. It was designed to make it relatively easy for everyone to switch from traditional but not necessarily to go the other way. That there are even so many examples you can provide to the contrary is itself a demonstration of some of its failures, which are also well-known and described, but the fact remains that the issues going the intended way are few and far between, and just not comparable to the fact that as part of the simplification, dozens of common characters such as 隻, 製, 劃, and 錶 were blended into semantically unrelated ones: an extra hurdle you have to deal with when going back.

I could provide more examples, or even make an exhaustive comparison but that would be unlikely to change your opinion, since you are now arguing not against what I said but against some "common claim[s] [...] on English language forums," whatever that is even supposed to mean, that you decided to associate with me. I'm sorry to say that due to this I can no longer see it as a discussion in good faith, if it ever was one to begin with.

You are entitled to your opinion of course, as I am to mine: if anyone is serious about learning Chinese beyond the basic communication skills, I would advise them to start with traditional characters. If this happens to hurt the feelings of (some) "PRC speakers," so be it.

Let's agree to disagree.


I have only ever had two main points in this whole thread:

1. Simplified characters and traditional characters share a many-to-many relationship, not a one-to-many one

2. It is trivial (in reading) to go from simplified characters to traditional characters or vice versa. There is no significant difficulty difference in either direction.

My talk about the PRC speakers is not "to defend PRC speakers." It is simply some evidence of the latter. You have an entire country of people who can read traditional characters without any formal training in them (even when presented with single characters isolated from surrounding context). This is fairly strong circumstantial evidence that the barrier between the two is fairly trivial.

> I stand by my opinion that it's easier to switch from traditional to simplified than the other way round. This isn't really controversial, it's well-established in literature

Do you have examples from the literature of this? I don't know of any and I've watched this space (not particularly carefully, so I may have missed things, but I've read a fair number of papers). I know of results demonstrating the difference between how native simplified readers and native traditional readers break down characters (holistically vs analytically), but know of no other results. Even arguing from first principles is difficult. Even given a one-to-many relationship, that makes writing easier (mindless substitution works) and reading harder (again mindless substitution works, now in the opposite direction). It's not obvious which direction is easier.

> to claim otherwise is tantamount to stating that there wasn't really any simplification.

This is not true. The vast vast majority of the simplifications are in simplifying character stroke counts, not in merging characters in any direction.

> It was designed to make it relatively easy for everyone to switch from traditional but not necessarily to go the other way.

This is also not true. This has never been a stated goal of any of the various attempts of simplification (《減省漢字筆畫的提議》, 《第一批簡體字表》, 《漢字簡化方案草案》, or the 《简化字总表》). More than anything its stated audience (when mentioned) has been illiterate speakers who are unfamiliar with any character set. The only reason that most of the characters look familiar is that one of their principles was to as much as possible not introduce new characters and only codify existing characters or character components (so simplified characters ended up relying a lot on variant characters and cursive script) since the point was to remove characters rather than add them. It's important to note this is the same goal the ROC had when creating their set of 教育部標準字體, what is usually taken as the body of "traditional characters" today. The main difference is that the ROC chose to use the most popular characters while the PRC chose to use the simplest characters. This was mainly chosen to avoid the thorny methodological issue of how to create characters wholly from scratch (which is what the now-discarded second round of simplifications did) not for ease of switching itself among the literate class, since the majority of people learning the character sets would be learning how to read for the first time.

To take a step back, I think you've been quite uncharitable to me in this discussion thread and keep implying motives I do not have.

And I understand you think I've been quite uncharitable to you. As far as I can tell you think that I'm essentially trying to show off some smarts here. That I happen to both derive pleasure from pedantically nitpicking where you're wrong and clearly ignoring the overall thrust of your point. I imagine your internal dialogue to be something like the following:

"Come on, simplified characters and traditional characters basically share a one-to-many relationship. Everyone knows this. And now you latch on to some throwaway sentence I've written about whether people would recognize and turn this into some weird impassioned defense of "PRC speakers" which I never was even talking about in the first place! Clearly you're just trying to pick a fight here."

But I urge you also to view things from my perspective. I mean clearly I think you're misunderstanding me (why wouldn't I of course!).

My internal monologue goes something like:

"Look I was just making some corrections to some oft-repeated inaccuracies, now this person's accusing me of pedantry, okay now I've presented non-pedantic examples, now they think I'm defending 'PRC speakers' and somehow this is now a discussion about whether simplified characters have failed at their stated goals? Why do they keep ignoring the main points I'm making and mixing up my examples for my main points?"

From my perspective I've been frustrated in that essentially every comment I've responded to has consisted of you just ignoring my main points, blowing up some side comments I've made completely out of proportion and chalking up my very real counterexamples as "pedantry."

However, I also think it is very easy to go down this rabbit hole and dig our heels deeper, so I want to offer an olive branch and just cut to the heart of our disagreement.

I think we're both saying things that are stronger than what we intend and reading into each other's responses things that are not there. You think that traditional characters are the way to go and provide a smoother transition to simplified. I don't think so. I think traditional or simplified characters are both equally effective as a starting point and you can trivially transition from one to the other (I partially speak of this from personal experience as well, from when I took up reading and writing Classical Chinese). You say this is a well-known result in the literature. I earnestly look forward to any examples you might have in the literature.

As for the original thing that sparked all this, the relationship between simplified and traditional characters and whether it's a one-to-many relationship or a many-to-many relationship. I want to emphasize it is certainly true that there are more one-to-many simplified-to-traditional examples than the opposite direction. However, the opposite direction is still a fairly significant minority. The proportion breaks down to something like 75%-25%. It's a large majority, but not an overwhelming one, and not a small enough proportion that I would call it pedantic.


This is a really good summary.

For anyone who wants to geek out on this further, I highly recommend the introduction in "Decoding Kanji: A Practical Approach to Learning Look-alike Characters" by Yaeko Sato Habein [1]. Despite being a workbook for intermediate students, the introduction really goes all-in on this stuff and treats the topic very academically, but it is also a treasure trove of kanji-related "fun facts", and the appendix has interesting compilations such as "Pairs of homonymous kanji compounds with one kanji in common".

[1] https://www.amazon.co.uk/Decoding-Kanji-Practical-Look-alike...


Isn't 晴 an example of "3, compound ideograms"? Sun and blue suggested a blue day or in other words a blue sky, no clouds. So it doesn't seem phonetic.

In Japanese there are words like 米国 "beikoku" which means USA but it comes from 米利堅 'merikan'


Man if you really want to trace the etymology of Chinese characters it gets to be a really long and crazy affair. You inadvertently stumbled across a thorny example.

The history of 晴 indicates pretty clearly that 青 is meant to be a phonetic component rather than a semantic one, but it's not obvious how that came about.

晴 originated as the character 夝 and originally meant a clear night with visible stars after the rain has stopped (夝 is a 形声字 where 夕 means dusk or night and 生 is the phonetic component) rather than a clear day. By focusing on the stars, we got alternate "spellings" of 暒 (which is a cross between using 星 phonetically and semantically) and 精 (borrowing its connotation of brightness). Gradually this focus on stars became a focus on brightness while maintaining the original "clearness." The latter character 精 then kept its phonetic component 青 and got a new semantic component 日 to distinguish itself from 精 and that's how we got 晴.


That's fascinating!

Doesn't that mean it's 青 that's the phonetic part? It means blue. A blue day certainly is a fitting compound meaning regardless of its origin. On the other hand, no idea why moon and that radical (life?) = blue, so that part is phonetic?


> Doesn't that mean it's 青 that's the phonetic part?

I'm confused. That's indeed what I said and what the article states and what the original parent comment stated.

> On the other hand, no idea why moon and that radical (life?) = blue so that part is phonetic?

That's not quite the direct evidence that 青 is phonetic rather than semantic. It's because already in 精 青 is not a semantic component but rather a phonetic one. In particular 精 was being used to describe clear, sunny days before the existence of 晴. Its semantic component was then replaced with 日 to disambiguate this new meaning of 精 [0]. So the jump from 精 to 晴 is treating 青 as a phonetic component as well.

[0] "Replaced with" indicates an intentionality that may or may not have actually occurred. We don't have any records of someone literally coming out and saying "yeah I think 精 has too many meanings, I'm going to replace one of them with 晴." It's plausible that what may have occurred is the independent growth of a variant character 晴 that occurred after the rise of 精 that eventually supplanted 精. Regardless the existence of 精 is a pretty clear signal that it was the phonetic quality of 青 that was being used.


亜米利加 America, is where 米国 comes from (it couldn't be abbreviated as 亜 because that would have been confusing with e.g. 亜細亜 (Asia)).


I think the somewhat confusing thing here is that Japanese can pronounce the same kanji in different ways, from 'purely local Japanese' to variants of old Chinese pronunciation.

米国 "beikoku" sounds like a strange choice for "America". But then in 亜米利加 it's pronounced "mei", which does indeed explain the choice better.

Chinese use 美 'mei' for America and so 美国 'meiguo' is the US.


According to Kenneth G. Henshall, who wrote a book on the etymology of Kanji [1], it is quite common to choose a phonetic part that lends itself to the meaning of the word and not just a sound.

[1] https://www.amazon.com/Guide-Remembering-Japanese-Characters...


There are arguably three to six types, depending on how deep one wants to get into the etymology of it! On the higher count, two categories missing are (1) phonetic (false) loans where another character was arbitrarily adopted for a different meaning due to pronunciation similarities, forcing the original word to be reinvented to disambiguate, and (2) derivative cognates, where a common ancestor character may have branched out over time into multiple characters.

https://en.wikipedia.org/wiki/Chinese_character_classificati...


That Wikipedia article's decision to label 转注 as "derivative cognates" is entirely speculative.

转注 is closer to "mutually defined" i.e. mutually recursive definitions (or potentially sounds, it's not really clear as I explain later). There is no reason to think this referred to any notion of derivative cognates.

Indeed if it were derivative cognates there would be way way more characters in that category than are put there (e.g. 手 and 又).

Basically nobody really knows what the intended purpose or meaning of 转注 was. This is because, although the Wikipedia article mentions 考 and 老, it fails to mention that this pair is the only commonly agreed upon instance of 转注 in the entire Shuowen Jiezi.

Xu Shen (the author of the Shuowen Jiezi) writes, "转注 consists of those characters forming a single category with the same radical. The same meaning is mutually given among them. 考 and 老 are examples." (转注者建类一首同意相受考老是也)。Indeed if you go to the entries of 考 and 老 they simply refer you to the other.

That is the entirety of what we know about 转注 (barring later interpretations by other scholars trying to make sense of that). No other character is listed as 转注 in the Shuowen Jiezi's 10516 catalogued characters (to be clear not every character is necessarily categorized by Xu Shen among them, many are categorized by later commentaries).

As a result scholars mostly ignore 转注.


but how would 许慎 categorize biang biang mian

[0] https://en.wikipedia.org/wiki/Biangbiang_noodles


That would fall in the rarer category of 虚假字, or bogus characters.


The usual classification is to call it a 会意字 because its structure is popularly given various semantic-based explanations.


A few that I love are:

- 森 (forest) not only looks like a forest, but is a bunch of 木 (tree/wood).

- 众 is a crowd of people, and 人 is person.

- 火 is fire, and 焱 is flame or extremely hot.

- 石 is stone, and 磊 is many stones or great pile of rock.

- 水 is water, 淼 describes a larger body of water.


In addition to that, Chinese does not like single characters as the pronunciation is not unique enough so words are usually made up of 2 characters.

So for 'forest' you get 森林, just to be clear that this is a place of many trees...


- 凸 convex

- 凹 concave

- 凸凹 bumpy

- 串 skewer, 心 heart、患 ill/disease

- 王 king, 玉 ball, 宝 treasure. The King's balls are his treasure

- 男 man, 女 woman, 嬲 ridicule/tease

- 敬 respect, 馬 horse, 驚 surprise (it's surprising to respect a horse)


Many of the explanations you listed are incorrect, such as those for 宝 and 驚.


They are correct for Japanese, probably not so much for Chinese.


Please provide some sources for your claim. I don't follow your logic on how 宝 means treasure in Japanese because "The King's balls are his treasure", and not because the character and its meaning were adopted from Chinese.


Sorry, I didn't mean to suggest that "ball under roof" means "the king's balls are his treasure". Rather it's just an easy way to remember the characters. I have no idea of the history of those characters, but it is funny that 王 means king and 玉 means ball and just happens to look like king with a mark on the lower half suggesting the king's balls, and that "ball under roof" 宝 means treasure.


Also interesting is that 男 (man) is a combination of 田 (rice field) and 力 (power or strength)


Also check out 女 and 姦 for a slightly different story.


姦 always cracked me up.



It means rape.


Not necessarily, eg the famous Japanese saying 女三人寄れば姦しい (three women will be noisy). Most Chinese characters are not limited to a single meaning.


I know, I have a master's degree in Japanese and I work with dictionaries daily.

The thing is 姦 is first and foremost associated with sex, in particular the bad kind of sex. It’s obvious from the compounds listed by Weblio: adultery 姦通, rape 強姦, incest 血族姦通. Now in Pleco (Chinese dictionary): to fornicate, adultery, rape.

So do your homework and stop spreading bullshit on the web: if the average Japanese or Chinese person stumbles upon this character outside the very restricted contexts where it can mean something else, their first idea will be very bad. And in this case, in the absence of okurigana, it’s clearly not one of those contexts.

Edit: I queried the BCCWJ Corpus with Shonagon, and there is not. a. single. use of 姦 as 姦しい over 673 results. Everything there is is compounds like 姦通、強姦罪、視姦者、姦淫、姦婦.


My friend, you're the one who asserted "it means rape", and all I'm saying is that it's more complicated than that. For example, 姦通 may be morally "bad sex", but it's consensual.

But I will readily grant that none of these meanings will be obvious from just seeing a character composed of three women.


No, I’m asserting that the first meaning people will associate with that kanji is far away from the "cute" example the internet loves to speak about, since it has mainly very negative connotations (adultery is a crime in Japan). Dick also has a lot of meanings, yet the science fiction author would likely rank quite low if people were asked what they think about when reading this word.

PS: we are not friends.


> PS: we are not friends.

For sure you're rude, arrogant and you don't have good manners.


For a short fun movie about pictographs for learning Chinese, watch this (https://www.youtube.com/watch?v=aqyh1nTCf9M)



I also like 大 (big) which is a person holding their arms out wide.

As a non-reader and non-speaker of any Asian language, some sounded examples of the last category might've helped. Or not. I have no idea.


Here's an example of a series of phono-semantic characters that's fairly easy to pronounce for English speakers.

成, 誠, 城 are all pronounced cheng, so we say that 成 is the phonetic component that the other two characters use to denote their own pronunciation.

言 and 土 on the other two characters then are semantic components that hint at the meaning of the character.

言 indicates speech/language and 誠 can be interpreted as "true to one's word," i.e. honest/straightforward/earnest/sincere.

土 indicates earth and by extension that which is built and 城 is in fact a kind of structure, namely "city walls" (later also just "city").

There's some subtlety here where the pronunciation rule doesn't always hold exactly, e.g. 盛 is pronounced sheng, and sometimes not at all, e.g. 兌, 說, and 悅 are all pronounced very differently from one another. This is due to a variety of hysterical raisins.
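The decomposition described above can be sketched as a small lookup table. This is only an illustration with hand-entered toy data, not a real character database; the semantic component 皿 listed for 盛 is not mentioned above and is added here as an assumption, and the function names are hypothetical.

```python
from collections import defaultdict

# Toy data: character -> (semantic component, phonetic component, toneless reading).
# 成 has no semantic component here because it is the shared phonetic itself.
# The 皿 entry for 盛 is an addition for illustration, not taken from the comment above.
CHARACTERS = {
    "成": (None, "成", "cheng"),
    "誠": ("言", "成", "cheng"),
    "城": ("土", "成", "cheng"),
    "盛": ("皿", "成", "sheng"),
}


def group_by_phonetic(table):
    """Collect the characters that share the same phonetic component."""
    groups = defaultdict(list)
    for char, (_semantic, phonetic, _reading) in table.items():
        groups[phonetic].append(char)
    return dict(groups)


def phonetic_hint_is_exact(char, table):
    """True if the character is read exactly like its phonetic component."""
    _semantic, phonetic, reading = table[char]
    return reading == table[phonetic][2]


if __name__ == "__main__":
    print(group_by_phonetic(CHARACTERS))
    for char in CHARACTERS:
        print(char, phonetic_hint_is_exact(char, CHARACTERS))
```

Here 誠 and 城 match their phonetic component exactly, while 盛 only comes close, which is the kind of imperfect hint mentioned above.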


Thanks, though I had to zoom in like an old man to see all those details :)

If the spoken sound is the same, I assume the meaning comes from context. Are the written adornments therefore redundant? Is it much like English how we now have knight and night?


In spoken Mandarin Chinese you can normally derive the meaning from context but in Classical Chinese (or in fact anything written more formally) it's not necessarily so straightforward.

There is a poem that illustrates it, where each syllable is pronounced exactly the same, /shi/ just with different tones: https://en.wikipedia.org/wiki/Lion-Eating_Poet_in_the_Stone_... (施氏食獅史). It goes like this:

    « Shī Shì shí shī shǐ »
    Shíshì shīshì Shī Shì, shì shī, shì shí shí shī.
    Shì shíshí shì shì shì shī.
    Shí shí, shì shí shī shì shì.
    [...]
Good luck trying to figure out the meaning without the "written adornments."
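To make the point concrete, here is a rough Python sketch that takes the three quoted lines, strips the tone marks, and checks that every syllable collapses to the same toneless "shi". The helper names are made up for this example, and the syllable regex only works because this particular text contains nothing but "shi" syllables.

```python
import re
import unicodedata

# The three lines of the poem quoted above (pinyin with tone marks).
POEM_LINES = [
    "Shíshì shīshì Shī Shì, shì shī, shì shí shí shī.",
    "Shì shíshí shì shì shì shī.",
    "Shí shí, shì shí shī shì shì.",
]

# Normalise the pattern too, so it matches precomposed accented vowels.
SYLLABLE = re.compile(unicodedata.normalize("NFC", "sh[iīíǐì]"))


def strip_tones(text):
    """Remove tone diacritics by decomposing and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def syllables(text):
    """Split a line into its toned 'shi' syllables (assumes only 'shi' occurs)."""
    composed = unicodedata.normalize("NFC", text).lower()
    return SYLLABLE.findall(composed)


toned = [s for line in POEM_LINES for s in syllables(line)]
toneless = [strip_tones(s) for s in toned]

if __name__ == "__main__":
    print(len(toned), "syllables")
    print("distinct with tones:   ", sorted(set(toned)))
    print("distinct without tones:", sorted(set(toneless)))
```

Even with tones kept, only a handful of distinct syllables carry the whole passage, so a listener has very little to work with.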


Yes I've seen that poem before.

According to google translate though, the tones on the parent example are identical. So context would be all they have to go by unless there is an aspect I'm missing as a non-speaker.

It's probably the case that reading is just harder for us to process than listening in general which is why jokes like http://guidetogrammar.org/grammar/twain.htm exist. We hold onto a lot of redundancy too in the English world.


Amusingly that link actually undersells how English orthography fails to capture English pronunciation and even its final sentence has many pronunciation "mistakes" by its own standard.

> Fainali, xen, aafte sam 20 iers ov orxogrefkl riform, wi wud hev a lojikl, kohirnt speling in ius xrewawt xe Ingliy-spiking werld.

The first "a" in "Fainali" is not pronounced the same as the second "a" in "Fainali" (e.g. finally does not rhyme with rally and rawly, the correspondence between the latter two depending on your own regional accent). The "x" in "xen" and "xe" is not the same sound as the "x" in "xrewawt" or in "orxogrefkl." The "o" in "lojikl" is not the same "o" as in "kohirnt." The "e" in "xe" is not the same "e" as in "xen" and is not the same "e" as in "orxogrefkl." And more.

And that's not even touching stuff that has heavy regional variation such as the "e" in "aafte" and the "i" in "kohirnt."


Formal written Chinese is very different from the spoken form, essentially to the point it could be considered another language altogether. This situation isn't really comparable to anything in English.

In spoken Mandarin, lexical units are generally formed of multiple-character compounds (typically 2 to 4 characters). In such a scenario, the character being used can be figured out just from the context. If such an utterance is transcribed phonetically and written down, the same still holds. So your conjecture is correct to the extent it pertains to the colloquial language.

However, the same cannot be said for the formal written language, which is based on Classical Chinese, and much more succinct: here, each character is often an independent lexical unit. At the same time, numerous more obscure characters come into play and annoyingly many of them tend to be homophones. In particular, in Mandarin, as a result of the phonetic processes it underwent, there are a lot of characters pronounced either /yi/ or /shi/.

At this point, figuring out the meaning of what is being said tends to become cumbersome or downright impossible (as in the case of the poem), unless the actual written characters can be seen, which is usually not an issue as this form of the language is used predominantly in writing (with the exception of a huge number of idiomatic expressions that educated people are expected to know by heart and be accustomed to).

And it is this latter form of Chinese that is the language of history, culture, but also anything more sophisticated in general, such as any better news publications. So while a basic, dumbed-down version of Mandarin would survive even when simplified even further, perhaps to the point of being written only phonetically, I doubt any educated Chinese would willingly sign up for it (particularly as there is no incentive to do so since everybody learned the language as a child anyway, and for those who did it well, it is a marker of status).

If you want the Chinese to ditch their "written adornments," you could just as well be telling them to switch to Esperanto.


I'm curious what the poem sounds like with proper pronunciation.



I am sure there are tongue twisters in English that can be used to explain the interpretability of the language /s


Yes it comes from "context," but that's always true right? How the spoken language is understood is in a sense completely orthogonal to the written language for any language. E.g. the English spoken language has no way of distinguishing between the word "night" having two meanings, one being the opposite of "day," the other being a warrior and two words "night" and "knight." That is there is no real distinction between a single word with multiple meanings and multiple words with the same pronunciation at the spoken language level. Think about all the various definitions of the word "get" which also all require context to disambiguate in the spoken language even though there is no "written adornment."

That is to a large extent the "meaning" of the orthography is irrelevant to the spoken language.

I could come up with an alternative highly differentiating orthography for e.g. French without considering its spoken language either. I could mandate that "suis" as in "être" is not spelled "suis" but rather "swi" and "suis" as in "suivre" is still spelled "suis," but that has absolutely no impact on the spoken language. There is no difference to a speaker between a world where we have "swi" and "suis" and a world where we have two meanings of "suis."

Separately, it's also important to realize that Chinese characters don't always correspond to words in the same way that happens in English. From an English perspective, it is easier to think of Chinese characters in modern Mandarin as individual morphemes [0], e.g. stuff like prefixes and suffixes, that are also sometimes standalone words, rather than always entire words themselves.

So for example 誠 is not a standalone word in the same way "hono-" (e.g. "honesty," "honor," etc.) is not a standalone word in English, even though it is a recognizable prefix that means something.

So when spoken, speakers rely on a combination of audio and context cues to disambiguate the meaning and how word segmentation is supposed to be done (in the same way that English speakers both are able to distinguish between "knight" and "night" as well as "prototype" and the non-existent words "pro totype"). Note that this is subtly different from "disambiguating among characters." A less literate Chinese speaker might think that in fact 成, 誠, 城 are all a single character 成 that can be used in different ways when used as prefixes/suffixes in different words and we would have no way of knowing that without testing the speaker's writing capabilities.
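As a loose illustration of how the neighbouring syllable resolves which "cheng" is meant, here is a toy greedy longest-match segmenter in Python. It is a standard textbook technique used here only as an analogy, not a claim about how listeners actually process speech; the three-word lexicon (成功 "success", 誠實 "honest", 城市 "city") and the function name are assumptions made up for the example.

```python
# Toy lexicon keyed by toned pinyin syllables; three common two-character words
# chosen for illustration, each starting with a different "chéng" character.
LEXICON = {
    ("chéng", "gōng"): "成功",  # success
    ("chéng", "shí"): "誠實",   # honest
    ("chéng", "shì"): "城市",   # city
}


def segment(syllables, lexicon, max_len=2):
    """Greedy longest-match segmentation of a syllable list into known words.

    Syllables that do not start any known word are passed through unchanged.
    """
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(syllables) - i), 0, -1):
            key = tuple(syllables[i:i + size])
            if key in lexicon:
                words.append(lexicon[key])
                i += size
                break
        else:
            # No word found: keep the bare syllable, which stays ambiguous.
            words.append(syllables[i])
            i += 1
    return words


if __name__ == "__main__":
    print(segment(["chéng", "shì", "chéng", "gōng"], LEXICON))
    print(segment(["chéng"], LEXICON))
```

A lone "chéng" cannot be mapped to any character in this sketch; it only resolves once the following syllable identifies the word, which mirrors the distinction between disambiguating words and disambiguating characters.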

Classical Chinese, the written, formal Chinese language from ~2500 years ago to ~100 years ago, however, where individual characters can almost always be used as standalone words, does present an interesting challenge when pronounced with a modern Mandarin pronunciation due to the high number of homophones in modern Mandarin (among characters but not words!). It can be very difficult to understand long tracts of Classical Chinese if it's spoken aloud using modern Mandarin.

This makes Classical Chinese essentially a written-only language. And in fact other systems for pronouncing Classical Chinese exist in countries that used it for writing but did not use Chinese for speaking (e.g. Japan, Vietnam, and Korea). This led to an interesting situation where literate individuals from those countries could communicate in writing with one another but not verbally.

[0]: This is a simplification. Despite many articles to the contrary, it is not true that all Chinese characters are morphemes. There are some rare cases of multi-character words whose constituent characters have no individual meaning and cannot be used outside of that single word.


I love how elephant means 'resemble' uses symbol for 'family' (家) upper radical 'knife' (刀)

Elephant in the (里的大象)

亻人 (r)en

氵水 shui

手扌 shou (of hands)

忄心 xin

犭犬 quan 豬 zhu

刀刂 dao

       到 until
       象 elephant 

不客气 bu ke qi [you are] welcome。


龘 lots of dragons


𱁬 lots of dragons and clouds.


Haha, that makes me laugh


I'm not certain about some of these. Change (化) is a combograph? Its meaning derives from an upside-down person and right-side-up person, as is (仲) relationship. 母 is a pictograph, although rotated 90 degrees.


I'd classify 化 as a simple ideogram (part of group 2, called "indicators" in the article) because it conveys the abstract idea of 'change' by showing one person upside down from the other. It isn't a compound ("combograph", group 3) to me because it doesn't build upon the meaning of its individual components.

However, this is a fuzzy categorization. In the top-level comment I wrote, I listed 三 'three' as an example of an ideogram but it could be argued instead that it is a compound of three separate 一 'one' characters. Then there's the character 廿 'twenty', which consists of two 十 'ten' characters, that is probably better explained as a compound. So at a certain point it all becomes quite subjective.


Yeah I'd also class 足 as a pictograph.

It is a category error to put something like 足 (foot) or 母 (mother) as a 指事字 (what the article calls an indicator). 指事字 are abstract things, like "up" (上), rather than concrete nouns.


Absolutely. 足 is an augmentation of 止, which itself originated as a drawing of a foot. 母 is a picture of female breasts. Both are pictograms.


This is a really interesting read for me as a non-Japanese (non-Chinese, non-Korean). I learned that the idea of pictograms is largely misunderstood after reading "The Chinese Language: Fact and Fantasy" by John DeFrancis. It is quite a fun read to start, although it drags at the end due to my lack of knowledge.


My point is more related to kanji learners than it is kanji etymology. I understand that some things have etymological meaning, but it's much more useful to me as a learner to just see it as a pictograph. It's pretty common for learners to make a mnemonic out of the kanji and tie it to the kanji's meaning.

木 is a great intro kanji because it simply means "tree" and if you let yourself believe it looks like a tree it's easier to remember. You can start playing with combos like 林 meaning "woods" and 森 meaning "forest".

相 is a combination of 木 and 目, tree and eye. The article pinpoints its meaning as "physiognomy". 相 is an extremely common kanji. My translations may be a little awkward, but these words are more frequent in Japanese than my translations might suggest:

相手 Tree-eye, hand: "the person you're talking to"

相談 Tree-eye, conversation: "consultation, getting advice from someone"

首相 Neck, tree-eye: "Prime Minister"

相撲 Tree-eye, slap: "Sumo wrestling"

It's really hard to pin down the meaning of this particular kanji. Eventually I realized I was missing the 森 for the 木. "Physiognomy" is basically meaningless to me as an English speaker/Japanese learner because now I have to work to remember the English meaning as well. I decided to stop thinking of this kanji as "physiognomy" and start thinking of it as "tree-eye". Tackling large problems systematically is exciting for me but I have to remind myself "making mnemonics for each kanji" is not the goal, "being able to enjoy myself while immersing in Japanese text" is.


I learned 相 as 'mutual'. That seems to work for all the listed examples (a stretch for prime minister but I can still see it working) and most of the other words that contain it from a cursory look.


I wonder if the distribution in categories is different to that of Chinese hanzi.


There are no such categories in Chinese hanzi, or at least they are not generally taught.


What? This is totally a thing in Chinese, and at least mentioned in primary school/ middle school.

https://zh.wikipedia.org/wiki/%E5%85%AD%E6%9B%B8 or https://en.wikipedia.org/wiki/Chinese_character_classificati...

https://baike.baidu.com/item/%E5%BD%A2%E5%A3%B0%E5%AD%97

https://baike.baidu.com/item/%E4%BC%9A%E6%84%8F%E5%AD%97

Also, for what it's worth, I don't think this is generally taught in Japan or in Japanese language classes either.


Sorry for my ignorance. I was just speaking from experience; I wasn't personally taught these in the Chinese education system.


But you've probably heard of 形声字 and 象形字 right?


Actually yes, but not 指事文字 and 会意文字


Yeah that's usually what ends up happening.


These categories originate from Chinese. Specifically these are a subset of the 六书 which are not taught generally in the Chinese educational system (but are common enough that most people will have heard of some of these categories), but will be taught at any undergraduate level Classical Chinese course at a Chinese university (which is basically a prerequisite for a lot of literature/history-based degrees).


They go back to the first real Chinese dictionary, Shuowen jiezi "Explaining simple characters and analyzing complex characters", from the first century.

https://en.wikipedia.org/wiki/Shuowen_Jiezi



