
> the simplification is a many-to-one mapping

Simplification is a many-to-many mapping. There are many characters that were either merged in the process of standardizing traditional Chinese characters or have multiple meanings which are split by standard simplified Chinese.

Here's a very small sample of a bunch of Traditional -> Simplified examples.

乾 -> 乾 (qian2)、干 (gan1) (干 is itself a merger of multiple traditional characters)

夥 -> 夥 (huo3)、伙 (huo3)

兒 -> 儿 (er2)、兒 (ni2)

祇 -> 祇 (qi2)、只 (zhi3) (只 is also itself a merger of multiple traditional characters)
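
To make the many-to-many shape concrete, here is a minimal Python sketch that stores the sample above as a table and inverts it. The table is only the handful of pairs listed here, plus 幹 -> 干 as an extra pair I'm adding to illustrate the merger noted for 干. The names `T2S`, `S2T`, and `invert` are made up for illustration and don't come from any real conversion library.

```python
from collections import defaultdict

# Illustrative subset only: the pairs from the sample above, plus 幹 -> 干
# (an assumed extra entry showing that 干 merges several traditional characters).
T2S = {
    "乾": ["乾", "干"],
    "夥": ["夥", "伙"],
    "兒": ["儿", "兒"],
    "祇": ["祇", "只"],
    "幹": ["干"],
}

def invert(mapping):
    """Build the simplified -> traditional table from the traditional -> simplified one."""
    inverse = defaultdict(list)
    for trad, simps in mapping.items():
        for simp in simps:
            inverse[simp].append(trad)
    return dict(inverse)

S2T = invert(T2S)

# Traditional characters that split into more than one simplified character.
splits = {t for t, simps in T2S.items() if len(simps) > 1}
# Simplified characters that merge more than one traditional character.
merges = {s for s, trads in S2T.items() if len(trads) > 1}

print("splits (traditional -> many simplified):", sorted(splits))
print("merges (simplified -> many traditional):", sorted(merges))
```

Both sets come out non-empty even on this tiny table, which is all "many-to-many" means here. It says nothing about how common each direction is in real text.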

> Try for example 發. I bet it's not going to be so straightforward.

I'd be pretty shocked if any literate Chinese speaker in the PRC did not recognize that character.

I think in general you're overestimating the difficulty of learning traditional characters when educated in a system using solely simplified characters.

PRC undergraduate courses will sometimes just flat-out assume knowledge of traditional characters. Indeed I have a textbook from a history of Chinese medicine class that's entirely in traditional characters. The professor (not unreasonably) just assumed we all already knew traditional characters or would pick it up in a week or two. All subsequent course material (e.g. test materials) was also in traditional characters, although we were allowed to write in simplified characters (and I think everyone in the class did so).

I'm sure there are some corner cases you could use to stump people, but PRC kids consume quite a lot of traditional Chinese material even if they're not formally taught it in school. It's more mentally taxing than just reading simplified characters, but it's still fairly straightforward.

In general it's pretty easy to pick up either character set if you're a fluent reader in the other, at least for reading. Learning how to handwrite both sets is trickier and probably requires dedicated training, but even then only on the order of weeks or at most months.

> (to be fair, many probably due to the automated conversion, and some might be an intentional pun)

I'm fairly certain the overwhelming majority are due to automated conversions.




> Simplification is a many-to-many mapping.

Technically you are correct but this is just pedantry. Of the 4 examples you listed, 3 are corner cases, and 1 is also a variant traditional character. You know way too much not to realize this (by the way, that's a compliment!), and that you couldn't come up with anything reasonably common like the 發/髮 → 发 pair (because there isn't anything comparable going the other way) indirectly proves the point I was making.

Unless at a very advanced level, a person switching from traditional to simplified can simply afford to ignore any of these relatively rarely-encountered one-to-many mappings, whereas the same cannot be said about switching from simplified to traditional. It is then one extra thing to study if going one direction but not the other. That might be good to know for somebody who is actually making that decision, although in reality it's probably moot, since people will learn either one or the other first depending on their individual circumstances.

> I'd be pretty shocked if any literate Chinese speaker in the PRC did not recognize that character.

That might very well be true but is beside the point.

The claim in the parent post was that somebody who only ever learned simplified characters would be able to read traditional characters "on the spot" having never seen them before, to which I provided this counterexample of a commonly-encountered traditional character that will not be recognized without context, unless the person is already familiar with it.

All I'm saying is nobody is likely to figure out this character seeing it for the first time without context. (For the record, this applies both ways, just the example would have to be different.)

> I'm fairly certain the overwhelming majority are due to automated conversions.

But in that case it also says something that the software didn't handle the conversion well (implying that perhaps whoever wrote it didn't know about it), and that people posting this stuff also didn't correct the obvious mistake (suggesting they didn't know either).

*

You wrote a lot in response, and I generally don't disagree, I just don't see how most of it relates to my post. For the record, I'm not claiming it's difficult or easy going one way or another. In fact I was only pointing out the following 3 things:

1. If you only learned one set (whether simplified or traditional), you don't know the other automatically.

2. If people can read characters in context, they are likely just guessing - nothing wrong with it by the way, that's how people read in any language - but it doesn't mean they would really "know" the same characters in isolation.

3. Some common, unrelated traditional characters were merged to become the same simplified character. This is an extra layer of difficulty if you don't already know which these are (as you would if you learned the traditional characters first). (I concede it's also the case the other way, but in my opinion none of it is common enough to worry about at this level of granularity, so the burden is not comparable.)

I'm getting the impression you're looking for something to disagree with among the things I wrote, and I'm sure you'll be able to find it, but I'd rather what I wrote be helpful even at the cost of having to simplify things a bit, and thus not being rigorously correct.

In other words, I'm trying to describe the forest, not the individual trees. My audience is the people who are looking at the forest from a distance: if someone is already in the forest, they either know what's inside or can look around on their own. That the forest insiders won't benefit from my description is to be expected. I hope you can recognize that.


Oof well that's rough.

> that you couldn't come up with anything reasonably common like the 發/髮 → 发 pair (because there isn't anything comparable going the other way) indirectly proves the point I was making.

Nah I just wasn't thinking straight.

著 -> 著(著作)、着(说着) (this is a common automated translation error from traditional to simplified, search for e.g. "说著")

覆 -> 覆(覆盖)、复(答复) (复 then aggregates other traditional characters)

帳 -> 帐(帐号)、账(账单) (technically 賬 exists but if we're excluding 祇 as a common but non-standard variant, the same holds true for 賬)

and probably some more I'm missing again.

Though the number of common one-to-many mappings in either direction is quite small (I'd guess maybe 10ish clusters), which brings me to my second point.

> although in reality it's probably moot, since people will learn either one or the other first depending on their individual circumstances.

It's moot because it is trivial to go from one to the other (for reading! It's harder for writing) regardless of direction. You give me a native speaker who has only ever seen one character set in their whole life (which is actually quite rare) and I can have them reading slowly in the other character set by the end of the day, and fluently by the end of a week. There's a set of rules for translating between radicals and then maybe in the ballpark of 100 exceptions (in either direction) on top of that and that gets you effectively full fluency.

> But in that case it also says something that the software didn't handle the conversion well (implying that perhaps whoever wrote it didn't know about it), and that people posting this stuff also didn't correct the obvious mistake (suggesting they didn't know either).

The automated translation error is usually an artifact of people being lazy and not checking the translation (see e.g. my 著 example). The software doesn't get it right because differentiating the different cases is a hard NLP problem but an easy human-solvable one, so the investment isn't usually made. But probably part of it too is that this is starting to get into writing in a different character set, which is harder than just reading it.
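
As a rough illustration of why the 著 case goes wrong, here is a minimal Python sketch of a per-character converter next to one that checks a phrase table first. Both tables are hypothetical two-or-three-entry toys I made up for this comment, not the data or API of any real converter, and a phrase table only covers cases someone has listed in advance, which is why the general problem stays hard.

```python
# Hypothetical per-character table. 著 is left as 著, which is the right
# choice for 著作 but the wrong one for 說著.
CHAR_TABLE = {"說": "说", "著": "著"}

# Hypothetical phrase overrides, consulted before the per-character table.
PHRASE_TABLE = {"說著": "说着"}

def convert_naive(text):
    """Convert one character at a time, with no look at the neighbours."""
    return "".join(CHAR_TABLE.get(ch, ch) for ch in text)

def convert_with_phrases(text):
    """Try a phrase override at each position, else fall back to per-character."""
    out = []
    i = 0
    while i < len(text):
        for phrase, replacement in PHRASE_TABLE.items():
            if text.startswith(phrase, i):
                out.append(replacement)
                i += len(phrase)
                break
        else:
            # No phrase matched at position i: convert the single character.
            out.append(CHAR_TABLE.get(text[i], text[i]))
            i += 1
    return "".join(out)

print(convert_naive("說著"))         # per-character result keeps 著
print(convert_with_phrases("說著"))  # phrase override picks 着
print(convert_with_phrases("著作"))  # no override applies, 著 is kept
```

With more than one override in play you'd want longest-match-first rather than the arbitrary dictionary order used in this toy.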

But to get back up out of the weeds, I was mainly reacting to support mrslave's assertion that

> The Chinese I have asked who learnt Simplified Chinese characters in school attest to being able to read Traditional Chinese characters with very little additional effort

which is true and holds up even if you strip context as you mention.

> If you really wanted to know, you'd have to show them the characters in isolation. Try for example 發. I bet it's not going to be so straightforward.

I've actually played this game with friends (all PRC speakers) once before, i.e. quizzing each other on individual traditional characters. There was only one that threw even one of us for a loop (叢), and it would've been obvious given context. Everyone got everything else out of maybe 30 irregular characters (i.e. ones that don't just follow standard simplified radicals).

I think "on the spot" was supposed to mean "without having seen the character before but seeing its context." I don't think mrslave meant literally ab initio, but I dunno, at that point I'm putting words in mrslave's mouth.

While I agree with your three points in the second half of your reply, I don't think that was the initial thrust of your parent comment.

I interpreted your initial comment to mean that switching from simplified characters to traditional characters is both non-trivial and harder than going from traditional to simplified. Moreover there was the added implication that PRC speakers don't generally know individual traditional characters, but are only able to guess at them when given additional contextual clues. Perhaps I'm completely distorting your claim, but it is regardless a common claim I see on English language forums, so I don't think I'm pedantically attacking a strawman.

I disagree with those points.

It is trivial (relative to most other things language-learning-related) to go from simplified characters to traditional characters (and indeed to go the opposite direction) and it is not appreciably more difficult in either direction. Moreover I think you're drastically underestimating the proficiency of the average literate PRC speaker in understanding traditional characters. Keep in mind there's still quite a lot of exposure to traditional characters despite their absence in the educational system. Signs, foreign film subtitles, decorations, trademarks, etc.: traditional characters pop up all over. There's also a ton of cultural imports from Taiwan that preserve their traditional characters. Because it's so trivial, this passive osmosis is enough to attain essentially full fluency in reading traditional characters.

I have such a strong reaction to this not because of insider nit-picking, but because I've seen so many examples of Chinese language learners hemming and hawing over whether they should go for traditional characters or simplified characters when there's effectively no difference. Fluency in one makes fluency in the other trivial, at least if you don't care about being able to handwrite both. Choose whichever has the better teachers/resources/etc. or just flip a coin. There's no advantage to one over the other. Barring your environment, the only time I could find even a shred of advantage in choosing traditional characters over simplified characters is if you're exclusively studying Classical Chinese and not any modern variety. Even then the advantage is far far less than a lot of people proclaim.


Your examples are excellent in pointing out the many-to-many mapping during the simplification process. Many people assume Chinese characters are somehow "static" and that the modern traditional characters are not themselves merged from older characters. Your last point is also very important, as one of the top questions I've come across on language learning forums is traditional vs simplified. This issue, seemingly trivial to people fluent in Chinese, often gets exaggerated and politicized to the detriment of Chinese learners. The best analogy I can think of between traditional vs simplified is akin to American English vs British English, except English learners don't make a big fuss about which version they should learn first.


> The best analogy I can think of between traditional vs simplified is akin to American English vs British English, except English learners don't make a big fuss about which version they should learn first.

There isn't really much of an analogy here, and on a general note, trying to approach a new problem by seeking an analogy to something we already know isn't really the best way to gain insight: simply put, not everything we are about to see is going to be like something we have already seen, or even close to it.

With that in mind, if you really wanted to compare this to the varieties of English, I can offer this analogy:

The word "curb" in American English has a number of meanings related to 'restraint', or it can also mean 'a raised edge of pavement'.

However, in British English, when used to mean the latter, the word is spelled "kerb."

So, if you are familiar with British English and want to learn American English, all you have to do is remember to always spell it "curb" and never "kerb."

However, if you are going the other way, you have to remember the new spelling of "kerb," but also only to use it with a certain meaning, and not the others.

In this example, going from British to American English is easier than the other way round, just as going from traditional to simplified characters is easier than the opposite.

Yet another example could relate to the distinction between "shall" and "will," which some people make, and many (if not most) don't. If you are used to making the distinction, and want to stop (go "simplified"), it is effortless: all you have to do is to start using "will" all the time. However, going the other way you'd have to learn the rules of using it, or face the risk of drowning:

(a) "I shall drown, no one will save me!"

(b) "I will drown, no one shall save me!"

Which one do you choose to be rescued?

Answer here: https://en.wikipedia.org/wiki/Shall_and_will#Uses_of_shall_a...


> There isn't really much of an analogy here, and on a general note, trying to approach a new problem by seeking an analogy to something we already know isn't really the best way to gain insight: simply put, not everything we are about to see is going to be like something we have already seen, or even close to it.

My analogy isn't meant for people fluent in both Chinese and English who understand the nuances between variants of languages. The analogy is much more informative for people deciding to learn Chinese than any of your 發/髮 → 发 examples and analysis. Clearly there are distinctions between traditional Chinese and simplified Chinese, and your earlier posts contain many valid examples and analysis, but they are suited to a Chinese etymology audience rather than the general English-speaking audience who are curious about Chinese. FWIW I'd teach my kids traditional Chinese, but overexaggerating the difference between the two variants is not helpful for Chinese learners.


> overexaggerating the difference between the two variants is not helpful for Chinese learners

I wholeheartedly agree. In the grand scheme of learning Chinese, this is a relatively minor consideration. It wasn't my intention to play it up (and in fact I don't think I did), it's just how the discussion progressed from the original comment.


> I don't think I'm pedantically attacking a strawman.

Unfortunately I think you are. Note I never wrote anything about the "PRC" that you keep bringing up all the time. It's almost as if you were here to defend the feelings of "PRC speakers" from the perceived slight that I failed to give them enough recognition.

I stand by my opinion that it's easier to switch from traditional to simplified than the other way round. This isn't really controversial, it's well-established in literature, and follows directly from the fact that simplification was a lossy process: to claim otherwise is tantamount to stating that there wasn't really any simplification.

However, there was a simplification as a matter of fact, and it did make things simpler as it was supposed to. It was designed to make it relatively easy for everyone to switch from traditional but not necessarily to go the other way. That there are even so many examples you can provide to the contrary is itself a demonstration of some of its failures, which are also well-known and described, but the fact remains that the issues going the intended way are few and far between, and just not comparable to the fact that as part of the simplification, dozens of common characters such as 隻, 製, 劃, and 錶 were blended into semantically unrelated ones: an extra hurdle you have to deal with when going back.

I could provide more examples, or even make an exhaustive comparison but that would be unlikely to change your opinion, since you are now arguing not against what I said but against some "common claim[s] [...] on English language forums," whatever that is even supposed to mean, that you decided to associate with me. I'm sorry to say that due to this I can no longer see it as a discussion in good faith, if it ever was one to begin with.

You are entitled to your opinion of course, as I am to mine: if anyone is serious about learning Chinese beyond the basic communication skills, I would advise them to start with traditional characters. If this happens to hurt the feelings of (some) "PRC speakers," so be it.

Let's agree to disagree.


I have only ever had two main points in this whole thread:

1. Simplified characters and traditional characters share a many-to-many relationship, not a one-to-many one

2. It is trivial (in reading) to go from simplified characters to traditional characters or vice versa. There is no significant difficulty difference in either direction.

My talk about the PRC speakers is not "to defend PRC speakers." It is simply some evidence of the latter. You have an entire country of people who can read traditional characters without any formal training in them (even when presented with single characters isolated from surrounding context). This is fairly strong circumstantial evidence that the barrier between the two is fairly trivial.

> I stand by my opinion that it's easier to switch from traditional to simplified than the other way round. This isn't really controversial, it's well-established in literature

Do you have examples from the literature of this? I don't know of any and I've watched this space (not particularly carefully, so I may have missed things, but I've read a fair number of papers). I know of results demonstrating the difference between how native simplified readers and native traditional readers break down characters (holistically vs analytically), but know of no other results. Even arguing from first principles is difficult. Even given a one-to-many relationship, going from traditional to simplified makes writing easier (mindless substitution works) and reading harder, while going the opposite way makes reading easier (again mindless substitution works, now in the opposite direction) and writing harder. It's not obvious which direction is easier overall.

> to claim otherwise is tantamount to stating that there wasn't really any simplification.

This is not true. The vast vast majority of the simplifications are in simplifying character stroke counts, not in merging characters in any direction.

> It was designed to make it relatively easy for everyone to switch from traditional but not necessarily to go the other way.

This is also not true. This has never been a stated goal of any of the various attempts at simplification (《減省漢字筆畫的提議》, 《第一批簡體字表》, 《漢字簡化方案草案》, or the 《简化字总表》). More than anything their stated audience (when mentioned) has been illiterate speakers who are unfamiliar with any character set. The only reason that most of the characters look familiar is that one of their principles was to as much as possible not introduce new characters and only codify existing characters or character components (so simplified characters ended up relying a lot on variant characters and cursive script), since the point was to remove characters rather than add them. It's important to note this is the same goal the ROC had when creating their set of 教育部標準字體, what is usually taken as the body of "traditional characters" today. The main difference is that the ROC chose to use the most popular characters while the PRC chose to use the simplest characters. This was mainly chosen to avoid the thorny methodological issue of how to create characters wholly from scratch (which is what the now-discarded second round of simplifications did), not for ease of switching itself among the literate class, since the majority of people learning the character sets would be learning how to read for the first time.

To take a step back, I think you've been quite uncharitable to me in this discussion thread and keep implying motives I do not have.

And I understand you think I've been quite uncharitable to you. As far as I can tell you think that I'm essentially trying to show off some smarts here, and that I both derive pleasure from pedantically nitpicking where you're wrong and am clearly ignoring the overall thrust of your point. I imagine your internal dialogue to be something like the following:

"Come on, simplified characters and traditional characters basically share a one-to-many relationship. Everyone knows this. And now you latch on to some throwaway sentence I've written about whether people would recognize a character and turn this into some weird impassioned defense of 'PRC speakers' which I never was even talking about in the first place! Clearly you're just trying to pick a fight here."

But I urge you also to view things from my perspective. I mean clearly I think you're misunderstanding me (why wouldn't I of course!).

My internal monologue goes something like:

"Look I was just making some corrections to some oft-repeated inaccuracies, now this person's accusing me of pedantry, okay now I've presented non-pedantic examples, now they think I'm defending 'PRC speakers' and somehow this is now a discussion about whether simplified characters have failed at their stated goals? Why do they keep ignoring the main points I'm making and mixing up my examples for my main points?"

From my perspective I've been frustrated in that essentially every comment I've responded to has consisted of you just ignoring my main points, blowing up some side comments I've made completely out of proportion and chalking up my very real counterexamples as "pedantry."

However, I also think it is very easy to go down this rabbit hole and dig our heels deeper, so I want to offer an olive branch and just cut to the heart of our disagreement.

I think we're both saying things that are stronger than what we intend and reading into each other's responses things that are not there. You think that traditional characters are the way to go and provide a smoother transition to simplified. I don't think so. I think traditional or simplified characters are both equally effective as a starting point and you can trivially transition from one to the other (I partially speak of this from personal experience as well, from when I took up reading and writing Classical Chinese). You say this is a well-known result in the literature. I earnestly look forward to any examples you might have in the literature.

As for the original thing that sparked all this, the relationship between simplified and traditional characters and whether it's one-to-many or many-to-many: I want to emphasize it is certainly true that there are more one-to-many simplified-to-traditional examples than in the opposite direction. However, the opposite direction is still a fairly significant minority. The proportion breaks down to something like 75%-25%. The majority is large, but not overwhelming, and the minority is not small enough that I would call pointing it out pedantic.



