
I am not familiar with many languages, but something distinctive about the Korean "alphabet" is that its letters combine to form a single character (a syllable block).

This makes sentences more compact in appearance, but it also makes fonts harder to create, since a single letter looks slightly different depending on which letters it is combined with. (For example, Gulim font, a standard sans-serif font, contains 49,284 glyphs according to this Wikipedia article: http://en.wikipedia.org/wiki/New_Gulim.)




Devanagari (the script which is used to write Hindi and several other Indic languages[1]) is very similar in this respect. A fairly modest number of consonant characters are combined with vowel indicators ("matras"), which are then augmented with various diacritical flourishes to indicate non-plosive quasi-consonant sounds (what do you actually call those?) such as trailing "R"s and "N"s, vowel nasalisations, etc. Just to make life interesting, consonants can be conjoined to indicate glides between them. There are 1,296 possible permutations of conjoins, and when you consider all the possible permutations of vowel matras and diacriticals, the glyph library can get a bit crazy. Not dissimilar from Hangul, except that the Indic languages have vastly more phonemes than Korean does.

Having picked up Hindi/Devanagari more or less by osmosis, I find that this profusion of glyphs means that my brain processes them differently than it does other alphabets (I also read Cyrillic and the Japanese Kanas). The various strokes seem to take on a "feeling" of associated pronunciation that is both intuitive and very, very precise (since every possible aspect of pronunciation is captured by the script). So when I run across bizarrely conjoined characters (like this[2]), I immediately have an intuitive and generally accurate sense of what sound they ought to produce, even if it takes me quite a while to puzzle out exactly why. This is almost the exact opposite of English, for example, where figuring out the characters is the easy part, while figuring out the sound they create is sometimes impossible.

[1]: http://en.wikipedia.org/wiki/Devanagari

[2]: http://upload.wikimedia.org/wikipedia/commons/b/b9/JanaSansk...


According to the web page, New Gulim includes all of CJK. For Koreans this means Hanja: thousands of Chinese characters that haven't really been used to write Korean since the 19th century.

The actual Hangul (Korean syllable block) characters require about 30 consonants/clusters and 21 vowels/diphthongs, which can appear in characters in 9 different layouts. It's much more work than the 26 Latin characters, but most of the font design is going to be copy/paste.

So you could say the purpose of New Gulim is not to be a "Korean font" in the sense of a font you can write Korean with, but rather in the sense that Koreans can use it to write Korean mixed with Latin, Chinese, Japanese, Cyrillic, and Greek characters with a unified design across scripts. That's a tall order.


Oops, I forgot about the other character sets. But I think properly designed fonts require a lot more than what you mentioned. According to this article, for Windows, it requires 11,172 characters. (Second paragraph in http://en.wikipedia.org/wiki/Korean_language_and_computers#C...)
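
For what it's worth, the 11,172 figure falls out of simple arithmetic. Here is a quick Python check, assuming the standard Unicode counts for modern Hangul (19 initial consonants, 21 vowels, and 27 final consonants/clusters plus the "no final" case). Those counts and the block boundaries are my assumptions from the Unicode standard, not numbers taken from the linked article.

```python
# Counts of modern Hangul jamo used by Unicode's precomposed syllables.
# (Assumption: standard Unicode figures, not numbers from this thread.)
INITIALS = 19      # leading consonants
VOWELS = 21        # medial vowels and diphthongs
FINALS = 27 + 1    # trailing consonants/clusters, plus "no final"

# Every (initial, vowel, final) combination gets its own syllable block.
syllable_count = INITIALS * VOWELS * FINALS

# The Hangul Syllables block runs from U+AC00 to U+D7A3 inclusive.
block_size = 0xD7A3 - 0xAC00 + 1

print(syllable_count, block_size)
```

Both numbers come out the same, so the precomposed block has exactly one code point per possible combination. A font that covers the whole block needs at least that many character mappings, however many distinct outlines it draws underneath.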


> It's much more work than the 26 Latin characters, but most of the font design is going to be copy/paste.

Not quite. While the minimal Hangul font design requires two sets of initial consonants, one set of vowels and two sets of final consonants (often called 2x1x2), the result would look very quirky. Many bitmap fonts used the 8x4x8 design or a variant of it, and the commercially available TrueType/OpenType fonts now have more than 30 sets of subtly differing glyphs. It is easier than drawing all 11,172+ glyphs, but not by much.


But you would still copy/paste to create subtly different glyphs, right? Just like I can copy and paste parts of Han characters to make 村 校 林 枚 様 機 横 (look at the left half), with slight modifications as necessary. (At a quick glance, I suspect the font I use has four variations of the 木 radical among those seven characters).


Right. Even worse, the grass radical (艸/艹) has several dozen possible glyphs.


The representation in Unicode is also interesting. I never knew there was a range of 11,172 code points for encoding the different Korean (Hangul) characters. See: http://www.uni-graz.at/~katzer/korean_hangul_unicode.html It also has a cute JavaScript demo that lets you combine letters into a syllable character.
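
The combining itself is just index arithmetic, as far as I understand it. Below is a minimal Python sketch of what a tool like that presumably does, assuming the standard Unicode layout: the block starts at U+AC00 and is ordered by initial consonant, then vowel, then final consonant. The function names `compose` and `decompose` are mine, and I haven't looked at how the linked page actually implements it.

```python
HANGUL_BASE = 0xAC00   # first precomposed syllable, "가"
INITIAL_COUNT = 19     # leading consonants
VOWEL_COUNT = 21       # medial vowels
FINAL_COUNT = 28       # 27 trailing consonants + "no final" at index 0


def compose(initial: int, vowel: int, final: int = 0) -> str:
    """Build a syllable from zero-based jamo indices (hypothetical helper)."""
    if not (0 <= initial < INITIAL_COUNT
            and 0 <= vowel < VOWEL_COUNT
            and 0 <= final < FINAL_COUNT):
        raise ValueError("jamo index out of range")
    offset = (initial * VOWEL_COUNT + vowel) * FINAL_COUNT + final
    return chr(HANGUL_BASE + offset)


def decompose(syllable: str) -> tuple[int, int, int]:
    """Recover the (initial, vowel, final) indices of a precomposed syllable."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < INITIAL_COUNT * VOWEL_COUNT * FINAL_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(offset, VOWEL_COUNT * FINAL_COUNT)
    vowel, final = divmod(rest, FINAL_COUNT)
    return initial, vowel, final


# ㅎ is initial index 18, ㅏ is vowel index 0, ㄴ is final index 4.
print(compose(18, 0, 4))
print(decompose("한"))
```

Because the mapping is a plain mixed-radix number, no lookup table is needed to go in either direction, which is probably why a small script on a web page can do it.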



