This is primarily because the legacy character set (KS X 1001) already contained tons (2,350 to be exact) of precomposed syllables. Unicode 1.0 and 1.1 had lots of syllables encoded in this way, with no good way to figure out the pattern, and in 2.0 all Hangul syllables were reallocated to a single block of 11,172 correctly [1] ordered syllables.

So yeah, Unicode is not the problem here (compatibility with existing character sets was essential for Unicode's success); it's a problem of legacy character sets :-)

[1] The ordering is only correct for South Koreans though :) but the pattern is now very regular, and computing it is much more efficient than heavy table lookups.
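
To illustrate the "very regular" part: since 2.0 the block starts at U+AC00 and is laid out as 19 leading consonants x 21 vowels x 28 trailing slots (27 consonants plus "none"), so you can split a syllable into conjoining jamo with plain arithmetic. A minimal sketch, assuming the standard base code points and counts; the constant and function names are just mine for illustration:

```python
# Arithmetic Hangul (de)composition, no lookup table needed.
S_BASE = 0xAC00  # first precomposed syllable (U+AC00)
L_BASE = 0x1100  # first leading consonant jamo
V_BASE = 0x1161  # first vowel jamo
T_BASE = 0x11A7  # one before the first trailing consonant jamo
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT  # 11,172 syllables


def decompose(syllable: str) -> str:
    """Split one precomposed syllable into its conjoining jamo."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return syllable  # not a precomposed Hangul syllable; leave as-is
    l_index = s_index // (V_COUNT * T_COUNT)
    v_index = (s_index % (V_COUNT * T_COUNT)) // T_COUNT
    t_index = s_index % T_COUNT
    jamo = chr(L_BASE + l_index) + chr(V_BASE + v_index)
    if t_index:  # index 0 means "no trailing consonant"
        jamo += chr(T_BASE + t_index)
    return jamo


def compose(l_index: int, v_index: int, t_index: int = 0) -> str:
    """Inverse direction: jamo indices back to one precomposed syllable."""
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)
```

For the 2,350 KS X 1001 syllables in their pre-2.0 positions there is no such formula, which is why a table was the only option there.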
