Here's a basic explanation of the diacritic notion as it applies to Asian script...

ubasu · on March 12, 2013

I wouldn't call them "mega-combo characters" ;-) Typically, these are at most two or three letters written together, and they are very much used today in Sanskrit, Hindi and other regional variants such as Bengali etc.

Of course you also have multiple words that have been combined into one large compound word, by way of appropriate linguistic rules of combining sounds. This is similar to long compound words in German.

contingencies · on March 13, 2013

Apparently sometimes called conjuncts. http://pratyeka.org/sanskrit/conjuncts.html

tikwidd · on March 13, 2013

This is the term I'm familiar with. They are similar to the ligatures that exist in the Latin script, e.g. & (e+t), œ (o+e), German ß (ſ (long s) + s) etc. But conjuncts in Devanagari are much more numerous and widely used, probably because the individual consonant forms are fairly complex.
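
For what it's worth, Unicode does not encode a Devanagari conjunct as one precomposed character: it is stored as consonant + virama + consonant, and the font draws the joined form. A minimal sketch using Python's standard unicodedata module (picking क्ष as the example is my own choice, and the `describe` helper is just for illustration):

```python
import unicodedata

# Illustrative example: the conjunct "kṣa" is stored as three code points.
KA, VIRAMA, SSA = "\u0915", "\u094d", "\u0937"
conjunct = KA + VIRAMA + SSA


def describe(text):
    """Return (code point, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


if __name__ == "__main__":
    print(conjunct, len(conjunct))
    for code_point, name in describe(conjunct):
        print(code_point, name)
```

Whether you actually see one joined glyph depends on the font and the text shaping engine, not on the string itself.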

sebilasse · on March 12, 2013

I have learned the Thai alphabet and that "mega-combo character" comment just made my day. Reading Thai pretty much feels like reading regexps. Anditdoesntmakeitanyeasierthattheydontusespaces

contingencies · on March 12, 2013

Actually Thai only has roughly twice the number of characters that we have in Roman scripts. Excepting tones (a real pain), it is possible to learn pretty quickly. Lao by contrast has fewer, but has a rather tricky plethora of vowel combinations for a myriad of hard-to-distinguish eww, ieww, ooh, iuooh type sounds. :) Cambodian has no tones and is my pick for the one to go for if you are keen on an easy starter.

Tangential tidbit: I sent a copy of The Cambodian System of Writing (http://pratyeka.org/csw/) to TPB's anakata while he was in solitary confinement to help him stave off boredom. No idea if he ever read it, though his mother assures me it arrived.

vorg · on March 12, 2013

Besides ก (0xe01), the rest of the Thai characters (in 0xe02-0xe5b) are: letters ขฃคฅฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรฤลฦวศษสหฬอฮฯะาๅ; alphabetic non-spacing marks ัิีึืฺุูํ which follow the preceding character and render as stacked (as shown by the OP); five "marks" เแโใไ (with Unicode's Logical Order Exception) which precede the following character; diacritic non-spacing marks ็่้๊๋์๎ which also render as stacked; the modifier letter ๆ; the digits 0-9 ๐๑๒๓๔๕๖๗๘๙; the Baht currency symbol ฿; and the punctuation symbols ๏๚๛
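
If you want to check a breakdown like this yourself, here is a rough sketch that groups the Thai block by Unicode general category with Python's standard unicodedata module. The function name is made up for illustration, and the categories Unicode assigns (Lo, Mn, Lm, Nd, Sc, Po) won't line up exactly with the informal grouping above:

```python
import unicodedata
from collections import defaultdict


def thai_block_by_category():
    """Group assigned code points U+0E01..U+0E5B by general category."""
    groups = defaultdict(list)
    for cp in range(0x0E01, 0x0E5C):
        ch = chr(cp)
        category = unicodedata.category(ch)
        if category == "Cn":  # skip unassigned code points in the block
            continue
        groups[category].append(ch)
    return dict(groups)


if __name__ == "__main__":
    for category, chars in sorted(thai_block_by_category().items()):
        # Put a space between characters so combining marks stay apart.
        print(category, len(chars), " ".join(chars))
```

The non-spacing marks come out under Mn, which is the category a renderer uses to know they stack on the preceding character.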

Thai looks to be a pretty gr๏๏vy language, like many other natural languages - perhaps some programming languages will catch up in their enhanced use of lexical tokens one day, instead of just relying on grammar, long English names packed into name hierarchies, and multi-ASCII symbols.

boshea · on March 13, 2013

I think I would steer new learners of SE Asian scripts away from Cambodian as a first script to learn. Cambodian, while it is a beautiful script and has more-or-less regular pronunciation, also has a few odd exceptions and complex vowel pronunciation rules. The consonants are divided into two groups, and many of the vowels are pronounced differently in the first group than in the second. That said, after learning the Thai script, Lao and Cambodian were not too hard. Either way, they are all great languages to learn.

contingencies · on March 13, 2013

Actually I am almost certain Thai and Lao have those consonant divisions as well, in fact I believe 3 or 4. If I am not mistaken they are still taught and are part of the tone system and/or can affect unwritten vowel selection. More certainly, the consonant classes somehow stem from the need to preserve the pronunciation of Pali, a Middle Indic Prakrit language (with features not present in these SEA countries' modern languages) that is used as the liturgical language of Theravadin ("older school") Buddhism. See http://pali.pratyeka.org/ for more info on that.

sirn · on March 13, 2013

(Native Thai speaker.)

The worst thing about "not using spaces" when it comes to computers is that it's nearly impossible to do word breaking or line breaking without relying on a dictionary[1], and it is very hard to convince software developers who don't use the system text engine to add support for line breaking[2].

The suffering continues to this day. Android, for instance, still lacks proper support (it breaks by character instead of by word), and a few apps lack Thai word-breaking support entirely (Twitter, for one, only breaks words at spaces).

[1]: http://linux.thai.net/svn/software/libthai/trunk/data/
[2]: https://bugzilla.mozilla.org/show_bug.cgi?id=7969 (it took Mozilla 9 years). Opera, on the other hand, still lacks proper Thai support.
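
To give a feel for why a dictionary is needed, here is a toy sketch of dictionary-based word breaking in Python. It is not how libthai or any browser actually does it; the `segment` function, the "fewest words wins" rule and the tiny word lists are all assumptions made for illustration:

```python
def segment(text, dictionary):
    """Split text into dictionary words, preferring the fewest words.

    Returns a list of words, or None if no full segmentation exists.
    """
    n = len(text)
    max_len = max(map(len, dictionary), default=0)
    # best[i] = (word count, start of last word) for the best split of text[:i]
    best = [None] * (n + 1)
    best[0] = (0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if best[start] is not None and text[start:end] in dictionary:
                candidate = (best[start][0] + 1, start)
                if best[end] is None or candidate < best[end]:
                    best[end] = candidate
    if best[n] is None:
        return None
    # Walk the back-pointers from the end to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


if __name__ == "__main__":
    print(segment("thisisthai", {"this", "is", "thai", "hi", "sis"}))
    print(segment("ภาษาไทย", {"ภาษา", "ไทย"}))
```

Any word missing from the dictionary makes the whole segmentation fail here (it returns None), which hints at why real implementations need large word lists plus some fallback for unknown words.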