Here's a basic explanation of how the notion of diacritics applies to Asian scripts.
Thai belongs to a family of scripts known as abugidas. Abugidas include pretty much all South Asian and many Southeast Asian scripts, for example Burmese, Cambodian, Dai, Lao and Thai. They all derive from Brahmi, which was the proto-Indian script. You can see an example of Brahmi here: http://en.wikipedia.org/wiki/Brahmi
Abugidas are based on combining multiple glyphs into syllables, often allowing glyphs above, below, to the left and to the right of the initial consonant, and often including a closing consonant. Most glyphs tend to be consonants, though some are vowels, and others are special marks indicating tone or other features. Often a short vowel is not written at all, because each consonant carries an inherent vowel (short vowels are similarly left out in Modern Standard Arabic).
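To see how this looks in stored text, here is a minimal Python sketch using the standard unicodedata module to list the code points behind one Thai syllable and one Devanagari syllable. The helper name `describe` and the sample words are illustrative choices, not something from the thread.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name, general category) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in text
    ]

# Thai "ไม้" (wood): a leading vowel (stored first, pronounced after the
# consonant), the consonant itself, then a tone mark drawn above the consonant.
for row in describe("ไม้"):
    print(*row)

# Devanagari "कि" (ki): the vowel sign is stored after KA but drawn to its left.
for row in describe("कि"):
    print(*row)
```

In the category column, Lo is a base letter, Mn a non-spacing mark (stacked above or below) and Mc a spacing combining mark, which is how Unicode encodes the "above, below, left, right" positions described here.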
In the old days, such scripts were handled with wacky font hacks. With Unicode, however, there are some super complex algorithms that make glyphs combine both visually (shaping, when typesetting) and logically (normalization, when saving, searching, etc.). You can actually type a character and a diacritic and, if a single precomposed character for that combination exists, the pair can sometimes be combined into it automatically, not just visually but in the text that gets saved to disk.
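The "logical" combining is Unicode normalization. A minimal Python sketch with the standard unicodedata module: NFC composes a base letter plus a combining mark into one precomposed character when such a character exists, and leaves the sequence alone otherwise. Whether a given editor normalizes before saving is up to the application; that part is an assumption, not something Unicode enforces.

```python
import unicodedata

decomposed = "e\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 2 1: two code points become one
print(composed == "\u00e9")            # True: the precomposed é exists, so NFC uses it

# Thai has no precomposed consonant+vowel characters, so NFC leaves this alone.
thai = "\u0e01\u0e34"  # ก + SARA I
print(unicodedata.normalize("NFC", thai) == thai)  # True
```

NFD goes the other way, splitting é back into e plus the combining accent, which is why search and comparison code usually normalizes both sides to the same form first.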
What makes it even more confusing is that South Asian scripts in particular have mega-combo characters, where whole chunks of glyphs fold into flowing shorthand symbols. In the case of Sanskrit, I believe many of these were used historically but few are used these days.
I think that's a fair pontification - corrections welcome!
I wouldn't call them "mega-combo characters" ;-) Typically these are at most two or three letters written together, and they are still very much in use today in Sanskrit, Hindi and other regional languages such as Bengali.
Of course, you also have multiple words combined into one long compound word, following the linguistic rules for joining sounds (sandhi). This is similar to long compound words in German.
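As a hedged illustration only, here is a Python sketch of a few well-known Sanskrit vowel sandhi rules applied at the boundary between two words, written in IAST transliteration. The rule table is a tiny subset of real sandhi, and the function name `join_with_sandhi` is hypothetical.

```python
import unicodedata

# Tiny illustrative subset of Sanskrit vowel sandhi (IAST): a final a/ā
# meeting the initial vowel of the next word.
VOWEL_SANDHI = {
    ("a", "a"): "ā", ("a", "ā"): "ā", ("ā", "a"): "ā", ("ā", "ā"): "ā",
    ("a", "i"): "e", ("a", "ī"): "e", ("ā", "i"): "e", ("ā", "ī"): "e",
    ("a", "u"): "o", ("a", "ū"): "o", ("ā", "u"): "o", ("ā", "ū"): "o",
}

def join_with_sandhi(first, second):
    """Join two words, merging the boundary vowels if the table has a rule."""
    # NFC so that "ā" is a single code point and can be looked up in the table.
    first = unicodedata.normalize("NFC", first)
    second = unicodedata.normalize("NFC", second)
    merged = VOWEL_SANDHI.get((first[-1], second[0]))
    if merged is None:
        return first + second  # no rule in this toy table: plain concatenation
    return first[:-1] + merged + second[1:]

print(join_with_sandhi("deva", "indra"))   # devendra
print(join_with_sandhi("sūrya", "udaya"))  # sūryodaya
print(join_with_sandhi("mahā", "ātmā"))    # mahātmā
```

Real sandhi also covers consonant changes and many more vowel pairs; the point here is only that the two vowels at the join are replaced by a single merged vowel.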
This is the term I'm familiar with. They are similar to the ligatures found in the Latin script, e.g. & (e+t), œ (o+e) and German ß (ſ (long s) + s). But conjuncts in Devanagari are far more numerous and widely used, probably because the individual consonant forms are fairly complex.
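In Unicode, a Devanagari conjunct is not a separate character: it is stored as consonants joined by the virama (U+094D), and the font's shaping rules decide whether it is drawn as one fused glyph. A minimal Python sketch; the helper name `conjunct` is hypothetical.

```python
VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA: suppresses the inherent vowel

def conjunct(*consonants):
    """Join consonants with virama, which is how a conjunct is encoded."""
    return VIRAMA.join(consonants)

ksha = conjunct("क", "ष")       # क्ष (kṣa), two letters
stra = conjunct("स", "त", "र")  # स्त्र (stra), three letters
print(ksha, len(ksha))  # 3 code points
print(stra, len(stra))  # 5 code points
```

If the font has no ligature for a given sequence, it typically falls back to half forms or a visible virama, but the stored text stays the same.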
I have learned the Thai alphabet, and that "mega-combo character" comment just made my day. Reading Thai feels pretty much like reading regexps. Anditdoesntmakeitanyeasierthattheydontusespaces
Actually, Thai has only roughly twice as many characters as the Roman script; apart from the tones (a real pain), it is possible to learn it pretty quickly. Lao, by contrast, has fewer, but has a rather tricky plethora of vowel combinations for a myriad of hard-to-distinguish eww, ieww, ooh, iuooh type sounds. :) Cambodian has no tones and is my pick if you are keen on an easy starter.
Tangential tidbit: I sent a copy of The Cambodian System of Writing (http://pratyeka.org/csw/) to TPB's anakata while he was in solitary confinement, to help him stave off boredom. No idea whether he ever read it, though his mother assures me it arrived.
Besides ก (0xe01), the rest of the Thai characters (in 0xe02-0xe5b) are: letters ขฃคฅฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรฤลฦวศษสหฬอฮฯะาๅ; alphabetic non-spacing marks ัิีึืฺุูํ, which follow the character they attach to and render stacked above or below it (as shown by the OP); five "marks" เแโใไ (subject to Unicode's Logical Order Exception), which are written and stored before the consonant they are pronounced after; diacritic non-spacing marks ็่้๊๋์๎, which also render stacked; the modifier letter ๆ; digits 0-9 ๐๑๒๓๔๕๖๗๘๙; the currency symbol Baht ฿; and punctuation symbols ๏๚๛.
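To check these groupings, here is a minimal Python sketch using the standard unicodedata module to group the Thai block by Unicode general category. It assumes the block's assigned characters end at U+0E5B. The Logical Order Exception set is hardcoded because unicodedata does not expose that property, and the general category alone does not separate the "alphabetic" marks from the "diacritic" ones (both are Mn).

```python
import unicodedata
from collections import defaultdict

# Logical Order Exception vowels เ แ โ ใ ไ (U+0E40 to U+0E44), hardcoded
# because the standard unicodedata module does not expose this property.
LOGICAL_ORDER_EXCEPTION = {chr(cp) for cp in range(0x0E40, 0x0E45)}

def thai_block_by_category(first=0x0E01, last=0x0E5B):
    """Group assigned code points in the Thai block by general category."""
    groups = defaultdict(list)
    for cp in range(first, last + 1):
        ch = chr(cp)
        if unicodedata.name(ch, None) is None:
            continue  # skip the unassigned gap U+0E3B to U+0E3E
        groups[unicodedata.category(ch)].append(ch)
    return groups

groups = thai_block_by_category()
for category, chars in sorted(groups.items()):
    print(category, len(chars), "".join(chars))
print(all(ch in groups["Lo"] for ch in LOGICAL_ORDER_EXCEPTION))
```

Consonants, vowel letters and the Logical Order Exception vowels all land in Lo, both kinds of stacking marks land in Mn, ๆ is Lm, the digits are Nd, ฿ is Sc and ๏๚๛ are Po.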
Thai looks to be a pretty gr๏๏vy language, like many other natural languages. Perhaps some programming languages will one day catch up in their use of richer lexical tokens, instead of relying only on grammar, long English names packed into name hierarchies, and multi-character ASCII symbols.
I think I would steer new learners of SE Asian scripts away from Cambodian as their first. Cambodian, while a beautiful script with more-or-less regular pronunciation, also has a few odd exceptions and complex vowel pronunciation rules. The consonants are divided into two groups, and many vowels are pronounced differently after consonants of the first group than after those of the second. That said, after learning the Thai script, Lao and Cambodian were not too hard. Either way, they are all great languages to learn.
Actually, I am almost certain Thai and Lao have such consonant divisions as well; in fact I believe three or four. If I am not mistaken, they are still taught and are part of the tone system, and/or can affect which unwritten vowel is used. More certainly, the consonant classes somehow stem from the need to preserve the pronunciation of Pali, a Middle Indian Prakrit language (with features not present in the modern languages of these Southeast Asian countries) that is used as the liturgical language of Theravada ("older school") Buddhism. See http://pali.pratyeka.org/ for more info on that.
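For Thai, the consonant classes do feed directly into the tone system. As a hedged illustration, here is a Python sketch of the standard textbook tone rules, which use three classes (mid, high, low) combined with the tone mark and whether the syllable is "live" (ends in a long vowel or a sonorant) or "dead" (ends in a short vowel or a stop). The consonant table is a small sample, the function name `thai_tone` is hypothetical, and special cases such as a silent leading ห changing the class are ignored.

```python
# Sample of the three Thai consonant classes (not the full 44-letter table).
CONSONANT_CLASS = {
    "ก": "mid", "จ": "mid", "ด": "mid", "ต": "mid", "บ": "mid", "ป": "mid", "อ": "mid",
    "ข": "high", "ฉ": "high", "ถ": "high", "ผ": "high", "ฝ": "high", "ส": "high", "ห": "high",
    "ค": "low", "ง": "low", "ท": "low", "น": "low", "ม": "low", "ย": "low", "ร": "low",
}

MAI_EK, MAI_THO, MAI_TRI, MAI_CHATTAWA = "\u0e48", "\u0e49", "\u0e4a", "\u0e4b"

def thai_tone(initial, tone_mark=None, live=True, long_vowel=True):
    """Tone from initial consonant class, tone mark and live/dead syllable."""
    cls = CONSONANT_CLASS[initial]
    if tone_mark is None:
        if live:
            return "rising" if cls == "high" else "mid"
        if cls == "low":
            return "falling" if long_vowel else "high"
        return "low"
    if tone_mark == MAI_EK:
        return "falling" if cls == "low" else "low"
    if tone_mark == MAI_THO:
        return "high" if cls == "low" else "falling"
    # Mai tri and mai chattawa are only used with mid-class consonants.
    if cls == "mid" and tone_mark == MAI_TRI:
        return "high"
    if cls == "mid" and tone_mark == MAI_CHATTAWA:
        return "rising"
    raise ValueError("unsupported consonant class / tone mark combination")

print(thai_tone("ม", MAI_EK))                        # ไม่ (not): falling
print(thai_tone("ก", MAI_EK))                        # ก่อน (before): low
print(thai_tone("ข", MAI_THO))                       # ข้าว (rice): falling
print(thai_tone("ท"))                                # ไทย (Thai): mid
print(thai_tone("ร", live=False, long_vowel=False))  # รัก (love): high
```

The same written tone mark gives different tones depending on the class of the initial consonant, which is why the classes are still taught.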
The worst thing about "not using spaces", as far as computers are concerned, is that word breaking and line breaking are nearly impossible without a dictionary[1], and it is very hard to convince software developers who don't use the system text engine to add support for line breaking[2].
The suffering continues to this day: Android, for instance, still lacks proper support (it breaks by character instead of by word), and a few apps lack Thai word-breaking support entirely (Twitter, for example, only breaks words at spaces).
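To show why a dictionary is needed, here is a minimal Python sketch of greedy longest-match segmentation, one simple dictionary-based technique (not necessarily what any particular text engine uses). The function name and the four-word dictionary are hypothetical; real engines such as ICU use much larger dictionaries and smarter disambiguation, since greedy matching can pick the wrong split on ambiguous input.

```python
import unicodedata
from typing import Iterable, List

def longest_match_segment(text: str, dictionary: Iterable[str]) -> List[str]:
    """Greedy longest-match word segmentation of unspaced text."""
    words = set(dictionary)
    max_len = max((len(w) for w in words), default=1)
    result = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest dictionary word starting at position i first.
        for end in range(min(len(text), i + max_len), i, -1):
            if text[i:end] in words:
                match = text[i:end]
                break
        if match is None:
            # Unknown text: emit one base character plus any non-spacing
            # marks attached to it, so a vowel or tone mark is never split off.
            end = i + 1
            while end < len(text) and unicodedata.category(text[end]) == "Mn":
                end += 1
            match = text[i:end]
        result.append(match)
        i += len(match)
    return result

# Toy dictionary for illustration only.
THAI_WORDS = ["ภาษา", "ไทย", "ไม่", "ง่าย"]
print(longest_match_segment("ภาษาไทยไม่ง่าย", THAI_WORDS))
# ['ภาษา', 'ไทย', 'ไม่', 'ง่าย']  ("Thai language / not / easy")
```

Breaking "by character" as described for Android corresponds to the fallback branch alone, which never splits a mark from its base but also never finds word boundaries.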