Curious how Twitter detects Chinese or Japanese text so it can restrict the length to 140 chars. What if a tweet starts in English and then includes Chinese characters? Is that what the progress indicator is for, to let Twitter detect the language?
They've made CJK characters (including fullwidth characters of any kind, without exception for Latin letters or Arabic numerals) count as 2 characters. In a mixed alphanumeric/CJK tweet, the CJK part will simply count doubly towards your character limit.
Interesting... of course, 70 characters of Mandarin would be the equivalent of over 400 characters of English. I suppose that plus Google Translate would be yet another, even less readable, way to get around the limit... since Chinese-language speakers by and large just use WeChat.
It is a brain-dead solution: any Unicode scalar value not matching /[\u0000-\u10ff\u2000-\u200d\u2010-\u201f\u2032-\u2037]/ doubles the cost. [1] The primary range ends at U+10FF because it conveniently excludes virtually all CJK characters (Hangul starts at U+1100) with relatively low error rates. Yet, it's still brain-dead.
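For anyone curious, the rule quoted above is simple enough to sketch. This is a minimal Python illustration, not Twitter's code: it assumes the four ranges in the regex are the only weight-1 characters, everything else weighs 2, and the limit is 280 weighted units. The 280 figure and the names `weighted_length` and `fits` are my own assumptions, and the sketch ignores any other rules the real library may apply (URL handling, for example).

```python
# Hypothetical sketch of the weighted-length rule described above.
# Code points inside these ranges (from the quoted regex) cost 1; all others cost 2.
LIGHT_RANGES = [
    (0x0000, 0x10FF),  # up to just before Hangul Jamo (U+1100)
    (0x2000, 0x200D),  # spaces and zero-width characters
    (0x2010, 0x201F),  # dashes and curly quotes
    (0x2032, 0x2037),  # primes
]

# Assumed limit in weighted units; not stated in the thread.
MAX_WEIGHTED_LENGTH = 280


def char_weight(ch: str) -> int:
    """Return 1 if the code point falls in a light range, else 2."""
    cp = ord(ch)
    for lo, hi in LIGHT_RANGES:
        if lo <= cp <= hi:
            return 1
    return 2


def weighted_length(text: str) -> int:
    """Sum per-code-point weights; iterating a Python str yields code points."""
    return sum(char_weight(ch) for ch in text)


def fits(text: str) -> bool:
    """True if the text stays within the assumed weighted limit."""
    return weighted_length(text) <= MAX_WEIGHTED_LENGTH


if __name__ == "__main__":
    print(weighted_length("Hello 世界"))  # 6 ASCII chars at 1 + 2 CJK chars at 2
    print(fits("世" * 140), fits("世" * 141))
```

So `weighted_length("Hello 世界")` comes out to 10, which also answers the mixed-tweet question: no language detection is involved, each character is simply priced on its own. Under these assumptions 140 CJK characters fit exactly, and a fullwidth "Ａ" (U+FF21) costs 2 just like a CJK ideograph.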