As a user of non-English languages, this is not really a problem in practice. We...

KirinDave · on July 1, 2019

Really? So you think Japanese folks just don't run into problems? Or Koreans? Or anyone using a primarily upper-Unicode alphabet that is phonetic?

sirn · on July 1, 2019

Japanese and Chinese in particular can compress a lot more meaning into a byte than many other languages[1]. I picked a random article at Nikkei.com[2] and counted the bytes in the first paragraph, and it's only 449 bytes in UTF-8[3]. Chinese is even more efficient at this, as you can basically fit a whole news story in a Tweet.

[1]: Idiomatic Yojijukugo 四字熟語 is an extreme example of this, but there are non-idiom Yojijukugo too, e.g. 日米関係 is a 12-byte word that translates to "United States-Japan relations"

[2]: https://www.nikkei.com/article/DGXMZO46571150V20C19A6000000/

[3]: It describes how people walked around a park in Chicago on Jun 13 to catch a rare Pokemon, with a one-line interview with a son of Mr. Stuart from California.

(I speak three languages: Thai, English, Japanese)
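For anyone who wants to check the byte counts in this comment, here is a minimal Python sketch. The helper name `utf8_len` is made up for illustration; the only fact it relies on is that each of these kanji takes 3 bytes in UTF-8.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


word = "日米関係"  # the footnote's example word
char_count = len(word)       # number of Unicode code points
byte_count = utf8_len(word)  # number of UTF-8 bytes

print(char_count, byte_count)  # 4 code points, 12 bytes
```

The same helper can be pointed at a whole paragraph to reproduce a figure like the 449 bytes quoted above, assuming the text is copied exactly.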

KirinDave · on July 1, 2019

This is fair, I should have thought more about the list.

For Japanese, I don't think the way people talk casually to one another is as amenable to compression as newspaper headlines.

But surely for Thai you're in a sub-optimal boat?

sirn · on July 1, 2019

This is completely anecdotal, but I have an alternative Twitter account where I interact with Japanese people I know, and I rarely hit the 140-character limit except when I’m in a heated debate (or when I’m VERY excited about something).

For Thai, yeah, this one is a little more complicated. I’ve commented about this in a sibling thread.

KirinDave · on July 1, 2019

I tried to rig up a Shavian chat group and we just hit the wall over and over. It was frustrating so we moved to Matrix.

ddevault · on July 1, 2019

私の二番目の言語は日本語だよ ("My second language is Japanese.")

KirinDave · on July 1, 2019

This makes it all the more baffling to me. You get 510 bytes a line, but for a channel with a modest name in any other language, you get much less than that.

Let's just use a modest channel name like "#𐑥𐑨𐑔 𐑯 𐑕𐑲𐑧𐑯𐑕". I've now got a base of 45 bytes without any message at all. If I want to aim a message at someone, I have even less room than that. Your pithy reply with a similarly modest title is 20% of the total allocation for a line, half of which is just overhead.
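A rough sketch of that arithmetic in Python, for the curious. It assumes the classic IRC line limit of 512 bytes including the trailing CRLF (so 510 usable) and a client-side line of the form `PRIVMSG <target> :<text>`. The function names are made up for illustration, and the exact overhead figure depends on what you count: this version ignores the `:nick!user@host` prefix a server adds when relaying, which eats further into the budget.

```python
IRC_LINE_LIMIT = 512  # bytes per line, including the trailing CRLF
CRLF_LEN = 2


def privmsg_overhead(target: str) -> int:
    """Bytes used by 'PRIVMSG <target> :' before any message text."""
    return len(f"PRIVMSG {target} :".encode("utf-8"))


def payload_budget(target: str) -> int:
    """Bytes left for the message text on a single line sent to `target`."""
    return IRC_LINE_LIMIT - CRLF_LEN - privmsg_overhead(target)


ascii_budget = payload_budget("#haskell")
# Shavian letters sit outside the Basic Multilingual Plane: 4 bytes each in UTF-8.
shavian_budget = payload_budget("#𐑥𐑨𐑔 𐑯 𐑕𐑲𐑧𐑯𐑕")

print(ascii_budget, shavian_budget)
```

The gap between the two budgets is entirely down to the channel name, since every Shavian letter costs four bytes where an ASCII letter costs one.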

We run into line limits talking about category theory in #haskell even in English, and folks are quite good at compressing context. The only alternative is to slice your messages across lines and make a confusing experience for participants.

lifthrasiir · on July 2, 2019

I operate a Korean IRC network, and a message cut off in the middle (often partway through a UTF-8 sequence, which makes clients guess the wrong encoding from time to time) is a common sight.
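To illustrate the kind of cut described here, a minimal Python sketch of splitting a message on UTF-8 code point boundaries instead of at a raw byte offset. `split_utf8` is a hypothetical helper, not something any particular IRC client or server is known to do, and it only protects code points: combining sequences and other multi-code-point graphemes can still be separated.

```python
from typing import List


def split_utf8(text: str, limit: int) -> List[bytes]:
    """Split `text` into UTF-8 chunks of at most `limit` bytes each,
    never cutting through the middle of a code point."""
    if limit < 4:
        raise ValueError("limit must be at least 4 (longest UTF-8 sequence)")
    data = text.encode("utf-8")
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + limit, len(data))
        # Continuation bytes have the bit pattern 10xxxxxx. If the cut would
        # land on one, back up until the next chunk starts on a lead byte.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end])
        start = end
    return chunks


# Hangul syllables take 3 bytes each, so a 7-byte limit cannot be hit exactly.
parts = split_utf8("안녕하세요", 7)
decoded = [p.decode("utf-8") for p in parts]
print(decoded)  # ['안녕', '하세', '요']
```

Every chunk decodes cleanly on its own, so a client receiving one line at a time has no reason to fall back to a different encoding.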