
As a user of non-English languages, this is not really a problem in practice. We settled on UTF-8 years ago. My second language is practically the worst case for bumping up against these limits and I never have an issue with them.


Really? So you think Japanese folks just don't run into problems? Or Koreans? Or anyone using a phonetic alphabet that sits primarily in the upper Unicode range?


Japanese and Chinese in particular can pack a lot more meaning into a byte than many other languages[1]. I picked a random article from Nikkei.com[2] and calculated the number of bytes in the first paragraph: it's only 449 bytes in UTF-8[3]. Chinese is even more efficient at this, as you can basically fit a whole news story in a Tweet.

[1]: Idiomatic yojijukugo (四字熟語) are an extreme example of this, but there are non-idiomatic yojijukugo too, e.g. 日米関係 is a 12-byte word that translates to "United States-Japan relations".

[2]: https://www.nikkei.com/article/DGXMZO46571150V20C19A6000000/

[3]: It describes how people were walking around a park in Chicago on Jun 13 to catch a rare Pokemon, with a one-line interview with a son of Mr. Stuart from California.

(I speak three languages: Thai, English, Japanese)
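
To make the byte counts above easy to check, here is a minimal Python sketch. The helper name `utf8_len` is just an illustration, not something from the article or any library; it measures the encoded size of the yojijukugo from footnote [1] and of its English gloss.

```python
def utf8_len(text: str) -> int:
    """Return how many bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


word = "日米関係"                         # 4 kanji, 3 bytes each in UTF-8
gloss = "United States-Japan relations"  # plain ASCII, 1 byte per character

print(utf8_len(word))   # 12
print(utf8_len(gloss))  # 29
```

So the kanji cost three bytes apiece, but the word still comes out at well under half the size of the English phrase.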


This is fair, I should have thought more about the list.

For Japanese, I don't think the way people talk casually to one another is as amenable to compression as newspaper headlines.

But surely for Thai you're in a sub-optimal boat?


This is completely anecdotal, but I have an alternative Twitter account where I interact with Japanese people I know, and I rarely hit the 140-character limit except when I’m in a heated debate (or when I’m VERY excited about something).

For Thai, yeah, this one is a little more complicated. I’ve commented about this in a sibling thread.


I tried to rig up a Shavian chat group and we just hit the wall over and over. It was frustrating so we moved to Matrix.


私の二番目の言語は日本語だよ (My second language is Japanese.)


This makes it all the more baffling to me. You get 510 bytes a line, but for a channel with a modest name in any other language, you get much less than that.

Let's just use a modest channel name like "#𐑥𐑨𐑔 𐑯 𐑕𐑲𐑧𐑯𐑕". I've now got a base of 45 bytes without any message at all. If I want to aim a message at someone I have even less than that. Your pithy reply with a similarly modest title is 20% of the total allocation for a line, half of which is just overhead.
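
For anyone who wants to play with the numbers, here is a rough Python sketch of the per-line budget. It assumes the classic IRC rule of 512 bytes per message including the trailing CR LF, and the `PRIVMSG <target> :<text>` form. The function name and the example channel `#日本語` are made up for illustration, and the `relay_prefix` argument stands in for whatever `nick!user@host` prefix a server adds when relaying, which varies by network.

```python
IRC_LINE_LIMIT = 512  # bytes per message, including the trailing CR LF


def privmsg_payload_budget(target: str, relay_prefix: str = "") -> int:
    """Bytes left for the message text in 'PRIVMSG <target> :<text>' + CR LF.

    `relay_prefix` is a hypothetical 'nick!user@host' string; when present it
    is sent as ':<prefix> ' in front of the command and counts too.
    """
    overhead = len(f"PRIVMSG {target} :".encode("utf-8")) + 2  # +2 for CR LF
    if relay_prefix:
        overhead += len(f":{relay_prefix} ".encode("utf-8"))
    return IRC_LINE_LIMIT - overhead


print(privmsg_payload_budget("#haskell"))  # 492
print(privmsg_payload_budget("#日本語"))    # 490
```

The target is counted in bytes, not characters, so a channel name outside ASCII eats into the budget two to four times faster per character.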

We run into line limits talking about category theory in #haskell even in English, and folks there are quite good at compressing context. The only alternative is to slice your messages across lines, which makes for a confusing experience for participants.


I operate a Korean IRC network, and a message cut in the middle (often in the middle of a UTF-8 sequence, which makes clients guess the wrong encoding from time to time) is a common sight.
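
A minimal sketch of how a client could avoid that kind of cut, assuming it does the splitting itself before sending: break the text into chunks that each fit a byte budget, and only ever break between code points. The name `split_utf8` is hypothetical, not an API from any IRC library. It works on code points rather than grapheme clusters, so it can still separate a base letter from a combining mark (which matters for Thai, for example).

```python
def split_utf8(text: str, max_bytes: int) -> list:
    """Split `text` into chunks of at most `max_bytes` UTF-8 bytes each,
    never cutting inside a multi-byte sequence."""
    if max_bytes < 4:
        # A single code point can need up to 4 bytes in UTF-8.
        raise ValueError("max_bytes must be at least 4")

    chunks, current, size = [], [], 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        if size + n > max_bytes:
            # Adding this character would overflow: close the current chunk.
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks


print(split_utf8("안녕하세요", 8))  # ['안녕', '하세', '요']
```

Each Hangul syllable here takes 3 bytes, so only two fit in an 8-byte chunk, and every chunk decodes cleanly on its own.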



