As a person living with the CJK languages (I’m specifically a Korean), I find th...

SiVal · on Oct 29, 2019

For decades I've been saying that all text input should go through OS-level IMEs (input method editors). For more complex scripts, the need is obvious, but even writing in English, we can get great benefits from a system that expands abbreviations, replaces easy-to-type sequences with proper Unicode chars, runs little scripts and inserts the output, gives you quick dictionary/thesaurus lookups, gives you emmet-style powers, etc., whenever you're writing and in any app.

Of course, everyone's preferences will be different, so you'll get default IMEs with default configs, but with the idea that you can reconfigure or replace them entirely with systems that work the way you want to work everywhere you input text. There have been utilities that do this sort of thing for decades, but they've always been treated as clever hacks rather than standard text input.

In other words, instead of powerful text input methods being the exception, they would be the rule, and apps that didn't use them would be the exception.

goranmoomin · on Oct 29, 2019

> I've been saying that all text input should go through OS-level IMEs

This is so true, all commercial OSes that take i18n seriously do this, while most open source OS communities (whose communication revolves around English) decided that IMEs are an add-on for CJK people.

It's a pity, and this one reason is enough for Linux never to be adopted by ordinary users in the non-western world.

mikekchar · on Oct 29, 2019

That's not really fair, though. Linux users still overwhelmingly use X windows. There is no standard IME for X. This is just a historical issue. X Windows is OLD -- far older than Windows or Mac (even the old Mac), for instance.

My biggest problem with IMEs in free software land is that we've had groups like Gnome lean on IME developers and push through their vision of how it should work -- even though the people pushing their vision don't use IMEs. I ended up migrating to FCITX just because it was the last hold out not to cave to pressure.

shadowgovt · on Oct 29, 2019

Old really means there's been more time to identify and solve the problem, and the fact it hasn't been cleanly solved is a lot more indicative of priorities than time or tech.

Microsoft and Apple have the incentive of selling to billion-user markets. The open source community, on average, appears to have demonstrated a lack of interest in opening up the user-base further (and expecting that user-base to just roll their own solution creates multiple catch-22 and tragedy of the commons problems).

Why one of the commercial open-source vendors hasn't taken this on as a core challenge, I do not know.

hsivonen · on Oct 29, 2019

> There is no standard IME for X.

Technically true, but for practical purposes IBus is the standard. That's what Fedora and Ubuntu use out of the box. Firefox beta telemetry shows about 89% IBus vs. 11% FCITX.

> My biggest problem with IMEs in free software land is that we've had groups like Gnome lean on IME developers and push through their vision of how it should work -- even though the people pushing their vision don't use IMEs. I ended up migrating to FCITX just because it was the last hold out not to cave to pressure.

What wrong thing did the Gnome folks push for?

mikekchar · on Oct 30, 2019

Insisting that there can only be one input type for the entire session rather than one per window. If you are using multiple languages which each require an IME, Gnome's interpretation is completely broken. I had to stop using IBus because of it. It is possible they have changed their mind since then (several years back), but I haven't followed it. Incidentally, it was also the thing that meant I had to stop using Gnome. Before that I was a happy Gnome Shell user :-(

hsivonen · on Oct 30, 2019

Gnome appears to have fixed this.

efdee · on Oct 29, 2019

That's not really fair either. X Windows might be older, but neither Mac OS nor Windows had anything like this in their older versions.

Some things evolved, some things didn't. O hi X Windows!

rmah · on Oct 29, 2019

MacOS had rather robust internationalization support 25 years ago. Non-standard, but still.

pvg · on Oct 29, 2019

> far older than Windows or Mac (even the old Mac)

They are all very nearly the same age.

musicale · on Oct 29, 2019

Pretty much. The W window system for the V distributed system predates the Macintosh (as does the Apple Lisa/1983), and W was ported to Unix in 1983. Their immediate successors - X and the Macintosh - came out in 1984, and Windows 1.0 in 1985.

Windows 1.0 was fairly primitive, but Windows 2.0 supported overlapping windows (!) in 1987, coincidentally the year that X11 was released.

rodgerd · on Oct 29, 2019

> That's not really fair, though

It's completely fair. Most of the developer community just doesn't care.

hsivonen · on Oct 29, 2019

> This is so true, all commercial OSes that take i18n seriously do this,

Does Windows really?

> while most open source OS communities (whose communication revolves around English) decided that IMEs are an add-on for CJK people.

IIRC, Fedora ships an iOS/Android-like Latin-script IME with autocomplete. It's not mandatory, though.

> It's a pity, and this one reason is enough for Linux never to be adopted by ordinary users in the non-western world.

The Korean situation doesn't really generalize. Among CJK, AFAICT, the Korean IBus IME on Ubuntu 18.04 is pretty broken but the Japanese IME and the various Chinese IMEs appear to be at least OK. (It's a rather surprising situation considering that the Hangul part of a Korean IME should be much simpler than the Japanese and Chinese IMEs.)

WorldMaker · on Oct 29, 2019

> Does Windows really?

Yes, the signs that even English is passing through Windows' IME infrastructure in Windows 10 are pretty minimal by default (in Desktop mode; Tablet mode immediately turns on a couple more), but at this point Windows 10 makes almost all of it opt-in. Some of it is referred to as accessibility tools from an English perspective, because IMEs are also useful for accessibility.

The "big one" IME for most English users is the Emoji Keyboard accessible with Win+. or Win+; (whichever you prefer). It's really interesting how well emoji have helped Latin script users with further understanding the complexities of Unicode, fixing old ugly bugs in Unicode handling, and even introducing some such users to an IME that they want to use (sometimes every day).

Under Settings > Devices > Typing > Hardware Keyboard you can turn on the IME "Show text suggestions as I type" even on a hardware keyboard in Windows 10, which gives you mobile-style auto-suggestions (you can also turn on mobile-style autocorrect even on a hardware keyboard).

hsivonen · on Oct 29, 2019

> The "big one" IME for most English users is the Emoji Keyboard accessible with Win+. or Win+; (whichever you prefer).

No, API-wise the on-screen keyboard generates keystrokes for emoji (astral keystrokes, multiple keystrokes for multi-scalar-value emoji) even to an IME-aware app. In contrast, the emoji picker built into the Windows 10 Pinyin IME enters emoji via IME API.
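To make "astral" and "multi-scalar-value" concrete, here is a minimal Python sketch. It does not call any Windows API and says nothing about how key messages are actually delivered; it only lists the Unicode scalar values in an emoji and counts the UTF-16 code units needed to encode it. The helper names are made up for illustration.

```python
def scalar_values(text):
    """List the Unicode scalar values in text as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


def utf16_units(text):
    """Count the UTF-16 code units needed to encode text."""
    return len(text.encode("utf-16-le")) // 2


# One scalar value outside the Basic Multilingual Plane ("astral"):
# it needs a surrogate pair, i.e. two UTF-16 code units.
grinning = "\U0001F600"

# Two scalar values that render as one emoji:
# thumbs up followed by a skin tone modifier.
thumbs_up_toned = "\U0001F44D\U0001F3FD"

for emoji in (grinning, thumbs_up_toned):
    print(scalar_values(emoji), utf16_units(emoji))
```

An app that assumes one keystroke equals one character will mishandle both cases, which is roughly why emoji ended up flushing out old Unicode bugs.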

> Under Settings > Devices > Typing > Hardware Keyboard you can turn on the IME "Show text suggestions as I type" even on a hardware keyboard in Windows 10, which gives you mobile-style auto-suggestions (you can also turn on mobile-style autocorrect even on a hardware keyboard).

Thanks!

WorldMaker · on Oct 29, 2019

The emoji keyboard may not be entirely using the IME API, but it does do some IME-like things even in English. The big thing I'm thinking of is the way it works when you type English words to search the emoji. I think it still most often defaults to passing the keys along to the application as well, and then replaces them using keyboard keystrokes or selection APIs, but I have sometimes seen it do the IME thing where the text you are typing is shown underlined and not sent to the application. That said, I don't recall the exact combination of app and emoji where I saw that happen, and I don't know precisely enough why it would vary to reproduce it just now. Maybe it was just a difference between early Insider versions of the emoji keyboard and current behavior, or something similar that I'm misremembering.

Even if it isn't using the IME APIs directly in most cases, it's still useful as a teaching tool/analogy tool/example tool to show English speakers what an IME can be like to use, even if it is a nice-to-have for an English writer versus a necessary tool for other languages.

munmaek · on Oct 29, 2019

On Debian 9/10 ibus works for the Korean IME (both Hangeul and Hanja input).

Windows (10 at least) has native support for these IMEs; setting it up on Windows was much easier than on Linux (which doesn't even have a standard IME). Windows also comes with basic CJK fonts, which different distros may or may not have. On Debian I have to install noto-cjk or Adobe's source han fonts.

I originally tried fcitx but had to switch to ibus. Definitely not a great experience.

hsivonen · on Oct 29, 2019

There isn't a single IME setup experience on Linux.

In my experience, Debian is worse than Fedora, Ubuntu, and openSUSE. Fedora is better than Windows 10 when it comes to IMEs: Fedora installs the IMEs by default. Ubuntu, like Windows 10, installs IMEs when you request the addition of an IME-requiring language. OpenSUSE gives you an IME for the language you use at install time if you install openSUSE using an IME-requiring language.

I haven't tried Debian 10, but when I installed Debian 9 _in Japanese_ with a _Japanese keyboard layout_ chosen, the installer didn't bother to set up a Japanese IME!

Fedora comes with an OK set of Noto CJK fonts by default. Ubuntu comes with a minimal set of Noto CJK fonts by default, but when enabling Chinese, Japanese, or Korean, Ubuntu drops more language-appropriate fonts on the system, like Windows 10.

munmaek · on Oct 30, 2019

I didn't mean to use italics here. Oops.

Pxtl · on Oct 29, 2019

> For decades I've been saying that all text input should go through OS-level IMEs (input method editors).

Which is why it's infuriating when the built-in widgets of a platform don't include simple operations for obvious workflows like filtering characters, masked input, etc. That's what leads developers to have to roll their own implementation using keydown/keyup.

I can't tell you how many GUI frameworks have forced me to roll my own numeric input widget.

ygra · on Oct 30, 2019

Wouldn't it be enough if there was an obvious event for actual text input? WPF for example has that and it's the obvious and IME-friendly choice, since of course not every keystroke results in a character (dead keys exist too, after all).

polm23 · on Oct 29, 2019

Totally agree. The best use case here is hooking up your keyboard to a personal database to autocomplete notes you saved, like locations, bookmarks, and so on.

I have a note-taking system that I use to store all this data, and after I set it up years ago I looked at implementing an IME to search my notes and get a link to an entry, but every platform has its own byzantine IME API and I never got anything working. I've been thinking it might be easier to just make a cross-platform program that interacts with the clipboard rather than being a true IME.

In general, one effect I'd like to see from having flexible IMEs is that logging and chat could be left to separate apps. Imagine a birdwatching (or whatever) keyboard that lets you pick the birds you saw, their age, behavior, and punch it in while you're in the field, and have it output in structured form into whatever app is handy.

saurik · on Oct 29, 2019

So that sort of makes sense, but then I realize I am looking for all of those behaviors to be different in different contexts (such as writing code vs. writing an essay).

TheSmiddy · on Oct 29, 2019

The OS knows the destination app (and with the correct APIs the context within that app) so this would be trivial to implement once the base exists.

tempguy9999 · on Oct 29, 2019

> we can get great benefits from a system that expands abbreviations, ... emmet-style powers

I can't agree here (excepting perhaps the Unicode char replacement, whose viability versus actually having a non-English keyboard is something a non-English speaker needs to comment on; huge character sets like Chinese already use that, AFAIK).

Writing is fundamentally a thinking process, the text entry for a touch typist is relatively quick. If you are to propose expanding abbreviations, how much time do you expect to save? I mean, actually measured it?

> runs little scripts and inserts the output

What is the purpose of this?

> gives you quick dictionary/thesaurus lookups

I use those perhaps once a week or less. If you use it say 3X a day, you'd save little time, perhaps a couple of minutes.

Dunno what emmet powers are though.

MS Word & LibreOffice do some of what you want, and the first time I install them I spend several minutes tracking down each setting and turning it off - they drive me bonkers. They think they know what I want but they don't. Touch typists can hit many keys a second and kind of pipeline their typing. Having the input modified automatically is rarely useful IMO.

Your idea may be good but like many other ideas such as graphical programming, except in restricted cases they don't work. Perhaps if you measured it I'd be convinced but I can't accept it now as an obviously great benefit.

TeMPOraL · on Oct 29, 2019

> Writing is fundamentally a thinking process, the text entry for a touch typist is relatively quick. If you are to propose expanding abbreviations, how much time do you expect to save? I mean, actually measured it?

Not quick enough. I think faster than I type, and I type fast. It's fine until I start getting impatient with myself. Call it micro-impatience, a flash of irritation in which you're suddenly conscious of not having finished typing in the thought. It's distracting.

Honestly, I like parent's idea; a good chunk of Emacs's awesomeness and the reason people like me use it as an operating system is because of that - unified, fully configurable and expandable text-based interface. I often wish to have something like it system-wide, because standard UIs are very far from optimum ergonomy-wise. But then again I wouldn't trust Apple or Microsoft to do it right; they'd quickly find a way to dumb it down, or restrict the extensibility in the name of security.

lioeters · on Oct 29, 2019

> the reason people like me use [Emacs] as an operating system is because of that - unified, fully configurable and expandable text-based interface. I often wish to have something like it system-wide..

I can totally imagine that. Underneath all the GUI layers, every operating system and application has (or has the potential of) a fully text-based interface. There's just no standard or integration, and tools that allow that (like a system-wide middleware) haven't caught on, I guess. Maybe in an alternate historical timeline, such a feature could have been a fundamental layer of an OS.

From the grandparent comment:

> a system [with powerful text input methods] that expands abbreviations, replaces easy-to-type sequences with proper Unicode chars, runs little scripts and inserts the output, gives you quick dictionary/thesaurus lookups, gives you emmet-style powers, etc., whenever you're writing and in any app.

Yes, yes - and the last point: in any app. I picture it like how TCL can script other programs, even ones that weren't designed to be "remote controlled".

madhadron · on Oct 29, 2019

> Underneath all the GUI layers, every operating system and application has (or has the potential of) a fully text-based interface.

Why do you say this? There is nothing fundamental about text. Is there something fundamental about text in Smalltalk? Or AmigaOS?

> There's just no standard or integration, and tools that allow that (like a system-wide middleware)

COM on Windows. Scripting interface to apps on MacOS. They're there.

lioeters · on Oct 29, 2019

Yeah, I had some vague doubts while I was writing that comment. I guess I meant "text-input based", or maybe better to say "keyboard based" with a system-wide/application-agnostic middleware of some kind.

madhadron · on Oct 30, 2019

"Accessible via a programming language" perhaps?

TeMPOraL · on Oct 30, 2019

Not that. More like, UX paradigm more fixed and forced on applications, but also being customizable and user-programmable externally to any given application. So that e.g. you could have a system-wide autocomplete/code completion, whether you're in a code editor or text editor or in a dialog box of some other program somewhere; that system-wide autocomplete would be configurable and trivial to extend or replace wholesale with another widget.

This is a reality within Emacs (which really is a 2D text OS running lots of applications inside, including a text editor), and being text-based does play a role. When it's very hard to draw arbitrary pixels on screen and most apps deal with text, it's easy to make a large set of very powerful interface tools, and it's easy to pull data out of an app and put data into it, whether the app intended it to happen or not.

In the back of my mind, I sometimes wonder how something like Emacs could be made with modern browser canvas, to enable cheap rich multimedia, while retaining the ability for inspection and user-programmability. Introducing arbitrary GUIs is hard, because next thing you know, half of the stuff is drawing to canvas directly and it's all sandboxed away from you.

lioeters · on Oct 30, 2019

> user-programmable externally to any given application

I think this is why it reminded me of TCL, specifically the "expect" command that can script apps that know nothing about it. From the Wikipedia page, the TCL Expect extension: "automates interactions with programs that expose a text terminal interface".

So how I imagine this "Emacs as an OS" paradigm you're describing, is that it mediates interactions with any and all apps that expose a text input/edit interface, to allow programmatic customizations.

Like I'd love to script my own shortcuts for Firefox (or other apps) - possibly with multiple steps, taking input from some config file, or sending a link to another app.. Or, as you mentioned, Emmet-style expansions that work in any input field or textarea..

tempguy9999 · on Oct 30, 2019

Excellent post, now I really see what you're getting at.

tempguy9999 · on Oct 29, 2019

To address just one point, in Emacs you have dabbrev-expand (bound to M-/). I like it and use it but it is not automatic. I have to invoke it myself which means it can't get in the way.

If you want larger clumps of code then you have various options such as skeleton mode, but again that's something the user has to ensure happens - again they remain in control.

> But then again I wouldn't trust Apple or Microsoft to do it right

Oh hell yes!

TeMPOraL · on Oct 29, 2019

> I have to invoke it myself which means it can't get in the way.

For the sake of completeness, you can always make it automatic. All it takes is to add a function to post-self-insert-hook, and make it e.g. call dabbrev-expand if you pressed space twice. So you can have it any way you like - manual, automatic, semi-automatic. You're in full control.

> If you want larger clumps of code then you have various options such as skeleton mode

Yes. I currently use yasnippets for code. Still, my favourite yasnippet is one I use in comments - it expands "todo" into: "TODO: | -- [my name], 2019-10-29.", and similarly for "note", "hack" and "fixme". | is where the caret ends after expansion.
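For anyone who hasn't used snippet expansion, the idea can be sketched in a few lines of Python. This is not yasnippet and not how Emacs implements it; `expand_snippet` and its parameters are hypothetical names, and the template is just the one described above, with `|` marking where the caret should land.

```python
from datetime import date


def expand_snippet(keyword, author, today=None):
    """Expand a comment keyword into text plus a caret offset.

    Returns (text, caret), where caret is the index in text at which
    the cursor should be placed after the expansion.
    """
    today = today or date.today()
    # "|" marks the caret position, as in the description above.
    template = f"{keyword.upper()}: | -- {author}, {today.isoformat()}."
    caret = template.index("|")
    text = template[:caret] + template[caret + 1:]
    return text, caret


text, caret = expand_snippet("todo", "me", date(2019, 10, 29))
print(repr(text), caret)
```

The same function covers "note", "hack" and "fixme", since only the keyword changes.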

That's the kind of flexibility I wish my OS had. Unfortunately, it goes against the commercial interest of mainstream OS providers.

chrisweekly · on Oct 29, 2019

IIUC the parent was suggesting an OS-level system that _supports_ these features natively, as the foundation layer for any number of userland tools to sit atop... vs your compelling arg for why said features must be straightforward to disable. I don't see a conflict.

tempguy9999 · on Oct 29, 2019

True, upvoted. My point was that these facilities are of questionable value (I'd like to see how much time they really save, or indeed even lose when triggered accidentally), and that they have to be in easy control of the user. With MS there's too much "we invented it so you're getting it", and bad designers (who always outnumber good) will do the same.

Actual example: I was working with Visual Studio with another guy. Open a bracket and VS automatically added a closing bracket. That is fucking annoying and saves you a whole keystroke while breaking muscle memory and interfering with our work. We had trouble turning that off.

maest · on Oct 29, 2019

I don't have strong feelings about what the GP says, but:

> I use those perhaps once a week or less. If you use it say 3X a day, you'd save little time, perhaps a couple of minutes.

Maybe you only use them so rarely because they're not convenient to use.

tempguy9999 · on Oct 29, 2019

I use them rarely because my vocabulary is reasonably large. Also, given a choice of words I'll prefer the more conventional one.

For people learning, perhaps it could be a good thing.

JadeNB · on Oct 29, 2019

This seems an awful lot like arguing that, since you are happy at the CLI, there's no need for a GUI. Widespread availability of rich OS-level IMEs wouldn't hurt anyone, and could help everyone. (Even for someone like you who wants to pass through raw hex, it's easier to tell that once to the OS rather than to have to argue with each app individually.)

tempguy9999 · on Oct 29, 2019

I was unclear. I'm not against anything, just saying let the user control it. Make sure some great idea actually works (user testing shows many great ideas aren't; people are complex and so are their mental models), and then don't force it on users.

zbraniecki · on Oct 29, 2019

Hi! I work on Firefox! Would you be open to share your experience and file bugs on issues you encountered?

I'd love to help with the CJK support in Firefox!

ken · on Oct 29, 2019

I think the point is that to get native behavior, you need to use the native functionality. When you try to emulate it, you'll always be missing something.

We see this in every corner of Firefox: the text editing, the toolbar, the context menus, the form controls, etc. Yet Firefox seems to be all about writing everything from scratch. There are many bugs filed for using native functionality, like [1], that have been open for decades with no activity. 20 years ago, it wasn't done because "it's a time thing", and since then it's racked up other bugs as dependencies because stuff's just broken.

This isn't a situation where you can tweak a couple little problems and call it done. This is a fundamental change in the Firefox architecture. Asking for more bug reports is not going to help.

[1]: https://bugzilla.mozilla.org/show_bug.cgi?id=34572

hsivonen · on Oct 29, 2019

Firefox uses native IMEs. Most of the time they aren't broken in obvious ways. To the extent they are broken in non-obvious ways, it's unhelpful not to say how on the level that would allow the problem to be reproduced.

Firefox can't just use native text edit controls. First, Firefox needs to support contenteditable, which doesn't map to an OS-supplied text box. Second, the multiprocess architecture leads to a situation where the UI process talks to the native IME API but the Web content process hosts the text being edited, so native text boxes don't work even for things that look similar to OS-supplied text boxes.

fouc · on Oct 29, 2019

OR better yet, use the base OS editor support.

vezycash · on Oct 29, 2019

Hi. Sent in feedback yesterday concerning addon privacy. Specifically, the option to grant or restrict addon access on per site basis.

I'd want to be able to right click an addon icon in the toolbar and click, "Don't run on this site." And have even more options in the extension detail page.

zapzupnz · on Oct 29, 2019

This isn't relevant to the thread, nor to what the Firefox dev was asking for.

innocenat · on Oct 29, 2019

> On Linux, it’s terrifying; I’ve never seen any app that allows input systems to work naturally, and after a week of use you get used to pressing space & backspace after finishing every Hangul word. The Unix-style composability they want (apps should work whether or not input methods are used - and looks like Linux users that use Latin characters don’t use any input methods (opposed to macOS where Latin characters are input by a Latin input system), so looks like this state will persist.

Living in Japan and using Linux almost all the time, I don't remember ever having any problem with typing in Japanese.

kbumsik · on Oct 29, 2019

The Korean case is totally different from Japanese. I guess Korean is very unique in this area, because Korean Hangul characters consist of multiple small characters. Input systems need a state machine so that you can type multiple small characters to complete one character. The problems usually come from improper state handling.
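A rough idea of the state involved, as a Python sketch rather than any real IME's implementation. It assumes two-set (dubeolsik) style input where each key is a single compatibility jamo, and it deliberately ignores compound vowels and compound final consonants typed as two keys. The function names are made up; the arithmetic is the standard Unicode formula for precomposed Hangul syllables.

```python
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"     # 21 vowels
TAILS = " ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"  # index 0 = no final


def compose(lead, vowel, tail):
    """Return the text for one (possibly incomplete) syllable block."""
    if lead is None or vowel is None:
        return (lead or "") + (vowel or "")
    t = TAILS.index(tail) if tail else 0
    return chr(0xAC00 + (LEADS.index(lead) * 21 + VOWELS.index(vowel)) * 28 + t)


def group_jamo(keys):
    """Group a stream of typed jamo into syllable blocks."""
    out = []
    lead = vowel = tail = None
    for key in keys:
        if key in VOWELS:
            if lead is not None and vowel is None:
                vowel = key                       # consonant + vowel
            elif tail is not None:
                # The pending final was really the next syllable's initial.
                out.append(compose(lead, vowel, None))
                lead, vowel, tail = tail, key, None
            else:
                out.append(compose(lead, vowel, tail))
                out.append(key)                   # vowel with no initial
                lead = vowel = tail = None
        elif key in LEADS:
            if lead is not None and vowel is not None and tail is None and key in TAILS:
                tail = key                        # tentatively a final consonant
            else:
                out.append(compose(lead, vowel, tail))
                lead, vowel, tail = key, None, None
        else:
            out.append(compose(lead, vowel, tail))
            out.append(key)                       # pass anything else through
            lead = vowel = tail = None
    out.append(compose(lead, vowel, tail))
    return "".join(out)


print(group_jamo("ㅎㅏㄴㄱㅡㄹ"))  # 한글
print(group_jamo("ㅎㅏㄴㅏ"))      # 하나
```

The second call shows why the state matters: after the third key the block on screen is 한, and only the fourth key reveals that ㄴ belongs to the next syllable. An app that commits text too early, or an IME that never commits the last block, produces exactly the kind of glitches being discussed.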

hsivonen · on Oct 29, 2019

Korean is different from Japanese, but as far as IME complexity goes, a Hangul IME should be extremely simple to develop compared to a Japanese IME. (The Hangul part of a Korean IME doesn't need any pop-ups like a Japanese IME does. As far as UI requirements go, the Hangul part of Korean IME can be as UIless as a Vietnamese Telex IME.) That e.g. on Ubuntu 18.04 the Korean IME is broken is not due to anything intrinsic to the writing system.

oefrha · on Oct 29, 2019

Yes, Korean (specifically Hangul) additionally suffers from NFC/NFD issues, which are not experienced by Chinese or Japanese. I’ve had the privilege (/s) to work with Korean file names in the past and it was a nightmare.
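A small Python illustration of the NFC/NFD problem, using only the standard `unicodedata` module. The file name is a made-up example and `same_name` is a hypothetical helper. The same visible name can be stored as precomposed syllables (NFC) or as sequences of conjoining jamo (NFD), and the two do not compare equal unless you normalize first.

```python
import unicodedata

name_nfc = "한글.txt"                               # precomposed syllables
name_nfd = unicodedata.normalize("NFD", name_nfc)   # decomposed into conjoining jamo

print(name_nfc == name_nfd)           # False: different code point sequences
print(len(name_nfc), len(name_nfd))   # 6 10


def same_name(a, b):
    """Compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_name(name_nfc, name_nfd))  # True
```

Each of the two syllables decomposes into three jamo, which is where the length difference comes from. Kanji and kana mostly don't decompose this way, so Chinese and Japanese file names rarely hit it.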

int_19h · on Oct 29, 2019

It's not quite unique - there are many other scripts that behave in a similar way, e.g. for some Indian languages.

Izkata · on Oct 29, 2019

How is that any different from Japanese Kanji?

munmaek · on Oct 29, 2019

Hangeul is not an alphabet. It's an alphabetic syllabary. ㄱ is one character but it needs to be composed into a final glyph like 각, which requires 2+ characters. 가, 각, 갉. 뷁.

Kanji (not to be confused with Hiragana or Katakana) are different because the characters are already composed.

Izkata · on Oct 29, 2019

See my other reply [0]; as far as I can tell, the typing experience is identical.

[0] https://news.ycombinator.com/item?id=21390039

munmaek · on Oct 29, 2019

Yeah I had a brain-fart earlier. I totally forgot about that. Typing hangeul is basically the same except there's no need to possibly choose different hanja/kanji. ...unless you press the hanja key after the glyph is typed but before moving onto the next glyph. (Usually F9 or F10, iirc Windows IME defaults to ctrl+space).

Qwertystop · on Oct 29, 2019

Kanji is logographic: each symbol is a complete word, phrase, or idea. Hangul is alphabetic-syllabic: each symbol is a segment (vowel-or-consonant), except that they're written in two- or three-letter blocks each representing a syllable.

carlmr · on Oct 29, 2019

I'm not sure it's that different IME wise. I only know Chinese pinyin IME, but I assume since Kanji in contrast to Chinese can map to multiple syllables you probably need to keep the state of the last few syllables as well and then let the user choose the appropriate Kanji (if available).

With pinyin input it's the same in Chinese. You enter the Latin characters and the IME gives you options to select from. Also even abstracting from the single syllables you often can narrow down the selection of compound words in Chinese IME if you continue typing. So again state is important.

Izkata · on Oct 29, 2019

Correct, this is why I'm wondering. Input is done through hiragana such as "わたし" (or even a step further removed, transliterated from "watashi") and then the IME is triggered to convert it to "私" (or other matches).

On a laptop, my experience is that romaji -> hiragana is as-you-type due to being unambiguous, while the default trigger for hiragana -> kanji is the spacebar, same as described for Korean. Hence my confusion as to how it's different - the individual characters certainly are, but it sounds functionally identical.
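The two stages can be sketched in Python. This is a toy, not how any real Japanese IME works: the romaji table is deliberately tiny, the candidate dictionary holds only the one example from this comment, and both function names are made up. The first stage runs as you type because each romaji chunk maps to exactly one kana; the second stage needs an explicit trigger because it offers candidates.

```python
# Tiny illustrative romaji table; a real IME covers the full syllabary.
KANA = {"a": "あ", "i": "い", "ka": "か", "ta": "た", "wa": "わ", "shi": "し"}

# Hypothetical candidate dictionary with the single example from above.
KANJI_CANDIDATES = {"わたし": ["私"]}


def romaji_to_hiragana(text):
    """Greedy longest-match conversion, done as the user types."""
    out = []
    i = 0
    while i < len(text):
        for size in (3, 2, 1):
            chunk = text[i:i + size]
            if chunk in KANA:
                out.append(KANA[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])   # no match: leave the character as typed
            i += 1
    return "".join(out)


def kanji_candidates(hiragana):
    """Conversion step triggered explicitly, e.g. by the spacebar."""
    return KANJI_CANDIDATES.get(hiragana, []) + [hiragana]


typed = romaji_to_hiragana("watashi")
print(typed)                    # わたし
print(kanji_candidates(typed))  # ['私', 'わたし']
```

Keeping the plain hiragana as the last candidate mirrors the fact that the user can always decline the conversion.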

On my phone just now, typing this comment, the experience was switching to a Japanese keyboard and inputting the hiragana directly, then the kanji suggestions appeared where autocorrect suggestions normally would.

hsivonen · on Oct 29, 2019

Typing Hangul is just like typing Latin, Greek, Cyrillic, Hebrew, Arabic, etc., text: one alphabetic unit at a time. The only thing a Hangul IME does is it groups the typed jamo into syllables. The grouping is unambiguous, so there's no need for popups or space presses to guide the grouping.

(If there had been the kind of rendering technology that is used for Indic text today back when Korean text processing on computers started, chances are that the syllable grouping would be handled as rendering-time shaping and not as an input-time IME issue.)

Additionally, Korean IMEs have a feature to convert a word into Hanja, but it's something you need to take action to invoke as opposed to Japanese IMEs offering to convert to Kanji by default.

Izkata · on Oct 29, 2019

> The grouping is unambiguous, so there's no need for popups or space presses to guide the grouping.

...that's the opposite of the comments that triggered my question. Multiple people said space is needed to trigger it, then backspace to remove an erroneously-added space character.

Is the answer that they're using it wrong, and are actually inputting a space directly because the IME already acted for them?

cyborgx7 · on Oct 29, 2019

Learning Japanese and full-time Linux user.

Getting Japanese writing to work was a pain in the ass and still doesn't work everywhere.

innocenat · on Oct 29, 2019

I use ibus and mozc (i.e. what comes with Ubuntu) and have never had any problem.

ptero · on Oct 29, 2019

This is interesting. Not doubting your story, but my personal experience, as a Cyrillic user, is opposite: at least in early versions of Windows apps I constantly struggled with a random mix of hardcoded assumptions for encodings, key presses, characters and data stored which often produced gibberish on screen.

When I switched to Linux everything just worked out of the box: I can copy and paste text between gvim, xterm, etc. with no issues. I admit that this is likely due to app writers, not the underlying OS. And my experience is only with single-byte characters. Just my 2c.

weeb_throwaway · on Oct 29, 2019

Another issue with CJK is vertical right-to-left writing: https://www.w3.org/International/articles/vertical-text/

It's pretty popular with Japanese novels/manga and traditional Chinese but really hard to do without bugs on the web.

L_Rahman · on Oct 29, 2019

Hadn't fully registered till this comment, the degree to which the modern web is anchored to horizontal (usually left-to-right) writing and the design patterns of vertical scrolling that come with that assumption.

rcthompson · on Oct 29, 2019

It's not just the web, it's all of computing, even down to the hardware design. A vertical scroll wheel is standard on all mice. A horizontal scrolling method of some sort is not, and even on mice that include one, it's usually not as good as the vertical one (e.g. leaning the wheel left and right).

hsivonen · on Oct 29, 2019

> I use macOS & Linux, and while the default text handling system called Cocoa Text System in macOS handles input methods well, almost all applications that implement it’s own, like big apps like Eclipse and Firefox, don’t get this right.

What specific IME problems do you have with Firefox on Mac?

> On Linux, it’s terrifying; I’ve never seen any app that allows input systems to work naturally, and after a week of use you get used to pressing space & backspace after finishing every Hangul word.

Do you mean you have to press space twice and erase the second space? With IBus?

dfcowell · on Oct 29, 2019

There are no spaces between words in Chinese or Japanese.

Pressing space confirms the current selection in the Japanese IME, which is expected behavior. Where some Linux implementations get it wrong is they also insert a space after the word, meaning the user has to select the desired word in the IME with the space bar and then remove the erroneous inserted space.

Edit: Correction based on feedback below. Previously stated that Hangul does not have spaces.
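The difference between the expected behavior and the buggy one can be sketched as a toy model. This is only an illustration, not how IBus or any real IME is implemented: the `type_keys` function, the `CANDIDATES` table and the `leak_space` flag are all made up for this example, and the model collapses "convert" and "confirm" into a single space press.

```python
def type_keys(keys, candidates, leak_space=False):
    """Toy IME: letters build a preedit string, space confirms it.

    leak_space=True imitates the bug described above, where the
    confirming space is also inserted into the text.
    """
    committed = []
    preedit = ""
    for key in keys:
        if key == " ":
            if preedit:
                # Space confirms the candidate for the current preedit.
                committed.append(candidates.get(preedit, preedit))
                preedit = ""
                if leak_space:
                    # Buggy variant: the space leaks into the document.
                    committed.append(" ")
            else:
                # Nothing is being composed, so space is a plain space.
                committed.append(" ")
        else:
            preedit += key
    return "".join(committed) + preedit


# Hypothetical two-entry candidate table, for illustration only.
CANDIDATES = {"nihon": "日本", "go": "語"}

correct = type_keys("nihon go ", CANDIDATES)
buggy = type_keys("nihon go ", CANDIDATES, leak_space=True)
```

With the same keystrokes, `correct` comes out without spaces, while `buggy` contains a stray space after each confirmed word, which is what the user then has to delete by hand.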

jfk13 · on Oct 29, 2019

No, Korean is normally written with spaces between words nowadays (perhaps that wasn't always the case?).

https://www.omniglot.com/writing/korean.htm

tasogare · on Oct 29, 2019

> There are no spaces between words in Hangul

Wrong, there are spaces between words in Korean. It’s Japanese and Chinese that don’t have them. And in Vietnamese there are spaces between all syllables, even within words.

dfcowell · on Oct 29, 2019

Thanks for the fact check.

Not sure your comment on Vietnamese is accurate though. I work in a company with ~35% native Vietnamese speakers and I’ve seen plenty of multi-syllable words.

Are you talking about traditional Vietnamese (when it still used Chinese characters) or modern Vietnamese (post-French-colonialism) which uses the Latin alphabet with accents?

gdx · on Oct 29, 2019

It is accurate, there are spaces between each syllable in modern written Vietnamese, except in foreign words. The syllables can have as many as 7 characters, and you need an IME to type the tone marks. The written language looks like this: https://vi.wikipedia.org/wiki/Vi%E1%BB%87t_Nam
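Both points can be checked with a small Python sketch using only the standard `unicodedata` module. The sample text is just the title of the linked article; the variable names are arbitrary.

```python
import unicodedata

# "Việt Nam", written with an escape so the code points are unambiguous.
text = "Vi\u1ec7t Nam"

# Syllables are separated by spaces, so a plain split yields syllables.
syllables = text.split()

# The letter U+1EC7 decomposes into a base letter plus two combining marks.
decomposed = unicodedata.normalize("NFD", "\u1ec7")
code_points = [f"U+{ord(c):04X}" for c in decomposed]

# Typing the marks in a different order still normalizes to the same letter.
recomposed = unicodedata.normalize("NFC", "e\u0302\u0323")
```

Here `code_points` lists the base letter `e`, the dot below (U+0323) and the circumflex (U+0302), which is why some layer, whether a keyboard layout or an IME, has to combine several keystrokes into one letter.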

dfcowell · on Oct 29, 2019

Wow, you’re right.

I never noticed (even while studying the basics myself) that syllables are space-separated.

I always saw the words (e.g. “thanh pho” = city - don’t have the keyboard on my phone) as independent units. Didn’t even recognize the spaces.

Amazing how something can be right in front of you without you noticing it.

So much makes sense now. Thanks again.

hsivonen · on Oct 29, 2019

> you need an IME to type the tone marks

The standard Vietnamese keyboard layout works without an IME layer. However, apparently most people who write Vietnamese _prefer_ to use an IME.

bicolao · on Oct 29, 2019

Vietnamese does not really belong in the CJK group because it's written with the Latin alphabet.

tasogare · on Oct 29, 2019

That was not my point. I mentioned Vietnamese because the spacing it uses is interesting.

Also like dfcowell said, Vietnamese used to be written with han & chu nôm characters (respectively Chinese characters and Chinese-like characters created by Vietnamese), a lot of which are encoded in Unicode. Hence the existence of the CJKV acronym.

kevin_thibedeau · on Oct 29, 2019

It is actually CJKV to deal with historical Vietnamese.

From Unicode spec:

> Although the term “CJK”—Chinese, Japanese, and Korean is used throughout this text to describe the languages that currently use Han ideographic characters, it should be noted that earlier Vietnamese writing systems were based on Han ideographs. Consequently, the term “CJKV” would be more accurate in a historical sense. Han ideographs are still used for historical, religious, and pedagogical purposes in Vietnam.

dfcowell · on Oct 29, 2019

Traditional Vietnamese (before French colonialism) used Chinese characters.

munmaek · on Oct 29, 2019

> On Linux, it’s terrifying;

Are you using ibus? I'm on Debian and once a character is finished, it automatically moves on to the next character (unless you press space, enter, etc). All I have to do is ctrl+space to switch input methods.

I've had issues with terminals, like character composition not working in alacritty. The only major annoyance I've found is having to install and configure ibus/ibus-daemon, and CJK fonts.

microcolonel · on Oct 29, 2019

> On Linux, it’s terrifying; I’ve never seen any app that allows input systems to work naturally

Kind of interested to hear what sort of input method you're using. Even my terminal emulator supports Japanese input methods well, through IBus. Maybe it's just that, as a non-native, I'm more often going to convert chunk by chunk anyway; I have noticed that some input methods do not do bulk conversion well. I must say I never thought hangeul would even warrant conversion other than composing the syllables. Is it because you mix in hanja conversion? I think your problems are mostly a matter of the quality of the input method; IBus and the input method system in GTK+, at least, don't seem to prevent anyone from writing better input methods.

I feel it with Firefox though. Then again, Firefox is very poor quality software in my experience: almost everything is at least subtly wrong, and there seems to be more interest in niche feature work than in basic work on product quality. I could in some ways say the same for Eclipse; every time an SDK I want to use is only documented in terms of its special Eclipse frontend, I get a bit depressed.
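For reference, "composing the syllables" is plain arithmetic in Unicode, which is roughly what a Hangul input method has to do once it knows which jamo belong together. The sketch below uses the standard Hangul syllable composition formula; the function name `compose_syllable` is made up, and it only accepts conjoining jamo (U+1100 block), not the compatibility jamo a keyboard usually sends.

```python
import unicodedata

# Constants from the Unicode Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28


def compose_syllable(lead, vowel, tail=None):
    """Combine a leading consonant, a vowel and an optional tail into one syllable."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    # T_BASE is one below the first tail jamo, so index 0 means "no tail".
    t_index = ord(tail) - T_BASE if tail else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


han = compose_syllable("\u1112", "\u1161", "\u11ab")   # hieuh + a + nieun
geul = compose_syllable("\u1100", "\u1173", "\u11af")  # kiyeok + eu + rieul
word = han + geul

# Cross-check against the normalizer in the standard library.
nfc_han = unicodedata.normalize("NFC", "\u1112\u1161\u11ab")
```

The hard part for an input method is not this formula but deciding, keystroke by keystroke, whether a consonant ends the current syllable or starts the next one, which is why the preedit state matters so much for Korean.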

usr1106 · on Oct 29, 2019

> and looks like Linux users that use Latin characters don’t use any input methods

Right, I prefer slim systems and I typically uninstall everything input method related that my distro has chosen to preinstall. I cannot read or memorize a single CJK character, so why would I need that.

Additionally, programming and computers for me means English, although that is not my mother tongue. I would never install anything in my mother tongue.

Basically I use my mother tongue (and a couple of other European languages I speak) only in Email, chat or maybe some web form. I can feel your pain though, because 10+ years ago we had the same problem with the couple of non-ASCII characters you need in most European languages.

In order to have the situation in Linux improve, there just need to be enough CJK contributors to fix existing bugs. And reviewers / unit test cases to make sure we Westerners don't break it again with our next commit.

goranmoomin · on Oct 29, 2019

> Right, I prefer slim systems and I typically uninstall everything input method related that my distro has chosen to preinstall. I cannot read or memorize a single CJK character, so why would I need that.

Yes, I'm exactly talking about this mindset. This is basically why Linux has such poor input method support. English has the special privilege of not needing an input method, and the majority of Linux application programmers use English only. Together, that means basically all apps that don't take i18n seriously are 'wrong' by default, as opposed to apps running on Windows/macOS, which are 'right' by default.

> Basically I use my mother tongue (and a couple of other European languages I speak) only in Email, chat or maybe some web form.

Does that mean European languages can be input without special input methods?

> I can feel your pain though, because 10+ years ago we had the same problem with the couple of non-ASCII characters you need in most European languages.

The non-ASCII characters fit the character array model that most Western people think in, and as a plus they fit in the upper half of an 8-bit extended ASCII code page.

Asian CJK languages require a different model from the western ones.
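A quick Python sketch of the storage side of that difference. It assumes Latin-1 as a representative 8-bit code page and UTF-8 for comparison; the helper name `encoded_length` is made up for the example.

```python
def encoded_length(ch, encoding):
    """Return how many bytes ch takes in encoding, or None if it cannot be encoded."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


e_acute = "\u00e9"      # é, a typical Western European letter
hangul_han = "\ud55c"   # 한, one Hangul syllable

# é is a single byte in the upper half (>= 0x80) of the 8-bit range.
latin1_byte = e_acute.encode("latin-1")
e_latin1 = encoded_length(e_acute, "latin-1")

# 한 has no slot in an 8-bit code page and needs several bytes in UTF-8.
han_latin1 = encoded_length(hangul_han, "latin-1")
han_utf8 = encoded_length(hangul_han, "utf-8")
```

This only covers how the characters are stored. The input side is a separate problem: one key per character works for é on a national layout, but not for thousands of syllables or ideographs.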

> In order to have the situation in Linux improve there just need to be enough CJK contributors to fix existing bugs.

It's a losing battle. That only works in an ideal world where every program has enough contributors. That's not the case.

usr1106 · on Oct 29, 2019

>> Right, I prefer slim systems and I typically uninstall everything input method related that my distro has chosen to preinstall. I cannot read or memorize a single CJK character, so why would I need that.

>Yes, I'm exactly talking about this mindset. This is basically why Linux has such poor input method support.

Don't get me wrong. That input methods are useless for me, because I know zero CJK characters, does not mean I think they are useless to everybody or Linux in general. How would I help the CJK users by having something installed I never use and I have no knowledge to use?

goranmoomin · on Oct 29, 2019

> That input methods are useless for me, because I know zero CJK characters, does not mean I think they are useless to everybody or Linux in general.

Yeah, but every app would 'just work' if we have a level of indirection with an IME by default, even for languages with Latin characters.

I mean, there is a reason why Windows & macOS both chose a similar architecture for text input.

usr1106 · on Oct 30, 2019

> I mean, there is a reason why Windows & macOS both chose a similar architecture for text input

Yes, there is a reason. Microsoft and Apple want to make money in CJK countries. And they have architects that make system-wide decisions.

That is not how Linux works. Companies contribute where their business is. That is server or embedded. After Canonical closed bug number 1 https://bugs.launchpad.net/ubuntu/+bug/1 no big player is interested in Linux on the desktop anymore. Individuals contribute what they are interested in and what they know best. I fear most Westerners don't understand the challenges of CJK and other more "complicated" scripts. I myself "blame" Americans if they do it wrong and something accepts only 7 bit ASCII. I can fully understand if CJK or right-to-left people blame us "8 bit people" (character set, not coding) for doing it wrong. We just don't get it, that's a fact. But I don't think studying Korean etc. is a realistic solution. The only way to change it is to have more people and companies contribute that a) need it and b) really understand the user needs.

usr1106 · on Oct 29, 2019

> Does that mean European languages can be input without special input methods

I cannot talk about East-European languages or really small languages. But for the bigger West and Central-European languages the answer is yes.

Every character is either on the national keyboard or (if typing another language) can be typed using dead-key accents or AltGr. Sometimes the Compose key is needed, but I need that so rarely that I forget the combinations...

hsivonen · on Oct 29, 2019

> Does that mean European languages can be input without special input methods?

Yes. Dead keys are logically like a tiny IME, but on Windows and Linux they aren't API-wise IME-related.
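To illustrate the "tiny IME" point, here is a minimal sketch of dead-key handling as a one-slot state machine. It is a simplification and not how any particular OS implements it: the key labels merely borrow X11-style names, and `type_with_dead_keys` is a made-up helper.

```python
import unicodedata

# Dead key label -> combining mark it contributes (labels are illustrative).
DEAD_KEYS = {
    "dead_acute": "\u0301",
    "dead_grave": "\u0300",
    "dead_circumflex": "\u0302",
    "dead_diaeresis": "\u0308",
}


def type_with_dead_keys(keys):
    """Hold a pending accent until the next key, then emit the combined letter."""
    output = []
    pending = None
    for key in keys:
        if key in DEAD_KEYS:
            # Nothing is emitted yet; this is the whole "composition state".
            pending = DEAD_KEYS[key]
        elif pending is not None:
            # NFC merges base letter + combining mark when a precomposed form exists.
            output.append(unicodedata.normalize("NFC", key + pending))
            pending = None
        else:
            output.append(key)
    return "".join(output)


cafe = type_with_dead_keys(["c", "a", "f", "dead_acute", "e"])
u_umlaut = type_with_dead_keys(["dead_diaeresis", "u"])
```

The pending accent plays the same role as an IME's preedit text, just with a single slot and no candidate list, which is why it can live in the keyboard layout layer instead of the IME API.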

6gvONxR4sf7o · on Oct 29, 2019

>Additionally, programming and computers for me means English...

Linux isn't just for programming. I personally hate that I have to dual boot because Ubuntu can't do what Windows or Mac can. Anyhow, English has plenty of words containing non-keyboard characters, they're just infrequent.