While I don't enjoy being negative, this doesn't sound like a solution that would work - it essentially needs 'boiling the ocean' i.e. convincing everyone else.
(That's literally the plan described in the article)
You are certainly not wrong. In this specific case however, I'd like to point out that the author, leonerd, has been busily working on "making stuff" to this end for years now.
They are the lead author of libvterm, a popular modular terminal emulator library that is for example used in neovim and emacs-libvterm.
They have also been working on libtermkey, a library that accepts input from the devised keybinding system and is now part of libtickit, which is mentioned in the post.
Can anyone point me to an article that explains why we are still stuck with these standards from the 1970s? I understand backward compatibility can be important, but I don't quite understand why terminal standards are the ONE thing in the computing world that just can not move into the future.
It can. How would you like them to move into the future? It's not as if there's been no recent evolution in terminal emulators.
Notice that evolving into the future and using old ideas are not incompatible. Modern cryptography still uses Euclid's algorithm from more than two millennia ago.
Here's one example: Why can't I interact with text at the prompt in the way I do in every other text input context on my modern computer? Selecting text with shift and arrow keys, moving the cursor with the mouse.
Keep in mind that I do not deeply understand the stack of standards and implementations that comprise a terminal. That said, my assumption is that this complaint is an unfortunate, unavoidable byproduct of that stack.
While introducing newcomers to terminal usage, there are simple actions from the gui world that do not work at all, or even seem to break the terminal display. It's extremely confusing for them, they get over it eventually, and come to accept that the terminal just can't do certain things. This is what I mean by being unable to move into the future.
As a point of comparison, consider how much of a mess it is to deal with spaces in file names, in bash. To a newcomer, this seems like a crazy hassle, some antiquated nonsense. It's hard for them to understand how experienced Linux users can stand to deal with it. It's annoying, but it's a direct consequence of the choice to use the space character as the delimiter between tokens in bash. This choice is actually super convenient most of the time, because that is the best use of the space character/key in a terse shell language.
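To make the space-as-delimiter trade-off concrete, here is a minimal Python sketch. It uses the standard-library `shlex` module, which approximates POSIX shell tokenizing and is not a reproduction of bash itself; the function name `tokens` is only for illustration.

```python
import shlex


def tokens(command_line: str) -> list[str]:
    """Split a command line roughly the way a POSIX shell splits it into words."""
    return shlex.split(command_line)


# An unquoted space separates two arguments...
print(tokens("rm my file.txt"))
# ...so a file name containing a space has to be quoted or escaped.
print(tokens("rm 'my file.txt'"))
print(tokens(r"rm my\ file.txt"))
```

The first call yields three words and the other two yield two, which is the whole hassle in miniature: the terse default is convenient until a name contains the delimiter.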
What is the analogous rationale behind the shortcomings of the terminal paradigm? What would we need to give up, in order to make the terminal interface a little bit more modern?
One of the problems with treating terminals like other text input contexts is that they're inherently different and you're going to constantly run into issues when the context switches to displayed text.
For example, you could get a terminal that allows selecting text with keyboard but what happens when a user inevitably wants to or accidentally selects text from the non-input part of the terminal? Should the input part of the terminal and the display part of the terminal be treated differently? In Firefox as I type this I get a nice big input box where I can do multi-line paragraphs but if I start clicking and dragging to select the text in the input box it'll never let me select text from your comment, likewise in reverse, but on a terminal this would be undesirable behaviour for the common pattern of selecting text to copy output for documenting or troubleshooting.
Some terminals have plugins or scripts to allow such selection of text but not for input purposes, if you decide to only allow text selection on the input field as a means of improving text input do you just get two such methods of being able to select text? The sane approach might be to allow selecting all text but you'll end up with paper cuts where a user doesn't care about what is going on before the $ but might end up in situations where text is selected far before the $ because of a reverse search gone wrong, etc.
As an aside, terminals/bash do have some form of text-editor like functionality, readline is the usual library involved (man bash, /^readline) and has reasonable support for moving cursor around words, move cursor to character search, deleting/yanking/pasting words, etc. It's not the best text editing interface but for dealing with a single command line it's usually sufficient. There's even a vi-like editing mode (set -o vi) built into bash if you feel like you need a modal editor for a single line but it seems even less intuitive and harder to grok.
In-band signaling — or rather, in-band markup — is a big part of what makes a "terminal" (i.e. a TTY/PTY device) semantically a "terminal": a hybrid character-grid / event-log that works both as a sink for streamed-in text, and as a "plotter" for making line-printer art. A device that both streams out as, effectively, an input-event log (where this log can be captured for replay, as with script(1) or most logging systems); but which also maintains a notion of being a ring-buffer "containing" a rectangularly-bounded volume of text (and control-events), such that clients just connecting onto it can begin streaming just from the beginning of that buffer, to end up with one complete "image" of the latest PTY state, minus any scrollback, without needing previous history.
All that is kind of predicated on control-characters being embedded in the text and "following" the text around, such that taking a slice of the text (as the PTY's ring-buffer does every time a line is expunged) will preserve the corresponding slice of events.
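As a rough illustration of markup travelling inside the text stream, the Python sketch below splits a stream into plain text and CSI control sequences. The byte ranges in the pattern follow the ECMA-48 layout for CSI sequences; it deliberately ignores every other kind of escape sequence, and the name `split_stream` is made up for this example.

```python
import re

# ESC [ , then parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F,
# and one final byte 0x40-0x7E.
CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def split_stream(data: str) -> list[tuple[str, str]]:
    """Split terminal output into ("text", ...) and ("csi", ...) chunks, in order."""
    chunks = []
    pos = 0
    for match in CSI.finditer(data):
        if match.start() > pos:
            chunks.append(("text", data[pos:match.start()]))
        chunks.append(("csi", match.group()))
        pos = match.end()
    if pos < len(data):
        chunks.append(("text", data[pos:]))
    return chunks


print(split_stream("a\x1b[31mred\x1b[0m"))
```

Because the colour changes sit in the same stream as the letters, any slice of the stream carries its own markup with it, which is the property described above.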
What would reading back the contents of a PTY device look like in a world with out-of-band TTY signalling? What would flow over a serial port? Would it even be a character-stream device, or are you imagining a TTY/PTY as operating in something more akin to a structured datagram event-stream mode, such that you'd use sendmsg(2) and recv(2) on it rather than write(2) and read(2)?
I mean, it's not an impossible dream; but that really is a "start over with a whole separate ecosystem that no existing software works with until made compatible" kind of change. Effectively it'd be a separate thing from TTYs, that just happens to have similar functionality. But it wouldn't support any existing software, or any existing hardware, except by virtualization (i.e. running a PTY emulator process inside your modern OoB-signalling terminal emulator.) Kind of like what Windows has been going through to replace its own command-line.
---
Personally, I'd prefer to keep in-band signalling (in the "in-band markup" sense above, not the "you have to recognize conventional escape-code sequences heuristically to even know they're not regular text" sense.)
But I'd rather just make the in-band signalling structured — i.e. to make TTYs into a data-stream containing a variable-length self-synchronizing bit-encoding with clear prefix separation for control- and data- packets.
Y'know, like UTF-8 is for text.
...or, well, speaking of Unicode: we could just use Unicode for this, reserving another block† of control characters to go with the 30-odd ones that sit at the beginning of the BMP. Then "is this a control codepoint, and if so, what does it mean" could just be answered by consulting a Unicode table. (In such a setup, CSI command parameterization would be accomplished with zero-width joiners, variation selectors, and other things. Just picture control-characters as invisible emoji, specifically like the flag emoji that are formed by spelling out country-codes in a sort of "flags meta-alphabet"; or like that family emoji [https://emojipedia.org/family/] with the combinatoric variants.)
† Why not use the Private Use Area? Because this would be an explicitly inter-compatible signalling standard, not a proprietary usage. It's not text, but it is a standard signal within a text document. Just like emoji — or like the existing control codepoints in Unicode.
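For readers unfamiliar with what "self-synchronizing" buys you in UTF-8, here is a small Python sketch. It relies only on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx; the helper name `resync` is hypothetical, and it does not validate the data in any other way.

```python
def resync(buf: bytes, offset: int) -> int:
    """Return the first index >= offset where a UTF-8 code point starts.

    Continuation bytes look like 0b10xxxxxx, so a reader dropped into the
    middle of a stream can skip them and pick up at the next character.
    """
    while offset < len(buf) and (buf[offset] & 0xC0) == 0x80:
        offset += 1
    return offset


data = "héllo €".encode("utf-8")
start = resync(data, 2)  # offset 2 lands in the middle of "é"
print(data[start:].decode("utf-8"))
```

A structured control encoding with the same property would let a client that attaches mid-stream find the next packet boundary without guessing.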
Thank you. I appreciate the detailed explanation, but you really did sum it up well in the first sentence.
I have some frustrations with terminals, which I have always interpreted as being caused by their adherence to some old standards. After reading this comment, I think I can see how it's more the terminal paradigm itself that is responsible for some of these things.
Now, in the future I will be able to look at these frustrations in a new light, and hopefully understand a little better why it makes sense to continue using the terminal paradigm despite them.
This is something I've struggled to understand for a long time, and your comment, obvious though it may seem to some, is one of the first helpful answers I've seen.
Typing "echo <ctrl-v><esc>c<return>" to fix a garbled VT100-like terminal remains embedded in my brain from long ago, despite simpler things like "stty sane" existing. I don't know if maybe I was an admin on some box with a wonky stty, or why that's in my brain.
<ctrl-v> essentially means "insert the next key I press verbatim". So, like in Vim, <esc> would switch modes, unless you press <ctrl-v> first. Helpful if you're trying to insert a control character into a file, or echo a literal <ctrl-c> etc.
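For what it's worth, the two bytes that the echo trick sends are ESC followed by "c", the "reset to initial state" (RIS) control function. A minimal Python sketch of the same thing, with `reset_terminal` being a name invented here:

```python
import sys

# ESC followed by "c": the RIS (reset to initial state) control function.
RIS = "\x1bc"


def reset_terminal(stream=sys.stdout) -> None:
    """Write the reset sequence to the given stream and flush it."""
    stream.write(RIS)
    stream.flush()
```

Calling `reset_terminal()` with no argument writes to standard output, so it only helps when that is actually connected to the garbled terminal.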
Whichever facility receives and processes the inbound keystrokes. In simple software that uses the terminal in canonical mode, the line discipline (usually in the kernel) handles the batching up of bytes into lines before passing them onto the software, and thus provides the line editing. In more complex software that puts the terminal into raw mode, it's then handled in that software; e.g., by libedit or readline or some bespoke terminal handling.
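A hedged sketch of the canonical-mode switch in Python, using the standard `termios` module (Unix only). It only clears the ICANON and ECHO local-mode flags, which is less than a full raw mode such as `tty.setraw` sets up; the function name is made up for illustration.

```python
import termios


def without_canonical_mode(attrs: list) -> list:
    """Return a copy of tcgetattr()-style attributes with line editing and echo off.

    attrs is [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]; index 3 holds
    the local modes, where ICANON enables the kernel's line batching.
    """
    new = list(attrs)
    new[3] = new[3] & ~(termios.ICANON | termios.ECHO)
    return new
```

On a real terminal this would be applied with something like `termios.tcsetattr(fd, termios.TCSADRAIN, without_canonical_mode(termios.tcgetattr(fd)))`, after which the program sees each byte as it arrives and has to do its own line editing.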
I agree that using the 8-bit CSI to eliminate the escape ambiguity would be great. The rest I'm a bit "meh" on. In particular I use C-S as the modifier for my terminal shortcuts, so if the terminal could detect it I'd be looking for yet another set of bindings that no terminal programs will ever use.
The only decent solution in my view would be a new, separate kind of terminal spec, maybe even completely focused on terminal emulators. It could make sense to handle input better (maybe even with keyup/keydown events) and maybe display images in a non-hacky way (it can already be done via w3m...kind of).
Changing the existing standards just won't work as it would break a ton of programs, although if I'm being honest - while I like the idea of such a terminal rework - I don't know if it can be done nowadays without it opening the flood gates to developers who just can't restrain themselves from adding so many features it might as well be a new browser engine.
And either way I don't think it's worth the effort as those input limitations only affect a few specific situations. I use hundreds of custom keybindings in Vim and so far the only obstacle I encountered was the `Tab` & `Ctrl-i` overlap. If I need to hold down more than one modifier I did something wrong anyway.
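The `Tab` & `Ctrl-i` overlap falls out of how control keys are conventionally encoded: the terminal sends the key's ASCII code with the upper bits masked off. A small Python sketch of that convention (the helper name `ctrl` is hypothetical):

```python
def ctrl(key: str) -> int:
    """Byte conventionally sent for Ctrl+<key>: the ASCII code masked to 5 bits."""
    return ord(key.upper()) & 0x1F


print(ctrl("i") == ord("\t"))   # Ctrl-i is the same byte as Tab
print(ctrl("m") == ord("\r"))   # Ctrl-m is the same byte as Return
print(ctrl("[") == 0x1B)        # Ctrl-[ is the same byte as Escape
print(ctrl("i") == ctrl("I"))   # Shift makes no difference either
```

Once the byte has been sent, the application has no way to tell which key produced it, which is why fixing this needs a different encoding rather than smarter applications.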
Looks great, the only gripe I have with Kitty is how slow it is, especially on startup. I guess mostly because the combination of Python and tons of features has a negative effect in this area. In fact that was the only reason I switched away from it despite its great font ligature support; I use terminals a lot and so the constant delay when opening a new window was a bit much.
However I like those modernised protocols and it would be neat to have widespread support for it.
This! I would love a "purified" version of kitty which is just an xterm with font ligatures. And no silly features like underlining of links, etc. Also, an option to display bold text by using brighter colors (as God intended).
It's still an order of magnitude slower during startup than other terminals such as xterm, rxvt or even mlterm. On my Intel laptop I can often see the GL context flashing before becoming the final background color, which is annoying. It also requires way more RAM.
kitty is a great terminal, but it's one example of fast not also meaning lightweight.
Personally I feel like the fix for terminals is not to use them. Emacs for example has a lot of effort put into supporting various different terminals and efficiently displaying text. Some xterm-specific support is for many colours, mouse support, and the control codes to ask xterm to send more precise key escape sequences so TAB and C-i aren’t confused for example. It also has a gui which doesn’t need to pile on layers of hacks and works great so long as you don’t need to go through a terminal (eg running over ssh.) And yet it is still saddled with backwards compatibility from those days (some users expect TAB and C-i to do the same thing.)
To me it feels silly to put a lot of effort into supporting things better in terminal emulators because they are not as flexible as actual user interfaces and I think the main reasons we still have them are historical.
Aside: I’d also like to see a command line shell which runs outside the terminal emulator rather than inside it.
>Aside: I’d also like to see a command line shell which runs outside the terminal emulator rather than inside it.
The problem with such a shell would be that all the simple utilities and tools one likes to use in a shell will not work without a tty (or pty or fake-pty-conhost.exe). So the shell by itself would be pretty useless.
This theoretical shell and its family of utilities would all have to emulate terminal emulators by using something of a common library of wrappers around tty functionality. At that point you've re-invented cygwin for your OS of choice. And you're going to be running the same kinds of stuff you'd run in a real terminal emulator.
I'm not being snarky. I gave this kind of thing a lot of thought many years ago when I was thinking of native GUI unix-style shells on Windows.
Unless you have a different use case in mind for this shell ?
> The problem with such a shell would be that all the simple utilities and tools one likes to use in a shell will not work without a tty (or pty or fake-pty-conhost.exe). So the shell by itself would be pretty useless.
Right -- no vi, no emacs, no readline, no curses -- nothing that uses cursor addressing. You have to start all over. Plan 9 can be seen as an experiment to find out how much you can simplify if you can abandon backwards compatibility.
I’m suggesting running outside the terminal by which I mean pulling up new terminal emulators as necessary rather than running inside a single terminal emulator and taking over the whole thing for each command (and allowing applications to get it into a bad state). If you want to run less on the output of one command or watch some thing change while still running other commands you either need to open a new second terminal emulator and go to the same place as the first or you need to use something like tmux (and either have some janky hack or be very careful about when you run tmux if you want to duplicate shells on a remote box for example).
I also think it’s not so useful anymore to just set up a pipeline and let it run off and do its work. Interactive shell usage is often mostly about editing or extending a small suffix of the previous command, so I think a shell should be optimised to do that well.
All the issues are just matters of backwards compatibility at this point. As far as I'm concerned, we'll always need a text based REPL oriented interface.
> Aside: I’d also like to see a command line shell which runs outside the terminal emulator rather than inside it.
So how would that work over something like ssh or even a serial terminal? And you'd have a shell containing a lot of very platform specific code. That really breaks everything. I'd prefer to see well thought out evolutions to the ANSI escape sequence standards that solve real problems. The proposal on the original post does that for keys. Other things like bracketed paste and support for more than a few colours have seen some adoption. Some good ideas get less attention like having a stack for titlebar changes. Some things that weren't a great idea security-wise have been dropped (key redefinitions, retrieving the title).
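Bracketed paste is a nice small example of such an evolution. The application opts in with one escape sequence, and the terminal then wraps pasted text in start and end markers so it can be told apart from typed input. A minimal Python sketch, where `split_paste` is an illustrative helper and not part of any library:

```python
ENABLE = "\x1b[?2004h"     # application asks the terminal to bracket pastes
DISABLE = "\x1b[?2004l"    # application turns the mode off again
PASTE_START = "\x1b[200~"  # terminal sends this before pasted text
PASTE_END = "\x1b[201~"    # terminal sends this after pasted text


def split_paste(data: str) -> tuple[str, str, str]:
    """Split input into (typed before, pasted, typed after).

    If no complete bracketed paste is present, everything counts as typed.
    """
    start = data.find(PASTE_START)
    if start == -1:
        return data, "", ""
    body = start + len(PASTE_START)
    end = data.find(PASTE_END, body)
    if end == -1:
        return data, "", ""
    return data[:start], data[body:end], data[end + len(PASTE_END):]


print(split_paste("ls \x1b[200~rm -rf /\n\x1b[201~x"))
```

With the pasted part isolated, a shell can insert it literally and avoid executing an embedded newline as if the user had pressed Return.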
One of the problems is that the concept behind termcap and terminfo doesn't really scale or allow for innovations. As a user of rxvt-unicode I often suffer the frustration of it being unrecognised on bare OS installs. But I respect the fact that it doesn't just claim to be xterm while emulating it imperfectly like many.
Well no one uses a serial terminal for much so that can be disregarded. For ssh, you can just stop pretending that it is just like any other program and have actual integration with it (if you still care about serial terminals, just make the mechanism that understands ssh generic). On the remote host you either need your shell executable to run in some “remote” mode like rsync does or you need your shell to be able to generate appropriate bash commands.
Take eshell for an example: it connects processes together in emacs and can natively support navigating to places on remote hosts, opening files there, or running (remote) shell commands.
I sympathise with you on the pain of terminfo+ssh where the remote host doesn’t have the appropriate files. But I don’t think that further piling on “standards” to terminal escape sequences is a long-term solution to terminal woes.
You can also learn to use the terminal. This can clear many of the problems that you would otherwise have by thinking that the terminal is just another windows/macos application.
I don’t know how to respond to this as you seem to just think I’m an idiot who doesn’t know how to use computers.
The problem with using a terminal is that control of it is distributed between a few things the user can control and the escape sequences (or just output) produced by the shell and any processes that run in it. These may end up conflicting with each other leaving your terminal in a bad state or you may just get broken output (ever tried piping pv something | ... | less?). One way to regain some control is with something like tmux but this can become unwieldy (and heaven forfend it sees a multibyte Unicode character—terminal emulators don’t really have a way to communicate with applications about how wide a character is going to be when drawn)
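To illustrate the width problem, here is a rough Python sketch of the kind of guess an application has to make on its own. It uses the standard `unicodedata` module; the rules (2 columns for East Asian wide or fullwidth characters, 0 for combining marks, 1 otherwise) are a simplification, and nothing guarantees the terminal emulator on the other end agrees.

```python
import unicodedata


def guess_width(text: str) -> int:
    """Estimate how many terminal columns text occupies (a heuristic only)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn on top of the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


print(guess_width("abc"), guess_width("日本"), guess_width("e\u0301"))
```

When the application's guess and the emulator's rendering differ, the cursor ends up in the wrong column, which is exactly the kind of breakage tmux and friends run into.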
I think a few things get conflated because there are few text-centric or command-line user interfaces. I put it to you that it is possible to have good composable text-centric user interfaces that don’t rely on pretending to be a VT100 or an ancient single-byte-stream-with-control-sequences protocol.
The problem with GUIs (I assume you mean them by "actual user interfaces") is that they rely too much on the user's memory, and that they do not scale, especially for cloud systems.
I have many text notes which record various commands. I have them because it is so easy -- once you figured out how to do something, get the history and copy relevant stuff. At work, we share the commands on Slack, put them into Wiki, paste them in the docs, and so on. If the commands grow too complex, they turn into scripts -- after all, scripts are just a file rename away.
I don't think I could do this efficiently if I had to use GUIs. Theoretically, I could write manuals with dozens of steps[0], but this is a much more significant effort, and they'll likely go stale anyway.
You are confusing command line and terminal here, I believe. Note how OP was specifically using emacs as an example. From your "store and share commands" point of view so called TUIs (like emacs in terminal) are as opaque as GUIs (emacs in X11).
Note that nothing stops a GUI program from also supporting programmability and scripting. While it's true that most of them don't, you'll also find that a surprising number do.
For example, the entire MS Office Suite is programmable - you can write VB Script (or lately, JS?) and achieve most if not all of the functionality supported in the GUI.
Rather more well known on this front, Emacs and many other Lisp systems have always supported the same kind of programmability as a shell from within the GUI environment - both in the form of a simple REPL and more advanced GUI-command interaction (e.g. executing the current selection as elisp code, executing a command with the current selection as input etc).
I'd also really like to just be able to interact with a keyboard like other apps i.e. receive keyup events, reading multiple keys at once etc. Kitty has something like that but popular libraries don't support it. I've had many ideas for terminals games and apps but didn't continue because of these limitations.
The conflation of various control characters, i.e. `Ctrl-i` and `Ctrl-I`, was one of the primary reasons that convinced me to try using Emacs outside of the terminal. I'm glad I switched. Now I can make much more ergonomic keybindings.
Indeed, terminals have to be fixed. Even DOS/Windows text mode (I don't mean the shell) always felt way better than a Linux terminal.
Perhaps it finally is time to come up with an entirely new terminal technology re-engineered from [almost] scratch based on today's achievements and needs.
For a while around the year 2000, people felt like terminals were going to be abandoned in favor of GUIs, so changing them didn't feel reasonable. But now it's clear that was wrong.
> Perhaps it finally is time to come up with an entirely new terminal technology re-engineered from [almost] scratch based on today's achievements and needs.
What do you mean, exactly, by that? Do you think that programs designed to work in current terminals (e.g., vim) should stop working with the new terminal technology? That seems a bit tough for progressive adoption. Otherwise, how do you think that these programs should be changed? Or wrapped in some intermediary "terminal emulator" so that they can run unchanged in the newer terminals?
Also, the new terminal technology doesn't necessarily have to be fundamentally incompatible with the legacy technology. It probably can be made reasonably easy to compile classic apps for any terminal technology. Many of them are compatible (although not necessarily 100% compatible) with both Linux and DOS/Windows already.
How does this interact with alternative and personalised keyboard layouts and with sticky keys for accessibility?
I don't have a proper overview of how these things work, but I'm a bit worried that there may be multiple layers (reprogrammable keyboard, kernel, X server, terminal, editor) trying to solve similar problems in similar ways and they might end up interfering with each other.
I have a few mobile devices with limited keyboards: smartphones with a keyboard, a Chromebook. They lack some keys present on a standard PC keyboard. Will this new input library work with virtual or limited keyboards? How will I be able to input a missing key or control combination?
That doesn't work in a terminal. Terminals can operate over many different channels which don't pass through raw keyboard events (like SSH), or which don't have them at all (like serial links).
The first paragraphs of https://clojure.org/community/etiquette nail it: the software community is about making. If you want something done, make it.
What would a "maker-driven" solution look like in this domain? For example:
- make a terminal emulator that accepts the devised keybinding system
- make at least one popular program (e.g. Emacs) play fine with said terminal
- promote that combination, showing the world how the technology can in fact work. Word of mouth would do the rest.