> Well, because it’s gosh-darn hard to do it the right way.
I think this overstates the difficulty. This of course depends a lot on the language, but for a reasonable one (not C++) you can just go and write the parser by hand. I’d ballpark this as three weeks, if you know how the thing is supposed to work.
> it doesn’t have to redo the whole thing on every keypress.
This is probably what makes the task seem harder than it is. Incremental parsing is nice, but not mandatory. rust-analyzer and most IntelliJ parsers re-parse the whole file on every change (IJ does incremental lexing, which is simple).
> The reason (most) LSP servers don’t offer syntax highlighting is because of the drag on performance.
I am surprised to hear that. We never had performance problems with highlighting on the server in rust-analyzer. I remember that for Emacs specifically there were client side problems with parsing LSP JSON.
> Every keystroke you type must be sent to the server, processed, a partial tree returned, and your syntax highlighting updated.
That’s not the bottleneck for syntax highlighting, typechecking is (and it’s typechecking that makes highlighting especially interesting).
In general, my perception of what's going on with proper parsing in the industry is a bit different. I'd say the status quo from five years back boiled down to people just being accustomed to the way things were done. Compiler authors generally didn't think about syntax highlighting or completions, and editors generally didn't want to do the parsing stuff. JetBrains were the exception, as they just did the thing. In this sense, LSP was a much-needed stimulus to just start doing things properly. People were building rich IDE experiences before LSP just fine (see dart analyzer); it's just that relatively few languages saw it as an important problem to solve at all.
I don't think you can write a production quality parser for any "real" language in 3 weeks ... You can get something working in 3 weeks, but then you'll be adding features and fixing bugs for a year or more.
If you take something like Python or JavaScript, the basics are simple, but there are all sorts of features like splatting, decorators, a very rich function argument syntax, etc., and subtle cases to debug, like the rules of what's allowed on the LHS of an assignment. JavaScript has embedded regexes, and now both languages have template strings, etc. It's a huge job.
It's not necessarily hard, but it takes a lot of work, and you will absolutely learn a lot of stuff about the language long after 3 weeks. I've programmed in Python for 18 years and still learned more about the language from just working with the parser, not even implementing it!
And this doesn't even count error recovery / dealing with broken code ...
I don't see what is challenging about any of what you mentioned; furthermore, parsing a language is not the same thing as verifying that what is parsed is semantically valid. Python is almost a context-free language, with the exception of how it handles indentation. With indentation taken into account, the entire language can be parsed directly from the following grammar using something like yacc:
JavaScript is not strictly context-free either, but like Python the vast majority of it is, and the parts that are not context-free can be worked around. Furthermore, the entire grammar is available here:
It isn't trivial to work around the parts that aren't context-free, but it's also nothing insurmountable that requires more than 120 hours of effort. The document explicitly points out which grammar rules are not context-free and gives an algorithm that can be used as an alternative.
Parsing is really not as challenging a job as a lot of people make it out to be, and it's an interesting exercise to try yourself to get an intuitive feel for. You can use a compiler-compiler (like yacc) if you feel like it, just to get something up and running, but the downside of such tools is that they do very poorly with error handling. Rolling your own hand-written parser gives much better error messages and really is nothing that crazy. C++ is the only mainstream language I can think of with a grammar so unbelievably complex that it would require a team of people working for years to implement properly (and in fact none of the major compilers implement a proper C++ parser).
For statically typed languages things get harder because you first need to parse an AST, and then perform semantic analysis on it, but if all you need is syntax highlighting, you can skip over the semantic analysis.
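To make "just go and write the parser by hand" concrete, here is a minimal sketch of the recursive-descent style being described. It is a toy expression grammar in Rust, not any real language, but the shape is the same one a hand-written parser for Python or JavaScript takes: each grammar rule becomes one small function, just with many more rules.

    // Toy grammar: expr := term ('+' term)* ; term := atom ('*' atom)* ;
    // atom := number | '(' expr ')'. Tokens are pre-split on whitespace so
    // the lexer stays out of the picture.
    #[derive(Debug)]
    enum Expr {
        Num(i64),
        Add(Box<Expr>, Box<Expr>),
        Mul(Box<Expr>, Box<Expr>),
    }

    struct Parser<'a> {
        tokens: Vec<&'a str>,
        pos: usize,
    }

    impl<'a> Parser<'a> {
        fn peek(&self) -> Option<&'a str> {
            self.tokens.get(self.pos).copied()
        }

        fn bump(&mut self) -> Option<&'a str> {
            let t = self.peek();
            self.pos += 1;
            t
        }

        fn expr(&mut self) -> Result<Expr, String> {
            let mut lhs = self.term()?;
            while self.peek() == Some("+") {
                self.bump();
                lhs = Expr::Add(Box::new(lhs), Box::new(self.term()?));
            }
            Ok(lhs)
        }

        fn term(&mut self) -> Result<Expr, String> {
            let mut lhs = self.atom()?;
            while self.peek() == Some("*") {
                self.bump();
                lhs = Expr::Mul(Box::new(lhs), Box::new(self.atom()?));
            }
            Ok(lhs)
        }

        fn atom(&mut self) -> Result<Expr, String> {
            match self.bump() {
                Some("(") => {
                    let e = self.expr()?;
                    match self.bump() {
                        Some(")") => Ok(e),
                        other => Err(format!("expected ')', got {:?}", other)),
                    }
                }
                Some(t) => t.parse().map(Expr::Num).map_err(|_| format!("unexpected token {:?}", t)),
                None => Err("unexpected end of input".into()),
            }
        }
    }

    fn main() {
        let tokens: Vec<&str> = "1 + 2 * ( 3 + 4 )".split_whitespace().collect();
        let mut p = Parser { tokens, pos: 0 };
        println!("{:?}", p.expr());
    }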
> but if all you need is syntax highlighting, you can skip over the semantic analysis.
I wish we could move toward semantics highlighting.
I will chime in with you, though, and agree: as a writer and teacher of parsers, it doesn't have to be that hard. In fact, if you implement your parser as a PEG, it really doesn't have to be much longer than the input to a parser generator like YACC. Parser combinators strongly resemble EBNF notation; it's almost a direct translation. That's why parser generators are possible to write in the first place. But in my opinion they are wholly unnecessary, since the grammar itself is really all you need if you've designed your grammar correctly. Just by expressing the grammar you're 90% of the way to implementing it.
The thing is, for IDE purposes “production ready” has a different definition. The thing shouldn’t have 100% parity with the compiler, it should be close enough to be useful, and it must be resilient. This is definitely not trivial, but is totally within the reach of a single person.
> And this doesn't even count error recovery / dealing with broken code ...
With a hand-written parser, you mostly get error resilience for free. In rust-analyzer's parser, there's very little code which explicitly deals with recovery. The trick is, during recursive descent, to just not bail on the first error.
Those are some very nice insights, thanks for sharing them! Can you recommend a good resource on writing a parser by hand that doesn't bail on the first error? Or would you instead suggest studying the source code for e.g. the rust-analyzer parser?
In that case, they're parsing a Haskell-like language and can use indentation as a guide for how far to skip ahead.
In a C-like language, I'd imagine you'd use braces or semicolons to decide how far to skip ahead: the error bubbles up to a parser that knows how to recover (say, a statement or function body), which scans ahead to where it thinks its node ends and returns an error node, allowing the parent to continue.
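A rough sketch of that strategy (illustrative only, not rust-analyzer's actual code): when a statement parser hits an unexpected token, it records the error, skips ahead to a synchronization token such as ';', returns an error node, and lets the enclosing block carry on.

    // Grammar for the sketch: block := stmt* ; stmt := 'let' IDENT ';'
    #[derive(Debug)]
    enum Stmt {
        Let(String),
        Error,
    }

    struct Parser {
        tokens: Vec<String>,
        pos: usize,
        errors: Vec<String>,
    }

    impl Parser {
        fn peek(&self) -> Option<&str> {
            self.tokens.get(self.pos).map(|s| s.as_str())
        }

        fn block(&mut self) -> Vec<Stmt> {
            let mut stmts = Vec::new();
            while self.peek().is_some() {
                stmts.push(self.stmt());
            }
            stmts
        }

        fn stmt(&mut self) -> Stmt {
            if self.peek() == Some("let") {
                self.pos += 1;
                if let Some(name) = self.peek().map(String::from) {
                    self.pos += 1;
                    if self.peek() == Some(";") {
                        self.pos += 1;
                        return Stmt::Let(name);
                    }
                }
            }
            // Don't give up: record the error, skip to the next ';' (a likely
            // statement boundary), and emit an error node so the enclosing
            // block keeps parsing.
            self.errors.push(format!("syntax error near token {}", self.pos));
            while let Some(tok) = self.peek() {
                let at_boundary = tok == ";";
                self.pos += 1;
                if at_boundary {
                    break;
                }
            }
            Stmt::Error
        }
    }

    fn main() {
        let src = "let a ; let @ oops ; let b ;";
        let mut p = Parser {
            tokens: src.split_whitespace().map(String::from).collect(),
            pos: 0,
            errors: Vec::new(),
        };
        let stmts = p.block();
        // The broken middle statement becomes an Error node; `a` and `b`
        // still parse, which is what keeps highlighting and completion alive.
        println!("{:?}", stmts);
        println!("errors: {:?}", p.errors);
    }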
> I remember that for Emacs specifically there were client side problems with parsing LSP JSON.
I am given to understand that this is not a problem any more (since Emacs 27.1). Before that, the JSON parser was written in elisp, which is a slow language (though somewhat mitigated by the recent native-compilation work). Emacs now prefers to use native bindings (jansson), and afaik this has solved most of the performance grievances raised by LSP clients.
> I think this overstates the difficulty. This of course depends a lot on the language, but for a reasonable one (not C++) you can just go and write the parser by hand.
I don't agree. Newer languages are all being designed with the constraint that the grammar should be easy to parse and not require indefinite lookahead and full compilation to get back on track after an error.
That's a big change from the C/C++ heritage.
It's no coincidence that "modern" languages (call it the last 10 or so years) tend to have things like explicit declaration keywords (let-statement-like) and delimiters between variable name and type, for example.
> Newer languages are all being designed with the constraint that the grammar should be easy to parse
I think that says less about the difficulty of parsing and more that language designers have realised that 'easy to parse' is not incompatible with good readability and terse syntax. In fact, the two go hand in hand: languages that are easy for computers to understand are often easy for users to understand too.
This has nothing to do with old or new, and everything to do with both C and C++ being serious aberrations in programming language design. Most languages not directly influenced by C (new or old) simply don't have these bizarre issues. Also, a lot of languages are becoming significantly harder to parse as time goes on (Python, for example).
> Most languages not directly influenced by C (new or old) simply don't have these bizarre issues
I don't agree. Lisp is "easy" to parse, but difficult to add structure to. Tcl similarly. Typeless languages are now out of favor--everybody wants to be able to add types.
Perl is a nightmare and probably undecidable. Satan help you if you miss a period in COBOL, because God sure won't. FORTRAN is famous for its DO loop construct that would hose you.
About the only language that wasn't hot garbage to parse was Pascal. And I seem to recall that was intentional.
> I don't agree. Lisp is "easy" to parse, but difficult to add structure to.
I have no idea what you mean by this, or how you think it relates to your original claim that having languages with a less terrible grammar than C++ or even C is some recent development.
> Perl is a nightmare
And it's pretty clearly C-inspired, even if it added lots of new syntactic horrors of its own invention. Also, it's from the late '80s, not the early '70s, so it's hardly a poster case for languages becoming grammatically saner.
> About the only language that wasn't hot garbage to parse was Pascal.
In addition to Pascal and Lisp, which you already mentioned, Algol, Prolog, APL, and Smalltalk are all famous languages from around the same time as C or significantly older, and none of them are "hot garbage to parse". Neither are important '80s languages like PostScript or SML. In fact, the only significant extant '70s language I can think of off the top of my head that is syntactically noticeably more deranged than C, and maybe even C++, is TeX.
> And I seem to recall that was intentional.
Well yes, why would anyone create a language that's really hard to parse for no discernible benefit? This is not the counterintuitive recent insight you make it out to be. If anything, the trend is for popular languages to become harder to parse -- none of the significant languages from the 2000s (like Swift, Julia or Rust) are anywhere near as easy to parse as the languages I listed above.
Readers, please don't accept anything anyone writes about "FORTRAN", unless in a historical context. They probably last encountered the leading edge of the language 40 years ago.
Python is not typeless; it is strongly typed. Each value has one, precisely known type. Names may refer to values of different types, which is the "dynamic" part of Python's typing.
Javascript is weakly typed and most of its mayhem comes from there.
This is me being obtuse, but it seems like an appropriate time to ask... What is the difference? You mention that each value has one known type in a strongly typed language. Isn't this the case for Javascript as well? I'm having a difficult time trying to conjure a situation in JS where a value has multiple types (but I'm certainly no expert in JS).
It's a bit of a mixed bag and the terminology is difficult to grasp. I'd say Tcl and Bash are languages that only have strings ('stringly typed') that can be interpreted in a number of ways. JavaScript, PHP, and SQLite's SQL, OTOH, have lots of implicit type coercion: `1` and `'1'` can both act as a string, a number, or a boolean.
Python is considerably more picky about what it allows in which constructs; it does all the numerical typing stuff implicitly (so switching between integer, big int, and float happens most of the time without users even knowing about it), and b/c of backwards compatibility, coercions between numbers and booleans still happen (`True + 1` is still `2` in Python 3.9). By extension, this includes empty lists, strings, and dictionaries evaluating to `False` in appropriate contexts.
I believe that, in retrospect, most of these efforts to come up with a Just Works arrangement of strategically placed implicit coercions, which so much defined the rise of scripting languages in the 90s, are questionable. Subsequently, many millions of USD went into making JavaScript faster than reasonable (in V8), given its pervasively dynamic nature. Just bolting down types and banning any implicit coercions would have given it similar performance gains with a fraction of the effort and a fraction of the resulting complexity. Backward compatibility could have been handled with a pragma similar to the existing `'use strict'`.
I guess what I want to say is that strong and loose typing exists on a continuum much like human languages are never just of a single idealized kind.
I had an interesting experience making a simple "Wiki" style editor for a web app back around 2008 or so. To my surprise, even an ANTLR-generated JavaScript parser could easily handle keystroke-by-keystroke parsing and fully updating the entire DOM in real time, up to about 20-30KB of text. After 60KB the performance would drop visibly, but it was still acceptable.
A hand-tuned Rust parser on a 2021 machine? I can imagine it handling hundreds of kilobytes without major issues.
Still, there's some "performance tuning itch" that this doesn't quite scratch. I can't get past the notion that this kind of thing ought to be done incrementally, even when the practical evidence says that it's not worth it.
> This is probably what makes the task seem harder than it is. Incremental parsing is nice, but not mandatory. rust-analyzer and most IntelliJ parsers re-parse the whole file on every change (IJ does incremental lexing, which is simple).
Glances at the memory usage of Goland in a moderately sized project and weeps
> I think this overstates the difficulty. This of course depends a lot on the language, but for a reasonable one (not C++) you can just go and write the parser by hand. I’d ballpark this as three weeks, if you know how the thing is supposed to work.
Having a parser which generates an AST is just the first step. Then, you actually need to implement all the rules of the language, so for instance the scoping rules, how the object system works, any other built-in compound/aggregate types, other constructs like decorators, generics, namespaces or module systems, and on and on and on. Depending on the language, this will usually be the main work.
And then of course there's dynamic typing - if you want to enable smart completions for a dynamically typed language, you need to implement some kind of type deduction. This alone can take a lot of time to implement.
If you want syntax highlighting, the AST is enough to generate pretty colours for the source code. If you want semantic highlighting… sure, that's another story entirely. And even then you don't necessarily have to do as much work as the compiler itself.
And don't even try to be smart with dynamically typed languages; it cannot possibly be reliable, short of actually executing the program. If your programs are short enough you won't need it, and if you do need such static analysis… consider switching to a statically typed language instead.
Yeah, to clarify, memoization happens after parsing. So for syntax highlighting we have a situation where from-scratch parsing is faster than incremental typechecking.
I was under the impression that rust-analyzer (and more generally LSP) provides augmentative (contextual) syntax highlighting, whereas most of the highlighting still comes from editor-specific configuration. Is this not the case? If so I would be thrilled; as someone authoring a custom language right now it has been very frustrating to not be able to provide a single source of syntax highlighting for all popular editors.
rust-analyzer highlights everything; I have an empty VS Code theme for it somewhere. But yeah, in general LSP highlighting is specified in an augmentative way.
Before this conversation gets taken over by talk about language servers: as the article points out, tree-sitter tends to need to be a bit closer to the editing environment to be effective.
There’s still work to do, but having tree sitter in neovim feels like a great step forward.
Yes, it's more for syntax highlighting where you don't want the lag of an external server and don't need the deep language analysis needed for diagnostics, refactoring, etc. I'm not sure what other use cases it would be superior to LSP for, but I'm sure there are some.
Thanks for the article. Even though I'm not active on the development side in either editor, I love the idea that people are toiling away on these same sorts of enhancements in both environments (and I get the benefit in neovim).
Heh. A long time ago I wrote a video game[1] somewhat similar to Williams Defender, and casting about for some sort of "theme" for the game, I hit upon the "editor wars", the ancient storied battle between vi and emacs. You are ostensibly "vi", (a little spaceship vaguely reminiscent of the Vipers from Battlestar Galactica) cruising through system memory, evading system processes, GDB instances, etc trying to recover your ".swp" files. How to represent Emacs? Obviously, via a giant blimp! and I could display all sorts of messages on the side of the blimp, singing the praises of Emacs, and disparaging fans of vi. And the Emacs blimp had a "memory leak", which meant that pieces of the xemacs source code would literally leak out of the back end of the blimp, with the letters floating lazily away, like smoke. So that meant I had to take a look at the xemacs source, dig through it and try to find some funny bits to put in. Of course, "semantic bovinate" jumped out at me.[2]
I suppose that these days I am one of the few professional programmers who has an active dislike of syntax highlighting. I find it immensely distracting. The only stuff I allow the highlighter to touch are my comments (I turn them bold) and I consider this a somewhat frivolous indulgence.
To each their own, and fortunately most (all?) editors allow such features to be turned off.
On the other hand, I find the "frivolous indulgence" perspective extremely obnoxious along with the related implication of moral or technical superiority of not using syntax highlighting.
Sometimes I wonder if those who don't prefer it might have some synesthesia which might allow their brain hardware to provide what the syntax highlighting does for the rest of us.
While I'm aware that OP and I are in a minority, there is a cognitive overhead to having that information surfaced at all times when you may be trying to focus on something at, e.g., the method level rather than the individual-syntactic-element level, and if that cognitive overhead exceeds the utility of having that information available, the sensible answer is to turn the highlighting off.
If I could have some sort of "focus follows mind", where highlighting automatically happens commensurate to whatever level of granularity I'm currently thinking about the code at, I would be extremely interested; but absent that, it's a trade-off that everybody has to make for themselves.
Some people prefer to highlight almost everything, some almost nothing, some people find it helpful for some languages/tasks but not for others.
It's similar IME to the extent to which preferred debugging styles (printf versus interactive versus hybrid versus situational choices) are also something people have to figure out, and, well, different people are different, and that's neither a bad thing nor an avoidable thing.
I wonder if perhaps this is also a generational thing. Programmers from before syntax highlighting became popular would be less likely to prefer it, no? I’m not even sure if programmers from current/recent generations ever prefer not to have syntax highlighting, but I’m genuinely curious if there are such people out there.
I've known people who've started without synhi but now wouldn't leave home without it, and people who started with rich synhi everywhere and now avoid it except for the first couple of months of learning a new language, so it's certainly not just generational.
I would not at all be surprised if people who started off one way or the other (for generational reasons, or indeed any 'whatever environment they were first introduced to' style reasons) are less likely to end up switching, but that probably says more about perceived switching costs than about what would actually be most comfortable for somebody.
e.g. I know people who took a month to be comfortable without synhi but then loved it, and I've spent weeks trying to be more comfortable -with- it and given up, and honestly anything that half screws your productivity for over a week is going to be a hard sell even if the end result -would- be better (waves in "also, still can't manage to drive emacs" ;)
I grew up with fully fledged syntax highlighting, but I still prefer to use really minimal themes as I find them to reduce cognitive overhead and eye strain.
It was hard for the first few hours, but then I eventually got used to it, and now I can't use anything else.
I know this is not quite as extreme as working without syntax highlighting :)
I started before syntax highlighting became widely used (in the Emacs 18 era), but was super-excited for syntax highlighting when it became available (Emacs 19 and XEmacs), and probably went overboard with it. These days, I prefer minimal syntax highlighting.
Where the cursor is and where my mental focus is don't necessarily match, and it's really the granularity problem (expression level versus statement versus block, etc.) that causes the 'mismatch between highlighting and focus' for me, at least.
It certainly sounds like an experiment that would be interesting to try, though.
The way I see it, most syntax highlighting is actively adding mostly irrelevant information to the cognitive load of programming: stuff that should be obvious if you know the language. It does as little for my understanding as a novel in which, say, every proper noun was printed in red.
I can imagine more useful highlighting than color coding the types of the symbols encountered. Lighting up the active scopes. Giving the same hue to names that look like each other. There are probably highlighters out there that do that. But "simple" syntax highlighting is still the norm.
The same argument could be made for seeing anything in colour:
> most colours are actively adding irrelevant information to the cognitive load of existing. It should be obvious that apples are red and the sky is blue.
That’s silly, because it does add relevant information. Obviously it’s a spectrum - too many colours can hide information, but when used appropriately it’s fine.
Also everyone is different. Perhaps your brain gets distracted by the colours more than the majority of people.
As you had guessed a little later, there are a few different emacs packages that do this. One of them is "rainbow parentheses" that gives every bracket a different colour (remember that emacs supports lisp, so differentiating between lots of different parentheses is arguably more useful in emacs than any other editor). [0].
Another one is highlight parentheses [1] which highlights all parens that enclose the cursor position, and gives a darker colour to those "further away" from the cursor.
While I don't fully disable syntax highlighting, I use a minimal theme [0,1] that only has highlighting for comments, strings and globals. It reduces eye strain for me, and I never find myself relying on highlighting to navigate through code.
LSPs provide an "outline" which can be very useful to navigate through code. I find "jump to symbol" function in my text editor to be faster than scanning all of the code to find the line.
Also most themes dim the comments, but IMO if something in the code needed an explanation, it should be brighter, not dimmer.
A few years ago I switched my color theme to something very simple, just as an experiment.
Somehow I never found a need to change that. I highlight comments, keywords, and strings. Comment and string highlights are helpful if they contain code-like text, to make them obviously not-code. Keywords give some structure to the text.
Everything else is frivolous to me. Books do not highlight verbs in green, either.
While I will not argue with your general point -- I also don't really need highlighting and I read a lot of plaintext code -- I wonder about this.
Would this make languages easier for non-native speakers? Would it improve comprehension?
It's funny that the industry spends so much time on syntax highlighting for programming languages, when humanity's written languages are arguably more complex and difficult to parse and master.
> Would this make languages easier for non-native speakers? Would it improve comprehension?
When I've been trying to learn languages, I can typically part-of-speech tag unknown words quite easily (common prefixes/suffixes/word length/sentence position give lots of information – and some of this is shared across languages as well). The comprehension difficulty is nearly always due to content words I haven't seen before (or have forgotten).
Not to bikeshed on this, but I have a pretty strong preference for minimalist syntax highlighting. I'm currently using tao-themes in emacs: matching light and dark themes that are grayscale or sepia-toned and mostly use character properties like bold or italic along with a few shades of gray. Much more calming than the usual "angry fruit salad on black" programmer themes, but also providing more intuitive information than no syntax highlighting.
I feel the same way.
Never understood what the point of highlighting certain keywords or if something is a type or a function would be, it's all obvious from the grammar and where things are positioned anyway. And When I read code I want to read all of it, not draw any particular attention to "if" or "else".
Keyword highlighting is explicitly called out as an antipattern in the book Human Factors and Typography for More Readable Programs, which I highly recommend.
I would normally respond that, as others have pointed out, you're basically saying you prefer to "hide" information that, to most people, is relevant (is this a keyword, a global or local variable, a type, a method, a static function...). But I've noticed that when I'm doing code review using the shitty BitBucket interface, which shows everything red or green without any code highlighting, it actually helps me a little bit to focus on the changeset as opposed to what the code is actually doing in general. This is helpful because the changeset is what I care about when doing a review (what's different than before is the first question; understanding what the code is actually doing comes second)... Later, I might need to look at the code in my IDE with proper highlighting to better understand what the changed code is actually doing in more detail, but that's rarely needed (unless the changes are comprehensive).
So, it occurred to me that whether syntax highlighting is actually useful depends somewhat on the context: what are you trying to do?
I suppose it's easy to extend that realization to people who are different and might feel overloaded by information more easily, so I can sympathize with what you're saying (hope this doesn't sound condescending; I'm just trying to say that people can have very different cognitive overload levels, regardless of how capable they actually are in general).
Anyone interested in syntax highlighting should read the book Human Factors and Typography for More Readable Programs. The majority of the book is devoted to non-color techniques, but they do present some ideas for how to effectively use color near the end.
Much of syntax highlighting in the wild is junk, just distracting eye candy.
I am very fond of things akin to vim's 'showmatch' mode where when I close a paren or block or etc. the editor highlights the opening element for a second or so and then returns to baseline.
(I have almost no -ambient- highlighting in my baseline but I know lots of people who do and still derive great value from showmatch for the feedback - from discussions with other people rainbow parens style lisp modes seem to provide a maximum overkill approach to that question but I very much prefer maximum underkill in my own tooling even while wanting to be very sure I'm not making it unduly difficult for collaborators with opposite preferences)
There's no correct answer here, it's totally subjective.
I can sympathise with both sides; I like syntax highlighting when it's done well - when it's distracting I turn it off.
Seeing a keyword highlighted within a comment is an instant red flag - unfortunately it happens loads in Azure Data Studio (which I need to occasionally use).
Parentheses matching is a surprisingly non-trivial problem: simple counting of opening and closing parens isn't sufficient, given that quoted parens shouldn't be counted. For humans, ignoring quoted parens is maybe easier, but I would say it's a flex to assert that you can tell if (3,(g(f('('),x))) is balanced at a glance.
Even if you can, surely you're wasting time and/or focus on an automatable task.
But we all have something we do 'the hard way', because it feels like more effort to relearn the task than it's worth, or because we tried the easy way once and were put off by some side effect.
Paren highlighting never comes as a single unit; it's always packaged with other 'helpful' tools, some subset of which will always be infuriating to someone.
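For what it's worth, the machine's version of that check is tiny; the only real subtlety is the extra bit of state for "am I inside a string". A sketch (handling only single quotes, no escapes):

    fn is_balanced(src: &str) -> bool {
        let mut depth: i32 = 0;
        let mut in_string = false;
        for c in src.chars() {
            match c {
                '\'' => in_string = !in_string,
                '(' if !in_string => depth += 1,
                ')' if !in_string => {
                    depth -= 1;
                    if depth < 0 {
                        return false; // closing paren with no opener
                    }
                }
                _ => {}
            }
        }
        depth == 0 && !in_string
    }

    fn main() {
        // The example from the comment above: balanced, despite the quoted '('.
        assert!(is_balanced("(3,(g(f('('),x)))"));
        assert!(!is_balanced("(3,(g(f('('),x))"));
        println!("ok");
    }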
> For humans, ignoring quoted parens is maybe easier, but I would say it's a flex to assert that you can tell if (3,(g(f('('),x))) is balanced at a glance.
True, but mentally balancing parentheses is usually something that you do while writing the code: you push and pop a little stack in your head, and this becomes second nature.
Mentally verifying if parentheses are balanced while reading code is hardly ever required. You can usually safely assume that they are (unless that darn compiler tells you otherwise).
You're probably right; difficulties when writing are mainly due to tools 'helpfully' adding a ket as soon as I type a bra.
Maybe I just don't have that stack well enough built in my head -- if I'm editing in a plugin-free vim, I do find I have to backtrack and count to make sure I've put the right number of kets at the end of a nested expression.
If I used s-expressiony instead of tab-heavy languages more often, I'm sure I'd be better at it.
Wait until you try a fully featured "real" IDE. The features language servers provide are only some of the many things that IDE users have had for literally decades.
It's kind of hilarious that programmers, who learn again and again the value of decoupling and cohesion, fell so hard for the idea of an Integrated Development Environment. There's nothing about syntactic/semantic code analysis, to pick one example, that requires it to be packaged along with a particular text editor in a single big blob.
Ironically, the most successful IDEs today, the Jetbrains ones, are demonstrations of this. They are built out of reusable components that are combined to produce a wide range of IDEs for different languages.
LSP and DAP aren't perfect, but they're a huge step in the right direction. There's no reason people shouldn't be able to use the editor of their choice along with common tooling for different languages. The fact that IDEs had (for a while) better autocomplete, for example, than emacs wasn't because of some inherent advantage an IDE has over an editor. It's because the people that wrote the language analysis tools behind that autocomplete facility deliberately chose to package them in such a way that they could only be used with one blessed editor. It's great to see the fight back against that, and especially so to see Microsoft (of all people) embracing it with LSP, Roslyn, etc.
The technical design isn't the user experience. IDEs are an integrated user experience. It literally doesn't matter to the user how nicely decoupled everything is or isn't under the hood if the end results are indistinguishable.
One point in favor of tight integration and against LSPs is that editing programs isn't like editing unstructured text at all and shouldn't be presented as such. There are tons of ways in which the IDE UX can be enhanced using syntactic and semantic knowledge of programs. Having a limited and standardized interface between the UI and a module providing semantic information will just hamper such innovation.
The user experience is entirely determined by the technical design. The difference between emacs or vim and the editor built into say Visual Studio is enormous, and if a developer is prevented from using the former (if that's what they're comfortable with) alongside the language analysis capabilities of the latter, that has a huge impact on the user's experience.
It's true that if you own both the editor and the language analysis tools you can more rapidly add new capabilities, but many facilities that were historically the domain of IDEs, such as autocomplete, are very easy to standardise an interface for, and this has been done. Supporting such interfaces doesn't prevent you from also supporting nonstandard/internal interfaces for more cutting-edge capabilities. The argument made by Jetbrains is similar to the one you've made and it's entirely false. They could easily support LSP and it would have no impact on their ability to innovate. They refuse to do so for purely business reasons (as is their right).
For example, how do you use a feature like "Navigate to derived symbols" without the required UI integration (prompt for which derived symbol to go to, opening up the correct code location...)? How do you define an "Extract interface" or "Extract base class" refactoring (name of new class/interface, members to extract, abstract or not...)? There are tons of UX aspects to good code navigation and refactoring. In order to get the equivalent of the feature set of, say, Resharper into vi or emacs, you'd have to add tons of new UI stuff. And once you do that, you are back to pretty tight coupling.
The necessary UI capabilities for the features you describe already exist in emacs. Multiple alternative implementations of them, actually (lsp-mode vs eglot). It's the editor's job to provide the UI and the LSP server's job to provide the backend. The interface between them is easy to standardise and it has been done (yes, even for the features you mention).
Emacs can be described as an interactive, lisp-based environment for building textual UIs (TUIs). It's very easy to extend it with arbitrary, dynamic behaviours, including ones that need to collect information from the user.
As a trivial example, let's say for some reason I keep needing to generate the sha256 hash of a password and add it to the current file. I could add this to my .emacs:
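    ;; Illustrative sketch; the function name and key binding are arbitrary,
    ;; matching the description below.
    (defun my/insert-password-hash (password)
      "Prompt for PASSWORD and insert its SHA-256 hash at point."
      (interactive "sPassword: ")
      (insert (secure-hash 'sha256 password)))

    (global-set-key (kbd "C-c p h") #'my/insert-password-hash)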
Now if I hit Ctrl-C followed by p then h I will be prompted for a password in the minibuffer and the hash of the string I provide will be added where my cursor is. I didn't need to write any GUI code.
This kind of user interaction can equally easily allow the user to select from lists of dynamically determined options, including very large ones (with nice fuzzy matching menus if you use ivy, helm or similar). It's also trivial to write functions that prompt for several pieces of information, ask for different information depending on context, etc.
In the case of LSP, the server only has to provide information about what options are available and what possible responses are permitted. It's easy for emacs to dynamically provide the corresponding UI.
Ok, so you're not pointing to an implementation of the feature that I asked about. Instead, you're asking me to implement it as a special case. That's backwards.
You're effectively confirming that there needs to be feature-specific integration code for each and every navigation/refactoring/... feature in the editor. Once you have that, you again have tight coupling.
Not at all. The server allows for discovery of code actions and their parameters and the UI is displayed in response, by code that knows nothing about particular actions. It goes something like this (won't be accurate in the details, check the lsp spec [1] under Code Action Request if you want chapter and verse):
User to emacs: I want to perform a code action.
Emacs to server: The selection is this. What code actions can you perform?
Server to emacs: I can perform actions called "Extract Method", "Extract Base Class", etc.
Emacs to user: Choose an action from this list
User to emacs: I'll do "Extract method"
Emacs to server: We're going with "Extract Method"; what info do you need?
Server to emacs: I need an item called "method name" which is a string and an item called "visibility", which is one of the following: "public", "private", "protected".
Emacs to user: Enter method name... Select visibility...
Emacs to server: Here are the parameter values.
Server: Here are the edits necessary to perform the code action
Emacs: Updates code.
No special per-action code is required. If you want to see an implementation have a look at lsp-mode[2].
I hope that makes it clear. I've spent more of my day explaining this than I should have now, so I'll leave it there.
Please forget "Extract Method" as an example. It's simply far too trivial. Imagine extracting a new superclass or interface from a class with 100 methods, where you need to pick the set of methods to extract. You can't do that with 100 y/n choices.
>Having a limited and standardized interface between the UI and a module providing semantic information will just hamper such innovation.
Maybe. That can certainly be a downside of standardisation in general. However, it doesn't necessarily follow in all cases, and this is, I think, one where it doesn't. The features LSP provides are stable - and have been standard across most editors/IDEs for quite some time. Implementing them once for N editors, rather than N times, is just far more approachable (and appealing) for language tooling developers.
It doesn't stop those developers (or anyone else) adding features beyond the LSP standard. But that means doing it in an editor-specific way. Which is no worse than where we were before anyway.
I think the primary reason why IDEs are generally better than maximally customised editors like vim, emacs, sublime or vscode and whatnot is pretty simply put: money.
People buy IDE -> money goes to improving the IDE -> IDE gets better
People download one of 6 competing open-source plugins -> a couple of people improve it a little -> 3 years pass, the author loses interest -> someone else reinvents the wheel, and there are now 7 competing open-source plugins, 3 of which are good but not maintained anymore.
Great features require time, I just don't see non-commercial work succeeding here.
That doesn't mean it's not possible to create fantastic commercial open-source standalone language tools; it's just not happening for some reason. Probably just because most businesses are still hesitant to open-source their core business?
"Commercial open-source" will always be oxymoronic to me, despite all
the kumbaya naysayers. There's a basic law of nature at work here which
US patent law and a fictional character ("if you're good at something,
never do it for free") understood. To a programmer, opening the source
essentially renders the product gratis, some custom integration work
notwithstanding.
Yeah I get that, and I think that's the primary reason it's difficult to do well.
I think there _are_ ways to do it right. For instance, open-sourcing a Windows application is not necessarily problematic if 99.9% of your user base has never ever compiled something from source. Heck, my father is the kind of person who doesn't know the difference between "Windows" and "gmail". He has purchased software for his business once, it would've made no difference to him whether it was open-source or not.
Despite my believing that it's possible, I can't really think of any examples other than Red Hat and Qt off the top of my head...
> open-sourcing a Windows application is not necessarily problematic if 99.9% of your user base has never ever compiled something from source.
What a fatuous remark. Suppose I publish the Coca-Cola recipe to Pepsi drinkers. I'm fairly sure the recipe will eventually get around to a home brewer who's sick of paying the Coca-Cola company for its product.
I'm old enough to have used IDEs, the issue is that my job involves dealing with multiple different languages and markup files. In turn, a general purpose editor with language servers just suits my workflow better.
For me, there is no IDE feature that can compete with the experience of editing in vim/neovim. When I use any other editor I just feel like I have a hand tied. The development of LSP and tree-sitter just makes the whole experience even better.
I'm not familiar with the state of the art for language servers, but here are common IntelliJ refactors I use across Go and TypeScript (and Java a while ago):
- Add a parameter to a method signature and fill it in with a default value.
- Reorder method parameters.
- Extract a block of code to a method that infers the correct input and output types.
The most advanced refactoring I've done with IntelliJ is structural replace for Java which can do something like: for every private method matching the name "*DoThing" defined on a class that extends Foo, insert a log line at the beginning of the method: https://www.jetbrains.com/help/idea/structural-search-and-re...
I make heavy use of the "integrated" aspect of IntelliJ. One of the nicer benefits is that SQL strings are type-checked against a database schema.
All of these are doable with LSP. I'm a big fan of the "LSP rename" action which will rename a particular semantic item (e.g. a method) across files, or the refactoring actions (e.g. change an "if let" into a "match" in Rust).
Everything is doable with LSP as it is an extensible RPC essentially.
But the above things are generally not done over LSP. It doesn't have first-class support for structural search-and-replace, and it doesn't have support for interactive refactors which require user input.
Do you have an example of a language server capable of structural refactoring of the type mentioned in the GP? The “semantic rename” is table stakes from ~20 years ago in IntelliJ and ReSharper, and even Eclipse.
Sure, but a lot of us don’t like that extra overhead. LSP is great in that it’s a tool you can tap into to use in a workflow that’s best for you. More of a library than an application (not technically, but in terms of how you use it).
> The features language servers provide are only some of the many things that IDE users have had for literally decades.
Yes of course, because that's what they were explicitly designed to do. The novel thing about language servers isn't that they enable code intelligence features like auto-complete and variable renaming. It's that they do so over a standard protocol that any editor or IDE (or website or CI system or ...) can use.
I'm a maintainer of a CLI HTTP client with a plain text file format, Hurl [1].
I would like to begin adding support for various IDEs (VS Code, IntelliJ), starting with syntax highlighting, but I'm having a hard time getting started.
I struggle with many "little" details, for instance: syntax errors should be exactly the same in the terminal and in the IDE. Should I reimplement exactly the same parsing, or should I reuse the CLI tool's parser? If I reuse it, how do I implement that given that, for instance, IntelliJ plugins are written in Java/Kotlin, while VS Code plugins are JavaScript/TypeScript, and Hurl is written in Rust...
Very hard to figure it all out when it's not your core domain.
If it's something simple (and it sounds like it is) then I would strongly recommend just making a single parser library that you use in both the language server and CLI. That's what I've done for my RPC format.
I used Nom. Even though it's not incremental, parsing is easily fast enough to just reparse the entire document on each change.
An alternative is to just use Tree Sitter as your parser for the CLI too. You won't use the incremental parsing feature in the CLI but that's fine.
Supporting IntelliJ may be tricky but there is a WIP plugin that adds LSP support.
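To make the shared-library suggestion concrete, here is a sketch of what such a crate could look like with nom (version 7 assumed; the request-line rule is invented for illustration and is not Hurl's actual grammar). The same functions can then back both the CLI's error reporting and a language server, so diagnostics stay identical in both places.

    use nom::{
        branch::alt,
        bytes::complete::{tag, take_while1},
        character::complete::space1,
        sequence::separated_pair,
        IResult,
    };

    // A made-up "request line", e.g. `GET https://example.org`.
    #[derive(Debug, PartialEq)]
    pub struct RequestLine<'a> {
        pub method: &'a str,
        pub url: &'a str,
    }

    fn method(input: &str) -> IResult<&str, &str> {
        alt((tag("GET"), tag("POST"), tag("PUT"), tag("DELETE")))(input)
    }

    fn url(input: &str) -> IResult<&str, &str> {
        take_while1(|c: char| !c.is_whitespace())(input)
    }

    pub fn request_line(input: &str) -> IResult<&str, RequestLine<'_>> {
        let (rest, (m, u)) = separated_pair(method, space1, url)(input)?;
        Ok((rest, RequestLine { method: m, url: u }))
    }

    fn main() {
        println!("{:?}", request_line("GET https://example.org"));
    }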
tree-sitter is a great framework. I have used it quite a bit in the past. I even created a small library on top of it, called tree-hugger (https://github.com/autosoft-dev/tree-hugger). Really enjoyed their playground as well.
> The reason (most) LSP servers don’t offer syntax highlighting is because of the drag on performance. Every keystroke you type must be sent to the server, processed, a partial tree returned, and your syntax highlighting updated. Repeat that up to 100 words per minute (or whatever your typing speed is) and you’re looking at a lot of cross-chatter that is just better suited for in-process communication.
While I agree... he might be surprised to know that that is what all language servers do anyway, even if they don't provide syntax highlighting. Every keystroke gets sent over the LSP. As JSON. It's amazing it works as well as it does.
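For the curious, this is roughly the shape of the textDocument/didChange notification a client sends after each edit when using incremental sync, per the LSP spec (the URI, version, and positions are made-up illustration values; serde_json is used here just to show the structure):

    use serde_json::json;

    fn main() {
        let keystroke = json!({
            "jsonrpc": "2.0",
            "method": "textDocument/didChange",
            "params": {
                "textDocument": { "uri": "file:///src/main.rs", "version": 42 },
                "contentChanges": [{
                    "range": {
                        "start": { "line": 10, "character": 4 },
                        "end": { "line": 10, "character": 4 }
                    },
                    "text": "x" // the single character just typed
                }]
            }
        });
        println!("{}", keystroke);
    }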