Lezer: A parsing system for CodeMirror, inspired by Tree-sitter

lewisjoe · on March 24, 2024

I'm always amazed by Marijn's brilliance (author of Lezer, CodeMirror and the awesome ProseMirror toolkit).

The level of depth he dives into to perfect his projects is insane. For example, Lezer is a parser generator (which by itself is no trivial feat, with novel ideas like incremental computation applied to parsing) built to power his main project, CodeMirror.

Likewise, ProseMirror has some insane levels of engineering underneath, married with thoughtful architectural decisions.

As you can tell, I'm a big fan of his work.

simonw · on March 24, 2024

He's also the author of a really great JavaScript book: https://eloquentjavascript.net/ - which he's been intermittently updating since 2007!

AlexErrant · on March 24, 2024

He's incredibly responsive too. He just answered my question about an hour after it was asked... on a Sunday evening.

Slightly off topic: is anyone aware of any Lezer grammar for regex? I've not been able to find one in the FOSS world. I suspect Regex101 has one, but it's sadly closed source.

lewisjoe · on March 24, 2024

Yeah, this happened to us too a couple of times. Once, we asked for a clarification and he took it as a sensible requirement, made the changes and took it to master in just over an evening.

The next day we just had to update our package to the latest version and marvel at his response time.

ananthakumaran · on March 25, 2024

Have you checked the js grammar? It has a regex grammar, though not sure if that's what you are looking for

AlexErrant · on March 25, 2024

Indeed ;) https://discuss.codemirror.net/t/help-with-regex-expression-...

RyanHamilton · on March 24, 2024

Massively agree. CodeMirror 5 was excellent. CodeMirror 6 was a big enough improvement to justify the upgrade. I've used it as part of 2 large projects and it's handled every expanding use case I've needed it to. It supports themes, SQL and JS, and 2 weeks ago I used its diff functionality. Really great library that I can fully recommend.

umvi · on March 25, 2024

CodeMirror 6 is super hard to use though, unless you are a front-end expert. CodeMirror 5 was basically plug and play. CodeMirror 6 is a box of Legos and you are responsible to build and bundle your own editor from scratch.

spankalee · on March 25, 2024

I have a project that's intended to make CodeMirror 6 as ready to use as any HTML tag: https://github.com/justinfagnani/codemirror-elements

You can put a basic code editor on your page like:

    <cm-editor></cm-editor>

and drop in themes and modes like:

    <cm-editor>
      <cm-lang-javascript typescript></cm-lang-javascript>
      <cm-theme-one-dark></cm-theme-one-dark>
    </cm-editor>

This works in plain HTML or any framework.

Turing_Machine · on March 25, 2024

Looking at this now. Thanks!

How about a dist folder with a plain js file (or a js file and a css file) that you can load directly?

I am sick unto death of Node and all its associates.

spankalee · on March 25, 2024

The npm package has the pre-built files: https://www.npmjs.com/package/codemirror-elements

I publish to npm because 1) Package managers are a fabulous idea. They're an easy way to download a package and its dependencies, and update them over time. 2) It's not a good idea to push build artifacts to your repo. 3) This depends on CodeMirror, which is published to npm. Otherwise I would have to provide it too, and that's too much work. 4) You don't need to be a Node developer to use npm.

kermire · on March 25, 2024

Not as extensive as the previous poster's, but I wrote a small custom element wrapper a while back: https://github.com/flawiddsouza/code-mirror-custom-element.

Turing_Machine · on March 25, 2024

> CodeMirror 6 is a box of Legos and you are responsible to build and bundle your own editor from scratch.

Yep. Let me stress that there's absolutely nothing wrong with this if that's what you want to do -- but it's less than optimal for the person who just wants to add a JavaScript file for the editor, with maybe a JS/CSS combo to support syntax highlighting for a specific language. CodeMirror 5 was like that, pretty much.

You basically have to set up a whole independent build and packaging system to configure CodeMirror 6, and a lot of people just don't want to deal with that.

It's a pity it's the only JavaScript-based editor that works at all reliably on mobile (I would be delighted to be proven wrong about this).

The_Colonel · on March 25, 2024

That's why I'm staying with CodeMirror 5 too.

It seems like over the past few years many high-profile JS projects have increased the complexity of using their software for (to me) unclear benefits.

Turing_Machine · on March 25, 2024

https://en.wikipedia.org/wiki/Second-system_effect

bpev · on March 24, 2024

I just used this recently for making Traindown syntax highlighting! It was pretty intuitive, even though I haven't done this kind of thing before!

https://github.com/inro-digital/lang-traindown

https://traindown.com/

I did notice that it seems (rightfully) very focused on the exact use case of syntax highlighting. Do people also use this kind of system for quickly building ways to parse data from arbitrary text syntaxes?

jitl · on March 24, 2024

On one hand, it’s a nice framework. I customized the TypeScript one a bunch for a lil side project and enjoyed myself. On the other hand, it would be great if CodeMirror could just work with Tree-sitter or similar. There’s a lot of ecosystem around other parsing systems, and needing to figure out Lezer stuff is a big source of friction for me in adopting CodeMirror 6. There are not a lot of language packages listed: https://codemirror.net/docs/community/

There’s a kind of importer thingy here but it doesn’t work well for complex grammars: https://github.com/lezer-parser/import-tree-sitter

AlexErrant · on March 24, 2024

> Unfortunately, tree-sitter is written in C, which is still awkward to run in the browser (and CodeMirror targets non-WASM browsers). It also generates very hefty grammar files because it makes the size/speed trade-off in a different way than a web system would.

I would be curious if there's been an effort to get tree-sitter working on the web.

gushogg-blake · on March 24, 2024

Tree-sitter does run on the web. I got it working for my editor, but it did involve several days' worth of effort and getting into the weeds with emscripten. Details here - https://gushogg-blake.com/p/emscripten-web-modules/.

lukan · on March 25, 2024

Can you explain why you did not use Lezer?

I just discovered Lezer and am tinkering with integrating it into my projects, but maybe Tree-sitter would be a better fit?

(Speed could be a determining factor, but working with WASM can also make things slower if you often have to move data in and out.)

gushogg-blake · on March 25, 2024

I didn't know about Lezer. I think I probably would have used it if I'd known - or at least tried it before Tree-sitter, as WASM obviously brings in extra complexity.

Having used Lezer and CodeMirror a bit on a different project, they both seem like they are probably high quality and well thought out projects from a speed and reliability perspective, but I found the architecture and docs to be confusing and unergonomic.

The main issue I have with the docs/architecture (for CM at least) is that they use a concept called "facets" without really explaining what it means, and to be honest it felt like a level of abstraction/indirection that my brain couldn't handle.

The other issue with the docs is that they don't seem contextualised enough, somehow. They would list the functions/methods of an object but not really explain how it fit into the system as a whole. Maybe this is to do with not understanding facets, or CM's extension architecture, enough, or something, but it was definitely a recurring theme of my experience. I also found the CM API slightly confusing in that (almost) everything you do is via a function that takes some state object as input, as opposed to via calling methods on that object, but that's more of a style issue once you realise that you have to `import {doSomething} from "codemirror"` and `doSomething(state)` as opposed to `state.doSomething()`.
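To illustrate that last style point, here is a minimal TypeScript sketch contrasting the two call styles. Everything in it (`DocState`, `MethodStyleState`, `lineCount`) is a hypothetical stand-in invented for illustration, not CodeMirror's actual API.

```typescript
// Hypothetical, simplified state shape; NOT CodeMirror's real EditorState.
interface DocState {
  readonly text: string;
}

// Method style: the behaviour lives on the object itself.
class MethodStyleState implements DocState {
  readonly text: string;
  constructor(text: string) {
    this.text = text;
  }
  lineCount(): number {
    return this.text.split("\n").length;
  }
}

// Function style: a standalone function (in a real library it would be
// imported from a module) that takes the state as its argument.
function lineCount(state: DocState): number {
  return state.text.split("\n").length;
}

const state: DocState = { text: "a\nb\nc" };
const viaFunction = lineCount(state); // doSomething(state)
const viaMethod = new MethodStyleState("a\nb\nc").lineCount(); // state.doSomething()
```

Both calls compute the same thing; the difference is only where the behaviour is defined and how you reach it.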

lukan · on March 25, 2024

Thanks. I think I will just try both. The other comment mentioned that Tree-sitter now provides WebAssembly bindings upstream, so it should be easy to try out.

k8svet · on March 25, 2024

Lezer is 5 years old. It's quite clear which won out between Tree-Sitter and Lezer, and it's not Lezer.

lukan · on March 25, 2024

Sorry, but it is not clear to me. Lezer obviously works in the real world, as CodeMirror is quite widespread (it is integrated in Chrome DevTools, for example).

And since I use codemirror anyway, it might make sense to also use lezer, unless there are downsides I am not aware of.

jitl · on March 25, 2024

I think Lezer is great if there’s a parser for your language, or you only need to support a few languages and you have time to build grammars yourself.

When you need to build a general tool that supports a ton of languages, that’s when Lezer doesn’t fit as well since there just aren’t a ton of existing grammars out there.

lukan · on March 25, 2024

I see. Well, I am fine with just JS and TS for now, but long term I need more language support. So I should probably choose Tree-sitter.

jitl · on March 24, 2024

I learned from a Google search that these days upstream Tree-sitter provides WebAssembly bindings.

Source: https://github.com/tree-sitter/tree-sitter/tree/master/lib/b...

NPM: https://www.npmjs.com/package/web-tree-sitter

Download from the latest GitHub release: js file (https://github.com/tree-sitter/tree-sitter/releases/download...) and wasm file (https://github.com/tree-sitter/tree-sitter/releases/download...)

It's great to see upstream maintaining the bindings. I maintain a TypeScript/Emscripten/WebAssembly binding for QuickJS and it's more involved than I would like.

Besides providing a WASM build, Emscripten can build a lot of C code to straight JS, which you can use in non-WebAssembly browsers, or in runtimes like Cloudflare or Vercel Edge that support WebAssembly, though using it there is a pain in the behind. Unfortunately the Tree-sitter upstream doesn't output such a build, but given they already maintain the bindings it wouldn't be a lot of effort to pass the option to emcc (set `-s WASM=0 -s SINGLE_FILE=1` instead of `-s WASM=1`).

conartist6 · on March 25, 2024

I will probably be able to run tree-sitter grammars in plain JS (no WASM) soonish (within the next six months, say) on the BABLR VM

amluto · on March 25, 2024

Silly question. The OP says:

    if (true) something()
    // Comment
    otherStatement()

> ... the comment should not be part of the if statement's node.

Why not? A comment seems like it isn’t functionally part of the syntax tree, but, to the extent that one treats it like it is, it does seem like it’s in the middle of an if statement.

lioeters · on March 25, 2024

The first line is a complete statement that finishes at the end of the line - unless there's an else clause in the following line (not counting comments or empty lines). Expressed another way:

    if (true) {
      something()
    }

    // Comment

    otherStatement()

If this last line started with "else", then the comment would be within the if statement.
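As a rough sketch of that rule, here is a toy model over a hand-made token list. It is not how Lezer or any real parser implements this; the `Token` type and the `ifNodeEnd` function are made up for illustration, and it assumes each body is a single statement token.

```typescript
// Hypothetical token kinds for a toy language; one token per line of the example.
type Token = { kind: "if" | "statement" | "comment" | "else"; text: string };

// Returns the index (exclusive) where the if statement's node ends,
// assuming tokens[ifIndex] is the `if` and the next token is its body.
function ifNodeEnd(tokens: Token[], ifIndex: number): number {
  // The `if` keyword and its single-statement body always belong to the node.
  let end = ifIndex + 2;

  // Look past any trailing comments to see whether an `else` follows.
  let i = end;
  while (tokens[i]?.kind === "comment") i++;

  if (tokens[i]?.kind === "else") {
    // An else clause continues the node, so the comments in between,
    // the `else`, and its single-statement body are all inside it.
    end = i + 2;
  }
  return Math.min(end, tokens.length);
}

const withoutElse: Token[] = [
  { kind: "if", text: "if (true)" },
  { kind: "statement", text: "something()" },
  { kind: "comment", text: "// Comment" },
  { kind: "statement", text: "otherStatement()" },
];

const withElse: Token[] = [
  { kind: "if", text: "if (true)" },
  { kind: "statement", text: "something()" },
  { kind: "comment", text: "// Comment" },
  { kind: "else", text: "else" },
  { kind: "statement", text: "otherStatement()" },
];

const endWithoutElse = ifNodeEnd(withoutElse, 0); // comment stays outside the node
const endWithElse = ifNodeEnd(withElse, 0); // comment ends up inside the node
```

In the first list the node ends right after `something()`, so the comment is a sibling of the if statement; in the second, the lookahead finds `else` and the comment is swallowed into the node.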

jonrosner · on March 24, 2024

I've been using both CodeMirror and Lezer in Yaade (https://github.com/EsperoTech/yaade). Thanks to Lezer I was able to write a JSON extension language that supports Yaade environment variables. Pretty cool project and very nicely documented! I love building OSS on top of OSS.

habitue · on March 25, 2024

The discussion of the error recovery strategy is pretty interesting. To me it seems very similar to Monte Carlo tree search, which kind of suggests using a value network to estimate which branches to prune.

timhh · on March 24, 2024

I attempted to use this but was disheartened by the fact that it doesn't statically type node names. Tree-sitter doesn't either, but it has much more of an excuse given that it targets C.

https://github.com/lezer-parser/lezer/issues/8
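For a sense of what statically typed node names could look like, here is a small TypeScript sketch. It is an assumption about one possible shape, not something Lezer generates; `nodeNames`, `NodeName`, `TypedNode`, `isNodeName` and `labelFor` are all hypothetical.

```typescript
// Hypothetical node names for a toy grammar. The idea is that a parser
// generator could emit a list like this alongside the parse tables.
const nodeNames = ["Program", "IfStatement", "Comment"] as const;

// Union of string literal types: "Program" | "IfStatement" | "Comment".
type NodeName = (typeof nodeNames)[number];

// A tree node whose name is checked by the compiler instead of being a plain string.
interface TypedNode {
  name: NodeName;
  from: number;
  to: number;
}

// Runtime guard for names that arrive as untyped strings.
function isNodeName(name: string): name is NodeName {
  return (nodeNames as readonly string[]).indexOf(name) !== -1;
}

// Exhaustive switch: adding a new name to `nodeNames` without handling it
// here becomes a compile-time error rather than a silent fallthrough.
function labelFor(node: TypedNode): string {
  switch (node.name) {
    case "Program":
      return "root";
    case "IfStatement":
      return "conditional";
    case "Comment":
      return "comment";
  }
}

const label = labelFor({ name: "IfStatement", from: 0, to: 10 });
```

With plain string names, a typo like `"IfStatment"` only shows up at runtime (if at all); with the union type above the compiler rejects it.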

The dev seems mildly hostile to outside involvement too, so I moved on. These days I use Chumsky, which is Rust rather than TypeScript, but also way more awesome, if you can deal with the often incomprehensible compilation errors at least!

https://github.com/zesterer/chumsky

stevemk14ebr · on March 25, 2024

None of those responses were at all hostile

lukan · on March 25, 2024

He probably means this:

"It's rarely a good idea to jump in and start a big refactor of someone else's library without first discussing the direction you're going in."

But I agree: not hostile, just unwilling to do something he does not want to do, even if the other person has already put work into it.

timhh · on March 25, 2024

Yes it's unsolicited "advice" that shows an uncooperative attitude IMO. You can say "I don't want to do that" in a nice way without being patronising. If you look at the other closed MRs you'll see a similar attitude.

E.g. here (I'd forgotten about this actually): https://github.com/lezer-parser/generator/pull/6#issuecommen...

Here https://github.com/lezer-parser/lr/pull/64#issuecomment-1802...

It's nothing major but just emanates "difficult to work with" vibes so I didn't want to spend my time working with a project like that (looks like not many other people do either).

rokkitmensch · on March 25, 2024

The onus when working on an existing project is for new collaborators to understand how to work effectively with the existing norms, not for project operators to contort themselves into whatever chimera is expected of them by drive-by patchmakers.

lukan · on March 25, 2024

Well yes, he does not want to discuss the direction of development. He decides that. If he thinks something is not helping, he won't do it.

So sure, taking part in the project would mean accepting his way of things.

So difficult to work with? Depends on your expectations I guess.

And the comment above could have been maybe written nicer, but I see it as a defense of "no, I don't need to accept your PR just because you put work in it, you should have checked with me first, before doing it"

Because he has had those discussions quite often, I think (I have been following development on and off for quite a while).

timhh · on March 25, 2024

> no, I don't need to accept your PR just because you put work in it, you should have checked with me first, before doing it

Which is a bit rude IMO, and not really in the spirit of open source. I never demanded that he accept it. A PR is like "here's some code, let me know if it's ok" not "you should merge this code without question".

Here is what I would have written:

"Hi, thanks for the code but I don't think I want to go in that direction because X Y Z." (he didn't give any clear reasons so you'll have to imagine those).

Anyway he's free to do his thing. I was just explaining why I moved away from that library.

lukan · on March 25, 2024

"Which is a bit rude IMO, and not really in the spirit of open source. I never demanded that he accept it. "

Other people did before you. But I can see how you perceived that reaction to you as hostile. It was definitely a bit rude. I disagree a bit, though, that it is not in the spirit of open source. Open source does not necessarily mean open for collaboration. He does accept PRs if they fit his vision, and gets defensive when they don't, because he doesn't want to spend energy arguing why not. He does not have to. So yes, it would be nicer if he were nicer. But he is the way he is, and still he managed to build a great open source product.

timenova · on March 24, 2024

(2019)

josh11b · on March 24, 2024

(2019)