I'm always amazed by Marijn's brilliance (author of Lezer, CodeMirror, and the awesome ProseMirror toolkit).
The level of depth he dives into to perfect his projects is insane. For example, Lezer is a parser generator (which by itself is not a trivial feat, with novel ideas like incremental computation applied to parsing) built to power his mainstream project, CodeMirror.
Likewise, ProseMirror has some insane levels of engineering underneath, married with thoughtful architectural decisions.
He's incredibly responsive too. He just answered my question about an hour after it was asked... on a Sunday evening.
Slightly off topic: is anyone aware of a Lezer grammar for regex? I haven't been able to find one in the FOSS world. I suspect Regex101 has one, but it's sadly closed source.
Yeah, this happened to us a couple of times too. Once, we asked for a clarification, and he took it as a sensible requirement, made the changes, and pushed them to master in just over an evening.
The next day we just had to update our package to the latest version and marvel at his response time.
Massively agree. CodeMirror 5 was excellent, and CodeMirror 6 was a big enough improvement to justify the upgrade. I've used it in two large projects, and it has handled every use case as my needs expanded. It supports themes, SQL, and JS, and two weeks ago I used its diff functionality. Really great library; I can fully recommend it.
CodeMirror 6 is super hard to use, though, unless you are a front-end expert. CodeMirror 5 was basically plug and play. CodeMirror 6 is a box of Legos and you are responsible for building and bundling your own editor from scratch.
I publish to npm because:
1) Package managers are a fabulous idea. They're an easy way to download a package and its dependencies, and to update them over time.
2) It's not a good idea to push build artifacts to your repo.
3) This depends on CodeMirror, which is published to npm. Otherwise I would have to provide it too, and that's too much work.
4) You don't need to be a Node developer to use npm.
> CodeMirror 6 is a box of Legos and you are responsible for building and bundling your own editor from scratch.
Yep. Let me stress that there's absolutely nothing wrong with this if that's what you want to do -- but it's less than optimal for the person who just wants to add a JavaScript file for the editor, with maybe a JS/CSS combo to support syntax highlighting for a specific language. CodeMirror 5 was like that, pretty much.
You basically have to set up a whole independent build and packaging system to configure CodeMirror 6, and a lot of people just don't want to deal with that.
It's a pity it's the only JavaScript-based editor that works at all reliably on mobile (I would be delighted to be proven wrong about this).
I did notice that it seems (rightfully) very focused on the exact use case of syntax highlighting. Do people also use this kind of system to quickly build parsers for data in arbitrary text syntaxes?
On one hand, it’s a nice framework. I customized the TypeScript one a bunch for a lil side project and enjoyed myself. On the other hand, it would be great if CodeMirror could just work with Tree-sitter or similar. There’s a lot of ecosystem around other parsing systems, and needing to figure out Lezer stuff is a big friction for adopting CodeMirror 6 for me. There are not a lot of language packages listed: https://codemirror.net/docs/community/
> Unfortunately, tree-sitter is written in C, which is still awkward to run in the browser (and CodeMirror targets non-WASM browsers). It also generates very hefty grammar files because it makes the size/speed trade-off in a different way than a web system would.
I would be curious if there's been an effort to get tree-sitter working on the web.
Tree-sitter does run on the web. I got it working for my editor, but it did involve several days' worth of effort and getting into the weeds with emscripten. Details here - https://gushogg-blake.com/p/emscripten-web-modules/.
I didn't know about Lezer. I think I probably would have used it if I'd known - or at least tried it before Tree-sitter, as WASM obviously brings in extra complexity.
Having used Lezer and CodeMirror a bit on a different project, they both seem like they are probably high quality and well thought out projects from a speed and reliability perspective, but I found the architecture and docs to be confusing and unergonomic.
The main issue I have with the docs/architecture (for CM at least) is that they use a concept called "facets" without really explaining what it means, and to be honest it felt like a level of abstraction/indirection that my brain couldn't handle.
The other issue with the docs is that they don't seem contextualised enough, somehow. They would list the functions/methods of an object but not really explain how it fits into the system as a whole. Maybe this is down to my not understanding facets, or CM's extension architecture, well enough, but it was definitely a recurring theme of my experience. I also found the CM API slightly confusing in that (almost) everything you do is via a function that takes some state object as input, as opposed to calling methods on that object, but that's more of a style issue once you realise that you have to `import {doSomething} from "codemirror"` and call `doSomething(state)` as opposed to `state.doSomething()`.
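For anyone stuck on the same point, here is a deliberately simplified model of the facet idea as I understand it: many extensions each contribute an input value to a facet, and the facet's combine function reduces all of those inputs into one output that other code reads from the state. Everything below (`ToyFacet`, `ToyState`, `readTabSize`, and the two example facets) is hypothetical and written for illustration; it is not CodeMirror's implementation. The real API lives in `@codemirror/state` (`Facet.define`, `facet.of`, `state.facet`) and handles much more, such as precedence and computed inputs. The sketch also shows the "free function that takes a state" style rather than a method.

```typescript
// A toy model of the "facet" idea. This is NOT CodeMirror's implementation.
// Several extensions contribute inputs to a facet; combine() reduces them to one output.

type Extension = { facet: ToyFacet<any, any>; value: unknown };

class ToyFacet<Input, Output> {
  private readonly combineFn: (inputs: readonly Input[]) => Output;

  constructor(combineFn: (inputs: readonly Input[]) => Output) {
    this.combineFn = combineFn;
  }

  // Create an extension that contributes one input value to this facet.
  of(value: Input): Extension {
    return { facet: this, value };
  }

  // Reduce every contributed input into the facet's single output value.
  combine(inputs: readonly Input[]): Output {
    return this.combineFn(inputs);
  }
}

class ToyState {
  private readonly extensions: readonly Extension[];

  constructor(extensions: readonly Extension[]) {
    this.extensions = extensions;
  }

  // Read a facet's output by collecting the inputs that target it and combining them.
  facet<Input, Output>(f: ToyFacet<Input, Output>): Output {
    const inputs = this.extensions
      .filter((ext) => ext.facet === f)
      .map((ext) => ext.value as Input);
    return f.combine(inputs);
  }
}

// Hypothetical facet: the first provided tab size wins, defaulting to 4.
const tabSize = new ToyFacet<number, number>((inputs) => (inputs.length > 0 ? inputs[0] : 4));

// Hypothetical facet: merges every contributed keyword list into one list.
const keywords = new ToyFacet<string[], string[]>((inputs) =>
  inputs.reduce<string[]>((all, list) => all.concat(list), []),
);

// The "function that takes a state" style: a free function, not a method on the state.
function readTabSize(state: ToyState): number {
  return state.facet(tabSize);
}

const state = new ToyState([tabSize.of(2), keywords.of(["if", "else"]), keywords.of(["while"])]);
console.log(readTabSize(state)); // 2
console.log(state.facet(keywords).join(", ")); // if, else, while
```

The point of the indirection is that no single extension owns the final value: adding another `keywords.of(...)` extension changes the combined output without touching the others.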
Sorry, but it is not clear to me. Lezer obviously works in the real world, as CodeMirror is quite widespread (it's integrated into Chrome DevTools, for example).
And since I use CodeMirror anyway, it might make sense to also use Lezer, unless there are downsides I am not aware of.
I think Lezer is great if there’s a parser for your language, or you only need to support a few languages and you have time to build grammars yourself.
When you need to build a general tool that supports a ton of languages, that’s when Lezer doesn’t fit as well since there just aren’t a ton of existing grammars out there.
It's great to see upstream maintaining the bindings. I maintain a TypeScript/Emscripten/WebAssembly binding for QuickJS, and it's more involved than I would like.
Besides providing a WASM build, Emscripten can compile a lot of C code to plain JS, which you can use in non-WebAssembly browsers, or in runtimes like Cloudflare or Vercel Edge that support WebAssembly but make it a pain to use. Unfortunately, tree-sitter upstream doesn't output such a build, but given they already maintain the bindings, it wouldn't be much effort to pass the option to emcc (set -s WASM=0 -s SINGLE_FILE=1 instead of -s WASM=1).
> ... the comment should not be part of the if statement's node.
Why not? A comment isn't functionally part of the syntax tree, but to the extent that one treats it as if it is, it does seem like it's in the middle of an if statement.
The if statement is complete once its closing brace is reached, unless there's an else clause on a following line (not counting comments or empty lines). Expressed another way:
  if (true) {
    something()
  }
  // Comment
  otherStatement()
If this last line started with "else", then the comment would be within the if statement.
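The rule above can be sketched as a small function. This is a hypothetical illustration over a toy token list (the `Token` type and `commentBelongsToIf` are made up for this example), not how Lezer or any real parser decides node boundaries: a comment after an if statement's closing brace belongs to the if statement only when the next non-comment token is `else`.

```typescript
// Toy token kinds; a real tokenizer would be far richer.
type Token =
  | { kind: "closeBrace" }
  | { kind: "comment"; text: string }
  | { kind: "keyword"; text: string }
  | { kind: "identifier"; text: string };

// Decide whether the comment at `commentIndex` (which follows the closing brace of an
// if statement's body) should be part of that if statement's node.
// It should be only if the next non-comment token starts an `else` clause.
function commentBelongsToIf(tokens: readonly Token[], commentIndex: number): boolean {
  for (let i = commentIndex + 1; i < tokens.length; i++) {
    const tok = tokens[i];
    if (tok.kind === "comment") continue; // skip over further comments
    return tok.kind === "keyword" && tok.text === "else";
  }
  return false; // end of input: the if statement ended at its closing brace
}

// The example above: `}` then `// Comment` then `otherStatement()`.
const withoutElse: Token[] = [
  { kind: "closeBrace" },
  { kind: "comment", text: "// Comment" },
  { kind: "identifier", text: "otherStatement" },
];
// The same, but the last line starts with `else`.
const withElse: Token[] = [
  { kind: "closeBrace" },
  { kind: "comment", text: "// Comment" },
  { kind: "keyword", text: "else" },
];

console.log(commentBelongsToIf(withoutElse, 1)); // false
console.log(commentBelongsToIf(withElse, 1)); // true
```

Note that deciding this requires looking ahead past the comment, which is why a parser cannot settle the comment's parent node at the moment it reads the comment.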
I've been using both CodeMirror and Lezer in Yaade (https://github.com/EsperoTech/yaade). Thanks to Lezer, I was able to write a JSON extension language that supports Yaade environment variables. Pretty cool project and very nicely documented! I love building OSS on top of OSS.
The discussion of the error recovery strategy is pretty interesting. To me it seems very similar to Monte Carlo tree search, which suggests using a value network to estimate which branches to prune.
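To make the analogy concrete, here is a toy version of recovery-as-search: when the input is malformed, branch into alternative repairs (skip the unexpected token, or insert the missing one), score each branch by how many repairs it needed, and keep only the cheapest few. This is a hypothetical beam search over a bracket-matching "parser" written for illustration; it is not Lezer's actual algorithm, and the fixed cost per repair stands in for the learned value estimate the comment imagines.

```typescript
// Toy recovery-as-search for a bracket "parser". Illustrative only: not Lezer's algorithm.

// One branch of the search: input position, open brackets, repair cost, and repair log.
interface Branch {
  pos: number;
  stack: string[];
  cost: number;
  repairs: string[];
}

const PAIRS: Record<string, string> = { "(": ")", "[": "]", "{": "}" };

// Expand a branch by one action. Well-formed input yields exactly one successor;
// an error yields alternative repairs, each adding 1 to the cost.
function expand(input: string, b: Branch): Branch[] {
  const top = b.stack[b.stack.length - 1]; // undefined when the stack is empty
  if (b.pos === input.length) {
    // End of input with unclosed brackets (the caller guarantees the stack is
    // non-empty here): the only repair is to insert the missing closer.
    return [{
      pos: b.pos,
      stack: b.stack.slice(0, -1),
      cost: b.cost + 1,
      repairs: [...b.repairs, `insert ${PAIRS[top]} at ${b.pos}`],
    }];
  }
  const ch = input[b.pos];
  if (ch in PAIRS) return [{ ...b, pos: b.pos + 1, stack: [...b.stack, ch] }];
  if (top !== undefined && PAIRS[top] === ch) {
    return [{ ...b, pos: b.pos + 1, stack: b.stack.slice(0, -1) }];
  }
  // Error: branch into "skip the unexpected token" and "insert the expected closer".
  const options: Branch[] = [{
    pos: b.pos + 1,
    stack: b.stack,
    cost: b.cost + 1,
    repairs: [...b.repairs, `delete ${ch} at ${b.pos}`],
  }];
  if (top !== undefined) {
    options.push({
      pos: b.pos,
      stack: b.stack.slice(0, -1),
      cost: b.cost + 1,
      repairs: [...b.repairs, `insert ${PAIRS[top]} at ${b.pos}`],
    });
  }
  return options;
}

// Best-first search with a beam: always expand the cheapest branch, then prune the
// frontier to `beamWidth` branches so the number of live alternatives stays bounded.
function recover(input: string, beamWidth = 4): Branch | undefined {
  let frontier: Branch[] = [{ pos: 0, stack: [], cost: 0, repairs: [] }];
  while (frontier.length > 0) {
    const [best, ...rest] = frontier; // frontier is kept sorted by cost
    if (best.pos === input.length && best.stack.length === 0) return best;
    frontier = [...rest, ...expand(input, best)]
      .sort((a, c) => a.cost - c.cost)
      .slice(0, beamWidth);
  }
  return undefined;
}

console.log(recover("([])")?.cost); // 0: well-formed, no repairs needed
console.log(recover("(()")?.cost); // 1: one closer inserted at the end
console.log(recover("(]")?.cost); // 2
```

Swapping the fixed per-repair cost for a smarter estimate of how promising a branch is would be the "value network" variant: the pruning step stays the same, only the sort key changes.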
I attempted to use this but was disheartened by the fact that it doesn't statically type node names. Tree-sitter doesn't either, but it has much more of an excuse given that it targets C.
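As a rough idea of what statically typed node names could look like, here is a minimal sketch assuming a hypothetical grammar whose node types are `Program`, `IfStatement`, `Comment`, and `Identifier`: encode the names as a string-literal union so the compiler rejects misspellings. The `TypedNode` shape, `Handlers` type, and `visit` helper are made up for this illustration and are not Lezer's API; in practice the union would have to be generated from the grammar to stay in sync with it.

```typescript
// Hypothetical node names for an imaginary grammar.
type NodeName = "Program" | "IfStatement" | "Comment" | "Identifier";

// A made-up tree node shape with a typed `name` field (not Lezer's SyntaxNode).
interface TypedNode {
  name: NodeName;
  children: TypedNode[];
}

// Handlers keyed by node name: the compiler rejects keys that are not valid names.
type Handlers = Partial<Record<NodeName, (node: TypedNode) => void>>;

// Depth-first walk that calls the handler registered for each node's name, if any.
function visit(node: TypedNode, handlers: Handlers): void {
  handlers[node.name]?.(node);
  for (const child of node.children) visit(child, handlers);
}

const tree: TypedNode = {
  name: "Program",
  children: [
    { name: "IfStatement", children: [{ name: "Identifier", children: [] }] },
    { name: "Comment", children: [] },
  ],
};

let ifCount = 0;
visit(tree, {
  IfStatement: () => {
    ifCount++;
  },
  // IfStatment: () => {}, // misspelled key: rejected at compile time
});
console.log(ifCount); // 1
```

With plain `string` names, the misspelled handler above would compile and silently never fire, which is the failure mode the comment is complaining about.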
The dev seems mildly hostile to outside involvement too, so I moved on. These days I use Chumsky, which is Rust rather than TypeScript but also way more awesome, at least if you can deal with the often incomprehensible compilation errors!
Yes, it's unsolicited "advice" that shows an uncooperative attitude, IMO. You can say "I don't want to do that" in a nice way without being patronising. If you look at the other closed MRs, you'll see a similar attitude.
It's nothing major, but it just emanates "difficult to work with" vibes, so I didn't want to spend my time working with a project like that (it looks like not many other people do either).
The onus when working on an existing project is for new collaborators to understand how to work effectively with the existing norms, not for project operators to contort themselves into whatever chimera is expected of them by drive-by patchmakers.
Well yes, he does not want to discuss the direction of development. He decides that. If he thinks something is not helping, he won't do it.
So sure, taking part in the project would mean accepting his way of things.
So difficult to work with?
Depends on your expectations I guess.
And the comment above could maybe have been written more nicely, but I see it as a defense of "no, I don't need to accept your PR just because you put work in it, you should have checked with me first, before doing it"
Because he has had those discussions quite often, I think (I've been following development from time to time for quite a while).
> no, I don't need to accept your PR just because you put work in it, you should have checked with me first, before doing it
Which is a bit rude IMO, and not really in the spirit of open source. I never demanded that he accept it. A PR is like "here's some code, let me know if it's ok" not "you should merge this code without question".
Here is what I would have written:
"Hi, thanks for the code but I don't think I want to go in that direction because X Y Z." (he didn't give any clear reasons so you'll have to imagine those).
Anyway he's free to do his thing. I was just explaining why I moved away from that library.
"Which is a bit rude IMO, and not really in the spirit of open source. I never demanded that he accept it. "
Other people did before you. But I can see how you perceived that reaction as hostile; it was definitely a bit rude. I disagree a bit, though, that it is not in the spirit of open source. Open source does not necessarily mean open for collaboration. He does accept PRs if they fit his vision, and gets defensive when they don't, because he doesn't want to spend energy arguing why not. He does not have to. So yes, it would be nicer if he were nicer. But he is the way he is, and he still managed to build a great open source product.
Apparently, a big fan of his works.