It's really annoying to produce an AST from tree-sitter. I tried writing my prog...

vkazanov · on May 20, 2023

I hope the author of the blog post realizes that tree-sitter was never meant to be "an editable AST" or anything like that. It is a parser that builds a partial AST while taking into account the presence of errors and reporting them. It is written in C for a reason: it needs to produce a partial AST quickly and report the errors in it.

There are SO MANY perfect tools for language implementation and AST manipulation! Starting with the ML-family languages, which are very good at this.

hardwaregeek · on May 20, 2023

There aren’t that many great tools that also buy you into syntax highlighting, code search, and the broader tree-sitter ecosystem. If tree-sitter could produce an AST, it’d be a very compelling option for writing a compiler. Not to mention that robust, fast incremental parsing is precisely what modern compilers need; see Roslyn’s red-green trees or rust-analyzer’s Rowan crate.
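For readers unfamiliar with the red-green trees mentioned above, here is a minimal Python sketch of the idea as Roslyn and Rowan describe it. The class and function names (`GreenToken`, `GreenNode`, `RedNode`, `replace_child`) are hypothetical and this is not either library's API. Green nodes are immutable and record only their kind and text width, so unchanged subtrees can be shared between versions of the tree after an edit; red nodes are thin, lazily created wrappers that add parent pointers and absolute offsets.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass(frozen=True)
class GreenToken:
    """Immutable leaf: a token kind plus its exact source text."""
    kind: str
    text: str

    @property
    def width(self) -> int:
        return len(self.text)


@dataclass(frozen=True)
class GreenNode:
    """Immutable interior node. It stores its width but never an absolute offset,
    so one green subtree can be shared by many trees and many versions."""
    kind: str
    children: tuple[Green, ...]
    width: int = field(init=False)

    def __post_init__(self) -> None:
        # Cache the total width once; frozen dataclasses need object.__setattr__.
        object.__setattr__(self, "width", sum(c.width for c in self.children))


Green = Union[GreenNode, GreenToken]


@dataclass
class RedNode:
    """Position-aware view over a green node, created on demand while walking."""
    green: Green
    offset: int
    parent: Optional[RedNode] = None

    def children(self) -> list[RedNode]:
        if isinstance(self.green, GreenToken):
            return []
        result, pos = [], self.offset
        for child in self.green.children:
            result.append(RedNode(child, pos, self))
            pos += child.width  # offsets are derived from widths, never stored in green nodes
        return result

    def text_range(self) -> tuple[int, int]:
        return (self.offset, self.offset + self.green.width)


def text(g: Green) -> str:
    """The tree is lossless: concatenating the leaves gives back the source."""
    return g.text if isinstance(g, GreenToken) else "".join(text(c) for c in g.children)


def replace_child(node: GreenNode, index: int, new_child: Green) -> GreenNode:
    """Build a new green node with one child swapped; siblings are shared, not copied."""
    kids = list(node.children)
    kids[index] = new_child
    return GreenNode(node.kind, tuple(kids))


# "1 + 2", with the whitespace folded into the PLUS token for brevity.
expr = GreenNode("BIN_EXPR", (GreenToken("NUMBER", "1"),
                              GreenToken("PLUS", " + "),
                              GreenToken("NUMBER", "2")))
# Simulate an edit that turns "2" into "42".
edited = replace_child(expr, 2, GreenToken("NUMBER", "42"))
print(text(edited), [c.text_range() for c in RedNode(edited, 0).children()])
```

In a real editor the edit would rebuild only the path from the changed token up to the root, so the cost of an edit scales with tree depth rather than file size.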

vkazanov · on May 20, 2023

Having a reasonable formal grammar is like 90% of making a parser, tree-sitter or not.

hardwaregeek · on May 21, 2023

If you’re writing a parser for a simple language, sure. But most programming languages have a sophisticated enough grammar, with stringent enough performance and error-handling requirements, that a library that can act as a single source of truth for both your compiler and your tooling is very valuable. Take JavaScript, which has extremely hairy logic around JSX parsing and arrow functions. Or C, with its preprocessing issues.
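To make the arrow-function point concrete: when a JavaScript parser sees `(`, it cannot yet tell whether it is reading a parenthesized expression like `(a, b) * 2` or the parameter list of `(a, b) => a + b`, and the deciding `=>` can sit arbitrarily far to the right. The ECMAScript spec handles this with a "cover grammar" (CoverParenthesizedExpressionAndArrowParameterList). Below is a deliberately simplified Python sketch of a different, common workaround: scan ahead to the matching `)` and peek at the next token. The tokenizer and function names are hypothetical, and the sketch assumes balanced, well-nested brackets; it ignores default values, strings, JSX, and everything else that makes the real thing hairy.

```python
import re


def tokenize_js(src: str) -> list[str]:
    """Toy tokenizer: '=>', identifiers/numbers, or any other single non-space char."""
    return re.findall(r"=>|\w+|\S", src)


def classify_paren(tokens: list[str], start: int) -> str:
    """Decide whether the '(' at tokens[start] begins arrow-function parameters
    or a parenthesized expression. Assumes brackets are balanced and well nested."""
    if tokens[start] != "(":
        raise ValueError("expected '(' at the start index")
    depth = 0
    for i in range(start, len(tokens)):
        tok = tokens[i]
        if tok in ("(", "[", "{"):
            depth += 1
        elif tok in (")", "]", "}"):
            depth -= 1
            if depth == 0:
                # Only now, after an unbounded scan, can we see what follows ')'.
                following = tokens[i + 1] if i + 1 < len(tokens) else None
                return "arrow_params" if following == "=>" else "paren_expr"
    raise SyntaxError("unclosed '('")


for src in ["(a, b) => a + b", "(a, b) * 2", "({ a }, b) => a"]:
    print(src, "->", classify_paren(tokenize_js(src), 0))
```

The unbounded lookahead is exactly what makes this awkward for an error-tolerant, incremental parser: an edit far to the right (adding or deleting `=>`) changes how the tokens on the left must be interpreted.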

vkazanov · on May 21, 2023

So you want to use a hammer to cut trees because an axe looks somewhat similar anyway and it just doesn't make sense to have both :-) Don't blame your hammer for being a bad axe!

People have been looking for a universal approach to parsing for so long... Maybe there is one, maybe not, but tree-sitter was never meant to be it.

And it's great for what it does!

hardwaregeek · on May 21, 2023

No... it's more like we're building sophisticated infrastructure for cutting trees: it handles malformed trees, processes them super efficiently, and handles all sorts of different species. And you seem to think that I want an axe.

If you want to create a parser for a toy language that produces an AST or a single error, then sure, that's trivial. But if you want a parser that does good error recovery, produces a high-fidelity CST, and reuses memory efficiently (ideally with red-green trees), that's a lot of work. And that's table stakes for good programming language tooling. We're no longer in the era of Emacs plugins that do regex-based syntax highlighting and call it a day. It would be very valuable to have a framework that could accomplish all this and also function as the parser for the compiler (which is not so crazy, since most modern compilers are also the engines for tooling, i.e. language servers).
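As a rough illustration of the gap between "an AST or a single error" and a parser with error recovery, here is a toy recursive-descent parser in Python for bracketed integer lists like `[1, 2, 3]`. Instead of aborting at the first bad token, it wraps an unexpected token in an `ERROR` node, records a `MISSING` node for an absent closing bracket, and keeps going. Tree-sitter's output does contain `ERROR` and `MISSING` nodes, but the grammar, node shape, and recovery strategy here (simply skipping one token at a time, a crude form of panic-mode recovery) are hypothetical and far simpler than tree-sitter's actual algorithm.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    kind: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


class ListParser:
    """Parses '[' (number (',' number)*)? ']' and never raises on bad input."""

    def __init__(self, src: str) -> None:
        self.toks = re.findall(r"\d+|\S", src)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def bump(self) -> str:
        tok = self.toks[self.pos]
        self.pos += 1
        return tok

    def parse(self) -> Node:
        root = Node("list")
        if self.peek() == "[":
            self.bump()
        else:
            root.children.append(Node("MISSING", "["))
        expect_item = True
        while self.peek() not in (None, "]"):
            tok = self.peek()
            if expect_item and tok.isdigit():
                root.children.append(Node("number", self.bump()))
                expect_item = False
            elif not expect_item and tok == ",":
                self.bump()
                expect_item = True
            else:
                # Recovery: keep the unexpected token in the tree as ERROR and continue.
                root.children.append(Node("ERROR", self.bump()))
        if self.peek() == "]":
            self.bump()
        else:
            root.children.append(Node("MISSING", "]"))
        return root


tree = ListParser("[1, 2 3, 4").parse()
print([(c.kind, c.text) for c in tree.children])
```

Even this toy shows why recovery is "a lot of work": every production has to decide what to skip, what to synthesize, and how to resynchronize, and the tree has to represent the damage instead of throwing it away.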

I agree that tree-sitter was never meant to be a universal solution, but I think it could easily become one with some adjustments. And because of the existing infrastructure and the existing parsers, I think it's reasonable to consider pushing tree-sitter in that direction instead of creating yet another parsing framework.

vkazanov · on May 22, 2023

Let's see if somebody can come up with something to replace tree-sitter :-)

As somebody who has written a couple of quick modes and parsers for Emacs, in Emacs Lisp, I can say that for people like me it's a blessing. I sincerely hate how there are numerous implementations of everything across dozens of editors, yet nobody benefits from each other's work in a reasonable way... Tree-sitter's universal, community-centric approach resonates with the stronger side of OSS: suddenly all the little steps individuals take contribute to the ecosystem as a whole.

Now, admittedly, all I need is an axe. I know I need an axe, treesit gives it to me, and that makes me a happy little contributor.

So let's say somebody comes up with a factory of a tool, with everything inside: proper incremental parsing, smart error handling, tree editing, transformations and so on. Something tells me it would be much harder to contribute a simplified, barely working grammar for that thing. And that kind of kills the point of an Emacs-ish tool for moonlighting hackers.

It would be useful, sure, but would it work in practice?

hardwaregeek · on May 24, 2023

Oh, I totally agree with your sentiment about tree-sitter. That's why I want its functionality extended. It makes so much sense to have a single place where a parser can be written once and everybody benefits, much like with language servers.

Where I disagree is that, IMO, tree-sitter is already very close to this ideal model. It has incremental parsing. It has great tree querying. Where it needs help is an AST facade over the raw syntax tree, which is very much feasible; rust-sitter[1] does it, for instance. Tree editing and tree construction are also very much doable, and I don't think they'd have any impact on grammar construction. As for error recovery, I think it could work as a reparsing feature where you drop down to a more tolerant handwritten parser (or even a secondary grammar), or as an error-recovery function that can be written in any language. Tree-sitter already lets you plug in a handwritten lexer written in native code (an external scanner), so this is not such a stretch.
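Here is a minimal Python sketch of what an "AST facade" over a generic syntax tree can look like: thin typed wrappers that look children up by field name and hand back other typed wrappers, so consumer code never matches on raw kind strings. The `RawNode` class is a hypothetical stand-in loosely shaped like a tree-sitter node (a kind string, source text, named fields); it is not the py-tree-sitter API, and the `binary_expression` kind with `left`/`operator`/`right` fields is invented for the example. rust-sitter takes a different route (deriving the typed tree from annotated Rust types), so treat this only as an illustration of the general idea.

```python
from __future__ import annotations

import operator
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class RawNode:
    """Untyped CST node: a kind string, its source text, and named child fields."""
    kind: str
    text: str = ""
    fields: dict[str, RawNode] = field(default_factory=dict)


@dataclass(frozen=True)
class Number:
    raw: RawNode

    @property
    def value(self) -> int:
        return int(self.raw.text)


@dataclass(frozen=True)
class BinaryExpr:
    raw: RawNode

    @property
    def op(self) -> str:
        return self.raw.fields["operator"].text

    @property
    def left(self) -> Optional[Expr]:
        return cast_expr(self.raw.fields.get("left"))

    @property
    def right(self) -> Optional[Expr]:
        return cast_expr(self.raw.fields.get("right"))


Expr = Union[Number, BinaryExpr]


def cast_expr(node: Optional[RawNode]) -> Optional[Expr]:
    """Wrap a raw node in its typed facade; None for missing or unrecognized (e.g. ERROR) nodes."""
    if node is None:
        return None
    if node.kind == "number":
        return Number(node)
    if node.kind == "binary_expression":
        return BinaryExpr(node)
    return None


OPS = {"+": operator.add, "*": operator.mul}


def evaluate(expr: Optional[Expr]) -> Optional[int]:
    """Consumer code written against the typed facade; broken subtrees yield None."""
    if isinstance(expr, Number):
        return expr.value
    if isinstance(expr, BinaryExpr):
        lhs, rhs, fn = evaluate(expr.left), evaluate(expr.right), OPS.get(expr.op)
        if lhs is None or rhs is None or fn is None:
            return None
        return fn(lhs, rhs)
    return None


# Hand-built raw tree for "1 + 2 * 3".
raw = RawNode("binary_expression", fields={
    "left": RawNode("number", "1"),
    "operator": RawNode("+", "+"),
    "right": RawNode("binary_expression", fields={
        "left": RawNode("number", "2"),
        "operator": RawNode("*", "*"),
        "right": RawNode("number", "3"),
    }),
})
print(evaluate(cast_expr(raw)))
```

Because the facade only reads the raw tree, it leaves the grammar untouched, and error nodes simply surface as `None` in the typed layer rather than crashing the consumer.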

[1]: https://github.com/hydro-project/rust-sitter