It is, but the counter argument is that parsers are already so fast that streami...

diffxx · on March 22, 2024

I am quite sure that batch parsing will give good responsiveness for many, if not most, common languages, provided source files have fewer than, say, 30k lines. If you just think about the I/O performance of modern computers, it should not be that difficult to parse at 25MB/sec, which I estimate translates to between 500K and 1M loc per second, or roughly 15k-30k loc per 30ms.
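To make that estimate easy to check, here is a minimal back-of-the-envelope sketch. The 25MB/sec throughput and the 30ms budget come from the comment; the average line lengths of 25 and 50 bytes are assumptions picked because they reproduce the 500K to 1M loc/sec range, and `loc_per_budget` is just an illustrative helper name.

```python
def loc_per_budget(bytes_per_sec: float, bytes_per_line: float, budget_ms: float) -> float:
    """Lines a batch parser can cover within one latency budget."""
    lines_per_sec = bytes_per_sec / bytes_per_line
    return lines_per_sec * budget_ms / 1000.0


THROUGHPUT = 25_000_000  # 25 MB/sec, the figure quoted above
BUDGET_MS = 30           # responsiveness budget quoted above

# Assumed average line lengths in bytes (not measured on real code).
for avg_line_bytes in (25, 50):
    lines = loc_per_budget(THROUGHPUT, avg_line_bytes, BUDGET_MS)
    print(f"{avg_line_bytes} bytes/line: {lines:,.0f} loc per {BUDGET_MS}ms")
```

Real parsers and real source files will vary, so this only shows that the numbers in the comment are internally consistent, not that any particular parser hits them.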

I'm not saying that incremental is bad per se, but that the choice of guaranteeing incrementalism complicates things for cases where it isn't necessary. I am not super familiar with LSP, but I can imagine LSP having a syntax highlighting endpoint that has both batch and incremental modes. A naive implementation could just run the batch mode when given an incremental request and add real incremental support later as necessary. In other words, I think it would be best if there were another layer of indirection between the editor and the parser (whether that is tree-sitter or another implementation).
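A rough sketch of what that layer of indirection might look like, with the naive fallback built in. Everything here is hypothetical: `Highlighter`, `Edit`, and `DigitHighlighter` are illustrative names, not part of LSP or tree-sitter, and the toy backend only marks runs of digits.

```python
from dataclasses import dataclass
from typing import List, Tuple

Token = Tuple[int, int, str]  # (start offset, end offset, kind)


@dataclass
class Edit:
    start: int     # offset where the edit begins in the old text
    old_end: int   # end of the replaced range in the old text
    new_text: str  # text that replaces old_text[start:old_end]


def apply_edit(text: str, edit: Edit) -> str:
    return text[:edit.start] + edit.new_text + text[edit.old_end:]


class Highlighter:
    """Hypothetical interface the editor talks to instead of a concrete parser."""

    def highlight_batch(self, text: str) -> List[Token]:
        raise NotImplementedError

    def highlight_incremental(self, old_text: str, edit: Edit) -> List[Token]:
        # Naive default: ignore the locality of the edit and re-run batch.
        # A backend can override this later with a truly incremental version.
        return self.highlight_batch(apply_edit(old_text, edit))


class DigitHighlighter(Highlighter):
    """Toy batch-only backend: marks runs of digits as 'number'."""

    def highlight_batch(self, text: str) -> List[Token]:
        tokens: List[Token] = []
        i = 0
        while i < len(text):
            if text[i].isdigit():
                j = i
                while j < len(text) and text[j].isdigit():
                    j += 1
                tokens.append((i, j, "number"))
                i = j
            else:
                i += 1
        return tokens


h = DigitHighlighter()
print(h.highlight_batch("a 12 b"))
print(h.highlight_incremental("a 12 b", Edit(start=3, old_end=3, new_text="9")))
```

The editor only ever calls the two `Highlighter` methods, so a batch-only backend works on day one and an incremental one can be swapped in without touching the editor side.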

Right now, though, you have to opt in whole hog to the tree-sitter approach. As mentioned above, incrementalism has only cost and no benefit for a batch tool like difftastic or semgrep, to mention two named in this thread.

kstrauser · on March 22, 2024

That makes sense to me. I don't know for sure that you're right but it sure seems plausible.

I do wonder how much of a range there is on non-brand-new computers though. I'm typing this on an M2 Max with 64GB of RAM. I also have a Raspberry Pi in the other room, and I know from hard experience that what runs screamingly fast on my Mac may be painfully slow on the Pi.

I could also imagine power benefits to an incremental model. If I type a single character in the middle of a 30KLOC document, a batch process would need to rescan the entire thing, whereas a smart incremental process could say "yep, you're still in the middle of a string constant".
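One way to picture that early exit is a line-based lexer that caches its state at the end of every line and stops rescanning as soon as the new state matches the cached one. This is only a sketch of that general idea, not how tree-sitter works: the "lexer" is a toy that treats a double quote as a string toggle and ignores escapes, and it assumes a single line was edited in place with no lines added or removed.

```python
from typing import List


def end_state(line: str, in_string: bool) -> bool:
    """Toy lexer: each double quote toggles the in-string state (no escapes)."""
    for ch in line:
        if ch == '"':
            in_string = not in_string
    return in_string


def lex_all(lines: List[str]) -> List[bool]:
    """Batch pass: record the lexer state at the end of every line."""
    states: List[bool] = []
    state = False
    for line in lines:
        state = end_state(line, state)
        states.append(state)
    return states


def relex_after_edit(lines: List[str], states: List[bool], changed: int) -> int:
    """Update cached states after lines[changed] was edited in place.

    Returns the number of lines rescanned.
    """
    state = states[changed - 1] if changed > 0 else False
    scanned = 0
    for i in range(changed, len(lines)):
        state = end_state(lines[i], state)
        scanned += 1
        if state == states[i]:
            break  # later cached states depend only on this one, so they still hold
        states[i] = state
    return scanned


lines = ['x = "hello', 'world', 'done"', 'y = 1']
states = lex_all(lines)

lines[1] = 'world!'  # typed one character inside the string
print(relex_after_edit(lines, states, 1))  # rescans only the edited line

lines[1] = 'world"'  # closed the string early, so later lines change too
print(relex_after_edit(lines, states, 1))
```

In the first edit the state at the end of the line is unchanged, so nothing downstream needs to be touched; in the second, the change ripples to the end of the file and the cost approaches a batch rescan.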

chubot · on March 23, 2024

I think it simply boils down to the requirements of interactive editors vs. batch tools.

I have no doubt that interactive editors like Atom/Zed can really make use of incremental parsing, and also lenient parsing.

Syntax highlighting and parsing aren't the only things they do -- they still need the CPU for other things.

But yeah the problem is incremental is very different than batch, and lenient is very different than strict, so basically every language needs at least 2 separate parsers. That's kind of an unsolved problem, and I'm not sure it can be solved even in principle ...