How to write a tree-sitter grammar in an afternoon (siraben.dev)
112 points by siraben on March 13, 2022 | 9 comments



I found this chapter about implementing Imp[0] in Coq (Imp is the language he's writing the tree-sitter grammar for) to be interesting as well.

[0]: https://softwarefoundations.cis.upenn.edu/lf-current/Imp.htm...


Hmm... For a work project I'd like to parse a bunch of our SQL files and make a data lineage report. I looked into ANTLR4 to do it, but it seemed too heavy and slow (that could very well be down to my understanding of it). I wonder if tree-sitter would be better.


What’s the difference between tree-sitter and LSP[0]? They both seem to provide the same functionality. [0]: https://microsoft.github.io/language-server-protocol/


LSP is a protocol, while tree-sitter is a parsing tool and library. Microsoft developed LSP so that IDEs can communicate with a separate process that provides annotations and insight into source code. In practice, tree-sitter is usually used for syntax highlighting, while servers implementing LSP provide compilation errors, warnings, jump to definition, linter warnings, symbol highlighting, and so on.


Tree-sitter works like a syntax highlighting plugin, and doesn’t require communication with an external process. It’s like syntax highlighting but with a deeper understanding of the programming language, not based just on regular expressions. So not only is the highlighting better, but you can do more things, such as selecting blocks or jumping among them. One of my favorite goodies is being able to ask the editor what function my cursor is inside.


They don't overlap in functionality at all. LSP is a protocol for communicating with IDE plugins. Tree-Sitter is a parser.

They're often used together. I've written a couple of language servers that use Tree-Sitter to parse documents.

For example, when you hover over something in VSCode, it uses LSP to ask the language server "oi, what's on line 5 column 10", and the language server then uses Tree-Sitter (or some other parser) to parse the document and figure out the answer.
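
The hover round trip described above can be sketched roughly as follows. This is a minimal illustration, not how any particular language server is written: `textDocument/hover` and the `Content-Length` framing come from the LSP specification, while `make_hover_request`, `word_at`, and `handle_hover` are hypothetical names, and the regex lookup in `word_at` is only a stand-in for the parser (Tree-Sitter or otherwise) that a real server would query. It also assumes ASCII text, since LSP positions count UTF-16 code units.

```python
import json
import re


def make_hover_request(uri, line, character, request_id=1):
    """Build the JSON-RPC body of a textDocument/hover request (zero-based position)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/hover",
        "params": {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},
        },
    }


def frame(message):
    """Prefix the JSON body with the Content-Length header used on the wire."""
    body = json.dumps(message).encode("utf-8")
    return b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n\r\n" + body


def word_at(source, line, character):
    """Stand-in for the parser: return the identifier under the position, or None."""
    lines = source.splitlines()
    if line >= len(lines):
        return None
    for match in re.finditer(r"[A-Za-z_][A-Za-z0-9_]*", lines[line]):
        if match.start() <= character < match.end():
            return match.group(0)
    return None


def handle_hover(request, documents):
    """Server side: look up the document, resolve the position, build the response."""
    params = request["params"]
    source = documents[params["textDocument"]["uri"]]
    position = params["position"]
    word = word_at(source, position["line"], position["character"])
    result = None
    if word is not None:
        result = {"contents": {"kind": "plaintext", "value": word}}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}
```

The editor would send `frame(make_hover_request(...))` to the server process; the server replies with the result of `handle_hover`, and everything inside `word_at` is where a real parser would do the work.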


Does tree-sitter support nested languages? E.g. JavaScript or CSS within HTML within Javadoc within Java source code?


Yes, this is possible with language injection[0]

[0] https://tree-sitter.github.io/tree-sitter/syntax-highlightin...


Does this cover things like separating leading /// comment indicators? That’s something Vim’s :syn-include can’t cope with.

  /// This is Markdown-in-Rust.
  ///
  /// ```rust
  /// // This is Rust-in-Markdown-in-Rust.
  /// /// This is Markdown-in-Rust-in-Markdown-in-Rust.
  /// const RUST_IN_MARKDOWN_IN_RUST: &'static str = "
  ///     In this multi-line string (as the most common example of such token-mixing),
  ///     those triple slashes are Comment, not String.";
  /// // And if Trait and Type get parsed carefully below, the /// must not interrupt that.
  /// impl
  /// Trait
  /// for
  /// Type {}
  /// ```
  struct ThisIsRust;

The remark about injection.include-children in your link suggests it can, but I don’t know enough about it yet to be confident of that preliminary assessment.
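
To make the question concrete, here is a rough sketch of what "separating the leading /// indicators" involves: the comment content has to be extracted line by line, and each extracted line has to remember where it started in the original file so that highlights can be mapped back. This is not how tree-sitter implements injection; `strip_doc_prefix` is a hypothetical helper, and it only handles `///` line comments.

```python
def strip_doc_prefix(source, prefix="///"):
    """Return (text, origins) for the lines of source that start with prefix.

    text is the comment content joined by newlines; origins[i] is the
    (line, column) in source where line i of text begins.
    """
    pieces = []
    origins = []
    for line_no, line in enumerate(source.splitlines()):
        stripped = line.lstrip()
        if not stripped.startswith(prefix):
            continue  # not a doc-comment line, so it belongs to the host language
        indent = len(line) - len(stripped)
        content = stripped[len(prefix):]
        # Drop the single conventional space after the prefix, if present.
        skip = 1 if content.startswith(" ") else 0
        pieces.append(content[skip:])
        origins.append((line_no, indent + len(prefix) + skip))
    return "\n".join(pieces), origins
```

Applying the helper a second time to the extracted text peels off the next level of nesting, which is the Markdown-in-Rust-in-Markdown-in-Rust case from the example.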



