
I'm in the midst of creating a DSL (it's my first try) and this sounds like really good advice. I'm not clear on the third item though - by LR parser generator are you talking about something like ANTLR?



Raku is a general-purpose language with built-in grammar support. There is some info here https://gist.github.com/raiph/32b3ba969b4eb939a63f48de14d0a1... and here https://docs.raku.org/language/grammars. I use it for small (and composable) grammar segments and for prototyping. The Raku community over at Discord is quite helpful.


Yes, something like ANTLR for Java, or the entire list here: https://en.wikipedia.org/wiki/Comparison_of_parser_generator...

I've had good experiences with Menhir (for OCaml) and Tree-sitter, and implemented my own SLR parser generator for C# https://github.com/Lokad/Parsing

In the end, what matters is that they should be able to report conflicts and ambiguities.


Are you saying PEGs can't have conflicts and ambiguities? I didn't know that.


Not having ambiguities is actually the main selling point of PEGs. If you have two rules A and B that can both match the input, then a CFG A|B has an ambiguity (two possible derivations), but a PEG A/B explicitly says that the A derivation is chosen. The good part is that unlike a CFG, the PEG doesn't require you to go and fix anything (the / operator already did that for you). This makes the initial implementation of the grammar easier.

On the other hand, suppose you already have code in the wild that uses the old grammar G1 and, in order to add new features, you introduce a new grammar G2 that is a superset of G1. You need to know whether any of the existing code has a derivation in G2 that differs from its derivation in G1 (as that would break backwards compatibility). With a PEG, there's no way to tell, so you have to check this by hand (and mistakes are easy). With a CFG, you know that backwards incompatibility happens if and only if G2 has conflicts, and those conflicts are precisely the cases that are not backwards compatible.
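To make the backwards-compatibility point concrete, here is a minimal sketch in TypeScript of PEG-style ordered choice. Everything in it is hypothetical and made up for illustration (the combinator names, the toy grammars G1 and G2, the rule labels); it is not how any particular PEG library works. G1 only knows lowercase words, G2 puts a keyword alternative in front of them, and the input "if" quietly changes derivation with no warning from the parser.

```typescript
// Illustrative only: a parse result is the label of the rule that matched
// plus the next input position, or null on failure.
type Result = { rule: string; next: number } | null;
type Parser = (input: string, pos: number) => Result;

const isLower = (ch: string): boolean => ch >= "a" && ch <= "z";

// Match a fixed keyword that is not followed by another lowercase letter.
const keyword = (rule: string, text: string): Parser => (input, pos) => {
  const next = pos + text.length;
  const matches = input.startsWith(text, pos) && !isLower(input.charAt(next));
  return matches ? { rule, next } : null;
};

// Match one or more lowercase letters.
const lowercaseWord = (rule: string): Parser => (input, pos) => {
  let end = pos;
  while (end < input.length && isLower(input.charAt(end))) end++;
  return end > pos ? { rule, next: end } : null;
};

// PEG ordered choice (A / B): try alternatives in order, commit to the first success.
const orderedChoice = (...alternatives: Parser[]): Parser => (input, pos) => {
  for (const alternative of alternatives) {
    const result = alternative(input, pos);
    if (result !== null) return result;
  }
  return null;
};

// Label of the rule that derived the whole input, or null if there is no full match.
function derivation(parser: Parser, input: string): string | null {
  const result = parser(input, 0);
  return result !== null && result.next === input.length ? result.rule : null;
}

// G1: item <- word
const g1: Parser = lowercaseWord("word");
// G2: item <- keyword / word   (accepts everything G1 accepts)
const g2: Parser = orderedChoice(keyword("keyword", "if"), lowercaseWord("word"));

console.log(derivation(g1, "if")); // word
console.log(derivation(g2, "if")); // keyword
```

Nothing here reports that "if" now parses differently; the / operator just picks the first alternative. An LR generator given the CFG version (keyword | word, with both able to match "if") would be expected to flag that overlap as a conflict when the grammar is built.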


Excellent answer, thanks


A PEG is not actually a generative grammar but a domain-specific language for specifying top-down parsers. So they are free of conflicts and ambiguities by definition of their semantics.

PEG is actually just syntactic sugar on top of (G)TDPL: https://en.wikipedia.org/wiki/Top-down_parsing_language


you might want to look at Rascal[0]

[0]: https://www.rascal-mpl.org/


Nice, this looks interesting. I was building my lexer in tree-sitter but not sure what to use after that. I'll check this out, thank you.


What host language are you using?


To be perfectly honest, I'm not sure what that means. I'm new to language design. I can describe what I'm trying to do.

I'm creating a UI description language for designers. The goal is to let them use terminology and mental models familiar to them to describe their designs in code, instead of a visual tool like Figma.

A trivial example would look something like this:

  component Button {
    elements
      shape btn-container
        text btn-label

    style btn-container
      fill-color: blue

    style btn-label
      content: Click Me
      text-color: white
  }

The output of the language would be an intermediate representation. I'm imagining a JSON object or something with a very specific schema. This can then be transpiled into any format the developer wants: you could build a transpilation target for React, Vue, Svelte, plain old HTML/CSS, and so on.

So I'm in a weird spot where I know what I want to make, but I don't know any of the conventions or tools common in the language design world, because I'm just stepping into it.


Cool idea!

Have you ever designed a library? If not, then my recommendation would be to start with one, for example in TypeScript if you are familiar with it. That is, build a set of functions that let you create the data structure (AST) above. That is the most important part; encoding it into JSON or something like that is very easy afterwards.

If you feel comfortable with your result, then you can think about turning it into a DSL. The reasons to do so are mostly syntactic (you don't want designers to have to learn TypeScript, even a subset of it). You will have to do a lot of the things that the host language otherwise does for you, though, including syntax highlighting, type checking and so on.
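As a rough sketch of what that library-first step could look like in TypeScript for the Button example above: all type and function names here (ElementNode, StyleRule, shape, text, style, component) are invented for illustration, and the JSON shape is just one guess at a schema, not a proposal for the real IR.

```typescript
// Hypothetical AST types; none of these names come from an existing library.
type ElementNode = {
  kind: "shape" | "text";
  name: string;
  children: ElementNode[];
};

type StyleRule = { target: string; properties: Record<string, string> };

type ComponentNode = {
  kind: "component";
  name: string;
  elements: ElementNode[];
  styles: StyleRule[];
};

// Builder functions: the "library" a DSL front end would later call into.
const shape = (name: string, ...children: ElementNode[]): ElementNode =>
  ({ kind: "shape", name, children });

const text = (name: string): ElementNode =>
  ({ kind: "text", name, children: [] });

const style = (target: string, properties: Record<string, string>): StyleRule =>
  ({ target, properties });

function component(
  name: string,
  elements: ElementNode[],
  styles: StyleRule[],
): ComponentNode {
  // Collect every declared element name, including nested ones.
  const declared = new Set<string>();
  const collect = (nodes: ElementNode[]): void => {
    for (const node of nodes) {
      declared.add(node.name);
      collect(node.children);
    }
  };
  collect(elements);

  // Reject styles that point at an element the component does not declare.
  for (const rule of styles) {
    if (!declared.has(rule.target)) {
      throw new Error(`style targets unknown element "${rule.target}"`);
    }
  }
  return { kind: "component", name, elements, styles };
}

// The Button example from the parent comment, built with plain function calls.
const button = component(
  "Button",
  [shape("btn-container", text("btn-label"))],
  [
    style("btn-container", { "fill-color": "blue" }),
    style("btn-label", { content: "Click Me", "text-color": "white" }),
  ],
);

// Encoding the AST as JSON is a one-liner once the data structure exists.
const ir: string = JSON.stringify(button, null, 2);
console.log(ir);
```

The check inside component is the kind of thing a DSL would eventually have to report as an error message of its own; having it in the library first means a parser only needs to call these builders.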


The host language is the language that you are using to write the compiler/interpreter. I suppose the grandparent is asking in order to recommend a parser generator for your use case. From your GitHub profile, I assume it's JS or TypeScript?


Ah I see, thanks. Probably TypeScript, although I'm considering Rust as well. I'm using tree-sitter for the lexer.



