You don't want an AST or even a full parser for formatting most languages.

Tree-sitter deals with errors better than most parser generators, but if you just lex and split the input into chunks, you can format broken code much more flexibly.
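A minimal sketch of the lex-and-chunk idea, written in Python because the thread itself uses no particular language. Everything here is an assumption for illustration: the token pattern targets a C-like syntax, and `lex`, `join`, `format_chunks`, the spacing rules and the "new line after `;`, `{` and `}`" rule are hypothetical, not taken from any real formatter. The point is that nothing is parsed, so an unclosed brace or an unterminated string never stops the formatter; it just keeps emitting lines.

```python
import re

# Assumed token classes for a C-like language: identifiers/numbers,
# double-quoted strings, the punctuation we care about, and any other
# single non-space character (so lexing can never fail).
TOKEN_RE = re.compile(r'\s*(?:([A-Za-z_]\w*|\d+)|("(?:[^"\\]|\\.)*")|([(){}\[\],;])|(\S))')
KEYWORDS = {"if", "for", "while", "switch", "return"}


def lex(src: str) -> list[str]:
    """Split source into tokens without ever raising on malformed input."""
    tokens, pos = [], 0
    while (m := TOKEN_RE.match(src, pos)) is not None:
        tokens.append(m.group(m.lastindex))  # exactly one alternative matched
        pos = m.end()
    return tokens


def join(tokens: list[str]) -> str:
    """Join one chunk's tokens using simple, assumed spacing rules."""
    out, prev = "", None
    for tok in tokens:
        space = prev is not None and prev not in ("(", "[") and tok not in (")", "]", ",", ";")
        if tok == "(" and prev not in KEYWORDS:
            space = False  # call syntax: f(x), but keep "if (x)"
        out += (" " if space else "") + tok
        prev = tok
    return out


def format_chunks(src: str, indent: str = "    ") -> str:
    """Break the token stream into lines at ';', '{' and '}' and indent by
    brace depth. Depth is clamped at 0, so stray or missing braces are fine."""
    lines: list[str] = []
    chunk: list[str] = []
    depth = 0

    def flush() -> None:
        if chunk:
            lines.append(indent * depth + join(chunk))
            chunk.clear()

    for tok in lex(src):
        if tok == "{":
            chunk.append(tok)
            flush()
            depth += 1
        elif tok == "}":
            flush()
            depth = max(depth - 1, 0)
            chunk.append(tok)
            flush()
        elif tok == ";":
            chunk.append(tok)
            flush()
        else:
            chunk.append(tok)
    flush()
    return "\n".join(lines)


# Unclosed braces: a parser would reject this, but chunking still works.
print(format_chunks("void f() { if (x) { g(a, b); return;"))
# void f() {
#     if (x) {
#         g(a, b);
#         return;
```

The trade-off Topiary's author raises below shows up quickly here: rules keyed only on tokens cannot tell, say, a block brace from an initializer brace without more context.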



Hi!

I agree with you that there are many languages where skipping parsing altogether could still yield a good formatter, and I would love to see a Topiary-like project attempt it.

I'm not confident that this holds for most languages, however; I worry that it could lead to a lot of ambiguity in languages with more complex formatting conventions.

Regardless, the eventual goal of Topiary is to be able to format the widest possible spectrum of languages, and so limiting ourselves to just lexing didn't seem like the right choice at the time.

As you mention, this does mean we give up the ability to format broken code. In fact, we currently even require that Tree-sitter (TS) can parse the entire input before we format anything. This is a shame, but it is ultimately what we decided was the best approach for Topiary to achieve its goal.


How do you produce something like this with just lexing?

    aaa(
      bbb,
      ccc(
        ddd,
        eee,
      ),
      fff,
    )


Configure the line-breaking heuristic properly.
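One way to read that answer, sketched in Python: a purely lexical pass can still match brackets in the token stream (no grammar involved), and a width-based heuristic then decides the layout. A group is printed on one line if it fits; otherwise each comma-separated element goes on its own indented line with a trailing comma. The names (`build`, `render`, `render_broken`, `format_code`), the 2-space `STEP`, the trailing-comma rule and the `width` parameter are all assumptions chosen to reproduce the example above; this is not the algorithm of any particular formatter.

```python
import re
from dataclasses import dataclass, field

OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}"}
STEP = 2  # assumed indent width, chosen to match the example


@dataclass
class Group:
    opener: str
    closer: str
    # Comma-separated elements; each element is a list of atoms and Groups.
    items: list = field(default_factory=lambda: [[]])


def lex(src: str) -> list[str]:
    # Words, single bracket/comma characters, or any other non-space char.
    return re.findall(r"\w+|[(){}\[\],]|\S", src)


def build(tokens: list[str]) -> list:
    """Nest tokens by bracket matching alone. Stray closers stay atoms;
    groups still open at end of input are closed implicitly."""
    root = Group("", "")
    stack = [root]
    for tok in tokens:
        top = stack[-1]
        if tok in OPEN_TO_CLOSE:
            g = Group(tok, OPEN_TO_CLOSE[tok])
            top.items[-1].append(g)
            stack.append(g)
        elif len(stack) > 1 and tok == top.closer:
            stack.pop()
        elif len(stack) > 1 and tok == ",":
            top.items.append([])
        else:
            top.items[-1].append(tok)
    return root.items[0]


def flat(pieces: list) -> str:
    """Render pieces on a single line."""
    out = ""
    for p in pieces:
        if isinstance(p, Group):
            elems = [e for e in p.items if e]  # drop empty elements (trailing comma)
            out += p.opener + ", ".join(flat(e) for e in elems) + p.closer
        else:
            out += (" " if out else "") + p
    return out


def render(pieces: list, indent: int, width: int, suffix: int = 0) -> str:
    """Keep the pieces flat if they fit in `width` (leaving room for `suffix`
    trailing characters); otherwise break every group among them."""
    text = flat(pieces)
    if indent + len(text) + suffix <= width:
        return text
    out = ""
    for p in pieces:
        if isinstance(p, Group):
            out += render_broken(p, indent, width)
        else:
            out += (" " if out else "") + p
    return out


def render_broken(g: Group, indent: int, width: int) -> str:
    """One element per line, indented one step, each with a trailing comma."""
    elems = [e for e in g.items if e]
    if not elems:
        return g.opener + g.closer
    inner = indent + STEP
    lines = [g.opener]
    for e in elems:
        lines.append(" " * inner + render(e, inner, width, suffix=1) + ",")
    lines.append(" " * indent + g.closer)
    return "\n".join(lines)


def format_code(src: str, width: int = 80) -> str:
    return render(build(lex(src)), 0, width)


print(format_code("aaa(bbb, ccc(ddd, eee), fff)", width=12))
```

With `width=12` this prints exactly the nested layout from the question. With `width=20` the inner `ccc(ddd, eee)` fits and stays on one line while the outer call still breaks, and with the default `width=80` everything stays flat. Because groups are built from bracket counting, an input like `aaa(bbb, ccc(ddd` still formats (as `aaa(bbb, ccc(ddd))`) instead of failing.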
