> note that gofmt doesn't indent, which is the hard part Huh? I'm pretty sure it...

sa46 · on Feb 20, 2024

I suspect the author meant that gofmt doesn’t break lines, which dramatically simplifies the complexity. See https://journal.stuffwithstuff.com/2015/09/08/the-hardest-pr... for a fun, related read.

munificent · on Feb 20, 2024

Yeah, line splitting is the hard part.

I wrote that article and I'm actually in the middle of rewriting dartfmt right now to have an entirely new internal representation.

chubot · on Feb 20, 2024

Oh I'm curious why you're rewriting it? Maybe speed, quality, handling new language features, or all of the above? I may dive into this problem

What do you think of the functional "pretty printing languages" like Wadler's (cited in the blog post)?

It is kinda interesting that the author of prettier credits the algorithm, but other people say that it's more about labor and testing of specific style rules.

Having never written a pretty printer, it does seem like there is a lot of labor in encoding the "rules people like", and flags for different styles. And then there's the actual algorithm to find the line breaks.

Or maybe they are not really separate things. I think the more academic work is trying to separate the 2 things, but they can be fairly intertwined? (e.g. for performance reasons)

munificent · on Feb 20, 2024

> Oh I'm curious why you're rewriting it?

The primary driver is that we're moving to a fairly different formatting style: https://github.com/dart-lang/dart_style/issues/1253

The formatter works sort of like a compiler in that it parses the code, translates it to an internal representation, does optimization on that IR, and then outputs final code. The main difference is that the "final code" is also source code, and the "optimization" is line splitting.

The old IR grew organically over time and got increasingly difficult to work with. It baked certain formatting choices directly into the IR (mainly indentation) which line splitting then had no control over. For example, given a function call like:

    someLongFunctionName(some + long + argument + expression, [firstElement, anotherElement, aThirdElement]);

We might want to format it like this if the function name and first argument fits on one line:

    someLongFunctionName(some + long + argument + expression, [
      firstElement,
      anotherElement,
      aThirdElement
    ]);

But if the first argument doesn't fit, then we probably want:

    someLongFunctionName(
        some + long + argument + expression,
        [
          firstElement,
          anotherElement,
          aThirdElement
        ]);

Note how the indentation of the list elements depends on how we choose to line split the argument list. The old formatter's IR just couldn't model that at all.

For years, I've wanted a better IR that could express formatting like this. And since we were making sweeping changes to the formatting style (including some that would be very hard to implement with the old IR), it seemed like the right time to move to a new internal representation too.

> What do you think of the functional "pretty printing languages" like Wadler's (cited in the blog post)?

I have to confess that I worked on dartfmt for a few years before I stumbled onto that paper. I'm somewhat familiar with it, but I've never taken the time to really dig into it.

I could be wrong, but I strongly suspect that the formatting rules we want for Dart are too complex to model using Wadler's formalism directly, and I'm not sure if extending it to support the formatting rules we want would sacrifice its simplicity or performance.

Given that, I sort of stuck with the devil I already knew—dartfmt's current architecture—and built off of that.

> it does seem like there is a lot of labor in encoding the "rules people like", and flags for different styles. And then there's the actual algorithm to find the line breaks.

Yes, and the two are deeply intertwined. The majority of dartfmt's code by line count is just implementing the style rules for every part of the language grammar. The most difficult code to write and maintain in dartfmt is the line splitting algorithm, largely because it's combinatorial when done naïvely.

chubot · on Feb 21, 2024

OK very interesting! I skimmed over the issue and some of the examples.

It does seem like there are at least two schools of thought on the pretty printers -- those influenced by the Wadler pretty printing language (functional style), and those more influenced by go fmt (which does no line wrapping) and clang-format (which does).

There's a very recent paper linked here which has a good survey of the functional style. Table 1 is a nice summary of the different algorithms and IRs, a bunch of them appearing only in the last 10 years.

https://lobste.rs/s/aevptj/why_is_prettier_rock_solid

A Pretty Expressive Printer (with Appendices) - https://arxiv.org/pdf/2310.01530.pdf

I'm probably biased toward the "non-PPL" style of clang-format because I think it's the best formatter I've used. It's fast and does what I want. I would probably attribute that to all the elbow grease and testing put in over the years, and being written in C++, not necessarily the wrapping algorithm.

But now that I see the examples, my suspicion is that some of the newer IRs and algorithms do address some of the Dart problems (though I didn't try anything!). Dart seems to have long expressions very much in common with functional languages, as opposed to Go, which apparently doesn't need wrapping at all!

I found this post linked from the paper, and it has 3 examples of formatting s-expressions that seem pretty similar to your examples.

And they discuss what IR is needed to express that. They talk about greedy algorithms vs. "non-local" dependencies.

https://jyp.github.io/posts/towards-the-prettiest-printer.ht...

It seems like the IR of chunks, rules, and spans has a lot of similarity to what they call "groups" and so forth. Several years ago, long before I looked at any of this, I actually implemented a data structure printer with the same 2 rules as on page 2 of this paper - https://lindig.github.io/papers/strictly-pretty-2000.pdf

i.e. try to fit all the parts on a single line, and if that fails, put each part on its own line. So the basic intuition is natural and obvious.

---

But I always want to see how these algorithms are tested with real languages, and how they perform. I'm still not sure how easy it is to encode all the language specific rules in these "cost functions", but I may give it a stab (I have more than one formatter to write!).

The paper says it's tested on Racket S-expressions, and that there is a large amount of non-trivial convention around that syntax. But I think that's still a far cry from C++ or Python, in both the amount of syntax and how many users / how much code, etc.

I do wonder about the problem of computing layouts from scratch vs. fixing existing layouts. And the problem of formatter upgrades causing churn. My impression is that go fmt tends to fix things, erring on the side of leaving code alone, and people seem to like that? I guess the fear is with the "global optimization" algorithms, the formatter will make a bunch of choices and it's hard to figure out why.

coffeebeqn · on Feb 20, 2024

I’m not sure what that would even mean. It definitely intends but maybe there are some edge cases where it has no opinion - like a long function signature with too many parameters it leaves alone unless you manually start new lining things

reactordev · on Feb 20, 2024

It finally settled the tabs vs spaces debate by forcing tabs.

norir · on Feb 20, 2024

The only sane option.