
> It's genius and I wish all other languages gave you the ability to optionally choose between braces or whitespace. It could settle the whole Python vs C syntax war once and for all!

I really like Haskell as a language, but the fact it isn't based on s-expressions is the biggest issue I have with it. Whitespace, braces, etc. are very minor quibbles: they're basically debates about language as a UI. The problem is, that same language is also the API we must adhere to when manipulating code programmatically, e.g. writing interpreters/compilers/documentation generators/code formatters/linters/renderers to HTML or LaTeX or whatever/static analysers/verifiers/model checkers/IDEs/refactoring tools/etc.

I think it's telling that a language so heavily focused on language implementation (DSLs, etc.) effectively has a single usable implementation (GHC), and a mountain of dead/niche implementations ( https://wiki.haskell.org/Implementations ).

Imagine we're given the path to a file containing Haskell code, and we want to transform it in some way (e.g. replacing calls to one function with another function). The most basic thing we need to do is parse it, but my work on code analysis tools over the past few years has taught me that we can't even manage that reliably.

We might naively reach for the GHC API, but that requires that we set a whole bunch of configuration options that we may not know (language extensions, package databases, commandline flags, etc. collectively referred to as "dynflags"); if we get those wrong, our program crashes saying `the impossible happened!`. We can't, in general, figure out what these should be; the only real solution is to invoke our program via Cabal, and read in the needed values by emulating the commandline flags of GHC. This solution is useless if we have a string of code, without any associated Cabal project.

We might instead opt for a standalone library, like haskell-src-exts. The problem is, those libraries typically can't parse Haskell code found "in the wild". Language extensions are one problem, but another major blocker is widespread use of the C preprocessor (an unhygienic, unsafe macro system based on string substitution, which would be largely unnecessary if Haskell used an easily manipulated format like s-expressions instead).

Note that even GHC can't re-use its own ASTs: the Template Haskell extension provides its own AST representation, entirely separate to that used by the compiler's frontend!

Whilst there are layers on top of Haskell like "liskell", which accept s-expressions and produce Haskell code, they're solving the opposite problem: it's easy to convert from s-expressions, since they're so trivial to manipulate. The hard problem is being able to do anything useful with the mountain of existing code which isn't in a nice s-expression format, other than compile it with (some versions of) GHC, if invoked with the right options from (some version of) Cabal; or maybe Stack; maybe after running Hpack; or who knows what else!

I wrote up some of this at http://chriswarbo.net/blog/2017-01-31-syntax_trees.html



> Note that even GHC can't re-use its own ASTs: the Template Haskell extension provides its own AST representation, entirely separate to that used by the compiler's frontend!

Isn't that about the expression problem, though? How would changing the syntax help?


s-expressions are already (a trivially deserialised encoding of) an AST representation. If the Haskell language used s-expressions, or if the "frontend" language was designed to desugar into s-expressions via a standalone pre-processor, then this representation would be available to any code that wanted to use it. Different programs, or different parts of the same program, might decide to convert it to their own specialised datatypes (like those of GHC's frontend and Template Haskell), but those would essentially be details specific to that program (either as an implementation detail or an API). It would have no real influence on how others choose to process the language.
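
To give a rough idea of what "trivially deserialised" means here, this is a minimal sketch of an s-expression reader plus the function-renaming transformation mentioned upthread. It only needs `base`. The names (`SExpr`, `parseSExpr`, `rename`, `render`) are made up for illustration, it doesn't handle string literals or comments, and the rename is purely textual (it ignores scoping and shadowing), so treat it as a toy rather than a real tool:

```haskell
import Data.Char (isSpace)

-- | An s-expression is either an atom or a list of s-expressions.
-- (Hypothetical type for illustration; not taken from any library.)
data SExpr = Atom String | List [SExpr]
  deriving (Eq, Show)

-- | Parse one s-expression from the front of the input, returning it with
-- the unconsumed remainder, or Nothing if the input is malformed.
parseSExpr :: String -> Maybe (SExpr, String)
parseSExpr input = case dropWhile isSpace input of
  []         -> Nothing                 -- ran out of input
  ('(' : cs) -> parseList [] cs         -- start of a nested list
  (')' : _)  -> Nothing                 -- unbalanced close paren
  cs         -> let (tok, rest) = break isDelim cs
                in Just (Atom tok, rest)
  where
    isDelim c = isSpace c || c `elem` "()"

-- | Parse list elements until the matching close paren is found.
parseList :: [SExpr] -> String -> Maybe (SExpr, String)
parseList acc input = case dropWhile isSpace input of
  []         -> Nothing                 -- missing close paren
  (')' : cs) -> Just (List (reverse acc), cs)
  cs         -> do
    (x, rest) <- parseSExpr cs
    parseList (x : acc) rest

-- | Replace every atom equal to `old` with `new`.
-- Assumption: purely syntactic; no awareness of binders or shadowing.
rename :: String -> String -> SExpr -> SExpr
rename old new (Atom a)
  | a == old  = Atom new
  | otherwise = Atom a
rename old new (List xs) = List (map (rename old new) xs)

-- | Serialise back to text.
render :: SExpr -> String
render (Atom a)  = a
render (List xs) = "(" ++ unwords (map render xs) ++ ")"
```

For example, `fmap (render . rename "oldFn" "newFn" . fst) (parseSExpr "(f (oldFn x))")` evaluates to `Just "(f (newFn x))"`. The point isn't that this is production quality, just that the whole read/transform/write loop fits in a screenful without configuring a compiler first.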

If we think of the current GHC implementation in this way, we find that the only de facto representation of Haskell code is `ByteString` (or `Lazy.ByteString` or `String` or `Text` or `Lazy.Text`, of course ;) ). Since these are generally unparseable (e.g. via haskell-src-exts and friends), it's hard to do much with them (hence the tendency to delve into GHC's own representations).

Note that for my own work I ended up abandoning Haskell syntax altogether in favour of GHC Core. I use Cabal and GHC to do the parsing (since that seems to be the only way to handle real-world code), and a GHC plugin to dump out Core as s-expressions: http://chriswarbo.net/git/ast-plugin



