> Lisp ... macros typically operate on AST In Lisp they don't. See Emacs Lisp, C...

agumonkey · on May 2, 2024

From my limited understanding, it seems like lispers means AST in an ad-hoc way. There's no statically predefined solid structure describing what an if-node or a lambda-node is. We all agree to squint and look at (<idea> <parameters-potentially-recursive>) as an ast in spirit.

Scheme did implement actual syntax objects and other lisps may have a concrete ast (pun slightly intended) layer but it's not required to enjoy the benefits of sexp as code and data.

my two cents

_19qg · on May 2, 2024

An "AST in spirit" is in our mind, but not the machine.

    (let ((s (s s)))
     (flet ((s (s) (declare (type s s)) s))
      (tagbody (s) s (go s))))

The function READ has no idea what S is: variable, operator name, data symbol, type, go tag, function name, macro name, ...? For each above we don't know what s is. We would need to parse it according to some syntax (and possibly know the current state of the runtime.

An AST would be the product of a parser and the parse would parse the code above according to some provided syntax. The AST then would encode a tree built from some syntax, where the nodes would be classified. In Lisp symbols and lists have multiple uses and the s-expression does not encode which uses it is: is it data, is it some kind of syntactical element. There is also no defined parser and no syntax it encodes on that level. READ is at best an s-expression reader, where it has zero knowledge what the lists and symbols supposed to mean in Lisp. For the reader (+ a b), (a + b), (a b +) are just s-expressions, not Lisp code.

agumonkey · on May 2, 2024

I mostly agree, but it seems (I can only speak as a spectator/reader) that this lack of information was not a deal breaker for the community or even producing large systems. The type resolution will happen at use-site, and if `s` is not existent where it should be (a binding in the environment, a tag in the right namespace) or not what it should be, your code fails and people will adjust.

Where there serious defects caused due to this dynamic nature (honest question) ? it seems to me that people adjusted to this without big troubles. Not that I'm against any improvement.

_19qg · on May 2, 2024

The Lisp compiler will need to figure it out and complain about various problems: undefined function/variable/tag/..., argument list mismatch, type mismatch, etc.

The compiler will expand the macros at compile time and the generated source code then is checked.

One thing this macro system enables are macros which can implement relatively arbitrary syntactical extensions, not restricted by a particular syntax. That can be seen as useful&flexibility or as potential problem (-> needs knowledge and discipline while implementing macros, with the goal of ensuring the maintainability of the code).

p_l · on May 2, 2024

More complex structures are built from that input by macros and compiler machinery. Nothing says compiler has to stick to it.

Nanopass shows an example of using "rewriting" while keeping to pretty much same structure for AST till you end up with native code.

ndr · on May 2, 2024

How is a list (of nested lists) not a tree?

_19qg · on May 2, 2024

It could theoretically be a graph.

For example I can make a circular list being a part of the source and the macro may process it.

Get the first two items from a list and add them:

    CL-USER 21 > (defmacro add-2 (a)
                   (list '+ (first a) (second a)))

ADD-2

Now we construct a circular list and use the macro:

    CL-USER 22 > (add-2 #1=(1 2 3 . #1#))
    3

The circular list is really a part of the source code:

    CL-USER 23 > '(add-2 #1=(1 2 3 . #1#))
    (ADD-2 #1=(1 2 3 . #1#))

Another example: One could pass in a string and have the macro parse the string...

aidenn0 · on May 2, 2024

It usually is a tree, just not a syntax tree. For example, the "if" special form in CL has the syntax:

  (IF <text-expr> <then-expr> [<else-expr>])

But appearing in a macro expansion, it would just be a list where the first item was the symbol "IF" from the "COMMON-LISP" package. In addition, you could put e.g.

  (IF foo bar baz biff quux)

Inside the body of a macro you would get a list that does not match the syntax of the "if" special form, despite superficially looking like such a list. This doesn't match anything that one would call an "AST" in other languages, which would enforce syntactical correctness of special forms.

JonChesterfield · on May 6, 2024

You are correct and the precision is appreciated. However your emphasis is somewhat misleading.

In the lisp world, when data represents code, it's probably stored in a tree. Elsewhere, when data represents code, it's probably stored as an array of bytes. There code may not be recognised as being data at all.

It's the difference between eval taking a byte string and taking a parse tree. Rather literally.

This is only a convention. Lisp does it right, but as almost everything else thinks code is strings, confusion abounds.

kazinator · on May 6, 2024

There is a "data AST" in Lisp. The data AST is source code to Lisp evaluation/compilation. Compilation may produce another AST, but typically this is something internal, and implementation-specific.

The data AST is rich enough to for ergonomic, precise source-to-source manipulation. Even a pretty advanced Lisp compiler can be built which goes straight from the data AST to an intermediate representation, skipping the code AST stage.

"Data AST" means that when we have (+ 1 2), this doesn't say "I'm an arithmetic expression", but rather something weaker: "I'm a list of three elements: a symbol object, and two integer objects".

The list object is an abstract syntax tree for the printed list. It must be. It not a parse tree because a parse tree would preserve the representation of the parentheses: in a parse tree, every grammar symbol appears: the nodes of the tree are 1:1 to the grammar rules that they match. Since the parentheses, and whatnot, are gone, it must be abstract syntax.

Most of Lisp syntax is designed such that its syntactic units correspond to nodes of the data AST. That makes source-to-source transformations ergonomic, because the data AST data structure is easy to manipulate.