Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> Lisp ... macros typically operate on AST

In Lisp they don't. See Emacs Lisp, Common Lisp, ISLISP. A Lisp macro gets only passed some data and returns some data. There is nothing like an AST.

If we define a macro foo-macro and we call it:

    (foo-macro ...)
then ... can be any data.

Example:

    (defmacro rev (&rest items)
      (reverse items))
Above macro REV gets data passed and reverses it. The &rest list of ITEMS gets reversed. ITEMS is the list of source arguments in the macro call.

We can then write:

    (rev 1 2 3 4 +)

    (rev (rev 10 n -) (+ a 20 b) (rev 30 a *) list)
Example:

    CL-USER 7 > (let ((n 4) (a 3) (b 2))
                  (rev (rev 10 n -)
                       (+ a 20 b)
                       (rev 30 a *)
                       list))
    (90 25 -6)
There is no syntax tree. All the macro does, is to reverse the list of its source arguments.

If I let the macro describe the thing it gets passed, then we get this:

    CL-USER 8 > (defmacro rev (&rest items)
                  (DESCRIBE items)
                  (reverse items))
    REV

    CL-USER 9 > (rev 1 2 3 4 +)

    (1 2 3 4 +) is a LIST
    0      1
    1      2
    2      3
    3      4
    4      +
    10
As a side effect, we can see that the macro gets a simple list passed. A list of numbers and a symbol ("symbol" is also a data type). Not text. Not an AST.

Also data which is possibly not read, but computed by other code. We can compute the arguments of the macro REV and pass the thing to EVAL. Works the same as above.

    CL-USER 10 > (eval (append '(rev) (loop for i from 1 to 4 collect i) '(+)))

    (1 2 3 4 +) is a LIST
    0      1
    1      2
    2      3
    3      4
    4      +
    10

The macro only gets that data and can do anything with it. There is no idea of Abstract Syntax Tree: it does not need to be a tree, it does not need to be valid Lisp code and it carries no syntactical information. Generally, it also does not need to return valid Lisp code. To be computed all we need is that the evaluator eventually sees valid Lisp code, not the individual macro.

In Lisp the "reader" by default only parses a data layer: symbolic expressions. EVAL, macros and other Lisp functionality gets mostly data passed. EVAL has to figure out, what (a b c) actually is as a program. It can traverse it in an interpreter or compile it. A compiler may internally create AST representations -> but that free to the implementation.

The Lisp language then typically is not defined over text syntax, but data syntax.

A Lisp interpreter processes during execution not text, but s-expressions. It's a "List Processor", not a Text Processor. Lisp not Texp. ;-) The function COMPILE gets an s-expression passed, not text.

Racket and Scheme have other macro systems.



From my limited understanding, it seems like lispers means AST in an ad-hoc way. There's no statically predefined solid structure describing what an if-node or a lambda-node is. We all agree to squint and look at (<idea> <parameters-potentially-recursive>) as an ast in spirit.

Scheme did implement actual syntax objects and other lisps may have a concrete ast (pun slightly intended) layer but it's not required to enjoy the benefits of sexp as code and data.

my two cents


An "AST in spirit" is in our mind, but not the machine.

    (let ((s (s s)))
     (flet ((s (s) (declare (type s s)) s))
      (tagbody (s) s (go s))))
The function READ has no idea what S is: variable, operator name, data symbol, type, go tag, function name, macro name, ...? For each above we don't know what s is. We would need to parse it according to some syntax (and possibly know the current state of the runtime.

An AST would be the product of a parser and the parse would parse the code above according to some provided syntax. The AST then would encode a tree built from some syntax, where the nodes would be classified. In Lisp symbols and lists have multiple uses and the s-expression does not encode which uses it is: is it data, is it some kind of syntactical element. There is also no defined parser and no syntax it encodes on that level. READ is at best an s-expression reader, where it has zero knowledge what the lists and symbols supposed to mean in Lisp. For the reader (+ a b), (a + b), (a b +) are just s-expressions, not Lisp code.


I mostly agree, but it seems (I can only speak as a spectator/reader) that this lack of information was not a deal breaker for the community or even producing large systems. The type resolution will happen at use-site, and if `s` is not existent where it should be (a binding in the environment, a tag in the right namespace) or not what it should be, your code fails and people will adjust.

Where there serious defects caused due to this dynamic nature (honest question) ? it seems to me that people adjusted to this without big troubles. Not that I'm against any improvement.


The Lisp compiler will need to figure it out and complain about various problems: undefined function/variable/tag/..., argument list mismatch, type mismatch, etc.

The compiler will expand the macros at compile time and the generated source code then is checked.

One thing this macro system enables are macros which can implement relatively arbitrary syntactical extensions, not restricted by a particular syntax. That can be seen as useful&flexibility or as potential problem (-> needs knowledge and discipline while implementing macros, with the goal of ensuring the maintainability of the code).


More complex structures are built from that input by macros and compiler machinery. Nothing says compiler has to stick to it.

Nanopass shows an example of using "rewriting" while keeping to pretty much same structure for AST till you end up with native code.


How is a list (of nested lists) not a tree?


It could theoretically be a graph.

For example I can make a circular list being a part of the source and the macro may process it.

Get the first two items from a list and add them:

    CL-USER 21 > (defmacro add-2 (a)
                   (list '+ (first a) (second a)))
ADD-2

Now we construct a circular list and use the macro:

    CL-USER 22 > (add-2 #1=(1 2 3 . #1#))
    3
The circular list is really a part of the source code:

    CL-USER 23 > '(add-2 #1=(1 2 3 . #1#))
    (ADD-2 #1=(1 2 3 . #1#))
Another example: One could pass in a string and have the macro parse the string...


It usually is a tree, just not a syntax tree. For example, the "if" special form in CL has the syntax:

  (IF <text-expr> <then-expr> [<else-expr>])
But appearing in a macro expansion, it would just be a list where the first item was the symbol "IF" from the "COMMON-LISP" package. In addition, you could put e.g.

  (IF foo bar baz biff quux)
Inside the body of a macro you would get a list that does not match the syntax of the "if" special form, despite superficially looking like such a list. This doesn't match anything that one would call an "AST" in other languages, which would enforce syntactical correctness of special forms.


You are correct and the precision is appreciated. However your emphasis is somewhat misleading.

In the lisp world, when data represents code, it's probably stored in a tree. Elsewhere, when data represents code, it's probably stored as an array of bytes. There code may not be recognised as being data at all.

It's the difference between eval taking a byte string and taking a parse tree. Rather literally.

This is only a convention. Lisp does it right, but as almost everything else thinks code is strings, confusion abounds.


There is a "data AST" in Lisp. The data AST is source code to Lisp evaluation/compilation. Compilation may produce another AST, but typically this is something internal, and implementation-specific.

The data AST is rich enough to for ergonomic, precise source-to-source manipulation. Even a pretty advanced Lisp compiler can be built which goes straight from the data AST to an intermediate representation, skipping the code AST stage.

"Data AST" means that when we have (+ 1 2), this doesn't say "I'm an arithmetic expression", but rather something weaker: "I'm a list of three elements: a symbol object, and two integer objects".

The list object is an abstract syntax tree for the printed list. It must be. It not a parse tree because a parse tree would preserve the representation of the parentheses: in a parse tree, every grammar symbol appears: the nodes of the tree are 1:1 to the grammar rules that they match. Since the parentheses, and whatnot, are gone, it must be abstract syntax.

Most of Lisp syntax is designed such that its syntactic units correspond to nodes of the data AST. That makes source-to-source transformations ergonomic, because the data AST data structure is easy to manipulate.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: