Hacker News

I meant that the lexer just creates an abstraction over (very often) text, but later you operate on those abstractions

like enum Syntax.String, Syntax.IfKeyword

don't ya?
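For concreteness, this is roughly the "abstraction" a lexer produces: a flat list of token kinds plus their text. Below is a minimal Python sketch. The `Syntax` enum is a hypothetical stand-in modeled on the names used above. It covers only a handful of token kinds and has no error recovery.

```python
import re
from enum import Enum, auto

class Syntax(Enum):
    IfKeyword = auto()
    Identifier = auto()
    Number = auto()
    String = auto()
    OpenParenthesis = auto()
    ClosedParenthesis = auto()
    OpenBrace = auto()
    ClosedBrace = auto()
    Operator = auto()

# Order matters: the keyword pattern is tried before the identifier pattern.
TOKEN_SPEC = [
    (Syntax.IfKeyword, r"if\b"),
    (Syntax.Identifier, r"[A-Za-z_]\w*"),
    (Syntax.Number, r"\d+"),
    (Syntax.String, r'"[^"]*"'),
    (Syntax.OpenParenthesis, r"\("),
    (Syntax.ClosedParenthesis, r"\)"),
    (Syntax.OpenBrace, r"\{"),
    (Syntax.ClosedBrace, r"\}"),
    (Syntax.Operator, r"[-+*/<>=!]=?|;"),
]
MASTER = re.compile("|".join(f"(?P<{kind.name}>{pattern})" for kind, pattern in TOKEN_SPEC))

def lex(source):
    """Return a list of (Syntax, text) pairs; whitespace is skipped."""
    tokens, pos = [], 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        # lastgroup is the name of the alternative that matched.
        tokens.append((Syntax[match.lastgroup], match.group()))
        pos = match.end()
    return tokens
```

Everything after this step works on `(kind, text)` pairs rather than on characters. That is the point being made above, and it is also where the harder questions in the rest of the thread begin.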



Now go ahead and describe all the next steps you need to take to turn "like enum Syntax.String, Syntax.IfKeyword" into running code.


I've been doing it kinda naively (it may be problematic in some cases), even with some very handy methods

but it's been like this. For

  if (expression) {body}

I do:

  if next elements are Syntax.IfKeyword, Syntax.OpenParenthesis then
  {
      var expressionElements = TakeUntil(Syntax.ClosedParenthesis);
      var expressionResult = ExpressionBuilder(expressionElements);
      if (!expressionResult.Success) error;
      ...
  }
  else if ...
  else if ...
  else error;

This is how I'm building the AST, and then I have another module that turns the AST into LLVM IR.
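One case where a plain `TakeUntil(Syntax.ClosedParenthesis)` goes wrong is nested parentheses. In `if ((a) > c) {...}` it stops at the first `)`. Below is a minimal Python sketch of a depth-counting alternative. It assumes tokens are `(kind, text)` pairs whose string kinds are named after the `Syntax` members above. `Parser`, `take_balanced` and `If` are hypothetical names, and the condition is left as raw tokens rather than passed to a real expression builder.

```python
from dataclasses import dataclass

@dataclass
class If:
    condition: list  # raw condition tokens; a real parser would build an expression tree
    body: list       # raw body tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self, *kinds):
        """True if the next tokens have exactly these kinds."""
        window = self.tokens[self.pos:self.pos + len(kinds)]
        return [kind for kind, _ in window] == list(kinds)

    def expect(self, kind):
        if not self.peek(kind):
            raise SyntaxError(f"expected {kind} at token {self.pos}")
        self.pos += 1

    def take_balanced(self, open_kind, close_kind):
        """Consume an already-opened group up to its *matching* close token.
        Unlike a plain TakeUntil(close), nested groups do not end it early."""
        depth, start = 1, self.pos
        while self.pos < len(self.tokens):
            kind = self.tokens[self.pos][0]
            if kind == open_kind:
                depth += 1
            elif kind == close_kind:
                depth -= 1
                if depth == 0:
                    inner = self.tokens[start:self.pos]
                    self.pos += 1  # skip the matching close token
                    return inner
            self.pos += 1
        raise SyntaxError(f"unclosed {open_kind}")

    def parse_if(self):
        self.expect("IfKeyword")
        self.expect("OpenParenthesis")
        condition = self.take_balanced("OpenParenthesis", "ClosedParenthesis")
        self.expect("OpenBrace")
        body = self.take_balanced("OpenBrace", "ClosedBrace")
        return If(condition, body)
```

The same balanced-group helper also covers bodies that contain nested blocks.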


So, how do you solve things like:

- type resolution

- scopes

- variable capture

- closures

- constants

- module boundaries

- symbol lookup

- ...

?

All of the above and about an order of magnitude more is required for anything beyond a templating language.

And none of them are resolved by a parser.
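Deciding what a closure captures is one example of work that happens after parsing. Below is a minimal sketch that computes a lambda's free variables. It assumes a hypothetical three-node expression AST (`Var`, `Lambda`, `Call`). The result still mixes true captures with globals such as a function name. Separating those, and choosing capture by value or by reference, needs yet another resolution step.

```python
from dataclasses import dataclass

# A tiny hypothetical expression AST, just enough to show capture analysis.
@dataclass
class Var:
    name: str

@dataclass
class Lambda:
    params: list
    body: object

@dataclass
class Call:
    fn: object
    args: list

def free_vars(node, bound=frozenset()):
    """Names used in `node` that are not bound inside it.
    For a Lambda, these are the names the closure refers to from outside."""
    if isinstance(node, Var):
        return set() if node.name in bound else {node.name}
    if isinstance(node, Lambda):
        # Parameters are bound inside the lambda body.
        return free_vars(node.body, bound | set(node.params))
    if isinstance(node, Call):
        result = free_vars(node.fn, bound)
        for arg in node.args:
            result |= free_vars(arg, bound)
        return result
    raise TypeError(f"unknown node {node!r}")
```

For example, the free variables of `x => add(x, offset)` come out as `add` and `offset`. Only the symbol tables can tell which of the two is a captured local.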


In my "first stage" AST I have nodes such as "variable declaration", which has a declared type, a name, and an expression.

Then I have a pass that transforms the "variable declaration" node into a "bound/proven/type-checked (whatever you call it) variable declaration".

The difference is that the pass checks whether the result type of the expression equals the declared type, and then overrides/swaps that particular node.
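The pass described above can be sketched in a few lines of Python. This is a sketch under simplifying assumptions, not the commenter's code. Types are plain strings like `"int"` and `"bool"`, only literals are typed, and the node classes `VariableDeclaration`, `BoundVariableDeclaration`, `IntLiteral` and `BoolLiteral` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical first-stage node produced by the parser.
@dataclass
class VariableDeclaration:
    declared_type: str
    name: str
    expression: object

@dataclass
class IntLiteral:
    value: int

@dataclass
class BoolLiteral:
    value: bool

# Hypothetical "bound" node produced by the checking pass.
@dataclass
class BoundVariableDeclaration:
    type: str
    name: str
    expression: object

def type_of(expr):
    if isinstance(expr, IntLiteral):
        return "int"
    if isinstance(expr, BoolLiteral):
        return "bool"
    raise TypeError(f"cannot type {expr!r}")

def bind_declarations(statements):
    """Return a new statement list in which each VariableDeclaration is
    swapped for a BoundVariableDeclaration, or raise if the types disagree."""
    bound = []
    for stmt in statements:
        if isinstance(stmt, VariableDeclaration):
            actual = type_of(stmt.expression)
            if actual != stmt.declared_type:
                raise TypeError(f"{stmt.name}: declared {stmt.declared_type}, got {actual}")
            stmt = BoundVariableDeclaration(stmt.declared_type, stmt.name, stmt.expression)
        bound.append(stmt)
    return bound
```

Building a new list instead of mutating nodes in place keeps the first-stage AST intact. This can help with error reporting. It is a design choice, not a requirement.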

Before I start messing with e.g. classes, I'll have to add some kind of type discovery before being able to prove types, but it'll be just walk thru all nodes and save as much info as I can about e.g. new classes.
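Once classes can refer to each other, including to ones declared later in the file, type discovery usually needs separate passes. One pass collects names and a later one resolves references. Below is a minimal two-pass sketch, assuming a hypothetical `ClassDecl` node whose field types are plain type names and a fixed set of builtins.

```python
from dataclasses import dataclass

# Hypothetical class declaration: field types are names that may refer to
# classes declared later in the same file.
@dataclass
class ClassDecl:
    name: str
    fields: dict  # field name -> type name

BUILTINS = {"int", "bool"}

def discover(decls):
    """Pass 1: record every class name, without looking at fields yet."""
    table = {}
    for decl in decls:
        if decl.name in table or decl.name in BUILTINS:
            raise NameError(f"duplicate type {decl.name}")
        table[decl.name] = decl
    return table

def resolve_fields(decls, table):
    """Pass 2: every field type must be a builtin or a discovered class."""
    for decl in decls:
        for field, type_name in decl.fields.items():
            if type_name not in BUILTINS and type_name not in table:
                raise NameError(f"{decl.name}.{field}: unknown type {type_name}")
```

A single walk that checked fields as it went would reject `List.head: Node` when `Node` is declared after `List`. Splitting the work into two passes avoids that.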


So you're not even close to having a language. And look how it took you almost a paragraph just to describe a simple type check.

> but it'll be just walk thru all nodes and save as much info

Yup. Literally none of that info comes from the parsing stage. It won't be "just" an AST walk.

   SomeType n = f(x, y, z);

This simple line requires you:

- to see if variables x, y, z are in scope

- that they conform to the types required by the function

- to do that you need to look up the function. Depending on your language it can be any of: available globally, imported from a module, aliased from a module, defined in the file, a variable holding a reference to the function

- side note: x, y, z can themselves be any of the above

- then you have to check the return type of the function

- then you have to resolve SomeType, which can be defined in the same file, in a different module, or globally. Depending on your language, it can be a simple type, a subtype, a union type...

- so now, depending on your language you have to decide whether the type of the result of the function (which can be simple, imported global, subtype, union...) can be matched against the type of the variable

And the same has to be done for every single assignment. And literally none of this has anything to do with parsing.

Edit: all of the above depends, of course, on how many features you want the language to have and how robust you want it to be. But all these things are super important, and significantly more important than the parsing stage, at least at the beginning. And they definitely matter more than nearly all materials on compiling, which focus heavily on parsing, would suggest.
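Below is a minimal Python sketch of only the checks listed for `SomeType n = f(x, y, z);`. It makes several simplifying assumptions. Type compatibility is exact equality. Scopes are a list of dicts ordered outermost to innermost. A hypothetical `types` table maps type names, including aliases, to canonical names. Subtypes, unions, overloads and module imports are all left out, and each would add code of its own.

```python
from dataclasses import dataclass

@dataclass
class FunctionType:
    params: list   # canonical parameter type names
    returns: str   # canonical return type name

def lookup(scopes, name):
    """Find a name's type, searching the innermost scope first."""
    for scope in reversed(scopes):
        if name in scope:
            return scope[name]
    raise NameError(f"{name} is not in scope")

def check_call_assignment(scopes, types, declared_type, fn_name, arg_names):
    """Check `declared_type n = fn_name(*arg_names);` using exact type equality."""
    fn_type = lookup(scopes, fn_name)             # look up the function itself
    if not isinstance(fn_type, FunctionType):
        raise TypeError(f"{fn_name} is not a function")
    if len(arg_names) != len(fn_type.params):
        raise TypeError(f"{fn_name} expects {len(fn_type.params)} arguments")
    for arg, expected in zip(arg_names, fn_type.params):
        actual = lookup(scopes, arg)              # each argument must be in scope
        if actual != expected:                    # ...and conform to the parameter
            raise TypeError(f"argument {arg}: expected {expected}, got {actual}")
    if declared_type not in types:                # resolve SomeType
        raise NameError(f"unknown type {declared_type}")
    declared = types[declared_type]
    if declared != fn_type.returns:               # return type vs. variable type
        raise TypeError(f"cannot assign {fn_type.returns} to {declared_type}")
    return declared
```

Even in this stripped-down form, the check has to consult the function table, the scopes for each argument and the type table. The parser supplies none of that.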


>- that they conform to the types required by the function

>- then you have to check the return type of the function

Those two things seem trivial, low-hanging fruit, at least for "simple types", don't they?

>- to see if variables x, y, z are in scope

Every node that can have a "body" has its own scope, and that scope also has a parent scope, which it receives from its parent node, and so on up the chain.

E.g. a function has a body, which has a scope; if there's an if statement in that body, the "if scope" has the body's scope as its parent, and thus sees all the variables declared in the function body.
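The parent-linked scopes described above can be sketched as a small Python class. `Scope`, `declare` and `lookup` are hypothetical names. The sketch allows a child scope to shadow a parent's variable and rejects redeclaration within the same scope. Both are language-design choices, not requirements.

```python
class Scope:
    """A lexical scope that falls back to its parent on lookup."""

    def __init__(self, parent=None):
        self.parent = parent
        self.symbols = {}  # name -> type name

    def declare(self, name, type_name):
        if name in self.symbols:
            raise NameError(f"{name} is already declared in this scope")
        self.symbols[name] = type_name

    def lookup(self, name):
        # Walk outward through parent scopes until the name is found.
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise NameError(f"{name} is not in scope")
```

Usage mirrors the example: a function body scope is created with the global scope as parent, and an `if` scope is created with the body scope as parent.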

>- to do that you need to look up the function. Depending on your language it can be any of: available globally, imported from a module, aliased from a module, defined in the file, a variable holding a reference to the function

Nothing there seems strange; I assume the whole source code is available at compilation time.
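Even with the whole program available, name lookup still needs an explicit resolution order. Below is a minimal Python sketch of one possible order: locals, then the current module's top level, then imported aliases, then builtins. The `MODULES` and `BUILTINS` tables and their contents are hypothetical. Real languages differ on whether imports may shadow top-level definitions and whether conflicts are errors.

```python
# Hypothetical whole-program view: every module's top-level symbols are known.
MODULES = {
    "math": {"sqrt": "fn(float) -> float"},
    "main": {"helper": "fn() -> int"},
}
BUILTINS = {"print": "fn(str) -> void"}

def resolve(name, local_symbols, module_name, imports):
    """Resolve `name` as seen from inside `module_name`.

    `imports` maps a local alias to (module, symbol), e.g. {"root": ("math", "sqrt")}.
    Order: locals, then the module's own top level, then imports, then builtins."""
    if name in local_symbols:
        return local_symbols[name]
    if name in MODULES[module_name]:
        return MODULES[module_name][name]
    if name in imports:
        module, symbol = imports[name]
        return MODULES[module][symbol]
    if name in BUILTINS:
        return BUILTINS[name]
    raise NameError(f"{name} is not defined in {module_name}")
```

In this sketch, `sqrt` is not visible from `main` unless it is imported, even though its definition is available at compile time.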

>- so now, depending on your language you have to decide whether the type of the result of the function (which can be simple, imported global, subtype, union...) can be matched against the type of the variable

Things will definitely get more complicated once I stop doing stuff with just ints and bools and try to create a complex type system, but for now stuff seems to be easy
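Once the type system grows past ints and bools, the "result type equals declared type" check turns into an assignability relation. Below is a minimal Python sketch. It assumes hypothetical `Named` and `Union` type representations, with nominal single-inheritance subtyping via a lookup table. Generics, variance, nullability and implicit conversions are left out.

```python
from dataclasses import dataclass

# Hypothetical type representations.
@dataclass(frozen=True)
class Named:
    name: str

@dataclass(frozen=True)
class Union:
    members: frozenset  # of Named

# Hypothetical nominal subtyping table: class -> direct superclass.
SUPERCLASS = {"Cat": "Animal", "Dog": "Animal"}

def is_subclass(sub, sup):
    # Walk up the superclass chain until we hit `sup` or run out.
    while sub is not None:
        if sub == sup:
            return True
        sub = SUPERCLASS.get(sub)
    return False

def is_assignable(source, target):
    """Can a value of type `source` be stored in a variable of type `target`?"""
    if isinstance(source, Union):
        # Every member of the source union must fit the target.
        return all(is_assignable(member, target) for member in source.members)
    if isinstance(target, Union):
        # The source must fit at least one member of the target union.
        return any(is_assignable(source, member) for member in target.members)
    return is_subclass(source.name, target.name)
```

The source union is checked before the target union. With that order, a union-to-union assignment succeeds only if every source member fits some target member.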


> but for now stuff seems to be easy

I'm not saying it's hard. You've forgotten what the conversation started with.



