
No, = or == is purely a lexer decision, easily resolved by greedy _character_ matching.

In JavaScript the following two statements give wildly differently shaped ASTs. The / character initiates RegEx parsing when it appears in a _value position_, and RegEx literals are lexed by different rules than the division operator, which can _only_ appear where a binary operator is allowed. Consider the following:

    a = b + /c.y/+d

    a = b /c.y/+d

(Given b=1, c={y:2} and d=3.)

The first assigns to a the sum of b, the RegExp /c.y/ and d, giving us the string "1/c.y/3" (on addition JS converts most types to strings, and strings are dominant, turning the addition into concatenation). This reading is correct since the + operator before the / character forces the parser to look for a value.

The second reads as: assign to a the value of b divided by property y of c, then divided by +d (i.e. the numeric computation (1/2)/3 = 0.166666...). This is because the parser is looking for a binary operator after the b identifier, so when the / character appears it becomes an operator.
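The two readings can be checked directly in any JS engine. This is just a quick sketch using the values given above; the names first and second are mine, standing in for a.

```javascript
const b = 1, c = { y: 2 }, d = 3;

// Value position after "+": the slash starts a RegExp literal /c.y/.
// The number, the RegExp and d are all concatenated as strings.
const first = b + /c.y/+d;    // "1/c.y/3"

// Operator position after "b": both slashes are divisions, "+d" is unary plus.
const second = b /c.y/+d;     // (1 / 2) / 3, roughly 0.1667

console.log(typeof first, first);
console.log(typeof second, second);
```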

So, without knowing the parsing context (operator or value position), the lexer decision is ambiguous.

This is somewhat like how A<B<C>> is ambiguous when parsing C++/Java/C# templates/generics versus the <, > and >> operators. That case is usually easier, though, since the parser can include a hack in the generic-parsing code that mutates the token stream if it encounters >> when closing a generic. (The JS ambiguity is worse, since RegEx lexing rules are totally different from those of regular JS.)
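For the curious, the >> hack can be sketched in a few lines. This is a toy, not how any particular compiler does it: lex and parseType are hypothetical names, and the grammar is just Name, optionally followed by <Type, Type, ...>.

```javascript
// Greedy lexer: the longest operator wins, so ">>" is preferred over ">".
function lex(src) {
  return src.match(/>>|[A-Za-z_]\w*|[<>,]/g) || [];
}

// Type := Name ( "<" Type ( "," Type )* ">" )?
function parseType(tokens) {
  const name = tokens.shift();
  if (name === undefined || !/^[A-Za-z_]\w*$/.test(name)) {
    throw new SyntaxError("expected a type name, got " + name);
  }
  const args = [];
  if (tokens[0] === "<") {
    tokens.shift();
    args.push(parseType(tokens));
    while (tokens[0] === ",") {
      tokens.shift();
      args.push(parseType(tokens));
    }
    // The hack: when closing a generic, split ">>" into two ">" tokens
    // by mutating the token stream in place.
    if (tokens[0] === ">>") tokens.splice(0, 1, ">", ">");
    if (tokens.shift() !== ">") throw new SyntaxError("expected '>'");
  }
  return { name, args };
}

const tokens = lex("A<B<C>>");           // lexed with a single ">>" token
const tree = parseType(tokens.slice());  // parses anyway thanks to the split
console.log(tokens, JSON.stringify(tree));
```

The lexer stays context-free here; only the generic-closing code knows that >> may really be two tokens.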




True; but the division vs regex example that I replied to was also 'easily decided by greedy character matching and easily resolved'.

I don't know how JS implementations actually do it, but this is what I meant by the line being a bit arbitrary. In my limited experience, you can just lex a 'forward slash token' or whatever and keep your lexer/parser separation even with this ambiguity.


Consider this text: /a+ "/ How would you tokenize this if you don’t know the grammar context? If it is a regex, the space is significant: it is part of the pattern. If it is a division, the quote will start a string.
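Both readings of that character sequence are easy to produce in real code. A small sketch, where the variable names and the values 8 and 2 are made up for illustration:

```javascript
// Regex reading: /a+ "/ is one RegExp literal; the space and the quote
// are ordinary pattern characters.
const asRegex = /a+ "/;
const regexMatches = asRegex.test('aaa "');

// Division reading: the same characters follow a value, so the slash
// divides and the quote opens a string (closed here by one more quote).
const x = 8, a = 2;
const asDivision = x /a+ "/";   // (8 / 2) + "/"

console.log(asRegex.source, regexMatches, asDivision);
```

A lexer that emits a bare slash token has already thrown away the information it needs: it would have skipped the space and started a string at the quote.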


With horrible hacks in the reference - search for InputElementDiv and InputElementRegExp if you want to read more about it.

It's not so bad in practice, since you only need to look at the previous token to decide whether a / should be parsed as a division or regular expression, once you've excluded the possibility for // and /* comments.
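Roughly, the previous-token check looks something like this. It is a simplified sketch: the token shape ({ type, value }) and the function name are hypothetical, and real lexers need more cases than this.

```javascript
// Decide how to lex a "/" from the token just before it.
// Assumes "//" and "/*" comments were already ruled out.
function slashStartsRegex(prev) {
  if (prev === null) return true;               // start of input: value position
  switch (prev.type) {
    case "identifier":
    case "number":
    case "string":
    case "regex":
      return false;                             // a value just ended: division
    case "punctuator":
      // Closing brackets are guessed to end a value; any other
      // punctuator (+, =, (, comma, ...) expects a value next.
      return ![")", "]", "}"].includes(prev.value);
    default:
      return true;                              // keywords such as return, typeof
  }
}

console.log(slashStartsRegex({ type: "punctuator", value: "+" })); // true
console.log(slashStartsRegex({ type: "identifier", value: "b" })); // false
```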

Comments are also why the empty regular expression in JavaScript must be written as /(?:)/
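You can see this from the engine itself, which prints an empty pattern using the non-capturing group:

```javascript
// An empty pattern built at runtime is displayed as /(?:)/,
// because a literal // would start a line comment.
const empty = new RegExp("");
const shown = String(empty);

// The written-out literal behaves the same: it matches the empty
// string at the start of any input.
const literal = /(?:)/;
const matchesAnything = literal.test("anything");

console.log(shown, empty.source, matchesAnything);
```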


How about this:

    ( x ) / b / a
It could be part of:

    var x = (x) / b / a;
Or

    if (x) /b/a.test(z)
x could be an arbitrarily complex expression. Looking at one preceding token is not enough to disambiguate the slash.

I suspect the only real solution is to intertwine lexing and parsing, so the parser asks for the next token with a flag indicating context. But this constrains what parsing algorithm can be used.
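A minimal sketch of what that could look like, assuming a hypothetical nextToken function that the parser calls with an allowRegex flag. The regex-body pattern is deliberately simplified (no character classes, no flag validation), so treat it as an illustration only.

```javascript
// The parser, not the lexer, knows whether a value or an operator is
// expected next, and passes that knowledge in as `allowRegex`.
function nextToken(src, pos, allowRegex) {
  while (src[pos] === " ") pos++;               // skip spaces between tokens
  if (pos >= src.length) return null;
  const rest = src.slice(pos);
  let m;
  if (allowRegex && (m = /^\/(?:[^\/\\\n]|\\.)+\/[a-z]*/.exec(rest))) {
    return { type: "regex", text: m[0], end: pos + m[0].length };
  }
  if ((m = /^[A-Za-z_$][\w$]*/.exec(rest))) {
    return { type: "identifier", text: m[0], end: pos + m[0].length };
  }
  return { type: "punctuator", text: rest[0], end: pos + 1 };
}

const src = "/b/a.test(z)";
// After "if (x)" a statement begins, so the parser allows a regex:
const inValuePosition = nextToken(src, 0, true);
// After "var x = (x)" an operator is expected, so it does not:
const inOperatorPosition = nextToken(src, 0, false);

console.log(inValuePosition, inOperatorPosition);
```

The cost is the one mentioned above: the lexer can no longer run ahead of the parser, so any algorithm that wants the whole token stream up front is out.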



