The terminology in the article is somewhat idiosyncratic: "After the Directed Flow Graph (DFG) or syntax tree has been generated the compiler can use this knowledge to perform further optimisations prior to the generation of machine code. Mozilla’s IonMonkey and Google’s Crankshaft are examples of these DFG compiler."
"DFG" usually refers to some sort of dependence graph. What IonMonkey and Crankshaft have is a traditional CFG (control flow graph) in SSA form. That means the code is represented as a set of straight line code paths (basic blocks) connected edges in a graph representing possible control-flow transitions. The instructions are in SSA form, which means that each variable is assigned only once and pseudo-instructions called "phi instructions" are used to merge results when a basic block is accessible from multiple predecessors blocks, both of which might assign the same variable. A "syntax tree" is different yet. It's a higher level representation that preserves the syntactic structure of the code. For example, it represents loops and "if" statements as nested elements in the tree. Translation to a CFG throws away most of that information (e.g. reducing loops to simply control flow edges in the CFG).
For a (relatively) approachable set of materials on the subject, see: http://www.cs.rice.edu/~keith/512/2011/Lectures. I also highly recommend Cooper & Torczon's "Engineering a Compiler" (2nd ed.). In a world of CS researchers who can't write an English sentence to save their lives, Keith Cooper and Linda Torczon's work is a paragon of clear prose. (Their lab at Rice is where Cliff Click got his PhD, and if you've read his work on the JVM, you know that he too has a talent for explaining complicated concepts in plain English.)
He is truly HN's shiniest jewel. I especially like that he doesn't give a damn about karma points, sticks to his guns, and argues in an intellectually honest manner no matter what.
'rayiner, 'gruseom, and 'carbocation are basically my entry points into HN every day. (Rayiner is a lawyer by day; 'carbocation a med student/researcher).
So I have a question I'm sure someone here is smart enough to answer: simply put, does ASI affect compilation speed?
In my head I would imagine that a compiler would expect to find semicolons as statement terminators, but would have to reprocess a block if validation failed and try to figure out where the semicolons should be, and therefore ASI would slow down compilation (vs. a file with all semicolons in place).
What implementations actually do in code is almost always very different from what the specification wording suggests. For example, in V8 the handling of a line terminator used as a semicolon is practically the same as if you had used a semicolon, as you can see here:
Furthermore, the initial parsing of source text into a representation that can be worked with is like 0.000001% of the total work to be done. And it is cached, too.
Another example is in PHP, where you can open and close <?php ?> tags as many times as you like and it will perform exactly the same as equivalent code with a single open and close tag. Especially if you use PHP opcode caching, where the source text is not even used for subsequent requests.
Obviously it takes some extra cycles to perform ASI. Now, having said that, I doubt any additional passes over the source are done to perform ASI. I suspect that if it is of any significance, they have folded it into another pass. And the code likely runs even if not all the semicolons are already in the right place, so it'd be a sunk cost in that regard.
That said, I'm sure it's such a small amount of the compilation time as to be pretty insignificant.
This is all based on my professional experience writing parsers, however, and not on any specific knowledge of the workings of JavaScript.
Why would you expect that to be true in JavaScript any more than in other semicolon-optional languages like Ruby and Python?
You just write your parser to accept both line endings (that meet certain conditions) and semicolons as statement terminators. Going through and literally inserting semicolons shouldn't be necessary.
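To make that concrete, a toy sketch (hypothetical code, not how any real engine is written, and it ignores the conditions under which a newline may actually count as a terminator): a statement loop that accepts either token as a terminator, with no separate insertion pass:

  // Toy example: either ';' or a newline token ends a statement,
  // so nothing is ever literally "inserted".
  function splitStatements(tokens) {
    var stmts = [];
    var current = [];
    tokens.forEach(function (tok) {
      if (tok === ';' || tok === '\n') {
        if (current.length > 0) {  // ignore empty statements
          stmts.push(current);
          current = [];
        }
      } else {
        current.push(tok);
      }
    });
    if (current.length > 0) stmts.push(current);
    return stmts;
  }

  // Prints [['a','=','1'], ['b','=','2']]: the newline and the
  // semicolon terminate their statements the same way.
  console.log(splitStatements(['a', '=', '1', '\n', 'b', '=', '2', ';']));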
Python is somewhat different, in that a newline is simply a statement separator. Semicolons aren't inserted, as far as I know; they're simply an alternate statement separator.
Python does make some exceptions, in that newlines within balanced parentheses and brackets are not treated as statement separators. But outside of that, this is how it works.
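For instance (plain Python, runnable as-is):

  # A newline inside balanced parentheses does not end the statement:
  total = (1 +
           2 +
           3)

  # A semicolon is just an alternate statement separator:
  a = 1; b = 2

  print(total, a, b)  # 6 1 2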
As an example of the difference, consider the following JS:
  if
  (0)
  statement()
This is legal JS. Now consider the roughly equivalent Python:
  if
  0:
  statement()
This is not legal Python, because the newline after the "if" terminates the statement, and a bare "if" is not a legal statement.
Another example. This is legal JS:
  obj
  .method()
It is parsed as a single statement in JS. However, it is not legal Python, as the newline after "obj" definitively ends the statement, and ".method()" by itself is not a legal statement.
In short, Python can always say "this newline is the end of a statement", except for some very easy-to-detect cases. JavaScript, on the other hand, parses until it encounters an error, then goes back to insert the semicolon and tries again. This implies some additional overhead. I don't imagine it's a significant amount at all, but there is at least a good reason to think that this might apply to JS but not a language like Python.
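To illustrate both behaviours (these are standard JS semantics, not anything engine-specific):

  var x = 1
  var y = 2  // the parse fails at the second 'var', so a semicolon
             // is inserted after '1' and the parse is retried

  function f() {
    return   // 'return' is a restricted production: the newline alone
    { a: 1 } // forces a semicolon, so f() returns undefined and the
  }          // braces are parsed as an unreachable block statement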
My understanding (though I'm not an expert) is that the way the JavaScript spec is written, you first try to parse a line as-is, and if you encounter a parsing error and the line doesn't have a semicolon, then you insert a semicolon and try again. The end result is that it's a bit different from semicolons simply being optional; they are instead inserted automatically to recover from parse errors.
You mean "no, no one here is smart enough"? If it's a "no" answer to the question of whether semicolon insertion affects compilation speed, your post could use some explaining.
If anyone wants to delve more into the workings of a JS VM, I've found the code of Higgs (https://github.com/maximecb/Higgs) to be very straightforward and readable. The interpreter and JIT are written in D and much of the run-time is written in JS. It's a successor to the Tachyon project mentioned in that post.
"DFG" usually refers to some sort of dependence graph. What IonMonkey and Crankshaft have is a traditional CFG (control flow graph) in SSA form. That means the code is represented as a set of straight line code paths (basic blocks) connected edges in a graph representing possible control-flow transitions. The instructions are in SSA form, which means that each variable is assigned only once and pseudo-instructions called "phi instructions" are used to merge results when a basic block is accessible from multiple predecessors blocks, both of which might assign the same variable. A "syntax tree" is different yet. It's a higher level representation that preserves the syntactic structure of the code. For example, it represents loops and "if" statements as nested elements in the tree. Translation to a CFG throws away most of that information (e.g. reducing loops to simply control flow edges in the CFG).
For a (relatively) approachable set of materials on the subject, see: http://www.cs.rice.edu/~keith/512/2011/Lectures. I also highly recommend Cooper & Torczon's "Engineering a Compiler" (2d Ed.) In a world of CS researchers that can't write an English sentence to save their lives, Keith Cooper and Linda Torczon's work is a paragon of clear prose. (Their lab at Rice is where Cliff Click got his PhD, and if you've read his work on the JVM, you know that he too has a talent for explaining complicated concepts in plain English.)