How JavaScript compilers work (creativejs.com)
95 points by wx196 on June 8, 2013 | hide | past | favorite | 26 comments



The terminology in the article is somewhat idiomatic: "After the Directed Flow Graph (DFG) or syntax tree has been generated the compiler can use this knowledge to perform further optimisations prior to the generation of machine code. Mozilla’s IonMonkey and Google’s Crankshaft are examples of these DFG compiler."

"DFG" usually refers to some sort of dependence graph. What IonMonkey and Crankshaft have is a traditional CFG (control flow graph) in SSA form. That means the code is represented as a set of straight line code paths (basic blocks) connected edges in a graph representing possible control-flow transitions. The instructions are in SSA form, which means that each variable is assigned only once and pseudo-instructions called "phi instructions" are used to merge results when a basic block is accessible from multiple predecessors blocks, both of which might assign the same variable. A "syntax tree" is different yet. It's a higher level representation that preserves the syntactic structure of the code. For example, it represents loops and "if" statements as nested elements in the tree. Translation to a CFG throws away most of that information (e.g. reducing loops to simply control flow edges in the CFG).

For a (relatively) approachable set of materials on the subject, see: http://www.cs.rice.edu/~keith/512/2011/Lectures. I also highly recommend Cooper & Torczon's "Engineering a Compiler" (2d Ed.) In a world of CS researchers that can't write an English sentence to save their lives, Keith Cooper and Linda Torczon's work is a paragon of clear prose. (Their lab at Rice is where Cliff Click got his PhD, and if you've read his work on the JVM, you know that he too has a talent for explaining complicated concepts in plain English.)


"The terminology in the article is somewhat idiomatic"

Do you mean idiosyncratic? Not trying to nitpick. I was genuinely confused until I concluded that it must be a typo.


Sorry, yes, idiosyncratic. "DFG" is the term used in the webkit js engine.


I am in awe of your ability to stay informed on multiple fields - economics to compilers to lisp.


None of which are even his day job!


He is truly HN's shiniest jewel. I especially like that he doesn't give a damn about karma points, sticks to his guns, and argues in an intellectually honest manner no matter what.


'rayiner, 'gruseom, and 'carbocation are basically my entry points into HN every day. (Rayiner is a lawyer by day; 'carbocation a med student/researcher).



Looks like the server is under fire, and Google didn't cache any of those links, either. Tears of desire.


Source for a JavaScript compiler implemented in the D programming language:

https://github.com/DigitalMars/DMDScript


Nice. Interested readers may also want to check out Higgs: https://github.com/maximecb/Higgs


So I have question I'm sure someone here is smart enough to answer- simply put, does ASI affect compilation speed?

In my head I would imagine that a compiler would expect to find semicolons as statement terminators but would have to reprocess a block if validation failed and try to figure out where the semicolons should be, and therefore ASI would slow down compilation (vs a file with all semicolons in place).


What implementations actually do in code is almost always very different from what the specification wording suggests. For example, in V8 the handling of a line terminator used as a semicolon is practically the same as if you had used a semicolon, as you can see here:

https://code.google.com/p/v8/source/browse/trunk/src/prepars...

Furthermore, parsing source text initially into a representation that can be worked with is like 0.000001% of the total work to be done. And it is cached too.

Another example is in PHP where you can open and close <?php ?> tags as many times as you like and it will perform exactly the same compared to equal code with single open and close tag. Especially if you use PHP opcode caching where the source text is not even used for subsequent requests.


Obviously it takes some extra cycles to perform ASI. Having said that, I doubt any additional passes over the source are done to perform ASI; I suspect that if it is of any significance, they have folded it into another pass. And the code likely runs even if not all the semicolons are already in the right place, so it'd be a sunk cost in that regard.

That said, I'm sure it's such a small fraction of the compilation time as to be pretty insignificant.

This is all based on my professional experience writing parsers, however, and not on any specific knowledge of the workings of JavaScript.


Why would you expect that to be true in JavaScript any more than other semicolon-optional languages like Ruby and Python?

You just write your parser to accept both line endings (that meet certain conditions) and semicolons as statement terminators. Going through and literally inserting semicolons shouldn't be necessary.
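The idea can be sketched very loosely as a statement splitter that accepts either terminator — a real parser works on a token stream with grammar-dependent rules, not raw characters, so this is purely illustrative:

    // Toy sketch: treat ';' and '\n' both as statement terminators,
    // so there is no separate "insert semicolons" pass at all.
    function splitStatements(src) {
      const out = [];
      let cur = "";
      for (const ch of src) {
        if (ch === ";" || ch === "\n") {
          if (cur.trim()) out.push(cur.trim());
          cur = "";
        } else {
          cur += ch;
        }
      }
      if (cur.trim()) out.push(cur.trim());
      return out;
    }

Both terminators cost the same single character test, which is why "accept both" is no slower than "require semicolons".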


Python is somewhat different, in that a newline is simply a statement separator. Semicolons aren't inserted, as far as I know, they're simply an alternate statement separator.

Python does make some exceptions, in that newlines within balanced parentheses and brackets are not treated as statement separators. But outside of those cases, a newline always ends the statement.

As an example of the difference, consider the following JS:

    if
    (0)
    statement()
This is legal JS. Now consider the roughly equivalent Python:

    if
    0:
    statement()
This is not legal Python, because the newline after the "if" terminated the statement, and a bare "if" is not a legal statement.

Another example. This is legal JS:

    obj
    .method()
As it is parsed as a single statement in JS. However, it is not legal Python, as the newline after "obj" definitively ends the statement, and ".method()" by itself is not a legal statement.

In short, Python can always say "this newline is the end of a statement", except for some very easy-to-detect cases. JavaScript, on the other hand, parses until it encounters an error, then goes back to insert the semicolon and tries again. This implies some additional overhead. I don't imagine it's a significant amount at all, but there is at least a good reason to think that this might apply to JS but not a language like Python.
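Both behaviors are easy to observe directly. The snippet below (illustrative only) contrasts the `obj` / `.method()` case above, where no semicolon is inserted, with `return`, a so-called "restricted production" where a newline forces ASI even though continuing the parse would have been grammatically fine:

    // ASI demo: a newline alone does not end a statement (a), but after
    // `return` (a "restricted production") a newline forces a semicolon (b).
    const obj = { method() { return 42; } };

    function a() {
      return obj
        .method();      // one statement: `return obj.method();`
    }

    function b() {
      return            // ASI inserts `;` here: `return;`
      obj.method();     // dead code
    }

    console.log(a());   // 42
    console.log(b());   // undefined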


My understanding (though I'm not an expert) is that the way the JavaScript spec is written, you first try to parse a line as is, and if you encounter a parsing error in trying to do that and the line doesn't have a semicolon, then you insert a semicolon and try again. The end result is that it's a bit different than semicolons being optional; they are instead inserted automatically to try to recover from parse errors.


FWIW, this is also the way O'Reilly's JS Pocket Reference explains it.


Thanks, that's what I had read.

I also found this set of tests http://jsperf.com/asi-versus-msi

My results on Opera Next (v15) http://imgur.com/hkAopfX show MSI as the fastest, FWIW.


I don't know how this is done in JavaScript, but in Go, semicolons are emitted by the lexer, so they really are literally inserted.
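The Go-style rule can be sketched in JS terms — very roughly, and with invented names; a real lexer works token by token rather than line by line, and Go's actual rule is defined over specific token kinds:

    // Toy line-based sketch of lexer-level semicolon insertion: append a
    // ';' token whenever a line's last token could legally end a statement
    // (here approximated as: identifier, number, ')', ']' or '}').
    function lexWithSemicolons(src) {
      const endsStatement = (tok) => /[A-Za-z0-9_)\]}]$/.test(tok);
      const tokens = [];
      for (const line of src.split("\n")) {
        const lineTokens = line.match(/[A-Za-z0-9_]+|[^\sA-Za-z0-9_]/g) || [];
        tokens.push(...lineTokens);
        const last = lineTokens[lineTokens.length - 1];
        if (last && endsStatement(last)) tokens.push(";");
      }
      return tokens;
    }

With this scheme the parser downstream never sees "optional" semicolons at all — they are literally in the token stream.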


No.


You mean "no, no one here is smart enough"? If it's "no" answer to semicolon insertion affecting compilation speed your post could use some explaining.


> does ASI affect compilation speed?

No.


A very nice text with nice code examples:

http://mrale.ph/blog/2012/06/03/explaining-js-vms-in-js-inli...


If anyone wants to delve more into the workings of a JS VM, I've found the code of Higgs (https://github.com/maximecb/Higgs) to be very straightforward and readable. The interpreter and JIT are written in D and much of the run-time is written in JS. It's a successor to the Tachyon project mentioned in that post.


How is the JavaScript JIT different from the Java JIT?



