Understanding V8’s Bytecode

tyingq · on Sept 17, 2017

Looks like Wikipedia's V8 page needs an update.

https://en.m.wikipedia.org/wiki/Chrome_V8

"V8 compiles JavaScript directly to native machine code before executing it, instead of more traditional techniques such as interpreting bytecode"

amelius · on Sept 17, 2017

V8's central documentation could use some work too. Why do we need to read about internals in a Medium post first?

hyperpape · on Sept 17, 2017

Which languages have thorough first-party documentation of how the runtime works in an easily accessible form that's available to the public? I don't think Python or Ruby do. Lua maybe, but that's "cheating" because the runtime is small enough to comfortably read through.

It seems to me like Hotspot has more resources and better documentation of its architecture than most, but when it comes to things like inlining policies, the specific compiler optimizations that happen, safepoints, etc. it's mostly a matter of tracking down random blog-posts, most of which come from third parties.

Edit: Python is better than I remember.

fernly · on Sept 17, 2017

Python just revels in introspection. The standard module dis[1] lets the programmer disassemble and display bytecode, and includes a full list of the opcodes (which appear to be much simpler than the V8 ones described). The standard module ast[2] documents the AST used by the compiler.

[1] https://docs.python.org/3.5/library/dis.html#module-dis [2] https://docs.python.org/3.5/library/ast.html#module-ast

stcredzero · on Sept 18, 2017

Which languages have thorough first-party documentation of how the runtime works in an easily accessible form that's available to the public?

Smalltalk, though you might have to go to a library and check out a dead tree book.

amelius · on Sept 17, 2017

Check the documentation of Mozilla's Spidermonkey in comparison:

https://developer.mozilla.org/en-US/docs/Mozilla/Projects/Sp...

johncolanduoni · on Sept 17, 2017

Wow! The last time I looked at the SpiderMonkey API documentation (a few months ago) it was like reading the rings on a tree, so much so that I gave up and went with V8 even though there were actively developed SpiderMonkey bindings for the language I was using (Rust). Nice to see things have improved greatly!

colejohnson66 · on Sept 18, 2017

.NET bytecode (CLI) is standardized in ECMA-335[0]

[0]: https://www.ecma-international.org/publications/standards/Ec...

DiThi · on Sept 17, 2017

Yep, Ignition and Turbofan have completely replaced Full-codegen and Crankshaft, the old V8 JIT compilers, which grew excessively big with many edge cases.

CalChris · on Sept 17, 2017

I don't really like this article.

Bytecode is an abstraction of machine code.

No, it isn't an abstraction if a few paragraphs later the author gives SuspendGenerator as an example bytecode. Abstraction is the wrong word here. A bytecode is a machine instruction for a virtual machine. In this case, the V8 bytecode is a higher level ISA than is for example, ARMv8; that would be more accurate. System 38 MI is a higher level ISA as well.

Ignition, the interpreter, generates bytecode from this syntax tree.

No, interpreters don't generate bytecode. They interpret bytecode; that's why they're called interpreters. Code generators generate code; that's why they're called code generators. However that said, this must be a strange description internal to the V8 group since it shows up here as well:

https://v8project.blogspot.com/2016/08/firing-up-ignition-in...

You can generate code, bytecode, directly from an AST. You can do that. That is probably what is happening and what is meant here. On the other hand, LLVM converts an AST into an SSA IR and optimizes then lowers that. V8 is a JIT pipeline and LLVM is a traditional compiler pipeline.

dahart · on Sept 17, 2017

> No, interpreters don't generate bytecode. They interpret bytecode; that's why they're called interpreters. Code generators generate code;

FWIW, personally I prefer the article's idea of interpreters over yours, and I speculate the article's is more representative of what most people think. I've never heard interpreters described the way you just did, as the part that interprets bytecode. (But I did find discussion of "bytecode interpreters" in the WP article, link below).

Normally, an interpreter is a complete alternative to a compiler. Usually, an interpreter is more or less synonymous with a programming language "shell", or even the whole programming language. The Python shell would be an interpreter, for example.

In my mind, virtual machines are what you call the thing that executes bytecode, not interpreters. I'd assume that someone talking about an "interpreter" without any other qualification was talking about a program that reads & runs a another program written in a scripting language.

As time passes, these lines are getting blurrier. V8 is liberally mixing concepts from compilers and interpreters, so it's dangerous to draw hard lines. But you might be interested to double-check the broad strokes.

https://en.m.wikipedia.org/wiki/Interpreter_(computing)

> Abstraction is the wrong word here.

The bytecode binary runs without modification on multiple hardware platforms. It's pretty reasonable to call that an abstraction of machine code. Because it is.

CalChris · on Sept 17, 2017

> Normally, an interpreter is a complete alternative to a compiler.

Yes, that would be the Bill Gates MS Basic or Forth sense of interpreter. But it wouldn't be the post parsing, here's an AST, now what do we do with it sense. If they called V8 an interpreter (they call it an engine) I could see that although it would still seem archaic usage. But they're stitching Ignition on the side of V8 and calling that an interpreter which is just strange.

> The bytecode binary runs without modification on multiple hardware platforms. It's pretty reasonable to call that an abstraction of machine code.

No, that's portability onto multiple platforms. But the article said Bytecode is an abstraction of machine code. 'Abstraction of' vs 'runs on', these are two very different concepts.

dahart · on Sept 17, 2017

> Yes, that would be the Bill Gates MS Basic or Forth sense of interpreter. But it wouldn't be the post parsing, here's an AST, now what do we do with it sense.

It's also the JavaScript, Python, Haskell, Perl, Ruby, bash, etc. etc. etc. sense of the word interpreter. You can use it to describe all languages that aren't compiled. You said interpreters "interpret bytecode". You claimed that a bytecode interpreter was the broadest definition of interpreter, and that saying the interpreter is what reads source JavaScript and turns it into bytecode was wrong. I disagree, and Wikipedia does too. Many, perhaps even most interpreters don't even involve bytecode at all. You're entitled to your opinion, but I guess just expect some pushback if you're going to try to correct people using your narrow, uncommon idea of what makes an interpreter.

> No, that's portability onto multiple platforms.

Normally when I'm talking about code with other people, "portable" means code you can re-compile for a platform, not that the binary runs without modification. But, sure, I'd agree that it's reasonable to call bytecode a mechanism for portability. The way it does that is by abstracting away a specific platform's assembly language in favor of one that runs on multiple platforms. People call programming languages like C & Python an abstraction of machine language. JavaScript's bytecode is just that - it's a low level programming language. And as such, it's abstracting the hardware. Virtual machines are an attempt to abstract the CPU specifics and have code that runs anywhere.

> 'Abstraction of' vs 'runs on', these are very different concepts.

I don't know what your definition of abstraction is, but based on your objections, it feels like you have a narrow and rigid idea that perhaps doesn't match the common usage.

A function that takes a parameter is an abstraction of a block of code, just like any programming language that runs on multiple platforms is an abstraction of a specific CPU or machine language.

CalChris · on Sept 18, 2017

SICP uses a John Locke quote to define abstraction:

  The third is separating them from all other ideas
  that accompany them in their real existence:
  this is called abstraction

Oxford says: Freedom from representational qualities in art. You can't say that and then at the same time say, oh and it fits in a byte. You can't say that a function abstracts some code.

You could say that an interface abstracts a module. Or as Principles of Computer System Design says:

  The separation of the interface specification of a module
  from its internal implementation so that one can understand
  and make use of that module with no need to know how it is
  implemented internally

BTW, Wikipedia is awesome but I'm hella not going to take their description on this. H+P, SICP, DragonBook, ... some primary source but not Wikipedia. The above mentioned POCSD says:

  The abstraction that models the active mechanism performing
  computations. An interpreter comprises three components:
  an instruction reference, a context reference,
  and an instruction repertoire.

MS-BASIC doesn't fit that definition.

dahart · on Sept 18, 2017

Hey I'm sorry you didn't like the article. I hope you can find some reading you do like, and comment on that instead.

I love SICP, but I feel like you mistook an example of one kind abstraction as the definition of all abstraction.

A function does abstract code. A programming language does abstract machine language. If you feel otherwise, good for you, but you've confirmed my suspicion that your definition is far removed from common usage. I hope that helps you understand all the pushback on your first comment, but otherwise I have nothing else to add. I don't want to argue over what abstraction is, we are already too far away from any specifics that matter.

CalChris · on Sept 18, 2017

Thanks. I follow V8+WASM sort of closely and there are things I like there. But the Iron Rule of HN is that if you criticize, you will suffer! That's OK. I think it's important every once in awhile to plant your flag in the ground and make sure you can defend it; not to the point of trolling but do I really understand my opinion. I'll admit that my notion of interpreter is perhaps historically limited and that there have been others. In any case, I appreciate the back and forth. I learned things.

iainmerrick · on Sept 18, 2017

No, the grandparent post is correct. In my experience interpreter means anything other than executing machine code directly. That could mean interpreting the source directly (as was often done for BASIC), parsing the source then interpreting the syntax tree, or compiling to byte code and interpreting that. If you have any kind of innermost "engine loop" to fetch the next instruction and figure out what to do with it, you have an interpreter.

If you compile to machine code, you no longer need an engine loop -- the CPU does it.

pmelendez · on Sept 17, 2017

"They interpret bytecode; that's why they're called interpreters."

This is not universally true. Back in the day, interpreters execute machine code directly from source code. I still remember the debates about whether the JVM should be consider an interpreter. It is a bit of a pointless debate in my opinion anyway.

infogulch · on Sept 17, 2017

It's possible that the code generator and interpreter are bundled. (Though I have no idea in this specific case.)

For example, it could be that the parser always passes the AST result to the interpreter which generates and starts to interpret the resulting bytecode directly. Then later it's passed to the JIT from there.

oscargrouch · on Sept 18, 2017

> Ignition, the interpreter, generates bytecode from this syntax tree.

I guess she was addressing something more V8 specific.

In the case of V8, the Ignition 'module' first generates the bytecode directly from the AST, and then in another step, interpret the generated bytecode.

So, in this particular case it does both.

pitaj · on Sept 17, 2017

I'd love to see a tutorial on how a certain optimization pattern is coded into the compiler.

Annatar · on Sept 18, 2017

If they have all the plumbing, why are they forcing us all to go through the virtual machine?

Why didn't they finish the job and produced a proper compiler which generates straight machine code, like a C compiler does?

bobinatorino · on Sept 18, 2017

Seriously though. Omitting this detail seems strange; providing an actual compiler seems like such an obvious thing to do -

1. In the browser, even if you can't allow raw machine code (for security reasons), you could at least compile your V8 bytecode into LLVM IR and then into WebAssembly. It would make front end code way faster running (b/c of compiler optimizations on the ENTIRE code base) and loading (smaller sizes, probably). 2. On the back end, you can compile straight to machine code instead of V8 compiling stuff at runtime. It would certainly be faster, as that compilation is happening in advance versus at runtime.

There has to be some reason why this isn't being done. Perhaps the information at runtime is required, and a proper compiler would be too slow? Maybe there would be no major performance benefit? I could be wrong about the above two ideas. The V8 team has a ton of really smart people, so I can't imagine they haven't already considered this.

tmzt · on Sept 18, 2017

I glanced at the title late last night and thought it said VB bytecode, as in PCode.