Hacker News new | past | comments | ask | show | jobs | submit login

Ok, one way to do interpretation is to do a loop like

  while True:
    instr = decode_instruction(pc)
    execute_instruction(instr)

Where execute instruction is usually a big switch that has every opcode on it, or, in my case, is a call to a function in a table whose indices are opcodes.

The problem is you decode the same instructions many times (in a loop, for instance). A slight more elaborated version is like this:

  while True:
    if not pc in cached_code:
      block = generate_block(pc)
      cached_code[pc] = block
    run_from_cache(pc)

The generate_block function decodes and generates code for every instruction until it finds a jump. When it finds it, it finishes up that block, which is then added to the cache.

Now, whenever there is a jump to an address that is in the cache, the cached code can be executed directly.

Finally, the captures up to three backwards jumps thing:

My first try was to have the cached code just be a regular function call without loops. So a block of code is executed and then control is given back to the main loop.

But a significant number of jumps have targets that are inside the same block of code (anything you run on a loop, for instance). So, to get better performance, I generate, for my block of code, a function that has a loop in it.

It has a rather characteristic look:

  function code_block(cs) { // cs is the "cpu state"
      var count = 0;
      cs.next_pc = BLOCK_START;
      block_loop: while(++count < 3) {
          count++;
          block_choice: switch(cs.next_pc) {
              case BLOCK_START: cs.r0 = cs.r0 ^cs.r0;
              case BLOCK_START+4: // regular instructions fall through
              case CONDITIONAL_BRANCH_ADDR:
                  if(branch_condition) {
                      cs.next_pc = BRANCH_TARGET;
                      break block_choice; // this captures backwards jumps targeting the block
                  }
              case  UNCONDITIONAL_BRANCH_ADDR:
                      cs.next_pc = BRANCH_TARGET;
              default: break block_loop;

          }
      }
  }
The "captures up to three backwards jumps things" is controlled by the ++count < 3 loop condition. 3 was obtained by trial and error and looks like a good number for my machine (I could, for instance, put 100 in there, but then my javascript functions would take too much time to return and the javascript engines don't like this).



Thanks for the explanation! It makes sense. I was just wondering though -- why is it necessary for generate_block / decode_instruction to be called on the fly, instead of as a pre-compilation step? I guess you need some sort of runtime information for the compilation?


What you're looking for is called static recompilation. It's what Emscripten [1] does, it statically recompiles LLVM bitcode to javascript. Emscripten is a wonderful project, by the way. After compiling from LLVM to Javascript, it applies an algorithm (relooper) to the "intermediate" javascript code whose result is code that resembles the structure of the original code so that, for instance, if the original code has a loop, the final javascript code will also have a loop. It is a true gem, and it's part of what makes Emscripten so awesome and fast.

I believe I could do static recompilation for programs that run on a barebones environment and do not employ self modifying code. The more general problem of finding all executable code in a blob of arbitrary binary data (which I would have to solve if I wanted to run an arbitrary program unmodified, which I do), is equivalent to the Halting Problem, so that is a no go. This means that, to have a general solution, a hybrid approach of static and dynamic recompilation would have to be employed.

For well behaved programs, code generation on the fly does not pose a significant overhead (that's what the profiler tells, if I recall correctly), so I'm not sure the hybrid approach is worth its weight in complexity.

Besides, I don't know if I'd be able to run a full OS, though, because those need interrupts (in my case, currently there's a timer tick and the UART interrupt when input is entered through the keyboard), and I haven't thought about handling that.

[1] https://github.com/kripken/emscripten


Yup, I'm aware of Emscripten, which was part of the reason behind my question. I hadn't considered the issue of lack of code / data separation when dealing with pure binary code.

I'm actually working on a JVM interpreter in Coffeescript now, so this topic is particularly interesting to me. Thanks again for taking the time to explain things.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: