All that work, and then he hits the expected killer problem - self modifying code. Getting past that without an interpreter is going to be tough. Not impossible, because the self modifying code isn't that dynamic in the places he found it. It's not like it's reading external input and compiling it. There are a limited number of cases to be handled.
It's amusing seeing this in a machine which gets its code from a ROM.
That example is not self-modifying, though. The BIT absolute instruction simply uses the LDY immediate instruction as operand data by means of overlapping them. This saves some bytes as you'd otherwise have to use relative branching (and there's no branch-always) or a three byte jump to load a different Y value, but the code remains unsabotaged.
These tricks are IMO an indicator that they had a good idea that the game code was going to be a tight fit (or had possibly already discovered that the hard way), so they were optimizing for size. A jump would be faster than executing the BIT instruction.
The manually loading an address to the stack and running RTS is an interesting trick. Wozniak used it in his SWEET-16 interpreter, pushing the interpreter loop entry point to the stack before jumping to the subroutines that execute the byte code, just so that he could use RTS to return to the interpreter loop. In that sense it fulfills an entirely different purpose than a JMP indirect and also saves some bytes, again at the expense of execution speed and non-gray hair. For a table based interpreter, it fills gap left by the lack of a JSR indirect instruction.
> It's not like it's reading external input and compiling it.
Some games, because of bugs, allow exactly that. User input might be directly executable. That means that static compilation cannot be achievable in general. Either you have different behaviour in the event of such bugs or you have a full interpreter to fall back on.
Which is one of the reasons I hope WebAssembly becomes the de-facto standard for binary code distribution: It preserves more structure of the original program and allows less of the lowest level shenanigans (it provides a clear and protected control flow, has a small instruction set, no dynamic code generation, and no instruction misalignment / dual-alignment).
Preface: these are the questions of a CS student, not attacks.
What about java/.net/regex engines? Is it required that they exist outside the browser, or are performance limited to interpreting, or is there a better solution?
What's the problem with the control flow of assembly?
NES cartridges also could have their own hardware too. Don’t like the NES sound card? You could ship your own inside of the cartridge. I think notoriously castlevania 3 shipped some custom hardware in their cartridge. Some of the cartridges for the famicom came with an fm synth chip.
I believe this was even more common on the SNES. The most well-known example is the Super FX chip, which was used in Star Fox and Yoshi's Island among others, but there were actually a ton of these things: https://en.wikipedia.org/wiki/List_of_Super_NES_enhancement_...
This means that they test some condition, and then either transfer control flow to the next instruction, or to a different label. This means that we can mark the possible branch address and the next address as instructions.
I hope all of these NES ROMs were coded well. Seems like you could do something like...
This actually happens surprisingly often in 6502 code, because unconditional branches always take three bytes, unlike conditional jumps which take two bytes. So if the value of some condition flags are known to be constant (e.g. ORing with a nonzero constant always clears the zero flag) it saves a byte to use an always-true conditional jump in place of an unconditional jump.
"Reflection is hard to support with ahead of time compilation"
Not true at all. There's nothing in reflection and serialization[1] that is hard to implement with an AOT compiler.
The real problem with AOT-compilation in Java is that there are many optimisations, like devirtualization, that can only be done (or done much more easily/effectively) at runtime.
You can get some of this back with whole-program/closed-world optimisation, but in reality that's highly impractical for Java. Many Java programs are very large and people want the ability to update them quickly and easily, often without even restarting let alone recompiling.
An LLVM-based Java runtime could use a hybrid AOT/JIT model, however. Recompiling parts of the program as needed at runtime based on profiler data, you'd get the fast startup of AOT combined with the high performance of JIT.
Don't get any crazy ideas about beating Hotspot's performance, though.
[1] one exception is classes/methods that are defined at runtime using dynamically generated bytecode. But that's pretty rare.
Still there are plenty of commercial options to AOT compile Java, and OpenJDK is getting its free variant with Java 9 for Linux x64, with Windows and OS X already supported on the Java 10 development branch.
Oracle Labs is also pushing forward their research of meta-circular JVM, which makes use of AOT compilation. Although it restricts it to a "native Java" subset.
OpenJDK's AOT compiler, when you also allow it to include the JIT and still recompile hotspots, is about 5% slower than just using the JIT. The only reasons to use it are startup times, ease of distribution (no JVM to ship, sort of), and for use in places that don't allow JITs (iOS).
I had a similar project idea for ages. Instead of LLVM it would static recompile into C macros or C++ inline function calls for each opcode. Then it would compile the output and run the game.
The compiler tends to treat static variables very differently from locals. I was holding out for efficient local functions which can be forced inlined to keep the emulated registers as actual system registers. Sadly std::function doesn't play well with force inline on all platforms.
lol, it just came to me that clang is an abbreviation for C language. I always imagined it to be the sound metal makes when you slam it together, because bare metal programming etc.
The polite convention on HN is to say this was discussed previously, and link to it, especially when it was last submitted and discussed over four years ago :-)
I'm always confused about "Guidelines" and "FAQ". It seems that whenever I have a question about HN, I need to consult both, because it is not clear at all which site covers which question.
Maybe "Guidelines" and "FAQ" should simply be merged?
I understand what you're saying. There's the Show HN guidelines as well. One could imagine them all as three sections on one page. The guidelines are for what you should do; the FAQ for general information. Then again, they're short. It's not a lot of effort to check both.
It's amusing seeing this in a machine which gets its code from a ROM.