Statically Recompiling NES Games into Native Executables with LLVM and Go (andrewkelley.me)
560 points by darkf on June 7, 2013 | 96 comments



This is amazing. Also, this is the Dark Magic of programming that I don't think I'll 100% grok in 20 years, but it's good to try!

edit: now that I think of it, I really need to keep expanding my knowledge. I'm going to go through this post in my terminal and try to at least make the stuff work, so I can start understanding this process. I've been trying to learn Go and C better anyway. Thanks for providing a ground to learn more.


I've looked at this sort of stuff as utter voodoo for a long time. Then I ran into this book[1], and everything just kind of 'clicked' into place in my head. I can't recommend this book often enough.

[1]: http://www.nand2tetris.org/book.php

In short: it takes a hands-on approach to designing and building your own computer and programming language, ending with you writing and running your own games on the system.

    * It starts with simple boolean algebra to explain and create logic gates (NAND, AND, OR, XOR, etc.) - a tiny sketch of this step follows the list.
    * Use these to build an ALU, memory banks and eventually a full CPU.
    * Design an assembly language and assembler for this system.
    * Use the assembler to create a higher level OOP language, compiler and code base.
    * Use this language to write a rudimentary operating system.
    * Write a game to run on the OS.
All using very clear and simple English and a very comprehensive emulation system (written in Java) by the authors.
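
As a taste of that first step, here's a minimal sketch in Go of how every other gate derives from NAND (the book itself uses its own HDL; this only shows the logic):

    // Everything in the book's hardware chapters bottoms out in NAND;
    // the other gates are compositions of it.
    func nand(a, b bool) bool { return !(a && b) }
    func not(a bool) bool     { return nand(a, a) }
    func and(a, b bool) bool  { return not(nand(a, b)) }
    func or(a, b bool) bool   { return nand(not(a), not(b)) }
    func xor(a, b bool) bool  { return and(or(a, b), nand(a, b)) }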

Edit: For some reason, this site has started showing malware warnings in chrome and firefox since today. Even though Google's advisory[1] makes no mention of any actual malware being detected. I've visited this site safely for a long time. Still, if you don't trust it, then wait until Google clears up the issue. I've already contacted one of the authors about it.

[1]: http://safebrowsing.clients.google.com/safebrowsing/diagnost...


Firefox indicates that nand2tetris.org has been reported as an attack page. Has the site been compromised? Anybody know how to contact the author?

Edit: just noticed the original Edit mentioning that you contacted one of the authors.


Same with Chrome. Beware this link.


Incidentally, a computer engineering curriculum starts a few levels below that, with the physics of the transistors and how circuits actually work.


Brilliant! I just bought this based on your recommendation.


Likewise. Awesome comment, and a book I wasn't aware of. Thanks so much for taking the time to write this.


The book, TECS (The Elements of Computing Systems), is great, but the version I have was missing a lot of stuff. There were certain parts of the hardware design that were missing or not really completable with what was provided, and not a lot of follow-up resources.

The original guy who built a CPU in Minecraft actually based it on some of the exercises in that book. Good stuff overall.


Might be a good candidate to port to asm.js or even just javascript; there are platforms without Java where this could be a valuable teaching tool.


Not a real port, but I made an HDL as a Python DSL and coded most of the hardware exercises with it. The majority of the work went into emulating their test script interpreter.

Regular Javascript should be plenty fast except for full system simulation with no custom component simulators.

https://github.com/darius/logsim


Or, better yet, just port the Java runtime to Javascript...



I was about to write something telling you to not be so hard on yourself, but your edit is in the right place!

The only thing stopping you from learning 'dark magic' is your willingness to put in the time, study, and ask others about it. If you want to learn, just keep pushing forward.


My mantra is "It's all just code." If you want an overview of systems topics, take a look at the webpage for the book "Computer Systems: A Programmer's Perspective": http://csapp.cs.cmu.edu

The CMU professors who wrote the book also teach a course at CMU that uses it as the textbook. I TAed a course at Virginia Tech that used the textbook, and I thought it did an excellent job of demystifying many systems concepts.


Just a quick note about the disassembly challenge he faced (indirect references), having gone through this before: you can get amazingly good results by cheating a bit. That is to say, rather than assuming you actually have to properly execute through the code path, you can get very close by roughly tracking register assignments when making your initial pass through a block of code. (Even better if you can track potential ranges of values across later calls into a given block. Some of this depends on how you've implemented your disassembler, though.)

I ended up doing this with a SuperH disassembler (with SH2, due to its two-byte opcode layout, indirect addressing is the order of the day), and by doing basic register assignment tracking and adding a few crude heuristics, I was able to get very usable results. No, the end result won't be "pretty"; you'll be moderately embarrassed to show it off, but it will work. :)

(Heuristics: one structure that I had to manually handle was compiler-generated jump tables; thankfully, for my project, I'd had a bit of help from the compiler that was used, and there were distinct signatures I could key off of.)
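
For the curious, here's roughly what that kind of tracking can look like, sketched in Go (hypothetical names, not my actual disassembler): a single linear pass that remembers constant loads and observed stores, which is often enough to resolve an indirect jump's vector.

    // One decoded 6502-flavored instruction.
    type insn struct {
        op  string // e.g. "LDA#", "STA", "JMP()"
        arg uint16
    }

    // resolveIndirect tracks the accumulator and any provable stores in one
    // pass over a block, and tries to resolve an indirect jump's target.
    func resolveIndirect(block []insn) (target uint16, ok bool) {
        var a uint8               // tracked accumulator value
        aKnown := false           // whether A currently holds a known value
        mem := map[uint16]uint8{} // stores observed within this block

        for _, in := range block {
            switch in.op {
            case "LDA#": // load immediate: A becomes known
                a, aKnown = uint8(in.arg), true
            case "STA": // store: record the value if known, else forget the cell
                if aKnown {
                    mem[in.arg] = a
                } else {
                    delete(mem, in.arg)
                }
            case "JMP()": // indirect jump through a two-byte little-endian vector
                lo, okLo := mem[in.arg]
                hi, okHi := mem[in.arg+1]
                if okLo && okHi {
                    return uint16(hi)<<8 | uint16(lo), true
                }
                return 0, false
            default: // conservative: anything else clobbers what we know
                aKnown = false
            }
        }
        return 0, false
    }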

If you're even remotely interested in the disassembly aspect of this, I'd recommend learning a bit about a piece of software called IDA Pro: https://www.hex-rays.com/products/ida/ As horrible as its UI is, there is simply nothing better on the market for reverse engineering analysis.


Second this. There are a lot of "signatures" in most asm. Programmers for the 6502 and derivatives might be a nasty bunch of sadists who love to do weird stuff to save cycles, but even there, there are lots and lots of common patterns that often "happened" just because people learned from the same sources, or because it made sense, or because conventions appeared.

I never had a NES, but I had a C64, and the 6502 code written there seemed nasty to translate on the surface, with lots and lots of self-modification, for example. But in the end most of the self-modification was down to specific looping patterns: because the 6502 can only index 256 values, many loops involved writing addresses into the looping code, iterating 256 times, incrementing the most significant byte directly in the code, checking whether you'd reached the end, and jumping back to iterate another 256 times.

Most of this "nasty" stuff is relatively well known by now and much of it is relatively regular and easy to detect.
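
For anyone who hasn't seen the idiom: once recognized, the whole self-modifying dance statically rewrites into an ordinary paged loop. Here's a sketch of the recompiled semantics in Go (the article's language), with mem standing for the 64 KB address space:

    // What the classic self-modifying 6502 fill/copy loop means: the
    // "address patched into the code" is just a 16-bit pointer split into
    // two bytes, because Y can only index 256 values.
    func fill(mem []byte, start uint16, pages int, val byte) {
        lo, hi := byte(start), byte(start>>8) // the bytes the 6502 code patches
        for p := 0; p < pages; p++ {
            for y := 0; y < 256; y++ { // the inner "iterate 256 times" loop
                addr := (uint16(hi)<<8 | uint16(lo)) + uint16(y)
                mem[addr] = val
            }
            hi++ // "increase the most significant byte directly in the code"
        }
    }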


Constant propagation is not really cheating though: it's a completely safe and accurate optimization. We use it in the PPC->x86 JIT of the Dolphin emulator to reduce register pressure and to exploit the fact that x86 instructions can take 32-bit constants (while PPC is usually limited to 16-bit constants, and 32-bit values are loaded with 2 instructions: lis/ori). If you implement it properly, you can actually brag about it :) (We have an abstract object that can be either an x86 register or a constant value, and instruction handlers handle these two cases differently - when they can't, the constant is loaded into a register.)
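
The shape of that abstraction, sketched in Go with hypothetical names (Dolphin itself is C++; this only shows the idea):

    // operand is either a host register or a known constant.
    type operand struct {
        isConst bool
        reg     int    // host register number, valid when !isConst
        val     uint32 // constant value, valid when isConst
    }

    // emitAdd shows how an instruction handler specializes: fold two
    // constants, use an immediate form when one side is constant,
    // otherwise fall back to the general register-register form.
    func emitAdd(a, b operand) operand {
        switch {
        case a.isConst && b.isConst:
            return operand{isConst: true, val: a.val + b.val} // constant folding
        case b.isConst:
            return emitAddImm(a, b.val) // x86 add with a 32-bit immediate
        default:
            return emitAddReg(a, b)
        }
    }

    func emitAddImm(a operand, imm uint32) operand { /* emit code, elided */ return a }
    func emitAddReg(a, b operand) operand          { /* emit code, elided */ return a }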

+1 for IDA Pro. It's a shame this software is so expensive. The UI is actually pretty decent when you get used to it, and there are a ton of good plugins.


This is one of the best technical articles I've seen in a long, long time. Congratulations. I won't go and pretend I understand what is really going on, but the writing style is excellent and to the point, and the general flow and formatting are a pleasure.

Props dude.


I agree. Really well laid out, easy to understand for a lay person without too much technical jargon. I enjoyed the long code pastes rather than a GitHub repo link.


Somewhat off-topic, but to defend gcc against clang, here is a modern version of gcc with the correct warning option:

    $ gcc-4.8 -std=gnu99 -Wall -o test test.c
    test.c: In function 'main':
    test.c:6:5: warning: suggest parentheses around comparison in operand of '&' [-Wparentheses]
         if (foo & 0x80 == 0x80) {
         ^
gcc 4.9 will have colored diagnostics, too.

Cool project, though.


The rivalry between gcc and clang has been wonderful for everyone who uses either of them.


Some of these warnings verge a bit too close to "Warning: Competent C programmer detected" - too eager to flag "less common" usage like the dreaded 'original definition of the insertion operator', rather than specifically targeting things that are genuinely suspicious.


This would be a great way, on a compiler or project that doesn't have this warning enabled, to conceal a deliberate bug and have it appear to be accidental.


Hey, check out the Underhanded C Contest.

http://underhanded.xcott.com


Modern PS2 emulators - which is to say, pcsx2 - use dynamic recompilation to execute games at useful speed. Static recompilation might not be a useful technique, but did you consider a dynamic version? What caveats are there?


About halfway through the project I realized that static recompilation is pointless and that dynamic is the way to go. I felt like it was worthwhile to at least get to the "able to play super mario 1" checkpoint before quitting. I did not do any investigation into dynamic recompilation other than pondering about it and concluding that it is more practical than static.


Why is static pointless and why is dynamic better? If emulating a newer game console, couldn't you get better performance by running a statically recompiled game, since it doesn't have to do the extra work at runtime? Or better yet, couldn't you cross-recompile a game to run it on a platform that couldn't normally handle emulation of the target platform?


Dynamic lets you do lots of nasty tricks, and also lets you detect and work around various nasty tricks at runtime (worst case by falling back to emulation). Consider that old games often would use self-modifying code, for example. Reliably statically detecting self-modifying code can be extremely hard even when it's not intentionally obfuscated. But at runtime it is "easy": Write protect all the code pages, and trap page-faults, and either modify the offending code, or fall back to emulation.
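
As a concrete sketch of that trap, assuming Linux (Go's runtime/debug can turn the fault into a recoverable panic):

    package main

    import (
        "fmt"
        "runtime/debug"
        "syscall"
    )

    func main() {
        // One page standing in for a region of translated guest code.
        page, err := syscall.Mmap(-1, 0, 4096,
            syscall.PROT_READ|syscall.PROT_WRITE,
            syscall.MAP_ANON|syscall.MAP_PRIVATE)
        if err != nil {
            panic(err)
        }
        // Write-protect it, the way a recompiler would protect translated code.
        if err := syscall.Mprotect(page, syscall.PROT_READ); err != nil {
            panic(err)
        }
        debug.SetPanicOnFault(true) // the fault becomes a recoverable panic

        func() {
            defer func() {
                if recover() != nil {
                    // Trapped a write into "code": invalidate the translation,
                    // unprotect the page, and fall back to emulation here.
                    fmt.Println("self-modification trapped")
                    syscall.Mprotect(page, syscall.PROT_READ|syscall.PROT_WRITE)
                }
            }()
            page[0] = 0xEA // the guest patches its own code (6502 NOP)
        }()
    }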

In general, I think dynamic approaches are ideal when you're dealing with a "hostile" environment where the software you're translating was written with no expectation that it would be translated, and possibly (as with old games) where the programmer tried to maximally exploit the hardware. Failing to statically determine that something weird is going on can then often be counteracted much more simply by detecting attempts to violate your assumptions at runtime.

You can do hybrid approaches, and statically make a "best effort" with similar methods to trap stuff that breaks your assumptions and fall back to JIT or emulation, but there's a tradeoff: past a certain amount of dynamic machinery, it's easier to just do everything dynamically from the start.

The performance question is not so easy to settle. JIT'ing code takes a bit of time, but not much compared to the overall time the program will run afterwards. Static compilation can spend more time doing optimisations, but JITs at least in theory have more information to work with: they can detect the specific processor version and use specialised instructions or alter instruction selection, for example, or could in theory even do tricks like re-arranging data to get better cache behavior based on profiling access patterns for the current run (I have no idea if any existing JITs actually do that).


Thanks for the thorough response. I really appreciate it.


You can't: as long as you have self-modifying code (including loading more code into memory from another place), static recompilation is simply not possible. It's not a matter of being "better", it's the only solution.

As video game consoles start looking more and more like PCs, self-modifying code becomes less frequent and more easily detectable. On the Nintendo 3DS, for example, only the OS can allocate executable pages, and executable pages are forced read-only. This means you could in theory do static recompilation of a 3DS game. This is the only case I know of, though - all other recent video game consoles allow self-modifying code or dynamic code loading.


That's not really true. A lot of self-modifying code follows very predictable patterns where the modification treats addresses in parts of the code as a variable. In 6502 code in particular this is a common idiom for looping over arrays of more than 256 bytes. Many of these patterns are easily detectable and easy to statically rewrite.

You likely can't handle the general case, but that's a lot less critical. Especially for old consoles or computers, the pool of software with cases too complex to be handled by generic analysis is small enough that you can reasonably add special cases for most of the stuff you care about. A trivial first pass at that detection is sketched below.
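
Sketched in Go with hypothetical structures: flag every store whose target lands inside the region being translated, then pattern-match the hits against the known idioms.

    // store records one store instruction found by the disassembler.
    type store struct{ pc, target uint16 }

    // selfModCandidates returns the stores that write into the code region
    // itself - the starting points for recognizing self-modifying patterns.
    func selfModCandidates(stores []store, codeStart, codeEnd uint16) []store {
        var hits []store
        for _, s := range stores {
            if s.target >= codeStart && s.target < codeEnd {
                hits = append(hits, s)
            }
        }
        return hits
    }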


Even so, I wonder if you could get better speed for, say, Dolphin, by pausing to do as much statically as possible (perhaps going all the way through LLVM's optimizations) rather than being in a hurry to do everything dynamically without pausing.


I wanted to experiment with this idea at some point (without pausing - using a background thread that does optimization and patches code with the optimized version when it's done) but our current JITCache is just not fit for this. We perform JIT compilation per-basic block, which does not allow for a lot of optimization compared to per-function (following all direct branches as long as possible). Changing that would've required too much time and I got bored of it :(


This is one excellent read! Thanks to the author for writing this down. Not that I'm not interested in the NSA, but this is a welcome diversion, and something I wanted to play with myself for a long time.


Thank you. I do not write often, so this was a challenge for me. Constructive criticism welcome.


I thought it was an excellent experiment and I really appreciate everything you did. One suggestion, though, particularly for the "Assembly" part: link to something like a pastebin for the long code listings rather than placing them in-line. Unless you're referring to specific aspects of the code immediately before or after, it's not that useful to see the actual code posted in-line; and even then, you should select small snippets of the code (as you did elsewhere in the article). Small nitpick though; thanks for the work, and for writing everything up.


Well you certainly rose to the occasion. I look forward to leisurely reading this in detail.

Before I looked at the article I immediately wondered how you were going to handle self-modifying code (running on the internal RAM of course). I guess you didn't encounter that situation?


That situation is covered in this section: http://andrewkelley.me/post/jamulator.html#dirty-assembly-tr...

Basically I embed an interpreter runtime and use it only when necessary, such as in the case when the program jumps to RAM.

Good point though. I should specifically mention self-modifying code.

Note that with NES games, self-modifying code is uncommon, because programs are 32 KB of ROM and you only have 2 KB of RAM. So you'd have to first copy your subroutine from ROM to RAM, and then jump to it. And then you have that much less RAM to work with.

However, some of the emulator test ROMs[1] people have made use this technique to test every instruction.

[1]: http://wiki.nesdev.com/w/index.php/Emulator_tests
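
For a concrete picture of that fallback, a rough sketch (hypothetical names, not the actual generated code) of how a computed jump gets dispatched:

    // compiled maps ROM addresses that disassembled cleanly to native
    // functions; anything else - such as code copied into the 2 KB of RAM -
    // goes to the embedded interpreter.
    var compiled map[uint16]func()

    func jumpTo(addr uint16) {
        if fn, ok := compiled[addr]; ok {
            fn() // statically recompiled path
            return
        }
        interpretAt(addr) // interpreter runtime handles jumps into RAM
    }

    func interpretAt(addr uint16) { /* fetch-decode-execute loop, elided */ }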


This is interesting. If this project can output LLVM byte code, then you could also codegen to javascript with emscripten and make a web based version of a NES game.


See also https://github.com/jsmess/jsmess which is a port of an existing emulator codebase using emscripten


I'm not quite sure what the point of this project is, as MESS is rather slow when built natively [1]. Perhaps some of the lower end systems can be emulated, albeit slowly.

[1] http://www.mess.org/faq#it_works_but_it_is_way_too_slow


http://archiveteam.org/index.php?title=Javascript_Mess summarizes why: it makes it as widely available as possible and lowers the barrier to entry considerably.


The point is historical preservation: Putting ancient games and systems on the Web as a way to enable people to experience them, as part of digital museums, online academic papers, and such, to provide some (imperfect) context for what would otherwise be dry technical history. It's in JS because it's the only language that doesn't require plugins, and so has a better-than-average chance of running on a variety of different hardware.

http://jsmess.textfiles.com

http://ascii.textfiles.com/archives/3745

http://archiveteam.org/index.php?title=Javascript_Mess


But why use a deliberately slow emulator as a base for all of this? Is accuracy really that important when compared to being able to run it on a wide variety of hardware? Why not just bundle some existing JS emulators together or take some fast native emulators and do the emscripten thing?


The two are not competing goals. There are plenty of emulators that aim for speed, so it makes perfect sense for someone to want to focus on accuracy. And yes, it is that important if your goal is to actually preserve things properly.

I don't know about the NES as I never had one, but on the C64 for example, a single cycle deviation in the emulation of interaction between the CPU, graphics chip and memory bus would make some effects impossible to reproduce. And similarly on the C64, people are still struggling to make the SID (sound) chip emulation as accurate as possible, as it used a combination of analogue filters that have proven extremely hard to accurately reproduce in emulation.

People casually testing these games might not care, but many of those of us who used these systems notice these flaws and appreciate that not all of the emulation projects focus only on speed.


Because it's fast enough (or close to it) now, and that should only improve over time; it's often much harder to improve accuracy later, depending on the shortcuts made for performance.


That thought occurred to me as well, although I did not actually try it.


Seems like a good followup post. I'd read that.

I mean, essentially it would give you NES games running natively in the browser with no emulator, right?


It's not quite as sunshine and rainbows as that. For one, I had to embed an interpreter runtime in the generated code to handle some dirty assembly tricks that programmers can do. Further, the Picture Processing Unit and Audio Processing Unit must be emulated. Even worse, there are some things that fool the disassembler that many games do. And finally, this project never even attempted to support mappers, which means that there are only about 8 or 9 notable games that this could even potentially support.

As is, it supports Super Mario Bros. 1 only.


There was a NES emulator for the Game Boy Advance that performed with a surprising level of accuracy at full frame rate. I guess the CPU emulator was a simple state machine, but the interesting thing in this context is how it did the graphics and sound. The GBA has a pretty advanced "PPU" of its own, with the same basic types of features (sprites, tile-indexed background layers), so it translated the NES PPU register writes (with appropriate scaling, I guess) to native register writes. I'm not sure how things like collision interrupts were handled, but presumably the GBA has similar functionality.

For a javascript emulator or static recompilation, similar methods could be used to set up a bunch of shaders dealing with the basic functionality of the PPU (sprites, tile layers etc). Certainly less accurate and probably missing a bunch of corner cases, but it would definitely be the fastest approach to dealing with graphics emulation.


Cool, no need to reply to my email then :)

I've been messing around with emscripten lately myself. Maybe I'll try it some time.


Or, you could just use an emulator written by hand in javascript: http://fir.sh/projects/jsnes


That project's name has always bothered me... My brain tokenizes it as J SNES rather than JS NES.


... while it's not really useful for the NES, which is so old that emulating it does not strain even the crudest modern processor, I'd be excited to see this technique applied to newer consoles for lightweight mobile processors.


Emulating accurately takes much more power than you'd think: http://arstechnica.com/gaming/2011/08/accuracy-takes-power-o...


Oh, I know, but that's usually really tiny edge-case things. NES emulation was "good-enough" speed and accuracy-wise a decade ago. These days even an older-model smartphone can emulate NES games solidly well.


You're right in that emulating the NES "good-enough" for most games is no big feat, in terms of power, but you are wrong in saying that it's because the platform is "old". There are old systems that are much trickier to emulate simply because software takes advantage of every corner case in the hardware design, using cycle perfect timing to exploit unexpected behavior in the chipset. The C64 is the perfect example, still having a pretty vibrant demoscene, coming up with new tricks that break emulators every year. Then, to get a reasonable level of accuracy, you need to emulate everything per cycle, complete with a bunch of analog hardware (simply emulating the SID sound chip is quite a heavy task) and registers bound to pins simply left floating.


Unless you get into speedrunning games where accuracy matters a lot and some emulators are "banned" because they are lacking in it. Banned in this case of course only meaning that you will not get recognition from most of the communities for performing speedruns on such emulators.


I think it might actually work better for more modern hardware. Fewer handcrafted ASM tricks, much more regular (compiler-generated) machine code. And of course no self-modifying code that would be extremely difficult to recompile correctly.

Modern hardware (GPU, sound cards,...) is also very similar to what you find on a PC so it would be more straightforward to port all this code. No messing around with the framebuffer mid-scanline to create a cool effect, no quirky special purpose hardware for very specific tasks.


This is so wrong on multiple levels.

First, self-modifying code is still extremely present on modern consoles, at least on the current generation (PS3/X360/Wii/WiiU). Loading code from external media is basically the same problem as self-modifying code (statically recompiling it is trivially equivalent to solving the halting problem).

Second, modern hardware might be similar, but game console SDKs expose a lot more features to developers than PC drivers do through DX/GL. The example I give every time is fetch shaders on the WiiU: a kind of shader supported by AMD R600 GPUs but completely abstracted away by DX/GL.

Third, maybe there are no more mid-scanline framebuffer tricks, but you have a ton of other problems with the framebuffer: while a PC assumes separate CPU/GPU memory, on modern consoles the framebuffer (and a few textures) are often stored in memory that is shared and synchronized between the CPU and GPU. This is incredibly hard to emulate because a full GPU->CPU FB transfer induces a lot of latency (several hundred microseconds last time I checked). IGPs and APUs make this problem a bit more manageable, but we're still missing the graphics API support for shared FBs and shared textures.

Some things are better than older consoles but some other things are also a lot worse. JIT-ing shader bytecode is another problem that I don't think has been tackled yet (except maybe for Xbox emulation - which is still in its infancy and for a console using a very old GPU with no use of stuff like compute shaders).


The other interesting thing about older hardware is that each cart could embed special hardware that the NES could take advantage of. To play those games, that extra hardware has to be emulated as well.

So far as I know, this is unheard of with current gen consoles.

The most recent example I can think of is for a handheld console: the Pokemon Walker that was bundled with the newer Pokemon games for the Nintendo DS, which I believe has the IR hardware embedded in the cart itself.

So in addition to worrying about rather interesting use of the stock hardware, you also have to consider interesting use of _secondary_ hardware.

---

The latest batch of consoles [Xbox One, PS4] look to be x86 PCs with high-bandwidth memory; if that's the case, I'm hoping PC ports are more common, and perhaps we'll even see a virtualization based approach to running next gen games on standard PC hardware.


I don't know how common add-ons were in the NES era, but they were very common for the SNES (which was very similar to the NES power wise).

Games stopped embedding hardware when they went to discs. There is no way to put a parallel processor into a DVD.


Well, modern consoles _could_ still be extensible; but their hardware is already so general purpose that there's not much point.

Best you could do w/ current gen technology is bundle a dongle w/ the game, where the user plugs in some kind of co-processor through USB.

So far I haven't really seen anything like that -- the only USB dongles I've seen bundled w/ games are for games like RockBand and they're just RF receivers.

Aside from bandwidth concerns, and the poor sales of previous attempts (e.g. Sega's 32X/CD add-ons), there's nothing preventing a disc-based console from having an external co-processor.


I hinted at the end what kind of technique I think might actually be useful:

  For example, one such technique is to identify a section of code, make some
  assumptions based on heuristics which allow for highly optimized native code
  generation, and then detect if those assumptions are broken. If the assumptions
  are broken, the generated native code is tossed, and emulation takes over.
  However, if the assumptions are upheld, the recompiled block of code will
  execute with blazing fast native speed.
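
A minimal sketch of that idea in Go (all names hypothetical): each translated block carries a guard for its assumptions, and a failed guard drops back to the emulator.

    type machine struct{ ram [2048]byte /* plus CPU state, elided */ }

    type block struct {
        assumptionsHold func(m *machine) bool // e.g. "this jump vector is unchanged"
        native          func(m *machine)      // highly optimized translation
    }

    func step(m *machine, b *block) {
        if b != nil && b.assumptionsHold(m) {
            b.native(m) // fast path: assumptions upheld
            return
        }
        interpretOne(m) // assumptions broken (or no translation): emulate
    }

    func interpretOne(m *machine) { /* single-instruction emulation, elided */ }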


You probably know it, but that approach is used everywhere, for example in JavaScript and Ruby, where it is typically impossible to prove much about your program (in some dark corner of the program, someone might redefine that function that appears to add one to a number, if the program is run on Thursdays).

I also have a minor, minor nitpick on the article: I think you should point out that those INY instructions are, in general, insufficient to increment 16-bit pointers. You have to check for wraparound, and increment the high byte if the value wraps to zero. Somebody must have checked (or hoped) that that didn't happen with these tables (developing with the long, safe form and replacing it by the short form before release is tricky, as shortening the code will move entry points).
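
In Go terms, the general safe pattern is just a carry:

    // inc16 steps a 16-bit pointer held as two bytes, with the carry that a
    // bare INY omits: bump the low byte, and on wraparound bump the high byte.
    func inc16(lo, hi *uint8) {
        *lo++
        if *lo == 0 { // low byte wrapped 0xFF -> 0x00
            *hi++
        }
    }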


While not completely static, some modern emulators do use dynamic recompilation (essentially JIT) instead, which gives you more information to work with and lets you generate more optimized code. You can always fall back to interpreted code as the author does in this article, too.


Personally, I was very interested in this experiment after seeing this article yesterday: http://www.tested.com/tech/gaming/456272-straightforward-gui...


This is one of the best articles I've seen linked here in a long time. The OP covers so much stuff but simplifies exactly where needed, so everything stays understandable and there are no gaps (the "how to paint an owl" syndrome).

Thank you so much for writing and posting this. You made my day.


I seem to remember someone doing this for the original Xbox and getting up to the Halo "start game" screen. That was probably easier due to it being roughly the same architecture. This is something else quite different, and really neat.


I won't even pretend that I understand half of this but from a quick browse this looks pretty interesting.

It seems very well written, too.

Filed away into my magic "ZOMG INTERESTING PROJECT IDEA" folder :D


Oh, man! I know that folder! I don't have one, but 20. I've switched from bookmarks to actually writing it down on a piece of physical paper. And not just the url. I write down a small explanation for my future self. I haven't been bored in months :D


Very interesting article indeed. Some time ago I made something similar for Game Boy games and ran into the same set of challenges (and ended up using similar techniques). The ROM is decompiled and translated into C code, which is then compiled and linked with runtime libraries. Jump tables and indirect jumps often need some manual fixing. I got to the point where I can convert some simple games (without memory mappers) into binaries running on iOS and OS X. I did not have the time to document the tools, but if anyone is interested in continuing that work, just let me know.

I guess one of the advantages of static recompilation is that you can port old games to new platforms if you hold the copyright of the game itself, without running into issues with the manufacturer of the console (Nintendo) - but I might be wrong. You could also conceivably improve the game more easily during conversion (e.g., incorporate higher-resolution graphics). Finally, you could potentially have the resulting code distributed via app stores that do not allow general-purpose emulators.


This article is a joy. What a wonderfully written and thorough explanation of the space. I live for this stuff.


There is some research on this: http://www.pagetable.com/docs/libcpu/26C3-libcpu.pdf

It's a very interesting topic. It may be our best chance at preserving software.


It's really unfortunate that libcpu didn't take off. Last I checked, it had gone nowhere and attracted no contributors, and now their site 503s. It was an interesting project.


I think it's more than that. It is not yet clear that this approach can work in the general case without emulation: the halting problem may be in the way.


Correct. If you think about it, the program could prompt the user for the address to jump to, and then the program could go there. In fact, that's exactly what some games (accidentally) do: http://tasvideos.org/2341M.html

It is impossible to solve this statically. This is why dynamic recompilation is more practical.


Thanks for that link; that hack (and the accompanying video) has just become perhaps my favorite thing, ever. It also led me to this similarly incredible hack:

http://www.youtube.com/watch?v=D3EvpRHL_vk


Sure. It could be argued that "general case" is a bit more restricted than that for actual useful software in the wild, but that would still break any program with a JIT.

Sadly, emulation seems to be the way to go.


Please forgive me for the bikeshedding, but I have a quick question as a C novice: Is the equality check in the following necessary?

    if (foo & 0x80 == 0x80) {
I thought if you're checking a bit, simply ANDing would be enough (since everything else would be zeros). In other words, could we just use

    if (foo & 0x80) {
If so, then it seems like this would be the preferred form to avoid the precedence issue presented in the article.


I am seeing that as basically defensive coding. Similar to how you may do this:

  switch (mode) {
      case FOO: ... break;
      case BAR: ... break;
  }
Do you need the very last break? Technically no, but including it so that you don't forget it when you need it later can be a good idea.

(Also, I think explicitly using comparison operators in conditionals might be the idiomatic thing to do in Go. Someone correct me here if I'm wrong about that.)


The example was using C, not Go.


Aye, though the same code later appears in Go.
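
Worth noting for the Go version: Go deliberately gives & higher precedence than ==, and an if condition must be a bool, so the explicit comparison is both required and safe there:

    var foo uint8 = 0x90
    if foo&0x80 == 0x80 { // parsed as (foo & 0x80) == 0x80, unlike C
        // high bit set; "if foo & 0x80 {" would not even compile in Go
    }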


0x80 is 0b01010000.

Consider that (foo & 0x80) is (0x16 == 0b00010000) if foo is 0x16.

(This may or may not be the intended functionality one way or the other!)


I believe 0x80 is 0b10000000


Oops, did everything in decimal. Good point and correction. Still, the difference in legibility and implications of the two lines of code are important. Modern compilers would optimize ((foo & 0x80) == 0x80) to (foo & 0x80) but leave ((foo & 0x60) == 0x60) alone because it is testing more than one bit.
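
A quick illustration of why the multi-bit form can't be reduced (in Go, but the logic is language-independent):

    foo := uint8(0x20)
    _ = foo&0x60 != 0    // true: at least one of bits 5 and 6 is set
    _ = foo&0x60 == 0x60 // false: this form requires both bits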


I agree; this is quite amazing work! I've been meaning to do something like this; perhaps generate a native executable or something similar.


Interesting stuff, and I like the writing style.


I had a similar idea that I never really put into practice: doing some sort of static recompilation, but to higher-level code, in order to make open source versions of some NES games that other people could use to do the same. Accuracy would not be as big a concern as in a pure emulation project.

I never got past the reading phase.


I've been meaning to port some classic games over and utilize better input methods for a while. It would be fun to be able to load old GB games on my Android phone and tap the menus instead of navigating them with the DPad. Thanks for this, I was looking for something to do with my Friday night!


This was fantastic, thank you.

Static recompilation seemed like an obvious solution to emulating games in theory, but it was fascinating to see just what it would take. I also loved reading about the clever tricks from 30 years ago and how we can or can't deal with them.


Someone should recompile ps2 games to x86.


The plans in the Conclusion section look interesting.


What is amazing here is not the techy stuff, but the productivity and clear understanding of concepts. Of course, such a shape (of mind) comes from years of daily practice. That's why I know I will never write anything good - I didn't spend enough time practicing. Practice leads to perfection (not reading HN).

And look, the guy is not using any IDE or proprietary tools - just a terminal window and command-line tools (what a horror!). Looks like they are good enough..)

All those 9-to-5 Java coders should at least commit suicide.) More seriously - this is a very clear illustration for startup founders of what a huge gap lies between a mediocre and a top performer.

Convincing a top performer to work for you is the real secret of a successful startup. Even pg (god forbid!) might not have been so successful without rtm.))


It depends on what you are used to. I started programming in the 1980s and the first editor I used was pretty much like EDLIN (http://en.wikipedia.org/wiki/Edlin)---think of an unholy cross between the Unix commands cat and vi (line-based and modal).

And, except for code completion, there isn't anything an IDE can do that can't be done via the command line (just not as conveniently). Then again, I don't program in Java.


vi is a small miracle of software engineering.



