A new cycle-stepped 6502 CPU emulator (floooh.github.io)
140 points by ingve on Dec 15, 2019 | 26 comments



This is a great introduction to emulation in general.

While I was trying to figure out how to write my first emulator, I thought it would be simpler to start with a Game Boy Advance, or with an 8-bit system like the 6502 or the PICO-8.

After a few false starts, it turned out to be easiest to write a small x86 emulator, since I already had all the tools locally (assemblers, disassemblers, compilers, etc.) and a working reference environment (my laptop).

Here's the walkthrough I wrote for building an x86 emulator in JavaScript that you can use to run simple C programs compiled with GCC/Clang:

http://notes.eatonphil.com/emulator-basics-a-stack-and-regis...

Now that I've done that, though, I'm still trying to get into writing a GBA emulator.


> I thought it would be simpler to start with a Game Boy Advance or an 8-bit processor like the 6502

The 8-bit processor will be much more approachable. I have worked on my GBA emulator for around one year now. Before that I tried the classic GB and its 8-bit CPU was so much easier to implement. The GBA's ARM7TDMI alone took me 3 months to complete, even with extensive testing [1].

[1] https://github.com/jsmolka/eggvance/tree/master/tests


I've been thinking of writing a 6502 emulator. My plan was to start with cycle-stepping. I'm purposefully not looking at other emulators or code :)

I do have something I've been wondering about, especially since I haven't written any of the complicated code for this plan yet. To emulate a 1 MHz clock on a modern CPU, do you have to do things like check time deltas to "limit" the emulator speed? (If this was addressed in the article, know that I stopped reading when I saw code; see the purpose above.)


Most (game console) emulators don't try to limit the speed of the actual CPU emulation directly, since that makes the code more complex, makes testing slower, etc. Instead, the only time the emulated system's speed matters is when it has to talk to humans: render a frame of video, generate audio samples, and so on. So if you have a 60Hz monitor (as most people do), and your emulated CPU takes 180,000 cycles (or whatever) to render a video frame, then your emulator just runs at top speed until a video frame is available, waits for the "vertical synchronisation" signal from your graphics API, draws the frame with the graphics API, and repeats.

Because of the delay introduced by vertical sync, your emulated CPU's average speed will be correct, even though the instantaneous speed is either "way too fast" or "zero".
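A minimal sketch of that loop (Cpu, step, wait_for_vsync, and present_frame are made-up names standing in for whatever your emulator core and graphics API actually provide):

    const CPU_HZ: u64 = 1_000_000;              // emulated 1 MHz CPU
    const FPS: u64 = 60;                        // host refresh rate
    const CYCLES_PER_FRAME: u64 = CPU_HZ / FPS;

    fn run_one_host_frame(cpu: &mut Cpu) {
        let mut elapsed: u64 = 0;
        while elapsed < CYCLES_PER_FRAME {
            elapsed += cpu.step();              // cycles taken by one instruction
        }
        wait_for_vsync();                       // graphics API blocks here, pacing us
        present_frame();                        // draw the completed frame
    }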


Adding on to this, in the context of a typical emulator, it's important that the other hardware be emulated mostly in lockstep with the CPU. Each time the 6502 reads or writes memory, the hardware registers it is talking to should be "caught up" so that things like video and audio signal timings work correctly. There are a bunch of different approaches to this depending on how accurate you want things to be, but the most straightforward is to just clock the components directly, one after the other. I have a function in RusticNES that clocks the CPU + APU, another that clocks the PPU, and since the PPU runs 3x as fast, a global clock function that looks like:

    pub fn cycle(&mut self) {
        cycle_cpu::run_one_clock(self);
        self.master_clock += 12;
        // Three PPU clocks for every CPU clock
        self.ppu.clock(&mut *self.mapper);
        self.ppu.clock(&mut *self.mapper);
        self.ppu.clock(&mut *self.mapper);
        self.apu.clock_apu(&mut *self.mapper);
    }
This way, all of the emulated chips remain mostly synchronized with each other, but I don't bother to synchronize with the host until either the PPU is ready to draw a frame, or the APU has filled its audio buffer.

This approach uses the CPU cycle as the boundary, but there are other approaches to consider. You could run an entire opcode and then "catch up" the rest of the hardware (maybe easier to manage CPU state), or run entire scanlines at once. (Maybe easier to manage PPU state at first? The NES's PPU is well understood, but very tricky to emulate correctly.)
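For the opcode-at-a-time variant, one possible shape (method names are invented for the sketch, echoing the structure above):

    fn step_system(&mut self) {
        let cpu_cycles = self.cpu.step();        // run one full opcode
        for _ in 0..cpu_cycles * 3 {
            self.ppu.clock(&mut *self.mapper);   // PPU runs 3x CPU speed
        }
        for _ in 0..cpu_cycles {
            self.apu.clock(&mut *self.mapper);
        }
    }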


Yes, you’ll most likely need to limit the speed if you want it to run at 1 MHz.

I highly recommend a test-driven approach to your 6502 emulation, using Klaus Dormann’s amazing tests.

Since you’re aiming for cycle-stepping from the outset (as did I), you can also do fun things like run the same test suite on your emulator and on visual6502’s gate-level emulation, and verify that you generate the exact same sequence of reads and writes. This might involve porting perfect6502 from C to your language of choice, but, hey, you’re already in it for the fun, right?
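One possible shape for that comparison (the types and names here are illustrative, not from visual6502 or perfect6502): record every bus transaction your emulator makes, capture the same trace from the reference, and find the first cycle where they disagree.

    #[derive(Debug, PartialEq, Clone, Copy)]
    struct BusEvent {
        addr: u16,
        data: u8,
        write: bool,
    }

    /// Index of the first cycle where the two traces diverge, if any.
    fn first_divergence(mine: &[BusEvent], reference: &[BusEvent]) -> Option<usize> {
        mine.iter().zip(reference).position(|(a, b)| a != b)
    }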

Good luck, and welcome :-)

(zellyn@ on most services if you need help)


Test ROMs are super useful... I've written about a third of a NES emulator, the first part being the CPU, and nestest.rom was pretty helpful for making that. (Although nestest wasn't really checking for cycle-accurate reads and writes.)


I recently wrote a 6502 emulator in Rust as part of an Atari 800 emulator.

My first pass at the CPU emulation didn't consider timing at all; I just emulated each instruction to update memory and register state to spec. In hindsight I think this was a good approach, as there's enough complexity in getting all the operations and addressing modes implemented without worrying about timing: indexed indirect versus indirect indexed, decimal mode, etc. I learned a lot, and that learning has definitely guided how I'm approaching cycle accuracy.
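Those two indirect modes are a classic stumping point. A sketch of the difference, assuming a hypothetical read(addr) memory accessor (note that both pointer fetches wrap within zero page):

    // (zp,X) "indexed indirect": index the zero-page pointer first.
    fn indexed_indirect(read: impl Fn(u16) -> u8, zp: u8, x: u8) -> u16 {
        let ptr = zp.wrapping_add(x);
        let lo = read(ptr as u16) as u16;
        let hi = read(ptr.wrapping_add(1) as u16) as u16;
        (hi << 8) | lo
    }

    // (zp),Y "indirect indexed": fetch the pointer first, then add Y.
    fn indirect_indexed(read: impl Fn(u16) -> u8, zp: u8, y: u8) -> u16 {
        let lo = read(zp as u16) as u16;
        let hi = read(zp.wrapping_add(1) as u16) as u16;
        ((hi << 8) | lo).wrapping_add(y as u16)
    }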

As another mentioned, Klaus Dormann's test suite is an amazing resource - getting all those tests to pass was a battle, and prompted me to write a disassembler and other tools to help guide my understanding of what the tests were doing. It was a great feeling once all the tests passed.

I'm still working on a microcode version to handle cycle timing. I've started an approach where I emulate the system bus. Memory has its own time slice where it reads or writes to the data bus each tick (2x CPU speed should suffice to mimic sub-tick memory operations). It's kind of stupidly inefficient and unnecessary, but I like the idea that each component knows nothing of the others directly, just the pin states.
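The pin-level idea might look something like this (names invented for the sketch, not from any real project):

    // Shared bus lines: components sample and drive these, never each other.
    struct Bus {
        addr: u16,   // address pins
        data: u8,    // data pins
        rw: bool,    // true = read, false = write (6502 convention)
    }

    trait Component {
        // Called once per tick with the current pin states.
        fn tick(&mut self, bus: &mut Bus);
    }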


For a complete system emulator which renders video and audio, you usually compute how many clock cycles you need to run for one host-system frame (e.g. for a 60 fps host frame rate and a 1 MHz emulated system, that's 1,000,000 / 60 ≈ 16,667 clock cycles per frame). Then emulate those clock cycles unthrottled, render the emulator output, and simply spend the rest of the host frame doing nothing (e.g. wait for vsync).
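Since 1,000,000 doesn't divide evenly by 60, one refinement (a sketch of my own, not from the article) is to carry the remainder across frames so the long-run average stays at exactly 1 MHz:

    struct FramePacer {
        remainder: u64,
    }

    impl FramePacer {
        fn cycles_this_frame(&mut self, cpu_hz: u64, fps: u64) -> u64 {
            let total = cpu_hz + self.remainder;
            self.remainder = total % fps;
            total / fps   // alternates between 16,666 and 16,667
        }
    }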


That really depends on the system. A large portion of the software for the C64, for example, depends on a cycle-accurate simulation of the interaction between the 6510 and the VIC graphics chip at least, because it matters which values are present in certain registers when the VIC renders even a specific portion of a single scan line.


In this case, I believe the parent post also means that you run the simulated co-processors (graphics/sound/etc) for a frame's worth of cycles, too. Essentially, they're just saying that when you've got a frame ready, wait for a vsync before carrying on.


If you mean running it all in lock-step cycle by cycle, and then pausing, then, yes that is possible.

But you need to run the full system simulated cycle by cycle in lock-step, not just the CPU.


Yes, that's how cycle-accurate emulators usually work :)

Each CPU cycle is "accompanied" by one "system cycle" where the video and audio emulation, and the other chips in the system are "ticked forward" by one clock cycle.


Retro emulators seem to be increasing in popularity lately. I've seen a few articles now covering the "how", but does anyone have pointers to something explaining the "why"?

Is it a Zachtronics-style technical challenge? Specific nostalgia for the software that ran on these ancient platforms? More general nostalgia for the days when platform stacks were still thin and comprehensible [1]? Something else?

[1] https://ptrthomas.files.wordpress.com/2006/06/jtrac-callstac...


For me, the "why" is that I grew up playing RPGs, and around my early teens I discovered the west missed out on dozens of major games (even Dragon Quest 5 and Final Fantasy 5!)

I taught myself programming and reverse engineering so that I could fan translate these games.

After I finished a few of them, I became disheartened that my work had trouble running on real game consoles due to emulator inaccuracies.

The folks writing emulators at the time took many shortcuts to run on the slower computers we had back then, but in spite of PCs getting faster, they didn't want to incur any speed hits or break any other fan translations relying on emulator bugs.

So I switched focus and started writing my own emulators instead. I was filling a niche that at the time wasn't covered. These days, I mostly keep at it because it's fun, I like the games on these old systems, and I'd like to preserve the games for future generations as faithfully as possible.

(if you're bored and would like a really in-depth answer, see https://byuu.org/about for a full history.)


A bit of all of that, I think. I'm making a fantasy console with an 8-bit instruction set and I'm having quite a bit of fun writing assembly for it myself.

A few years ago I added a graphics mode to a DCPU-16 emulator and wrote a Pac-Man game for it, and it felt extremely liberating to write code that just does the job. I think that aspect could be achieved in other environments if you have an extremely thin API. That is, I think, one of the appeals of the PICO-8: while the Lua language is pretty powerful, the interface to the virtual hardware is thin enough that there is very little API wrangling. If you want it, build it.

As for the technical challenge side of things, I have long wanted to write a language like BASIC in 8-bit assembly. They squeezed a lot into 4 to 16 KB back in the day; it would be fun to have a go myself.


Given how complex a successful emulator project tends to be, I expect the answer will be highly personal. In my case, it's a combination of nostalgia for the games I grew up playing, curiosity and respect for the developers who were working with such unusual constraints (by today's standards), and a strong desire to help preservationist efforts. I want to help ensure that future curious minds have a wide variety of tools available to play older software once the hardware becomes difficult to acquire. Well, that's the philosophical answer anyway; the more practical answer is that I simply enjoy the challenge, and nostalgia is a great spark to motivate the effort.


For me it's the appeal of potentially mastering the _entire_ machine.


Fully understanding is one reason.

Another is to preserve experiences. There is a lot of great retro software out there.

Yet another is development. Many older machines enjoy an active development scene. Emulation can help this along.

Fun.


> Is it a Zachtronics-style technical challenge?

Actually, I did a lot of assembly coding on the 6809 and 6502 back in the day, and I've also played a lot of the Zachtronics games. It's not really the same at all! Zachtronics games are all about challenges that are quite a bit more ridiculous than what you face in real life with small machines. In a typical Zachtronics assembly programming game, you spend a lot of time trying to figure out how to do pretty trivial things with a ludicrously small amount of resources, like maybe two registers to work with. In a real assembly project, you'll have a few kilobytes to work with, or at a bare minimum 512 bytes. The smallest system I worked with had 512 bytes of RAM and 8 KB of ROM. The Zachtronics games were fun and challenging, but they were more like puzzles than real programming: puzzling out how to get the assigned task done under ridiculous artificial limitations.

Contrast that with an example of a real-life, non-trivial assembly task I did on a 6502. I needed to respond to commands from an infrared remote control, similar to what you have with a TV. For input, I'd get interrupts on an IO pin that fluctuated up and down when receiving IR input. The task involved "recording" commands from a remote, then later monitoring the input and seeing if it matched any of the pre-recorded patterns. You'd count the amount of time between successive pulses, using a real-time clock connected to some other IO pins. It's not rocket science, but it's fun and challenging figuring out how to get all that going with a limited amount of memory. To me that kind of thing is a lot more fun than figuring out the various challenges in Zachtronics games, which mostly center on taking something that'd be trivial to do if you had a couple hundred bytes of memory to work with, but forcing you to do it with 2 bytes. So you make little side constructs that act as memory, or other side constructs that somehow use message queues as memory, or whatever.
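The matching step boils down to something like this (the original was 6502 assembly; this is just the logic, with units and tolerance invented for illustration):

    /// True if a captured sequence of pulse intervals matches a recorded
    /// one, allowing each interval to differ by up to `tolerance` ticks.
    fn command_matches(recorded: &[u16], captured: &[u16], tolerance: u16) -> bool {
        recorded.len() == captured.len()
            && recorded.iter().zip(captured).all(|(r, c)| r.abs_diff(*c) <= tolerance)
    }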

The appeal of retro emulators I think is that it's a system that's small and limited enough that you have more of a chance of understanding it from end to end. There's a feeling of comfort or coolness in knowing that you know exactly what the system is up to on every cycle. With larger computers, the hard drive starts blinking and God only knows what it's doing, or why. There's this feeling on small embedded or emulated systems that you are really in control of what's happening. On larger, more complex systems, the software I write is just one little part of the larger who-knows-what going on inside the machine.

For me another aspect of it is how much I'm getting done with so little code. There's something very satisfying about knowing the entire binary package of assembled code I've generated, and being able to point to any byte of that and understand what it is.

It's the exact opposite of what's going on in my other terminal, where, as I type, it's creating a new React app and downloading untold megabytes of who-knows-what with countless dependencies and abstractions. In retro systems, if the system is capable of doing something, it's probably because you personally made it able to do that thing.


Back in the late 70s, I did the bring-up of a Z80 system. I coded a 1 KB EPROM (a 2708) with a very small bootstrap monitor that did not require RAM to be operational. It drove a serial terminal (autobaud on CR entry) and offered display/set memory and go-to-address: just enough to enter a memory test and a disk bootstrap, which was then added to the EPROM. So there are examples of "zero RAM" programs. With no stack, you use a single level of subroutine call: LD DE,return_addr / LD HL,subroutine / JP (HL), and at the end of the subroutine, EX DE,HL / JP (HL) did it. The EPROM had to work on both 8080 and Z80 CPUs. The code had to be small, because it was manually entered into an EPROM programming device. Fairly common stuff back then... I enjoyed this kind of work. No debugger, no safety net. The code had to be simple, understandable, and work. These days I still don't use debuggers (much). This may be slowing me down; I'm not sure.


Atari 2600 has 128 bytes of RAM and a crazy video system. Feels about as hard as a Zachtronics game to me.


Zachtronics games never give you 128 bytes! You get the two registers in your processor. They don't give you RAM!


I like how it's pretty much standard to expect emulators to have a WASM port running in the browser at full performance at this point. Atwood's law is 12 years old but still going strong.


I feel like with this approach you're in a better place to convert the design into an HDL and run it on an FPGA.


This is really cool. I like the approach of exposing the pins as its API. Super flexible.
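The article's emulator is C, but the pins-as-API idea translates directly. A rough Rust rendition (the pin layout, masks, and Cpu6502 names here are my assumptions for the sketch, not the article's actual definitions):

    const RW: u64 = 1 << 24;   // read/write pin (1 = read)

    fn get_addr(pins: u64) -> u16 { (pins & 0xFFFF) as u16 }
    fn get_data(pins: u64) -> u8 { ((pins >> 16) & 0xFF) as u8 }
    fn set_data(pins: u64, d: u8) -> u64 { (pins & !0xFF_0000) | ((d as u64) << 16) }

    fn system_tick(cpu: &mut Cpu6502, mem: &mut [u8; 65536], mut pins: u64) -> u64 {
        pins = cpu.tick(pins);                    // CPU drives address/control pins
        let addr = get_addr(pins) as usize;
        if pins & RW != 0 {
            pins = set_data(pins, mem[addr]);     // read: memory drives the data pins
        } else {
            mem[addr] = get_data(pins);           // write: latch the data pins
        }
        pins
    }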



