
I've been thinking of writing a 6502 emulator. My plan was to start with cycle-stepping. I'm purposefully not looking at other emulators or code :)

I do have a question, especially since I haven't written any complicated code for this plan yet. To emulate a 1 MHz clock on a modern CPU, do you have to do things like check time deltas to "limit" the emulator speed? (If this was addressed in the article, know that I stopped reading when I saw code; see the purpose above.)




Most (game console) emulators don't try to limit the speed of actual CPU emulation directly, since that makes the code more complex, makes testing slower, etc. Instead, the only time the emulated system's speed matters is when it has to talk to humans: rendering a frame of video, generating audio samples, etc. So if you have a 60Hz monitor (as most people do), and your emulated CPU takes 180,000 cycles (or whatever) to render a video frame, then your emulator just runs at top speed until a video frame is available, waits for the "vertical synchronisation" signal from your graphics API, draws the frame with the graphics API, and repeats.

Because of the delay introduced by vertical sync, your emulated CPU's average speed will be correct, even though the instantaneous speed is either "way too fast" or "zero".
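To make that loop concrete, here is a minimal sketch. `Machine`, `step`, and `run_frames` are made-up names for illustration, not taken from any real emulator, and a sleep until a frame deadline stands in for the vsync wait a graphics API would give you.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for an emulated machine (illustrative only).
pub struct Machine {
    pub cycles: u64,
    cycles_per_frame: u64,
}

impl Machine {
    pub fn new(cycles_per_frame: u64) -> Self {
        Machine { cycles: 0, cycles_per_frame }
    }

    /// Advance one emulated cycle; returns true when a video frame is complete.
    pub fn step(&mut self) -> bool {
        self.cycles += 1;
        self.cycles % self.cycles_per_frame == 0
    }
}

/// Emulate `frames` frames: run flat out until a frame is ready, then block
/// until the next frame deadline. The sleep is an assumption standing in for
/// "wait for vsync"; a real front end would present the frame here instead.
pub fn run_frames(machine: &mut Machine, frames: u32, frame_time: Duration) {
    let mut deadline = Instant::now() + frame_time;
    for _ in 0..frames {
        // Unthrottled emulation: instantaneous speed is "way too fast".
        while !machine.step() {}

        // Waiting here makes the instantaneous speed "zero", so the
        // average speed comes out right.
        let now = Instant::now();
        if deadline > now {
            std::thread::sleep(deadline - now);
        }
        deadline += frame_time;
    }
}
```

Calling `run_frames(&mut machine, 60, Duration::from_micros(16_667))` would emulate roughly one second of wall-clock time, whatever the host CPU speed.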


Adding on to this, in the context of a typical emulator, it's important that the other hardware be emulated mostly in lockstep with the CPU. Each time the 6502 reads or writes memory, the hardware registers it is talking to should be "caught up" so that things like video and audio signal timings work correctly. There are a bunch of different approaches to this depending on how accurate you want things to be, but the most straightforward is to just clock the components directly, one after the other. I have a function in RusticNES that clocks the CPU + APU, another that clocks the PPU, and since the PPU runs 3x as fast, a global clock function that looks like:

    pub fn cycle(&mut self) {
        cycle_cpu::run_one_clock(self);
        self.master_clock = self.master_clock + 12;
        // Three PPU clocks per every 1 CPU clock
        self.ppu.clock(&mut *self.mapper);
        self.ppu.clock(&mut *self.mapper);
        self.ppu.clock(&mut *self.mapper);
        self.apu.clock_apu(&mut *self.mapper);
    }
This way, all of the emulated chips remain mostly synchronized with each other, but I don't bother to synchronize with the host until either the PPU is ready to draw a frame, or the APU has filled its audio buffer.

This approach uses the CPU cycle as the boundary, but there are other approaches to consider. You could run an entire opcode and then "catch up" the rest of the hardware (maybe easier to manage CPU state), or run entire scanlines at once. (Maybe easier to manage PPU state at first? The NES's PPU is well understood, but very tricky to emulate correctly.)
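A rough sketch of the "run an entire opcode, then catch up" variant, for comparison. Everything here is hypothetical (`System`, `Cpu`, `Ppu`, `step_instruction`), the PPU only counts dots, and the CPU knows just three opcodes so the example stays short.

```rust
/// Hypothetical PPU that only counts how many dots it has been clocked.
#[derive(Default)]
pub struct Ppu {
    pub dots: u64,
}

impl Ppu {
    pub fn clock(&mut self) {
        self.dots += 1;
    }
}

/// Hypothetical CPU that only tracks elapsed cycles.
#[derive(Default)]
pub struct Cpu {
    pub cycles: u64,
}

impl Cpu {
    /// Run one whole instruction and report how many cycles it took.
    /// A real core would also update registers and memory; this placeholder
    /// only knows the base cycle counts of three opcodes.
    pub fn run_instruction(&mut self, opcode: u8) -> u64 {
        let cycles = match opcode {
            0xEA => 2, // NOP
            0x4C => 3, // JMP absolute
            0xAD => 4, // LDA absolute
            _ => unimplemented!("opcode {:#04x}", opcode),
        };
        self.cycles += cycles;
        cycles
    }
}

#[derive(Default)]
pub struct System {
    pub cpu: Cpu,
    pub ppu: Ppu,
}

impl System {
    /// Instruction-level stepping: the CPU runs ahead by one instruction,
    /// then the PPU is caught up at three dots per CPU cycle.
    pub fn step_instruction(&mut self, opcode: u8) {
        let cycles = self.cpu.run_instruction(opcode);
        for _ in 0..cycles * 3 {
            self.ppu.clock();
        }
    }
}
```

The trade-off is that any register access in the middle of an instruction is seen by the PPU a few dots early or late, which is fine for many games but not for timing-sensitive ones.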


Yes, you’ll most likely need to limit the speed, if you want it to run at 1 MHz.

I highly recommend a test-driven approach to your 6502 emulation, using Klaus Dormann’s amazing tests.

Since you’re aiming for cycle-stepping from the outset (as did I), you can also do fun things like run the same test suite on your emulator and on visual6502’s gate-level emulation, and verify that you generate the exact same sequence of reads and writes. This might involve porting perfect6502 from C to your language of choice, but, hey, you’re already in it for the fun, right?
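If you do go down that road, the comparison itself can be tiny. This is only a sketch under my own assumptions: `BusAccess`, `Kind`, and `first_divergence` are invented names, and how you record the trace from each emulator is up to you.

```rust
/// Direction of a bus access.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Read,
    Write,
}

/// One bus access per cycle: direction, address, and the byte on the data bus.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BusAccess {
    pub kind: Kind,
    pub addr: u16,
    pub data: u8,
}

/// Index of the first cycle where the two traces disagree, or None if they
/// are identical. A trace that ends early counts as diverging at its end.
pub fn first_divergence(a: &[BusAccess], b: &[BusAccess]) -> Option<usize> {
    let common = a.len().min(b.len());
    (0..common)
        .find(|&i| a[i] != b[i])
        .or(if a.len() != b.len() { Some(common) } else { None })
}
```

Printing a few cycles on either side of the returned index is usually enough to see which instruction went wrong.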

Good luck, and welcome :-)

(zellyn@ on most services if you need help)


Test ROMs are super useful... I've written about 1/3rd of a NES emulator, the first part being the CPU, and nestest.rom was pretty helpful for making that. (Although nestest wasn't really checking for cycle-accurate reads and writes.)


I recently wrote a 6502 emulator in Rust as part of an Atari 800 emulator.

My first pass at the CPU emulation didn't consider timing at all; it just emulated each instruction to update memory and register state to spec. In hindsight I think this was a good approach, as there's enough complexity in getting all the operations and addressing modes implemented without worrying about timing. Indexed indirect versus indirect indexed, decimal mode, etc... I learned a lot, and that learning has definitely guided how I'm approaching cycle accuracy.

As another mentioned, Klaus Dormann's test suite is an amazing resource - getting all those tests to pass was a battle, and prompted me to write a disassembler and other tools to help guide my understanding of what the tests were doing. It was a great feeling once all the tests passed.

I'm still working on a microcode version to handle cycle timing. I've started an approach where I emulate the system bus. Memory has its own time slice where it reads or writes to the data bus each tick (2x CPU speed should suffice to mimic sub-tick memory operations). It's kind of stupidly inefficient and unnecessary, but I like the idea that each component knows nothing of each other directly, just the pin states.
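Roughly, the memory side of that idea looks like the sketch below. The names (`Pins`, `Memory`, `tick`) and the pin layout are simplified for illustration and are not my actual code; the point is only that memory sees pin state and nothing else.

```rust
/// Shared pin state that every component looks at. Field names are
/// illustrative; `read` mirrors the 6502's R/W line (true = read).
#[derive(Default)]
pub struct Pins {
    pub addr: u16,
    pub data: u8,
    pub read: bool,
}

/// 64 KiB of RAM that knows nothing about the CPU, only about the pins.
pub struct Memory {
    bytes: Vec<u8>,
}

impl Memory {
    pub fn new() -> Self {
        Memory { bytes: vec![0; 0x1_0000] }
    }

    /// Memory's own time slice: drive the data bus on a read,
    /// latch the data bus on a write.
    pub fn tick(&mut self, pins: &mut Pins) {
        let addr = pins.addr as usize;
        if pins.read {
            pins.data = self.bytes[addr];
        } else {
            self.bytes[addr] = pins.data;
        }
    }
}
```

The CPU side would set `addr`, `data`, and `read` during its own tick and pick up `data` on the next one.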


For a complete system emulator which renders video and audio, usually you compute how many clock cycles you need to run for one host-system frame (e.g. for a 60 fps host frame rate and a 1 MHz emulated system, that would be 1000000/60 ≈ 16667 clock cycles per frame). Then emulate those clock cycles unthrottled, render the emulator output, and simply spend the rest of the host-system frame doing nothing (e.g. wait for vsync).
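One detail worth sketching: 1000000/60 is not a whole number, so if you truncate or round every frame the emulated clock slowly drifts. A small helper that carries the remainder forward avoids that. `FrameBudget` is a hypothetical name, and this is one possible approach rather than how any particular emulator does it.

```rust
/// Hands out a per-frame cycle budget and carries the division remainder
/// forward, so the long-run average is exactly clock_hz cycles per second.
pub struct FrameBudget {
    clock_hz: u64,
    fps: u64,
    remainder: u64,
}

impl FrameBudget {
    pub fn new(clock_hz: u64, fps: u64) -> Self {
        assert!(fps > 0, "fps must be non-zero");
        FrameBudget { clock_hz, fps, remainder: 0 }
    }

    /// Number of emulated cycles to run for the next host frame.
    pub fn next_frame_cycles(&mut self) -> u64 {
        let total = self.clock_hz + self.remainder;
        self.remainder = total % self.fps;
        total / self.fps
    }
}
```

With a 1 MHz clock at 60 fps this yields 16666, 16667, 16667 and then repeats, which adds up to exactly 1,000,000 cycles over 60 frames.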


That really depends on the system. A large portion of the software for the C64, for example, depends on cycle-accurate simulation of at least the interaction between the 6510 and the VIC graphics chip, because it matters which values are present in certain registers while the VIC renders a specific portion of a single scanline.


In this case, I believe the parent post also means that you run the simulated co-processors (graphics/sound/etc) for a frame's worth of cycles, too. Essentially, they're just saying that when you've got a frame ready, wait for a vsync before carrying on.


If you mean running it all in lock-step cycle by cycle, and then pausing, then yes, that is possible.

But you need to run the full system simulated cycle by cycle in lock-step, not just the CPU.


Yes, that's how cycle-accurate emulators usually work :)

Each CPU cycle is "accompanied" by one "system cycle" where the video and audio emulation, and the other chips in the system are "ticked forward" by one clock cycle.



