> If we latch too late, after cycle 40, Full Throttle Racing will glitch. But if we guess a value too soon, say on cycle 0, that might be too early, and it might cause issues in another game.
> We would have to retest the entire SNES library to be sure such a change did not cause any regressions, which just isn't practical with a library of 3,500+ game titles.
Running automated regression-tests against a library of known-accurate titles actually sounds extremely practical. This sounds like standard software-engineering practice to me.
I'm surprised that a test-harness like this doesn't already exist by now given the relatively mature state of SNES emulation. It seems like it would be an extremely helpful tool to benchmark accuracy of new or existing emulators, prevent regressions from advanced optimizations, etc.
Automated testing could transform the 'blind guessing' approach to tweaking PPU timings from a game of whack-a-mole into a structured, iterative solver, which could end up being quite a bit more feasible than acquiring 100x die scans of the chip and tracing the logic from the hardware directly.
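For what it's worth, the per-title comparison step is simple to sketch in Python, assuming the emulator can dump raw framebuffers (`frame_hash` and `compare_run` are made-up names here, not any emulator's actual API):

```python
import hashlib

def frame_hash(framebuffer: bytes) -> str:
    # Identical pixel data always yields an identical digest, so one
    # short hash per frame is enough to detect any rendering change.
    return hashlib.sha256(framebuffer).hexdigest()

def compare_run(golden_hashes, new_frames):
    # Compare a fresh emulator run against stored golden frame hashes,
    # returning the indices of frames that changed, so a human only has
    # to inspect the diffs rather than the whole library by eye.
    return [i for i, frame in enumerate(new_frames)
            if frame_hash(frame) != golden_hashes[i]]
```

The hard part, as discussed below, is deciding whether a flagged diff is a correction or a regression; the harness can only narrow down where to look.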
Screwtape built exactly such a regression-testing harness for bsnes already.
We are still at a point where minor changes can result in dozens of "change detected" triggers, and we have no good way to determine whether they represent corrections or regressions (it's usually not at all obvious.)
I've also been wary of building a large "accuracy testing suite" for the SNES, out of fear that if I come to the wrong conclusion on something, others will emulate that inaccuracy and won't question it because of deference to my code.
Tests also have the unfortunate side-effect of being gamified (example: http://tasvideos.org/EmulatorResources/NESAccuracyTests.html), which is a very bad thing. Failing a test because we don't know the answer is infinitely better than passing the test using an unverified guess. But when people reduce a comprehensive and complex set of tests into a percentage score, you get a very skewed picture of what actually matters. (Think of how browsers jumped through hoops to pass the Acid CSS tests back in the day.)
It is however something that's important, and hopefully I'll have the time one day to work on this more.
I don't know enough about SNES hardware or the level of in-system debug available, but could one build a hardware test jig able to record player input traces that could be re-run in the emulator, with the screen grabs compared? Given the system state, the random number generator, and the series of up/down/A/B presses, one should be able to recreate the same game evolution and video output from the emulator?
Then you could form a reference set of captures from both real hardware and the emulator. Screen-by-screen comparison with diffs could then be an accurate tool to measure against actual hardware and the accuracy gap between the two. No? Controlling the random number generator sounds like it might be key, and some player traces might be really fragile in the input-timing search space.
> Running automated regression-tests against a library of known-accurate titles actually sounds extremely practical.
Dolphin (the GameCube/Wii emulator) already does this, actually. They make dumps of various states of the graphics pipeline across multiple revisions of the source and generate image-based diffs of the results.
You can't just start it up and run it for 5 seconds, though. A given game might be a hundred hours long and only exercise certain functionality at minutes 5, 37 and 60. It's non-trivial to automate the playback of the whole package and then verify the whole stream against a known good. So you'd need to identify the key points in each game that are important, and replay those with save states and hope that the save states aren't unintentionally encoding corrupted data that came from an existing bug in the emulator you used to run them.
Also, none of that automation you just did will work with the actual hardware.
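The image-based diff itself can be forgiving without being blind. A rough sketch of the idea (not Dolphin's actual code), using a per-channel tolerance so harmless LSB noise doesn't drown out real rendering changes:

```python
def image_diff(frame_a: bytes, frame_b: bytes, tolerance: int = 0) -> int:
    # Count channel samples whose difference exceeds `tolerance`.
    # A nonzero tolerance absorbs rounding noise (e.g. from colour
    # conversion) while still flagging genuine rendering changes.
    assert len(frame_a) == len(frame_b), "frames must be the same size"
    return sum(1 for a, b in zip(frame_a, frame_b) if abs(a - b) > tolerance)
```

A run "passes" when the diff count stays at zero for every checkpointed frame; any nonzero count flags that save-state checkpoint for human review.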
> A given game might be a hundred hours long and only exercise certain functionality at minutes 5, 37 and 60.
An infamous bug of that variety exists in the SNES game "Speedy Gonzales - Los Gatos Bandidos", where one specific element in one level would freeze the game on most emulators due to a really subtle hardware emulation issue.
In no way will TASBot ever be automated. It can require hours (and sometimes dozens of hours) to get a single game to sync on real hardware, and that's if a cycle-accurate emulator was used. If not, it's near impossible to play back a TAS.
Not to mention only a very small subset of SNES TASes are done on cycle-accurate emulators.
The input stream for an SNES isn't very complicated - just the state of a few buttons at every cycle, and the transition edges are quite sparse relative to the clock frequency. You could easily record human play sessions and replay them.
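A sketch of what such a sparse log could look like, recording only the frames where the button state changes (illustrative only; a real recorder also has to pin down exactly when the game latches the controller):

```python
def record_edges(states):
    # Compress a per-frame button-state stream (one bitmask per frame)
    # down to its transition edges.
    edges, prev = [], None
    for frame, state in enumerate(states):
        if state != prev:
            edges.append((frame, state))
            prev = state
    return edges

def replay(edges, total_frames):
    # Expand an edge list back into the full per-frame stream.
    out, state, idx = [], 0, 0
    for frame in range(total_frames):
        if idx < len(edges) and edges[idx][0] == frame:
            state = edges[idx][1]
            idx += 1
        out.append(state)
    return out
```

Round-tripping through `record_edges`/`replay` is lossless, so a long human play session compresses to a tiny file of edges.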
Not only are there uninitialized RAM and I/O registers, and some analog effects, but the really big elephant in the room is that the system has two oscillators: a ~21MHz CPU/PPU crystal oscillator, and a ~24MHz SMP/DSP ceramic oscillator.
Not only do these exact frequencies differ between systems due to margins of error on the clocks, they also drift slightly as the system runs (and gets warmer, for example.)
Every SNES game has sound routines that synchronize the CPU to the SMP.
It wouldn't be possible to make a literal 1:1 play log unless you a) ran a custom register and memory initialization at system startup, and b) replaced the two oscillators with a much faster single oscillator and then used a clock divider to drive both the CPU and APU off of it.
(You can TAS certain SNES games anyway, of course. It really depends on how the game is programmed to react when the exact CPU<>SMP communications change. If it seeds a random number generator based around the PPU H/V counters that are polled after a CPU<>SMP sync for example, forget about it.)
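To put numbers on why the second oscillator is such a problem, here's a back-of-the-envelope sketch (nominal NTSC clock values; the ~100 ppm error is an illustrative assumption, well within ceramic-resonator tolerance):

```python
CPU_HZ = 21_477_272          # nominal NTSC CPU/PPU master clock
SMP_NOMINAL_HZ = 24_576_000  # nominal APU ceramic resonator

def smp_cycles_at(cpu_cycle, smp_hz, cpu_hz=CPU_HZ):
    # SMP cycles elapsed by the time the CPU reaches cpu_cycle,
    # for a given *actual* (not nominal) SMP clock rate.
    return cpu_cycle * smp_hz // cpu_hz

# A resonator running about 100 ppm fast:
smp_fast_hz = SMP_NOMINAL_HZ + 2_458

# After one second of CPU time, the two machines already disagree by
# thousands of SMP cycles; any CPU<->SMP handshake landing in that
# window can take a different path, and the replay desyncs.
drift = smp_cycles_at(CPU_HZ, smp_fast_hz) - smp_cycles_at(CPU_HZ, SMP_NOMINAL_HZ)
```

And that's between two otherwise identical consoles; the same console on a warm day drifts against its own earlier recordings.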
> Not only do these exact frequencies differ between systems due to margins of error on the clocks, they also drift slightly as the system runs (and gets warmer, for example.)
Maybe it'd be possible to modify some boards to use a CPLD (or an FPGA) for synthesizing those clocks from a common source. This could eliminate most (all?) uncertainty.
There's already a community of people that does this, called tool-assisted speedrunning (TAS). These players already record their inputs for replay and are comfortable tuning them to consistently hit edge cases.
I wonder how much of a lift it would be to take a bunch of TASes and turn them into regression tests...
The input stream is only half the story; you should also match the whole observable state of the machine for an accurate emulation, because some other test might actually depend on it.
Captures would need to be triggered on all the clocks, with someone deciding by hand which clock matters in each case.
To get that, you would need a set of hardware debuggers plugged into the bus and chips. And a lot of inside knowledge to decide if a deviance is random enough to not have to be emulated.
I don't know if SNES dev kits still exist but if they do they might prove useful. Such kits sometimes have additional debugging facilities because they're also used for validation (or they share some elements). Not sure if anyone was doing that sort of thing as early as SNES though.
> more feasible than acquiring 100x die scans of the chip
Last time I had to get SEM micrographs in resolutions between 2K and 10K, we paid 475€ for 2 hours of work.
While I wouldn't want to pay this out of my pocket and gift it to the world, I don't see how it's less feasible to make better die scans than to build the testing equipment (PPU breakout board) needed for automated testing.
Especially if the current owner of the already decapped dies were willing to lend them to someone else willing to do the footwork.
I think it should be easy enough to find 19 people willing to donate 25€ for this kinda effort.
On the other hand, the dies would have to be plasma-coated with gold/platinum, which might be bad; I'm not an expert. So perhaps you were referring to this when you said higher-res scans are less feasible than making automated testing happen. ¯\_(ツ)_/¯
One of the later comments on the article offers custom PCB design to make a test harness for the PPUs. It seems like a perfect partner for a manually controllable clock for the whole SNES, e.g. as Ben Eater's 6502 demos use. If you can freeze the clock at will, you can more easily inspect the data and address lines. But this wouldn't work if the PPU latch timing depends on analog effects or the PPUs have an internal clock.
Combine the test harness with a USB logic analyzer and signal generator, and borrow some concepts from fuzzers like AFL, and it should be possible to automatically identify critical phases of the PPU operations.
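The AFL-style loop is simple to sketch. Below, `run_ppu` is a hypothetical stand-in for driving the actual rig (or the emulator) with a list of (cycle, register, value) writes and returning an output signature, e.g. a hash of the captured scanline:

```python
import random

def fuzz(run_ppu, seed_inputs, rounds=100, rng=None):
    # Coverage-guided loop in the spirit of AFL: inputs that produce a
    # never-before-seen output signature are kept as fresh seeds,
    # steering the search toward unexplored PPU behaviour.
    rng = rng or random.Random(0)
    corpus = [list(seed_inputs)]
    seen = {run_ppu(seed_inputs)}
    for _ in range(rounds):
        parent = rng.choice(corpus)
        # Mutate roughly 20% of the write values in a chosen parent.
        child = [(c, r, rng.randrange(256)) if rng.random() < 0.2 else (c, r, v)
                 for (c, r, v) in parent]
        sig = run_ppu(child)
        if sig not in seen:  # new behaviour => keep as a new seed
            seen.add(sig)
            corpus.append(child)
    return corpus
```

The resulting corpus is exactly the set of "interesting" register sequences worth replaying against an emulator.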
Question: do the PPUs themselves generate analog video, or is there a separate DAC whose bus could be tapped to figure out what color the PPU is producing?
The PPUs use static logic and an external 21MHz oscillator (which also powers the CPU), so they're perfect for single-stepping in isolation from the rest of the SNES.
The PPUs themselves output only analog RGB, there's an analog pin for each color channel rather than a digital pin for each color bit.
On some level, I don't doubt it's possible to build a test harness that can automate things based on the analog RGB values with some fuzzy matching, but if we are going that far, it just makes more sense to snoop the bus traffic directly. That would reveal a lot more information and in digital form.
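The fuzzy matching needn't be very fuzzy, though: since each channel is 5 bits (BGR555), a sampled voltage should sit near one of 32 discrete steps. A sketch, assuming a standard 0.7V full-scale video level (the exact level and tolerance are assumptions to be calibrated against a real capture):

```python
def analog_to_level(voltage, v_max=0.7, levels=32, tolerance=0.3):
    # Map a sampled analog channel voltage back to the nearest 5-bit
    # colour level. Returns None when the sample is not within
    # `tolerance` (as a fraction of one step) of any quantized level,
    # i.e. the capture is too noisy to trust.
    step = v_max / (levels - 1)
    level = round(voltage / step)
    if not 0 <= level < levels:
        return None
    if abs(voltage - level * step) > tolerance * step:
        return None
    return level
```

Samples that come back `None` would flag a capture for re-acquisition rather than silently producing a wrong reference value.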
> it just makes more sense to snoop the bus traffic directly
It seems the bus traffic won't tell you everything. The fact that the PPU reads a specific piece of a sprite from RAM doesn't tell you which pixels of a scanline it'll get rendered on.
If I were approaching this problem, I would observe that the 'analog' output of a SNES is probably a bunch of discrete values with discrete timing intervals, and therefore can be perfectly captured with no error, especially if you can slow the clock down to eliminate signal reflections etc.
I would then make a test harness consisting of a PPU chip, the ability to read and write registers (either with the rest of the SNES, or some other microcontroller, whichever is easier) and the ability to capture complete frame outputs from the 'analog' outputs without error.
I would then do the same with an emulator, and run code on both which pokes random registers at random (clock accurate) times, and whenever a difference in output appears, debug it.
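The poke-and-compare loop might look like this; `run_hardware` and `run_emulator` are hypothetical callables wrapping the harness and the emulator respectively, each taking a list of (cycle, register, value) writes and returning a captured frame:

```python
import random

def differential_test(run_hardware, run_emulator, trials=1000, seed=0):
    # Poke random PPU registers at random dot positions on both the
    # hardware rig and the emulator; return the first write sequence
    # whose captured frames disagree -- a minimal repro to debug.
    rng = random.Random(seed)
    for _ in range(trials):
        writes = [(rng.randrange(341 * 262),      # a dot within one NTSC frame
                   0x2100 + rng.randrange(0x34),  # a PPU register, $2100-$2133
                   rng.randrange(256))            # the value to write
                  for _ in range(rng.randrange(1, 8))]
        if run_hardware(writes) != run_emulator(writes):
            return writes
    return None  # no divergence found within this budget
```

Seeding the RNG makes every divergence reproducible, which matters when a single hardware run is slow.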
> It seems the bus traffic won't tell you everything.
Oh it absolutely won't, it's only the external state. It's just another important piece of the puzzle that I think would help us out a lot.
Solving this would most likely require reverse-engineering the netlists from a decapped die scan.
I've since (post article writing) learned that a German forum found a hidden test mode that outputs digital RGB values, but it has a lot of issues on various edge cases, so while it's not a perfect solution, it's definitely a huge help along the way if we can get a testing rig set up around an appropriately modded SNES.
I think you could use a neural network to attempt to replicate the circuit under test, perhaps generating Verilog, programming a chip, and then running it against a set of autogenerated test vectors.
A generalized system for duplicating the functionality of a boolean circuit, if it can use its own axioms and generalities, could start to directly synthesize more complex logic once it "understands" flip-flops and NAND gates.
That sounds like a great idea, but today we can't even synthesize arbitrary VHDL/Verilog code efficiently. The author needs an understanding of hardware architecture to write something that synthesizes, versus just a test harness that is not intended to be synthesized.
I love that there are people whose hobby is preserving things. SNES was an important part of so many 80s and 90s kids’ childhoods! But how many of them have digital logic experience, especially at the reverse engineering level? And actually want to do it outside of compensated labor?
> how many of them have digital logic experience, especially at the reverse engineering level?
Like programming and reverse-engineering, I taught myself the basics of electrical engineering.
The documentation for the Neo Geo Pocket's SoC is basically nothing but raw logic circuits as explanations of how things work. Imagine 200 pages of diagrams like this: https://i.imgur.com/2LZ2UWY.png
(oh and as a bonus, the diagrams often contain errors.)
If you're only looking to create a basic emulator, you can mostly get away without knowing all of this stuff by reading tech docs and the source code of other emulators. But if you really want to get low-level and get things clock-cycle accurate, or want to work on a system that's not been well-emulated to date, digital logic is pretty much a hard requirement.
> And actually want to do it outside of compensated labor?
From 2004 to 2018, for the SNES, it was mostly only me. As of today, three people.
In general only one person is really needed. I couldn't say if someone else would have taken my place had I not been around. I suspect so, given how popular the SNES continues to be. Whether they would have done a better job of it than I have is a question that often keeps me up at night.
In any case, the answer is certainly "not nearly enough."
John D McMaster seems to have previously done 100x scans for some chips, if I'm reading the site right; e.g. the latest has links for 20x and 100x: https://siliconpr0n.org/map/generalplus/gplb52a24a-049a/. Is it as simple as seeing if he'll rescan the PPUs at 100x, or are these not the quality that I was thinking of?
He has a tremendous backlog of work to do at the moment.
I would prefer to not bother him unless it was a last resort, but that is indeed an option if my breadboard idea does not work out and no others are up to the task of decapping the PPUs.
"If we could recruit a talented electrical engineer, I believe that a custom PPU breakout board could be designed that would aid us substantially in reverse engineering..."
Perhaps Andrew "Bunnie" Huang would be someone to go to for this?
There are 52 registers comprising approximately 100 settings, and probably close to twice that number if you include the internal register latches (some of which we know about, and some of which we don't know exist yet.) The SNES PPUs are heavily based around combinatorial logic: changing the timing of one variable can alter the timing needed by other variables. Think of each setting as potentially doubling the number of tests needed (at least when it comes to a blind, brute-force approach.) There are more combinations of settings and pixel-generation patterns than atoms in the universe.
The way we've gotten as far as we have is that not every combination needs to be tested. It is a ballpark estimate on my part, but I'd estimate us needing a few million tests to have a high degree of confidence our emulation is correct.
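The arithmetic behind that contrast is easy to illustrate (the numbers below are made up; discovering the real grouping structure between settings is exactly the hard part the estimate accounts for):

```python
def brute_force_tests(num_settings):
    # Each on/off setting doubles a blind search space.
    return 2 ** num_settings

def grouped_tests(group_sizes):
    # If settings only interact within small groups, exhaustive testing
    # costs the *sum* of the per-group spaces instead of their product.
    return sum(2 ** size for size in group_sizes)

hopeless = brute_force_tests(100)     # over 10**30 combinations
tractable = grouped_tests([5] * 20)   # 640 tests for the same 100 settings
```

The gap between those two figures is why knowing which settings interact matters far more than raw testing throughput.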
If we had to make those tests into visual patterns on a screen that had to be checked by eye each time, it would be overwhelming even if we only needed a few thousand tests made.
I can test code on a live SNES extremely easily with my 21fx board ( https://github.com/defparam/21FX ), but it's still far too much to do this by hand.
byuu I'm curious, is there some magical piece of documentation/schematic that would break open an entirely untapped area in your SNES research? Or are you at the point the hardware is almost entirely transparent, and it's just a matter of chipping away at the edge cases, or halo projects like a replacement clock for the dual oscillators?
It'd be interesting if some Nintendo engineer from the early 90s has a box in his garage with documents that could change the whole course of your current work.
"A flash emulator or flash memory emulator is a tool that is used to temporarily replace flash memory or ROM chips in an embedded device for the purpose of debugging embedded software. Such tools contain Dual-ported RAM, one port of which is connected to a target system (i.e. system, that is being debugged), and second is connected to a host (i.e. PC, which runs debugger)."