MetalNES: Transistor Level NES Simulation

pubby · on Feb 26, 2022

MetalNES is by the author of NESticle. Like Visual2C02 and Visual2A03, presumably it runs at such a slow speed that games are unplayable. Like, it may have taken hours to reach the SMB menu screen, but I would love to be corrected :)

ReactiveJelly · on Feb 26, 2022

0.5% of realtime: https://news.ycombinator.com/item?id=30482633

cbozeman · on Feb 26, 2022

Awesome... so we just need a computer that's about 200x faster than what we have available today. So 20 years from now.

xattt · on Feb 27, 2022

Maybe ReactOS will reach feature parity with Windows 7 by that point.

phkahler · on Feb 27, 2022

A GPU should be able to do digital logic simulation a lot faster, no?

taneq · on Feb 27, 2022

And the next step after that to improve performance would be an FPGA, then an ASIC... ;)

Underphil · on Feb 27, 2022

Then maybe we should just simulate the behaviour of the system in code... wait a minute.

taneq · on Feb 27, 2022

Now you're getting it!

kevingadd · on Feb 27, 2022

For a while now there have been companies using GPUs to do parallel digital logic simulation for things like automated testing.

Taywee · on Feb 27, 2022

> MetalNES is by the author of NESticle

Thanks Shitman!

dr_dshiv · on Feb 26, 2022

Absolutely sick. Chip community is so rad.

gary_0 · on Feb 26, 2022

Reminds me of GateBoy, which was posted a while back: https://news.ycombinator.com/item?id=28396927, https://github.com/aappleby/MetroBoy

GateBoy runs surprisingly fast for a gate-level simulation.

quanto · on Feb 27, 2022

What does it mean to be a transistor-level simulation in this context?

I used to work with circuit simulators to design circuits (discrete and integrated), and there are many transistor models (mathematical models) depending on the use case. In spirit, not too unlike DFT for the Schrödinger equation in quantum mechanics.

BHSPitMonkey · on Feb 27, 2022

Most game emulators are looking at the instructions from a ROM file and then implementing those CPU instructions in code. Usually with those approaches there are inconsistencies due to the original game console's quirks (or bugs/flaws) that show up during gameplay, making the emulation not completely true to the original hardware.

In this context it means emulating "perfectly" by actually simulating the hardware in full instead of taking the aforementioned approach.
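To make the contrast concrete, here is a minimal sketch of the instruction-level approach described above. The `run` function and its register handling are hypothetical and are not taken from any real emulator; only three real 6502 opcodes are handled (LDA immediate, TAX, INX), and status flags are left out entirely. That kind of omission is exactly where instruction-level emulators can drift from the hardware.

```python
# Hypothetical instruction-level interpreter sketch (not MetalNES code).
# Handles three real 6502 opcodes: LDA #imm (0xA9), TAX (0xAA), INX (0xE8).
# Status flags and cycle timing are deliberately omitted.
def run(program):
    a = x = 0   # accumulator and X register
    pc = 0      # program counter, here just an index into the byte string
    while pc < len(program):
        opcode = program[pc]
        if opcode == 0xA9:      # LDA #imm: load the next byte into A
            a = program[pc + 1]
            pc += 2
        elif opcode == 0xAA:    # TAX: copy A into X
            x = a
            pc += 1
        elif opcode == 0xE8:    # INX: increment X, wrapping at 8 bits
            x = (x + 1) & 0xFF
            pc += 1
        else:
            raise ValueError(f"unhandled opcode {opcode:#04x}")
    return a, x

# LDA #$FF; TAX; INX
a, x = run(bytes([0xA9, 0xFF, 0xAA, 0xE8]))
```

Each opcode is one branch of behavior written by hand. A transistor-level simulator has no such branches; the behavior falls out of the simulated circuit.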

quanto · on Feb 27, 2022

So, in this context, transistor-level simulation does not mean actually simulating transistors (voltages, currents, doping, etc)?

Volker_W · on March 1, 2022

I would guess "transistor-level simulation" means you simulate a NAND gate by doing something like:

bool c = !(a && b)
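For comparison, the sketch below puts the one-line gate model next to a very simplified switch-level model of the same NAND, where each transistor is a switch and the output is whatever the conducting network connects it to. This is an illustration under assumptions, not a description of how MetalNES works: the names `settle`, `NAND_TRANSISTORS`, and the node labels are hypothetical, the circuit is an idealized NMOS NAND (pull-up on the output, two transistors in series to ground), and a path to ground is assumed to always beat the pull-up. A real switch-level simulator would also iterate until the whole netlist is stable.

```python
from itertools import product

GND = "gnd"  # hypothetical name for the ground node

def nand_gate(a: bool, b: bool) -> bool:
    """Gate-level model: one boolean expression per gate."""
    return not (a and b)

def settle(transistors, pullups, inputs):
    """Resolve pulled-up nodes of a tiny NMOS network in a single pass.

    transistors: list of (gate, node1, node2); conducts when gate is high.
    pullups: nodes that float high unless connected to ground.
    inputs: externally driven nodes mapped to booleans.
    """
    level = dict(inputs)
    level[GND] = False
    # Connect the two channel nodes of every transistor whose gate is high.
    adjacent = {}
    for gate, n1, n2 in transistors:
        if level.get(gate, False):
            adjacent.setdefault(n1, set()).add(n2)
            adjacent.setdefault(n2, set()).add(n1)
    for node in pullups:
        # Flood-fill the group of nodes electrically connected to this one.
        seen, stack = {node}, [node]
        while stack:
            for nxt in adjacent.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        # Assumption: any path to ground wins over the pull-up.
        level[node] = GND not in seen
    return level

# Idealized NMOS NAND: out -[a]- mid -[b]- gnd, with a pull-up on out.
NAND_TRANSISTORS = [("a", "out", "mid"), ("b", "mid", GND)]

def nand_switch(a: bool, b: bool) -> bool:
    return settle(NAND_TRANSISTORS, {"out"}, {"a": a, "b": b})["out"]

# Both models agree on the full truth table.
for a, b in product([False, True], repeat=2):
    assert nand_switch(a, b) == nand_gate(a, b)
```

Neither model touches voltages, currents, or doping; both are digital abstractions, just at different granularity.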

JoeAltmaier · on March 1, 2022

Maybe saying 'gate-level' would be more accurate

SLWW · on Feb 27, 2022

I love this man's work, even if getting it to run at 100% speed might see me an old man; if not me, then my children.

theogravity · on Feb 26, 2022

Would be really cool to see a video of this in action.

danaris · on Feb 26, 2022

I don't really pretend to understand much of what I'm seeing, but I've recorded a short video of it running and uploaded it to my server [0].

From what I can tell, this 3-minute video shows the process of generating 2 and a bit frames.

[0] http://topazgryphon.org/metalNES.mp4

Scuds · on Feb 26, 2022

go lil' electron beam, go!

9000 cycles/s / 1.79 MHz = 0.5% of realtime. The only time you ever see rendering a line at a time is single stepping through an emulator.

I wonder if this is how FPGA developers working in VHDL/Verilog see the world? https://www.patreon.com/posts/37038491 here's a short clip of the VRAM of a PSX rendering about half of a completed frame of Final Fantasy 7's battle screen.

loeg · on Feb 26, 2022

~9000 cycles/sec (displayed in the video), wow! For comparison, the NES ran at 1.8 MHz, or about 200x faster. From the number of frames drawn and time, I'm guessing this simulator is actually averaging much lower than 9 kHz.
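The numbers in this subthread can be checked with a few lines of arithmetic. The sketch below assumes the NTSC CPU clock (about 1.79 MHz) and frame rate (about 60.1 Hz), and it assumes the "cycles" shown in the video are CPU cycles, which the thread does not confirm. The frame-based estimate uses the "2 frames in roughly 3 minutes" report from elsewhere in the thread, so it is rough at best.

```python
NES_CPU_HZ = 1_789_773      # NTSC NES CPU clock, about 1.79 MHz
NTSC_FPS = 60.0988          # NTSC NES frame rate
SIM_CYCLES_PER_S = 9_000    # rate displayed in the video, per the thread

# Fraction of realtime implied by the displayed cycle rate.
fraction = SIM_CYCLES_PER_S / NES_CPU_HZ    # about 0.005, i.e. 0.5%
slowdown = NES_CPU_HZ / SIM_CYCLES_PER_S    # about 200x

# Effective rate implied by "2 frames in roughly 3 minutes".
# Assumption: the displayed cycles are CPU cycles, and the 3 minutes
# contain nothing but frame generation.
cycles_per_frame = NES_CPU_HZ / NTSC_FPS
effective_cycles_per_s = 2 * cycles_per_frame / 180

print(f"displayed rate: {fraction:.2%} of realtime, {slowdown:.0f}x slowdown")
print(f"frame-based estimate: {effective_cycles_per_s:.0f} cycles/s")
```

Under those assumptions the frame-based estimate comes out at a few hundred cycles per second, which is in line with the guess that the average is well below 9 kHz.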

71a54xd · on Feb 26, 2022

Super curious about your personal server setup, mostly how you're serving content directly to the open web?

TaylorAlexander · on Feb 27, 2022

I mean you can always use a server on digital ocean or something. That’s how I serve web sites. It’s still just a Linux machine and you can throw whatever content you want on it.

hedgewitch · on Feb 26, 2022

That it would. I don't have access to a Mac or time to get it building elsewhere, so a video would be great.

aappleby · on Feb 27, 2022

Niiiice.

-Austin, GateBoy guy.

DarylZero · on Feb 26, 2022

is this allowed for speedruns

teaearlgraycold · on Feb 27, 2022

I think it'd be considered TAS if you report your time by frame count. Slowing down an emulator is a TAS.

danaris · on Feb 26, 2022

Allowed? Maybe, maybe not.

Helpful? Extremely not.

I just ran this on my high-end M1 Max, and it rendered 2 frames in roughly 3 minutes.

Hardware-accurate emulation is, generally speaking, not fast emulation.

rl3 · on Feb 26, 2022

Hardware-accurate emulation in software is typically not fast. FPGAs however can achieve low-level emulation with a high degree of hardware accuracy:

https://github.com/MiSTer-devel/Main_MiSTer/wiki

stormbrew · on Feb 27, 2022

I mean, aside from the question of whether fpgas can truly accurately simulate everything at the transistor level, the truth is that most fpga emulation cores for this sort of thing don't. They are, afaik, still mostly 'just' cycle-level accurate emulations built off the same reverse engineering that accurate software emulators are. They can do it more efficiently for sure, but that doesn't mean it's worth it for them to go much farther.

Like, for an extreme example there's no world in which the in-development PSX mister core, running on a de10 nano, is particularly more accurate than a software emulator. The chips involved have too many transistors to completely simulate. But this will be true to some degree for everything above some complexity threshold. Maybe the Atari 2600 core is perfectly accurate.

I wouldn't be surprised if the most accurate software emulators are more accurate than the most accurate fpga cores, especially for anything newer than the NES, just because that's where most of the original work involved happens.

This idea that FPGAs are inherently more accurate seems to be rooted in marketing FUD from a company called Analogue.

philistine · on Feb 27, 2022

I’ll 100% give you the marketing spin, but FPGAs provide an element of accuracy with frame delay. If you plug in a controller and use a CRT, there are no more lost frames than on the original hardware.

That’s accuracy beyond what an emulator can deliver on top of our Babel’s tower of a stack on Windows.

stormbrew · on Feb 27, 2022

That's not because of the fpga though. You can also plug a CRT into a PC. You can even chase the beam if you're willing to go to a low enough level. People generally don't, but this is a limitation-born advantage. There's no OS (sort of) to get in your way, and why would you put one there if it's not helping you?

(I say sort of because there usually is a system that could be described as an OS running somewhere on the fpga board, to load the fpga bitstreams, if nothing else, but also sometimes to supplement the fpga's abilities)

philistine · on March 9, 2022

You will find projects that have zero-input lag with FPGA based emulators right now. You will not find PC-based emulators with zero-input lag at all. That this is the reality has no effect on whether a PC could do it or not.

monocasa · on Feb 26, 2022

The 6502 unfortunately relies on some effects that can't be described in FPGA-synthesizable logic. You might get a big speedup over the extremely simplified SPICE-style simulation you have to do, and that might still get you to real time, but it's not the slam dunk you might think.

russdill · on Feb 26, 2022

Sounds interesting. Any citations where I can read more about those effects?

monocasa · on Feb 26, 2022

You can sort of divine them from how the undocumented instructions work.

https://www.pagetable.com/?p=39

It more or less falls under true classic high-Z buses with multiple drivers.

You have three options I can see

1) treat it as a SPICE simulation that aggressively optimizes to synthesizable logic in most places other than the problem parts.

2) hand-replace the problem parts with synthesizable logic, sort of like how Nvidia replaces graphics shaders with hand-written replacements in their drivers

3) just throw the whole thing into a SPICE simulation

tenebrisalietum · on Feb 27, 2022

I think the decrement relies on open bus behavior - the 6502 uses it to cheat and set a value to FF in the same cycle, which is then sent through the same path as increment or add, effectively adding -1 to a value.
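Whatever the exact bus mechanism is, the arithmetic part of that claim is easy to verify: in 8-bit two's complement, adding 0xFF is the same as subtracting 1. A minimal check, with `add8` as a hypothetical helper standing in for an 8-bit adder:

```python
def add8(a: int, b: int) -> int:
    """8-bit add with the carry out discarded, like an 8-bit adder path."""
    return (a + b) & 0xFF

# Adding 0xFF equals decrementing, for every possible 8-bit value.
for value in range(256):
    assert add8(value, 0xFF) == (value - 1) & 0xFF

# Examples: 0x10 becomes 0x0F, and 0x00 wraps around to 0xFF.
decremented = add8(0x10, 0xFF)
wrapped = add8(0x00, 0xFF)
```

This says nothing about how the 6502 gets 0xFF onto the adder input; it only shows why an adder alone is enough to decrement.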

comex · on Feb 26, 2022

Out of curiosity, what effects are you talking about?

monocasa · on Feb 26, 2022

The internal buses are basically true high-Z buses with multiple drivers and latches hanging off of them.
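A rough sketch of what a bus like that looks like in a software model. Everything here is an assumption made for illustration: `resolve_bus` and the `Z` marker are hypothetical names, an undriven bus is assumed to hold its previous value, and when several drivers are active a driven 0 is assumed to win on each bit (wired-AND). Real silicon, and real simulators, may resolve contention differently.

```python
Z = None  # marker for a driver that is in high impedance (not driving)

def resolve_bus(drivers, previous):
    """Resolve an 8-bit bus shared by several tri-state drivers.

    drivers: list of 8-bit values, or Z for a driver that is switched off.
    previous: last value on the bus, kept when nobody drives it
              (assumption: the bus holds its charge between cycles).
    """
    active = [d for d in drivers if d is not Z]
    if not active:
        return previous
    value = 0xFF
    for d in active:
        value &= d  # assumption: on each bit, any driven 0 wins
    return value

# One driver: the bus simply takes its value.
single = resolve_bus([Z, 0x42, Z], previous=0x00)

# Two drivers at once: the result is the bitwise AND of both.
contended = resolve_bus([0xF0, 0x3C, Z], previous=0x00)

# No drivers: the bus keeps whatever was there.
floating = resolve_bus([Z, Z], previous=0xAB)
```

The point of the sketch is that a plain multiplexer cannot express the contended and floating cases, which is why a raw netlist does not map directly onto synthesizable logic without some hand-written handling.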

russdill · on Feb 26, 2022

That shouldn't be a problem to implement on an FPGA with a tiny bit of extra logic.

monocasa · on Feb 27, 2022

Totally agreed, but you can't just throw the netlist at an FPGA and expect the synthesizer to be happy, is all I'm saying.

taneq · on Feb 27, 2022

Is it still really emulation at that stage? Or is it more like a reimplementation?

lanewinfield · on Feb 27, 2022

yes, they're called slowruns

beprogrammed · on Feb 27, 2022

A beautiful small codebase from which to run little experiments.

aliswe · on Feb 26, 2022

iirc this is what they said would never be possible with any computer, whatever the specs (at real life speed).

kamray23 · on Feb 27, 2022

It won't run at real-time speeds. Way, way slower. You could put this on an FPGA though, if you port it to one. Then it would run in real time. I doubt it will run exactly as it is, given the limitations an FPGA imposes. The logic should be fine though.

cubefox · on Feb 27, 2022

Not to be picky here, but silicon is not a metal, but a metalloid. MetalloidNES sounds great in my opinion.

Quindecillion · on Feb 27, 2022

I would get a little kick out of playing Metroid on that.