Hacker News
Nintendo 64 Architecture – A Practical Analysis (copetti.org)
330 points by bottle2 on May 18, 2020 | 72 comments



I gave a talk not too long ago about running Rust on a Nintendo 64, with the slide deck written in Rust and running on an N64.

https://twitter.com/DebugSteven/status/1054903603985559553

So I guess what I'm saying is that I have pretty hands on knowledge with the system and would be happy to answer any questions I can.

One thing I'll throw out there is that one of the biggest limitations of the N64 (its 4KB texture memory) gets called a texture cache a lot, but that's a misnomer. It's a manually managed piece of memory, and (IMO) the system would have been much better off if it were actually a cache, rather than having to load an entire texture in regardless of what was being sampled. Nowhere in Nintendo's literature that I've seen do they call it a cache either. The crazy hacks that Rare did to subdivide their geometry on texture boundaries wouldn't be necessary, for instance. I'd maybe even take a 2KB cache over a 4KB chunk of manually managed memory.

One other aside is that I think the system still has tons of unlocked potential. So much of unlocking its power seems to center around memory bank utilization. Switching which page of DRAM is active within a bank is expensive in terms of latency, but it seems like if you allocate your memory in 1MB bank chunks you can get around a lot of the slow-memory limitations that developers complained about at the time. I don't blame the developers back then; they were coming from the SNES, with its single-cycle access to RAM, to the N64, which had a very deep, very modern memory hierarchy, with everything that means for your code. The industry as a whole didn't really catch on until about halfway through the PS2's life cycle. But applying some of those PS2 techniques back, the system really purrs when you dedicate a 1MB bank to each streaming source or destination. I can't wait to see what crazy stuff happens when the demoscene folk really start to get their hands dirty with it.
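The bank-allocation idea above can be sketched roughly like this (a toy Rust sketch; `BankAllocator`, the helper names, and the base address are all invented for illustration, and real N64 code would target actual RDRAM addresses):

```rust
// Toy sketch: carve a flat RDRAM region into 1MB-aligned, bank-sized chunks
// so each streaming source/destination gets its own bank and avoids page
// switches within a bank.
const BANK_SIZE: usize = 1 << 20; // 1MB bank, per the comment above

/// Round an address up to the next 1MB bank boundary.
fn align_to_bank(addr: usize) -> usize {
    (addr + BANK_SIZE - 1) & !(BANK_SIZE - 1)
}

/// A toy bump allocator that hands out whole banks.
struct BankAllocator {
    next: usize,
    end: usize,
}

impl BankAllocator {
    fn new(base: usize, len: usize) -> Self {
        Self { next: align_to_bank(base), end: base + len }
    }

    /// Reserve one full bank for a streaming buffer, or None if exhausted.
    fn alloc_bank(&mut self) -> Option<usize> {
        if self.next + BANK_SIZE <= self.end {
            let bank = self.next;
            self.next += BANK_SIZE;
            Some(bank)
        } else {
            None
        }
    }
}

fn main() {
    // 4MB of memory starting at a toy base address (already 1MB-aligned).
    let mut alloc = BankAllocator::new(0x0010_0000, 4 * BANK_SIZE);
    let a = alloc.alloc_bank().unwrap();
    let b = alloc.alloc_bank().unwrap();
    assert_eq!(a % BANK_SIZE, 0);
    assert_eq!(b - a, BANK_SIZE);
    println!("bank A at {:#x}, bank B at {:#x}", a, b);
}
```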


> The crazy hacks that Rare did to subdivide their geometry on texture boundaries wouldn't be necessary for instance.

I would love to know more about this. Is their texture format tomfoolery written up somewhere?


I don't think anything is written up, but you can see it if you put Project64 into wireframe mode. The clearest I've seen it is in the intro cutscene of Conker's Bad Fur Day, where the camera slowly backs up from Conker's throne.

So TMEM is only 4KB. If you want mipmapping, that eats half of it, so you're down to 2KB in practice. That leaves you with enough room for one 32x32 16BPP texture at most. So I think what they did in some cases was to take a mesh with a larger texture and run it through a processor to tessellate the mesh on smaller texture-block boundaries in UV space, so they could render all the geometry for one texture block at once, then swap to the next subtexture and render all its geometry. That gives you an apparently larger texture than you could fit into TMEM, and is one reason (of many) why their games look so good. They also might not have had tooling for that and just brute-forced it by hand; I can't tell just from looking at the wireframe.
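The preprocessing step described above might look something like this in spirit (a hypothetical Rust sketch; a real tool would also split triangles that straddle a tile boundary, which is skipped here):

```rust
use std::collections::BTreeMap;

// Hypothetical sketch: bucket triangles by which 32x32-texel subtexture
// their UVs fall into, so each bucket can be drawn after a single TMEM load
// of that tile. (With mipmapping, 4KB of TMEM leaves ~2KB usable: one
// 32x32 16bpp tile is exactly 2048 bytes.)
const TILE: u32 = 32;

#[derive(Clone, Copy)]
struct Tri {
    // Integer texel coordinates for the three vertices.
    uvs: [(u32, u32); 3],
}

/// Map each triangle to the (tile_x, tile_y) of its first vertex,
/// assuming no triangle straddles a tile boundary.
fn bucket_by_tile(tris: &[Tri]) -> BTreeMap<(u32, u32), Vec<Tri>> {
    let mut buckets: BTreeMap<(u32, u32), Vec<Tri>> = BTreeMap::new();
    for &t in tris {
        let (u, v) = t.uvs[0];
        buckets.entry((u / TILE, v / TILE)).or_default().push(t);
    }
    buckets
}

fn main() {
    let tris = vec![
        Tri { uvs: [(5, 5), (10, 10), (5, 10)] },   // tile (0,0)
        Tri { uvs: [(40, 5), (45, 10), (40, 10)] }, // tile (1,0)
        Tri { uvs: [(6, 6), (12, 12), (6, 12)] },   // tile (0,0)
    ];
    let buckets = bucket_by_tile(&tris);
    // One TMEM load per distinct tile instead of one per triangle.
    assert_eq!(buckets.len(), 2);
    println!("{} tile loads for {} triangles", buckets.len(), tris.len());
}
```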


This is relevant, but not the same system: https://www.youtube.com/watch?v=izxXGuVL21o


Amazing. Is there somewhere I can watch this talk?


It wasn't recorded unfortunately.

The source behind the text of the most recent slides starts here if you'd like to read it: https://github.com/monocasa/n64-slides-apr/blob/caae25f397c5...

I should probably save off pictures and put it into a pdf or something where it could be more easily accessible.


I’d certainly watch it if you made a video by giving the talk again, if you still have your notes :)


If you made a video of the slides with a recording of your voice over it and posted it on Youtube, I’m sure you’d get a great response


Would it be possible to provide a ROM image that would run in an emulator?


I uploaded the build into a github release on the repo just now.

https://github.com/monocasa/n64-slides-apr/releases/tag/v0.1...

I've only tested it with cen64 and real hardware with a 64drive.

A is forward through the slides, B is back.


That's certainly an interesting way to organize your slides!


One of my favourite little easter eggs in Goldeneye is that during the Silo mission, two of the satellite components you have to steal are the N64's RSP and RDP:

https://twitter.com/007goldeneye25/status/109829415491907174...


Nintendo also included their own Easter eggs:

The rabbit you need to catch beneath the castle in Super Mario 64 is named "MIPS". :)

https://www.mariowiki.com/MIPS


> In the remake Super Mario 64 DS, MIPS does not make a reappearance, instead being replaced by the rabbits scattered throughout the castle for each character to find.

Aw, they should have brought it back and renamed it ARM…


For someone just starting to learn about computer architecture at a low level (rather embarrassing for someone who's been in the industry for over a decade), this is a really interesting read, and it's helping concepts like pipelining and caching to gel.



Hardware used to be so exotic!


> Hardware used to be so exotic!

It might circle back around to being exotic again, if FPGAs take off and more work is done on highly-specialized task-specific hardware, as opposed to building the fastest general-purpose chips you can and beating problems to death with sheer speed.


It might not be visible from a regular programmer's perspective (i.e. you don't need to read https://people.freebsd.org/~lstewart/articles/cpumemory.pdf to write a website), but FPGAs are absolutely everywhere already. E.g. Xilinx (the Intel of the FPGA market... as opposed to Intel, who are the AMD of the FPGA market) have a market cap of $21Bn, which is quite a lot for a company that is both fairly obscure to investors and doesn't sell directly to consumers.

The issue with this is that FPGAs are expensive to buy as a hobbyist, expensive to buy at small quantities unless you can negotiate with Avnet or similar, and involve using software from the past if you want to program them.

Beyond FPGAs, ASICs are fairly common in very high margin/high volume electronics.


Note that Intel bought the AMD of the FPGA market to enter it ;)


Is this a thing that's happening? I remember how cool I thought FPGAs were from my college CE classes. Seems intuitive to me that you would want to specialize the hardware once you have a process figured out. The fact that FPGAs are upgradable makes it even more of a no brainer to me.


Well, there's a cost benefit: cheaper FPGAs may not necessarily be as performant as others for some tasks, there's still a gate budget to deal with, and I personally am not sure whether the FPGA world is one where you have full control over what you do with the chip when you sell it in a commercial product.

I know Gigabyte back in the mid-2000s made a PCIe card that let you use DDR as a disk drive; for the original run they actually used a Xilinx Spartan FPGA since it was a smaller production run.


FPGAs can sometimes be made one-time-programmable using anti-fuses that basically disable the JTAG interface, or can be set to disable the interface if the FPGA detects attempted tampering. Most of the time, an FPGA is going to be set to OTP to prevent competitors from stealing source code, for applications where upgrading the firmware via JTAG is not necessary.

The FPGA also can have a massive unique key that allows the designer to create a whitelist algorithm that only lets certain unique IDs run that firmware. Other options involve setting a time limit for how long the firmware will run, disabling certain features, or totally bricking that FPGA forever. Spartans have this feature but it would still allow for someone to build a new design that doesn't check the device ID.

Additionally, the bitstream can be encrypted so that if a field update is necessary, or the firmware is stored in a separate flash chip, someone can't reverse engineer it.

Overall, the more you pay, the more security features there are available. An example secure design would disable JTAG pins permanently and have a microprocessor inside that would handle new updates. The processor would authenticate any new encrypted firmware before programming the internal flash.


> Is this a thing that's happening?

Sure. The Afterburner Card that Apple is selling to accelerate ProRes decoding on the newest Mac Pro is an FPGA. They’ve opened it up to third parties like Red as well iirc.


Xilinx even has its Zynq line, which combines ARM Cortex processors with programmable logic, so you can partition the design according to what each portion is best suited for.


I have seen fpgas a bunch in low volume hardware that needs something better than a microcontroller.


It still is, especially when we spend time playing with what our GPUs allow for nowadays.

My main use for C and C++ is actually their shading-language-derived dialects.


Someone made a pin-compatible Controller Pak with FRAM that doesn't require a battery. Has layout files too.

http://www.qwertymodo.com/hardware-projects/n64/nonvolatile-...


Brilliant article ! I love these in-depth "old-hardware" articles with my morning coffee. (It's only 8am here in South Africa)

Lol this stood out for me (although probably just semantics...):

"Reality Co-Processor running at 62.5 MHz." What big dreams we had back then, to try and "simulate reality" with only 62.5 MHz :)

Well done author... well done Nintendo !


Amiga's Agnus and its AGA successor ran at around 7-35 MHz tops. :)


On the cost-saving point, I always understood that the limited 4KB texture memory led to lots of games having really muddy, blurry textures. How much more would 8KB or 16KB have cost? It seems a small cost saving that had a pretty large negative impact.


Regarding simply having 8/16/32KB: the cache was integrated with the chip itself; it wasn't RAM that lived on the motherboard. So adding more would have required a larger chip.

It was a multifaceted problem, and was ultimately a design flaw/oversight rather than someone saying "I think 4KB is enough memory to store all the textures." The problem is less that the cache was small, and more that Nintendo's bet on how awesome RDRAM and a unified memory architecture would be didn't pan out.

Problem #1: There was no dedicated video memory. All RAM on the N64 was shared RAM, so framerates tanked if you didn't have most of your stuff in cache. Keep in mind the framebuffer also lived in this unified memory area, so the video chip was already very noisy on the memory bus.

Problem #2: The unified shared system RAM was RDRAM, not SDRAM. And the latency on RDRAM is absolutely terrible, so the already expensive cost of using RAM was compounded.

If the N64 had done what the PlayStation and Saturn did and just had dedicated video/system RAM, and made this RAM relatively low-latency SDRAM instead of relatively high-latency RDRAM, this 4KB limitation wouldn't have mattered.


Yeah, but RDRAM gave them one big benefit: even with the price premium of RDRAM back then, it was the cheapest route to getting 500MB/sec of memory bandwidth.

The PlayStation by comparison used EDO (based on eyeballing the pictures on Wikipedia, the baseline was 70ns/60ns for CPU and video memory). But its main bus was under 133MB/sec, and the fastest it could read from CD was 300KB/sec.

EDO memory would have kneecapped the N64 from a memory bandwidth standpoint; the cartridge bus alone is over 200MB/sec. SDRAM -might- have done the job but may have wound up being more expensive; PC-66 (we were at the infancy of SDR in 1996) would have meant a PCB with 8 chips laid out for the parallel bus. To be frank, I'm not sure Nintendo could even have gotten such a configuration (i.e. 8 512KB PC-66 chips).

RDRAM was definitely a design compromise, but in retrospect I understand its use in keeping overall costs down.

Dedicated video RAM would have been a better option, however, but I think that was another cost issue.


It's interesting to consider the N64 in contrast to the first Voodoo card. One is a console and the other is an add-in card, but both launched in 1996, with underlying 3D technology from SGI. The Voodoo used EDO RAM, 8 chips of 256k x 16b, 2 MB for the z+framebuffer and 2 MB for the texture memory. With 50 MHz EDO RAM, wired in a 64-bit bus, the peak texture bandwidth would be 400 MB/s, dedicated to that purpose alone; in contrast the N64 RDRAM was main memory and had to service other functions.
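For the curious, here's the back-of-envelope math behind those figures (the RDRAM numbers are commonly quoted approximations, not exact measurements):

```rust
// Sketch of the peak-bandwidth arithmetic for the two designs above.

/// Peak bandwidth in MB/s for a bus `bits` wide at `mtransfers` million
/// transfers per second.
fn peak_mb_per_s(bits: u32, mtransfers: u32) -> u32 {
    bits * mtransfers / 8
}

fn main() {
    // Voodoo texture memory: 64-bit EDO bus at 50 MHz, dedicated to textures.
    let voodoo = peak_mb_per_s(64, 50);
    assert_eq!(voodoo, 400);

    // N64 RDRAM: 9-bit bus at 500 MT/s (double data rate at 250 MHz),
    // shared by the CPU, RSP, RDP, and the framebuffer.
    let rdram = peak_mb_per_s(9, 500);
    assert_eq!(rdram, 562);

    println!("Voodoo: {} MB/s dedicated; RDRAM: ~{} MB/s shared", voodoo, rdram);
}
```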

The N64 launched at $200, the Voodoo at $300. Of course you would additionally need a computer to run the Voodoo, but I remember thinking the N64 was already way too expensive back in the day. It would've been even more expensive to support a 64-bit memory bus.


If I recall correctly, later games actually packed higher-throughput RAM into the cartridges to work around the latency of the onboard RAM.


They used uncompressed textures on the cart, in ROM (not on-cart RAM). Normally a game would store compressed textures in ROM and decompress them into RAM. It was a solution with significant tradeoffs, though.

#1 It was still slower than the cache.

#2 You were still using the single shared bus. You would still be using cycles which contribute to data stalls elsewhere in the system.

#3 ROM was expensive. N64 games were typically in the ballpark of $10 more expensive than Playstation or Saturn games because of the manufacturing expense.

#4 I don't fully understand why, but it was all or nothing. You couldn't have uncompressed textures in ROM but also gain the benefit of the cache. Maybe the cache invalidation was poor or something. I wish I knew more.

Later games were more likely to go this route because ROM was cheaper. (Moore's Law and all that)


So the TMEM wasn't a cache, but manually managed memory split into eight 512-byte banks that had to be loaded from the RDP's command list stream. That's half the problem.

Additionally, the TMEM could only be loaded from RDRAM, not directly from the cartridge. I think the RDP's DMA master is only connected to the RDRAM slave port and not the main system's bus matrix.

So going back to it: a lot of the time, games would store compressed data using a simple algorithm that could run out of the CPU's cache. Then the scheme looks like

* Cart->RDRAM DMA of compressed texture

* CPU decompresses the texture into another RDRAM bank, so it can be considered an RDRAM->RDRAM transfer. Sometimes the RSP handles this instead. I'm not sure if you could load straight out of RSP DMEM to avoid another bounce to RDRAM; I don't think XBUS works that way, but I could be wrong.

* RDRAM->TMEM DMA of uncompressed texture

Interestingly, games with more advanced texturing schemes like Indiana Jones tended to use uncompressed textures. They did this to avoid the decompression step and its bandwidth. At that point it's just staging the texture with the cart's DMA, and slurping that into TMEM without any other processors eating bandwidth in between.
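The three-step path above, as a toy sketch (all the function names here are invented for illustration; real code would issue PI DMA, CPU/RSP decompression, and RDP load commands, and the "compression" stand-in is a trivial run-length scheme):

```rust
// Hypothetical model of the cart -> RDRAM -> TMEM texture path.

fn dma_cart_to_rdram(cart_rom: &[u8], rdram: &mut Vec<u8>) {
    // Step 1: PI DMA copies the compressed texture from cart ROM into RDRAM.
    rdram.extend_from_slice(cart_rom);
}

fn decompress_in_rdram(compressed: &[u8]) -> Vec<u8> {
    // Step 2: CPU (or RSP) decompresses into another RDRAM bank. Stand-in
    // scheme: each (count, byte) pair expands to `count` copies of `byte`.
    let mut out = Vec::new();
    for pair in compressed.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    out
}

fn dma_rdram_to_tmem(texels: &[u8], tmem: &mut [u8; 4096]) {
    // Step 3: an RDP load command pulls the uncompressed texels into TMEM.
    tmem[..texels.len()].copy_from_slice(texels);
}

fn main() {
    let cart_rom = [4u8, 0xAA, 2, 0xBB]; // "compressed" texture in ROM
    let mut rdram = Vec::new();
    dma_cart_to_rdram(&cart_rom, &mut rdram);
    let texels = decompress_in_rdram(&rdram);
    let mut tmem = [0u8; 4096];
    dma_rdram_to_tmem(&texels, &mut tmem);
    assert_eq!(&tmem[..6], &[0xAA, 0xAA, 0xAA, 0xAA, 0xBB, 0xBB]);
    println!("staged {} texels into TMEM", texels.len());
}
```

The uncompressed-texture approach described above simply skips step 2, at the cost of larger (and pricier) ROMs.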


Not quite.

Larger cartridges (i.e. 32/64MByte) gave them space in ROM to play with tiled textures. Usually this -did- also involve use of the 4MB RDRAM upgrade.


Interestingly, I feel the opposite. The inferior texture memory means that many games just used Gouraud shading (or sprites) instead, which was quite clean. At the time, I felt that N64 games looked cleaner than PS games, mostly because of the textures, and I also think they have aged (slightly) better. At least those games that didn't aim for realism - Super Mario 64 and Ocarina of Time are still quite playable, while the looks of GoldenEye 007 are probably more of a hurdle.


I agree. I remember thinking at the time that N64 games tended to look significantly better than PS1 games simply because the N64 was capable of shading/smoothing and the PS1 wasn't. The pixellated PS1 textures are a big part of the reason many games on the system have aged so poorly even as compared to the 8 and 16-bit sprite-based consoles of the 1980s and early 1990s[1].

[1] There are other factors of course: SD TV resolutions are low so when you hook a 32-bit 90s console up to a large modern flat panel HD or 4K TV with games running at a resolution of around 320 x 240 the pixels are MASSIVE. In addition polygon counts are low, draw distances are often low, and so it goes on. Depending on your setup games can look considerably worse on a modern TV than they would have done on more modestly sized 90s CRT screens. To be clear I'm talking about SD TVs here, not CRT monitors, which could support much higher resolutions and would therefore suffer from some of the same problems as modern flat panels in terms of making the graphics look too sharp.


Texture memory (TMEM) is very special and fast, so it tends to be expensive stuff, at least back then, and it's balanced together with the rest of the architecture, like the bandwidth the RDP has, DMA copies from main mem -> TMEM, and so on. It would have changed quite a lot of the underlying architecture to increase it to 8KB, and might not have been worth it. You couldn't have increased texture resolution by simply increasing memory without changing a significant part of the architecture elsewhere (e.g. RDP fill rates). Most textures used during draws on the N64 don't even fill up the 4KB texture memory.


It was a pretty bog-standard 6T SRAM block. You can see a die shot here, and I don't think going to 8K would have been a huge deal: http://www.hotchips.org/wp-content/uploads/hc_archives/hc09/...

That being said, going to a real cache rather than a block of manually managed memory would, I think, have been the better design choice. Most textures didn't fill up the memory because of the practicalities of managing that memory: the need for a double-buffer scheme to load one block while rendering from another, the fact that you have to eat the whole cost of the full texture's load before you can render from it, etc.
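The double-buffer scheme mentioned above, as a toy sketch (a hypothetical model, not real RDP programming): split TMEM's 4KB into two 2KB halves, load the next tile into one half while the rasterizer samples from the other, then swap.

```rust
// Toy model of double-buffered TMEM usage.
const TMEM_BYTES: usize = 4096;
const HALF: usize = TMEM_BYTES / 2;

struct Tmem {
    mem: [u8; TMEM_BYTES],
    front: usize, // offset currently being sampled from: 0 or HALF
}

impl Tmem {
    fn new() -> Self {
        Self { mem: [0; TMEM_BYTES], front: 0 }
    }

    /// Load the next texture tile into the back half while the front half
    /// is (conceptually) still being rendered from.
    fn load_back(&mut self, tile: &[u8]) {
        let back = HALF - self.front; // the half not in use
        self.mem[back..back + tile.len()].copy_from_slice(tile);
    }

    /// Swap halves once the load completes and the previous draw retires.
    fn swap(&mut self) {
        self.front = HALF - self.front;
    }
}

fn main() {
    let mut tmem = Tmem::new();
    tmem.load_back(&[1u8; 16]); // stage tile B while tile A is in use
    tmem.swap();                // now render from tile B
    assert_eq!(tmem.front, HALF);
    assert_eq!(tmem.mem[HALF], 1);
    println!("front half now at offset {}", tmem.front);
}
```

A real cache would make all this bookkeeping (and the full-tile load stall) disappear, which is the point being made above.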


Well, I would guess twice as much and four times as much, respectively. I mean, RAM is cheap today, but if you wanted 128GB of it today it'd still run you like $700. I can't pretend to know what this particular type of RAM cost back in the day, but given how relatively cutting-edge the machine was, I can only guess it was not particularly inexpensive...


They should have released a Nintendo 64 Plus sometime before the release of the Gamecube that would have doubled its RAM to provide hi-res textures.


That... doesn't work. The games which were released for the N64 were all designed to run within the limits of the hardware. Adding more memory later wouldn't give existing games higher resolution textures.

Plus, adding more texture memory would have meant respinning the RCP silicon. That would have been a significant expense for marginal returns.

Besides, Nintendo had already sold a memory upgrade in the form of the Expansion Pak. A second memory upgrade which couldn't even be installed into existing consoles would have been a very difficult sell, and could have soured customers on their future consoles. ("Why buy a Game Cube when they'll just release a Game Cube Plus next year?")


It only works today because devs can update games post-release, and there is enough speed to use abstracted libraries rather than designing a game to the letter of the spec sheet.


they technically did.

the https://en.wikipedia.org/wiki/Nintendo_64_accessories#Expans... raised main memory from 4MB to 8MB.

it was required for one of the Zelda games, but also helped with graphics for some games if it was inserted.

seemed to be a decent success.


DK64 required the expansion pak due to a bug (it was intended to work in either 4MB or 8MB modes and they couldn't fix a crashing bug in the 4MB version) and then Rare spent an obscene amount of money shipping an expansion pak with every copy of the game. That was probably responsible for a large fraction of the units shipped, with most of the rest accounted for by Zelda (which did not include the pak even though it was required).


The truth of this claim is disputed:

https://www.reddit.com/r/n64/comments/ft63zn/dk64_memory_lea...

The lead artist claims they'd always been planning to make use of it. There's compelling evidence of a memory leak or similar issue, since the released version of the game apparently crashes if you leave it running for some amount of time over 10 hours. (That wasn't usually a problem in practice, except for very long speedruns, until the game was released on Wii U Virtual Console and people started using save states, which don't reset the timer the way turning the game off and on does.)


I've heard the lead programmer confirm it, and I trust him more than the lead artist.

I _think_ it was on the documentary clips shipped with Rare Replay, but I'm not 100% on that.


The 64DD was an expansion to the 64: https://en.wikipedia.org/wiki/64DD . It was a commercial flop.


It may have done better if Nintendo hadn't taken so long to develop and release it.

I bought one on eBay, and it's a quirky device. It's essentially a glorified floppy drive with proprietary disks, and it makes loud, albeit amusing scanning noises.

Every Nintendo console prior to the Wii had some sort of "expansion" or upgrade capability for future peripherals to be added. Hardware revisions took the place of this model.


Man, it sounds like it was in development hell. I wish we could get somebody to talk about what it was like developing the 64DD and what problems they ran into. Probably won't ever happen, but I can dream.


We called it the 64 CD. We were kids, so we were waiting every day for an N64 CD kit.


They made an Expansion Pak that allows 8MB of RAM. I just replaced mine because it was bad; I couldn't play Perfect Dark without it.


I don't think I would have gotten as interested in gaming if it weren't for Nintendo's decisions with the N64. The native bilinear texture filtering, Z-buffer, and subpixel model rendering make such an enormous difference to me I'd have found the Playstation unplayable.


To me, the N64 continued the video game "tradition" I knew from the 16-bit era: colorful, fast-paced, responsive. PSX games by contrast were dour, slow-paced, and controlled poorly because of Sony's initial failure to consider that digital control wouldn't work well in a 3D environment.


The N64's blurry textures and washed-out color initially had me thinking my friend's TV was broken. It was only Rare and Factor 5 games on the N64 that really impressed me. Of course they could do nothing for the blurry picture quality; that was a hardware thing, which has since been fixed with HDMI hardware mods that offer de-blurring.


The N64's control sticks used digital rotary encoders.


The PSX only had D-pad style controls at release in 1995 -- no joysticks! The Dual Analog controller wasn't released until 1997.

(I think you're getting distracted by the terms "digital" and "analog". It may help to think about this in terms of discrete and continuous inputs instead.)


True, but there's a rather radical difference between an input device with nine possible states like a d-pad and one with 65,536 possible states like the N64 control stick.
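The arithmetic behind those counts (note that the 65,536 figure assumes a full signed byte per axis; the physical stick only reaches roughly +/-80 per axis):

```rust
// D-pad vs. analog stick input-state counts.

fn dpad_states() -> u32 {
    3 * 3 // each axis is -1, 0, or +1
}

fn stick_states() -> u32 {
    256 * 256 // one signed byte per axis, as reported by the controller
}

fn main() {
    assert_eq!(dpad_states(), 9);
    assert_eq!(stick_states(), 65_536);
    println!("{} d-pad states vs {} stick states", dpad_states(), stick_states());
}
```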


The design of this web page leaves a bit to be desired. The tabbed boxes with the light grey background on the white background of the website itself are pretty easy to miss as you scroll forever.


Did Rambus have a dossier of compromising photos on industry executives in the mid 90's? Why did Intel and Nintendo go all in on such an expensive and technically inferior memory technology? The latency is such a killer, especially if you're on an architecture with really deep pipelines (ahem, P4).


Rambus meant Nintendo shipped a 5th-gen console on a 2-layer PCB with only 4 big ICs. TWO layers! That was a huge cost saving. Compare that to the Sega Saturn, with a total of something like 144 bits of various memory buses divided into multiple memory banks over multiple memory chips.


How many layers did the Saturn's PCB have?


At least 4, like the PlayStation. Compare this nightmare: https://mcretro.net/sega-saturn-photographing-has-begun/

to https://bitbuilt.net/forums/index.php?threads/trimming-your-... (reverse-engineered PCB layout here: https://gmanmodz.com/2020/01/30/2020-the-year-of-n64-again/)

One was thrown together by committee with some functional goal in mind; the other was designed top to bottom with huge influence from process engineers. A simplified layout and reduced component count/variety means less time in pick-and-place, faster optical alignment, faster optical inspection, and less opportunity for process flaws.


Very interesting, thanks.


Some more hardware info, including die shots and development info (Verilog simulation etc.), here: https://www.eevblog.com/forum/blog/eevblog-491-nintendo-64-g...


I get the nostalgia angle -- but why are we discussing a console that's >20 years old?


Because it's interesting and technical. Despite the name 'Hacker News', I don't know if you've noticed but not a lot of what's on here is necessarily news — just plenty of food for thought for engineers, people in comp sci, etc.


The linked article is a very in-depth, clearly presented breakdown and analysis of the hardware. It's well written and educational. Works like this get hella karma on this site.


Sometimes, it's an important step in the path of hardware evolution; sometimes, it's a path not followed that we can learn from; either way, 20-year-old hardware is still relevant.



