MOnSter 6502: a working transistor-scale replica of the classic MOS 6502 (monster6502.com)
234 points by harporoeder 6 months ago | 86 comments



Related. Others?

Complete working transistor-scale replica of the classic MOS6502 microprocessor - https://news.ycombinator.com/item?id=33841901 - Dec 2022 (38 comments)

MOnSter 6502 - https://news.ycombinator.com/item?id=26507525 - March 2021 (31 comments)

The MOnSter 6502: transistor-scale replica of classic MOS 6502 microprocessor - https://news.ycombinator.com/item?id=17969472 - Sept 2018 (81 comments)

A working, transistor-scale replica of the MOS 6502 microprocessor - https://news.ycombinator.com/item?id=14386413 - May 2017 (44 comments)

The MOnSter 6502 - https://news.ycombinator.com/item?id=11703596 - May 2016 (74 comments)


I love this art/engineering project. The whole concept of extra-large unintegrated circuits is just so amusing.

The team behind this project sells kits for two classic ICs, the 741 op-amp [1] and the 555 timer [2]. Sadly, they've said the Monster6502 is too big and complex to make a practical kit.

[1] https://shop.evilmadscientist.com/tinykitlist/762 [2] https://shop.evilmadscientist.com/tinykitlist/652


There's also the fact that they were recently acquired [1], unfortunately (IMO).

I've said it before: I would totally buy one of those just to hang on my wall. It's a truly awesome project, and I love blinkenlights!

[1] https://www.evilmadscientist.com/2024/bantam-tools/


The Apple A8X, found in the iPad Air 2, contains about 3 billion transistors. (This is comparable to the number of transistors in modern desktop computer CPUs as well.) At the scale of the MOnSter 6502, that would take about 885,000 square feet (over 20 acres or 8 hectares) — an area about 940 ft (286 m) square.


The Apple A8X is 10 years old.

The current iPhone processor (Apple A17) contains 19 billion transistors. So now we have 126 acres in a pocket.

https://en.wikipedia.org/wiki/Apple_A17
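A quick back-of-envelope check of both figures (everything here is taken from the numbers quoted above; nothing is measured), as a minimal Python sketch:

    FT2_PER_ACRE = 43_560
    M2_PER_FT2 = 0.092903

    a8x_transistors = 3e9        # quoted for the Apple A8X
    a17_transistors = 19e9       # quoted for the Apple A17
    a8x_area_ft2 = 885_000       # quoted MOnSter-scale area for the A8X

    side_ft = a8x_area_ft2 ** 0.5
    print(f"A8X at MOnSter scale: {a8x_area_ft2 / FT2_PER_ACRE:.0f} acres, "
          f"{a8x_area_ft2 * M2_PER_FT2 / 10_000:.1f} ha, "
          f"~{side_ft:.0f} ft ({side_ft * 0.3048:.0f} m) on a side")

    # Scale linearly by transistor count; this prints ~129 acres, versus the
    # ~126 above, which comes from scaling the already rounded 20-acre figure.
    a17_area_ft2 = a8x_area_ft2 * a17_transistors / a8x_transistors
    print(f"A17 at MOnSter scale: ~{a17_area_ft2 / FT2_PER_ACRE:.0f} acres")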


I wonder how slow it would have to run and how many kilowatts it would use.


It takes about a full microsecond for a signal to go from one edge of the "wafer" to the other, so that constrains your cycle time considerably.
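Rough sanity check of that edge-to-edge figure, assuming the ~286 m (940 ft) side length from upthread and signals travelling at roughly half to two-thirds of the speed of light on PCB traces (both assumptions, not measurements):

    C = 3.0e8      # speed of light in vacuum, m/s
    side_m = 286   # assumed edge length of the scaled-up "wafer"

    for velocity_factor in (0.5, 0.66):
        t_us = side_m / (velocity_factor * C) * 1e6
        print(f"v = {velocity_factor:.2f}c -> ~{t_us:.1f} us edge to edge")

That lands at roughly 1.4-1.9 microseconds, so "about a full microsecond" is the right order of magnitude, and that's before any gate delays along the way.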


Unintegrated or, perhaps… Dis-integrated?


The correct term is discrete circuit.

We used to have circuits, and when they were integrated onto one piece of silicon, we called them "integrated circuits", while the old ones became "discrete circuits".


I have been kicking around the idea of doing this to something like a NES and then selling them preassembled. But then I remember I can't solder at all, and it is decently expensive to do.


You don't have to solder it yourself; some PCB providers can do the assembly for you (they even source the components).


Wait, so if I understand this correctly (what I don't know about hardware is a lot!), is this basically a "macroprocessor"? As in, it functions the same way as a vanilla 6502, but done at a larger scale?

Actually pretty cool that it's able to get 1/20th the speed when it's this big!


It's a discrete CPU. This is how processors were made before microprocessors. Before 1970, we had discrete component mainframe machines, in fact running at multi-MHz speeds (e.g. CDC 6600).

https://en.wikipedia.org/wiki/CDC_6600

60 bit, 10 MHz processor, introduced in 1964.

Article says that was 10 times faster than other contemporary machines; but that still leaves those at around a solid megahertz.

Discrete logic circuits can be fast because they have a lot of area over which to dissipate heat and can guzzle current.


The main issue isn't the CPU as such, it's memory. You can reasonably make a usable CPU using relays even, but a non-trivial amount of memory takes up sooo much space.


I learned this in Minecraft. I'd wired up TTLs in college for intro to EE for CS majors, which was fun. But RAM was a beast. I guess I'll build another tower and upgrade to a full 1k.

RAM takes so much space.


The Virtual Circuit Board game/sandbox on Steam offers an external memory, presumably for this reason.

I remember a Minecraft mod from way back that let you do "modular" redstone, going "into" blocks to build redstone inside them, which you could then reuse as a single block. I wonder if it's been updated.


Discrete logic circuits can be fast because they have a lot of area over which to dissipate heat and can guzzle current.

Moreover, most of them used either ECL or TTL, which are logic families much faster than P/N/CMOS. This being a transistor-level replica of an NMOS 6502, it can't go much faster.


> Discrete logic circuits can be fast because they have a lot of area over which to dissipate heat and can guzzle current.

... but not too fast either, because the long trace lengths act as antennas, so you'll end up in EMI-emissions-regulation hell on one side and signal integrity trouble on the other. On top of that come losses from inductive and capacitive coupling.


Yep. The bummer is that the Apple ][ relies on the clock speed of the 6502 to handle some of its functionality (most notably video handling), so you can’t just run a cable to the CPU socket on the Apple motherboard. I wonder if any of the other classic 6502 machines (Commodore, Atari, Acorn, BBC, etc.) would work?


Again, I know basically nothing about hardware, so this is likely a dumb question: could you conceivably get an oscillator/clock chip that also operates at 1/20th the speed, wire that into the Apple ][, and then wire in this giant CPU?


It is well worth understanding how these older computers worked. All the components were doing double or triple duty, so that together all the tasks like refreshing memory, reading/writing memory, generating video, generating audio, and I/O worked. This involved interleaving bus access and similar "tricks".

There is an excellent series of talks at CCC titled "The Ultimate X talk" [0] where you are shown exactly how. I recommend The Ultimate Acorn Archimedes Talk [1], which introduced the ARM chip but also shows how shared bus access and optimisation was the core principle of performant designs at the time.

[0] https://media.ccc.de/search/?q=the+ultimate+talk

[1] https://media.ccc.de/v/36c3-10703-the_ultimate_acorn_archime...


Due to the way the video is interleaved with CPU cycles, you'd have to do something like divide the 1 MHz CPU clock by 20, and then when writing carefully only put data on the data bus for one of those 20 cycles. And the disk accesses would surely fail due to the CPU not keeping up, among other problems.

http://www.apple-iigs.info/doc/fichiers/TheappleIIcircuitdes...


Thinking more and more about it in this thread (yes, I'm procrastinating doing something else), I'm beginning to come to the conclusion that the approach with the fewest modifications to the computer itself is to replace the DRAM with SRAM, and to "just" slow the video circuit down as well (including color burst, horizontal scan, etc.), but to attach a custom buffered display instead of a regular monitor to the computer.

However, I wouldn't be surprised in the slightest if some of the other integrated circuits didn't work at the slower speed for some reason. (Filters etc. you'd just replace/retune.)


Unfortunately, no. The CPU would work, but memory retention would be unreliable (because the DRAM is now going ~20x too long between refreshes), and the composite video output would fail to display on a monitor (because it's the wrong frequency).


Well now I'm wondering; it might not directly run an Apple ][, but what about the Apple ][ compatibles like the Laser 128 [1]? It doesn't use the same ROM as Apple, they licensed Microsoft Basic directly from Microsoft and then added the missing functionality themselves.

I guess someone would have to see how similar it is to a vanilla Apple ][.

[1] https://en.wikipedia.org/wiki/Laser_128


1.023 MHz is basically the Apple ]['s fine structure constant. Change that, even by a tiny amount, and the whole universe comes unglued. Video, disk access, you name it.

There were accelerator boards, but they ran at integer multiples of the original clock speed for that reason.


Again, the answer is almost certainly no, unfortunately.

A Laser 128 isn't that fundamentally different from an actual Apple II. In fact, from a wider point of view, they are "almost identical".

DRAM needing timely refresh is universal to DRAM itself: it's because DRAM is literally made of a transistor and a capacitor holding your bit, and if you don't access it every so often, that capacitor eventually discharges. The most realistic path to "slow DRAM" would be to replace the DRAM with SRAM altogether, which is effectively made of transistors only and holds its content as long as power is applied. It's not that hard to replace DRAM with SRAM: apart from not needing refresh, SRAM behaves similarly to DRAM. It's just more expensive (and usually faster, which is also not a drawback), and the DRAM refresh, while not doing anything useful, wouldn't hurt.

But as for video output, that's just a consequence of how the Apple II's (and most microcomputers' of that era, actually) video output works: by literally controlling the electron gun in the CRT, with only some analogue electronics in between. The monitor, however, wants to move the electron beam at a certain speed (or, if you had a fancy Multisync monitor, a set or range of speeds). But even if that weren't the case, the phosphor on the screen glows only for a short amount of time, so draw your picture too slowly and you will only see small parts of it.

The latter can be demonstrated by just pointing a camera at a CRT. In all likelihood, the camera's frequency will be slightly off from the CRT's frequency, so that you capture different parts of the beam's path at each frame, simulating what it would look like if you slowed down the beam itself: https://pub.mdpi-res.com/sensors/sensors-22-01871/article_de...

Would it be possible to decouple the video output circuit from the microprocessor's timing? Absolutely! And this is in fact what virtually every modern computer has been doing for many decades now: even where you can still find analogue VGA with a CRT, the VGA card has its own clock that is completely independent of the CPU's clock. But that's just not how simple home computers of that era usually operated.


From what I can tell from documents like [1], the Laser 128 was very similar (possibly even identical) to an Apple II on a hardware level. It'd have the same issues running at a higher/lower clock speed than it was designed for.

[1]: http://www.applelogic.org/files/VTECHL128.pdf


You’d need to run an NTSC screen at 3Hz


You’ll need additional circuitry to "upscan" your 3Hz display to 60Hz.

There was a moment many years ago when I considered constructing a mechanical 6502. I quickly learned that transistors are fabricated in silicon for economic reasons.


The fun path would be to modify the display to run at 3 Hz (implying a much slower horizontal scan rate as well), but due to the phosphor coating's short persistence, you'd at best see only a small sliver of the full picture at any time, exactly like when you point a video camera at a CRT whose frequency doesn't match up exactly.

But if you took a long-exposure photo of the CRT, it would likely work! (EDIT: Might fry your phosphor, though... as for any other components, like filters or parts that may get damaged by the now only slowly changing currents, I just assume you replaced or retuned those as part of the conversion process.)


There are CRTs with long-persistence phosphors too; they found applications in radar and the like:

https://en.wikipedia.org/wiki/Phosphor#Standard_phosphor_typ...

https://tubetime.us/index.php/2015/10/31/crt-phosphor-video/

http://www.labguysworld.com/crt_phosphor_research.pdf

The P10 phosphor there claims "Persistence from several seconds to several months"(!)


This is true of the Commodore 64, as the VIC-II chip's timing is closely coupled to the NTSC color burst (note the joke below about "up-scanning to 60 Hz").

Also disk I/O and RS-232 are bit-banged.


> Also disk I/O and RS-232 are bit-banged.

You could easily slow that down as well.

The big problem is really the CRT (and possibly other ICs in the computer that don't like the extreme slowdown). As noted below, if you went through the arduous process of replacing and retuning components to be able to slow the beam down, your phosphor coating is still too fast, and probably doesn't like having the beam passing only slowly over it. (You thought regular burn-in was bad!)

However, replace the CRT with something else, like an LCD with some extra buffering circuit, and it could work. Yes, the color burst will be at a much lower frequency as well, but just demodulate color at this lower frequency, then.


Yeah, I’m thinking that the solution would be something that takes the analog monitor signal and uses it to control a digital display.


Still, you could compile your own custom Microsoft BASIC like Ben Eater did on YouTube recently.


The software is really not an issue. I saw the MOnSter 6502 live running BASIC, and it was MS BASIC as far as I recall. It just wasn't simply hooked up as the CPU of an Apple II or similar.


> Does it run at the full speed of an original 6502 chip?

> No; it's relatively slow. The MOnSter 6502 runs at about 1/20th the speed of the original, thanks to the much larger capacitance of the design. The maximum reliable clock rate is around 50 kHz. The primary limit to the clock speed is the gate capacitance of the MOSFETs that we are using, which is much larger than the capacitance of the MOSFETs on an original 6502 die.

Now I'm curious but in way over my head.

Could the speed be improved by using MOSFETs with better capacitance?

Can the full 1.023 MHz be attained by throwing money at it, or are there physical limitations at that scale?


I'm not an expert but a hobbyist. Take what I say with a grain of salt.

> Could the speed be improved by using MOSFETs with better capacitance?

Not using MOSFETs would make it faster, in theory. MOSFETs have a delay time and are considered slow. But it appears they want to be true to how the original was made. There definitely exist MOSFETs indistinguishable from dust - I made the mistake of designing with a few at one point. So probably, yes.

Another thing I noticed is that the spacing between the components on the board is quite large. They might have done that to replicate the original layout, or maybe I'm just not seeing it correctly. But everything - including the traces on the board - has some capacitance.

> Can the full 1.023 Mhz be attained by throwing money at it or are there physical limitations at that scale?

Hard to answer without measuring, though I'd imagine it's possible. 1 MHz isn't that fast, but if you have high capacitances or the devices don't switch fast enough (e.g. MOSFETs), then your signals are going to get muddy.

EDIT: to be clear, not ragging on the project. This stuff is beyond cool.


This is basically a second generation computer, whose heyday was in the first half of the 1960s – after vacuum tubes, but before the first integrated circuits.

It makes me wonder, what transistor family would give you the best performance for this (very unusual nowadays) use case? Obviously not MOSFETs.

From what I understand, germanium transistors were most common for 2nd generation computers, because (at the time) silicon transistor technology wasn't sufficiently mature to be used for that application.


Maybe try ECL?


ECL would definitely be faster. Switching time is around 1 ns, which is a solid order of magnitude faster than NMOS.

But it would need something like 2 to 4 times the surface area, and would make a nice room heater.


The real question is why would you. This is an educational project that visually demonstrates how a CPU operates. It could clock at 0.5 Hz for that purpose.


If it ran at full speed, you could run tons of original 6502 software, which almost universally relies on cycle-accurate timing to generate video output.


Why do anything?


> Not using MOSFETs would make it faster, in theory.

My understanding is that the MOS 6502 was MOS, so MOSFET. Not using MOSFETs would make it less of a replica.

Implementing the replica in an integrated circuit instead of discrete transistors would lower capacitance.

The delay time is a consequence of the capacitance.


I'm aware of all of this (I even mentioned the MOSFET vs not in my comment), I was just answering the OPs questions.


I assume they use garden variety MOSFETs like 2N7002, which cost a few pennies when bought in bulk.

The gate capacitance is 20 pF. One logic gate output is typically driving several inputs, so the typical load is several times this capacitance. One could check the schematics for the maximum fan-out in the real circuit. The longest, foot-long traces on the PCB add another few tens of pF. So the capacitance seen by an output of a logic gate will probably be in the range of 20-100 pF most of the time. But it is the slowest gates which determine the speed of the whole circuit, so the worst case could be much worse.

These transistors themselves are extremely fast and can switch on and off in about 3 ns. Their channel resistance is in the single-ohm range even at lowish gate voltages, so the RC time constant is also very small: 100 pF × 3 Ω = 0.3 ns. Thus this is also not an issue. The slowness of the circuit comes from elsewhere.

First, from the resistors which pull the logic gate outputs to the supply voltage. There is one for each logic gate, so a total of one thousand of these, and in the first approximation half of them conduct at any given time. To keep power consumption low, these resistors must have relatively high resistance values. If we limit the current for the logic to 1 A total, that is 2 mA for each of the 500 resistors. Therefore the resistors must be 2.5 kOhm for 5 V supply voltage. Thus charging of the nodes of the circuit through these 2.5 kOhm resistors is almost a thousand times slower than discharging the same nodes through the 3 Ohm channel resistance of a fully open transistor. Still, this is not too bad -- on the order of 250 ns of delay per gate.

But the frequency for the whole circuit is determined by the delay not through just one, but through many gates in series, plus there are probably parts of the circuit topology that are not optimal for implementation with discrete transistors, where the delay does not follow from such simple reasoning as above.

The speed can probably be easily increased by an order of magnitude if one were willing to spend 50W instead of 5W for the circuit. Beyond that, one would want a CPU circuit designed specifically for implementation in discrete transistors, not a discrete element copy of a monolithic chip.
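Here is the same arithmetic as a minimal Python sketch. The part values and the 1 A current budget are the assumptions already used above; the critical-path length and settling factor at the end are pure guesses, not anything measured on the real board:

    c_load = 100e-12   # assumed worst-ish node capacitance: gates plus traces, F
    r_on = 3.0         # assumed on-resistance of a 2N7002-class MOSFET, ohms
    v_supply = 5.0
    i_total = 1.0      # assumed current budget for the ~500 conducting pull-ups
    n_pullups_on = 500

    r_pullup = v_supply / (i_total / n_pullups_on)   # 2.5 kOhm per pull-up
    tau_fall = r_on * c_load                         # discharge through the FET
    tau_rise = r_pullup * c_load                     # charge through the pull-up

    print(f"pull-up: {r_pullup:.0f} ohm")
    print(f"fall RC: {tau_fall * 1e9:.1f} ns, rise RC: {tau_rise * 1e9:.0f} ns")

    # Guesses: ~10 gates in the critical path, ~3 RC per node to settle cleanly.
    gates, settle = 10, 3
    f_max = 1 / (gates * settle * tau_rise)
    print(f"rough clock ceiling: ~{f_max / 1e3:.0f} kHz")

Under these guesses you get the 0.3 ns vs. 250 ns time constants from above and a ceiling in the low hundreds of kHz, which is at least in the same ballpark as the ~50 kHz the builders report once worse fan-out and longer paths are accounted for.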


> The maximum reliable clock rate is around 50 kHz.

For those too lazy to do the math: the original did 1 MHz to 3 MHz.

Note the units (which I missed the first time): kHz vs. MHz.

https://en.wikipedia.org/wiki/MOS_Technology_6502


It's unfortunate they don't have any details on what components were used, so it's difficult to say how much room for improvement there still is.


I signed up for the mailing list when it was first on HN a few years ago, but haven't seen many signs of progress. Website still says mid 2023 for a launch.

6502 was my first CPU so I'm totally down to buy one, even though it's going to be pretty expensive. I'd be pleased if they could make them for less than $5k.


I'd even just buy the traces for the PCB!


It's mentioned on the page, but the emulation at http://www.visual6502.org/JSSim/expert.html is worth a visit.


I'd be pretty happy with a piece of visual art that recreates something like that


Wonder if it's feasible, and how much work it would be, to record a run of e.g. Super Mario Land and play the processor instructions back on this thing hanging on the wall. Would be a nice conversation piece.


Plus a small LCD in the frame that displays the level and a progress bar for that level.


Dot matrix display. Progress bar is underneath the level indicator. Plus a presence sensor. Not PIR. And an RTC in/on whatever plays the instructions, to always seamlessly pick up on the loop.


I wonder if it's correct right down to the errata. Were there undocumented instructions on the 6502? If so, I wonder if it supports those.


Yes, there are some undocumented instructions [1]. They say it's a transistor-for-transistor replica so it should be 100% compatible with them.

[1] https://www.masswerk.at/nowgobang/2021/6502-illegal-opcodes


Great project! Ben Eater needs this in his videos; he could show us how the registers change.


Beautiful work, and a great choice of processor to emulate. As I understand it, they also tried to get close to the layout of the actual chip, not just make a functional equivalent circuit.

The next time I have some time, I would love to do a discrete RISC-V, but I know how big a project this sort of thing is.


I compared the sizes of several processors [1], including these RISC-V cores: Glacial, SERV, PicoRV32, VexRiscv and Darkriscv. Compiled to NAND gates, the results were 2063, 3595, 34463, 49214 and 50424 gates respectively. Multiply these numbers by 4 to know how many transistors a CMOS implementation would need.

I also implemented them in a bunch of different FPGAs, though that is less relevant here.

Glacial is an 8-bit processor optimized to emulate a 32-bit RISC-V, and SERV is a serial implementation of a 32-bit RISC-V. They save transistors at the cost of many clocks per instruction.

There are several projects that implement RISC-V using TTL chips, but I have not heard of one using individual transistors.

[1] https://www.mdpi.com/2079-9292/13/4/781
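For reference, the "multiply by 4" rule of thumb applied to the gate counts above (assuming 4 transistors per 2-input CMOS NAND gate):

    nand_gates = {
        "Glacial":    2063,
        "SERV":       3595,
        "PicoRV32":  34463,
        "VexRiscv":  49214,
        "Darkriscv": 50424,
    }

    for core, gates in nand_gates.items():
        print(f"{core:>9}: {gates:6d} NAND gates -> ~{4 * gates:7d} transistors")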


I have my own core design that I would probably use, and you definitely don't need to use CMOS logic with your discretes. You can also get away with a lot more than Verilog lets you do, like using shared buses with internal 3-state drivers. That stuff cuts down your transistor count a lot.


Note that QERV [0] exists: a slightly larger but much faster 4-bits-at-a-time variant of SERV.

0. https://github.com/olofk/qerv


Projects like this are a great way to appreciate how much can fit into an IC.

Reminds me of the Megaprocessor project.

https://www.megaprocessor.com/index.html


4 trillion transistor wafer-scale integration

https://www.cerebras.net/blog/cerebras-cs3


This is the Japanese Cloisonné of circuit boards. Beautiful, well-done.


I'm always slightly disappointed that I can't just Buy Now one of these passion projects for my wall. Truly wonderful art.


Take my money please!


It's a shame not everything is for sale.


Fun project, but fairly old news at this point.


Old to you but not to a lot of people! https://xkcd.com/1053/


A macroprocessor.


This would be a great piece of art for a very niche space, like my office =)

The fact that it actually works is just bonus.


See also: https://gigatron.io/

Gigatron is a retro 8-bit computer with no microprocessor. Instead, its processing is done through thoughtful application of "TTL" logic chips. It's another project that helps you "see" the processor's internals. Kits aren't for sale anymore through the official website, but unpopulated PCBs are available, and the components shouldn't be hard to source.


I would like to see a mechanical version of something like this.


I'd love to have a Z80 version.


I want a MOnSter 6581


what nm node is this?

when would it have been state of the art?


One millimeter is 1,000,000 nanometers. So that's roughly the 1950s. :)


maybe around 1956 or so

I think it's a question of mm rather than nm!



And had much larger "nodes", given that those tubes are much, much larger and farther apart than the SMT transistors on this 6502 replica. It's not even close.

But I think IBM's SLT from the 1960s might come close: https://en.wikipedia.org/wiki/Solid_Logic_Technology

Those little square "chips" are not actually integrated circuits, but effectively small PCBs with surface mounted discrete components (including transistors).


The early PDPs used discrete CPUs, so they'd probably be the closest examples. The 12-bit PDP-5 is the smallest those CPUs went, though, and it isn't hideously more complex than an 8-bit CPU.


I thought about CPUs with either discrete transistors, or with e.g. 74xx ICs. But the former intuitively seemed too big to me (because I was assuming they use large transistor packages instead of SMT), and the latter too small (because a single 74xx logic chip can pack a lot of transistors).

So this “hybrid” of having SMT components in a can by IBM seemed close.

Happy to learn more!


Before my time, but isn't 1956 too early for silicon transistors?

I'd have put it around 1963. Before DTL chips, introduced in 1964. Non-military TTL came along in 1966.



