> The idea of implementing a CPU core inside an FPGA is not new, of course.
Indeed. I took a computer engineering class in undergrad. The capstone project was implementing from scratch a multi-pipeline RISC CPU (including the ALU and a very basic L1 cache) in Verilog that we flashed to FPGAs that we then programmed to play checkers using hand-compiled C. The FPGA was easier to flash and debug than the Spartan-6 mentioned in TFA but was significantly more expensive as well.
It was a brutal class, but it totally demystified computers and the whole industry in a way that made me feel like I really could understand the "whole" stack. Nothing scares me any more, and I no longer fear "magic" in software or hardware.
I had a similar course, using VHDL on Spartan-3E kits (which I still have sitting in a box 14 years later). Our professor gave us three C programs -- along with the input that would be entered and the expected output -- and we had to implement everything to make them work: CPU, VGA output (with "font" definition), keyboard input, and translation of the C programs to our CPU's instruction set.
That was a difficult yet extremely rewarding class. My wife, then girlfriend, still remembers that semester because she barely saw me.
IIRC, bonus points were given to the team with the highest clock speed. I didn't win, but I seem to remember mine being somewhere in the 18MHz range and the winner in the low to mid 20s.
> Nothing scares me any more, and I no longer fear "magic" in software or hardware.
You sound exactly like my professor for computer architecture. “Computers are not magic!” He mentioned this at least once during every lecture, and my experience was similar to yours.
I'd say the field effects on electrical flow are probably the weirdest part to look at, and they don't reach 'magic' levels.
The atoms below that are relatively straightforward, and them being made up of building blocks is fine I guess, increasingly irrelevant to the issue of a computer.
And going up everything starts to get very non-magical as you turn response curves into binary signals and then string gates together.
Yeah, for me it is the parasitic BJT that forms in a FET (cf. latch-up). Also that BJTs work in reverse-active mode to some extent, despite the emitter nominally being the injector of the carriers. Weird!
Also I never understood things like slew rates, noise, gain bandwidth. Too high up the stack.
Gain bandwidth is directly related with slew rate. Slew rate is mostly an effect of capacitances (gate capacitance in FETs for example.) Noise is mostly caused by thermal effects (atoms bumping randomly.)
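To make that relation concrete (my sketch of the textbook two-stage op-amp model, not something from the comment above): the gain-bandwidth product and the slew rate are both set by the same compensation capacitor, so they can't be chosen independently.

    % Assumed textbook model: g_m = input-pair transconductance,
    % I_tail = input-stage tail current, C_c = compensation capacitance.
    \mathrm{GBW} = \frac{g_m}{2\pi C_c}, \qquad
    \mathrm{SR} = \frac{I_{\mathrm{tail}}}{C_c}
    \quad\Rightarrow\quad
    \mathrm{SR} = 2\pi \cdot \mathrm{GBW} \cdot \frac{I_{\mathrm{tail}}}{g_m}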
There's also adjacent stacks... like being able to build your own computer out of crabs and rocks if you get stranded on a desert island. So that you can get them to write SOS in the sand and play pong while you wait.
Are there any resources online that would let me do a similarly deep study myself? I'm enjoying the famous "Nand to Tetris" course, but it uses a simplified architecture (no pipelining, no cache, no interrupts) and runs in an emulator instead of an FPGA.
Awesome base to start with! Lots of things in the next step though: PCIe, JTAG, differential signaling protocols, debuggers/monitors, UEFI firmware, cache coherency, TPM-type modules, linkers, loaders, vector instruction sets, sound codecs, DACs, GPUs, multi-channel SDRAM, MMUs, display protocols, device trees. Plus latency, throughput, and thermal management in any of the above in a real system.
As someone who started pretty high up the stack (Visual Basic!) and kept peeling layers away out of curiosity, I will say working with a CPLD/FPGA was really eye-opening. The latency difference between SRAM/DRAM and the "why" of pipelining made a bunch of disjointed approaches to optimization just drop into place.
I started with GW-BASIC, then QBasic, and moved up to "Visual Basic for DOS" (ncurses-style UI stuff, similar to Turbo Pascal IIRC -- I don't think many people even knew it existed since Win 3.11 was already big, but I loathed it and only switched to it because of... Trumpet Winsock for the internet!), but that did not stop me from playing with driving serial (COM) ports or parallel (LPT) ports with printer escape sequences. Not really FPGA-level low (not even close), but DOS-based stuff was really easy to start hacking up!
Leonard Tramiel once told me that the "world's record" for a production 6502 was around 25MHz before the smoke came out. This was in a lab at Commodore, and beer was probably involved (and may have been used to cool the chip).
Reminds me of when I worked at a computer shop in the Pentium 1 days.. Motherboards had these complex blocks of jumpers in those days to configure the speeds, there was no autodetecting the CPU.
One day I was working at a different branch, filling in for someone on sick leave, and I spent an hour trying to get a PC I had just built to boot reliably. Every time it crashed after a minute or so. I didn't get it; everything seemed fine.
Eventually it turned out the "cheat sheet" they had of the jumpers was upside down over there. Someone had copy/pasted the pictures so it was matching the orientation of how they usually had the PC on the desk, rather than upside-down as I was used to. But the text was the right way up. So the total thing looked the same as in our branch except it wasn't.
It was a square block so I hadn't noticed the orientation was different. Turned out I had the 100MHz Pentium configured for 180MHz. Oops. That wasn't even an officially supported speed of the motherboard, but the BIOS messages indicated this (which I only noticed afterwards).
As we didn't want to sell this CPU after the torture I had put it through, we decided to use it for a display box instead, and we tried to keep it running as long as possible by using compressed air cans upside down to blow dry ice on it :D It actually ran reliably until the can ran out. Only later I found out that liquid nitrogen extreme overclocking was actually a thing :D
My 486 (nominally 40MHz) had one of those jumper blocks, and it was right near the front corner of the motherboard. Right behind the floppy-drive opening in the case, which had the drives mounted in these little removable sleds. And I didn't use my floppy drive much, and besides, the floppy cable didn't seem to mind being hot-plugged as long as you weren't accessing the disk at the time and unplugged the power connector first.
So during a long download, I didn't need all 40 MHz screaming along (and heating up the chip to the point that it needed a cooling fan -- a COOLING FAN, can you imagine a CPU running so fast it couldn't cool itself on ambient air?), so I decided to see if the clock generator jumpers were hot-pluggable.
Lo and behold, they were! I could reach in and seamlessly downclock the CPU to 8MHz (which was just one jumper-cap different than the 40MHz setting), which was still plenty to service the UART FIFO interrupt. Unplug the CPU fan too, which made the machine silent. Turn the monitor off, kick back in my chair, and take a catnap. The Telemate terminal software would play a little tune when a download finished, which would wake me up, I'd turn the monitor back on, open a DOS prompt, start unzipping the file, and then reach in and clock the CPU back up so the pkunzip process would finish in a timely manner.
It would do 50MHz but the upper half of RAM would disappear, so there weren't a lot of workloads appropriate for that configuration....
> and heating up the chip to the point that it needed a cooling fan -- a COOLING FAN, can you imagine a CPU running so fast it couldn't cool itself on ambient air?
This feels super nitpicky but I'm curious about your setup and if you're either remembering the clock speed wrong or if the fan was actually completely extraneous, because in fact neither of the common 40 MHz 486 parts, the Cyrix Cx486DX40 or the AMD Am486DX40, required a fan. The Cyrix one came with a heatsink, which was a rarity at the time.
The first 486-class CPU that pretty much always ran with active cooling was the DX4/100. Even the DX2/66 could run fanless if you had half decent airflow.
I'm certain of the clock speed, but I think you're right that the manufacturers alleged they were fine without additional cooling. (They considered it a bad look, and actually said that chips sold with fans were likely overclocked chips intended for a lower speed bin, or otherwise graymarket.)
But consensus among everyone _but_ the manufacturers was that additional cooling couldn't hurt. (A representative opinion can be found in Upgrading And Repairing PCs, whatever edition was current at the time.) Running right at the top of Tcasemax wasn't good for longevity in terms of electromigration within the chip itself, nor for the capacitors and other components in the neighborhood. Thermal goop wasn't commonplace yet, but the little heatsinks and fans sold like hotcakes (har!) at the local computer shows. Plain aluminum heatsink, clear (polystyrene?) fan, with a holographic "CRYSTAL COOLER" sticker on top. I still see the fans around, but without the shiny sticker.
The Am486DX-40 was my favorite chip. With a VLB video card (Trident 9400CXi) that worked well on the 40MHz bus, its pure pixel-pushing power ran rings around 33MHz-bus systems regardless of their core clock, and that included the P-75. I later got the impression that I lucked out with that Trident card, as almost everyone else with a 40 or 50MHz VLB machine had tales of woe and flakiness.
Thanks for the reply! That is interesting and makes a lot of sense.
Yes, running a 40 or 50 MHz bus made a huge difference, especially if you could get VLB graphics running reliably on it. I'm into collecting and tinkering with 486-era machines for nostalgia's sake, and often I see things like DX2 or DX4 systems with plain cheapo 16-bit ISA graphics cards and think: such wasted potential...
I was pretty sure none of the 486 CPUs absolutely required a cooling fan (though I only had a DX2/66MHz): I remember the Pentium II being the first CPU I'd seen that absolutely required one (it came with one integrated on the second computer I assembled for myself), and I distinctly remember having trouble testing an AMD CPU a year or so later on a computer I was assembling for someone else because... well, it wouldn't even POST without a CPU cooler.
Luckily I did not fry it and after adding a cooler it worked just fine.
I don't remember if AMD or Cyrix CPUs were worse though.
There were cards that had a pot on the back. You could just dial up more speed, until your machine ran poorly, then back off just a little. Kind of crazy to think about!
The speed on my Apple FastChip is adjustable in real time too. It's neat to just dial a speed appropriate for the application at hand.
If you are interested in programming your Apple in assembly, you can ask nicely for your FastChip to include a 65816 processor. It's going to act like a 65802, due to hardware limitations, but otherwise yeah. You get the 16 bit instructions to use.
I've not had any compatibility trouble with mine, which is a 65816.
I am pretty sure I've had this on at least my 80286, and maybe even 80386 and 486DX2 (33/66MHz I think) too (though highly uncertain on the latter). Perhaps you only needed a case that would connect the turbo switch to the motherboard.
It was fun to turn it on for games that used timing-loops for frame rendering to make games twice as fast :)
Do I recall correctly that those turbo buttons would, counterintuitively, actually down-clock the CPU? For compatibility with software that had hard-coded timings or something?
The correct way to wire them is so that turbo "on" means full speed and turbo "off" means slowed down. Different motherboards implemented it differently. Usually downclocking the FSB or inserting waitstates for memory access.
They originated with "Turbo XT" class machines which ran an 8088 but at 8, 10 or 12 MHz -- faster than a real IBM PC/XT. Turbo on meant a faster machine, and turbo off meant 4.77 MHz -- fully compatible with timing-sensitive PC software.
Later, in the 386/486 whitebox PC era, some machines had the buttons wired wrong and now it's a meme that turbo made the computer go slower, but that was never true for systems built correctly.
Yes. Old software had timing loops and other delay constructs. For a while, that button was meaningful when running games intended for the original clock rates.
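For anyone who hasn't seen one: a minimal sketch of the kind of calibrated busy-wait loop old DOS-era code relied on. The constant here is made up; real programs tuned it for a 4.77 MHz 8088, which is exactly why a faster ("turbo") clock broke the timing.

    /* Hypothetical delay loop of the sort old games used for timing. */
    #include <stdio.h>

    #define LOOPS_PER_SECOND 100000L   /* hand-calibrated on the original machine */

    static void delay_seconds(long seconds)
    {
        volatile long i;               /* volatile so the loop isn't optimized away */
        for (i = 0; i < seconds * LOOPS_PER_SECOND; i++)
            ;                          /* burn cycles; wall time depends on CPU clock */
    }

    int main(void)
    {
        puts("waiting...");
        delay_seconds(1);              /* "1 second" at the original clock, far less when overclocked */
        puts("done");
        return 0;
    }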
I discovered this a few years ago playing Ultima IV on my Amiga 3000, which is 16MHz. It was impossible to play (and funny to look at) because the game was so sped up. All of the NPCs in the game, which normally just stand in place and move their arms about, were moving so fast it was bananas. You could barely see their arms; they were like hummingbirds.
The game was intended to be on a 7MHz stock machine pre-1990. And it was perfectly timed to that speed.
Seriously. The guy put in a good amount of time and delivered an Ultima with many modern sensibilities baked in. I've been playing it on my Apple and it's a lot of fun.
Great story. I remember those days well, I also remember finding every last byte of UMB, elaborate autoexec.bat files and figuring out IRQ assignments to get max functionality. Kids these days don't know how good they got it.
> With the introduction of CPUs which ran faster than the original 4.77 MHz Intel 8088 used in the IBM Personal Computer, programs which relied on the CPU's frequency for timing were executing faster than intended. Games in particular were often rendered unplayable. To provide some compatibility, the "turbo" button was added. Engaging turbo mode slows the system down to a state compatible with original 8086/8088 chips.
I never had such a button in my PCs (first one was a 386SX) but I did see it on other PCs and always wondered what it did... => today I finally found that out :P
The one I had I believe was a 386. The problem was that it was easy to accidentally bump the button and my Mom would complain the computer was running slow.
> As we didn't want to sell this CPU after the torture I had put it through we decided to use it for a display box instead
Why? What you did is basically a burn-in test. All manufacturers torture their hardware by locking it in a hot room for several days at max speed to see if it fails. The basic theory is that if it's able to survive the torture test, then it's less likely to fail once it's been sold to the customer. Parts for things like space missions go through even more severe torture tests, where they're bombarded by radiation and every horrible thing you can imagine -- and that actually makes the price go up!
If anything, something like Windows 98 would have run it hotter, probably because of it not using the CPU's HLT instruction, vs. Linux doing the right thing, resulting in a cooler CPU on average when running under Linux.
Microsoft's reasoning was the low quality of the countless low-end power supplies, and maybe the voltage regulator modules on mainboards, being 'unreasonably' stressed by load changes that fast.
I remember when I was using Windows 98, the media player [1] I was using shipped with a tray icon that had a menu item labeled 'Save power when CPU is idle' [2]. It did exactly that (the HLT thing). After ticking the menu item, the CPU just went cold.
At the Vintage Computer Fest last week Bill Mensch mentioned to the audience that no one ever hears about the 65C02 and 65C816’s use in defibrillators and pacemakers - life critical applications - unless he tells them!
Does anyone know of good write ups or explanations of what makes the 6502 so reliable and what competition it had in being chosen for medical applications?
Simpler is an advantage in that world, if you can understand the functioning of your device to the cycle level then you have a much better chance of delivering something that will work reliably.
One of the things that I loved about the Apple ][ was that it was possible for one person to completely understand everything about that computer from the hardware to the software. I've never had that level of complete understanding of any system I've used since.
Yep, similar experience here. My first computer was a Tandy / Radio Shack Color Computer. It had a 6809 processor (8/16-bit precursor to the 68000) @1.8MHz, 4k of RAM (upgradable to 64k), 16k or 24k ROM memory with a quite expansive MSFT Extended Basic Interpreter (supposedly the last ROM OS & BASIC that had assembler written by BillG himself).
I taught myself BASIC, assembler, graphics programming and game programming on that machine over a period of about four years of hacking around on it (including hand-commenting some significant chunks of the ROM). By the time I retired it for a shiny new Amiga 1000 in 1986 I'd upgraded it to 256k of bank switched RAM with a soldered-in hack board, added four floppy drives, various I/O boards and learned OS/9 (a UNIX-inspired multi-tasking, multi-user OS) and hacked in my own extensions to the ROM OS (including adding my own new commands and graphics modes to the BASIC interpreter).
It started out as a lot of trial and error but, on later reflection, ended up being a surprisingly thorough grounding in computer science from which to launch my career. That 6809 machine was also the last time I really felt like I was aware of everything happening in a computer from interrupts to registers to memory mapping down to the metal.
Yes, that was the beauty of the 8 bit era, and many people lost it without even knowing that they lost something very precious. The total control is a very nice feeling.
I'm not sure why "simple, understandable system design" would have to be synonymous with 8-bit computing. One of the most appealing things about new open hardware initiatives is how they bring this simplicity and surveyability together in what's otherwise a very modern design and context.
Seems every time someone applies that to hardware with a wider compute path, other complexity creeps in.
Would be interesting to make a 32 bit Apple 2 style computer. Include a ROM for a means to boot, and leave everything else simple, with some nice slots. Could be a great development / learning machine.
I've "built" such machines in FPGA; PicoRV32 core, hand made display processor, a bit of RAM, and a couple UARTs. It was fun and not that hard for me, a newbie to FPGA.
One of the bigger challenges is integrating peripherals. I got bogged down trying to do SD Card interfacing. There are off the shelf bits of IP from Xilinx, etc. you can use to do this, but that sort of defeats the purpose of the exercise.
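For what it's worth, the part that usually bogs people down is the SPI-mode init handshake rather than the block reads. Here's a rough, heavily simplified sketch of that sequence in C: spi_xfer, cs_low and cs_high are assumed helpers you'd wire to your own SPI core, and error handling, timeouts and SDv1 cards are ignored.

    /* Rough sketch of SD card init in SPI mode: dummy clocks, CMD0, CMD8,
     * then ACMD41 until the card leaves idle. Assumed helpers, no timeouts. */
    #include <stdint.h>

    extern uint8_t spi_xfer(uint8_t out);     /* assumed: clock one byte out, return byte read */
    extern void cs_low(void), cs_high(void);  /* assumed: card chip-select control */

    static uint8_t sd_cmd(uint8_t cmd, uint32_t arg, uint8_t crc)
    {
        uint8_t r1;
        spi_xfer(0x40 | cmd);
        spi_xfer(arg >> 24); spi_xfer(arg >> 16); spi_xfer(arg >> 8); spi_xfer(arg);
        spi_xfer(crc);                        /* only CMD0/CMD8 need a real CRC in SPI mode */
        do { r1 = spi_xfer(0xFF); } while (r1 & 0x80);   /* wait for R1 (top bit clears) */
        return r1;
    }

    int sd_init(void)
    {
        int i, r;
        cs_high();
        for (i = 0; i < 10; i++) spi_xfer(0xFF);   /* >= 74 dummy clocks with CS high */
        cs_low();
        if (sd_cmd(0, 0, 0x95) != 0x01) return -1; /* CMD0: go to idle state */
        sd_cmd(8, 0x1AA, 0x87);                    /* CMD8: voltage check (SDv2 cards) */
        for (i = 0; i < 4; i++) spi_xfer(0xFF);    /* discard the R7 payload */
        do {                                       /* ACMD41: initialize, HCS bit set */
            sd_cmd(55, 0, 0xFF);
            r = sd_cmd(41, 0x40000000, 0xFF);
        } while (r == 0x01);
        cs_high();
        return r == 0x00 ? 0 : -1;
    }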
I think modern machines started their slide into mind boggling complexity when bus speed and CPU speed outstripped RAM speed. So much complexity and unpredictability is in the all the infrastructure built around cache.
Something like an Amiga or Atari ST was still not hard to understand all/most of, despite being 16/32 bits.
Because once the clock speed gets past a handful of MHz, maintaining good clock distribution and good signal integrity become more painful very, very rapidly.
It's unclear what you're asking for here? The Raspberry Pi Pico and Raspberry Pi Zero are existing and very well documented ARM based single board computers.
Fair enough. I suppose what I'm asking is whether it's possible to purchase a more modern CPU independently of a SOC and design my own single board computer built around that. Can you buy a naked Cortex M0 in a PDIP package? Are there data sheets for it?
A search of the internet didn't turn up what I was looking for, but I'm very new at hardware work. Perhaps newer chips have such tight timing requirements that you can't work with them without using a SOC?
Nearly everything is going to be a SOC because that's what commercial applications need to minimize part count and cost; there's negligible demand for a standalone processor.
If you're just looking to breadboard up a computer but don't want to go back to 8-bit processors, the Motorola MC68000 / MC68008 used in the original Apple Macintosh is a 32-bit processor in a DIP package running at a manageably low frequency and can be found on eBay inexpensively.
You may like Project Oberon [1] designed by Niklaus Wirth [2] then. His guiding principle was to make a powerful but simple system that could be understood from top to bottom by a single person, from the RISC hardware to the OS to the compiler used to compile and run the OS.
It's quite a bit above the Apple ][ in terms of power.
I don't have any documentation, but I would imagine that as these chips have been in existence for so long, their behaviour is extremely well understood, including most, if not all, of their weak points. The workarounds for these weak points should also be well known.
The nice thing about the 6502 is that it is completely reverse engineered down to the transistor level, so it's possible to explore what exactly is going on in the chip for each clock cycle even when the original design documents had been lost:
Though I'd think they would use a microcontroller with a 6502 CPU which integrates ROM/RAM/GPIO/peripherals into one. Here is a microcontroller with a 6502.
I just read an article recently about how 6502-based chips are used inside of satellite receiver boxes.
I wonder if it's just a function of the time. I imagine anything designed new now would use an ARM based microcontroller but likely when many of these systems were originally designed those were much less common and more expensive.
I'd expect that there are still a lot of new designs where something like an 8-bit microcontroller such as an AVR makes more sense than using something ARM based.
It's getting harder and harder to find places where this is true.
ARM Cortex M0/M0+ blows AVR out of the water, and is usually cheaper except for the very lowest end AVR parts. Generally will use less power, too. And that's assuming your unit counts are so high that firmware developer time is free.
Of course, it's getting impossible to find 5V VCC ARM parts, so that's something that would steer you towards AVR if your system is really a bunch simpler by having a 5V micro.
This is not strictly true. Many AVR chips can handle more computation than their clock speed would suggest due to some really nice assembly instructions that help with common DSP calculations.
I ported an AVR code base to a Cortex M4 last year, and some of the inlined asm didn't translate. I ended up having to use inlined C instead. So my 120MHz M4 chip struggled to do what a 90MHz AVR did no problem.
You can get Attiny to sleep at 6uA with a watchdog, and ~120nA with an external interrupt. Can M0 match that? Genuinely curious, I don't have much experience with ARM.
AVR is actually rather expensive, relatively speaking. I believe it's only popular due to Arduino. Even the various PICs will be cheaper, and of course there's still a lot of (very fast) 8051 variants as well as 4-bit MCUs at the ultra-low-cost level (<$0.01).
That idea of treating some memory regions (e.g. memory-mapped IO areas) as "external memory" which causes the CPU to run at the system clock speed (instead of the much faster "internal clock") sounds like it could also work well for (software) emulators.
However for "highly integrated" home computers like the Atari 8-bitters and C64 I guess this wouldn't be of much use, because most games and demos depend on proper CPU timing, even when not accessing memory mapped IO regions (for instance in wait-loops to get to the right raster position before reprogramming the video output).
There's a lot of discussion of wait stating and slow-clocking of 6502 and accessories over at 6502.org. In particular, many 65xx series accessories still need a common/regular clock signal to keep their internal timers going, even if the CPU is running faster/slower than normal for a specific access.
This ends up becoming a very fun design problem when you do it with integrated circuits!
Well, it's not new in the 6502 world either. The Apple IIGS has to do just this to clock down its 2.8MHz CPU selectively to work with the floppy drives and such, outside of doing that to run as an Apple II.
Beyond that, a 6510 isn't the only thing you really need to emulate a Commodore 64. You also need a SID Chip (MOS 6581) for sound and a MOS VIC-II for display and a number of other things.
The cpu in the article is real hardware and pin compatible with the 6502, and they use it by putting it into 6502 sockets on old hardware, so no need to emulate the SID chip or anything else--there would be a real one available.
It would be quite easy to modify the design to full 6510, which just has a few more pins dedicated to IO. The biggest issue is properly emulating the bank switching, which they have done for other hardware, but the 6510 has a more complicated scheme.
The C64 bank switching is trivial - the 6510 has a few IO pins mapped to address $1, and a few of them are used for the bank switching.
The bigger problem is that all the RAM is really used for "IO" (in theory anyway) on the C64, as the VICII can remap the character generator (font) location, where it pulls sprites from, and where the screen content is stored.
So a static memory map is insufficient if you want it to just plug into the CPU socket and work.
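For reference, here's a rough sketch of the CPU-side decode those IO port bits drive (standard LORAM/HIRAM/CHAREN assignments; cartridge lines and the VIC-II's own view of memory are ignored, so this is far from the full PLA truth table).

    /* What the 6510 sees at a given address for a given value of $01. */
    #include <stdint.h>
    #include <stdio.h>

    enum region { RAM, BASIC_ROM, KERNAL_ROM, CHAR_ROM, IO };
    static const char *name[] = { "RAM", "BASIC ROM", "KERNAL ROM", "CHAR ROM", "I/O" };

    static enum region cpu_sees(uint16_t addr, uint8_t port01)
    {
        int loram  = port01 & 0x01;          /* bit 0 */
        int hiram  = port01 & 0x02;          /* bit 1 */
        int charen = port01 & 0x04;          /* bit 2 */

        if (addr >= 0xA000 && addr <= 0xBFFF)
            return (loram && hiram) ? BASIC_ROM : RAM;
        if (addr >= 0xE000)
            return hiram ? KERNAL_ROM : RAM;
        if (addr >= 0xD000 && addr <= 0xDFFF) {
            if (!loram && !hiram) return RAM;    /* ROMs and IO banked out entirely */
            return charen ? IO : CHAR_ROM;
        }
        return RAM;
    }

    int main(void)
    {
        printf("$A000 with $01=$37: %s\n", name[cpu_sees(0xA000, 0x37)]);  /* default: BASIC */
        printf("$D000 with $01=$35: %s\n", name[cpu_sees(0xD000, 0x35)]);  /* ROMs out, I/O in */
        return 0;
    }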
I always wondered in those days why the disk drives for 8-bit computers were so crazy expensive. In Holland they cost more than the computers they were meant for.
But only later I learned that they were basically another whole computer themselves. Plus the drive mechanism of course which also wasn't cheap (but not nearly as expensive to warrant the high price).
It was the same for the Atari 800XL I had, I never owned a commodore 64.
It's all the odder when you consider the Apple II.
It came out before any of those other home machines, and yet had the cheapest floppy disk storage from 1978 onward. That was largely due to Steve Wozniak's brilliant disk controller design, which did away with everything but some simple glue logic and a couple of ROM chips, moving everything else into software.
Of course, the Apple II had real expansion slots, obviating the need for using a serial connection, too.
From what I can tell, while the Apple II family had a much higher up-front cost, the more serious you were about computing, the more the low-priced home machines with expensive peripherals worked against you in the long run.
I'd agree with that. The case size, open design, accesible FW, etc. made it really easy to add custom HW to the Apple ][, I got a lot of joy out of mine. I have to admit to being a bit jealous of the C64 and Atari kids with their better gaming capability.
> and yet had the cheapest floppy disk storage from 1978 onward.
OTOH, the time-critical hack that allowed it also made it nearly impossible for Apple to upgrade the II without breaking backwards compatibility. The only Apple II with a faster 6502 is the //c+, and that because it has the crazy Zip Chip acceleration logic on the motherboard.
> I always wondered in those days why the disk drives for 8-bit computers were so crazy expensive. In Holland they cost more than the computers they were meant for.
The mechanics were also somewhat expensive. In Brazil, an Apple II drive was often as expensive as an Apple II clone.
What makes the intelligent drives a great idea is how easy it is to emulate them - you emulate a nice protocol. When you have to emulate, say, an Apple II drive, you need to emulate the delays the drive mechanics introduce, as well as the head electronics, because the Apple II's 6502 is reading the head and assembling the bits. That's also why accelerating an Apple II requires you to slow it down for a longer time every time it accesses the IO region - because the disk needs to revolve in the exact time the 6502 takes to run some amount of code. With an intelligent peripheral, it doesn't matter you don't wait several seconds between commands, as long as you only issue them at the required speeds.
Here's a demo running directly on the 1541 disk drive. It generates video by hacking the serial cable that's usually used to connect to the computer, and audio from the drive motors.
We have breadboarding (Ben Eater and others basically build a sort of OS to ease the pain of doing the ROM etc. -- check YouTube), and there are many Arduino Mega-based hacks (here is one of them, but there are many: https://www.instructables.com/6502-Minimal-Computer-with-Ard...).
And if not hardware, easy6502 is a very good software assembler and simulator, using JavaScript and canvas no less.
They're compatible enough that you can drop a 6510 into it with no problems (I tested that, to my parents' great despair). You can also swap the IO chips I think, with various effects (you can at least drop the Amiga CIA chips into a C64 -- you lose the realtime clock nobody uses, but gain timers).
Putting a 6502 into a C64 may or may not work, for some values of "work", or not at all -- I don't recall what the default for the bank switching would be, but the tape drive certainly wouldn't work (the GPIO lines on the 6510 are used for bank switching the ROM, and for the tape). But it should be quite easy to make it work except for the tape drive. You just need to ensure the right voltage on 3 pins for the ROM bank switching (various software that expects to be able to change it will fail though).
The only difference that matters is the IO pins mapped to address $01. Given this is an FPGA based project it likely would be pretty trivial to make it work in a C64 as well.
Depends. Some Atari 8-bit games used the display interrupts for timing, so those could run as fast as possible so long as the display interrupt happened at a consistent 60Hz. (Now that I think of it, I think some games had issues in countries with 50Hz updates.)
This is true for a lot of older and simpler games for the C64 as well, but at the same time for most of them you'd see absolutely no benefits, since it's common for everything to hang off the interrupt.
A few games that glitch when there's too much stuff going on at the same time might run smoother.
Demos are likely to mostly not work because any remotely fancy effect tends to depend on much more precise timing, though.
There’s plenty of games and demos that require the exact amount of raster time to function. If you want to display things in the left and right border you have to switch something at exactly the time the graphics chip is displaying the right border. That’ll never work if the timing changes.
In reality, in a machine like the C64, etc. it wouldn't really run at 100MHz, because in that case the bus speed is driven by the VIC-II chip and bus access is stolen by it to do its work, and this is all tied to NTSC/PAL speed. So while the design in this FPGA implementation could theoretically do a burst of 100MHz-esque work internally, it would only be while the VIC-II (or equivalent in other machines) has given time over to it.
The original article touches on this: the difficulty of interfacing with an Atari 8-bit, C64, etc.
This was already an issue on the Commodore 128, which had a 2 MHz "fast" mode but you had to manually engage it, and doing so would turn off the VIC-II. Good for doing large calculations or working in 80 column text mode, but useless for games etc.
Yeah, I suppose one could use a CPU similar to this 100MHz FPGA one and just have it do ~50 cycles worth of activity every time the VIC-II yielded control to it. Then it'd be idle for 50 cycles, etc. And then have a "fast" mode like you're talking about to do 100 cycles in a clock.
Memory and peripheral access would be seriously wait-stated though. 50 cycles of action doesn't do you much good if memory is slow. Especially when you consider that programs for the 65xx made heavy use of zero page / direct page as an extra bank of pseudo-registers.
So you'd end up implementing some kind of cache, or memory mirroring, or just moving the whole of RAM in the FPGA... and then you start to wonder why you didn't just do the whole thing in FPGA as a C64 SoC.
I thought that the faster 6502-clone should have its own 64KB RAM and only let writes end up on the original RAM. So only writes would be slowed down to the original 6510's free slot.
All reads could be from fast RAM.
Then we have hardware registers and external DMA, those have to be handled specially.
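A rough sketch of that scheme (an assumed design, not the article's): reads hit the accelerator's own fast copy, every write also goes out onto the real bus during the slots the VIC-II leaves free, and IO addresses bypass the fast copy entirely.

    /* Fast local RAM with write-through to the real (slow) C64 bus. */
    #include <stdint.h>
    #include <stdbool.h>

    static uint8_t fast_ram[0x10000];          /* accelerator-local copy, full speed */

    static bool is_io(uint16_t a)              { return a >= 0xD000 && a <= 0xDFFF; }
    static uint8_t slow_bus_read(uint16_t a)   { (void)a; return 0xFF; /* real bus access */ }
    static void slow_bus_write(uint16_t a, uint8_t v) { (void)a; (void)v; /* waits for a free 1 MHz slot */ }

    static uint8_t accel_read(uint16_t a)
    {
        if (is_io(a))
            return slow_bus_read(a);           /* registers / DMA-visible IO: hit real hardware */
        return fast_ram[a];                    /* everything else: no wait states */
    }

    static void accel_write(uint16_t a, uint8_t v)
    {
        if (!is_io(a))
            fast_ram[a] = v;                   /* keep the fast copy coherent */
        slow_bus_write(a, v);                  /* write-through so the VIC-II sees it too */
    }

    int main(void)
    {
        accel_write(0x0400, 0x20);             /* screen RAM: fast copy + real bus */
        return accel_read(0x0400) == 0x20 ? 0 : 1;
    }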
Yes, and that's precisely why the Apple 2 is still on my hobby / workbench. It's simple to interface with, and with a faster clock, remains useful.
And the Apple has slow RAM and fast RAM in a similar way. Really, to get the machine to run at 16MHz, it's necessary to copy code into the fast RAM on board the card, leaving system RAM unused.
The Color Computer, Apple 2 and some others were made in a simpler way that did not interrupt the CPU for refresh and or video access cycles. That makes projects like this easier.
Way back when, in the early 90s, I ported a unix 6502 C64 emulator to Macintosh (Classic of course). I used it to play ripped Rob Hubbard tunes when the printer ran out of paper.
It surprised the end users who’d been ignoring the original SysBeep(1) sounds the application previously used.
This is somehow super retro cool. I remember the 8032 being the most serious computer I'd seen to date. Wordpro 4+ and something called "The Manager" in use at my high school, plus the rebadged daisywheel printer Commodore had at the time (CBM 6400?) just had "serious computer" written all over them, right before the IBM PC steamrollered all of that into oblivion.
100MHz. The software you could run! Add a megabyte or so of full speed, pageable RAM expansion. Every computer language right up to C++ (if it works on the 8-bit Arduino it could be shoehorned into a fast 6502 - limited stack? Who cares, just do the big stack in software. Special zero page? Just use it as glorified CPU registers).
What's the point really? But awesome all the same.
A megabyte of ram on a machine with only a 16 bit address space gets a bit silly. Do you really want to manage 16 memory pages? These aren't multitasking machines, what are you going to do with that ocean of memory you can't access without faulting?
Why wouldn't a faster machine with more RAM be a multitasking machine? (Obviously without extras you are limited regarding security etc, but plenty early multitasking machines didn't have that)
Not in the sense we usually mean nowadays, but assigning pages to processes and switching between them, with a small non-paged segment for general data/code lets you build multitasking systems. E.g. I believe early multiuser BBSes ran on systems like that.
There were tools that added multitasking to the C64, such as e.g. BASIC Lightning that let you run multiple BASIC "threads" + sprite animations at the same time.
It's easy to write a scheduler for a 6502 as there's so little to save, though you'll need to be very careful about stack usage, and you might do better with a specialised scheduler (e.g. for C64 BASIC) as a lot of code you might want to run may store additional state in fixed locations.
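To illustrate just how little there is to save (written in C for readability rather than 6502 assembly, and an assumed layout rather than anything from a real scheduler): the whole per-task context is a handful of bytes, plus that task's slice of the single 256-byte hardware stack at $0100-$01FF.

    /* Round-robin switch of 6502-sized task contexts. */
    #include <stdint.h>

    struct task_ctx {
        uint16_t pc;        /* program counter */
        uint8_t  a, x, y;   /* the three registers */
        uint8_t  p;         /* processor status flags */
        uint8_t  s;         /* stack pointer within page $01 -- tasks must carve it up */
    };

    #define MAX_TASKS 4
    static struct task_ctx tasks[MAX_TASKS];
    static int current;

    static void schedule(struct task_ctx *live)   /* called from a timer/raster interrupt */
    {
        tasks[current] = *live;                   /* save: seven bytes of context */
        current = (current + 1) % MAX_TASKS;
        *live = tasks[current];                   /* restore the next task */
    }

    int main(void)
    {
        struct task_ctx running = { 0xC000, 0, 0, 0, 0x20, 0xFF };
        schedule(&running);                       /* pretend the interrupt fired */
        return 0;
    }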
> A megabyte of ram on a machine with only a 16 bit address space gets a bit silly. Do you really want to manage 16 memory pages?
The Game Boy Color has a 16-bit address space and almost all its games are 1MB or larger (although that's ROM rather than RAM). The largest game is 8MB in size - which is managed as 512 banks of 16KB each.
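A generic sketch of how that kind of banking looks from the CPU side (roughly in the spirit of the Game Boy's MBC mappers, with register addresses and bank limits simplified): a write to a mapper register chooses which 16KB bank of a much larger ROM shows up in the switchable window.

    /* 16-bit address space, 8MB of ROM, one switchable 16KB window. */
    #include <stdint.h>
    #include <stdio.h>

    #define BANK_SIZE 0x4000u                  /* 16KB window at $4000-$7FFF */
    #define NUM_BANKS 512u                     /* 512 * 16KB = 8MB, the largest GBC games */

    static uint8_t  rom[NUM_BANKS][BANK_SIZE];
    static uint32_t cur_bank = 1;

    static uint8_t read8(uint16_t addr)
    {
        if (addr < 0x4000)
            return rom[0][addr];               /* bank 0 is fixed */
        if (addr < 0x8000)
            return rom[cur_bank][addr - 0x4000];   /* switchable window */
        return 0xFF;                           /* RAM/IO left out of this sketch */
    }

    static void write8(uint16_t addr, uint8_t val)
    {
        if (addr >= 0x2000 && addr < 0x4000)   /* a "write" here hits a mapper register, not RAM */
            cur_bank = val % NUM_BANKS;
    }

    int main(void)
    {
        rom[5][0] = 0x42;
        write8(0x2000, 5);                     /* select bank 5 */
        printf("$4000 now reads %02X\n", read8(0x4000));
        return 0;
    }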
Well take Wordpro 4+ for example. In a 32K machine, it had enough memory for a few pages of text. It would have been relatively minor software complexity even at 1MHz to use paged memory to obtain a much bigger text buffer. Ditto BASIC could be readily rigged up to use a paged memory architecture. Not everything needs a linear address space.
> A megabyte of ram on a machine with only a 16 bit address space gets a bit silly.
There were plenty of machines kitted out with that amount of RAM. S-100 bus and other multi-user systems in the 70's and 80's could handle dozens of simultaneous users. It's cooperative multitasking, not preemptive multitasking.
I can put 16MB on my C64 with my 1541 Ultimate. Haven't looked at it much, but I believe there were some commercial productivity apps such as GEOS with very basic multitasking capabilities that would be well able to use this. Imagine if they'd had an amped up CPU too!
This got me thinking about the CPU accelerators that were available for the Apple ][ line, including the GS's 65C816. I googled, and little did I know that there's an enthusiast community still developing for it that's gotten the GS up to 18 MHz:
Looks like a fun project. I should get back in to FPGA tinkering. Last time I played with one was about 20 years ago. I wonder if the development environment has improved since then?
Yes and no. Much heavier footprint, mostly the same proprietary bs, somewhat faster and much better debugging/simulation tools. Languages have improved somewhat, and as long as you stay within an ecosystem and use the vendor-supplied tools and software you should be mostly ok; stray outside of that (for instance: open source toolchains for cutting edge FPGAs) and you'll be in for a world of trouble. The fabrics got a lot larger and there are some more and interesting building blocks to play with. Higher switching speeds.
Someone who is more active in this field may have a more accurate and broader view than I do.
Is one of the most recent and - for me - significant developments. Note that for companies that use FPGAs none of the above is considered a hurdle, though their engineers may have a different opinion and that the hobbyist/hacker market for FPGAs is so insignificant compared to the professional one that the vendors do not care about catering to it.
I think there are a lot of major developments in the last 20 years, although I'm not active in the field. Symbiflow is largely a distribution of yosys, a bunch of other IceStorm projects, and nextpnr (is that part of IceStorm?), in the same sense that Debian is a distribution of Linux. Another one, but I think limited to Lattice FPGAs, is https://github.com/FPGAwars/apio.
I think the biggest development, though, is that there's enormously more off-the-shelf Verilog and VHDL, not just on OpenCores like 20 years ago, but also on GitLab, GitHub, and so on. Easy examples are CPUs like James Bowman's J1a, the VexRiscv design used in Bunnie's Precursor: https://github.com/SpinalHDL/VexRiscv (as little as 504 Artix-7 LUTs and 505 flip-flops), and Google's OpenTitan.
But from my POV the more interesting reason for using an FPGA is for things that aren't CPUs. For example, the SUMP logic analyzer and its progeny OLS https://sigrok.org/wiki/Openbench_Logic_Sniffer (32 channels at 200MHz), although I think both of these require the proprietary vendor tools to synthesize. I'm gonna go out on a limb here and guess that reliably buffering up data at 6.4 gigabits per second is not a thing that any CPU can do, even one that isn't a softcore; CPUs that run at speeds high enough to potentially do it invariably depend on cache hierarchies that monkeywrench your timing predictability.
As I said, though, I'm not active in the field, so all I know is hearsay.
I'd also add as a bystander, that fpga providing onboard hardware for things like DDR3/4 and PCIe seem like significant development. Really tremendous performance is available in these devices.
This is not far in concept from modern CPUs where only the core and cache run at full speed, although it might be the first 6502 implemented this way. It reminds me of the upgrade processors for 386 and 486 PCs that had a Pentium and cache in a socket-compatible format.
I don't think that would make a very good GPU as it wouldn't fit the data model well [1] as the instructions and data need to share the same bus which would mess up streaming.
Your suggestion is closer to a grid computer but even then I don't think an unmodified 6502 would be a great choice because the memory model (or lack thereof) would really restrict performance.
The LAN controller used to make the Beowulf cluster would probably have more compute (and memory) than the 6502 itself.
The Intel cores in the linked article have a distinct L1 data and instruction caches inside them, and associated L2 caches, which makes a big difference in comparison to the 6502.
I wonder if this would be possible to run on the Xilinx Kintex-7 FPGA employed in the currently-in-dev SoM board for MNT Reform laptop[0]? I know very little about FPGAs so I don't know if it's easily applicable to other boards or what. I mean, I'm not saying it would be _practical_, but it would be pretty cool! haha :)
The Kintex-7s are much bigger than the Spartan-6s. Getting a 6502 core into them wouldn't be the issue. The special thing about this board, though, is that it's designed to be pin compatible with a real 6502 CPU, so you can plug it straight into a real 1970s/1980s home computer that used a 6502. If that's not what you want to do, you'd need other glue logic to interface the CPU core with whatever you want to interface it with.
6510 compatibility only requires adding 6 IO lines mapped to address $1 though (otherwise the 6510 is 6502 compatible enough), so given it's an FPGA based project it wouldn't necessarily be a big ask, and from the webpage it sounds like they're open to suggestions.
I don't think memory at $00 and $01 are the only challenges. The 6510 has a tristateable bus. A regular 6502 doesn't. The C64 used this to disable the 6510 when the Z80 in the cp/m card was active. (I think it also might be used to allow the VIC II to take over the bus as well, but I'm not positive about that.)
Actually the bigger problem with this design on a second read through is that it tries to mirror the ROM and RAM into a 64K on-chip RAM. That won't work on the C64 because of the bank switching and the fact the VICII can access memory everywhere. You'd have to change it to use the on-chip RAM as a smarter cache.
If you were to disable the use of the on-chip RAM it'd be stalled far more than half the time, as it'd be unable to fetch instructions fast enough.
The bank switching uses three of the IO pins, so if you add support for those as I mentioned that fixes both the bank switching and the tape drive (which uses the remaining IO pins).
EDIT: Actually you're right that there's a problem with the bank switching here, since it tries to mirror the system RAM/ROM and it won't be able to, as it has only 64K of on-chip RAM. You could conceivably get it to work by designating the entire address space as an "IO area", but it'd totally kill performance.
"It may be possible and worthwhile to also support some slightly later machines: The Acorn BBC Micro, Atari 400 and 800, and maybe the Commodore C64 come to mind."
It'll take a more extensive modification than I thought, though, because it does a RAM mirroring thing that is necessary for performance but that kills compatibility with at least the C64.
If you disable the RAM mirroring, all you need to make it compatible is to map the 6 IO pins to address $1. That "solves" the bank switching, but at the cost of killing performance totally as the chip will be starved for memory access most of the time.
Judging from his pictures, he's using a version of the Spartan 6 (XC6SLX9) that has 72KB of on-chip RAM, though, so unless he's using any of the RAM for anything else he could still mirror both the 64KB RAM + the KERNAL and BASIC ROMs. But he'd also need to keep track of various VICII registers to know which areas to designate as "IO areas" to pass writes through for, given the VICII can address memory "everywhere" for sprites, fonts and bitmap data depending on what you write to different registers. Since that can change at any time, it'd involve a lot of "fun" logic to flush data from the on-chip cache to the C64 memory if a register changes.
I often ponder all the work put into making electronics execute series of instructions one by one, even where that's undesirable, while we often lack abstractions for parallel computing. The funny part is how "easy" parallelism is in electronics. My gut says there has to be some simple approach we've all overlooked.
There are supposed to be 50GHz gallium-nitride FPGAs. Might not be big enough to host a 6502 and all its RAM, though. I gather they are mostly used in military SDRs.
Probably by working at a military contractor. The parts are probably still way too expensive for any other use, and probably too specialized for signal processing.
Way back in 2003 or so, we at my old company InformAsic designed a single chip transparent VPN solution for serial communication (RS-232, RS-422).
The control and protocol handling part of the chip was a modified 6502 core with a sort of MMU and single-cycle zero page registers. The whole thing was clocked at 33 MHz, probably making it the fastest 6502 in production at that time. Not that we sold that many of the devices...
As an old c64 scener, I really enjoyed being able to code the application SW using my favorite ASM tools. Though most of the code we actually compiled from C using cc65 and then hand tuned to fit the mem constraints.
Today a simple Cortex M0+ MCU (with internal AES core) would be able to do what we did, and probably be smaller and require less power.
ASIC process generations. The original MOS 6502 was manufactured in a really big process -- when "contact" actually meant making contact on the plastic sheets that became the masks. Huge transistors, 5V power supply, etc.
Modern Cortex M0+ chips are probably manufactured using 90nm, 65nm process nodes (or possibly even smaller - but the die size will become I/O-bound. Though you can then add more memory easily without driving up the die size). They have much lower core supply, much better I/Os - and low power modes.
In our specific case, we used I believe a 250 nm, or possibly even 350 nm, ASIC process. And size in this specific case also related to the package. We used a QFN. Today you can get an M0+-based MCU with a low number of exposed I/Os in a small BGA or WCP package that is just a few mm².
The idea we had at the start was a chip small enough to fit inside the connector of a serial cable, require so little power that it wouldn't need external power (basically harvesting), and be fast enough to not reduce the bitrate. Add very low and fixed latency. And (after configuration) be totally transparent as seen from the application. Basically a secure serial cable, but reduced to an extra cable connector that is inserted between an IoT/SCADA device and its serially connected modem.
Due to the process node we didn't really get there. But today this is basically feasible using off-the-shelf MCUs.
I actually found a few of the chips and one of the cable connectors/dongles yesterday. So I still have a few 33 MHz 6502s ;-)
The 6502 isn’t manufactured with modern processes and technology. It’s still stuck in the eighties for that sort of thing. So basically everything even a few years newer will be smaller and more power efficient.
Of course an M0 on a modern process node would be much smaller but I’d interpreted (probably wrongly) the OP as going further and saying that the M0 would be smaller on a comparable node. With 12k or so gates on an M0 that doesn’t seem possible although maybe modern power management would make it more power efficient.
This is often the first consideration after "does it match the behaviour of the original chip perfectly" when dealing with FPGA reimplementations of other chips.
The MCL65+ is also a drop-in replacement for the 6502 which uses an 800MHz microcontroller to emulate the CPU, so it can run in cycle-accurate and accelerated modes.
https://microcorelabs.wordpress.com
This is pretty awesome, and while it "caches" all of memory, it is conceivable you could just run memory cycles the old fashioned way. And while that doesn't give you a speed bump, it does give you the world's most amazing in-circuit-emulator pod for 6502 designs.
One problem with the Atari 400/800 computers is that they have the ability to use any part of the 64K address space as a frame buffer. You might run zero page on the FPGA and any ROM from a cartridge to get some performance boost, though.
One question is whether this can support single-stepping. Strange as it sounds, that is what we try when hacking the 6502, and some chips are specially designed to allow single-stepping.