Hey, author here! Awesome to see this on Hacker News, and that so many still care for this great computer! Happy to answer any questions/take feedback if you got something on your mind.
Does cross-developing in 6502 asm still give much advantage compared to C (via e.g. cc65)? Or have modern tools more or less closed the gap?
The question arose after I played the impressive modern Atari 8-bit (48 KB RAM) game His Dark Majesty [1], whose development the author documented on his homepage [2]; he used C.
There's something uniquely satisfying about writing code in a limited environment like the C64. No libraries or operating systems -- just you and the hardware!
You can still do this! The register interface for the peripherals on modern microcontrollers is only a tiny bit more complicated (and in some ways significantly easier) than what we had on 8 bit PCs.
You can absolutely grab a generic gcc/binutils cross toolchain for an ARM Cortex-M, an AVR, or even an Intel Quark board and start writing simple to-the-metal code today. It's not hard, or at least it's not meaningfully harder than Apple II or C64 hacking.
It's just that no one does it. All the vendors ship integration libraries and fancy IDEs for all these boards, so what everyone chooses in practice is one of these, or an RTOS that provides an even more abstracted view. And that's not a bad idea either of course.
But the hardware is still there, and still fun, and still useful for real tasks in the modern world. Start hacking.
I've never hacked assembly on an AVR, but I have on the MSP430, and it's a really elegant architecture to work on; expressive enough to let you get stuff done without hurling your development board at the wall, but constrained enough to make figuring out a neat optimisation both satisfying and useful.
Also, the MSP430X has 20-bit registers, just for that added level of surreality.
However, even I have limits: stay clear of PIC12. It's just... bizarre, and not in a good way.
(Somewhere I have 2/3 of an 8080 emulator for the MSP430, written in machine code. It ran on the basic Launchpad, with 2kB of RAM, and booted CP/M using an SD card for storage and an external serial SRAM device for memory swap. Sadly, while I got it talking to an AT keyboard, I never got a screen to work, and I've since broken up the hardware for other projects. The emulator core was 1kloc of assembly. The 8080 is surprisingly orthogonal.)
Those vendor libraries are super ugly and take the fun out of a project. I absolutely hate the tedious "work" programming where you plug bits of pre-built technology together without actually doing much of anything on your own.
A while back I had an STM32F7 (ST Cortex-M7) board that I wrote some assembly code for. Getting correct clock signals and power to all the peripherals and bringing up RAM and configuring devices sure was complex (and it's not explained by the datasheet in a particularly straightforward way), but doing it that way is certainly enlightening and interesting, compared to fiddling with a vendor-supplied toolchain.
However... I would recommend not using a Cortex-M device if you decide to go the ARM route. Pick a full classic ARM ISA device so you can use the more human-oriented instruction set. You can still switch to Thumb if you want, but you won't be forced to deal with its quirks and limitations.
Did you ever see this link [1] regarding the GBA anti-emulation features some time back? It's a bit tangential and slightly off-topic, but it might be of interest if you haven't!
Beautiful assembly language and a software-assisted video system combine to make a lot of fun. It's possible to output video that closely matches the 8-bit era and then add sprites, etc...
I had a lot of fun duplicating old school effects, displaying images with artifacts, etc...
You can get a dev board, or just wire one up on a breadboard. Easy.
There are 6502 and Z80 emulations done too.
The real fun part is this chip is a concurrent multiprocessor. That makes combining, say, a 6502, a video system, and sound easy and fun.
I ended up learning a ton and have since used the chip for some industrial R&D. Nice spiff, for having a play.
Indeed, it is ultimately satisfying! It is also very fun to have one of these old machines, and still see new software being written for it (http://oric.org/)
Apropos amazing things in limited environments, here is a working 3D modelling environment, with Blender model import, running on ... the PICO-8 fantasy console:
Since it's a JavaScript 6502 emulator, you get maybe 60% of the fun of programming for a C64, but just by visiting a web page. On a single page, they take you from making 3 colored pixels to a functioning snake game with explanations of what's happening along the way.
...one of the items on my Very Long List Of Potential Projects is a self-hosted compiler for a proper modern type-safe C-like language for the 6502. I was planning to target the BBC Micro, as it's the only 6502 8-bit I've found with a real operating system, but it ought to be easy to port.
Making it self-hosting means lots of interesting design choices: I was planning that it'd be a multipass streaming design, where each stage would consume tokens from the previous stage and write out tokens for the next stage, keeping in-memory as little state as possible; depending how much RAM you had you could pipeline stages together for faster compilation, provided you were willing to have less memory available for symbol information.
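A rough way to picture the streaming design, modelled in Python rather than on the 6502: each stage is a generator that pulls tokens from the stage before it and holds only a token or two of state. The stage names (`lex`, `fold_constants`, `compile_line`), the token shape, and the toy "numbers, identifiers and +" language are all assumptions made up for this sketch, not anything from a real compiler.

```python
from typing import Iterable, Iterator, Tuple

Token = Tuple[str, str]  # (kind, text); hypothetical token shape

def lex(source: str) -> Iterator[Token]:
    """Stage 1: turn whitespace-separated words into tokens, one at a time."""
    for word in source.split():
        if word.isdigit():
            yield ("num", word)
        elif word.isalpha():
            yield ("id", word)
        else:
            yield ("op", word)

def fold_constants(tokens: Iterable[Token]) -> Iterator[Token]:
    """Stage 2: fold 'num + num' while holding at most two tokens."""
    held = []  # either [], [num], or [num, '+']
    for tok in tokens:
        kind, text = tok
        if len(held) == 2 and kind == "num":
            # held is [num, '+'] and another number arrived: fold the sum
            held = [("num", str(int(held[0][1]) + int(text)))]
        elif len(held) == 1 and tok == ("op", "+"):
            held.append(tok)
        else:
            # nothing to fold: pass the held tokens downstream
            yield from held
            held = [tok] if kind == "num" else []
            if kind != "num":
                yield tok
    yield from held

def compile_line(source: str) -> str:
    """Chain the stages; on a small machine each link could be a file instead."""
    return " ".join(text for _, text in fold_constants(lex(source)))

print(compile_line("1 + 2 + x"))
```

Chaining generators corresponds to pipelining stages together in memory; writing each stage's output to disk and reading it back would be the low-RAM variant of the same structure.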
One other interesting feature is that a lot of these early processors don't really do stack-relative addressing, so stack-frame-based languages like C and Pascal generally work really badly. (The Z80 failed at this, too.) The solution here is easy: forbid recursion. Now there are no stack frames! So each variable is effectively static. This simplifies the design no end, as well as playing to the processors' strengths in being basically memory-memory architectures.
Deciding what variables to put where becomes an easy and effective exercise in optimisation. On the 6502, placing frequently-accessed variables in zero page makes the code smaller and faster. Walking the call tree and figuring out which variables are never used at the same time allows variable storage to overlap. And since calling a function involves copying values directly from the caller into the function's parameters, which are just static variables like everything else, if you can find places where the value lives at the same address in both the caller and the function, you can skip the copy entirely...
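The overlap idea can be sketched in a few lines of Python. This is only one simple way to do it, under the assumption that each function has a single block of static storage: a function's block starts just past the deepest block of any of its callers, so functions that are never active together end up sharing addresses. The function names, sizes, and `assign_frames` itself are hypothetical; zero-page prioritisation and the copy-skipping trick are not modelled.

```python
def assign_frames(calls, sizes):
    """Give every function a base address for its static variables.

    calls: function name -> list of functions it calls (no recursion allowed)
    sizes: function name -> bytes of static storage it needs
    """
    callers = {f: [] for f in sizes}
    for f, callees in calls.items():
        for g in callees:
            callers[g].append(f)

    base = {}

    def base_of(f, path=()):
        if f in path:
            raise ValueError("recursion is forbidden: " + f)
        if f not in base:
            # start after the end of whichever caller's block ends highest
            base[f] = max(
                (base_of(c, path + (f,)) + sizes[c] for c in callers[f]),
                default=0,
            )
        return base[f]

    for f in sizes:
        base_of(f)
    return base

# Hypothetical program: main calls draw and sound; draw calls plot.
calls = {"main": ["draw", "sound"], "draw": ["plot"]}
sizes = {"main": 4, "draw": 6, "sound": 3, "plot": 2}
frames = assign_frames(calls, sizes)
total = max(frames[f] + sizes[f] for f in sizes)
print(frames, total)
```

Here `draw` and `sound` both start at offset 4 because they are never live at the same time, so the program needs 12 bytes rather than the 15 a naive one-slot-per-variable layout would use.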
You'd probably want to disallow/discourage structs as well, favoring striped arrays, because the $aaaa,x addressing mode is a lot more efficient than doing pointer arithmetic in zero-page. (Would be cool if the compiler could figure this out, too...)
I probably wouldn't want to do automatic striping of arrays of structures, because it'd only make sense in some circumstances, but it'd be trivial to add logic to the code generator to use indexed addressing modes --- you'd need to check to see if the thing being indexed was an array with bounds small enough to be indexed with an 8-bit variable.
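The bounds check described above could look something like this in a code generator written in Python. The function name and parameters are made up for illustration; the only hard fact it relies on is that the X register is 8 bits wide, so the largest offset it can supply is 255.

```python
def can_index_with_x(element_count, element_size=1, striped=False):
    """True if every element can be reached as base,X with an 8-bit X.

    striped=True means one separate array per byte of the element, so X is
    the raw index; otherwise elements are interleaved and X must hold
    index * element_size.
    """
    if element_count <= 0:
        return True  # nothing to index
    stride = 1 if striped else element_size
    largest_x = (element_count - 1) * stride
    return largest_x <= 255

# 16-bit elements: interleaved arrays top out at 128 entries,
# striped (separate low-byte and high-byte arrays) at 256.
print(can_index_with_x(128, 2), can_index_with_x(129, 2))
print(can_index_with_x(256, 2, striped=True))
```

This also shows why striping is attractive: it doubles the number of 16-bit elements reachable with a single index register and removes the scaling step.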
<thinking out loud follows>
...that would produce something like:
lda index
asl a            ; scale for 16-bit values (asl, not rol: rol would shift the old carry into bit 0)
tax
lda address+0, x ; read low byte
ldy address+1, x ; read high byte
== 11 bytes (assuming index and address weren't in zero page). But of course if I were reading a 16-bit value, I'd probably be wanting to dereference it as a pointer or do arithmetic on it, so I wouldn't want to read it into registers all at once --- the 6502 likes streaming maths through the accumulator a byte at a time.
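For anyone checking the arithmetic, here is the byte count as a small Python tally. The sizes are the standard 6502 encodings (one opcode byte plus zero, one, or two operand bytes); the table and helper names are just for this sketch.

```python
# Encoded size in bytes of a 6502 instruction, by addressing mode.
SIZES = {
    "implied": 1,      # e.g. tax, clc
    "accumulator": 1,  # e.g. a shift of A
    "zeropage": 2,
    "zeropage,x": 2,
    "absolute": 3,
    "absolute,x": 3,
}

def code_size(modes):
    """Sum the encoded sizes of a sequence of addressing modes."""
    return sum(SIZES[m] for m in modes)

# load index, shift A, tax, load low byte, load high byte
absolute_version = ["absolute", "accumulator", "implied", "absolute,x", "absolute,x"]
zeropage_version = ["zeropage", "accumulator", "implied", "zeropage,x", "zeropage,x"]

print(code_size(absolute_version), code_size(zeropage_version))
```

With both the index and the array in zero page the same sequence shrinks from 11 bytes to 8, which is the zero-page saving mentioned earlier.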
Something like 'result = array1[index] + array2[index]' would end up:
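lda index
asl a            ; scale index for 16-bit elements
tax
clc              ; no carry into the low byte
lda array1+0, x
adc array2+0, x  ; add low bytes
sta result+0
lda array1+1, x
adc array2+1, x  ; add high bytes plus carry from the low add
sta result+1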
== 24 bytes!
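What that sequence computes, modelled in Python for anyone who hasn't met the clc/adc idiom: the carry out of the low-byte add feeds the high-byte add. This assumes binary mode (decimal flag clear), and the helper names are invented for the sketch.

```python
def adc(a, operand, carry):
    """Model of 6502 ADC in binary mode: returns (8-bit result, carry out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8

def add16(x, y):
    """Add two 16-bit values one byte at a time, as the assembly does."""
    carry = 0                                   # clc
    lo, carry = adc(x & 0xFF, y & 0xFF, carry)  # low bytes
    hi, carry = adc((x >> 8) & 0xFF, (y >> 8) & 0xFF, carry)  # high bytes + carry
    return (hi << 8) | lo, carry

print(add16(0x00FF, 0x0001))
```

The final carry is returned as well, since on the real chip it is still sitting in the flags if you want to extend the sum to 24 or 32 bits.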
(When I was about 10, the 6502's lack of 16-bit arithmetic made me very sad. It still does, but it did then, too.)
This sounds a lot like the language Action! (https://sourceforge.net/projects/atari-action/) which was popular on the Atari 8-bit systems. Very C and Pascal-like, but with tweaks to make it work better on the 6502.
I might have to dust off the LADS assembly code of a decades-old game I was trying to make during the summer after I graduated from college...