This is a super fun project to play around with. I'm currently trying to build a C64 sprite multiplexer in mostly straight C++ using LLVM-MOS and while it's definitely not going to be the most optimized multiplexer out there, I'm finding it mostly adequate performance-wise.
Whereas it's probably a long way away from being usable for democoding, since most demo effects require cycle-accurate timing, for creating homebrew games and utilities this is really quite suitable: write high-level logic in C or C++ (or any language supported by LLVM, really) and sprinkle in some inline assembly here and there for the really performance-critical parts.
The code being generated is already quite good and, at first glance, looks better than what comes out of cc65, for instance.
There's also rust-mos which uses LLVM-MOS to compile Rust code. It unfortunately still has some issues (e.g. [1]) but that is looking really promising as well.
A large part of making this work basically requires creating a virtual machine, à la SWEET16, which is what Woz had to do to get parts of BASIC working.
But that will _never_ be as fast as hand-coded 6502 assembly unless compilers/etc get a _LOT_ smarter. I came to the realization many years ago that while the 6502 and the software written for it are amazing, it's fundamentally incompatible with modern software development. That's because in order to create high-performance 6502 applications, three major tenets of software development _must_ be violated:
1: No self modifying code
2: Avoid global variables
3: Use structured programming rather than goto spaghetti.
On the 6502 there are provable reasons why avoiding self-modifying code, globals (frequently on the zero page), and gotos in favor of function calls is slower. When looking at cycle times, sometimes it's just a cycle saved here and there, but you have to realize that saving a cycle on an instruction can frequently make that one instruction 20-50% faster. Sometimes one can afford to run 10x slower, but on a processor running at just a few MHz (or 1 MHz in many cases) that can be the difference between being able to realistically solve a problem in real time and creating a batch job.
Que? We don't compile to a virtual machine, we compile to plain 6502 assembly. We do use 32 bytes of the zero page as a de-facto register file, but that's not at all uncommon in hand-written assembly either, it's just usually less explicit.
The prevalence of self-modifying code also heavily depends on what community of 6502 developers you're part of. It's relatively niche on platforms where most code is in ROM, e.g. most game consoles. For example, the NES only has 2KiB of RAM, but can support upwards of 512KiB of ROM, so it's downright wasteful to place code in RAM.
As for 2 and 3, LLVM-MOS lifts local variables to global memory or the zero page wherever possible, and is able to use function aliasing information to allow the globalised versions of local variables in different functions to share the same memory.
I'm not trying to take anything away from that work, but I view what is going on there as a kind of VM. It may not have an interpreter loop, but it seems closer to a VM/AOT type compilation than the code gen you expect from a "normal" compiler. Largely because so much needs to be emulated.
And my comment is less about the compiler and more about how people write modern C code, usually under the assumption that various C abstractions are basically zero-cost (because they tend to be); translating that into efficient 6502 code is difficult. And my impression is mostly affected by 3+ decade-old memories of Apple ][ programming, which was frequently a mix of higher-level interpreted code (usually some kind of efficient bytecode via Applesoft, p-code, etc.) and assembly routines. A big part of the battle was fitting everything into 64-128K of RAM while keeping the speed up, so being able to do 16-bit operations and the like with a one-byte opcode picked out by an interpreter was part of the advantage vs writing assembly which called routines (or frequently parts of Applesoft itself) directly. The former could yield a 2x+ code size savings.
Maybe some of that is less important these days, because the platform is largely running in emulation, or people are packing their machines with a lot of RAM because it's basically free. My IIGS (well, one of them, lol) has 8M of RAM I added maybe 15 years ago, and I have a pile of 256K-1M RAM cards. Either way, code size is maybe not as big of a deal these days, because 512K of ROM is a lot easier to come by than 64K. Nor, in some ways, is losing a multiple of the possible speed, because no one is seriously trying to use these computers day to day. Back in the late 1980s I wrote a text editor in BASIC (for editing assembly code for my assembler) and then spent a lot of time optimizing pieces of it just to make it usable while still being able to edit assembly programs that were a few tens of K of source. In the end the line input ended up being entirely assembly, because simple things like blinking the cursor simply took too much time in Applesoft.
Thinking back, adding a linker might have been a better use of my time, but 64K at 1 MHz put a limit on how big the resulting code could be.
There's nothing wrong with self-modifying code as a compiler-implemented optimization technique. At the end of the day it's just a more powerful variant of goto; you're just directly editing the continuation of a running program.
Interesting, but is there any chance that this might be upstreamed to LLVM proper? The 6502 is a well-understood ISA, so having a backend for it might be quite convenient. The static stack allocation feature in this implementation (for non-reentrant functions) is also super cool and might be useful for more than just the 6502.
This has come up off and on, and it's looking increasingly unlikely. I've had to do some nasty surgery to the loop strength reduction pass, and I haven't yet found a way to generalize it.
Still, I'm not ruling it out; it just never seems to be very high on the TODO list. I've also started to do more work upstream whenever the situation calls for it; quite a few of the improvements we've made to LLVM do generalize to other targets, so much of that can and should be upstreamed independently.
Yes, it has improved. However, I always use it to write new code, and my code is simple. KickC didn't start out trying to be standards-compliant, so you may still run into trouble if you use the full vocabulary of C; in that case you may find cc65 or the LLVM-MOS 6502 backend more suitable. KickC's advantage is that it produces reasonably fast code and integrates into the Kick assembler workflow.
I think this is the 8th separate attempt I've seen to get LLVM to output 6502 binaries. The others involved a presentation showing the issues involved and how the creator has got it mostly working and made a hello world app or simple game with it. After that the project disappears. Hopefully this time it'll be different.
It is. It works and compiles relatively non-trivial programs correctly (I tried a PDP-8 emulator and a printf library), with surprising optimisations in spots. At -O3 it turned a print loop over a string array with a mask applied to each character into applying the mask to the constants at compile time, with a sequence of unrolled LDA#/JSR. Very impressive.
Kudos to all involved :)
[1] https://github.com/mrk-its/rust-mos/issues/16