According to this page, LLVM-MOS seems to be pretty soundly beaten in performanc...

sehugg · 2025-12-01T00:28:03 1764548883

I've implemented Atari 2600 library support for both LLVM-MOS and CC65, but there are too many compromises to make it suitable for writing a game.

The lack of RAM is a major factor; stack usage must be kept to a minimum and you can forget any kind of heap. RAM can be extended with a special mapper, but due to the lack of a R/W pin on the cartridge, reads and writes use different address ranges, and C does not handle this without a hacky macro solution.

Not to mention the timing constraints with 2600 display kernels and page-crossing limitations, bank switching, inefficient pointer chasing, etc. etc. My intuition is you'd need a SMT solver to write a language that compiles for this system without needing inline assembly.

ddingus · 2025-12-01T08:25:04 1764577504

A very simple BASIC compiled pretty well! It did feature online assembly, and I agree with you on this necessary point especially concerning the 2600!

See Batari Basic

zozbot234 · 2025-11-30T22:02:12 1764540132

AIUI, Oscar64 does not aim to implement a standard C/C++ compiler as LLVM does, so the LLVM-MOS approach is still very much worthwhile. You can help by figuring out which relevant optimizations LLVM-MOS seems to be missing compared to SOTA (compiled or human-written) 6502 code, and filing issues.

asiekierka · 2025-12-01T20:56:07 1764622567

We already know what the main remaining issue is - LLVM-MOS's register allocator is far from optimal for the 6502 architecture. mysterymath is slowly working on what may become a more sutiable allocator.

HarHarVeryFunny · 2025-12-01T22:04:28 1764626668

There is a video below of mysterymath presenting LLVM-MOS where he talks about reserving 32 bytes of zero page to present to LLVM as 16 16-bit registers, to be able to utilize it's register allocator, which does seem a sane approach.

https://www.youtube.com/watch?v=ejbTKtgSZI0

However his github doesn't show any activity on it in the last 2 years.

https://github.com/mysterymath/llvm-mos

mysterymath · 2025-12-01T23:13:11 1764630791

Lol? https://github.com/mysterymath/llvm-mos/tree/regalloc

HarHarVeryFunny · 2025-12-02T13:26:58 1764682018

My bad - was just looking at the main branch.

djmips · 2025-12-01T04:32:00 1764563520

I feel like no amount of optimizations will close the gap - it's an intractable problem.

HarHarVeryFunny · 2025-12-02T13:22:59 1764681779

I wouldn't say intractable, but it's not clear whether LLVM's optimization framework is flexible enough for it.

From mysterymath's (LLVM-MOS) description, presenting some of zero page as 16 bit registers (to make up for lack thereof, and perhaps due to LLVM not having any other support for preferred/faster memory regions), while beneficial, still had limitations since LLVM just assumes that there will be FAST register-register transfer operations available, and that is not even true for the 6502's real registers (no TXY), let alone these ZP "registers" which would require using the accumulator to copy.

A code generation/optimization approach that would seem more guaranteed to do well on the 6502 might be to treat it more as tree search (with pruning) - generate multiple branching alternatives and then select the best based on whatever is being optimized for (clock cycles or code size).

Coding for the 6502 by hand was always a bit like that ... you had some ideas of alternate ways things could be coded, but there was also an iterative phase of optimization (cf search) to tweak it and save a byte here, a couple of cycles there ...

I've mentioned elsewhere I used to work for Acorn back in the day, developing for the BBC micro, with me and a buddy developing the ISO-Pascal system which was delivered in 2 16KB ROMs. Putting code in ROM gives you an absolute hard size budget, and putting a full Pascal compiler, interpreter, runtime library and programmers editor into a total 32KB was no joke! I remember at the end of the project we were still a few hundred bytes over what would fit in the ROMs, and had to fight for every byte to make it fit!

djmips · 2025-12-02T17:32:13 1764696733

It is my conjecture that due to the 8 bit index registers, contrast that to 6800, 6809 and others, the 6502 becomes fundamentally a structure of arrays (SOA) system versus C which is coupled in it's libraries and existing code base with array of structures (A0S).

Optimizing code will never solve good data oriented design. This is just one of the reasons that Asm programmers routinely beat C code on the 6502. Another one is math. In the C language specification, if fixed point had been given equal footing with float that would also help.

These are such a blind spos that you rarely even see custom 6502 high level languages accommodate these fundamental truths.

BTW, growing up on the 6502, I had no problems moving into the very C friendly 68000 but later going backwards to the 6809, on the surface it looked so much like a 6502 that I was initially trying to force SOA in my data design before realizing it was better suited to AOS.

HarHarVeryFunny · 2025-12-03T03:31:48 1764732708

If we're comparing performance of code generated by a C compiler vs hand optimized assembler, then for it to be an apples-to-apples comparison the same data structures (e.g. SOA or AOS) need to be used in both cases.

djmips · 2025-12-03T08:33:55 1764750835

Yes, that's true and should be how good 6502 high level code would be written.

HarHarVeryFunny · 2025-12-03T13:10:47 1764767447

Yep. C was always meant to be a "close to the metal" language providing a feature set that could be mapped pretty directly to the processors it was running on. It's a "low level, high level language" where the expectation is more what you see is what you get (WYSIWYG), even though a modern optimizer might be expected to remove invariant code out of loops, etc - localized efficiency gains, but not large scale transformation.

So, optimal C targetting the 6502 is not going to look much like C targetting a modern processor. The developer still needs to be very aware of the limitations of the processor they are targetting.

One somewhat radical thing that LLVM-MOS does is to analyze the program's call graph, and for functions that are not used recursively it will assign parameters and local variables to zero page instead, both for speed of access and to avoid need for a stack frame. Even though this violates the WYSIWYG mental model, this is a nice abstraction of what the assembly language programmer would have done themself.

djmips · 2025-12-03T13:46:48 1764769608

>One somewhat radical thing that LLVM-MOS does is to analyze the program's call graph, and for functions that are not used recursively it will assign parameters and local variables to zero page instead, both for speed of access and to avoid need for a stack frame. Even though this violates the WYSIWYG mental model, this is a nice abstraction of what the assembly language programmer would have done themself.

very nice, it sounds similar to the 'compiled stack' concept. I've seen that here in the co2 language for the 6502

"Variables declared as subroutine parameters or by using let are statically allocated using a "compiled stack", calculated by analyzing the program's entire call graph. This means scopes will not use memory locations used by any inner scopes, but are free to use them from sibling scopes. This ensures efficient variables lookups, while also not wasting RAM. However, it does mean that recursion is not supported."

https://github.com/dustmop/co2

zozbot234 · 2025-12-02T17:53:17 1764697997

There is a C standard extension for embedded/low-level programming which specifies fixed point arithmetic. (And other goodies such as hardware register access and multiple address spaces.)

fooker · 2025-12-01T05:29:14 1764566954

It's performance of generated code, not performance of the compiler.

djmips · 2025-12-02T17:34:36 1764696876

kwertyoowiyop · 2025-12-01T00:37:34 1764549454

Aztec C had both native and interpreted code generation, back in the day.

bbbbbr · 2025-11-30T22:21:07 1764541267

With regard to code size in this comparison someone associated with llvm-mos remarked that some factors are: their libc is written in C and tries to be multi-platform friendly, stdio takes up space, the division functions are large, and their float support is not asm optimized.

HarHarVeryFunny · 2025-11-30T22:51:27 1764543087

I wasn't really thinking of the binary sizes presented in the benchmarks, but more in general. 6502 assembler is compact enough if you are manipulating bytes, but not if you are manipulating 16 bit pointers or doing things like array indexing, which is where a 16-bit virtual machine (with zero page registers?) would help. Obviously there is a trade-off between speed and memory size, but on a 6502 target both are an issue and it'd be useful to be able to choose - perhaps VM by default and native code for "fast" procedures or code sections.

A lot of the C library outside of math isn't going to be speed critical - things like IO and heap for example, and there could also be dual versions to choose from if needed. Especially for retrocomputing, IO devices themselves were so slow that software overhead is less important.

djmips · 2025-12-01T04:38:58 1764563938

More often than not the slow IO devices were coupled with optimized speed critical code due to cost savings or hardware simplification. Heap is an approach that rarely works well on a 6502 machine - there are no 16 bit stack pointers and it's just slower than doing without - However I tend to agree that a middle ground 16 bit virtual machine is a great idea. The first one I ever saw was Sweet16 by Woz.

HarHarVeryFunny · 2025-12-01T14:33:07 1764599587

I agree about heap - too much overhead to be a great approach on such a constrained target, but of course the standard library for C has to include it all the same.

Memory is better allocated in more of a customized application specific way, such as an arena allocator, or just avoid dynamic allocation altogether if possible.

I was co-author of Acorn's ISO-Pascal system for the 6502-based BBC micro (16KB or 32KB RAM) back in the day, and one part I was proud of was a pretty full featured (for the time) code editor that was included, written in 4KB of heavily optimized assembler. The memory allocation I used was just to take ownership of all free RAM, and maintain the edit buffer before the cursor at one end of memory, and the buffer content after the cursor at the other end. This meant that as you typed and entered new text, it was just appended to the "before cursor" block, with no text movement or memory allocation needed.

zozbot234 · 2025-12-01T08:17:20 1764577040

> I think the ideal compiler for 6502, and maybe any of the memory-poor 8-bit systems would be one that supported both native code generation where speed is needed as well as virtual machine code for compactness.

Threaded code might be a worthwhile middle-of-the-way approach that spans freely across the "native" and "pure VM interpreter" extremes.

anthk · 2025-12-01T10:21:25 1764584485

If it runs fast under an AppleI, it will run fine in the rest.