Nice hack! The next step would probably be a CC which compiles to a minimal virtual CPU instruction set, so that we just need some assembler macros to port it to a native platform (x86, ARM, etc.), plus a library of formats (ELF etc.) to create applications for those platforms.
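To make the virtual-instruction-set idea concrete, here is a rough sketch in C. Everything in it is an assumption for illustration: the three-op stack machine (`V_PUSH_IMM`, `V_ADD`, `V_RET`), the `struct vinsn` layout, and the `expand_x86` / `expand_arm` / `expand_program` helpers are hypothetical names, not part of CC500 or any existing toolchain. Each per-target function plays the role of the "assembler macros": it expands one virtual instruction into native assembly text.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical virtual stack-machine ops; the names are illustrative only. */
enum vop { V_PUSH_IMM, V_ADD, V_RET };

struct vinsn {
    enum vop op;
    int imm;                /* used by V_PUSH_IMM only */
};

/* A "macro set" for one target: writes the native text for one virtual
   instruction into out and returns the snprintf-style length, or -1. */
typedef int (*expand_fn)(struct vinsn in, char *out, size_t cap);

/* 32-bit x86, AT&T syntax. */
int expand_x86(struct vinsn in, char *out, size_t cap)
{
    switch (in.op) {
    case V_PUSH_IMM:
        return snprintf(out, cap, "push $%d\n", in.imm);
    case V_ADD:
        return snprintf(out, cap,
                        "pop %%ebx\npop %%eax\nadd %%ebx, %%eax\npush %%eax\n");
    case V_RET:
        return snprintf(out, cap, "pop %%eax\nret\n");
    }
    return -1;              /* unknown virtual op */
}

/* 32-bit ARM. Assumes imm fits a mov immediate; a real port would need
   a fallback for larger constants. */
int expand_arm(struct vinsn in, char *out, size_t cap)
{
    switch (in.op) {
    case V_PUSH_IMM:
        return snprintf(out, cap, "mov r0, #%d\npush {r0}\n", in.imm);
    case V_ADD:
        return snprintf(out, cap,
                        "pop {r1}\npop {r0}\nadd r0, r0, r1\npush {r0}\n");
    case V_RET:
        return snprintf(out, cap, "pop {r0}\nbx lr\n");
    }
    return -1;              /* unknown virtual op */
}

/* Expand a whole virtual program with the chosen target's expander.
   Returns the number of characters written, or -1 if an op is unknown
   or the buffer is too small. */
int expand_program(expand_fn expand, const struct vinsn *prog, size_t n,
                   char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = expand(prog[i], out + used, cap - used);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

Under these assumptions, porting to a new CPU means writing one more `expand_*` function; the compiler proper only ever emits `struct vinsn` values. Calling conventions and object formats such as ELF are outside this sketch.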
This was here just 3 days ago: https://news.ycombinator.com/item?id=8558822 ; now, YOU just need to write those assembler macros. But why? Are you short of working compilers?
I am just interested in a small, maintainable C compiler which can easily be adapted to bootstrap a new platform or CPU (on an FPGA). I consider such a tiny C compiler just a convenient assembler to implement basic things on new platforms.
I know GCC and Clang/LLVM, but those toolchains are really big, and I guess it requires a lot of work to support a new platform and to polish everything so that it all works properly.
A portable tiny GCC/LLVM replacement for small embedded systems would be really nice. GCC and Clang are great for powerful systems but I consider them overkill for small systems.
Fabrice Bellard's "tiny cc"[0] is a great sweet spot between "not even a portable assembler" like CC500 and C4, and GCC/LLVM:
It is a full-blown C compiler with enough extensions to compile the Linux kernel (and boot into it directly; look at the "tccboot" project). But it's also small enough and simple enough to retarget in a few days' work (make that a few months if you want optimizations ... but you don't seem to).
I believe this is how modern compilers work anyway. GCC as well as Clang (via LLVM) have multiple frontends (one for each programming language) and multiple backends (one for each target platform).
Nice project, but is it just me or is calling this a "C" compiler a bit of a stretch? Couldn't the title read "CC500: a tiny self-hosting subset-of-C compiler"?
I realize it looks like C, but then so does anything with the words if, while, and braces. (It doesn't seem to have the word "for".) Since it says the compiler "does not even parse the types", that part seems like window-dressing and not part of the language.
I bet you could write the complete reference for the language this compiles in a paragraph.
Does 767 LoC / 17KB really get to be considered tiny? I mean, sure, it's small and efficient, but there are other contenders to consider, such as Fabrice Bellard's [0] tiny obfuscated self-hosting C compiler in under 500 lines.
How can we unambiguously distinguish between what qualifies as a "tiny" C compiler versus just a "small" one?
That's 767 LoC including comments describing the grammar. Strip out the comments and you're already down to 600. Move the opening braces, take a very few other steps, and you get it down to 560. Getting this below 500 lines should be fairly trivial.
So if your comparison is the completely unreadable otcc, then I'd say yes, this qualifies.
The distinctions are quite arbitrary since there currently isn't anything like a size-competition for compilers, but if forced to make categories, I'd consider <1kLoC to be "tiny" and <10kLoC "small".
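Since the numbers above depend on what you count as a line, here is a small sketch of one possible counting rule in C: a line counts only if it has something on it besides whitespace and comments. The function name `count_code_lines` is hypothetical, and this is not how the figures quoted in this thread were obtained. It deliberately ignores string and character literals, so a comment marker inside a string would be miscounted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count lines holding at least one character that is neither whitespace
   nor part of a C comment. Simplification: string and character literals
   are not parsed, so comment markers inside them are treated as comments. */
size_t count_code_lines(const char *src)
{
    size_t lines = 0;
    bool in_block = false;  /* currently inside a block comment */
    bool has_code = false;  /* current line has non-comment text */

    for (const char *p = src; *p; p++) {
        if (*p == '\n') {
            if (has_code)
                lines++;
            has_code = false;
        } else if (in_block) {
            if (p[0] == '*' && p[1] == '/') {
                in_block = false;
                p++;        /* step over the closing slash */
            }
        } else if (p[0] == '/' && p[1] == '*') {
            in_block = true;
            p++;            /* step over the opening star */
        } else if (p[0] == '/' && p[1] == '/') {
            while (p[1] != '\0' && p[1] != '\n')
                p++;        /* skip the rest of a line comment */
        } else if (*p != ' ' && *p != '\t' && *p != '\r') {
            has_code = true;
        }
    }
    if (has_code)
        lines++;            /* last line without a trailing newline */
    return lines;
}
```

Under this rule a brace on a line of its own still counts as a line, which is why moving opening braces changes the total.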
I am playing with your code, but all my inputs end with an error in the program function. Could you provide one meaningful example that cc500 compiles completely?
And it also means (though this used to be more significant) that porting it to a new platform "just" means re-targeting the code generator and recompiling.