Hacker News

Even today, when you look at the core routines of, for instance, the X Window System, C is still used in exactly that way. You don't have to, but you definitely can, and even though C compilers have advanced tremendously, the fact that there is a direct 1:1 correspondence between input and output is exactly why C is used in those situations.

C supports multiple stacks if you tweak setjmp and longjmp just right, and co-operative multi-threading is possible without any OS support. Not very useful if you really have to do two things at the same time, but a lot nicer than interleaving a bunch of code.

And an optimizing compiler is still a transformation of the input according to a given ruleset. You could not get the same level of optimization by just processing the output of the compiler's code-generation stage, because you would lose a bunch of higher-level information that is invaluable when optimizing the code. But you could see it as just another stage in 'transforming' from one language to another without losing any functional bits along the way.




There simply isn't a 1:1 correspondence between C and its machine code; not in size, and not in functionality.

Any half-decent compiler is going to perform a non-trivial transformation of a big switch statement to make it efficient. The expected performance of a switch (something better than O(n) in the number of cases) rules out a simplistic chain of compare-and-jump for large switches. The programmer expects O(1), or at worst O(log n).

But more importantly, CPUs generally have many more capabilities than are exposed by C, and this is where the "1:1 correspondence" really falls down. Assemblers generally have disassemblers that you can transparently round-trip through. That's a little harder in C.

Perhaps you meant an injective relation, rather than a bijective one? But that's a long way short of an assembler.


"But more importantly, CPUs generally have many more capabilities than are exposed by C."

Hear, hear! Moreover, that has always been the case. As an example, try implementing multiple-word addition in C. On most architectures, you will learn that not having access to a carry bit makes that harder than it could be.


Yes, you're right, the switch statement is a good example of how modern compilers fudge the boundary.

But the main point is that the difference between the generated code and the code you write is relatively small. When looking at the assembly a C compiler generates, I have relatively little trouble following the relationship between the two, and I can make reasonable predictions about what will pop out on the other end.

And of course processors are 'richer' than what most C compilers will use, especially when it comes to special instructions that have no equivalent in the C language.

I've worked on a 'decompiler' for the Mark Williams C compiler (yes, that's pretty long ago), and at the time the above still held true. Today the boundaries are definitely fuzzier, mostly due to the increased smarts compiler writers have put into the optimization stage.

GCC is clever enough to optimize whole branches of code out of existence if you set it to be aggressive enough and the code was written naively; that's one way of dramatically losing that 1:1 correspondence.



