
Register renaming was invented to give assembler code generated from C enough variables in the form of processor registers, so that calling functions wouldn't incur as much of a performance penalty.

The UltraSPARC processor is a stereotypical example, a RISC processor designed to cater to a C compiler as much as possible: with register renaming, it has 256 virtual registers!




Register renaming is older than C (the first computer with full modern OoOE was the 360/91 from 1964). It has more to do with scheduling dynamically based on the runtime data flow graph than anything about C (or any other high level language).
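
To make the data-flow point concrete, here is a toy C sketch of a rename table (made-up register counts, no free-list reclamation or precise-exception machinery a real core would need): every architectural destination gets a fresh physical register, so anti- (WAR) and output (WAW) dependencies disappear and only the true read-after-write dependencies of the data flow graph remain.

    /* Toy rename stage: maps architectural registers to physical
       registers.  Sizes are made up for illustration only. */
    #include <assert.h>
    #include <stdio.h>

    #define NUM_ARCH 32          /* architectural registers in the ISA  */
    #define NUM_PHYS 256         /* physical registers in the core      */

    static int rename_map[NUM_ARCH]; /* arch reg -> current phys reg    */
    static int next_free = NUM_ARCH; /* trivial allocator, no recycling */

    /* Rename one instruction of the form "rd = rs1 op rs2". */
    static void rename_insn(int rd, int rs1, int rs2)
    {
        int ps1 = rename_map[rs1];   /* sources read the current mapping  */
        int ps2 = rename_map[rs2];
        int pd;
        assert(next_free < NUM_PHYS);
        pd = next_free++;            /* destination gets a fresh phys reg */
        rename_map[rd] = pd;
        printf("r%d = r%d op r%d   ->   p%d = p%d op p%d\n",
               rd, rs1, rs2, pd, ps1, ps2);
    }

    int main(void)
    {
        for (int i = 0; i < NUM_ARCH; i++)
            rename_map[i] = i;       /* identity mapping at reset        */

        rename_insn(1, 2, 3);        /* writes r1                        */
        rename_insn(4, 1, 5);        /* true dependency on that write    */
        rename_insn(1, 6, 7);        /* re-writes r1: gets a new phys    */
        return 0;                    /* reg, so no WAW/WAR stall needed  */
    }

The third instruction can issue without waiting on the first two, because its result goes to a different physical register than the earlier write to r1.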


Agreed with your historical information, but the comment about function calls and calling conventions is not without merit. If you have 256 architectural registers, you still can't have more than a couple of callee-saved registers (otherwise non-leaf functions need to unconditionally save/restore too many registers), and so despite the large number of registers you can't really afford many live values across function calls, since callers have to save/restore them. Register renaming solves this problem by managing register dependencies and lifetimes for the programmer across function calls. With a conventional architecture with a lot of architectural registers, the only way you can make use of a sizable fraction of them is with massive software pipelining and strip mining in fully inlined loops, or maybe with calls only to leaf functions using custom calling conventions, or other forms of aggressive interprocedural optimization to deal with the calling-convention issue. It's not a good fit for general-purpose code.
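
To make the calling-convention point concrete, here's a tiny C example (assuming a hypothetical ABI with only a handful of callee-saved registers; nothing here is specific to any real ISA): the values that are live across the call must land in one of those few callee-saved registers or be spilled and reloaded by the caller, no matter how many architectural registers the machine has.

    #include <stdio.h>

    /* Pretend g() lives in another translation unit, so the compiler
       can't inline it and must honor the calling convention. */
    long g(long x) { return x + 1; }

    long f(long a, long b, long c, long d)
    {
        long t1 = a * b;    /* live across the call to g() below */
        long t2 = c * d;    /* live across the call to g() below */
        long r  = g(t1);    /* caller-saved registers may be clobbered
                               here, so t1 and t2 must either occupy
                               callee-saved registers or be spilled to
                               the stack around the call */
        return r + t1 + t2;
    }

    int main(void)
    {
        printf("%ld\n", f(2, 3, 4, 5));
        return 0;
    }

With many values like t1/t2 live across calls, the handful of callee-saved registers runs out quickly and everything else gets spilled, regardless of the size of the architectural register file.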

Another related issue is dealing with user/kernel mode transitions and context switches. User/kernel transitions can be made cheaper by compiling the kernel to target a small subset of the architectural registers, but a user mode to user mode context switch would generally require a full save and restore of the massive register file.

And there's also an issue with instruction encoding efficiency. For example, in a typical three-operand RISC instruction format with 32-bit instructions, with 8-bit register operand fields you only have 8 bits remaining for the opcode in an RRR instruction (versus 17 bits for 5-bit operand fields), and 16 bits remaining for both the immediate and opcode in an RRI instruction (versus 22 bits for 5-bit operand fields). You can reduce the average-case instruction size with various encoding tricks, cf. RVC and Thumb, but you cannot make full use of the architectural registers without incurring significant instruction size bloat.
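
If you want to sanity-check the arithmetic, here's a trivial C snippet that just prints the bit budget for both field widths (same numbers as above, not tied to any particular ISA encoding):

    #include <stdio.h>

    int main(void)
    {
        const int insn_bits = 32;
        const int widths[]  = { 5, 8 };   /* register-field widths */
        for (int i = 0; i < 2; i++) {
            int w = widths[i];
            printf("%d-bit fields: RRR leaves %2d opcode bits, "
                   "RRI leaves %2d bits for opcode+immediate\n",
                   w, insn_bits - 3 * w, insn_bits - 2 * w);
        }
        return 0;
    }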

To make the comparison fair, it should be noted that register renaming cannot resolve every dependency hazard that would be resolvable with explicit architectural registers: you can still only keep as many values live as there are architectural registers. (That's only partially true, since values can always spill to memory.)

There are of course alternatives to register renaming that can address some of these issues (but as you say register renaming isn't just about this). Register windows (which come in many flavors), valid/dirty bit tracking per register, segregated register types (like data vs address registers in MC68000), swappable register banks, etc.

I think a major reason register renaming is usually married to superscalar and/or deeply pipelined execution is that when you have a lot of physical registers you run into a von Neumann bottleneck unless you have a commensurate amount of execution parallelism. As an extreme case, imagine a 3-stage in-order RISC pipeline with 256 registers. All of the data except for at most 2 registers is sitting at rest at any given time. You'd be better served by a smaller register file and a fast local memory (1-cycle latency, pipelined loads/stores) that can exploit more powerful addressing modes.


Why do you claim that local variables and function calls are specific to C? They seem to be very popular among programming languages in general.


Because I'm an assembler coder, and when one codes assembler by hand, one almost never uses the stack: it's easy for a human to write subroutines in such a way that only the processor registers are used, especially on elegant processor designs which have 16 or 32 registers.


I surely did use the stack a lot, back in the old days when coding Z80 and 80x86 Assembly.


You almost had to, because both the Z80 and x86 processors have a laughably small number of general-purpose registers. Writing code that works within that limitation would have required lots of care and cleverness, far more than on the UltraSPARC or MC68000.

On the MOS 6502 we used self-modifying code rather than the stack because it was more efficient, and that CPU has no instruction or data caches, so there was nothing for the self-modification to corrupt or invalidate.



