Back in the mid 80's, I was an operating system architect at IBM during the first couple of versions of AIX (IBM's Unix system). This ran on a new RISC architecture that is now called POWER.
The idea for RISC came from the realization that complex instructions in a CISC processor still had to run a set of lower-level functions provided by the hardware. Consider, for example, the CISC architecture of the DEC PDP-11 computer (the system that Unix was originally developed on). Its indirect addressing modes were used to store to or load from an address found in a register; this is a frequently used addressing mode on both RISC and CISC machines. However, the PDP-11, being a CISC machine, had eight variations of indirect addressing that automatically incremented or decremented the memory address in the register by one or two, before or after the location was used, and so on.
As an assembly language programmer I liked this because it seemed like I could get more work done in a single instruction when iterating over an array of data. However, this is an illusion. The hardware still had to do the work, so a single instruction using autoincrement indirect addressing simply took more time to run.
RISC machines generally have a few very simple load and store instructions to access memory, and most other instructions work on registers alone. The underlying hardware is more straightforward, and the instructions have more predictable running times. It is also easier to perform out-of-order execution and speculative execution, since the instructions can be arranged to use non-overlapping sets of registers.
For these reasons, the researchers developing RISC believed they could match the speed of CISC by running more instructions, each of which runs a bit faster than a CISC instruction.
At this point, one might think it is a six-of-one, half-dozen-of-the-other comparison with no clear advantage either way. But RISC has another advantage over CISC: because the hardware is so straightforward, it is easier for compilers to perform very sophisticated optimizations. CISC computers often have special registers that certain instructions use differently from other registers, while RISC computers usually have a larger number of essentially identical registers, which makes register allocation much easier for compilers. On RISC computers the instructions are often all the same size, again making it easier for compilers to arrange them to fit better in the highest-performing cache memory. The idea is that RISC is friendlier to compilers, and that the combination of fast, simple instructions and advanced compilers will outperform CISC machines.
This all sounds good, and I consider IBM's POWER and the ARM architecture successes, but Intel is full of very smart people, and they have proven that it's not entirely clear that RISC is better than CISC. Some complex instructions are just very useful, like Intel's vectorization instructions and the 2013 instructions that accelerate the calculation of SHA-1 and SHA-256.
Lots of factors are important in general-purpose processor designs: virtual memory support (IBM's POWER has an inverted page table design, for example), multiple compute units, virtualization, multicore, caches, and good support for JIT compiler designs, not just the AOT compilers envisioned when RISC was first being developed.
Intel's success and the need for backward compatibility have shackled its current designs, and they have done very well despite this. Although I like the idea of a simpler, faster architecture (RISC), Intel might be developing its own next-generation processor architecture right now, since this would fend off RISC-V and AMD too. They might come out with a new design that is RISC or CISC or a hybrid; it might even be wildly different, like a very long instruction word (VLIW) architecture.
I don't know of any public information at that level. Here are a few tidbits.
The system was originally called the IBM RT PC, but it wasn't really a PC; it was a Unix workstation. We were going for the market dominated by companies like Sun, HP, DEC, and Apollo. However, IBM already had a range of computers that were important to its business. At the low end there was the PC-AT running OS/2, but we were shooting for higher performance and a higher price point. We had to do this without threatening the AS/400, a midrange business machine, or the large IBM/390 mainframe computers. Traditionally the IBM/390 systems required a dedicated machine room with an elevated floor, fire suppression equipment, and so forth. However, that group was interested in producing a desk-side 390 machine and didn't understand why we would want RISC hardware running Unix instead of a personal 390-architecture machine the size of a file cabinet running the VM/390 operating system. I think the name was picked to make us less threatening to other well-established lines of business.
IBM's John Cocke did foundational work on RISC systems at IBM Research and won the Turing Award while I was at IBM. He didn't work in Austin, where AIX and the RT PC (RS/6000) development was going on, but he did stop by to talk to me a few times when he had business down the hall from my office in Austin. He was very interesting. I was just a young OS architect, but I was working on things he cared about. I got to work with other IBM Fellows too and learned a great deal from these talented people.
The plan for AIX was complicated because we were supposed to produce a working version of AIX at the same time the hardware team finished the RT PC (the first RS/6000 hardware). The compiler was developed at IBM Research (probably the T.J. Watson or Hawthorne Research centers; I had occasion to visit both, but I can't remember which was home to the PL.8 compiler), so the Austin team didn't have to worry about that.
AIX was based on Unix System V with some additions from 4.3BSD. Because we didn't have a stable hardware platform (it was being developed alongside our efforts), AIX adopted a micro-kernel, written in PL.8. The micro-kernel interacted with the hardware while providing a consistent abstraction of it to a traditional Unix kernel, written in C, running on top. For example, the virtual memory manager was written in PL.8 and lived in the micro-kernel, as did the floating point exception handlers that took care of corner cases to present a clean IEEE floating point abstraction to higher levels of the OS.
I didn't enjoy PL.8 very much. I had written PL/1 for assignments as an undergraduate years before, and although PL.8 was intended to keep just 80% of PL/1, it still wasn't my cup of tea. In truth, I never had to write any PL.8 code. I attended code reviews, but the team doing the VM was so good that I didn't need to dive into the code very deeply for the virtual memory manager.
Before the first release of AIX, I did work on integrating the Unix file system with the micro-kernel virtual memory management. I was also responsible for the design of the distributed file system DS.
Like you, I've always had an interest in compilers and programming languages. I happen to have John Cocke's 1970 book Programming Languages and Their Compilers. It's one of my oldest books on compilers.