iAPX432: Gordon Moore, Risk and Intel’s Super-CISC Failure (thechipletter.substack.com)
133 points by klelatti on April 2, 2023 | 102 comments



There's something about this that gives the impression of an ISA that was developed in simulation without any real regard for hardware and then just tossed over the wall to be realized in silicon. I'm particularly baffled by instructions not being byte-aligned. Is there any successful processor that takes this "stream of bits" approach?


It's actually not all that different from the Burroughs large systems architectures (B5000/B6700/A-series) which were initially released in the late 60s ... stack machines, similar descriptor based memory structures, tagged memory etc etc

Burroughs instruction streams are on byte boundaries, with variable-length instructions starting at 1 byte, including long ones for making useful literals.


And still being sold by Unisys to this day.


AFAIK what is sold today by Unisys are essentially preconfigured emulators running on what is otherwise normal Xeon servers.


Naturally Unisys ClearPath MCP isn't running the same hardware as Burroughs Large Systems from 1961.

They don't need emulators, as Burroughs was one of the first bytecode based OSes, but I am not sure how it looks in detail.


> They don't need emulators, as Burroughs was one of the first bytecode based OSes, but I am not sure how it looks in detail.

Do you have a source for that claim? I don't think that's true.


That Burroughs was one of the first bytecode based OSes?

https://en.wikipedia.org/wiki/Burroughs_Large_Systems

And thanks for making me have another look into it, as it clearly points out there are both emulators and native implementations (MCP CMOS):

"These machines were the Libra 100 through the Libra 500, With the Libra 590 being announced in 2005. Later Libras, including the 590, also incorporate Intel Xeon processors and can run the Burroughs large systems architecture in emulation as well as on the MCP CMOS processors. It is unclear if Unisys will continue development of new MCP CMOS ASICs. "


> That Burroughs was one of the first bytecode based OSes?

> https://en.wikipedia.org/wiki/Burroughs_Large_Systems

I don't know how you get that from that article, because I can't see the word "bytecode" anywhere in it. What text in that article do you think supports your contention?

> And thanks for making me have another look into it, as it clearly points out there are both emulators and native implementations (MCP CMOS):

The same article contains this sentence: "Unisys stopped producing the hardware in the early 2010s, and the operating system is now run under emulation"

So there was proprietary hardware, up to about 10 years ago, but nowadays it is all standard x86-64 servers running their software emulator


> I don't know how you get that from that article, because I can't see the word "bytecode" anywhere in it. What text in that article do you think supports your contention?

not sure if you're hung up on the word "byte", but it says the architecture is a stack machine and that's what p-code/bytecode interpreters emulate, so whether it's byte codes or word codes, it's essentially the same thing


We run Clearpath at $DAYJOB. I haven't cracked open one of them to look myself, but the folks running them have told me they're pretty much bog standard, high-end x86-64 servers running a Unisys VM. No proprietary hardware.


AFAIK there's a 'bit-addressable' extension for ARM, but I lack the google-fu to search for it (I guess this only applies to loading and storing data, not 'arbitrary width instructions')

For the data load/store case, if you go with the (high-level) idea that bytes are nothing special anymore (because cache line width is the new word width), and instead you have arbitrary width integers (like Zig for instance), it would also make sense to have 'bit granularity pointers'.

All you'd need is to move a regular pointer 3 bits to the left (which should be ok because the upper bits are wasted anyway).

IMHO this would make a lot of sense from a system programming language PoV, but I guess the hardware guys can think of a couple of reasons to object :)
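
To make the encoding concrete, here's a minimal C sketch of what such a 'bit pointer' could look like. This is purely hypothetical; no shipping ISA I know of does exactly this:

    #include <stdint.h>

    /* Hypothetical 'bit pointer': a normal byte address shifted left by 3,
       with the low 3 bits holding the bit offset inside that byte. Assumes
       the top 3 bits of a virtual address are unused, as argued above. */
    typedef uintptr_t bitptr;

    static inline bitptr make_bitptr(const void *base, unsigned bit) {
        return ((uintptr_t)base << 3) | (bit & 7u);
    }

    static inline unsigned load_bit(bitptr p) {
        const uint8_t *byte = (const uint8_t *)(p >> 3);  /* recover the byte address */
        return (*byte >> (p & 7u)) & 1u;                  /* then pick the bit inside it */
    }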


Bit-addressable data has been done in at least one other architecture: https://en.wikipedia.org/wiki/TMS34010

This is different from the 432 having opcodes that weren't a multiple of 8 bits, making decoding even more complicated than on x86.


The Intel 8051 can do limited bit-level addressing. It's restricted to a specific 16-byte region in low memory, which makes it different from either the 432 or the 34010.
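
If it helps, the mapping is simple enough to write down. A C sketch of the data-RAM case only; bit addresses 0x80 and up map onto SFRs instead:

    #include <stdint.h>

    /* 8051 bit addressing: bit addresses 0x00-0x7F cover the 16 bytes of
       internal RAM at 0x20-0x2F, one bit per address. */
    static inline uint8_t bit_byte_addr(uint8_t bitaddr) { return 0x20 + (bitaddr >> 3); }
    static inline uint8_t bit_position(uint8_t bitaddr)  { return bitaddr & 7; }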


"Stream of bits" really simplifies it. If you read the Intel documentation there's a nice diagram that explains the logic. Effectively you can view it as an object-oriented, or grouping, mechanism. If you look at instructions and prefixes, it effectively goes PREFIX - EXTENSION (a la AVX-256) - INSTRUCTION - ARG1 - ... - ARGN - POSTFIX.


What’s most interesting to me about the i432 is the rich array of object types essentially embedded into its ISA. The JVM “knows” a little bit about virtual dispatch tables, monitors, arrays, but even that pales in comparison to the i432’s user-facing model of the CPU state.

Is there anything comparable surviving today?


I don’t think so, except at the margins.

I started out as a Lisp hacker on machines designed for it (PDP-10 and CADR, later D-machines) so I was very much in the camp you describe. They had hardware / microcode support for tagging, unboxing, fundamental Lisp opcodes, and for the Lispms specifically, things like a GC barrier and transporter support. When I looked at implementations like VAXLisp, the extra cycles needed to implement these things seemed like a burden to me.

Of course those machines did lots of other things as well, and so were subject to a lot of evolutionary pressure the research machines were not subject to.

The shocker that changed my mind was the idea of using the TLB to implement the write barrier. Yes, doing all that extra work cost cycles, but you were doing it on a machine that had evolved lots of extra capabilities that could ameliorate some of the burden. Plus the underlying hardware just got faster, faster (i.e. the second derivative was higher).
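
(For the curious, a minimal sketch of the page-protection flavour of that trick, assuming POSIX mprotect/SIGSEGV; real collectors use card tables and are far more careful about signal safety.)

    #include <signal.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>

    /* Old-generation pages are mapped read-only; the first store into one
       faults, the handler marks the page dirty and re-enables writes, and
       the collector later scans only the dirty pages. */

    #define PAGE 4096
    #define NPAGES (1u << 20)
    static uint8_t dirty[NPAGES];                      /* one flag per page */

    static void barrier_fault(int sig, siginfo_t *si, void *ctx) {
        (void)sig; (void)ctx;
        uintptr_t page = (uintptr_t)si->si_addr & ~(uintptr_t)(PAGE - 1);
        dirty[(page / PAGE) % NPAGES] = 1;             /* remember the page */
        mprotect((void *)page, PAGE, PROT_READ | PROT_WRITE);  /* let the store retry */
    }

    void install_write_barrier(void *old_gen, size_t len) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = barrier_fault;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
        mprotect(old_gen, len, PROT_READ);             /* trap the first write to each page */
    }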

Meanwhile, the more dedicated architectures were burning valuable real estate on these features and couldn’t keep up elsewhere. You saw this in the article when the author wrote about gates that could have been used elsewhere.

Finally, some decisions box you in — the 64kb object size limitation being an example in the 432. Sure, you can work around it, but then the support for these objects becomes a deadweight (part of the RISC argument).

You see this also in the use of GPUs as huge parallel machines, even though the original programming abstraction was triangles.

Going back to my first sentence about “at the margins”: optimize at the end. Apple famously added a “jvm” instruction — must have been the fruit of a lot of metering! Note that they didn’t have to do this for Objective-C: some extremely clever programming made dispatch cheap.

Tagging/unboxing can be supported in a variety of (relatively) inexpensive ways by using ALU circuitry otherwise idle during address calculation OR (more likely these days) by implementing a couple of in demand ops, either way pretty cheap.

Finally, we do have a return to and flourishing of separate, specialized functional units (image processors, "learning" units and such, like, say, the database hardware of old), but they aren't generally fully programmable (even if they have processors embedded in them); the key factor is that they don't interfere (except via some DMA) with the core processing operations.


“Going back to my first sentence about “at the margins”: optimize at the end. Apple famously added a “jvm” instruction — must have been the fruit of a lot of metering! Note that they didn’t have to do this for Objective-C: some extremely clever programming made dispatch cheap.”

I’m struggling to think of what you are referring to here. ARM added op codes for running JVM byte code on the processor itself, but I think those instructions were dropped a long time ago. ARM also added an instruction (floating point convert to fixed point rounding towards zero) as it became such a common operation in JS code. There have also been various GC related instructions and features added to POWER, but I think all that was well after Apple had abandoned the architecture.

I may be forgetting something; could you clarify?


GP probably meant a “JS” instruction rather than a “JVM” one: FJCVTZS, “Floating-point Javascript Convert to Signed fixed-point rounding towards Zero”[1,2], introduced in ARMv8.3 at Apple’s behest (or so it is said). Apparently the point is that ARM float-to-integer conversions normally saturate on overflow while x86 reduces the integer mod 2^width, and JavaScript baked the x86 behaviour into the language.

[1] https://developer.arm.com/documentation/ddi0602/2022-12/SIMD...

[2] https://news.ycombinator.com/item?id=24808207
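
The semantics are easy to see in C. A sketch of ECMAScript's ToInt32, i.e. what FJCVTZS computes in one instruction while a plain ARM convert would saturate instead (assumes the usual two's-complement wrap when casting):

    #include <math.h>
    #include <stdint.h>

    /* ES ToInt32: NaN and infinities become 0, everything else is truncated
       toward zero and reduced modulo 2^32, then reinterpreted as signed. */
    int32_t js_to_int32(double d) {
        if (!isfinite(d)) return 0;
        double m = fmod(trunc(d), 4294967296.0);   /* wrap modulo 2^32 ... */
        if (m < 0) m += 4294967296.0;              /* ... into [0, 2^32)   */
        return (int32_t)(uint32_t)m;               /* two's-complement wrap assumed */
    }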


I meant JS but too late to edit.


Not adding tagging is basically a negligence crime. That feature isn't that expensive and it could have saved most of the security issues that have happened over the last 20+ years.


I think you are talking about different types of tagging. Tagging on Lisps and other language VMs is where some bits in a pointer are reserved to indicate the type. So with a single tag bit integers might be marked as type 0 (so you don’t need to shift things when doing arithmetic) and other objects would be type 1. This provides no real protection against malicious code at all. There are other types of pointer tagging that do provide security, and we are starting to see some support in hardware.
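
Roughly like this in C, a sketch of the Lisp-style scheme (names are made up); the point is that with tag 0 on fixnums, arithmetic works on the tagged values directly:

    #include <stdint.h>

    typedef uintptr_t lispval;   /* tagged value: low bit 0 = fixnum, 1 = heap pointer */

    static inline lispval  make_fixnum(intptr_t n)  { return (uintptr_t)n << 1; }
    static inline intptr_t fixnum_value(lispval v)  { return (intptr_t)v >> 1; }  /* arithmetic shift assumed */
    static inline int      is_fixnum(lispval v)     { return (v & 1) == 0; }

    static inline lispval  tag_ptr(void *p)         { return (uintptr_t)p | 1; }
    static inline void    *untag_ptr(lispval v)     { return (void *)(v & ~(uintptr_t)1); }

    /* (a << 1) + (b << 1) == (a + b) << 1, so no untagging is needed: */
    static inline lispval  fixnum_add(lispval a, lispval b) { return a + b; }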


It's the same basic concept; you can use it in multiple ways. But yeah, of course security was not what tagging was used for.


It depends - at the time capability machines were all the rage - the idea there is that you add EXTRA bits to memory (say 33-bit or 34-bit memory - Burroughs large systems were 48-bit machines with 3-bit tags - plus parity) with careful rules about how they could be made, so that pointers could not just be created from integers.

At the time memory was really expensive, so blowing lots of bits on tags was a real issue (in the late 70s we bought 1.5MB of actual core for our Burroughs B6700 for $1M+). Plus, as memory moved onto chips, powers of two became important: getting someone to make you a 34-bit DRAM would be hard, much less getting a second source as well.


"powers of two became important, getting someone to make you a 34-bit DRAM would be hard, much less getting a second source as well".

I have several pounds of 9-bit memory. It's 'parity' memory, was fairly common back in the day, and adding another bit just means respinning the SIMM carrier to add a pad for another chip. Of course, I don't know if anyone is making 1- or 4-bit DRAMs any more, so you might be stuck adding an 8-bit or larger additional chip. Memory is probably cheap enough now that if you wanted 34 bits to play with and were somehow tied to a larger power of two, you could just go to 40 or more bits and do better ECC or more tag space or just call the top 6 'reserved' or something. It's a solvable problem.


I have built CPUs with 9-bit bytes (because of subtle MPEG details), they made sense at the time and ram with 9-bit bytes was available at a reasonably low premium - that probably wasn't true when the 432 was a thing, RAM was so much more expensive back then.

You could get special sized DRAM made for you, it wouldn't be cheap, getting a second source even more expensive - you'd have to be an Intel, IBM or someone of that size to guarantee large enough volumes to get DRAM manufacturers to bite


If you'll notice, memory is often made on a carrier (e.g. SIMM module). So you don't have to find someone to make you a x9 or x34 or whatever bit wide chip; you find someone to make a carrier out of off the shelf parts with enough chips for your word width (possibly burning some bits). Early 9-bit SIMMs had 2 4-bit wide DRAMS and 1 1-bit wide DRAM. Just need a memory controller that makes sense of it.


(I design memory controllers ....) you can sort of do that depending on where your byte boundaries are (and whether your architecture needs to be able to do single byte writes to memory) - more though I was trying to point out that historically just 'burning some bits' was not something you could practically do cost wise (it's why we built a 9/72/81-bit CPU in the 90s rather than a 16/125/128 one - the system cost of effectively doubling the memory size would not have made sense)

These days (and actually in those days too) memory isn't really the size of the memory bus, often it's a power of two multiple - those 9-bit RAMBUS drams we were using really moved data on both edges of a faster clock - our basic memory transfer unit was 8 clock edges x 9 == 72 bits per core clock - as a designer with even 1 DRAM out there that's the minimal amount you can deal with and you'd best design to make the most of it


My point was there was more than one way of solving the problem (economically optimized or not), and having custom width memory silicon wasn't the only answer. But sure, if you move the goalpost around enough you get to be right.


Edit: Should have said "x-bit wide memory" everywhere above.


Intel does have a pointer tagging feature called Linear Address Masking. Why it's called that I don't know.

(On ARM it's called TBI/top byte ignore, on AMD it's called UAI/upper address ignore.)

This is different from pointer signing which it doesn't have.
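
For what it's worth, the software side is just bit fiddling. A sketch of the ARM TBI flavour (top byte), assuming the OS has the feature enabled; LAM and UAI mask slightly different bit ranges:

    #include <stdint.h>

    /* With TBI enabled, loads and stores ignore bits 63:56 of the pointer,
       so a tag can live there and be dereferenced without masking. */
    static inline void *set_top_byte(void *p, uint8_t tag) {
        return (void *)(((uintptr_t)p & 0x00FFFFFFFFFFFFFFull) | ((uintptr_t)tag << 56));
    }

    static inline uint8_t get_top_byte(const void *p) {
        return (uint8_t)((uintptr_t)p >> 56);
    }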


NSA et al probably like it better that way, so they have easier access to the "'intel' inside" my PC.


That's an incredibly bad argument, because these are the same computers the NSA and the US government itself use, so they are exposing themselves. Not to mention all the US companies who are subject to intellectual property theft and so on.


Unfortunately the Information Assurance part of NSA’s mission seems to have diminished dramatically in the 21st century.

But your point still stands about in house use.


If you squint hard enough, the underlying object capability system as privilege boundary concept still does live on.

In hardware the 432 went on to inspire the 16- and 32-bit protected modes on x86. There it was the inspiration for just about anything involving the GDT and the LDT, including fine-grained memory segments, hardware task switching via Task State Segments, and virtual dispatch through Task Gates.

But a large point of the RISC revolution was that these kinds of abstractions in microcode don't make sense anymore when you have ubiquitous I$s. Rather than a fixed-length blob of vendor code that's hard to update, let end users create whatever abstractions they feel make sense in regular (albeit privileged) code. Towards that end the 90s and 2000s had an explosion of supervisor-mode-enforced object capability systems. These days the most famous is probably seL4; there are a lot of parallels between seL4's syscall layer and the iAPX432's object capability interface between user code and microcode. In a lot of ways the most charitable way to look at the iAPX432 microcode was as a very early microkernel in ROM.


timing is interesting: ia432 listed as "late 1981" (wikipedia), and 286 (protected mode 16:16 segmentation) in Feb 1982. of course, the 432 had been going on for some time...


> Is there anything comparable surviving today?

I'm not aware of such Super CISC instruction sets in popular use today, but with VMs and statistically-based AI proliferating now, I wonder whether we might revisit such architectures in the future. Could continuous, VM-collected statistical data inform compiler and JIT design, collapsing expensive, common complex operations that current methods can't spot into Super CISC instructions that substantially speed up patterns we didn't previously know existed? Or are our current methods of analyzing and implementing compilers and JITs good enough, with the things mostly holding them back these days being other factors like memory and cache access speed and pipeline stalls?


There were some attempts in the Java direction: https://en.wikipedia.org/wiki/Java_processor

But ultimately it seems that the idea of language-specific CPUs just didn't survive because people want to be able to use any programming language with them.


The Java-Everything trip Sun went on was truly horrific. Both in terms of technical and business results.


> people want to be able to use any programming language with them.

And then, except for a few rounding errors, they actually use C.


> Is there anything comparable surviving today?

Surviving? No. The most recent is arguably Sun's Rock processor, which was one of the final nails in their coffin and was quite an i432 redux. It promised all sorts of hardware support for transactions and other features that Sun thought would make it a killer chip, was rumoured to have taped out requiring 1 kW of power for mediocre performance, and Oracle killed it when they saw how dysfunctional it was.


I feel like the Nvidia chips from NV1-NV3 were doing a take on this idea IIRC. My memory is foggy and would really love to see the documentation again. Possibly it survived on beyond the original chips?

I'm not entirely sure if it was just the driver API or the actual hardware that supported objects but definitely they were trying to abstract it that way.

I found this through googling "redesigned PGRAPH objects, introducing the concept of object class in hardware"

https://media.readthedocs.org/pdf/envytools/latest/envytools...

And. https://envytools.readthedocs.io/en/latest/hw/graph/intro.ht...


"""

The key new features included:

Ada : The architecture would be programmed using the Ada programming language, which at the time was seen as the ‘next big thing’ in languages.

"""

This was it - the next big thing. It missed and went down, but there's always the doubt that they were right, just too early ...

Seems the performance just wasn't there:

"""

Gordon admitted that "to a significant extent," he was personally responsible. "It was a very aggressive shot at a new microprocessor, but we were so aggressive that the performance was way below par."

"""

It was kind of uncanny having it shouted from the rooftops one year, to dead silence about anything having ever happened a few years later.


Ada was the CISC of languages - high-concept the same way. And it lost to C, surely the RISC of languages.

Never bet against low-tech.


> Ada was the CISC of languages - high-concept the same way.

For a long time, many years, in the "programming language benchmarks game", Ada regularly and consistently came 2nd only to C in raw compiled-code execution speed.

There is a huge but mostly unrecognised amount of hearsay, rumour and scuttlebutt in the computing industry. People hear something, believe it, and make choices based upon it, without checking or looking for evidence... as they do with medicine and so on, or there would be no "supplementary, complementary and alternative medicine". It doesn't work. It is a SCAM.

People heard "Ada is big" and thought "big means slow" and avoided it. It was quite big in the 1980s, from what I can tell, but that was about 4-5 chip and software generations ago.

(H/w: Caches, scalar, superscalar, multicore, 64-bit.) (S/w: 16-bit, no memory protection; 32-bit, ditto; 32-bit with memory protection; ubiquitous hypervisors; virtualised 64-bit with memory protection.)

Compared to C++, Ada is a tiny lean fast sportscar of a language.

> Never bet against low-tech.

It isn't low-tech. It was in 1972 or something.

Never bet against "worse is better" is more like it.



Not sure what you are trying to tell us here, TBH.

Can you explain in words?


Data. You say "2nd only to C" but don't provide data.

http://shootout.alioth.debian.org/gp4/benchmark.php?test=all...


Aha, ISWYM now.

I can't. I just remember noticing it a few years back, and being surprised, and then checking occasionally and it stayed there for a while.

Why not contact them and ask, if you're curious? If I'm wrong, you'll have the pleasure of refuting me. ;-)


> Compared to C++, Ada is a tiny lean fast sportscar of a language.

We can all see C++ program measurements faster than Ada program measurements in those archived "programming language benchmarks game" charts.


What I was getting at was the size of the language, not its compiled speed.

Does any single human understand all of C++ and use it all? Is it even possible any more? If they did, could anyone else read the code?


people "avoided" it because up until fairly recently in its history, the only available compilers were priced to sell to the US Department of Defense and the language was too complex for other people to bother implementing a standards compliant compiler. it's an absurd language with a few good ideas buried underneath a mountain of engineering cruft. it's a fantastic reflection of the environment it was born out of.


That's a fair point. However, until relatively recently that was true of programming languages in general. All of them.

There has been a decent C compiler for the last ⅔ of my career, roughly. Other languages followed; I remember early in my working life when Zortech announced the first ever native-code C++ compiler, and it was commercial. As was Pascal etc.

The mailing list I've been on for the longest time in my life is the British APL society (some 38 years now), and for much of the first 2 decades I was on that, one of their main goals was a decent free or cheap APL interpreter. Then suddenly there were free compilers and interpreters everywhere from the mid-1990s onwards.

About the time ordinary consumer Windows got useful, suddenly, so did Linux, and then suddenly decent free compilers were everywhere. And about 5Y later, Mac OS X happened.

Then about 5Y after Mac OS X first appeared, meaning OS X Server with the Platinum appearance, suddenly Linux started to catch up in earnest... meaning that soon after MS forced Corel to kill Corel LinuxOS and WordPerfect Office for Linux, and Caldera went insane, Bruce Perens proposed UserLinux...

And within a year or two, Ubuntu appeared, and Fedora started playing catch-up, as it still is.

And after a couple of years, Ubuntu got pretty good, and frankly, for all its foibles, it's still the best freebie for non-techies. Or Ubuntu derivatives, such as Mint, Zinc, Zorin, or Linux Lite ( which isn't light at all, but is all right otherwise.)


Of course you make a good point yourself, compilers generally weren't free (as in beer) back then. What makes Ada stand out is that it was quite a bit pricier. The "small developer" license of AdaCore for example is $25k/year and it covers 5 seats. Hobbyists aren't going to pay that. It was 3 years after Zortech's compiler that Borland came out with their own C++ priced at $100, and it was more than good enough (I don't know Zortech's pricing model to compare.) The only free Ada compiler I know of is GNAT, and it's rife with issues (extremely massive binary size, even with the runtime disabled).

Honestly the tooling just isn't great for the language. It's a real shame, because the subtyping system and design-through-contracts are way ahead of their time. ATS is the only other language I can think of that offers such robust safety features while also targeting the same performance as a systems lang.


I think the iAPX432 team went on to do the i960 (another interesting architecture that didn’t really find the success that was hoped-for) and then finally they went on to the PentiumPro where they found more success.


It's really sad the i960 didn't take off. Intel wasn't in the Unix workstation market much and the high end was owned by Digital and IBM. Intel was working on the i860 and i960.

Intel, if they had acted quicker, could have cooperated with a Unix workstation maker and potentially done really well.

Sun was definitely looking around for a chip partner at the time, but none of the American companies were interested so they went to Japan. So the timing didn't really work out. A Sun-Intel alliance would have been a scary prospect and beneficial for both companies.


it wasn't a total failure. the 860 and 960 were decent little engines that found a home in high performance computing and embedded applications that needed a little oomph. I worked on some 860 array products and certainly remember finding 960s in printers and other gear


Trivia: The i960 was used to power Sega's highly successful Model 2 arcade board, which truly ushered in the era of 3D gaming* (with Daytona USA), and was used in the F-22 Raptor until it was later replaced with other CPUs.

* Certainly not the first 3D arcade hardware but arguably this along with Namco's MIPS-based System 22 (Ridge Racer, released a little before Daytona), was the inflection point that made 2D effectively obsolete.


Worked on an i860 Stratus machine in the early 90s - provided a key part of our distributed infra due to its FT capabilities.


432 was a pretty interesting flop, but surely the ia64 has to rank up there.

it would be interesting to try to chart some of the features that show up in various chips. for instance, 64k segments in both 432 and 286+, or VLIW in 860 and ia64.


One difference is that, according to the article, Intel actually learned quite a bit technically from the 432 even though it was a commercial flop. It's hard to see much of a silver lining in IA64/Itanium for either Intel or HP--or, indeed, for all the other companies that wasted resources on Itanium if only because they felt they had to cover their bases.


A lot of RISC CPU architectures which were popular in the 1990s declined because their promulgators stopped investing and bet on switching to IA64 instead. Around the year 2000, VLIW was seen as the future and all the CISC and RISC architectures were considered obsolete.

That strategic failure by competitors allowed x86 to grow market share at the high end, which benefited Intel more than the money lost on Itanium.


It's more complicated than that.

Sun didn't slow down on UltraSPARC or make an Itanium side bet. IBM did (and continues to) place their big hardware bet on Power--Itanium was mostly a cover your bases thing. I don't know what HP would have done--presumably either gone their own way with VLIW or kept PA-RISC going.

Pretty much all the other RISC/Unix players had to go to a standard processor; some were already on x86. Intel mostly recovered from Itanium specifically but it didn't do them any favors.


Actually, they did. Intel promised an aggressive delivery schedule, performance ramp, and level of performance. The industry took it hook, line, and sinker. Meanwhile AMD decided not to limit 64-bit to the high end and brought out x86-64.

Sun did an IA64 port of Solaris, which is definitely an Itanium side bet.

HP was involved in the IA64 effort and definitely was planning on the replacement of pa-risc from day 1.


> HP was involved in the IA64 effort and definitely was planning on the replacement of pa-risc from day 1.

As I remember, and as https://en.wikipedia.org/wiki/Itanium agrees, Itanium originated at HP. So yes, a replacement for pa-risc from day 1, but even more so...


Another way to look at the Itanic is that HP somehow conned Intel into betting the farm on building HP-PA3 for HP. Which is pretty impressive.


Sun didn't slow down on UltraSPARC but they were just not very good at designing processors.


This isn't really true. IBM/Motorola need to own the failure of POWER and PowerPC, and MIPS straight up died on the performance side. Sun continued with UltraSPARC.

It wasn't that IA64 killed them, it's that they were getting shaky and IA64 appealed _because_ of that. Plus the lack of a 64bit x86.


It's simply economics: Intel had the volume. Sun and SGI simply didn't have the economics to invest the same amount, and they were also not chip companies; they both either didn't invest enough in chip design or invested it wrongly.

Sun spent an unbelievable amount of money on dumb-ass processor projects.

Towards the end of the 90s all of them realized their business model would not do well against Intel, so pretty much all of them were looking for an exit, and IA64 hype basically killed most of them. Sun stuck it out with SPARC with mixed results. IBM POWER continues, but in a thin slice of the market.

Ironically there was a section of Digital and Intel who thought that Alpha should be the basis of 64-bit x86. That would have made Intel pretty dominant. Alpha (maybe a TSO version) with a 32-bit x86 compatibility mode.


Look closely at AMD designs (and staff) of the very late 90s and early 2000s, and/or all modern x86 parts, and you'll see that... more or less, that's what happened, just not with an Alpha mode.

Dirk Meyer (co-architect of the DEC Alpha 21064 and 21264) led the K7 (Athlon) project, and those parts ran on a licensed EV6 bus borrowed from the Alpha.

Jim Keller (co-architect of the DEC Alpha 21164 and 21264) led the K8 (first-gen x86-64) project, and there are a number of design decisions in the K8 evocative of the later Alpha designs.

The vast majority of x86 parts since the (NexGen Nx686 which became) AMD K6 and Pentium Pro (P6) have been internal RISC-ish cores with decoders that ingest x86 instructions and chunk them up to be scheduled on an internal RISC architecture.

It has turned out to sort of be a better-than-both-worlds thing almost by accident. A major part of what did in the VLIW-ish designs was that "you can't statically schedule dynamic behavior", and a major problem for the RISC designs was that exposing architectural innovations on a RISC requires you to change the ISA and/or memory behavior in visible ways from generation to generation, interfering with compatibility. So... the RISC-behind-x86-decoder designs get to follow the state of the art, changing whatever they need to behind the decoder without breaking compatibility, AND get to have the decoder do the micro-scheduling dynamically.


Yes, that's very much part of the history.

However I disagree that it's the best of both worlds.

RISC doesn't necessarily require changing the ISA, not any more than on x86.


I'm certainly not going to claim that x86 and its irregularities and extensions of extensions is in _any way_ a good choice for the lingua franca instruction set (or IR in this way of thinking). Its aggressively strict memory-ordering model likely even makes it particularly unsuitable; it just had good inertia and an early entrance.

The "RISC of the 80s and 90s" RISC principles were that you exposed your actual hardware features and didn't microcode to keep circuit paths short and simple and let the compiler be clever, so at the time it sort of did imply you couldn't make dramatic changes to your execution model without exposing it to the instruction set. It was about '96 before the RISC designs (PA-RISC2.0 parts, MIPS R10000) started extensively hiding behaviors from the interface so they could go out-of-order.

That changed later, and yeah, modern "RISC" designs are rich instruction sets being picked apart into whatever micro ops are locally convenient by deep out of order dynamic decoders in front of very wide arrays of microop execution units (eg. ARM A77 https://en.wikichip.org/wiki/arm_holdings/microarchitectures... ), but it took a later change of mindset to get there.

Really, the A64 instruction set is one of the few in wide use that is clearly _designed_ for the paradigm, and that has probably helped with its success (and should continue to, as long as ARM, Inc. doesn't squeeze too hard on the licensing front).


Seems to me that you just have to be careful when bringing out a new version. You can't change the memory model from chip to chip, but that goes for x86 too. Not sure what other behaviors are not really changeable.

Can you give me an example of this? SPARC of the late 90s ran 32bit SPARC.


> Plus the lack of a 64bit x86.

If you look at the definitions of various structures and opcodes in x86 you'll notice gaps that would've been ideal for a 64-bit expansion, so I think they had a plan besides IA64, but AMD beat them to it (and IMHO with a far more inelegant extension.)


  > and IMHO with a far more inelegant extension
what could they have done that would have been better?


> That strategic failure by competitors allowed x86 to grow market share at the high end, which benefited Intel more than the money lost on Itanium.

In that sense, Itanium was a resounding success for Intel (and AMD).


Itanium was a success right until they actually made a chip.

What they should have done is hype Itanium, and then the day it came out they should have said: yeah, this was a joke, what we actually did is buy Alpha from Compaq and it's literally just Alpha with an x86 compatibility mode.

Then they would have dominated.


Itanic was a flop due to AMD releasing a 64-bit CPU. And I still think Intel learned a lot from its failure, if not from the technology then business-wise: just stick to improving the existing architecture while keeping backward compatibility.


IMO, Itanic was a doomed design from the start; the lesson to be learned is that "You can't statically schedule dynamic behavior." The VLIW/EPIC type designs like Itanium require you to have a _very clever_ compiler to schedule well enough to extract even a tiny fraction of theoretical performance, for both instruction packing and memory scheduling reasons. That turns out to be extremely difficult in the best case, and in a dynamic environment (with things like interrupts, a multitasking OS, bus contention, DRAM refresh timing, etc.) it's basically impossible. Doing much of the micro-scheduling dynamically in the instruction decoder (see: all modern x86 parts that decompose x86 instructions into whatever it is they run internally in that vendor's generation) nearly always wins in practice.

Intel spent decades trying to clean-room a user-visible high end architecture (iAPX432, then i860, then Itanium), while the x86 world found a cheat code for microprocessors with the dynamic translation of a standard ISA into whatever fancy modern core you run internally (microcode-on-top-of-a-RISC? Dynamic microcode? JIT instruction decoder? I don't think we really have a comprehensive name for it) thing. Arguably, NexGen were really the first to the trick in 1994, with their Nx586 design that later evolved into the AMD K6, but Intel's P6 - from which most i686 designs descend - is an even better implementation of the same trick less than a year later, and almost all subsequent designs work that way.


Without AMD releasing AMD64, eventually WinIntel would be IA64 no matter what.


Or Intel would have been cut out if they didn't put forth an offering that was less expensive and more performant? When NT4 came out, it ran on Alpha, MIPS, and PowerPC. You could even run (...at about half speed) x86 binaries on the Alpha port with FX!32. Apple has swung a transition like that twice, all the old Workstation vendors went from 68k to their bespoke RISCs, Microsoft could have just slowly transitioned out of Intel parts with no more difficulty than transitioning to IA64. Windows' PE format still doesn't have an elegant Fat binary setup (they have that Fatpack hack in windows-on-ARM, but it's worse than the 90s implementations), but that doesn't mean they couldn't have added one if compelled because the winning x86 successor(s) didn't end up being backward compatible.

The biggest squeeze on 32 bit architectures is the memory ceiling, and Intel was doing PAE to get 36 bit addressing on the Pentium Pro in '95 and kept squeaking by with PAE well into the mid-2000s before most consumers cared. You only got 4GB per-process, and it took a couple years for chipset support to happen. The chipset issue is itself an interesting historical rabbit-hole, only one of the first-party chipsets for the Pentium Pro - the 450GX - which was a many-chip monstrosity, even _claimed_ to support more than 4GB of RAM. I've never found an example of a 450GX configuration with more than one 82453GX DRAM controller as indicated by the documentation to handle multiple 4GB banks to the extent that I suspect it may not have actually worked. By 96/97 there were 3rd party chipsets that could do >4 processors and >4GB, most prominently the Axil NX801 ( https://www.eetimes.com/axil-computer-to-incorporate-pentium... ) sold by DataGeneral as the AV8600 and HP as the HP NetServer LXr Pro8.


Windows NT was dead on all of them by the time IA64 came out.


I disagree for a number of reasons:

The slow performance would eventually have led to Intel realizing it was a bad idea. They would lose market share to competitors on different ISAs.

Windows at some point would want to be on faster processors and would again run on some of the others. Windows doesn't have undying loyalty to Intel.

Other people than AMD could do an x86 implementation with some 64-bit overlay as well, Transmeta style. That kind of system would beat Itanium as well if it was put on top of a fast RISC processor.

And at some point AMD, even on 32-bit, would massively gain market share as they invested more in faster RISC-style processors. So Intel would have massive pressure from the bottom end and the top end. And in any possible future, AMD at some point is going to do something with 64-bit.

The idea that the whole industry goes massively backwards and stagnates for years because of an Intel monopoly doesn't really work in practice.


The point is that NT is portable, and once Merced hit the market in 2001 5+ years overdue and not delivering on any of its performance promises, the only question was "What architecture will succeed x86, because we can cross IA64 off the list." In the same way that when the 432 showed up years late and 5-10x slower than a contemporary Motorola 68000 or 286, the 432 was dead in the water and all the early 80s workstations were built with 68ks and the PC market went with 286s.

I don't know if the absence of AMD64 in 2003 would have made an opening for SPARC or PowerPC or ARM or something else entirely, or maybe the "Let's slap an expansion on the 8080 again, just like the 386 bailed us out after the 432 debacle" scenario was inevitable, but NONE of the compiler-scheduled-parallel architectures panned out in the market, so someone else was going to win.


WinIntel was already in the IA64 boat when AMD64 came to be.

Without AMD64, their partnership would carry on anyway.


I have read one insider account that Intel had its own, different x86-64 instruction set, designed in response to AMD's. It approached Microsoft and asked it to port Windows to it.

Microsoft refused, saying "we already support one failing 64-bit architecture of yours, at great expense and no profit. We're not doing two just for you. There now is a standard x86-64 ISA and it's AMD64, so suck it up and adopt the AMD ISA -- it's good and we already have it working."

Or words to that effect. :-)

I've not been able to find the link again since, but allegedly, yes, the success of AMD's x86-64 has been due to Microsoft backing it. It sounds plausible to me.


Based on https://en.wikipedia.org/wiki/File:Itanium_Sales_Forecasts_e... it's clear that Itanium was delayed and sales projections were drastically reduced multiple times before AMD even announced their 64-bit alternative, let alone actually shipping Opteron. (For reference, AMD announced AMD64 in October 1999, published the spec August 2000, shipped hardware in April 2003. Intel didn't publicly confirm their plans to adopt x86-64 until February 2004, and shipped hardware in June 2004.)


VLIW was really marooned in time: driven by overconfidence in the compiler (which had shown that you could actually expose pipeline hazards), and underestimates of the coming abundance of transistors (which made superscalar OoO really take off, along with giant on-chip caches). Well, and multicore to sop up even more available transistors.


The problem is that VLIW had already proven not to work in the 90s, with lots of companies investing money in it but not turning it into a product.

EPIC was basically VLIW++, with lots of added stuff that was supposed to overcome the issues, but it didn't do so successfully.

I don't think they underestimated the amount of transistors, they just thought that EPIC would be a better way to use them.

OoO had already proven itself in the 90s as well, so it's not like this was unknown when they designed EPIC.


otoh, for the previous 20 years, things like the 432 and lispms and burroughs large systems had been losing, in favor of architectures that pushed all the hard work onto compilers

so it makes sense that in 01995 you'd look at ooo and vliw and extrapolate that vliw/epic was going to beat the crap out of ooo


Granted, it makes some amount of sense. But the issue is that with EPIC you still can't address every part of the processor unless you want to keep growing the instruction. So you end up having to do OoO anyway, but you've just made it much more complex and hard to reason about.

I'm not a chip designer but is what I understood to be one of the issues.

Also, this compiler stuff wasn't yet written, unlike with RISC, where people at Stanford had shown successful compilation for RISC before anyone even developed any high-performance RISC chips.

I don't want to claim I'm smarter than those people; clearly all the people working on these VLIW processors were a lot smarter than me. But then again many smart people worked on Alpha and they didn't go the VLIW route.


I've built stuff with 80s processors, like wire wrap and solder type of build, so I've read quite a few processor manuals/handbooks/datasheets over the years (decades...) After the failure of the 432 I saw a set of 432 databooks for sale cheap, probably at a hamfest or similar, the typical CPU of the day was a hundred page book but the 432 was an entire bookshelf.

I know there's a marketing product message and we have to "respect" that, however... the impression I had from reading the actual engineering docs (well, glancing and skimming and having heard about it vs looking at the actual documentation) was that marketing wanted to SELL an Ada/object-oriented chip, so they sold it that way, but the actual data sheet showed this was DESIGNED to be the IBM System/360 for the 80s microprocessor generation. An unimaginable list of features most applications would never, ever use. Literally every feature any assembly language programmer could ask for, and then more on top of that.

It seemed too complicated to ever optimize and release a vers 2.0 that's binary compatible. You can imagine, then create, an 8008 version 2, or a 8080 version 2, or a 8086 version 2, or a 6800 version 2 in a logical engineering sense. The 432 was a one-and-done, an evolutionary dead end. Possibly the next one could be smaller process and run slightly faster or use less power. But it was a dead end.


> Literally every feature any assembly language programmer could ask for, and then more on top of that.

I think that one of the key problems with the 432 was that they skipped talking to the assembly language programmers!

The 432 omitted stuff that most assembly language programmers would deem essential (e.g. registers, where I think an assembly language expert would have immediately focused on the performance impact), and I think assembly language programming of the 432 would not be fun at all.

Agree 100% that upgrading it would have been a nightmare. I suspect that they simply didn’t have the bandwidth to think about v2.


I used to make similar computers: wire-wrapped things that I'd never do today - 8086, 80186 and twice 2900 bit slices. I was never able to purchase a set of 432 chips. No one had any for sale, or perhaps not for sale to me.


The ambitious 432 was also late, quite late. So Intel needed a simple stopgap product which was an iteration of the 8088, the 8086.


The 8088 (1979) was a low-cost (reduced bus width) follow-up to the 8086 (1978). You may be thinking of the 8080 (1974) or 8085 (1976).


No, I meant the 8086. The iAPX 432 project was started in 1975 but wasn't released until 1981. The 432 was late, very late, and so Intel needed a stopgap product. That was the 8086, started in 1975 and released in 1979.


I'm still not following. You seem to have said above that the 8086 was an iteration of the 8088, whereas Wikipedia claims that the 8088 is instead a variant of the 8086. It also says that the 8088 came out after the 8086. Can you restate and clarify your claim? A typo seems like the most likely explanation, but maybe I'm just misunderstanding.


I'm only talking about the purpose of the 8086 as a stopgap for the very late 432. I shouldn't have mentioned the 8088.


If I recall correctly, shuttling instructions around fast enough is the main bottleneck right now, so why do people want to return to RISC?


I think one of the main arguments in favor of RISC-V is the permissive licensing. A lot of the people moving to RISC-V are absolutely weighting this heavily in their decisions.

The RISC-V folks would probably tell you that the compressed instruction set nets the same code density benefits you'd see on more complex ISAs. Myself, I haven't studied it enough to have a real opinion.

RISC proponents tout its benefits for energy efficiency, which is a pretty important thing these days. Though, that might be up for debate. This paper states that "there is nothing fundamentally more energy efficient in one ISA class or the other. The ISA being RISC or CISC seems irrelevant." <https://research.cs.wisc.edu/vertical/papers/2013/hpca13-isa...>

Personally, I think the RISC vs CISC debate is just a little bit dated. The ideas behind RISC were new and exciting in the 80s. The equation had changed and there was a lot of room for innovation and improvement with sweeping changes to computer architecture. They were thinking about how instructions could be pipelined, how compilers would allocate registers, how profiling showed that seldom-used instructions can be deadweight that literally holds a system back. Then in the 90s, the equation changed again. Pentium Pro style superscalar architecture showed that you can do well with even a terrible ISA if you're able to throw enough transistors at it.


> If I recall correctly and shuttling instructions around fast enough is the main bottleneck right now,

I don't think so? Aren't we still at the point in the cycle where moving data around is the bottleneck? Anyways, AIUI modern RISC does have tricks to make things more efficient if you really want, but it's not usually viewed as needed.

> why do people want to return to RISC?

What return? Nearly everything on the market is RISC; the one exception is x86, which is still RISC (ish) but with a CISC ISA stuck on top.


Writing X86 assembly is painful, writing aarch64 or RISC-V assembly is easy and fun. That, and everything is fast nowadays. Even microcontrollers are hundreds of MHz with huge memories. There's very little pressure to increase high-end performance, a lot of pressure to make more software.



