What’s most interesting to me about the i432 is the rich array of object types essentially embedded into its ISA. The JVM “knows” a little bit about virtual dispatch tables, monitors, and arrays, but even that pales in comparison to the i432’s user-facing model of the CPU state.
I started out as a Lisp hacker on machines designed for it (PDP-10 and CADR, later D-machines) so I was very much in the camp you describe. They had hardware / microcode support for tagging, unboxing, fundamental Lisp opcodes, and for the Lispms specifically, things like a GC barrier and transporter support. When I looked at implementations like VAXLisp, the extra cycles needed to implement these things seemed like a burden to me.
Of course those machines did lots of other things as well, and so were subject to a lot of evolutionary pressure that the research machines were not.
The shocker that changed my mind was the idea of using the TLB to implement the write barrier. Yes, doing all that extra work cost cycles, but you were doing it on a machine that had evolved lots of extra capabilities that could ameliorate some of the burden. Plus the underlying hardware just kept getting faster, faster (i.e. the second derivative was higher).
Meanwhile, the more dedicated architectures were burning valuable real estate on these features and couldn’t keep up elsewhere. You saw this in the article when the author wrote about gates that could have been used elsewhere.
Finally, some decisions box you in; the 64 KB object-size limitation in the 432 is an example. Sure, you can work around it, but then the support for these objects becomes dead weight (part of the RISC argument).
You see this also in the use of GPUs as huge parallel machines, even though the original programming abstraction was triangles.
Going back to my first sentence about “at the margins”: optimize at the end. Apple famously added a “jvm” instruction — must have been the fruit of a lot of metering! Note that they didn’t have to do this for Objective-C: some extremely clever programming made dispatch cheap.
Tagging/unboxing can be supported in a variety of (relatively) inexpensive ways: by using ALU circuitry otherwise idle during address calculation, or (more likely these days) by implementing a couple of in-demand ops. Either way it’s pretty cheap.
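To make “pretty cheap” concrete, here’s a rough C sketch (my own illustration, not any particular ISA or runtime) of the kind of in-demand op I mean: a tagged add that folds the tag check for both operands into a single mask test, in the spirit of SPARC’s tagged-add instructions, as I recall them.

    /* Illustrative "tagged add" as one cheap op: assumes fixnums carry tag 00
       in the low two bits, so a single OR-and-mask checks both operands at
       once. Names and tag layout are made up for the example. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uintptr_t value;

    /* Returns false (the software stand-in for a tag trap) if either operand
       is not a fixnum; otherwise adds the still-tagged words directly. */
    static bool tagged_add(value a, value b, value *out)
    {
        if ((a | b) & 0x3)      /* one check covers both operands' tags */
            return false;       /* hardware would trap or branch to a slow path */
        *out = a + b;           /* tag 00 means no untag/retag is needed */
        return true;
    }

    int main(void)
    {
        value r;
        bool ok = tagged_add(5 << 2, 7 << 2, &r);        /* fixnums 5 and 7 */
        printf("%d %lu\n", ok, (unsigned long)(r >> 2)); /* prints: 1 12 */
        return 0;
    }

In hardware that tag test is a handful of gates hanging off operands the ALU is already looking at, which is the sense in which it’s nearly free.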
Finally, we do have a return to, and flourishing of, separate, specialized functional units (image processors, “learning” units and such, like the database hardware of old), but they aren’t generally fully programmable (even if they have processors embedded in them). The key factor is that they don’t interfere (except via some DMA) with the core processing operations.
“Going back to my first sentence about “at the margins”: optimize at the end. Apple famously added a “jvm” instruction — must have been the fruit of a lot of metering! Note that they didn’t have to do this for Objective-C: some extremely clever programming made dispatch cheap.”
I’m struggling to think of what you are referring to here. ARM added opcodes for running JVM bytecode on the processor itself, but I think those instructions were dropped a long time ago. ARM also added an instruction (floating-point convert to fixed-point, rounding towards zero) because it became such a common operation in JS code. There have also been various GC-related instructions and features added to POWER, but I think all that was well after Apple had abandoned the architecture.
GP probably meant a “JS” instruction rather than a “JVM” one: FJCVTZS, “Floating-point Javascript Convert to Signed fixed-point rounding towards Zero”[1,2], introduced in ARMv8.3 at Apple’s behest (or so it is said). Apparently the point is that ARM float-to-integer conversions normally saturate on overflow while x86 reduces the integer mod 2^width, and JavaScript baked the x86 behaviour into the language.
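For reference, the semantics FJCVTZS implements in one instruction are roughly ECMAScript’s ToInt32. Here’s a sketch of it in plain C (my own reconstruction from the spec’s description, not code from any JS engine):

    /* Sketch of ECMAScript ToInt32: truncate toward zero, reduce modulo 2^32,
       then reinterpret as a signed 32-bit value. NaN and infinities map to 0. */
    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    static int32_t to_int32(double d)
    {
        if (!isfinite(d))                    /* NaN, +Inf, -Inf -> 0 */
            return 0;
        double t = trunc(d);                 /* round toward zero */
        double m = fmod(t, 4294967296.0);    /* exact for integral doubles */
        if (m < 0)
            m += 4294967296.0;               /* shift into [0, 2^32) */
        return (int32_t)(uint32_t)m;         /* two's-complement wrap */
    }

    int main(void)
    {
        printf("%d\n", to_int32(4294967297.0)); /* 1           (wraps, no saturation) */
        printf("%d\n", to_int32(2147483648.0)); /* -2147483648 */
        printf("%d\n", to_int32(-1.5));         /* -1          (truncation, not floor) */
        return 0;
    }

The point of the instruction, as I understand it, is that a JIT can emit one FJCVTZS where it previously needed a conversion plus fix-up branches to get exactly this wrapping behaviour.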
Not adding tagging is basically a negligence crime. That feature isn't that expensive, and it could have prevented most of the security issues of the last 20+ years.
I think you are talking about different types of tagging. Tagging on Lisps and other language VMs is where some bits in a pointer are reserved to indicate the type. So with a single tag bit, integers might be marked as type 0 (so you don’t need to shift things when doing arithmetic) and other objects would be type 1. This provides no real protection against malicious code at all. There are other types of pointer tagging that do provide security, and we are starting to see some hardware support for those.
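A toy illustration of that single-bit scheme (names and layout are made up for the example, not taken from any particular VM):

    /* Low bit 0 = fixnum (value stored shifted left by one), low bit 1 = heap
       pointer. Because fixnums keep tag 0, tagged addition is just addition. */
    #include <stdint.h>
    #include <stdio.h>

    typedef uintptr_t value;

    static value    tag_fixnum(intptr_t n) { return (value)n << 1; }
    static intptr_t untag_fixnum(value v)  { return (intptr_t)v >> 1; }
    static int      is_fixnum(value v)     { return (v & 1) == 0; }
    static void    *untag_pointer(value v) { return (void *)(v & ~(value)1); }

    int main(void)
    {
        value sum = tag_fixnum(20) + tag_fixnum(22);                 /* no untag/retag */
        printf("%ld %d\n", (long)untag_fixnum(sum), is_fixnum(sum)); /* 42 1 */

        /* The scheme is purely a convention: a forged word with the pointer
           tag passes every check, which is why it provides no security. */
        value forged = (value)0xdeadbeef | 1;
        printf("%p\n", untag_pointer(forged));
        return 0;
    }

That last bit is the distinction I’m drawing: this kind of tagging is about type dispatch and GC, whereas the security-oriented kind (capability bits the hardware itself enforces, as in the 432 and newer memory-tagging schemes) guards against exactly that sort of forgery.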
It depends - at the time capability machines were all the rage. The idea there is that you add EXTRA bits to memory (say 33-bit or 34-bit memory - Burroughs large systems were 48-bit machines with 3-bit tags, plus parity) with careful rules about how those tag bits could be set, so that pointers could not just be created from integers.
At the time memory was really expensive, so blowing lots of bits on tags was a real issue (in the late 70s we bought 1.5Mb of actual core for our Burroughs B6700 for $1M+). Plus, as memory moved onto chips, powers of two became important: getting someone to make you a 34-bit DRAM would be hard, much less getting a second source as well.
"powers of two became important, getting someone to make you a 34-bit DRAM would be hard, much less getting a second source as well".
I have several pounds of 9-bit memory. It's 'parity' memory, which was fairly common back in the day, and adding another bit just means respinning the SIMM carrier to add a pad for another chip. Of course, I don't know if anyone is making 1- or 4-bit DRAMs any more, so you might be stuck adding an 8-bit or larger additional chip. Memory is probably cheap enough now that if you wanted 34 bits to play with and were somehow tied to a larger power of two, you could just go to 40 or more bits and do better ECC, get more tag space, or just call the top 6 'reserved' or something. It's a solvable problem.
I have built CPUs with 9-bit bytes (because of subtle MPEG details); they made sense at the time, and RAM with 9-bit bytes was available at a reasonably low premium. That probably wasn't true when the 432 was a thing - RAM was so much more expensive back then.
You could get special-sized DRAM made for you, but it wouldn't be cheap, and getting a second source would be even more expensive - you'd have to be an Intel, an IBM, or someone of that size to guarantee large enough volumes to get DRAM manufacturers to bite.
If you'll notice, memory is often made on a carrier (e.g. a SIMM module). So you don't have to find someone to make you an x9 or x34 or whatever-bit-wide chip; you find someone to make a carrier out of off-the-shelf parts with enough chips for your word width (possibly burning some bits). Early 9-bit SIMMs had two 4-bit-wide DRAMs and one 1-bit-wide DRAM. You just need a memory controller that makes sense of it.
(I design memory controllers....) You can sort of do that, depending on where your byte boundaries are (and whether your architecture needs to be able to do single-byte writes to memory). More, though, I was trying to point out that historically just 'burning some bits' was not something you could practically do cost-wise (it's why we built a 9/72/81-bit CPU in the 90s rather than a 16/125/128 one - the system cost of effectively doubling the memory size would not have made sense).
These days (and actually in those days too) the memory transfer size isn't really the width of the memory bus; often it's a power-of-two multiple of it. Those 9-bit RAMBUS DRAMs we were using really moved data on both edges of a faster clock, so our basic memory transfer unit was 8 clock edges x 9 bits == 72 bits per core clock. As a designer, with even one DRAM out there, that's the minimum amount you can deal with, and you'd best design to make the most of it.
My point was that there was more than one way of solving the problem (economically optimized or not), and that having custom-width memory silicon wasn't the only answer. But sure, if you move the goalposts around enough you get to be right.
That's an incredibly bad argument, because these are the same computers the NSA and the US government itself use, so they are exposing themselves. Not to mention all the US companies who are subject to intellectual property theft and so on.
If you squint hard enough, the underlying concept of the object-capability system as a privilege boundary still does live on.
In hardware, the 432 went on to inspire the 16- and 32-bit protected modes on x86. There it was the inspiration for just about anything involving the GDT and the LDT, including fine-grained memory segments, hardware task switching of Task State Segments, and virtual dispatch through Task Gates.
But a large point of the RISC revolution was that these kinds of abstractions in microcode don't make sense anymore when you have ubiquitous I$s. Rather than a fixed blob of vendor code that's hard to update, let end users create whatever abstractions they feel make sense in regular (albeit privileged) code. Towards that end, the 90s and 2000s had an explosion of supervisor-mode-enforced object capability systems. These days the most famous is probably seL4; there are a lot of parallels between seL4's syscall layer and the iAPX432's object capability interface between user code and microcode. In a lot of ways the most charitable way to look at the iAPX432 microcode is as a very early microkernel in ROM.
The timing is interesting: the iAPX 432 is listed as "late 1981" (Wikipedia), and the 286 (protected-mode 16:16 segmentation) as Feb 1982. Of course, the 432 had been in the works for some time...
I'm not aware of such Super CISC instruction sets in popular use today, but with VMs and statistically-based AI proliferating now, I wonder whether we might revisit such architectures in the future. Could continuous VM-collected statistical data inform compiler and JIT design, collapsing expensive, common, complex operations whose patterns we can't identify with current methods into Super CISC instructions that substantially speed up patterns we didn't previously know existed? Or are our current methods of analyzing and implementing compilers and JITs good enough, with what's mostly holding them back these days being other factors like memory and cache access speed and pipeline stalls?
But ultimately it seems that the idea of language-specific CPUs just didn't survive because people want to be able to use any programming language with them.
Surviving? No. The most recent is arguably Sun's Rock processor, which was one of the final nails in their coffin and quite an i432 redux. It promised all sorts of hardware support for transactions and other features that Sun thought would make it a killer chip, was rumoured to tape out requiring 1 kW of power for mediocre performance, and Oracle killed it when they saw how dysfunctional it was.
I feel like the Nvidia chips from NV1-NV3 were doing a take on this idea, IIRC. My memory is foggy and I would really love to see the documentation again. Possibly it survived beyond the original chips?
I'm not entirely sure if it was just the driver API or the actual hardware that supported objects, but they were definitely trying to abstract it that way.
I found this through googling "redesigned PGRAPH objects, introducing the concept of object class in hardware"
Is there anything comparable surviving today?