The Itanic Has Sunk (honeypot.net)
203 points by kstrauser on July 30, 2021 | 244 comments



I would like to suggest that the Itanium was a huge business success. At the start of the 90s, there were many decent workstation CPUs: SPARC, MIPS, PowerPC, Alpha, and PA-RISC. Intel CPUs were not considered in the same class. Big engineering work was done on workstations powered by these chips. But as the decade went on, the chips became vastly more complex and expensive to design. Near the middle of the decade, it was becoming clear what the trajectory was, and companies started looking around trying to figure out how to navigate this. On top of this problem, Intel/HP teamed up to design a processor that was quite beyond (in terms of complexity) anything being designed at the time, and it became clear competing was just not economically feasible. So basically the industry folded. In one move, Intel took out an entire industry of companies that could compete. PowerPC is still there, but is marginal. The SPARC, PA-RISC, MIPS, and Alpha design teams are gone.

Interesting technical tidbit - I remember an Intel engineer giving a talk at Stanford about the architecture, and Hennessy came to listen. At the end of the talk he asked why they thought moving so much complexity to the compiler would work when others (including John) had found it to be problematic. They didn't have an answer and I could see the look on John's face and at that point realized they might not understand what they are getting into.


Back in the day of 2001, I was working on the hugely cross-platform Zeus Web Server. It served the whole selection of UNIX platforms: https://web.archive.org/web/20011208194616/http://www.zeus.c...

We certainly had an Itanium build as well, although it doesn't appear on that list, which suggests that it never sold. But it's notable how many of those platforms are effectively dead and only the open source ones on x86 are left standing - and OSX, which deliberately has no server platform.

That era was when people were just starting to realise how much more cost-effective OSS+X86 was. That is what ate the server and workstation markets, especially once people started to horizontally scale and "the cloud" appeared.

The only orgs left with the capacity to choose a new platform and make it cost-effective for the world are basically FAANG+M. And which one of them bought a fabless semi company full of RISC designers from all those dead instruction sets? Apple.


The article makes it sound like Itanium was some genius move on Intel's part to kill off the old workstation market, but IMHO that's exactly the opposite. This was Intel spending god knows how much money to jump into the workstation market right as it was suffering its final death throes from the fatal wounds inflicted by the PC clone market.

Sure those workstations were powerful and had impeccable engineering, but they also cost at least 5 times as much as a similarly powerful PC, sometimes a full order of magnitude more. Plus they were full of obnoxious licensing requirements, like making you purchase a compiler for thousands of dollars per year or requiring you to buy an obscenely overpriced support contract to get OS updates. They just couldn't compete with some cheap clone running Linux, especially as Intel poured way more money into R&D each year and overtook the workstation chips. By the time the Core architecture, and especially Conroe, were released those old workstations had nothing going for them but inertia.


I was at that Stanford lecture! Assuming it was the one that I attended, the Intel engineer was John Crawford, the lead architect of Merced -- and John Hennessy was absolutely annihilating him. I described attending that lecture in a Reddit AMA years ago [0]:

In fact, allow me to tell a story: as a young engineer, I attended a talk at Stanford, by John Crawford, architect of the Itanium (then known as Merced). He was presenting a bunch of benchmarks that seemed... dubious. And while Merced kicked butt on Eight Queens, it was terrible on gcc (and everything else that looked vaguely real-world). Even as brash as I might have been (ahem), I wasn't about to say anything, but then some guy in the back of the room just got on Crawford on this. Crawford tried to shake him off, but the guy was just a badger, at one point saying "Doesn't this just willfully ignore everything we've learned about commercial workloads over the last decade?" At this point, I turned around -- who the hell was this guy?! -- and saw that it was none other than John Hennessy himself. A Silicon Valley gangland slaying if ever there was one -- and told me everything I (or anyone) needed to know about Itanium's fate.

The room was packed (and open to non-Stanford students, which is how I got there); glad to know I wasn't the only one on whom this made an impact!

[0] https://i.reddit.com/r/IAmA/comments/31ny87/i_am_the_cto_of_...


Did you tell this story on On the Metal? It sounds familiar, and I'm pretty sure I haven't looked at that AMA.

(I hope the podcast comes back, by the way!)


Surely -- I have retold it many times over the years. And we're working on On the Metal! A combination of pandemic/resurgence + us being (very) heads down has delayed us, but stay tuned!


That sounds like the lecture I was at. Thanks for sharing that, fun times. There were some really great lectures back then. I remember when Exponential gave their talk right after Apple moved away from the PowerPC architecture. Everyone knew their processor was a dud in every commercial way (super high-power BiCMOS PowerPC chip), but people almost felt sorry for the guy and were amazingly polite.


Sort of the New Coke theory of CPUs. I like it.

--

For anyone who wasn't there: In the 80s, Coca-Cola had lost its crown. Pepsi was the most popular soft drink, at least in the USA. Coca-Cola responded by changing their recipe, to great fanfare.

It's often held up as one of American business's greatest failures. People hated New Coke, and later that year they brought back the old formula, under the name Coca-Cola Classic.

I think that that summary of the story misses an important point, though. After the dust had settled, Coke had regained its crown, and has held it ever since. So, while New Coke itself failed on the market, the event's long-term influence on the company's financial outcomes was overwhelmingly positive. To the extent that some people have claimed that the whole thing was just an exceptionally clever PR stunt.


New Coke is a great example of design biases. Pepsi in the 80s was beating Coke year after year in double-blind taste tests of 2oz or so sample pours. Pepsi was hugely proud of this and it was a big part of their advertising.

Coke wanted to beat those tests, so they designed a soda that people loved much more than Pepsi in those same double-blind studies. In a 2oz or so sample, people adored New Coke over Pepsi.

In a 2oz pour (shot), sweetness is a big differentiator; the one that stands out is pretty much always going to be the sweetest. (Like the Loudness Wars in music, there's a dumb human psychological bias in small samples toward sweeter taste or louder volume as "better", despite both being awful for your health in quantity.) What Pepsi and Coke both seemed to forget in the 80s, in focusing on these 2oz "micro-benchmark" studies, was that most people don't drink just 2oz at a time; the unit is generally closer to the 12oz can or more. When drinking more than, say, 2oz, over-sweetness becomes a problem in how it lingers and you want a more balanced palate, which Coke Classic always delivered better than Pepsi.

The failure of New Coke will always be a great illustration of making sure you are designing for the right benchmarks (and that micro-benchmarks especially are sometimes a trap).

I believe the deeper extension of this metaphor likely also applies to Itanium: they designed it for 2oz-pour benchmarks where it sometimes got fantastic numbers, but most compilers got awful numbers on realistic real-world workloads.


Hey, thanks for a new take on an old story. That makes much more sense than any other version I have heard over the decades.


FWIW, both New Coke and Coke "Classic" (aka New Coke 1.1) use high fructose corn syrup, which is substantially cheaper in the USA in industrial quantities due to huge tax-funded agricultural subsidies for corn (maize) farmers. These remain in place today (which is why everything in the US is sweetened with corn syrup instead of sugar).

"Old coke", as well as what is now known as "mexican coke" in the USA, used/use sugar.

New coke was the flag day for switching to a cheaper sweetener in the US market. When they switched "back" to coke "classic", they kept the new sweetener.


This article claims that Coca-Cola switched to high fructose corn syrup in 1984, a year before the debut of New Coke:

https://www.motherjones.com/food/2019/07/the-secret-history-...


When I first moved to the USA from Denmark, back in the late 90s, the chemical, sickly syrupy taste of Coca-Cola made me gag, and I was very surprised by how different it was from the European version. Then I found out that in Europe, they use sugar rather than corn syrup. It's amazing what the Americans will put up with.


Two things here.

First, people have done triangle tests on sugar- vs. HFCS- formulations of Coke (Serious Eats has a good one) and, while there is definitely a perceptible difference, the preferences aren't what you'd expect. People who go into these tests saying they prefer cane sugar Coke tend actually to reveal an HFCS preference in the test.

Second, formulations in different parts of the world vary in ways other than which sweetener they use. It's possible that regardless of the sugar involved, there might just be more of it in the formulation (or in the way it's delivered, since concentration systems also vary) in your least-preferred Coke instances.


Thomas' knowledge-dropping in a sibling comment is more in depth, but anecdotally: I prefer sugar coke after drinking HFCS coke in the USA for a few months, then prefer HFCS coke after drinking sugar coke in europe for a few months.

Non-lite/diet/zero coke is always sickly syrupy oversweet regardless of whether it's sugar or corn syrup. It's sort of what people are going for. I can't believe I used to drink liters of it a day growing up; I limit myself to a dozen servings a year or so (of the full sweet stuff) these days.


Might be a novelty effect there. We all tend to like new foods if we eat one food too much.


Well, it's the same with Carlsberg; it tastes "different" around the world.


True, but the other side of the coin is that the massive economies of scale in chip manufacturing, as well as around ISA ecosystems (software), meant that some massive consolidation of the industry was all but inevitable.

Had Intel not embarked on the Itanium project, we might still have ended up with something close to the world we have now, with x86-64 and ARM being the top dogs, and some "traditional RISC workstation" architecture surviving in a small high-end niche (POWER).


No question there are a lot of ways to win, probably many better than what transpired. But in the end, Itanium played a big role in Intel consolidating the CPU market and taking complete control for a significant period of time.


The competing Unix manufacturers didn't need Itanium as an excuse to fold; they were completely non-competitive by the time the Itanium came around. Customers didn't purchase a Unix system because of the performance of the proprietary architecture but for the entire ecosystem provided by the manufacturer. DEC and its Alpha were on life support precisely for this reason, even though Alpha was marginally more competitive against x86.

The manufacturers that intended to switch to Itanium were either part of HP (Compaq & DEC) and would have thrown in the towel otherwise, or were desperate to remain relevant (SGI). They could never have competed, Itanium or no Itanium. Itanium actually made no impact on the strongest competitors, SPARC and POWER.

Not only did Intel lose billions on Itanium but it tied up development resources that couldn't be used against AMD and influenced the decision not to compete in the mobile space.


Your causality doesn't seem to line up. As you tell it, the workstation CPUs were already peaking in terms of complexity and cost when Intel and HP announced a huge investment in the next generation of this class of CPU.

This doesn't signal that the segment is economically unviable, and if you're an existing player with an existing strong design, it signals an opportunity to invest in your own next generation and meet Intel/HP head-on with a superior brand (for this segment) and a more plausible next-gen design.

Intel and HP showing up promising a clean sheet design in a decade doesn't seem like a time to roll up your market leading offering and go home.


Nothing about this history lines up with facts. Intel killed the RISC workstation CPUs with the Pentium Pro, not with the Itanium. PRO/Engineer and SolidWorks had been ported to Windows NT at the very beginning of that operating system's availability, and the performance on x86 was already better than RISC workstations, at a tiny fraction of the cost, by late 1995.


The Pentium Pro and eventually the AMD64 chips equalled or bested the workstations on many tasks by the end of the 90s and early 2000s. But the Itanium distracted them from even competing. SGI for example had at least one high-end MIPS processor under development (iirc "the Beast"), and they stopped the project and later were trying to ship Itanium servers. HP threw in the towel and joined forces with Intel.

Here is an article from 1999 on Sun booting Solaris on Itanium:

https://www.zdnet.com/article/sun-boots-solaris-on-itanium-h...

Compaq stopped making Alpha servers and switched to Itanium:

https://en.wikipedia.org/wiki/DEC_Alpha

Itanium was a huge misdirection and distraction for everyone in the workstation business and basically put them in such a bad spot they failed to even compete with Intel x86 past 2000 or so in any real way.


The Itanium wasn't just designed as a workstation CPU; it was designed to be superior to existing high-end systems and then, over time and with volume, to be the eventual x86 replacement. The business case for Itanium didn't work as a workstation CPU.

>>Intel and HP showing up promising a clean sheet design in a decade doesn't seem like a time to roll up your market leading offering and go home.

They didn't roll up so much as have their existing market disappear. SGI, DEC and Compaq were dead. Sparc and Power gradually transitioned away from any workstation offering to servers.


Itanium was disappointing from day one. It never matched the performance of its RISC or CISC competitors (POWER, SPARC, x86).

HP replaced Alpha with Itanium but that was a costly mistake.


Alpha was DEC, HP had PA-RISC. I had a PA-RISC in my office and had to have facilities come and adjust air conditioning airflow because the computer put off so much heat.


Not all of them did. I have a 712 PA-RISC workstation and its CPU is passively cooled with a spring loaded tiny cooling block. They didn't even bother to use paste.


I had a PA-RISC workstation too. No heat problems, but it wasn't particularly fast. But the "Superdomes" were another story (refrigerator-sized computers in the very cold and loud server room...).

It was weird: I worked on those Superdome machines for years without seeing them, and only got to see them when they had to debug some of my serial port code (it turned out the cable was wired wrong...)

HPUX had some really interesting Real Time Extensions. I was on an exploratory group for switching to Linux, and scheduling control was going to be a problem (15 years ago...)


Yeah, we had a Superdome at work too. One of the earlier beige PA-RISC ones (the Itanium ones were black, IIRC).

But those were in a class of their own... :) If you scale things up that big they're going to belch heat no matter the architecture.

I don't think PA-RISC was inherently worse in performance per watt than other competing processors at the time.



They may have meant that HP acquired Compaq, which had acquired DEC. Itanium was a joint project between HP and Intel, IIRC.


HP had PA-RISC and Alpha.

HP bought Compaq in 2001; Compaq had bought DEC earlier.


I follow your line of reasoning, but I think AMD had a much bigger role in killing off the various RISC architectures.

Itanium felt much more like a sideline activity for those that didn't believe x86_64/Linux was going to be as big as it was. They viewed it as a sort of "toy infrastructure" until it was too late.


> They didn't have an answer and I could see the look on John's face and at that point realized they might not understand what they are getting into.

Ayup. The DEC guys cheered when Intel announced that Itanium would be VLIW. They knew that Intel was about to sink a huge amount of engineering effort into a dead end that would never work.

Of course, nobody realized that all the business management chains were complete cowardly dipshits and that Intel spending a Gigabuck was sufficient to terrify them into completely abandoning the high-end market to Intel who milked it for years.

So ... Itanic was a horrible engineering failure but a great business success.


Itanium was a huge business success for anything AMD64. I think IA64 was too early and targeted the wrong market to be a success of its own. Also, being a single-source ISA is very unattractive. If the ISA had been open like RISC-V, and the target market had been scale-out servers and small computers rather than the nonexistent scale-up HPC market and almost nonexistent scale-up database market, things could have been very different. On the positive side, maybe some open source ISA can emerge using an EPIC-type architecture now that the proprietary version is dead.


Though in hindsight a 64-bit version of x86 would’ve killed them all anyways. Had Intel invested money into that instead, they’d still win easily.


Might be a case of sunk cost fallacy. The article mentions that while vendors were asking for 64-bit, Intel didn't want to cannibalize Itanium, but was forced to anyway when AMD came out with their 64-bit, Pentium-compatible CPU.


Little anecdote:

The Itanium led to one of the more remarkable episodes in my career. Around 2007 we were running a heavy workload on MS SQL Server in a fast-growing business.

We faced a lot of outages due to DB overload. Instead of trying to investigate and understand the issue better and optimize the software, some external consultants were brought in and recommended upgrading the hardware to an Itanium-based monster. It was a massive piece of hardware with a price tag close to 7 figures.

The thing went live, performance decreased, and issues increased. After a couple of weeks of trying to run on the Itanium we switched back to the old setup and then focused on software improvements.

Long story short - after about 8 weeks of dedicated troubleshooting and improvements the whole app became stable and was capable of handling double the workload on the same hardware, without outages, for the next 12 months.

The Itanium took up a lot of space in the server room before it was dismantled and used as a paperweight. A lawsuit involving multiple parties (supplier, consultant, business) eventually got settled out of court in 2015 (?), long after I left there.

Farewell Itanium :)


It’s funny, I remember many such projects back when my title was “systems engineer”. I swear the appeal was always that the company could spend $$ and get a working solution. It after all makes sense that a growing business would need more hardware. But the problem usually was a few terrible queries.

I swear the big benefit of cloud deployments is the teams ability to say “we tried throwing $$ at the problem, if we don’t want to spend $$$ we’ll need to do some work”. And have this convo play out over a day rather than months.


> I swear the big benefit of cloud deployments is the teams ability to say “we tried throwing $$ at the problem, if we don’t want to spend $$$ we’ll need to do some work”. And have this convo play out over a day rather than months.

My current job is on the tail end of hypergrowth and we are just starting to get our arms around the years of hacks and inefficiencies that made it possible to succeed. We've had a dozen conversations where we've decided

* to throw $10^2/day at a problem for two weeks so the engineers are free to deliver the features required to land $10^5/year in ARR, then work on perf

* to analyze the system, identify the one or two features that cost the most, and tackle those while leaving the rest alone

* and yes, to translate inefficiencies to real dollars and use that to force prioritization (we do that a lot ;-))

A team that understands cloud computing and can do some cost forecasting makes some amazing things possible.


> some external consultants were brought in

Is there a single example of that working anywhere ever?

I have never seen one.


I am that "external consultant".

I have recommended buying a huge piece of tin to run SQL Server on as a valid, cost-effective solution to a performance problem. Currently, EPYC CPUs are great value for money, programmers are expensive, and some workloads are too time-consuming to tune.

The customer implemented the change, and it worked.

I have also recommended a reduction in size of a too-big SQL Server, coupled with some judicious optimisation to reduce the load dramatically. Even expensive programmers can spend a few days of their precious time fixing glaring query issues.

The customer implemented this change also, and it also delivered the promised benefits.

I have the before-and-after metrics to prove that there was a huge benefit in both cases.

In both cases the issues were ongoing, had caused drastic outages, and the internal staff were not capable of resolving the issues on their own.

To be honest, 99% of my job is just to be the outsider that's not playing politics and not stuck in a narrow job description. I'm told to "fix it", so that's what I do. The internal staff have "roles and responsibilities", and they fight with other teams more than they cooperate. Some people actively hate each other. I come in as the neutral party and for a brief shining moment I can get everybody to row in the same direction.


This: To be honest, 99% of my job is just to be the outsider that's not playing politics and not stuck in a narrow job description. I'm told to "fix it", so that's what I do. The internal staff have "roles and responsibilities", and they fight with other teams more than they cooperate. Some people actively hate each other. I come in as the neutral party and for a brief shining moment I can get everybody to row in the same direction.

That is the single reason driving external consultant hires in many enterprises.


My pops did a consulting gig in the 90s for Rockwell, and he described it as being the sheriff dressed in black.

Always thought that was funny, and probably pretty accurate.


There was that time a bunch of Silicon Valley engineers went and fixed the healthcare.gov (Obamacare) site for the US govt after the original contractors did a terrible job. But then the first team were already contractors, so maybe the lesson there is, if your first external consultants don't work, just keep bringing in more?


The engineers who "fixed" the site all worked themselves nearly to death doing it. Not sure that's a real win.


I didn’t know much about this from the contractors’ perspective. This Atlantic article [1] seems like a decent overview for anyone else interested.

[1] https://web.archive.org/web/20210705021438if_/https://www.th...


Loads, but successful consulting engagements don't lead to headlines. Source: 25 years as a consultant with many successful customer projects under my belt and zero that landed my name in the press.


Totally agree with this. External consultants are easily blamed. But the problem usually lies elsewhere - someone makes a decision without understanding the full scope of the challenge.


Yes, it works when your best people get tired of middle management interference and quit to form a consultancy that upper management eventually hire in desperation.


Some of the best anecdotes on HN are of the genre "company brought me back as a consultant for a multiple of what I made as an employee when they realized their mistake."


Selection bias. The bad cases are the ones you hear about --- lawsuits and politicking (office or government) resulting in airing grievances to the press.


Yes, this. I've never seen a "crisis averted by timely application of consultants!" post on Medium, but I'm sure many such tales exist, as I'm one of those consultants.


Fun fact, Intel did this exact thing once before, in 1981, with the iAPX 432: https://en.wikipedia.org/wiki/Intel_iAPX_432

They had made the 8008 and 8080, but those were awkward and ungainly chips meant to power calculators, of all things. iAPX 432 was a clean sheet 32-bit design intended for high level languages and real computers, kind of like mini Lisp machines, complete with native machine support for garbage collection. But it was taking a while to get out the door, and performance wasn't so great, so they hacked together a quick upgrade to the 8080... the 8086, whose 8-bit-bus variant, the 8088, was used in the original IBM PC.

History takes over from there.


> They had made the 8008 and 8080, but those were awkward and ungainly chips meant to power calculators, of all things.

The 8008 was the (slower) single-chip implementation of the Datapoint 2200 terminal's processor. The DP 2200 was a serial terminal capable of running code on its own as a standalone machine, conceived in 1969 (originally as a drop-in replacement for the IBM 129 key punch), announced in 1970, and eventually introduced in 1971. Datapoint had commissioned the chip, but rejected the result, as it was slower than the discrete logic design, with rights for the chip-implementation remaining at Intel. This remarkable terminal has been with us ever since.

[0] https://en.wikipedia.org/wiki/Datapoint_2200


There were two competing schools of thought in the late '70s and early '80s.

One said that computer hardware should have hardware support for high level languages. That school led to the VAX, iAPX432, and 80X86. The other school said that computer hardware should be simple so that it could go faster, and that the compiler should be smart enough to map high level languages to simple hardware.

Intel was firmly in the first camp. It flirted with the second repeatedly: see the late '80s i860/i960. Itanium was the biggest bet placed: it was a RISC-like architecture with the added complexity of multiple dispatch encoded in its Very Long Instruction Word architecture. The compiler could rearrange instruction order, and (to some extent) instruction contents, to keep the hardware busy.

Itanium seemed like a reasonable bet, and Intel (and HP) had the clout to convince others. It was a factor in the decline of the Alpha and MIPS server market, and even Sun hedged its bets by porting Solaris.


>> That school led to the VAX, iAPX432, and 80X86.

The MC68000 ISA was meant to run C code.


I love the function call timing story:

> The iAPX 432 failure could best be summed up by a meeting of Intel marketroids and Tandem engineers who wanted to use the chip in their next generation machine, after the slides, the senior engineer asked: “How long does it take to execute a procedure call?”

> The presenter looked it up. “Two hundred and fifty microseconds.”

> Tom immediately walked out, followed by the majority of the Tandem software department. The presenter was poleaxed. “What did I say?”

Found in the comments to https://www.youtube.com/watch?v=FvmTSpJU-Xc , but I saw it somewhere else, too, I just don't recall where. Does anyone know?


Interestingly, it appears based on https://en.wikipedia.org/wiki/Intel_iAPX_432#The_project's_f... that the procedure call instruction was a heavyweight thing designed to maximise features, and that a plain branch could be much quicker - but their compilers didn't handle that out of the box.


And they did it again with the i960 and i860. I'm told they were quite nice processors in their day (never having been close to one myself, that was before my time), but for whatever reasons, they did not go mainstream.


The i960 flopped in part because it was tied [0] to the I2O ('Intelligent I/O') project. (The 2 was always rendered as a subscript). I2O pushed a split-driver model in which the OS driver ('top half') did not talk directly to its hardware ('bottom half'), but queued I2O messages to the bottom half, which sat behind the i960 which did the proxy work.

IHVs had little interest in I2O as it would reduce Intel's costs for swapping out vendors and there was no demonstrable performance improvement. The latter was at least in part because the I2O infrastructure was immature in comparison to the IHV drivers. It eventually formed the model for I/O over Infiniband (where it did make sense).

[0] It's not clear if I2O was the original application for the 960 or it was a pivot for an otherwise homeless processor.


During the eighties there was a joint research project of Intel with Siemens, for a new processor architecture named BiiN.

For some reason, the BiiN project was terminated in 1988 and Siemens was not interested any more in it.

On the other hand, Intel decided not to scrap the results of that project, and they introduced the 80960 series based on the architecture formerly known as BiiN.

The commercial name 80960 was derived from their previous 8096 series of 16-bit microcontrollers, so the 80960 was initially presented as a higher-performance 32-bit replacement for the 8096 series, which was used in various embedded computers.

One interesting feature of BiiN was that it was the first monolithic CPU with an atomic fetch-and-add instruction (first used in 1981 in the NYU Ultracomputer project).

The 80960 inherited the atomic fetch-and-add from BiiN and then Intel added it to 80486, under the XADD mnemonic, together with the atomic compare-and-swap taken from IBM 370 and Motorola 68020 (CMPXCHG).

The applications for which 80960 was best known, like I2O and laser printers, happened significantly later than its initial introduction.


BiiN would eventually be explained as "Billions invested in Nothing".


> For some reason, the BiiN project was terminated in 1988 and Siemens was not interested any more in it.

You mean it was biinned.


I2O was definitely a pivot. The original i960Kx, i960Cx and i960Jx had nothing to do with I2O. I2O was introduced later with the i960Rx series.

I developed an i960RP design back in the late '90s for an MPEG-2 encoder PCI card. The encoder chips were also PCI, so the PCI bridge on the i960RP made for a nice design where all the PCI stuff was handled in one fell swoop.


The i960 was launched in 1984

https://en.wikipedia.org/wiki/Intel_i960

The I2O project wasn't until the mid 1990's

https://en.wikipedia.org/wiki/I2O


No, the commercial launch was in 1988.

In 1984 it was just the start of a research project named BiiN, of Intel and Siemens, which lasted 4 years, until 1988, when Siemens dropped out.

The series name, 80960, did not appear before the launch from 1988.


The i860 was lovely! I wrote some software for an i860-based hypercube back in the day, and it was the fastest thing on earth.

The reason it never went mainstream is easy enough to explain: expensive chips and (for the hypercube) a very different programming model.


The i960 was the main CPU in the Sega Model 2 arcade board, used in games like Daytona USA and Virtua Cop. At the time (early 1990s) the texture mapped 3D graphics in those games were pretty impressive.

https://segaretro.org/Sega_Model_2


I remember i860 accelerator cards being used with Fortran compilers. Not really sure if this went anywhere; it's all a bit hazy exactly what happened. But the i860 was VLIW, so the compiler tech required for Itanium was already being explored for the i860.

I also believe that MMX was basically lifted from the i860, although again, I'm not sure.

It's funny, I tend to think of Intel as the boring chip company, due to the success and legacy of ia-32 that we like to moan about, but they have actually made a fair few plays to try to move things on from there, fighting against a market which just wanted ia-32-compatible but faster processors.


I don't know about the i860, but accelerator cards for math-heavy workloads are still a thing, of course. I remember seeing a brochure for an add-on card with one or two Cell CPUs, Intel has (had?) their Xeon Phi, and of course GPUs are very popular for things other than graphics.

Intel, for better or worse, are a victim of their own success. On the plus side, their success gave them lots of money to throw at the problem of making faster x86 CPUs. It seems, though, that Intel is gradually running out of luck, with AMD and now Apple introducing strong competitors, and Intel's advantage in fabrication eroding. So CPU-/ISA-wise, things could get very interesting in the foreseeable future.


I've programmed both, though only shipped products that used the i960 (CA and KB). I believe the i960 tended to be used in things like printers but there weren't many design wins for the i860.

I managed to miss the magic that is the 88k thankfully. Who actually used it? Linotype? Tek workstations maybe?


I worked with the Intel i960CA parts and the i960MX parts, while working for Applied Microsystems Corporation (now defunct.)

Applied made high-end in-circuit emulators. My team worked on software for execution trace disassemblers. We had wide and deep memories and could record something like 16,384 bus cycles in emulator (trace) memory.

Once we had some bus cycles to analyze, we’d sort out what the processor had done when it ran (an execution trace) and show it. It was a great tool for answering questions like “How did I get here?” when your embedded code jumped off into the weeds.

The i960CA was relatively simple to work with. The i960MX was a lot more complicated, as I recall.


I believe Data General AViiON mainframes used m88k CPUs. One came through a used computer shop I worked at in the early 2000s.


Some NCD X terminals used M88k CPUs.


I’ve been trying to hunt down some X terminals recently. So far have only found some HP ones.


I used to have both a 15" all-in-one model, and a separate pizza-box style model with a 17" monitor. IIRC, both were 88k CPUs.


I think either the i860 or i960 was a CPU used in Adaptec PCI cards in the 90s.


I recall Wikipedia saying one of them (I tend to confuse the two) was popular in RAID controllers, so that sounds plausible.


While I'm musing about such matters: now, the TI C80 MVP, that was an interesting one to write software for.


IBM used the 8088, an 8-bit-bus version of the 8086?


Comment corrected.


>They had made the 8008 and 8080, but those were awkward and ungainly chips meant to power calculators,

I remember talking to Vic Poor when I used to work with him, and calculators didn't seem to figure into the whole deal. It's probably worth a quick read through Datapoint history.


I think the poster is thinking of the 4004, which was originally designed for a calculator. But that isn't really related to the 8008 onwards, which are based on a Datapoint computer terminal as you say.


People let their (understandable) hatred of Intel-the-company colour their technical judgement. Itanium was one of the more interesting architectures of its time, it fairly flew on expert-tuned assembly; I still believe we'll see a return to its ideas once the computing world finally moves on from C.

(Netburst is also unfairly maligned if you ask me; contrary to the article, enthusiasts have clocked those P4s up to 12GHz. As far as I know they're still, over a decade later, the fastest CPU for single-threaded sequential integer workloads that has ever been made; certainly the fastest x86-compatible processor for such. They're kind of the equal and opposite failure to the Itanium, ironically enough)


> People let their (understandable) hatred of Intel-the-company colour their technical judgement. Itanium was one of the more interesting architectures of its time, it fairly flew on expert-tuned assembly;

I know only a few people who maintained software for Itanium, but from their reports it was a nightmare to debug code on. To have a chance of seeing what was going on, you would have to use special debug builds that disabled all explicit parallelism. Debugging optimized code was almost impossible, and user-provided crash dumps were similarly useless. Your only hope was that the issue was reproducible in debug builds or on other architectures.

Needless to say, they hated it and were happy when ia64 was finally phased out.

> once the computing world finally moves on from C.

Yeah, it moves on from C... to JavaScript. Making compilers slow and complex doesn't mix well with JIT compilation.

One thing I have to give Itanium credit for is that due to EPIC it was totally safe from the speculative execution vulnerabilities like Spectre/Meltdown/etc. That was certainly a forward-looking aspect of it.


> you would have to use special debug builds that disabled all explicit parallelism

Oh god. Let me guess, when it crashes, you get a pointer to the word with the failed instruction in ... but no elaboration on which of the 3 instructions it was? Or is it worse than that and it fails to maintain the in-order illusion?


> Making compilers slow and complex doesn't mix well with JIT compilation.

Funny, I was just thinking the opposite: Compiler-driven parallelism loses against CPU-driven parallelism because the CPU has live profiling. With a JIT the compiler can have it too.

The debugging problem on the machine-code level becomes less of an issue when most people write higher-level code too.


> it fairly flew on expert-tuned assembly

There's your problem.

Given the bajillion programs out there already, how many companies wanted to dig into assembly instead of just waiting 18-24 months for Moore's Law to speed up their software?

It's all very well and nice to have nice hardware in theory, but if you can't compile existing code to be fairly fast, then in practice you just have some expensive sand (silicon) in the shape of a square.

> People let their (understandable) hatred of Intel-the-company colour their technical judgement.

So getting back to your first statement: no they didn't. Everyone was basically all-in on Itanium. All the Unix vendors (except Sun) dropped their own architectures and steered their customers toward Intel. Microsoft released software for it.

But it seems the market didn't like what they saw, and just kept on with x86—and then amd64 came out and gave 64-bits to everyone in a mostly compatible way.


> how many companies wanted to dig into assembly instead of just waiting 18-24 months for Moore's Law to speed up their software?

The people that bought Itanium-powered servers certainly weren't replacing them every 18-24 months. At the price they paid, you were looking at 5-8 years of computing before replacement. Or more.

My employer bought a pair of the final batch of Itanium servers. To replace 10-year old ones. This was an insurance purchase. The original plan was to shift all of that workload into the cloud, but that's neither going quickly enough nor is it saving any money. If you have a workload for which Itanium does well, it does it really well.


> The people that bought Itanium-powered servers certainly weren't replacing them every 18-24 months.

I was referring to the software vendors: why would they go through the effort of optimizing their code for this new architecture when they could simply wait a little while for the "old" one to get faster via Moore's Law?


> All the Unix vendors (except Sun)

cough IBM cough

They were never going to ditch power... Did they ever even have an Itanium product? I know they've had x86/x86_64, all sorts of power variants like Cell and god knows what.

I did briefly work on an Itanium system at IBM, but it was an HP box.



Oh interesting, they were going to roll it into xSeries.


I imagine then there will be a great resurgence of interest after Moore’s Law hits the atomic scale wall.


I can't find any hit for a 12GHz P4. I thought the record was around ~8GHz (and you can push modern processors into that ballpark).

I doubt that even an 8GHz P4 would be able to beat a lower-clocked, more modern design even on single-threaded integer workloads. The P4 had a lot of glass jaws (the non-constant shifter, load replays on misses, a very narrow decoder when running out of the trace cache).


I've heard about 8+ GHz Celerons (Netburst-based ones) and they were definitely on top a few years ago. I haven't kept track lately, though, and those records may have been beaten by now.


https://valid.x86.fr/records.html

I think that's still pretty much the bible for frequency records.


That is crazy fascinating. It seems Windows XP and Celeron and AMD FX chips with 2 to 4 gigs of RAM are where it's at.


> the computing world finally moves on from C.

The computing world has moved on from C, mostly. To Javascript. The main impact of that seems to be a couple of numeric conversion instructions on ARM?

(OK, not entirely fair: the computationally heavy stuff has moved away to GPUs. But if you ask, for every button press a human makes on a computer, where the dominant execution time is, you might have some interesting answers, and for a lot of them it is going to be JITted Javascript)

I think it's fairly clear that for general purposes VLIW is not what either the programmer or the compiler writer wants to deal with. In-order execution is such a convenient mental model that people are willing to accept any tricks that keep it working.


The numeric conversion instructions you're thinking of are branded "JavaScript", but actually exist to emulate Intel x86 floating-point behavior. It just so happens that the ECMA specs call for said behavior because existing code relied upon it.


Nonsense. A ton of code is still written in C/C++. What do you think runs all of that Javascript?

The C world isn't moving to Javascript, it's moving to Rust, Zig and Go.


Kinda veering a bit off topic, but I’ve always seen Go marketed as a systems language alongside C, Rust, etc., but in practice I’ve only really ever seen it used to develop high-level web applications.


Docker and k8s, flannel, etc. are all written in Go and are something I'd consider "systems programming" - I mean, they have to do some pretty complex coordination w/ the kernel to do their work.


My understanding is that the problem with VLIW is that it exposes too much. Anything you expose via the instruction set becomes fixed permanently, so if you have say 4X wide VLIW there is no way to ever make it wider or change how things like dispatching work. The only way to do that would be to start pipelining and scheduling VLIW chunks, in which case you are back where you started.

Instruction level parallelism achieved by decoding a single stream and then sorting and scheduling requires more silicon and a bit more power, but low power high performance superscalar chips like modern ARM64 CPUs have shown that the cost is not that high and that you can go very wide. The M1's Firestorm cores are 8X wide from what I read, which is better than Itanium.

Since the whole superscalar architecture is hidden, it can evolve freely.

That being said I don't think VLIW was a horrible idea at the time, and it might still have a chance if it were revived in specialized high performance or ultra-low-power use cases. The mistake was betting the farm on it.

The other big thing we learned since then is that the important part of RISC wasn't reduced instruction set size, but uniform instruction size and encoding. That allows you to decode arbitrarily wide chunks of instructions in parallel without crazy brute force hacks like those required to do parallel decoding of the variable length X86 instruction stream. The problem with CISC isn't how many instructions there are, but the complexity of the encoding and the presence of a lot of confounding requirements that arise from instructions that do very different things at once (e.g. complex math with memory operands). You want the instruction stream to be trivial to decode and easy to schedule.

In the end the best approach seems to be a simple general purpose instruction set augmented with special instructions for common special cases that can be greatly accelerated this way (e.g. vector operations, floating point, cryptography, etc.), and all with a logical fixed length encoding that is easy to decode in parallel. Load-store architecture and a relaxed memory ordering model seem to also be performance wins since separation of concerns simplifies the scheduler. The future (for conventional CPUs) looks a lot like ARM64 and RISC-V.
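To make the fixed-vs-variable-length point concrete, here's a rough C sketch (purely illustrative; no real front end is this simple, and the helper names are made up): with a fixed-width encoding every instruction boundary is known up front, while with a variable-length encoding each boundary depends on having length-decoded the previous instruction.

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* Fixed 4-byte encoding (ARM64/RISC-V style): instruction i always starts
     * at byte 4*i, so a wide front end can feed many decoders in parallel. */
    uint32_t fetch_fixed(const uint8_t *stream, size_t i)
    {
        uint32_t insn;
        memcpy(&insn, stream + 4 * i, sizeof insn);  /* boundary known up front */
        return insn;
    }

    /* Variable-length encoding (x86 style): instruction i+1 starts wherever
     * instruction i ends, so finding boundaries is a serial dependency chain
     * (or brute-force speculative decode at every byte offset). */
    size_t find_boundaries(const uint8_t *stream, size_t n_insns,
                           size_t (*length_of)(const uint8_t *), size_t *starts)
    {
        size_t offset = 0;
        for (size_t i = 0; i < n_insns; i++) {
            starts[i] = offset;
            offset += length_of(stream + offset);  /* depends on previous insn */
        }
        return offset;
    }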


My professor in college back in 1997 was doing research on maintaining binary compatibility between different generations of the same VLIW architecture - if you had a different number of execution units and stuff like that. He had a few ideas, and one of them was preprocessing the compiled binaries and rewriting them. Another was having flags in the architecture identifying what generation of chip a binary was built for, so the OS could do on-the-fly changes.


In a world where all software is JITted (Java gang rise up), the fixedness of a VLIW ISA doesn't matter, because you always compile specifically for the target machine anyway. What you describe sounds like applying that strength of JITting to AOT-compiled code.

Vaguely related ideas from the distant past are ANDF:

https://en.wikipedia.org/wiki/Architecture_Neutral_Distribut...

And TaOS's VP Code:

https://sites.google.com/site/dicknewsite/home/computing/byt...


This is what IBM did with IBM i / AS/400 / System/38 and https://en.wikipedia.org/wiki/IBM_i#TIMI.

IBM i is on a POWER CPU today, but can still run System/38 binaries from the 70s, thanks to install-time compilation to whatever CPU the system is running this decade.


What types of languages do you see would enable more efficient VLIW compilers?

From my limited perspective, I find C one of the easier languages to write optimizing compilers for, and would therefore expect optimizing compilers to be the most efficient there. 40 years of collective experience of optimizing for C-like languages also helps of course.

Or is it the lack of explicit parallelism in the language that is limiting? Somehow I suspect the limited uptake of better-suited languages to be a sign that they aren't very helpful most of the time, and that most of the parallel work people do is more like serving a lot of individually sequential transactions per second, which is something C and Unix are pretty good at.


C is hard to optimise. Graydon Hoare gave a nice introductory compilers talk that went into some of the reasons why

http://venge.net/graydon/talks/CompilerTalk-2019.pdf


One well known obstacle to optimizing C is the difficulty of alias analysis. It's easier to do that for languages that don't have C's unrestricted pointers.


Doesn't the restrict keyword solve this?


The paper Why Programmer-specified Aliasing is a Bad Idea[0] evaluated the effectiveness of restrict in 2004. They found that adding optimal restrict annotations provided only a minor performance improvement, on average less than 1% across the SPEC2000 benchmarks.

[0] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94....


How much of this is because nobody puts effort into these optimizations?

The Rust compiler has repeatedly found critical bugs in LLVM's restrict / noalias support, bugs that would impact C / C++ as well if any real-world C / C++ programs actually used it.

If compilers produce straight-up broken code in these situations, I can only imagine they're not putting a lot of effort into these optimization strategies.


> How much of this is because nobody puts effort into these optimizations?

restrict is rare in C and C++ but common in Fortran; array parameters in Fortran aren't allowed to alias. Intel and IBM both have great Fortran compilers so I would expect their C and C++ compilers to have good support for restrict.


I don't think anyone has ever used the restrict keyword, or understood what it does.


What? I use it whenever I have a function that takes two or more pointers if I know they can't refer to overlapping memory. And it's part of the signature for memcpy since C99.
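For anyone who hasn't run into it, a minimal sketch of the kind of thing restrict buys the optimizer (the function is just made up for illustration):

    #include <stddef.h>

    /* Without restrict the compiler must assume dst and src could overlap,
     * so vectorizing this loop needs a runtime overlap check (or it stays
     * scalar).  With restrict the caller promises no overlap, and the loop
     * can be vectorized or software-pipelined unconditionally. */
    void scale(float *restrict dst, const float *restrict src, float k, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = k * src[i];
    }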


When he said "anyone" he meant "almost anyone". You're an outlier if you use `restrict` regularly.

I checked on grep.app; there are 10k results for `restrict` in C code, compared to 700k for `struct` (I know they're not directly comparable, but that gives an idea).


That seems like about the right proportion to me - struct solves a much more common problem than restrict does. And to be fair, it is a lesser known feature. But user-the-name is implying that restrict is somehow difficult to use or understand, which I don't agree with at all.


That probably explains it. In many shops, C may as well have stopped at C89.

I've been working with C since the early 90's. I've never seen any code use restrict.


Large chunks of libc use it as well, e.g. the printf family of functions.


derp.


So Rust comes to mind, right? Anything else?


FORTRAN or any language with lots of arrays and matrices.


I suspect if you want a HW architecture for running array operations you'll end up with something like a vector machine (e.g. ARM SVE(2) ) or a GPU rather than a VLIW CPU?


VLIW is basically a more flexible kind of vector machine.


And a traditional scalar architecture is more flexible still. The trick is to pick the correct set of tradeoffs for the targeted applications. I claim that for most array style workloads vector/GPU architectures are flexible enough, and offer better perf/watt and perf/chip area.


So, APL?


Absolutely yes that would make sense (or more likely the modern "derivatives" like J & K)


C is very "pointer heavy", and much code involves chasing linked lists and the like. This tends not to suit VLIW well.

Modern languages like Rust tend to produce more instructions for the same high-level logic, but those instructions are easier to schedule for superscalar CPUs. It typically ends up as a bit of a wash on CISC processors, but could be better than C/C++ on VLIW.

I guess we'll never know now...


C is pointer heavy if you write pointer heavy code


> What types of languages do you see would enable more efficient VLIW compilers?

I was thinking of languages where dependencies are more explicit and the idea of a global evaluation order isn't there in the first place. I'd be very interested to see a reduceron-style effort that implemented graph-reduction evaluation on a VLIW processor.

> Somehow I suspect the limited uptake of better suited languages to be a sign that they aren't very helpful most of the time, and most of the parallel operations people do is more like serving a lot of individually sequential transactions per second, which is something C and unix is pretty good at.

Heh, that was the idea that those barrel-processor SPARCs were designed around. But they weren't so successful in the market either in the end.


The TMS320C6678 (C66x architecture) DSPs still use VLIW and work pretty well. Like most DSPs, they're typically programmed using C, for which TI supplies optimised libraries for processor-intensive operations. IIRC, the compiler itself was fairly standard.


I had a netburst P4 for a while.

MATLAB simulations were comically faster on my lower clocked Pentium M laptop.


I malign Netburst because I owned one (actually still own it) and it was slower than the previous generation of processors (under certain loads) despite costing more.


> I still believe we'll see a return to its ideas once the computing world finally moves on from C.

If only for the reason that Itanium was one of the few architectures not affected by Spectre-family attacks.


The later ones are out of order; they are very likely affected, just no one cares enough to prove it.


Going "faster" by doubling pipeline stages doesn't gain anything but fat bonuses for marketing.


I had to maintain an IA-64 Linux system at a previous job, and it was such an odd duck. The OS did a decent job of abstracting away most of the weirdness, but at the end of the day it was just a very slow server. The compiler breakthroughs that Intel was counting on to make it competitive never happened, and since its unusual architecture made it bad at running code not specifically tuned to it, the end result was that nothing ran great on it. I’m sure that HP had some highly optimized code that ran like greased lightning, but that never worked its way out to the general public.

I admit that I’m glad Itanium finally died. It killed a lot of other interesting architectures and gave nothing in return.


For some periods (including during some of the Opteron era) Itanium 2 actually did okay.

https://www.realworldtech.com/forum/?threadid=27345&curposti...

It had very strong floating point performance and on spec int it was holding pace with Opterons, Athlons and P4s clocked a lot higher. Apparently a lot of the int performance was due to large fast caches (I think it had a bigger die size compared with the others) -- have a look at the 1.5MB version vs the 3MB version. But even there it wasn't doing so bad against the Sun and IBM processors which were out of order I think. So yes you're right some things did run pretty fast.

Unfortunately for Itanium, compiler techniques to make up for in-order execution had just about run out of steam at that point, while OOOE continued to scale up and improve steadily. Which is pretty much the opposite of what HP and Intel had predicted in the 90s, and that prediction was how they had justified Itanium's approach.


> I’m sure that HP had some highly optimized code that ran like greased lightning,

unwarranted optimism. I recall HP Pentium Pro servers with loads of "built in" stuff and dual PCI buses: all the builtins were hung off the 2nd, chained bus; on one IRQ. The memory was severely limited, too, iirc; NUMA'd through one of 4 CPUs.

HP design seems to be all about locking performance away; it might be in that box in theory, but all their ingenuity was spent ensuring it stayed there.


> It killed a lot of other interesting architectures and gave nothing in return.

What sort of architectures (if any) got killed as a direct result of developing IA-64?

That aside, though odd and impractical, I found Itanium to be at least a technically interesting perspective (both in terms of architecture, ie VLIW, and in terms of the amount of technical work it inspired at HP and Intel, such as the Itanium C++ abi [1])

[1] https://itanium-cxx-abi.github.io/cxx-abi/abi-eh.html


>What sort of architectures (if any) got killed as a direct result of developing IA-64?

Alpha?


And HP-PA and, later, MIPS (as a mainstream computer). MIPS survives in some routers.


MIPS is very, very close to dead now.

To be fair though, I think its death is more attributable to ARM than Itanium. And RISC-V is killing off anything that remained.


MIPS the brand was bought for cheap from a bankruptcy, and resurrected by its new owner… to be an ARM partner building ARM chips.


Arguably it was the nail in the coffin for Alpha.


Itanium may have been the nail in the coffin, but by the time the hammer fell on that nail, Alpha was already basically dead due to bad business structure and incentives that were inherited from DEC. They had many of the same problems that Intel has been having recently (low yields, failure to keep up with performance increases) without Intel's long-standing business agreements or inertia to hold up the product line through a difficult period.


Alpha, HP-PA.


> In 2003, AMD launched their 64-bit, but Pentium-compatible, Opteron CPU. Everyone stopped buying Intel CPUs for a while. [...] almost everyone immediately embraced AMD's instruction set and no one but HP wanted anything to do with Itanium.

Huh, I never really connected the dots before, but TIL that this is why 64-bit images and package repositories of various Linux distributions were referred to as `amd64`...


Yup. It was a classic mishandled transition; they messed up the emulation https://www.zdnet.com/article/intel-scraps-once-crucial-itan... , and getting the benefits of the transition required a big ecosystem shift. You more-or-less had to use Intel's compilers.

Meanwhile AMD offered "x86, but faster and wider", which (once you booted a 64-bit operating system) could run either type of binary at native speed. Sometimes people really do just want a faster horse.


It also didn't help that the Itanium project was delayed by many years. The first one, "Merced", was supposed to come out in '95 or '96. Had it been on time, things might have looked quite different. And indeed, I think it was AMD64 that breathed new life into x86, killing both Itanium and most classical RISC architectures, because it was a nice upgrade path from existing PCs.


This leads to an interesting situation, where Intel owns x86 and licenses it to AMD, while AMD owns the amd64 extension and licenses it to Intel. Should one company revoke their license, so would the other (I assume, naively), and boom - no more amd64/x86_64 CPUs for you and me. Intel could probably create a new extension to x86, but that would take some time.

And IIRC, AMD's x86 license evaporates automatically if the company is acquired by someone else, which means the only company that could buy it without disastrous consequences is Intel, who would probably run into some anti-trust issues if they did.


What is it that ensures licensing is required? Patents? Because patents on x86 technology predating amd64 should be expired by now... and without patent protection, another company would be allowed to reverse-engineer it, no?


https://www.blopeur.com/2020/04/08/Intel-x86-patent-never-en...

Tl;dr: many new patents have been created over the years, for instance SSE extensions, and those haven't expired.


It was a great time. I remember buying a rackful of AMDs at 1 GHz when the Opteron first came out. They flew. They also consumed a ton of power and generated a ton of heat (I blew the circuit breaker for the whole floor of Evans Hall, in Berkeley, just turning them on).


> This annoyed them and they decided to make a new CPU that no one would want to use.

I thought this was sarcasm, but the tone continues throughout the article. Are people really this painfully misinformed? The ISA was brilliant. The execution, not so much. It was a far more scalable architecture, completely "clean" of x86 baggage (I'm talking about the ISA, not the thermal issues Madison and McKinley had with the floating-point units). It even ran Windows and had two compilers (one from HP and one from Intel), and it decoded x86!

Then again, people were also hot on Transmeta at the time, so perhaps the author is confusing things.

> better known for the good designs you killed

Like?

Intel sat on Yamhill (64b "star-T" for x86) for a year, and when I was brought on, the project had issues in both instruction fetch and execution-unit area. It was not a good design, and was not ready because it was cobbled together at the last minute. Quite literally. AMD beat Intel to both 64-bit and 1 GHz, and we all (architects) knew it.

Looking at it one way, Itanium was a product that was ahead of its time. In 1995-97 Google didn't even exist yet and Amazon was just getting started. Server farms with their own power plants were something reserved for investment banks as redundant backup, and US e-commerce was well under $1 billion/yr. I'm 100% certain that if Intel had brought the yields up on Itanium and taken an early loss to build market share, it would have been the dominant architecture in the integer cloud today.

But in reality they had to beat AMD, because AMD was going after their cash cow (commodity x86) with a vengeance, and winning. Couldn't fight two wars at once, and thus, Itanium withered.

So looking at it another way, one could argue Itanium was too late, because the internet scaled out fast in the late 1990s and Itanium was caught with its pants down, so x86 filled in the gap.

Either way, Arm is more likely to take away server market from x86 than IA64 going forward, but that's not a sure thing.

I could (30% chance) see VLIW making a comeback if capacity weren't such a problem: it is far more suited to e-commerce cloud instruction mixes, and with hypervisors and Docker-like virtualization so much faster than full VMs, it could happen.


> I thought this was sarcasm, but the tone continues throughout the article.

It was sarcasm, but the tone continued because I think Itanium was an enormous waste of time and other resources.

> The ISA was brilliant.

I completely disagree. The ISA depended on enormous leaps in compiler technology in order to make it useful. However pretty it may have been, it couldn't be implemented in a performant way. I'd take ugly-but-practical over pretty-but-impossible any day of the week, especially when I get to use languages that abstract the ugliness away from me.


I think that statement suffers from hindsight bias: x86 is great now because of enormous leaps in compiler technology to make it work. Yes, compilers do need to evolve with CPUs; they don't just grow on trees!

Put another way: given a choice between stuffing an electric motor into a Ford Fiesta and buying a Tesla, Itanium could have been the Tesla in the hands of another company, but the market went after the Fiesta retrofit because it was here-and-now, not coming-soon.

EDIT: changed from "you" to "that statement"; added analogy.


I don't believe that's historically accurate at all. x86 compiler performance has been good enough for a long, long time. Yes, newer versions are clearly a lot better than ones from the mid 90s, but the performance improvements would be described in terms of percentages, not orders of magnitude.

IA-64 required a complete rethinking of compiler design to solve NP-hard problems on a large scale just to get passable performance. Translating C to assembly that keeps the execution units busy is radically different on IA-64 than on pretty much anything else.
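
To make that concrete, here is a rough sketch (plain C++, made-up function names, not actual Itanium compiler output): an out-of-order x86 core extracts the parallelism in the first loop at runtime, while an EPIC compiler had to restructure it statically into something like the second form (on top of bundling, predication, and speculative loads) before the hardware could keep its units busy.

    // Sketch only: illustrates the scheduling burden, not real IA-64 codegen.

    // Naive reduction: every iteration depends on the previous 'sum', so an
    // in-order EPIC core stalls on the multiply-add latency. An out-of-order
    // x86 core hides much of that at runtime without compiler help.
    double dot_naive(const double *a, const double *b, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; ++i)
            sum += a[i] * b[i];   // loop-carried dependence
        return sum;
    }

    // What an IA-64 compiler had to do statically: unroll and keep several
    // independent dependence chains in flight. (This reassociates the
    // floating-point sum, which is only legal under relaxed FP rules.)
    double dot_unrolled(const double *a, const double *b, int n) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        int i = 0;
        for (; i + 4 <= n; i += 4) {
            s0 += a[i]     * b[i];
            s1 += a[i + 1] * b[i + 1];
            s2 += a[i + 2] * b[i + 2];
            s3 += a[i + 3] * b[i + 3];
        }
        for (; i < n; ++i)        // remainder loop
            s0 += a[i] * b[i];
        return (s0 + s1) + (s2 + s3);
    }

Getting that restructuring right for every loop shape and every latency, at compile time, is the scale of the scheduling problem I mean.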


> x86 compiler performance has been good enough for a long, long time.

> NP-hard problems on a large scale just to get passable performance.

First, it is hard to refute a claim, or find meaning in it, when you start using terms like "passable" and "good enough".

Second, I think this overstates the need for optimal scheduling because Itanium SPEC performance was "passable" out of the gate.


I know that I don't have sufficient engineering background to debate you, but still I see ARM's Johnny-come-lately 64-bit implementation that has gone farther up and down than would ever have been possible for Itanium (Fujitsu with the top supercomputer, and any 64-bit phone illustrate the point).

So I am drawn to Linus Torvalds' commentary on IA64:

"IA64 made all the mistakes anybody else did, and threw out all the good parts of the x86 because people thought those parts were ugly. They aren't ugly, they're the "charming oddity" that makes it do well. Look at them the right way and you realize that a lot of the grottyness is exactly _why_ the x86 works so well (yeah, and the fact that they are everywhere ;)."

https://yarchive.net/comp/linux/x86.html


I wish I knew which parts Linus was talking about. Segmented addressing? Can't be that. Asymmetric registers? Probably not that. Virtual 8086 mode? Meh. Variable instruction length? God I hope not. Begs the question: what oddities?

I think this is the most astute statement he makes: "yeah, and the fact that they are everywhere."

> ARM's Johnny-come-lately 64-bit implementation that has gone farther up and down

What do you mean? Arm is primarily embedded IP, where 64-bit has only recently come into vogue (unless you mean A5/A7? I'm talking Cortex-M). Arm's competitors are really Renesas, Synopsys, TI, and a number of Chinese chips, and 64-bit isn't a priority in most of that space. Not defending anyone, just not sure what you mean.


Did you read the post you are replying to?

Linus says to ignore the design mistakes like segmentation.

Then he says:

"the baroque instruction encoding on the x86 is actually a _good_ thing: it's a rather dense encoding, which means that you win on icache."


He did say that, and he's wrong. x86 isn't particularly dense, and RISC-V compressed is denser. Meanwhile the insane encoding burns real power, takes up real area, and is a real limiter for decoding. Dear Linus has an irrational love for x86.


He wrote those comments in 2003. The instruction decode logic takes up less space with each new chip and process shrink. These days I don't think it really matters much.


It does. Source: Intel engineering, privately off the record.

How wide is the widest Intel CPU? How wide is M1? Case closed.


Oh, I don't think the link was originally there, or I missed it. Yes, of course: that claim is like the first shot across the bow in every CISC vs. RISC debate.



Probably not: this is a boutique machine defined with an esoteric architecture, not a general-purpose integer computer for the cloud, which is what I was talking about in my grandparent comment.


Perhaps the key point is the era of design.

The P4 was built with an enormous pipeline (~20 stages?) in the hopes of reaching a 10 GHz goal that was never going to happen.

The Itanium was designed a decade before, programmed for parallelism that was also never going to happen.

The M1 is not an esoteric machine, and if Apple had bundled an Itanium of equal computational ability, all of those NOPs would burn a hole through the casing.

ARM's 64-bit implementation is obviously something special; esoteric or not, it is more powerful than anything produced by Intel by several measures.


Did HP really start that VLIW design _TEN_ years before the P4? That would be 1985, and I find that unlikely. At Intel, IA64 and P4 were developed at about the same time, with P4 taping out a few years later; the former in Santa Clara, the latter in Oregon.

> The M1 is not an esoteric machine, and if Apple had bundled an Itanium of equal computational ability, all of those NOPs would burn a hole through the casing.

You're right, it's not. And I never said it was. But a 6D mesh torus sure is.


Linus is a big fan of rep movs.


Given that we have magic compilers that convert mostly sequential CUDA into highly parallel SIMD assembly code, I don't think that the Itanium concept was too far off.

But all of the details had gross amounts of complexity. More modern VLIW architectures are more obvious about their parallelism... without any of the weird 'bundles' that made the Itanium terribly complex.

I still think that a 'simple decoder' has hope, and NVidia's Volta and Turing are my case in point. At the assembly level, the NVidia SASS compilers generate the read/write dependencies. NVidia has proven that the compiler can indeed specify the dependencies at compile time, though it requires a huge amount of software bulk to do so (PTX pseudo-assembly that compiles into SASS later).

The issue is that the Itanium was poorly designed and impractical. But the overall concepts still seem possible (indeed, more possible given today's technology). I'd be interested in a modern Itanium, but with the work split differently: the compiler determines the dependencies, while the core figures out the precise scheduling.


> In 2003, AMD launched their 64-bit, but Pentium-compatible, Opteron CPU. Everyone stopped buying Intel CPUs for a while. Within a few years Intel made their own 64-bit, but AMD-compatible, CPUs to avoid entirely losing the desktop and small server market. They were right earlier: almost everyone immediately embraced AMD's instruction set and no one but HP wanted anything to do with Itanium.

This is the only reason why Itanium failed.

Had AMD not been allowed to produce Intel-compatible CPUs, everyone would eventually have been forced to eat Itanium regardless of how it tasted.

One doesn't get to be picky when the food supply dries up.


History may have unrolled quite differently at that point. If the only choice was to shift architecture, then we may have seen a more diverse server CPU estate being maintained. At that point Sun was still relevant, POWER systems were still being bought and deployed in larger proportions, etc.


None of them would adopt Windows laptops running on POWER, just as one possible scenario.


True, but Apple was running its PowerBooks on PowerPC back then, I think. I know Windows was even more dominant then than it is now, so perhaps large numbers of PowerBooks were never on the cards.

But in the server space the story might have been different.

I take it Intel was planning a "Mobile Itanium" at some point?


In those days Apple's market share was still that of a company struggling to get out of insolvency, and they never had any significant market share outside North America during the last century.

Yes, on the server it would have been different; however, Linux distributions might still have killed the other UNIX vendors, and their CPUs along with them, as actually happened.


Not to mention that AMD was already in the business when IBM mandated a second/backup supply source for Intel parts, and that's how AMD began producing x86 chips like Intel's.


Everybody here seems to be focusing on the negatives, but there are also a few positive legacies from the Itanium architecture, all of them on the software side: EFI, which became UEFI (like it or not, it's still better than the legacy BIOS), the GPT partition table format, and the C++ ABI now used for all architectures on Linux.


Sadly EFI killed OpenFirmware which could have given us open source boot code and portable drivers.


One very good thing came out of the Itanium project: the vendor-independent ABI for C++. IBM, HP, Sun, Intel, and GCC all signed on to rigorously specify the layout of C++ objects and the function call interface on the Itanium platform (GCC got funding to help with this) and the processor-independent parts of the standard were then used pretty much everywhere else (with adaptations for different instruction sets to handle the call interface).
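
As a concrete illustration of how far that standard reached: the processor-independent name-mangling rules from the Itanium ABI are what g++ and clang++ still follow on x86-64 and AArch64 today. A tiny sketch (the identifiers are made up; the mangled forms follow the ABI's grammar):

    // The comments show the Itanium-ABI mangled symbol names a conforming
    // compiler (g++, clang++) emits for these made-up functions.
    namespace demo {
        int foo(int x)           { return x; }            // _ZN4demo3fooEi
        int foo(int x, double y) { return x + (int)y; }   // _ZN4demo3fooEid
    }
    int bar(int x) { return demo::foo(x); }                // _Z3bari

That shared mangling and layout spec is a big part of why C++ libraries built with different compilers can link together on Linux at all.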


> In 1994, Intel and HP looked around and saw a wide variety of successful server CPU architectures like Alpha, MIPS, SPARC, and POWER. This annoyed them and they decided to make a new CPU that no one would want to use.

I spent some time "optimizing" code to run on Itanium for "MegaBank", who then dumped it. Then we swapped to AMD Opteron for a while before Intel caught back up.

I suppose it was successful in that it created work.


I'm pretty surprised to hear "MegaBank" ran AMD. Not that Opteron wasn't awesome and better than what Intel had at most things, at least until Woodcrest if not Nehalem. But because MegaBanks are usually pretty conservative. More than a few still run mainframes and proprietary unixes.

Unless you worked in their HFT group, that is.


We wrote quant code that had to run on all architectures... for the HFT team [1], unfortunately, including a long drawn out debacle supporting Solaris 5.7 for an absurdly long time. Other members of my team were looking at IBM architectures as well, mostly because IBM were throwing money at us to use their kit.

[1] High frequency, not necessarily low latency, for some definition of the word 'high'.


I didn't see anyone mention this, but this is completely wrong:

"This annoyed them and they decided to make a new CPU that no one would want to use."

Facts: 1. Itanium grew out of internal HP efforts to define their next generation architecture, presumably in response to their difficulty scaling the Precision Architecture.

2. Intel only got involved part way through.

3. Nobody ever "decides to make a CPU that no one will use".

Itanium, for all its issues, was used in great numbers, but nowhere near what it was supposed to be. The failure of Itanium came from a bet on handling cache misses with the compiler (like Transmeta, my alma mater) and from having a design by committee (lots of nice ideas in isolation, a disaster in aggregate).

There are so many technical problems with the ISA: too many features, too much performance-draining complexity. Something as critical as registers is buried behind (IIRC) three levels of indirection (rotating windows, modulo-scheduled registers, etc.). The big bet on predication didn't help code density or cycle time.

It's really a shame that Intel throughout history has botched ISA design after ISA design. That it can be done right is exemplified by Alpha, RISC-V, and ARM64. If the rumored acquisition of SiFive happens, then maybe they can finally get it.

EDIT: Typos


The grand plan of VLIW might not have worked out anyway, but I always wondered what would have happened if Intel had moved the Itanic to current process nodes, even without huge modifications. By the transistor count, it could probably be power-efficient enough to drive mobile phones.

Also, at the time of its peak, the Itanium did perform quite well; I wonder what it could deliver if it were grown to use all of today's production processes.

Maybe VLIW cannot be made to work, but I do think that Intel missed the boat by not providing cheap development systems to enthusiasts, like a PC-compatible motherboard with an Itanium for around $500. The Linux crowd would have picked it up gladly. Instead, it was only available as expensive hardware, which reduced the potential developer audience (both in the availability of the development resource itself and in the target market).

The post mentions that everyone picks up the new Apple Silicon processors because they are nice. They certainly are, but the big point is that a compatible ARM system can be picked up for as little as $35: the Raspberry Pi and its friends. And while Apple hardware isn't low-cost, it is incredibly cheap in comparison to the Itanium systems of that time.


Intel lost a lot by playing games rather than letting engineers build the best stuff they could and cash out checks on it.

Now they cry for help to the same engineers they turned away in the past.


>> Intel lost a lot by playing games rather than letting engineers build the best stuff they could and cash out checks on it.

Which is strange, since IIRC Intel engineers designed the PCI bus to replace ISA in spite of management saying that was for the PC makers to do. I would have thought they'd learn from that success, but apparently not.


I worked for Intel some time later, when they had stopped being so elitist and actually started overdoing it in the other direction -- building everything and the kitchen sink rather than focusing on their core competency.


I’ve seen it mentioned over the years that writing the necessary compilers was more than merely difficult. As I understood it, the dynamic nature of load latencies (hit L1 cache? L2? L3? Slow trip to DRAM?) means that some workloads just can’t be statically scheduled in an effective manner. Anything to that?


Essentially yes. In cases where memory latency is predictable, like DSP workloads, VLIW processors actually tend to work really well. Itanium had some facilities to allow speculative loads, but they had a lot of overhead in practice.
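
To illustrate the difference with a generic sketch (plain C++, made-up names, nothing Itanium-specific):

    // Predictable latency: fixed-stride, FIR-filter-style loop. Every load
    // address is known in advance, so a VLIW compiler can issue the loads
    // early enough to cover a known, fixed latency.
    float fir(const float *x, const float *h, int taps) {
        float acc = 0.0f;
        for (int i = 0; i < taps; ++i)
            acc += x[i] * h[i];
        return acc;
    }

    struct Node { Node *next; long value; };

    // Unpredictable latency: pointer chasing. The next address only exists
    // once the previous load completes, and whether that load hits L1 or
    // goes all the way to DRAM is invisible at compile time, so no single
    // static schedule is right for both cases.
    long sum_list(const Node *p) {
        long total = 0;
        for (; p != nullptr; p = p->next)
            total += p->value;
        return total;
    }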


Don't forget HP and Oracle have been in a lawsuit for a decade over Oracle dropping support for Itanium. Oracle is on the hook for $3 billion.

https://www.reuters.com/legal/transactional/oracle-loses-bid...


I note that Debian still has an unofficial ia64 port:

https://www.debian.org/ports/ia64/ https://wiki.debian.org/Ports/ia64


Minor historical footnote: HP paid Progeny (Ian Murdock's company he founded years after creating Debian) to help port Debian packages to Itanium. They gave us a first-generation computer, which was epically slow of course.

That contract helped keep Progeny alive after the venture capital funding dried up; the company's original plan was to allow customers to reboot their Windows desktop computers overnight to run as a distributed Linux supercomputer.


Gentoo also has instructions. Linux/ia64 works fine for the most part, it's not an architecture that causes much drama for userspace applications.

Servers are cheap on eBay, and the older Intel boards (also sold by Dell, Fujitsu, ...) can be upgraded to newer CPUs that are either less power-hungry or faster with more cores and sort-of HT. HP boxes are generally not upgradeable, and SGI/ia64 is a special case with lots of other custom hardware, as usual.

Annoyingly, many Linux / gcc developers want to remove ia64 support from their source trees because the architecture is no longer commercially relevant.

As a necrocomputing enthusiast I find it quite sad, but there's not much one can do about it.

If only this old junk were as popular as the various home computers...


Note that at the kernel level it's now marked as orphaned. It also seems to just get random regressions, because nobody actually tests it.

https://www.phoronix.com/scan.php?page=news_item&px=Linux-Or...


That's because not everyone is always running bleeding-edge kernels.

We have found and fixed multiple regressions in the kernel on ia64. As of Linux 5.12, the kernel runs fine on most ia64 machines; there is still one known regression that affects some of them.

Disclaimer: I'm Debian's maintainer for ia64 (and most other older architectures such as m68k).


Kind of curious why the article didn't mention HP's PA-RISC architecture. IIRC, HP switched away from PA-RISC to Itanium at the time. I guess because they saw that managing your own architecture comes at a huge cost? I very well remember how Itanium was heavily promoted by HP to existing PA-RISC customers as the natural upgrade path, and how they played down x86 as unreliable PC-derived stuff that no one should put in a datacenter ;)


HP concluded PA-RISC (and all RISC, really) had reached a plateau. Wikipedia explains it well:

In 1989, HP determined that the Reduced Instruction Set Computing (RISC) architectures were approaching the processing limit at one instruction per cycle. HP researchers investigated a new architecture, later named Explicitly Parallel Instruction Computing (EPIC), that allows the processor to execute multiple instructions in each clock cycle. EPIC implements a form of very long instruction word (VLIW) architecture, in which a single instruction word contains multiple instructions. With EPIC, the compiler determines in advance which instructions can be executed at the same time, so the microprocessor simply executes the instructions and does not need elaborate mechanisms to determine which instructions to execute in parallel.[5] The goal of this approach is twofold: to enable deeper inspection of the code at compile time to identify additional opportunities for parallel execution, and to simplify the processor design and reduce energy consumption by eliminating the need for runtime scheduling circuitry.

HP believed that it was no longer cost-effective for individual enterprise systems companies such as itself to develop proprietary microprocessors, so it partnered with Intel in 1994 to develop the IA-64 architecture, derived from EPIC. Intel was willing to undertake the very large development effort on IA-64 in the expectation that the resulting microprocessor would be used by the majority of enterprise systems manufacturers. HP and Intel initiated a large joint development effort with a goal of delivering the first product, Merced, in 1998.[5]

(https://en.wikipedia.org/wiki/Itanium)


The only place I ever worked with Itanium was an HP shop. They wrote their custom accounting/booking program in the 90's using Pick/BASIC.

It was fun to play around with, but coming from Linux, HP-UX left a lot to be desired. I'm not sure if there is a way to run Pick/BASIC on Linux, but it would have made sense for them. Costs would have gone way down by being able to use commodity Intel machines instead of paying a premium for Itanium parts.


> Goodbye, Itanic. You were a strange, unloved little detour, better known for the good designs you killed than for any successes of your own. Few will miss you.

It does, albeit indirectly, and not by name. Poor Alpha and PA-RISC...


To see how expectations were gradually reset, see this chart, which also appears on the Wikipedia page for Itanium. https://helgeklein.com/wp-content/uploads/2010/05/itanium-sa...


It's a bit ironic that one of the reasons for Itanium's failure was that the compilers were not ready. I have no doubt that nowadays GCC and clang would immediately have good support for any new processor (especially from a big player like Intel). In any case the processor vendor would make sure themselves that there would be good compiler support (see Apple).

(Setting aside the question of whether good compiler support would even have been possible with Itanium's architecture, or whether the optimizations they envisioned could not be implemented in principle. I don't know enough about it to tell for sure.)


gcc had already existed for 14 years by the time Itanium came out, and it had been widely ported to many different systems... that was kind of the whole point of the project. The real issue I feel is locked up in your parenthetical: it is difficult to put aside the question of whether good compilers for Itanium were even possible, as part of the entire problem was that the VLIW architecture concept was so incredibly different than CISC/RISC that it made porting the compiler very difficult (and while you could, can, and did somewhat easily do naive compilation that simply ignored the potential benefits, the resulting performance sucked). I was in college at the time and I remember VLIW being cited as a reason why compilers had a lot of hot research that needed to be done ASAP.


AFAIK Intel and HP spent well north of $1B on VLIW compiler research, with crickets to show for it all.

The people prophesying the second coming of VLIW (for general-purpose code) seriously need to explain what compiler breakthroughs have been made to make the whole affair worthwhile.


Yes -- and explain how code statically scheduled for the worst-case dependences at compile time can ever beat out-of-order and speculative execution that can see the actual dependences at runtime. For anything other than extremely regular codes, there's an information gap that can't be overcome.


> Intel and HP looked around and saw a wide variety of successful server CPU architectures like Alpha, MIPS, SPARC, and POWER.

I used PA-RISC, Alpha, and POWER machines in various work roles. All of those architectures/OSes fell to x86 speed improvements and low prices. None of those other architectures could compete as Intel x86 got faster and faster and caught up with the exotic silicon.

When MS released Windows NT for x86 it was over. My mom's former employer (PTC) released for Windows NT a little late and lost ground to SolidWorks. Even Apple gave up on PowerPC and switched to x86.


We plan to get rid of this real soon now.

    # uname -a
    HP-UX antique B.10.20 A 9000/800 862741461 two-user license

    # model
    9000/800/K380


Please offer this machine to the debian-hppa port; post a message here:

> https://lists.debian.org/debian-hppa/

It might be worth hosting it at the Open Source Lab at Oregon State University, where we just lost one Debian machine, IIRC.


Unfortunately, this is being used for a corporate tracking application.

I have several of them, but they run on 220V power and are the size of a small refrigerator.

The CPUs range up to 220 MHz.


Are they in Europe?


Not even sad. What I am sad about is that it took Alpha and PA-RISC with it.


Is it too cynical to assert that this was the actual intent?


It was the intent of DEC and HP insofar as they joined the Itanium programme and EOL'd their own designs.


DEC was part of HP at that point, so it was the HP leadership that made the decision to ditch Alpha.


Compaq bought DEC and killed Alpha before both got folded into HP.


Alpha was the reason DEC was for sale. It was a very cool speedy architecture, but it took too long to emerge from the DEC management swamp.

Bitsavers has a series of memos showing how the Alpha predecessor Prism went down in flames when DEC decided to quick-fix its technology hole with MIPS.

Alpha was a kind of illicit skunk works leftover from that failed project. If DEC had pushed it out the door a couple of years earlier - not likely, but possible with a push - it might have eaten the rest of the industry.


I suspect that whatever the virtues of the Alpha architecture [1], DEC simply didn't have the muscle to stay in the game with exponentially increasing R&D costs.

[1] And it's not like Alpha was some shining beauty of ISA design either. Early versions lacking sub-word load/store, the absolutely crazy memory consistency model, ...


Wasn't the byte stuff due to patent kerfuffle with MIPS?


AFAIK, yes.


and MIPS, as a server architecture. In 1998, MIPS Technologies left SGI. Without a major systems house using it, MIPS was left to chase embedded markets. The R10000 had been used by SGI, Tandem, Pyramid, and NEC for high-end servers.


Itanium was so insignificant that while I recall that it was being EOL'd, I hadn't marked my calendar for it.

A year ago, Derbauer did a teardown and got some sweet die shots of one: https://www.youtube.com/watch?v=Lqz5ZtiCmYk

Nit: this blog's text is a bit small and thin.


Having had to deal with it, I scheduled a reminder. I didn't want to let the day slip past!

Ooh! The pretty pictures start around 11:03.

(Blog is mine; I'll see about tweaking the text.)


Itanium_Sales_Forecasts.png is my favorite image on all of Wikipedia https://commons.wikimedia.org/wiki/File:Itanium_Sales_Foreca...


Could one say that the Itanium helped to do in the Alpha and PA-RISC architectures?


I don’t know if it’s true, but a story I was told by an HP engineer claims that Itanium was more or less designed by HP. Which would mean that HP wanted to kill off the PA-RISC platform.


Yes, but the massive economies of scale in chip manufacturing and software compatibility would most likely have doomed Alpha and PA-RISC in any case.


This guy has been waiting 20 years to use that headline but, if he wants me to make room for him on my door, he'll have to do better than that.


I did my best!

And as with Intel, it just wasn't enough.


Itanium was one of the best processors I worked on, and I worked on at least 20 different ones. It was way ahead of its time, the software was not ready, and of course it cost an arm and a leg and the next three generations, so it was really only ever taken up by large enterprises. I wish Intel and HP had been more sensible in how they brought it to market.


This post, funny enough, absolutely gets my thoughts down entirely.

If I didn't know better I'd have accused myself of ghostwriting that post. That's amusing and horrifying.. but it's all accurate to what I think!


They say great minds think alike, but if you're thinking like me instead... I'm so, so sorry.


A vessel deliberately sunk is scuttled.


Can you call it Itanic if it was around for more than a decade because of how well it sold and how many supercomputers chose to use it? I am familiar with the pain of Itanium, and I understand how hard it was to work with. I've kernel debugged an operating system on it. Yet it was still successful enough to be around for this long. I wouldn't call it "Itanic."


> Can you call it Itanic if it was around for more than a decade because of how well it sold

It didn't sell well, though. At all.

> and how many supercomputers chose to use it?

Here's a chart of supercomputer architecture by year: https://commons.wikimedia.org/wiki/File:Processor_families_i...

Itanium showed up around 2001, peaked in 2003 at about 100 systems, and was nearly gone by 2007.

However good it might have been hypothetically -- and I contend that it wasn't -- it was a market disaster.


Is this also the end of HPUX? Can it run on anything else? It's hard to find information on that.


It certainly used to run on other architectures -- notably PA-RISC and Motorola 680x0.


Yeah, but those versions have long been unmaintained. All modern HP-UX code and third-party apps are Itanium-only.

For the last 10 years or so, HP actually paid Intel a lot to keep making and developing Itanium, because they're so dependent on it for HP-UX and their long-term support contracts. If a port were realistic, they would have done it.


They definitely "can" port it, but would they? I believe HP is internally using Linux for more and more things -- the question is whether there are enough customers paying for HP-UX to make porting worth it, or whether they are legally able to sell the rights to it as they did with OpenVMS. Only HP knows the full answer.


There are still a lot of legacy HP-UX servers out there, mostly running the SAP+Oracle combo, but they are all slowly being phased out in favour of Linux or Windows boxes. Still, some of them will likely remain chugging along for years to come, because their performance will simply be sufficient for the task at hand.


I can well imagine HP milking existing HP-UX customers for all they're worth, but I find it hard to see them justifying much $$$ to develop it further. So I would expect HPE to provide security patches and nothing more.


The Itanic sank like 15 years ago; they just finally stopped making chick flicks about it.



