The Itanic Has Sunk (honeypot.net)
203 points by kstrauser on July 30, 2021 | 244 comments



I would like to suggest that the Itanium was a huge business success. At the start of the 90s, there were many decent workstation CPUs: SPARC, MIPS, PowerPC, Alpha, and PA-RISC. Intel CPUs were not considered in the same class. Big engineering work was done on workstations powered by these chips. But as the decade went on, the chips became vastly more complex and expensive to design. Near the middle of the decade, it was becoming clear what the trajectory was, and companies started looking around trying to figure out how to navigate this. On top of this problem, Intel/HP teamed up to design a processor that was quite beyond (in terms of complexity) anything being designed at the time, and it became clear competing was just not economically feasible. So basically the industry folded. In one move, Intel took out an entire industry of companies that could compete. PowerPC is still there, but is marginal. The SPARC, PA-RISC, MIPS, and Alpha design teams are gone.

Interesting technical tidbit - I remember an Intel engineer giving a talk at Stanford about the architecture, and Hennessy came to listen. At the end of the talk he asked why they thought moving so much complexity to the compiler would work when others (including John) had found it to be problematic. They didn't have an answer and I could see the look on John's face and at that point realized they might not understand what they are getting into.


Back in the day of 2001, I was working on the hugely cross-platform Zeus Web Server. It served the whole selection of UNIX platforms: https://web.archive.org/web/20011208194616/http://www.zeus.c...

We certainly had an Itanium build as well, although it doesn't appear on that list, which suggests that it never sold. But it's notable how many of those platforms are effectively dead and only the open source ones on x86 are left standing - and OSX, which deliberately has no server platform.

That era was when people were just starting to realise how much more cost-effective OSS+X86 was. That is what ate the server and workstation markets, especially once people started to horizontally scale and "the cloud" appeared.

The only orgs left with the capacity to choose a new platform and make it cost-effective for the world are basically FAANG+M. And which one of them bought a fabless semi company full of RISC designers from all those dead instruction sets? Apple.


The article makes it sound like Itanium was some genius move on Intel's part to kill off the old workstation market, but IMHO that's exactly the opposite. This was Intel spending god knows how much money to jump into the workstation market right as it was suffering its final death throes from the fatal wounds inflicted by the PC clone market.

Sure those workstations were powerful and had impeccable engineering, but they also cost at least 5 times as much as a similarly powerful PC, sometimes a full order of magnitude more. Plus they were full of obnoxious licensing requirements, like making you purchase a compiler for thousands of dollars per year or requiring you to buy an obscenely overpriced support contract to get OS updates. They just couldn't compete with some cheap clone running Linux, especially as Intel poured way more money into R&D each year and overtook the workstation chips. By the time the Core architecture, and especially Conroe, were released those old workstations had nothing going for them but inertia.


I was at that Stanford lecture! Assuming it was the one that I attended, the Intel engineer was John Crawford, the lead architect of Merced -- and John Hennessy was absolutely annihilating him. I described attending that lecture in a Reddit AMA years ago [0]:

In fact, allow me to tell a story: as a young engineer, I attended a talk at Stanford, by John Crawford, architect of the Itanium (then known as Merced). He was presenting a bunch of benchmarks that seemed... dubious. And while Merced kicked butt on Eight Queens, it was terrible on gcc (and everything else that looked vaguely real-world). Even as brash as I might have been (ahem), I wasn't about to say anything, but then some guy in the back of the room just got on Crawford on this. Crawford tried to shake him off, but the guy was just a badger, at one point saying "Doesn't this just willfully ignore everything we've learned about commercial workloads over the last decade?" At this point, I turned around -- who the hell was this guy?! -- and saw that it was none other than John Hennessy himself. A Silicon Valley gangland slaying if ever there was one -- and told me everything I (or anyone) needed to know about Itanium's fate.

The room was packed (and open to non-Stanford students, which is how I got there); glad to know I wasn't the only one on whom this made an impact!

[0] https://i.reddit.com/r/IAmA/comments/31ny87/i_am_the_cto_of_...


Did you tell this story on On the Metal? It sounds familiar, and I'm pretty sure I haven't looked at that AMA.

(I hope the podcast comes back, by the way!)


Surely -- I have retold it many times over the years. And we're working on On the Metal! A combination of pandemic/resurgence + us being (very) heads down has delayed us, but stay tuned!


That sounds like the lecture I was at. Thanks for sharing that, fun times. There were some really great lectures back then. I remember when Exponential gave their talk right after Apple moved away from the PowerPC architecture. Everyone knew their processor was a dud in every commercial way (super high-power BiCMOS PowerPC chip), but people almost felt sorry for the guy and were amazingly polite.


Sort of the New Coke theory of CPUs. I like it.

--

For anyone who wasn't there: In the 80s, Coca-Cola had lost its crown. Pepsi was the most popular soft drink, at least in the USA. Coca-Cola responded by changing their recipe, to great fanfare.

It's often held up as one of American business's greatest failures. People hated New Coke, and later that year they brought back the old formula, under the name Coca-Cola Classic.

I think that that summary of the story misses an important point, though. After the dust had settled, Coke had regained its crown, and has held it ever since. So, while New Coke itself failed on the market, the event's long-term influence on the company's financial outcomes was overwhelmingly positive. To the extent that some people have claimed that the whole thing was just an exceptionally clever PR stunt.


New Coke is a great example of design biases. Pepsi in the 80s was beating Coke year after year in double-blind taste tests of 2oz or so sample pours. Pepsi was hugely proud of this and it was a big part of their advertising.

Coke wanted to beat those tests, so they designed a soda that people loved much more than Pepsi in those same double-blind studies. In a 2oz or so sample, people adored New Coke over Pepsi.

In a 2oz pour (shot), sweetness is a big differentiator; the one that stands out is pretty much always going to be the sweetest. (Like the Loudness Wars in music, there's a dumb human psychological bias in small samples toward sweeter taste or louder volume as "better", despite both being awful for your health in quantity.) What Pepsi and Coke both seemed to forget in the 80s, in focusing on these 2oz "micro-benchmark" studies, was that most people don't drink just 2oz at a time; the unit is generally closer to the 12oz can or more. When drinking more than, say, 2oz, over-sweetness becomes a problem in how it lingers and you want a more balanced palate, which Coke Classic always delivered better than Pepsi.

The failure of New Coke will always be a great illustration of making sure you are designing for the right benchmarks (and that micro-benchmarks especially are sometimes a trap).

I believe the deeper extension of this metaphor likely also applies to Itanium: they designed it for 2oz-pour benchmarks where it sometimes got fantastic numbers, but most compilers got awful numbers on realistic real-world workloads.


Hey, thanks for a new take on an old story. That makes much more sense than any other version I have heard over the decades.


FWIW, both New Coke and Coke "Classic" (aka New Coke 1.1) use high fructose corn syrup, which is substantially cheaper in the USA in industrial quantities due to huge tax-funded agricultural subsidies for corn (maize) farmers. These remain in place today (which is why everything in the US is sweetened with corn syrup instead of sugar).

"Old coke", as well as what is now known as "mexican coke" in the USA, used/use sugar.

New coke was the flag day for switching to a cheaper sweetener in the US market. When they switched "back" to coke "classic", they kept the new sweetener.


This article claims that Coca-Cola switched to high fructose corn syrup in 1984, a year before the debut of New Coke:

https://www.motherjones.com/food/2019/07/the-secret-history-...


When I first moved to the USA from Denmark, back in the late 90s, the chemical, sickly syrupy taste of Coca-Cola made me gag, and I was very surprised by how different it was from the European version. Then I found out that in Europe, they use sugar rather than corn syrup. It's amazing what the Americans will put up with.


Two things here.

First, people have done triangle tests on sugar- vs. HFCS- formulations of Coke (Serious Eats has a good one) and, while there is definitely a perceptible difference, the preferences aren't what you'd expect. People who go into these tests saying they prefer cane sugar Coke tend actually to reveal an HFCS preference in the test.

Second, formulations in different parts of the world vary in ways other than which sweetener they use. It's possible that regardless of the sugar involved, there might just be more of it in the formulation (or in the way it's delivered, since concentration systems also vary) in your least-preferred Coke instances.


Thomas' knowledge-dropping in a sibling comment is more in depth, but anecdotally: I prefer sugar coke after drinking HFCS coke in the USA for a few months, then prefer HFCS coke after drinking sugar coke in europe for a few months.

Non-lite/diet/zero coke is always sickly syrupy oversweet regardless of whether it's sugar or corn syrup. It's sort of what people are going for. I can't believe I used to drink liters of it a day growing up; I limit myself to a dozen servings a year or so (of the full sweet stuff) these days.


Might be a novelty effect there. We all tend to like new foods if we eat one food too much.


Well, it's the same with Carlsberg; it tastes "different" around the world.


True, but the other side of the coin is that the massive economies of scale in chip manufacturing, as well as around ISA ecosystems (software), meant that some massive consolidation of the industry was all but inevitable.

Had Intel not embarked on the Itanium project, we might still have ended up with something close to the world we have now, with x86-64 and ARM being the top dogs, and some "traditional RISC workstation" architecture surviving in a small high-end niche (POWER).


No question there are a lot of ways to win, probably many better than what transpired. But in the end, Itanium played a big role in Intel consolidating the CPU market and taking complete control for a significant period of time.


The competing Unix manufacturers didn't need Itanium as an excuse to fold; they were completely non-competitive by the time the Itanium came around. Customers didn't purchase a Unix system because of the performance of the proprietary architecture but for the entire ecosystem provided by the manufacturer. DEC and its Alpha were on life support precisely for this reason, even though Alpha was marginally more competitive against x86.

The manufacturers that intended to switch to Itanium were either part of HP (Compaq & DEC) and would have thrown in the towel otherwise, or were desperate to remain relevant (SGI). They could never have competed, Itanium or no Itanium. Itanium actually made no impact on the strongest competitors, SPARC and POWER.

Not only did Intel lose billions on Itanium but it tied up development resources that couldn't be used against AMD and influenced the decision not to compete in the mobile space.


Your causality doesn't seem to line up. As you tell it, the workstation CPUs were already peaking in terms of complexity and cost when Intel and HP announced a huge investment in the next generation of this class of CPU.

This doesn't signal that the segment is economically unviable, and if you're an existing player with an existing strong design, it signals an opportunity to invest in your own next generation and meet Intel/HP head-on with a superior brand (for this segment) and a more plausible next-gen design.

Intel and HP showing up promising a clean sheet design in a decade doesn't seem like a time to roll up your market leading offering and go home.


Nothing about this history lines up with facts. Intel killed the RISC workstation CPUs with the Pentium Pro, not with the Itanium. PRO/Engineer and SolidWorks had been ported to Windows NT at the very beginning of that operating system's availability, and the performance on x86 was already better than RISC workstations, at a tiny fraction of the cost, by late 1995.


The Pentium Pro and eventually the AMD64 chips equalled or bested the workstations on many tasks by the end of the 90s and early 2000s. But the Itanium distracted them from even competing. SGI for example had at least one high-end MIPS processor under development (iirc "the Beast"), and they stopped the project and later were trying to ship Itanium servers. HP threw in the towel and joined forces with Intel.

Here is an article from 1999 on Sun booting Solaris on Itanium:

https://www.zdnet.com/article/sun-boots-solaris-on-itanium-h...

Compaq stopped making Alpha servers and switched to Itanium:

https://en.wikipedia.org/wiki/DEC_Alpha

Itanium was a huge misdirection and distraction for everyone in the workstation business and basically put them in such a bad spot they failed to even compete with Intel x86 past 2000 or so in any real way.


The Itanium wasn't just designed as a workstation CPU; it was designed to be superior to existing high-end systems and then, over time and with volume, to be the eventual x86 replacement. The business case for Itanium didn't work as a workstation CPU.

>>Intel and HP showing up promising a clean sheet design in a decade doesn't seem like a time to roll up your market leading offering and go home.

They didn't roll up so much as have their existing market disappear. SGI, DEC and Compaq were dead. Sparc and Power gradually transitioned away from any workstation offering to servers.


Itanium was disappointing from day one. It never matched the performance of its RISC or CISC competitors (POWER, SPARC, x86).

HP replaced Alpha with Itanium but that was a costly mistake.


Alpha was DEC, HP had PA-RISC. I had a PA-RISC in my office and had to have facilities come and adjust air conditioning airflow because the computer put off so much heat.


Not all of them did. I have a 712 PA-RISC workstation and its CPU is passively cooled with a spring loaded tiny cooling block. They didn't even bother to use paste.


I had a PA-RISC workstation too. No heat problems, but it wasn't particularly fast. But the "Superdomes" were another story (refrigerator-sized computers in the very cold and loud server room...).

It was weird: I worked on those Superdome machines for years without seeing them, and only got to see them when they had to debug some of my serial port code (it turned out the cable was wired wrong...)

HPUX had some really interesting Real Time Extensions. I was on an exploratory group for switching to Linux, and scheduling control was going to be a problem (15 years ago...)


Yeah, we had a Superdome at work too. One of the earlier beige PA-RISC ones (the Itanium ones were black, IIRC).

But those were in a class of their own... :) If you scale things up that big they're going to belch heat no matter the architecture.

I don't think PA-RISC was inherently worse in performance per watt than other competing processors at the time.



They may have meant that HP acquired Compaq, which had acquired DEC. Itanium was a joint project between HP and Intel, IIRC.


HP had PA-RISC and Alpha.

HP bought Compaq in 2001; Compaq had bought DEC earlier.


I follow your line of reasoning, but I think AMD had a much bigger role in killing off the various RISC architectures.

Itanium felt much more like a sideline activity for those that didn't believe x86_64/Linux was going to be as big as it was. They viewed it as a sort of "toy infrastructure" until it was too late.


> They didn't have an answer and I could see the look on John's face and at that point realized they might not understand what they are getting into.

Ayup. The DEC guys cheered when Intel announced that Itanium would be VLIW. They knew that Intel was about to sink a huge amount of engineering effort into a dead end that would never work.

Of course, nobody realized that all the business management chains were complete cowardly dipshits and that Intel spending a Gigabuck was sufficient to terrify them into completely abandoning the high-end market to Intel who milked it for years.

So ... Itanic was a horrible engineering failure but a great business success.


Itanium was a huge business success for anything AMD64. I think IA64 was too early and targeted the wrong market to be a success of its own. Also, being a single-source ISA is very unattractive. If the ISA had been open like RISC-V, and the target market had been scale-out servers and small computers rather than the nonexistent scale-up HPC market and almost nonexistent scale-up database market, things could have been very different. On the positive side, maybe some open source ISA can emerge using an EPIC-type architecture now that the proprietary version is dead.


Though in hindsight a 64-bit version of x86 would’ve killed them all anyways. Had Intel invested money into that instead, they’d still win easily.


Might be a case of sunk cost fallacy. The article mentions that while vendors were asking for 64-bit, Intel didn't want to cannibalize Itanium, but was forced to anyway when AMD came out with their 64-bit, Pentium-compatible CPU.


Little anecdote:

The Itanium led to one of the more remarkable episodes in my career. Around 2007 we were running a heavy workload on MS SQL Server in a fast-growing business.

We faced a lot of outages due to DB overload. Instead of trying to investigate and understand the issue better and optimize the software, some external consultants were brought in and recommended upgrading the hardware to an Itanium-based monster. It was a massive piece of hardware with a price tag close to 7 figures.

The thing went live, performance decreased, and issues increased. After a couple of weeks of trying to run on the Itanium we switched back to the old setup and then focused on software improvements.

Long story short - after about 8 weeks of dedicated troubleshooting and improvements the whole app became stable and was capable of handling double the workload on the same hardware, without outages, for the next 12 months.

The Itanium took up a lot of space in the server room before it was dismantled and used as a paperweight. A lawsuit involving multiple parties (supplier, consultant, business) eventually got settled out of court in 2015 (?), long after I left there.

Farewell Itanium :)


It’s funny, I remember many such projects back when my title was “systems engineer”. I swear the appeal was always that the company could spend $$ and get a working solution. It after all makes sense that a growing business would need more hardware. But the problem usually was a few terrible queries.

I swear the big benefit of cloud deployments is the teams ability to say “we tried throwing $$ at the problem, if we don’t want to spend $$$ we’ll need to do some work”. And have this convo play out over a day rather than months.


> I swear the big benefit of cloud deployments is the teams ability to say “we tried throwing $$ at the problem, if we don’t want to spend $$$ we’ll need to do some work”. And have this convo play out over a day rather than months.

My current job is on the tail end of hypergrowth and we are just starting to get our arms around the years of hacks and inefficiencies that made it possible to succeed. We've had a dozen conversations where we've decided

* to throw $10^2/day at a problem for two weeks so the engineers are free to deliver the features required to land $10^5/year in ARR, then work on perf

* to analyze the system, identify the one or two features that cost the most, and tackle those while leaving the rest alone

* and yes, to translate inefficiencies to real dollars and use that to force prioritization (we do that a lot ;-))

A team that understands cloud computing and can do some cost forecasting makes some amazing things possible.


> some external consultants were brought in

Is there a single example of that working anywhere ever?

I have never seen one.


I am that "external consultant".

I have recommended buying a huge piece of tin to run SQL Server on as a valid, cost-effective solution to a performance problem. Currently, EPYC CPUs are great value for money, programmers are expensive, and some workloads are too time-consuming to tune.

The customer implemented the change, and it worked.

I have also recommended a reduction in size of a too-big SQL Server, coupled with some judicious optimisation to reduce the load dramatically. Even expensive programmers can spend a few days of their precious time fixing glaring query issues.

The customer implemented this change also, and it also delivered the promised benefits.

I have the before-and-after metrics to prove that there was a huge benefit in both cases.

In both cases the issues were ongoing, had caused drastic outages, and the internal staff were not capable of resolving the issues on their own.

To be honest, 99% of my job is just to be the outsider that's not playing politics and not stuck in a narrow job description. I'm told to "fix it", so that's what I do. The internal staff have "roles and responsibilities", and they fight with other teams more than they cooperate. Some people actively hate each other. I come in as the neutral party and for a brief shining moment I can get everybody to row in the same direction.


This: To be honest, 99% of my job is just to be the outsider that's not playing politics and not stuck in a narrow job description. I'm told to "fix it", so that's what I do. The internal staff have "roles and responsibilities", and they fight with other teams more than they cooperate. Some people actively hate each other. I come in as the neutral party and for a brief shining moment I can get everybody to row in the same direction.

That is the single reason driving external consultant hires in many enterprises.


My pops did a consulting gig in the 90s for Rockwell, and he described it as being the sheriff dressed in black.

Always thought that was funny, and probably pretty accurate.


There was that time a bunch of Silicon Valley engineers went and fixed the healthcare.gov (Obamacare) site for the US govt after the original contractors did a terrible job. But then the first team were already contractors, so maybe the lesson there is, if your first external consultants don't work, just keep bringing in more?


The engineers who "fixed" the site all worked themselves nearly to death doing it. Not sure that's a real win.


I didn’t know much about this from the contractors’ perspective. This Atlantic article [1] seems like a decent overview for anyone else interested.

[1] https://web.archive.org/web/20210705021438if_/https://www.th...


Loads, but successful consulting engagements don't lead to headlines. Source: 25 years as a consultant with many successful customer projects under my belt and zero that landed my name in the press.


Totally agree with this. External consultants are easily blamed. But the problem usually lies elsewhere - someone makes a decision without understanding the full scope of the challenge.


Yes, it works when your best people get tired of middle management interference and quit to form a consultancy that upper management eventually hire in desperation.


Some of the best anecdotes on HN are of the genre "company brought me back as a consultant for a multiple of what I made as an employee when they realized their mistake."


Selection bias. The bad cases are the ones you hear about --- lawsuits and politicking (office or government) resulting in airing grievances to the press.


Yes, this. I've never seen a "crisis averted by timely application of consultants!" post on Medium, but I'm sure many such tales exist, as I'm one of those consultants.


Fun fact, Intel did this exact thing once before, in 1981, with the iAPX 432: https://en.wikipedia.org/wiki/Intel_iAPX_432

They had made the 8008 and 8080, but those were awkward and ungainly chips meant to power calculators, of all things. iAPX 432 was a clean sheet 32-bit design intended for high level languages and real computers, kind of like mini Lisp machines, complete with native machine support for garbage collection. But it was taking a while to get out the door, and performance wasn't so great, so they hacked together a quick upgrade to the 8080... the 8086, whose 8-bit-bus variant, the 8088, was used in the original IBM PC.

History takes over from there.


> They had made the 8008 and 8080, but those were awkward and ungainly chips meant to power calculators, of all things.

The 8008 was the (slower) single-chip implementation of the Datapoint 2200 terminal's processor. The DP 2200 was a serial terminal capable of running code on its own as a standalone machine, conceived in 1969 (originally as a drop-in replacement for the IBM 129 key punch), announced in 1970, and eventually introduced in 1971. Datapoint had commissioned the chip, but rejected the result, as it was slower than the discrete logic design, with rights for the chip-implementation remaining at Intel. This remarkable terminal has been with us ever since.

[0] https://en.wikipedia.org/wiki/Datapoint_2200


There were two competing schools of thought in the late '70s and early '80s.

One said that computer hardware should have hardware support for high level languages. That school led to the VAX, iAPX432, and 80X86. The other school said that computer hardware should be simple so that it could go faster, and that the compiler should be smart enough to map high level languages to simple hardware.

Intel was firmly in the first camp. It flirted with the second repeatedly: see the late '80s i860/i960. Itanium was the biggest bet placed: it was a RISC-like architecture with the added complexity of multiple dispatch encoded in its Very Long Instruction Word architecture. The compiler could rearrange instruction order, and (to some extent) instruction contents, to keep the hardware busy.

Itanium seemed like a reasonable bet, and Intel (and HP) had the clout to convince others. It was a factor in the decline of the Alpha and MIPS server market, and even Sun hedged its bets by porting Solaris.


>> That school led to the VAX, iAPX432, and 80X86.

The MC68000 ISA was meant to run C code.


I love the function call timing story:

> The iAPX 432 failure could best be summed up by a meeting of Intel marketroids and Tandem engineers who wanted to use the chip in their next generation machine, after the slides, the senior engineer asked: “How long does it take to execute a procedure call?”

> The presenter looked it up. “Two hundred and fifty microseconds.”

> Tom immediately walked out, followed by the majority of the Tandem software department. The presenter was poleaxed. “What did I say?”

Found in the comments to https://www.youtube.com/watch?v=FvmTSpJU-Xc , but I saw it somewhere else, too, I just don't recall where. Does anyone know?


Interestingly, it appears based on https://en.wikipedia.org/wiki/Intel_iAPX_432#The_project's_f... that the procedure call instruction was a heavyweight thing designed to maximise features, and that a plain branch could be much quicker - but their compilers didn't handle that out of the box.


And they did it again with the i960 and i860. I'm told they were quite nice processors in their day (never having been close to one myself, that was before my time), but for whatever reasons, they did not go mainstream.


The i960 flopped in part because it was tied [0] to the I2O ('Intelligent I/O') project. (The 2 was always rendered as a subscript). I2O pushed a split-driver model in which the OS driver ('top half') did not talk directly to its hardware ('bottom half'), but queued I2O messages to the bottom half, which sat behind the i960 which did the proxy work.

IHVs had little interest in I2O as it would reduce Intel's costs for swapping out vendors and there was no demonstrable performance improvement. The latter was at least in part because the I2O infrastructure was immature in comparison to the IHV drivers. It eventually formed the model for I/O over Infiniband (where it did make sense).

[0] It's not clear if I2O was the original application for the 960 or it was a pivot for an otherwise homeless processor.


During the eighties there was a joint research project of Intel with Siemens, for a new processor architecture named BiiN.

For some reason, the BiiN project was terminated in 1988 and Siemens was not interested any more in it.

On the other hand, Intel decided not to scrap the results of that project, and they introduced the 80960 series based on the architecture formerly known as BiiN.

The commercial name 80960 was derived from their previous 8096 series of 16-bit microcontrollers, so the 80960 was initially presented as a higher-performance 32-bit replacement for the 8096 series, which was used in various embedded computers.

One interesting feature of BiiN was that it was the first monolithic CPU with an atomic fetch-and-add instruction (first used in 1981 in the NYU Ultracomputer project).

The 80960 inherited the atomic fetch-and-add from BiiN and then Intel added it to 80486, under the XADD mnemonic, together with the atomic compare-and-swap taken from IBM 370 and Motorola 68020 (CMPXCHG).

The applications for which 80960 was best known, like I2O and laser printers, happened significantly later than its initial introduction.


BiiN would eventually be explained as "Billions invested in Nothing".


> For some reason, the BiiN project was terminated in 1988 and Siemens was not interested any more in it.

You mean it was biinned.


I2O was definitely a pivot. The original i960Kx, i960Cx and i960Jx had nothing to do with I2O. I2O was introduced later with the i960Rx series.

I developed an i960RP design back in the late '90s for an MPEG-2 encoder PCI card. The encoder chips were also PCI, so the PCI bridge on the i960RP made for a nice design where all the PCI stuff was handled in one fell swoop.


The i960 was launched in 1984

https://en.wikipedia.org/wiki/Intel_i960

The I2O project wasn't until the mid 1990's

https://en.wikipedia.org/wiki/I2O


No, the commercial launch was in 1988.

In 1984 it was just the start of a research project named BiiN, of Intel and Siemens, which lasted 4 years, until 1988, when Siemens dropped out.

The series name, 80960, did not appear before the launch from 1988.


The i860 was lovely! I wrote some software for an i860-based hypercube back in the day, and it was the fastest thing on earth.

The reason it never went mainstream is easy enough to explain: expensive chips and (for the hypercube) a very different programming model.


The i960 was the main CPU in the Sega Model 2 arcade board, used in games like Daytona USA and Virtua Cop. At the time (early 1990s) the texture mapped 3D graphics in those games were pretty impressive.

https://segaretro.org/Sega_Model_2


I remember i860 accelerator cards being used with Fortran compilers. Not really sure if this went anywhere; it's all a bit hazy exactly what happened. But the i860 was VLIW, so the compiler tech required for Itanium was already being explored for the i860.

I also believe that MMX was basically lifted from the i860, although again, I'm not sure.

It's funny, I tend to think of Intel as the boring chip company, due to the success and legacy of ia-32 that we like to moan about, but they have actually made a fair few plays to try to move things on from there, fighting against a market which just wanted ia-32-compatible but faster processors.


I don't know about the i860, but accelerator cards for math-heavy workloads are still a thing, of course. I remember seeing a brochure for an add-on card with one or two Cell CPUs, Intel has (had?) their Xeon Phi, and of course GPUs are very popular for things other than graphics.

Intel, for better or worse, are a victim of their own success. On the plus side, their success gave them lots of money to throw at the problem of making faster x86 CPUs. It seems, though, that Intel is gradually running out of luck, with AMD and now Apple introducing strong competitors, and Intel's advantage in fabrication eroding. So CPU-/ISA-wise, things could get very interesting in the foreseeable future.


I've programmed both, though only shipped products that used the i960 (CA and KB). I believe the i960 tended to be used in things like printers but there weren't many design wins for the i860.

I managed to miss the magic that is the 88k thankfully. Who actually used it? Linotype? Tek workstations maybe?


I worked with the Intel i960CA parts and the i960MX parts, while working for Applied Microsystems Corporation (now defunct.)

Applied made high-end in-circuit emulators. My team worked on software for execution trace disassemblers. We had wide and deep memories and could record something like 16,384 bus cycles in emulator (trace) memory.

Once we had some bus cycles to analyze, we’d sort out what the processor had done when it ran (an execution trace) and show it. It was a great tool for answering questions like “How did I get here?” when your embedded code jumped off into the weeds.

The i960CA was relatively simple to work with. The i960MX was a lot more complicated, as I recall.


I believe Data General AViiON mainframes used m88k CPUs. One came through a used computer shop I worked at in the early 2000s.


Some NCD X terminals used M88k CPUs.


I’ve been trying to hunt down some X terminals recently. So far have only found some HP ones.


I used to have both a 15" all-in-one model, and a separate pizza-box style model with a 17" monitor. IIRC, both were 88k CPUs.


I think either the i860 or i960 was a CPU used in Adaptec PCI cards in the 90s.


I recall Wikipedia saying one of them (I tend to confuse the two) was popular in RAID controllers, so that sounds plausible.


While I'm musing about such matters: now, the TI C80 MVP, that was an interesting one to write software for.


IBM used the 8088, an 8-bit-bus version of the 8086?


Comment corrected.


>They had made the 8008 and 8080, but those were awkward and ungainly chips meant to power calculators,

I remember talking to Vic Poor when I used to work with him, and calculators didn't seem to figure into the whole deal. It's probably worth a quick read through Datapoint history.


I think the poster is thinking of the 4004, which was originally designed for a calculator. But that isn't really related to the 8008 onwards, which are based on a Datapoint computer terminal as you say.


People let their (understandable) hatred of Intel-the-company colour their technical judgement. Itanium was one of the more interesting architectures of its time, it fairly flew on expert-tuned assembly; I still believe we'll see a return to its ideas once the computing world finally moves on from C.

(Netburst is also unfairly maligned if you ask me; contrary to the article, enthusiasts have clocked those P4s up to 12GHz. As far as I know they're still, over a decade later, the fastest CPU for single-threaded sequential integer workloads that has ever been made; certainly the fastest x86-compatible processor for such. They're kind of the equal and opposite failure to the Itanium, ironically enough)


> People let their (understandable) hatred of Intel-the-company colour their technical judgement. Itanium was one of the more interesting architectures of its time, it fairly flew on expert-tuned assembly;

I know only a few people who maintained software for Itanium, but from their reports it was a nightmare to debug code on. To have a chance of seeing what was going on, you would have to use special debug builds that disabled all explicit parallelism. Debugging optimized code was almost impossible, and user-provided crash dumps were similarly useless. Your only hope was that the issue was reproducible in debug builds or on other architectures.

Needless to say, they hated it and were happy when ia64 was finally phased out.

> once the computing world finally moves on from C.

Yeah, it moves on from C... to JavaScript. Making compilers slow and complex doesn't mix well with JIT compilation.

One thing I have to give Itanium credit for is that due to EPIC it was totally safe from the speculative execution vulnerabilities like Spectre/Meltdown/etc. That was certainly a forward-looking aspect of it.


> you would have to use special debug builds that disabled all explicit parallelism

Oh god. Let me guess, when it crashes, you get a pointer to the word with the failed instruction in ... but no elaboration on which of the 3 instructions it was? Or is it worse than that and it fails to maintain the in-order illusion?


> Making compilers slow and complex doesn't mix well with JIT compilation.

Funny, I was just thinking the opposite: Compiler-driven parallelism loses against CPU-driven parallelism because the CPU has live profiling. With a JIT the compiler can have it too.

The debugging problem on the machine-code level becomes less of an issue when most people write higher-level code too.


> it fairly flew on expert-tuned assembly

There's your problem.

Given the bajillion programs out there already, how many companies wanted to dig into assembly instead of just waiting 18-24 months for Moore's Law to speed up their software?

It's all very well and nice to have nice hardware in theory, but if you can't compile existing code to be fairly fast, then in practice you just have some expensive sand (silicon) in the shape of a square.

> People let their (understandable) hatred of Intel-the-company colour their technical judgement.

So getting back to your first statement: no they didn't. Everyone was basically all-in on Itanium. All the Unix vendors (except Sun) dropped their own architectures and steered their customers toward Intel. Microsoft released software for it.

But it seems the market didn't like what they saw, and just kept on with x86—and then amd64 came out and gave 64-bits to everyone in a mostly compatible way.


> how many companies wanted to dig into assembly instead of just waiting 18-24 months for Moore's Law to speed up their software?

The people that bought Itanium-powered servers certainly weren't replacing them every 18-24 months. At the price they paid, you were looking at 5-8 years of computing before replacement. Or more.

My employer bought a pair of the final batch of Itanium servers. To replace 10-year old ones. This was an insurance purchase. The original plan was to shift all of that workload into the cloud, but that's neither going quickly enough nor is it saving any money. If you have a workload for which Itanium does well, it does it really well.


> The people that bought Itanium-powered servers certainly weren't replacing them every 18-24 months.

I was referring to the software vendors: why would they go through the effort of optimizing their code for this new architecture when they could simply wait a little while for the "old" one to get faster via Moore's Law?


> All the Unix vendors (except Sun)

cough IBM cough

They were never going to ditch power... Did they ever even have an Itanium product? I know they've had x86/x86_64, all sorts of power variants like Cell and god knows what.

I did briefly work on an Itanium system at IBM, but it was an HP box.



Oh interesting, they were going to roll it into xSeries.


I imagine then there will be a great resurgence of interest after Moore’s Law hits the atomic scale wall.


I can't find any hit for a 12GHz P4. I thought the record was around ~8GHz (and you can push modern processors into that ballpark).

I doubt that even an 8GHz P4 would be able to beat a lower-clocked, more modern design even on single-threaded integer workloads. The P4 had a lot of glass jaws (the non-constant shifter, load replays on misses, a very narrow decoder when running out of the trace cache).


I've heard about 8+ GHz Celerons (Netburst-based ones) and they were definitely on top a few years ago. I haven't kept track lately, though, and those records may have been beaten by now.


https://valid.x86.fr/records.html

I think that's still pretty much the bible for frequency records.


That is crazy fascinating. It seems Windows XP and Celeron and AMD FX chips with 2 to 4 gigs of RAM are where it's at.


> the computing world finally moves on from C.

The computing world has moved on from C, mostly. To Javascript. The main impact of that seems to be a couple of numeric conversion instructions on ARM?

(OK, not entirely fair: the computationally heavy stuff has moved away to GPUs. But if you ask, for every button press a human makes on a computer, where the dominant execution time is, you might have some interesting answers, and for a lot of them it is going to be JITted Javascript)

I think it's fairly clear that for general purposes VLIW is not what either the programmer or the compiler writer wants to deal with. In-order execution is such a convenient mental model that people are willing to accept any tricks that keep it working.


The numeric conversion instructions you're thinking of are branded "JavaScript", but actually exist to emulate Intel x86 floating-point behavior. It just so happens that the ECMA specs call for said behavior because existing code relied upon it.


Nonsense. A ton of code is still written in C/C++. What do you think runs all of that Javascript?

The C world isn't moving to Javascript, it's moving to Rust, Zig and Go.


Kinda veering a bit off topic, but I’ve always seen Go marketed as a systems language alongside C, Rust, etc., but in practice I’ve only really ever seen it used to develop high-level web applications.


Docker and k8s, flannel, etc. are all written in Go and are something I'd consider "systems programming" - I mean, they have to do some pretty complex coordination w/ the kernel to do their work.


My understanding is that the problem with VLIW is that it exposes too much. Anything you expose via the instruction set becomes fixed permanently, so if you have say 4X wide VLIW there is no way to ever make it wider or change how things like dispatching work. The only way to do that would be to start pipelining and scheduling VLIW chunks, in which case you are back where you started.

Instruction level parallelism achieved by decoding a single stream and then sorting and scheduling requires more silicon and a bit more power, but low power high performance superscalar chips like modern ARM64 CPUs have shown that the cost is not that high and that you can go very wide. The M1's Firestorm cores are 8X wide from what I read, which is better than Itanium.

Since the whole superscalar architecture is hidden, it can evolve freely.

That being said I don't think VLIW was a horrible idea at the time, and it might still have a chance if it were revived in specialized high performance or ultra-low-power use cases. The mistake was betting the farm on it.

The other big thing we learned since then is that the important part of RISC wasn't reduced instruction set size, but uniform instruction size and encoding. That allows you to decode arbitrarily wide chunks of instructions in parallel without crazy brute force hacks like those required to do parallel decoding of the variable length X86 instruction stream. The problem with CISC isn't how many instructions there are, but the complexity of the encoding and the presence of a lot of confounding requirements that arise from instructions that do very different things at once (e.g. complex math with memory operands). You want the instruction stream to be trivial to decode and easy to schedule.

In the end the best approach seems to be a simple general purpose instruction set augmented with special instructions for common special cases that can be greatly accelerated this way (e.g. vector operations, floating point, cryptography, etc.), and all with a logical fixed length encoding that is easy to decode in parallel. Load-store architecture and a relaxed memory ordering model seem to also be performance wins since separation of concerns simplifies the scheduler. The future (for conventional CPUs) looks a lot like ARM64 and RISC-V.
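To make the fixed-vs-variable-length point concrete, here's a rough C sketch (purely illustrative; no real front end is this simple, and the helper names are made up): with a fixed-width encoding every instruction boundary is known up front, while with a variable-length encoding each boundary depends on having length-decoded the previous instruction.

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* Fixed 4-byte encoding (ARM64/RISC-V style): instruction i always starts
     * at byte 4*i, so a wide front end can feed many decoders in parallel. */
    uint32_t fetch_fixed(const uint8_t *stream, size_t i)
    {
        uint32_t insn;
        memcpy(&insn, stream + 4 * i, sizeof insn);  /* boundary known up front */
        return insn;
    }

    /* Variable-length encoding (x86 style): instruction i+1 starts wherever
     * instruction i ends, so finding boundaries is a serial dependency chain
     * (or brute-force speculative decode at every byte offset). */
    size_t find_boundaries(const uint8_t *stream, size_t n_insns,
                           size_t (*length_of)(const uint8_t *), size_t *starts)
    {
        size_t offset = 0;
        for (size_t i = 0; i < n_insns; i++) {
            starts[i] = offset;
            offset += length_of(stream + offset);  /* depends on previous insn */
        }
        return offset;
    }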


My professor in college back in 1997 was doing research on maintaining binary compatibility between different generations of the same VLIW architecture - if you had a different number of execution units and stuff like that. He had a few ideas, and one of them was preprocessing the compiled binaries and rewriting them. Another was having flags in the architecture identifying what generation of chip a binary was built for, so the OS could do on-the-fly changes.


In a world where all software is JITted (Java gang rise up), the fixedness of a VLIW ISA doesn't matter, because you always compile specifically for the target machine anyway. What you describe sounds like applying that strength of JITting to AOT-compiled code.

Vaguely related ideas from the distant past are ANDF:

https://en.wikipedia.org/wiki/Architecture_Neutral_Distribut...

And TaOS's VP Code:

https://sites.google.com/site/dicknewsite/home/computing/byt...


This is what IBM did with IBM i / AS/400 / System/38 and https://en.wikipedia.org/wiki/IBM_i#TIMI.

IBM i is on a POWER CPU today, but can still run System/38 binaries from the 70s, thanks to install-time compilation to whatever CPU the system is running this decade.


What types of languages do you see would enable more efficient VLIW compilers?

From my limited perspective, I find C one of the easier languages to write optimizing compilers for, and would therefore expect optimizing compilers to be the most efficient there. 40 years of collective experience of optimizing for C-like languages also helps of course.

Or is it the lack of explicit parallelism in the language that is limiting? Somehow I suspect the limited uptake of better-suited languages to be a sign that they aren't very helpful most of the time, and that most of the parallel work people do is more like serving a lot of individually sequential transactions per second, which is something C and Unix are pretty good at.


C is hard to optimise. Graydon Hoare gave a nice introductory compilers talk that went into some of the reasons why

http://venge.net/graydon/talks/CompilerTalk-2019.pdf


One well known obstacle to optimizing C is the difficulty of alias analysis. It's easier to do that for languages that don't have C's unrestricted pointers.


Doesn't the restrict keyword solve this?


The paper Why Programmer-specified Aliasing is a Bad Idea[0] evaluated the effectiveness of restrict in 2004. They found that adding optimal restrict annotations provided only a minor performance improvement, on average less than 1% across the SPEC2000 benchmarks.

[0] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94....


How much of this is because nobody puts effort into these optimizations?

The Rust compiler has repeatedly found critical bugs in LLVM's restrict / noalias support, bugs that would impact C / C++ as well if any real-world C / C++ programs actually used it.

If compilers produce straight-up broken code in these situations, I can only imagine they're not putting a lot of effort into these optimization strategies.


> How much of this is because nobody puts effort into these optimizations?

restrict is rare in C and C++ but common in Fortran; array parameters in Fortran aren't allowed to alias. Intel and IBM both have great Fortran compilers so I would expect their C and C++ compilers to have good support for restrict.


I don't think anyone has ever used the restrict keyword, or understood what it does.


What? I use it whenever I have a function that takes two or more pointers if I know they can't refer to overlapping memory. And it's part of the signature for memcpy since C99.
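For anyone who hasn't run into it, a minimal sketch of the kind of thing restrict buys the optimizer (the function is just made up for illustration):

    #include <stddef.h>

    /* Without restrict the compiler must assume dst and src could overlap,
     * so vectorizing this loop needs a runtime overlap check (or it stays
     * scalar).  With restrict the caller promises no overlap, and the loop
     * can be vectorized or software-pipelined unconditionally. */
    void scale(float *restrict dst, const float *restrict src, float k, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = k * src[i];
    }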


When he said "anyone" he meant "almost anyone". You're an outlier if you use `restrict` regularly.

I checked on grep.app; there are 10k results for `restrict` in C code, compared to 700k for `struct` (I know they're not directly comparable, but that gives an idea).


That seems like about the right proportion to me - struct solves a much more common problem than restrict does. And to be fair, it is a lesser known feature. But user-the-name is implying that restrict is somehow difficult to use or understand, which I don't agree with at all.


That probably explains it. In many shops, C may as well have stopped at C89.

I've been working with C since the early 90's. I've never seen any code use restrict.


Large chunks of libc use it as well, e.g. the printf family of functions.


derp.


So Rust comes to mind, right? Anything else?


FORTRAN or any language with lots of arrays and matrices.


I suspect if you want a HW architecture for running array operations you'll end up with something like a vector machine (e.g. ARM SVE(2) ) or a GPU rather than a VLIW CPU?


VLIW is basically a more flexible kind of vector machine.


And a traditional scalar architecture is more flexible still. The trick is to pick the correct set of tradeoffs for the targeted applications. I claim that for most array style workloads vector/GPU architectures are flexible enough, and offer better perf/watt and perf/chip area.


So, APL?


Absolutely yes that would make sense (or more likely the modern "derivatives" like J & K)


C is very "pointer heavy", and much code involves chasing linked lists and the like. This tends not to suit VLIW well.

Modern languages like Rust tend to produce more instructions for the same high-level logic, but those instructions are easier to schedule for superscalar CPUs. It typically ends up as a bit of a wash on CISC processors, but could be better than C/C++ on VLIW.

I guess we'll never know now...


C is pointer heavy if you write pointer heavy code


> What types of languages do you see would enable more efficient VLIW compilers?

I was thinking of languages where dependencies are more explicit and the idea of a global evaluation order isn't there in the first place. I'd be very interested to see a reduceron-style effort that implemented graph-reduction evaluation on a VLIW processor.

> Somehow I suspect the limited uptake of better suited languages to be a sign that they aren't very helpful most of the time, and most of the parallel operations people do is more like serving a lot of individually sequential transactions per second, which is something C and unix is pretty good at.

Heh, that was the idea that those barrel-processor SPARCs were designed around. But they weren't so successful in the market either in the end.


The TMS320C6678 (C66x architecture) DSPs still use VLIW and work pretty well. Like most DSPs, they're typically programmed using C, for which TI supplies optimised libraries for processor-intensive operations. IIRC, the compiler itself was fairly standard.


I had a netburst P4 for a while.

MATLAB simulations were comically faster on my lower clocked Pentium M laptop.


I malign Netburst because I owned one (actually still own it) and it was slower than the previous generation of processors (under certain loads) despite costing more.


> I still believe we'll see a return to its ideas once the computing world finally moves on from C.

If only for the reason that Itanium was one of the few architectures not affected by Spectre-family attacks.


The later ones are out of order; they are very likely affected, just no one cares enough to prove it.


Going "faster" by doubling pipeline stages doesn't gain anything but fat bonuses for marketing.


I had to maintain an IA-64 Linux system at a previous job, and it was such an odd duck. The OS did a decent job of abstracting away most of the weirdness, but at the end of the day it was just a very slow server. The compiler breakthroughs that Intel was counting on to make it competitive never happened, and since its unusual architecture made it bad at running code not specifically tuned to it, the end result was that nothing ran great on it. I’m sure that HP had some highly optimized code that ran like greased lightning, but that never worked its way out to the general public.

I admit that I’m glad Itanium finally died. It killed a lot of other interesting architectures and gave nothing in return.


For some periods (including during some of the Opteron era) Itanium 2 actually did okay.

https://www.realworldtech.com/forum/?threadid=27345&curposti...

It had very strong floating point performance and on spec int it was holding pace with Opterons, Athlons and P4s clocked a lot higher. Apparently a lot of the int performance was due to large fast caches (I think it had a bigger die size compared with the others) -- have a look at the 1.5MB version vs the 3MB version. But even there it wasn't doing so bad against the Sun and IBM processors which were out of order I think. So yes you're right some things did run pretty fast.

Unfortunately for Itanium, compiler techniques to make up for in-order execution had just about run out of steam at that point, while OOOE continued to scale up and improve steadily. Which is pretty much the opposite of what HP and Intel had predicted in the 90s, and that prediction was how they had justified Itanium's approach.


> I’m sure that HP had some highly optimized code that ran like greased lightning,

unwarranted optimism. I recall HP Pentium Pro servers with loads of "built in" stuff and dual PCI buses: all the builtins were hung off the 2nd, chained bus; on one IRQ. The memory was severely limited, too, iirc; NUMA'd through one of 4 CPUs.

HP design seems to be all about locking performance away; it might be in that box in theory, but all their ingenuity was spent ensuring it stayed there.


> It killed a lot of other interesting architectures and gave nothing in return.

What sort of architectures (if any) got killed as a direct result of developing IA-64?

That aside, though odd and impractical, I found Itanium to be at least a technically interesting perspective (both in terms of architecture, ie VLIW, and in terms of the amount of technical work it inspired at HP and Intel, such as the Itanium C++ abi [1])

[1] https://itanium-cxx-abi.github.io/cxx-abi/abi-eh.html


>What sort of architectures (if any) got killed as a direct result of developing IA-64?

Alpha?


And HP-PA and, later, MIPS (as a mainstream computer). MIPS survives in some routers.


MIPS is very, very close to dead now.

To be fair though, I think its death is more attributable to ARM than Itanium. And RISC-V is killing off anything that remained.


MIPS the brand was bought for cheap from a bankruptcy, and resurrected by its new owner… to be an ARM partner building ARM chips.


Arguably it was the nail in the coffin for Alpha.


Itanium may have been the nail in the coffin, but by the time the hammer fell on that nail, Alpha was already basically dead due to bad business structure and incentives that were inherited from DEC. They had many of the same problems that Intel has been having recently (low yields, failure to keep up with performance increases) without Intel's long-standing business agreements or inertia to hold up the product line through a difficult period.


Alpha, HP-PA.


> In 2003, AMD launched their 64-bit, but Pentium-compatible, Opteron CPU. Everyone stopped buying Intel CPUs for a while. [...] almost everyone immediately embraced AMD's instruction set and no one but HP wanted anything to do with Itanium.

Huh, I never really connected the dots before, but TIL that this is why 64-bit images and package repositories of various Linux distributions were referred to as `amd64`...


Yup. It was a classic mishandled transition; they messed up the emulation https://www.zdnet.com/article/intel-scraps-once-crucial-itan... , and getting the benefits of the transition required a big ecosystem shift. You more-or-less had to use Intel's compilers.

Meanwhile AMD offered "x86, but faster and wider", which (once you booted a 64-bit operating system) could run either type of binary at native speed. Sometimes people really do just want a faster horse.


It also didn't help that the Itanium project was delayed by many years. The first one, "Merced", was supposed to come out in '95 or '96. Had it been on time, things might have looked quite different. And indeed, I think it was AMD64 that breathed new life into x86, killing both Itanium and most classical RISC architectures, because it was a nice upgrade path from existing PCs.


This leads to an interesting situation, where Intel owns x86 and licenses it to AMD, while AMD owns the amd64 extension and licenses it to Intel. Should one company revoke their license, so would the other (I assume, naively), and boom - no more amd64/x86_64 CPUs for you and me. Intel could probably create a new extension to x86, but that would take some time.

And IIRC, AMD's x86 license evaporates automatically if the company is acquired by someone else, which means the only company that could buy it without disastrous consequences is Intel, who would probably run into some anti-trust issues if they did.


What is it that ensures licensing is required? Patents? Because patents on x86 technology predating amd64 should be expired by now... and without patent protection, another company would be allowed to reverse-engineer it, no?


https://www.blopeur.com/2020/04/08/Intel-x86-patent-never-en...

Tl;dr: many new patents have been created over the years, for instance SSE extensions, and those haven't expired.


It was a great time. I remember buying a rackful of AMDs at 1 GHz when the Opteron first came out. They flew. They also consumed a ton of power and generated a ton of heat (I blew the circuit breaker for the whole floor of Evans Hall, in Berkeley, just turning them on).


> This annoyed them and they decided to make a new CPU that no one would want to use.

I thought this was sarcasm, but the tone continues throughout the article. Are people really this painfully misinformed? The ISA was brilliant. The execution, not so much. It was a far more scalable architecture, completely "clean" of x86 baggage (I'm talking about the ISA, not the thermal issues Madison and McKinley had with the floating-point units). It even ran Windows and had two compilers (one from HP and one from Intel), and it decoded x86!

Then again, people were also hot on Transmeta at the time, so perhaps the author is confusing things.

> better known for the good designs you killed

Like?

Intel sat on Yamhill (64b "star-T" for x86) for a year, and when I was brought on, the project had issues in both instruction fetch and execution-unit area. It was not a good design, and was not ready because it was cobbled together at the last minute. Quite literally. AMD beat Intel to both 64-bit and 1 GHz, and we all (architects) knew it.

Looking at it one way, Itanium was a product that was ahead of its time. In 1995-97 Google didn't even exist yet and Amazon was just getting started. Server farms with their own power plants were something reserved for investment banks as redundant backup, and US e-commerce was well under $1 billion/yr. I'm 100% certain that if Intel had brought the yields up on Itanium and taken an early loss to build market share, it would have been the dominant architecture in the integer cloud today.

But in reality they had to beat AMD, because AMD was going after their cash cow (commodity x86) with a vengeance, and winning. Couldn't fight two wars at once, and thus, Itanium withered.

So looking at it another way, one could argue Itanium was too late, because the internet scaled out fast in the late 1990s and Itanium was caught with its pants down, so x86 filled in the gap.

Either way, Arm is more likely to take away server market from x86 than IA64 going forward, but that's not a sure thing.

I could (30% chance) see VLIW making a comeback if capacity weren't such a problem: it is far more suited to e-commerce cloud instruction mixes, and with hypervisors and Docker-like virtualization so much faster than full VMs, it could happen.


> I thought this was sarcasm, but the tone continues throughout the article.

It was sarcasm, but the tone continued because I think Itanium was an enormous waste of time and other resources.

> The ISA was brilliant.

I completely disagree. The ISA depended on enormous leaps in compiler technology in order to make it useful. However pretty it may have been, it couldn't be implemented in a performant way. I'd take ugly-but-practical over pretty-but-impossible any day of the week, especially when I get to use languages that abstract the ugliness away from me.


I think that statement suffers from hindsight bias: x86 is great now because of enormous leaps in compiler technology to make it work. Yes, compilers do need to evolve with CPUs; they don't just grow on trees!

Put another way: given a choice between stuffing an electric motor into a Ford Fiesta and buying a Tesla, Itanium could have been the Tesla in the hands of another company, but the market went after the Fiesta retrofit because it was here-and-now, not coming-soon.

EDIT: changed from "you" to "that statement"; added analogy.


I don't believe that's historically accurate at all. x86 compiler performance has been good enough for a long, long time. Yes, newer versions are clearly a lot better than ones from the mid 90s, but the performance improvements would be described in terms of percentages, not orders of magnitude.

IA-64 required a complete rethinking of compiler design to solve NP-hard problems on a large scale just to get passable performance. Translating C to assembly that keeps the execution units busy is radically different on IA-64 than on pretty much anything else.
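
To make that concrete, here is a rough sketch (plain C++, made-up function names, not actual Itanium compiler output): an out-of-order x86 core extracts the parallelism in the first loop at runtime, while an EPIC compiler had to restructure it statically into something like the second form (on top of bundling, predication, and speculative loads) before the hardware could keep its units busy.

    // Sketch only: illustrates the scheduling burden, not real IA-64 codegen.

    // Naive reduction: every iteration depends on the previous 'sum', so an
    // in-order EPIC core stalls on the multiply-add latency. An out-of-order
    // x86 core hides much of that at runtime without compiler help.
    double dot_naive(const double *a, const double *b, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; ++i)
            sum += a[i] * b[i];   // loop-carried dependence
        return sum;
    }

    // What an IA-64 compiler had to do statically: unroll and keep several
    // independent dependence chains in flight. (This reassociates the
    // floating-point sum, which is only legal under relaxed FP rules.)
    double dot_unrolled(const double *a, const double *b, int n) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        int i = 0;
        for (; i + 4 <= n; i += 4) {
            s0 += a[i]     * b[i];
            s1 += a[i + 1] * b[i + 1];
            s2 += a[i + 2] * b[i + 2];
            s3 += a[i + 3] * b[i + 3];
        }
        for (; i < n; ++i)        // remainder loop
            s0 += a[i] * b[i];
        return (s0 + s1) + (s2 + s3);
    }

Getting that restructuring right for every loop shape and every latency, at compile time, is the scale of the scheduling problem I mean.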


> x86 compiler performance has been good enough for a long, long time.

> NP-hard problems on a large scale just to get passable performance.

First, it is hard to refute a claim, or find meaning in it, when you start using terms like "passable" and "good enough".

Second, I think this overstates the need for optimal scheduling because Itanium SPEC performance was "passable" out of the gate.


I know that I don't have sufficient engineering background to debate you, but still I see ARM's Johnny-come-lately 64-bit implementation that has gone farther up and down than would ever have been possible for Itanium (Fujitsu with the top supercomputer, and any 64-bit phone illustrate the point).

So I am drawn to Linus Torvalds' commentary on IA64:

"IA64 made all the mistakes anybody else did, and threw out all the good parts of the x86 because people thought those parts were ugly. They aren't ugly, they're the "charming oddity" that makes it do well. Look at them the right way and you realize that a lot of the grottyness is exactly _why_ the x86 works so well (yeah, and the fact that they are everywhere ;)."

https://yarchive.net/comp/linux/x86.html


I wish I knew which parts Linus was talking about. Segmented addressing? Can't be that. Asymmetric registers? Probably not that. Virtual 8086 mode? Meh. Variable instruction length? God I hope not. Begs the question: what oddities?

I think this is the most astute statement he makes: "yeah, and the fact that they are everywhere."

> ARM's Johnny-come-lately 64-bit implementation that has gone farther up and down

What do you mean? Arm is primarily embedded IP, where 64-bit has only recently come into vogue (unless you mean A5/A7? I'm talking Cortex-M). Arm's competitors are really Renesas, Synopsys, TI, and a number of Chinese chips, and 64-bit isn't a priority in most of that space. Not defending anyone, just not sure what you mean.


Did you read the post you are replying to?

Linus says to ignore the design mistakes like segmentation.

Then he says:

"the baroque instruction encoding on the x86 is actually a _good_ thing: it's a rather dense encoding, which means that you win on icache."


He did say that, and he's wrong. x86 isn't particularly dense, and RISC-V compressed is denser. Meanwhile the insane encoding burns real power, takes up real area, and is a real limiter for decoding. Dear Linus has an irrational love for x86.


He wrote those comments in 2003. The instruction decode logic takes up less space with each new chip and process shrink. These days I don't think it really matters much.


It does. Source: Intel engineering, privately off the record.

How wide is the widest Intel CPU? How wide is M1? Case closed.


Oh, I don't think the link was originally there, or I missed it. Yes, of course: that claim is like the first shot across the bow in every CISC vs. RISC debate.



Probably not: this is a boutique machine defined with an esoteric architecture, not a general-purpose integer computer for the cloud, which is what I was talking about in my grandparent comment.


Perhaps the key point is the era of design.

The P4 was built with an enormous pipeline (~20 stages?) in the hopes of reaching a 10 GHz goal that was never going to happen.

The Itanium was designed a decade before, programmed for parallelism that was also never going to happen.

The M1 is not an esoteric machine, and if Apple had bundled an Itanium of equal computational ability, all of those NOPs would burn a hole through the casing.

ARM's 64-bit implementation is obviously something special; esoteric or not, it is more powerful than anything produced by Intel by several measures.


Did HP really start that VLIW design _TEN_ years before the P4? That would be 1985, and I find that unlikely. At Intel, IA64 and P4 were developed at about the same time, with P4 taping out a few years later; the former in Santa Clara, the latter in Oregon.

> The M1 is not an esoteric machine, and if Apple had bundled an Itanium of equal computational ability, all of those NOPs would burn a hole through the casing.

You're right, it's not. And I never said it was. But a 6D mesh torus sure is.


Linus is a big fan of rep movs.


Given that we have magic compilers that convert mostly sequential CUDA into highly parallel SIMD assembly code, I don't think that the Itanium concept was too far off.

But all of the details had gross amounts of complexity. More modern VLIW architectures are more obvious about their parallelism... without any of the weird 'bundles' that made the Itanium terribly complex.

I still think that a 'simple decoder' has hope, and NVidia's Volta and Turing are my case in point. At the assembly level, the NVidia SASS compilers generate the read/write dependencies. NVidia has proven that the compiler can indeed specify the dependencies at compile time, though it requires a huge amount of software bulk to do so (PTX pseudo-assembly that compiles into SASS later).

The issue is that the Itanium was poorly designed and impractical. But the overall concepts still seem possible (indeed, more possible given today's technology). I'd be interested in a modern Itanium, but with the work split differently: the compiler determines the dependencies, while the core figures out the precise scheduling.


> In 2003, AMD launched their 64-bit, but Pentium-compatible, Opteron CPU. Everyone stopped buying Intel CPUs for a while. Within a few years Intel made their own 64-bit, but AMD-compatible, CPUs to avoid entirely losing the desktop and small server market. They were right earlier: almost everyone immediately embraced AMD's instruction set and no one but HP wanted anything to do with Itanium.

This is the only reason why Itanium failed.

Had AMD not been allowed to produce Intel-compatible CPUs, everyone would eventually have been forced to eat Itanium regardless of how it tasted.

One doesn't get to be picky when the food supply dries up.


History may have unrolled quite differently at that point. If the only choice was to shift architecture, then we may have seen a more diverse server CPU estate being maintained. At that point Sun was still relevant, POWER systems were still being bought and deployed in larger proportions, etc.


None of them would adopt Windows laptops running on POWER, just as one possible scenario.


True, but Apple was running its PowerBooks on PowerPC back then, I think. I know Windows was even more dominant then than it is now, so perhaps large numbers of PowerBooks were never on the cards.

But in the server space the story might have been different.

I take it Intel was planning a "Mobile Itanium" at some point?


In those days Apple's market share was still that of a company struggling to get out of insolvency, and they never had any significant market share outside North America during the last century.

Yes, on the server it would have been different; however, Linux distributions might still have killed the other UNIX vendors, and their CPUs along with them, as actually happened.


Not to mention that AMD was already in the business when IBM mandated a second/backup supply source for Intel parts, and that's how AMD began producing x86 chips like Intel's.


Everybody here seems to be focusing on the negatives, but there are also a few positive legacies from the Itanium architecture, all of them on the software side: EFI, which became UEFI (like it or not, it's still better than the legacy BIOS), the GPT partition table format, and the C++ ABI now used for all architectures on Linux.


Sadly EFI killed OpenFirmware which could have given us open source boot code and portable drivers.


One very good thing came out of the Itanium project: the vendor-independent ABI for C++. IBM, HP, Sun, Intel, and GCC all signed on to rigorously specify the layout of C++ objects and the function call interface on the Itanium platform (GCC got funding to help with this) and the processor-independent parts of the standard were then used pretty much everywhere else (with adaptations for different instruction sets to handle the call interface).
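
As a concrete illustration of how far that standard reached: the processor-independent name-mangling rules from the Itanium ABI are what g++ and clang++ still follow on x86-64 and AArch64 today. A tiny sketch (the identifiers are made up; the mangled forms follow the ABI's grammar):

    // The comments show the Itanium-ABI mangled symbol names a conforming
    // compiler (g++, clang++) emits for these made-up functions.
    namespace demo {
        int foo(int x)           { return x; }            // _ZN4demo3fooEi
        int foo(int x, double y) { return x + (int)y; }   // _ZN4demo3fooEid
    }
    int bar(int x) { return demo::foo(x); }                // _Z3bari

That shared mangling and layout spec is a big part of why C++ libraries built with different compilers can link together on Linux at all.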


> In 1994, Intel and HP looked around and saw a wide variety of successful server CPU architectures like Alpha, MIPS, SPARC, and POWER. This annoyed them and they decided to make a new CPU that no one would want to use.

I spent some time "optimizing" code to run on Itanium for "MegaBank", who then dumped it. Then we swapped to AMD Opteron for a while before Intel caught back up.

I suppose it was successful in that it created work.


I'm pretty surprised to hear "MegaBank" ran AMD. Not that Opteron wasn't awesome and better than what Intel had at most things, at least until Woodcrest if not Nehalem. But because MegaBanks are usually pretty conservative. More than a few still run mainframes and proprietary unixes.

Unless you worked in their HFT group, that is.


We wrote quant code that had to run on all architectures... for the HFT team [1], unfortunately, including a long drawn out debacle supporting Solaris 5.7 for an absurdly long time. Other members of my team were looking at IBM architectures as well, mostly because IBM were throwing money at us to use their kit.

[1] High frequency, not necessarily low latency, for some definition of the word 'high'.


I didn't see anyone mention this, but this is completely wrong:

"This annoyed them and they decided to make a new CPU that no one would want to use."

Facts: 1. Itanium grew out of internal HP efforts to define their next generation architecture, presumably in response to their difficulty scaling the Precision Architecture.

2. Intel only got involved part way through.

3. Nobody ever "decides to make a CPU that no one will use".

Itanium, for all its issues, was used in great numbers, but nowhere near what it was supposed to be. The failure of Itanium came from a bet on handling cache misses with the compiler (like Transmeta, my alma mater) and from having a design by committee (lots of nice ideas in isolation, a disaster in aggregate).

There are so many technical problems with the ISA: too many features, too much performance-draining complexity. Something as critical as registers is buried behind (IIRC) three levels of indirection (rotating windows, modulo-scheduled registers, etc.). The big bet on predication didn't help code density or cycle time.

It's really a shame that Intel throughout history has botched ISA design after ISA design. That it can be done right is exemplified by Alpha, RISC-V, and ARM64. If the rumored acquisition of SiFive happens, then maybe they can finally get it.

EDIT: Typos


The grand plan of VLIW might not have worked out anyway, but I always wondered what would have happened if Intel had moved the Itanic to current process nodes, even without huge modifications. By the transistor count, it could probably be power-efficient enough to drive mobile phones.

Also, at the time of its peak, the Itanium did perform quite well; I wonder what it could deliver if it were grown to use all of today's production processes.

Maybe VLIW cannot be made to work, but I do think that Intel missed the boat by not providing cheap development systems to enthusiasts, like a PC-compatible motherboard with an Itanium for around $500. The Linux crowd would have picked it up gladly. Instead, it was only available as expensive hardware, which reduced the potential developer audience (both in the availability of the development resource itself and in the target market).

The post mentions that everyone picks up the new Apple Silicon processors because they are nice. They certainly are, but the big point is that a compatible ARM system can be picked up for as little as $35: the Raspberry Pi and its friends. And while Apple hardware isn't low-cost, it is incredibly cheap in comparison to the Itanium systems of that time.


Intel lost a lot by playing games rather than letting engineers build the best stuff they could and cash out checks on it.

Now they cry for help to the same engineers they turned away in the past.


>> Intel lost a lot by playing games rather than letting engineers build the best stuff they could and cash out checks on it.

Which is strange, since IIRC Intel engineers designed the PCI bus to replace ISA in spite of management saying that was for the PC makers to do. I would have thought they'd learn from that success, but apparently not.


I worked for Intel some time later, when they had stopped being so elitist and actually started overdoing it in the other direction -- building everything and the kitchen sink rather than focusing on their core competency.


I’ve seen it mentioned over the years that writing the necessary compilers was more than merely difficult. As I understood it, the dynamic nature of load latencies (hit L1 cache? L2? L3? Slow trip to DRAM?) means that some workloads just can’t be statically scheduled in an effective manner. Anything to that?


Essentially yes. In cases where memory latency is predictable, like DSP workloads, VLIW processors actually tend to work really well. Itanium had some facilities to allow speculative loads, but they had a lot of overhead in practice.
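
To illustrate the difference with a generic sketch (plain C++, made-up names, nothing Itanium-specific):

    // Predictable latency: fixed-stride, FIR-filter-style loop. Every load
    // address is known in advance, so a VLIW compiler can issue the loads
    // early enough to cover a known, fixed latency.
    float fir(const float *x, const float *h, int taps) {
        float acc = 0.0f;
        for (int i = 0; i < taps; ++i)
            acc += x[i] * h[i];
        return acc;
    }

    struct Node { Node *next; long value; };

    // Unpredictable latency: pointer chasing. The next address only exists
    // once the previous load completes, and whether that load hits L1 or
    // goes all the way to DRAM is invisible at compile time, so no single
    // static schedule is right for both cases.
    long sum_list(const Node *p) {
        long total = 0;
        for (; p != nullptr; p = p->next)
            total += p->value;
        return total;
    }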


Don't forget HP and Oracle have been in a lawsuit for a decade over Oracle dropping support for Itanium. Oracle is on the hook for $3 billion.

https://www.reuters.com/legal/transactional/oracle-loses-bid...


I note that Debian still has an unofficial ia64 port:

https://www.debian.org/ports/ia64/ https://wiki.debian.org/Ports/ia64


Minor historical footnote: HP paid Progeny (Ian Murdock's company he founded years after creating Debian) to help port Debian packages to Itanium. They gave us a first-generation computer, which was epically slow of course.

That contract helped keep Progeny alive after the venture capital funding dried up; the company's original plan was to allow customers to reboot their Windows desktop computers overnight to run as a distributed Linux supercomputer.


Gentoo also has instructions. Linux/ia64 works fine for the most part, it's not an architecture that causes much drama for userspace applications.

Servers are cheap on eBay, and the older Intel boards (also sold by Dell, Fujitsu, ...) can be upgraded to newer CPUs that are either less power-hungry or faster with more cores and sort-of HT. HP boxes are generally not upgradeable, and SGI/ia64 is a special case with lots of other custom hardware, as usual.

Annoyingly, many Linux / gcc developers want to remove ia64 support from their source trees because the architecture is no longer commercially relevant.

As a necrocomputing enthusiast I find it quite sad, but there's not much one can do about it.

If only this old junk were as popular as the various home computers...


Note that at the kernel level it's now marked as orphaned. It also seems to just get random regressions, because nobody actually tests it.

https://www.phoronix.com/scan.php?page=news_item&px=Linux-Or...


That's because not everyone is always running bleeding-edge kernels.

We have found and fixed multiple regressions in the kernel on ia64. As of Linux 5.12, the kernel runs fine on most ia64 machines; there is still one known regression that affects some of them.

Disclaimer: I'm Debian's maintainer for ia64 (and most other older architectures such as m68k).


Kind of curious why the article didn't mention HP's PA-RISC architecture. IIRC, HP switched away from PA-RISC to Itanium at the time. I guess because they saw that managing your own architecture comes at a huge cost? I very well remember how Itanium was heavily promoted by HP to existing PA-RISC customers as the natural upgrade path, and how they played down x86 as unreliable PC-derived stuff that no one should put in a datacenter ;)


HP concluded PA-RISC (and all RISC, really) had reached a plateau. Wikipedia explains it well:

In 1989, HP determined that the Reduced Instruction Set Computing (RISC) architectures were approaching the processing limit at one instruction per cycle. HP researchers investigated a new architecture, later named Explicitly Parallel Instruction Computing (EPIC), that allows the processor to execute multiple instructions in each clock cycle. EPIC implements a form of very long instruction word (VLIW) architecture, in which a single instruction word contains multiple instructions. With EPIC, the compiler determines in advance which instructions can be executed at the same time, so the microprocessor simply executes the instructions and does not need elaborate mechanisms to determine which instructions to execute in parallel.[5] The goal of this approach is twofold: to enable deeper inspection of the code at compile time to identify additional opportunities for parallel execution, and to simplify the processor design and reduce energy consumption by eliminating the need for runtime scheduling circuitry.

HP believed that it was no longer cost-effective for individual enterprise systems companies such as itself to develop proprietary microprocessors, so it partnered with Intel in 1994 to develop the IA-64 architecture, derived from EPIC. Intel was willing to undertake the very large development effort on IA-64 in the expectation that the resulting microprocessor would be used by the majority of enterprise systems manufacturers. HP and Intel initiated a large joint development effort with a goal of delivering the first product, Merced, in 1998.[5]

(https://en.wikipedia.org/wiki/Itanium)


The only place I ever worked with Itanium was an HP shop. They wrote their custom accounting/booking program in the 90's using Pick/BASIC.

It was fun to play around with, but coming from Linux, HP-UX left a lot to be desired. I'm not sure if there is a way to run Pick/BASIC on Linux, but it would have made sense for them. Costs would have gone way down by being able to use commodity Intel machines instead of paying a premium for Itanium parts.


> Goodbye, Itanic. You were a strange, unloved little detour, better known for the good designs you killed than for any successes of your own. Few will miss you.

It does, albeit indirectly, and not by name. Poor Alpha and PA-RISC...


To see how expectations were gradually reset, see this chart, which also appears on the Wikipedia page for Itanium. https://helgeklein.com/wp-content/uploads/2010/05/itanium-sa...


It's a bit ironic that one of the reasons for Itanium's failure was that the compilers were not ready. I have no doubt that nowadays GCC and clang would immediately have good support for any new processor (especially from a big player like Intel). In any case the processor vendor would make sure themselves that there would be good compiler support (see Apple).

(Setting aside the question of whether good compiler support would even have been possible with Itanium's architecture, or whether the optimizations they envisioned could not be implemented in principle. I don't know enough about it to tell for sure.)


gcc had already existed for 14 years by the time Itanium came out, and it had been widely ported to many different systems... that was kind of the whole point of the project. The real issue I feel is locked up in your parenthetical: it is difficult to put aside the question of whether good compilers for Itanium were even possible, as part of the entire problem was that the VLIW architecture concept was so incredibly different than CISC/RISC that it made porting the compiler very difficult (and while you could, can, and did somewhat easily do naive compilation that simply ignored the potential benefits, the resulting performance sucked). I was in college at the time and I remember VLIW being cited as a reason why compilers had a lot of hot research that needed to be done ASAP.


AFAIK Intel and HP spent well north of $1B on VLIW compiler research, with crickets to show for it all.

The people prophesying the second coming of VLIW (for general-purpose code) seriously need to explain what compiler breakthroughs have been made to make the whole affair worthwhile.


Yes -- and explain how code statically scheduled for the worst-case dependences at compile time can ever beat out-of-order and speculative execution that can see the actual dependences at runtime. For anything other than extremely regular codes, there's an information gap that can't be overcome.


> Intel and HP looked around and saw a wide variety of successful server CPU architectures like Alpha, MIPS, SPARC, and POWER.

I used PA-RISC, Alpha, and POWER machines in various work roles. All of those architectures/OSes fell to x86 speed improvements and low prices. None of those other architectures could compete as Intel x86 got faster and faster and caught up with the exotic silicon.

When MS released Windows NT for x86 it was over. My mom's former employer (PTC) released for Windows NT a little late and lost ground to SolidWorks. Even Apple gave up on PowerPC and switched to x86.


We plan to get rid of this real soon now.

    # uname -a
    HP-UX antique B.10.20 A 9000/800 862741461 two-user license

    # model
    9000/800/K380


Please offer this machine to the debian-hppa port; post a message here:

> https://lists.debian.org/debian-hppa/

It might be worth hosting it at the Open Source Lab at Oregon State University, where we just lost one Debian machine, IIRC.


Unfortunately, this is being used for a corporate tracking application.

I have several of them, but they run on 220V power and are the size of a small refrigerator.

The CPUs range up to 220 MHz.


Are they in Europe?


Not even sad. What I am sad about is that it took Alpha and PA-RISC with it.


Is it too cynical to assert that this was the actual intent?


It was the intent of DEC and HP insofar as they joined the Itanium programme and EOL'd their own designs.


DEC was part of HP at that point, so it was the HP leadership that made the decision to ditch Alpha.


Compaq bought DEC and killed Alpha before both got folded into HP.


Alpha was the reason DEC was for sale. It was a very cool speedy architecture, but it took too long to emerge from the DEC management swamp.

Bitsavers has a series of memos showing how the Alpha predecessor Prism went down in flames when DEC decided to quick-fix its technology hole with MIPS.

Alpha was a kind of illicit skunk works leftover from that failed project. If DEC had pushed it out the door a couple of years earlier - not likely, but possible with a push - it might have eaten the rest of the industry.


I suspect that whatever the virtues of the Alpha architecture [1], DEC simply didn't have the muscle to stay in the game with exponentially increasing R&D costs.

[1] And it's not like Alpha was some shining beauty of ISA design either. Early versions lacking sub-word load/store, the absolutely crazy memory consistency model, ...


Wasn't the byte stuff due to patent kerfuffle with MIPS?


AFAIK, yes.


and MIPS, as a server architecture. In 1998, MIPS Technologies left SGI. Without a major systems house using it, MIPS was left to chase embedded markets. The R10000 had been used by SGI, Tandem, Pyramid, and NEC for high-end servers.


Itanium was so insignificant that while I recall that it was being EOL'd, I hadn't marked my calendar for it.

A year ago, Derbauer did a teardown and got some sweet die shots of one: https://www.youtube.com/watch?v=Lqz5ZtiCmYk

Nit: this blog's text is a bit small and thin.


Having had to deal with it, I scheduled a reminder. I didn't want to let the day slip past!

Ooh! The pretty pictures start around 11:03.

(Blog is mine; I'll see about tweaking the text.)


Itanium_Sales_Forecasts.png is my favorite image on all of Wikipedia https://commons.wikimedia.org/wiki/File:Itanium_Sales_Foreca...


Could one say that the Itanium helped to do in the Alpha and PA-RISC architectures?


I don’t know if it’s true, but a story I was told by an HP engineer claims that Itanium was more or less designed by HP. Which would mean that HP wanted to kill off the PA-RISC platform.


Yes, but the massive economies of scale in chip manufacturing and software compatibility would most likely have doomed Alpha and PA-RISC in any case.


This guy has been waiting 20 years to use that headline but, if he wants me to make room for him on my door, he'll have to do better than that.


I did my best!

And as with Intel, it just wasn't enough.


Itanium was one of the best processors I worked on, and I worked on at least 20 different ones. It was way ahead of its time, the software was not ready, and of course it cost an arm and a leg and the next three generations, so it was really only ever taken up by large enterprises. I wish Intel and HP had been more sensible in how they brought it to market.


This post, funny enough, absolutely gets my thoughts down entirely.

If I didn't know better I'd have accused myself of ghostwriting that post. That's amusing and horrifying.. but it's all accurate to what I think!


They say great minds think alike, but if you're thinking like me instead... I'm so, so sorry.


A vessel deliberately sunk is scuttled.


Can you call it Itanic if it was around for more than a decade because of how well it sold and how many supercomputers chose to use it? I am familiar with the pain of Itanium, and I understand how hard it was to work with. I've kernel debugged an operating system on it. Yet it was still successful enough to be around for this long. I wouldn't call it "Itanic."


> Can you call it Itanic if it was around for more than a decade because of how well it sold

It didn't sell well, though. At all.

> and how many supercomputers chose to use it?

Here's a chart of supercomputer architecture by year: https://commons.wikimedia.org/wiki/File:Processor_families_i...

Itanium showed up around 2001, peaked in 2003 at about 100 systems, and was nearly gone by 2007.

However good it might have been hypothetically -- and I contend that it wasn't -- it was a market disaster.


Is this also the end of HPUX? Can it run on anything else? It's hard to find information on that.


It certainly used to run on other architectures -- notably PA-RISC and Motorola 680x0.


Yeah, but those versions have long been unmaintained. All modern HP-UX code and third-party apps are Itanium-only.

For the last 10 years or so, HP actually paid Intel a lot to keep making and developing Itanium, because they're so dependent on it for HP-UX and their long-term support contracts. If a port were realistic, they would have done it.


They definitely "can" port it, but would they? I believe HP is internally using Linux for more and more things -- the question is whether there are enough customers paying for HP-UX to make porting worth it, or whether they are legally able to sell the rights to it as they did with OpenVMS. Only HP knows the full answer.


There are still a lot of legacy HP-UX servers out there, mostly running the SAP+Oracle combo, but they are all slowly being phased out in favour of Linux or Windows boxes. Still, some of them will likely remain chugging along for years to come, because their performance will simply be sufficient for the task at hand.


I can well imagine HP milking existing HP-UX customers for all they're worth, but I find it hard to see them justifying much $$$ to develop it further. So I would expect HPE to provide security patches and nothing more.


The Itanic sank like 15 years ago; they just finally stopped making chick flicks about it.



