Intel Begins EOL Plan for Xeon Phi 7200-Series ‘Knights Landing’ Host Processors

rbanffy · on July 25, 2018

Well... It's not to be expected the old processors would remain available after the introduction of their improved successors.

seanmcdirmid · on July 25, 2018

What newer HPC product is replacing this? It seems like the Phi might be dead end, at least for the foreseeable future.

rbanffy · on July 25, 2018

It's the Knights Mill family - https://ark.intel.com/products/series/132784/Intel-Xeon-Phi-...

kev009 · on July 25, 2018

These seem pretty insane, you'd be better off with two POWER9s in just about any dimension including cost, TDP, thread count, cache, memory bandwidth, I/O bandwidth including PCIe lanes, FLOPS and integer performance.

mkj · on July 25, 2018

The power9 chips don't have gigs of memory locally attached do they? Knights Mill has 16G/socket it looks like.

Knights Mill without an apostrophe - Mill is a verb?

kev009 · on July 25, 2018

With CAPI (cache coherent interconnect) I don't see why you couldn't do this in a more open way. i.e. optane, HBM2, whatever comes next

dragonwriter · on July 25, 2018

> Mill is a verb?

It is, in fact, though that's not where the name comes from. (The verb for the actions a mill performs is also “mill”.)

kps · on July 25, 2018

The naming convention came from places, starting with Knights Ferry, CA.

rbanffy · on July 25, 2018

> Mill is a verb?

In English, you can verb anything.

rbanffy · on July 25, 2018

Not sure about that. The local HBM is pretty cool and these parts are priced in the 3K range. With the added benefit of virtualization in the newest generation, they can be used as a general purpose machine. A 22-core POWER9 costs about 2500, but doesn't come with 16GB of HBM in the package. You get 88 faster threads against 288 slower ones.

I'd love to be able to justify a POWER workstation for me, but the implicit guarantees of a x86-compatible machine is too important to ignore (for my workloads).

acqq · on July 25, 2018

Can you please elaborate? How do you come to such a conclusion? I'd like to read some realistic comparison which would demonstrate the results in both cases, compared to the power consumed.

kev009 · on July 25, 2018

You can validate pricing on https://www.raptorcs.com/content/base/products.html. I am one of the first buyers with a dual 22 core system, so I'd be happy to run any numeric benchmarks if people have ideas.

Basically you get 88 threads per socket, generally will have "bonus cache" (this is a 24-core die so you get free cache if the fused off cores are not adjacent) so 120MB per socket, much higher clock speed, rediculous I/O bandwidth to PCIe 4.0, CAPI and NVLink. The speeds and feeds are simply higher than anything intel has been able to muster to market.

If you have enough volume, getting the La Grange POWER9 CPU Google and Rackspace are using would be even better than the Sforza Raptor's Talos II uses -- that gets you 8 DDR4 channels. I know basic pricing rumors and it is good.

keldaris · on July 25, 2018

If you get some free time to run benchmarks, the Phoronix open benchmark suite (OpenBenchmarking.org) or even just basic LINPACK would be a great start. There's such a lack of public hard data on these systems it's hard to evaluate where they stand in terms of cost/performance even for very standard workstation/HPC workloads.

dragontamer · on July 25, 2018

HPCG seems more realistic than Linpack.

Based on the Phoronix tests I've seen, AMD EPYC or AMD Threadripper is better for cost / performance.

But Xeon Phi is an accelerator over PCIe. Its more comparable to a NVidia Tesla V100. You write specialized software to run on the Xeon Phi (even though it shares the x86 instruction set, its architecture is alien so it requires the programmer to specifically optimize for the Phi platform).

rbanffy · on July 25, 2018

> But Xeon Phi is an accelerator over PCIe.

It used to be that way, but those products were discontinued. Modern Phi's are CPUs that can operate without a host computer.

imtringued · on July 25, 2018

I'm not sure why you are comparing them on a threads per socket basis.

Each Xeon Phi core has 4 threads which results in 288 threads per socket. [1]

At those core counts you are often no longer limited by the CPU. Memory bandwidth starts playing a more significant role which is why the Xeon Phi also has 16GB of high bandwidth memory with more than 400GB/s. Does the Power9 system use a similar memory technology?

[1] https://en.wikipedia.org/wiki/Xeon_Phi#Knights_Mill

dragontamer · on July 25, 2018

Power9 is cool, but its really not comparable. Power9 has a huge L3 cache, 10MB every 2 cores. So a 22-core machine would have 110 MB of L3 cache.

In contrast, Xeon Phi is connected to HBM2 / Stacked Memory. So while it doesn't have as large of an L3 cache, its main-memory is significantly faster.

Finally, you can stick a Xeon Phi into each PCIe x16 slot of your servers. Really, Xeon Phi is a competitor to NVidia's Tesla V100, if anything.

shaklee3 · on July 25, 2018

I'd be curious to see the standard memory bandwidth tests that anandtech does. I can't find anything about that online. They claim to have higher bw than Intel and AMD, but that will depend on how many threads are running.

https://www.cs.virginia.edu/stream/

I'm fairly certain it won't perform very well as-written unless the compiler is mature and translates to simd.

Btw, we met at the dpdk conference.

ksec · on July 25, 2018

Do you know if SuperMicro are going to have any POWER9 systems? ( I think they had announcement but there are no information on their webpages )

Given the insane amount of bandwidth, I wonder if Netflix will use this for their Open Appliance.

If we are sticking to x86 it looks like we have to wait till Zen3 from AMD to get something equivalent, that is 2019 or 2020.

dragontamer · on July 25, 2018

Wait, what's wrong with EPYC?

EPYC 7601P is 32-cores for roughly $2200 or so. And unlike Power9, the motherboard is "only" $550 or so (Power9's cheapest motherboard is like $1000 from Raptor Talos II Lite).

From what I've seen, EPYC roughly has the same Monero-mining speeds as Power9. All other benchmarks are in favor of EPYC, as x86 optimized code (Z-lib, H264, Apache, Postgres... etc. etc.) doesn't perform as well on Power9.

Power9 scales higher, but that gets cost-prohibitive. I think the typical person will only buy computers that cost less than $15,000 (Dual EPYC 7601 / SuperMicro system / 256GB of RAM)

rbanffy · on July 25, 2018

There's nothing wrong with it, as much as there is nothing wrong with plain Xeon. My thing with them is that they are boring - they look too much like the computer I use to work. In order to be inspiring, you don't need only fast. You need different. POWER is different in the sense it has both the fast cores and threads of a Xeon, but coupled to a different ISA and things like NVLink. Phi is different because of the integrated HBM and the crazy number of threads. Both invite us to build applications differently and that's what makes them exciting.

CyberDildonics · on July 25, 2018

Each Phi core can do two 512 bit SIMD operations per cycle.

seanmcdirmid · on July 25, 2018

For deep learning applications, not general HPC (at least, from what I understand).

Is anyone actually using these and finding them better than GPUs?

desertrider12 · on July 25, 2018

NERSC's Cori machine is the biggest machine of Knight's Landing (10,000 nodes), 10th on TOP500. Don't know of any planned Knight's Mill machines.

Generally, Xeon Phis get fewer FLOPs than GPUs but are better at certain tasks, e.g. graph algorithms with lots of divergent control flow. Also it's possible to run pure MPI programs on them and get decent performance, and there's a lot of pure-MPI HPC software out there.

wickberg · on July 25, 2018

In terms of KNL deployments, LANL's Trinity system (#9 on June 2018) is slightly larger. Trinity has 9,984 KNL nodes, vs 9,688 in Cori.

desertrider12 · on July 25, 2018

Yep, I missed that one.

rbanffy · on July 25, 2018

Both have AVX512 with, AFAIK, similar performance. The Knights Mill ones may have some extra extensions for lower precision floating point vector ops. The newer ones also have much better IO capabilities and virtualization.

martinpw · on July 25, 2018

Knights Mill has only half the double precision performance of KNL. So not really a successor, more heading off in a different direction.

fer · on July 25, 2018

I hate how the "Deep Learning optimized" actually means "you don't really need all that precision in the activation function of your neurons anyway".

jarym · on July 25, 2018

Isn't that what all optimisation is about? Making trade-offs of one kind or another - usually ditching stuff you can do without for stuff you'd like more of.

rbanffy · on July 25, 2018

I'd be perfectly fine with analog operations all the way to the final readout. Neurons are analog after all.

eksu · on July 25, 2018

Intel has GHC (Haskell) running on these in-house if I recall correctly.

wyldfire · on July 25, 2018

I think Phi is targeted at workloads like scientific fortran applications that no one could be bothered to redesign to take advantage of OCL/etc style programming models. They're 'better' than GPUs in that you can still use x86 and you're not stuck between the current choices: abusive-market-leader and low-quality-deliverables.

jcranmer · on July 25, 2018

Intel is building out a exascale supercomputer for 2021 based on ... something. The architecture is under very tight wraps, so it's all up to speculation, which generally seems to lean towards GPGPU (a more traditional one than Xeon Phi at least) or some sort of FPGA.

So something is likely to come out to replace the Xeon Phis, we just don't know what it is yet.

rbanffy · on July 25, 2018

I would bet they are using something similar to the Xeon Phi, perhaps with more external memory channels and larger per-core caches. Maybe some PTX-like matrix operations and wider vector units.

I'm sure Intel is doing a ton of homework investigating what is actually being used and what are the bottlenecks in current Intel users in HPC.

Jweb_Guru · on July 25, 2018

That's really sad, I always wanted to get my hands on some of these. For certain tasks (that work poorly on GPUs in the same cost ballpark, particularly ones that require large amounts of local memory bandwidth) they had rather excellent price / performance ratios. But I guess that didn't coincide with what people wanted.

shaklee3 · on July 25, 2018

The tools killed it. By all accounts it was terrible to program for. It wasn't as simple as writing a standard x86 piece of software.

Jweb_Guru · on July 25, 2018

That may be (though I think that was more the older models), but I think there's currently basically nothing else on the market that hits the sweet spot it was aiming for. If you want similar or better performance with that much bandwidth and concurrency you are looking at GPUs priced an order of magnitude higher. And if I had been working with them I would certainly have written custom stuff just for the Phi anyway--there's not much point in working with a machine like that if you just want to run the same old stuff, and taking advantage of the resources properly would be difficult without architecting around its design anyway. So while I'd be happy to hypothetically look into some other hardware that was priced and performed similarly to KNL, but had better tooling, I don't think such a thing actually exists.

nalllar · on July 25, 2018

Only the 72xxF processors are being discontinued, not the PCIE cards.

These are compatible with normal x86 code.

See https://www.servethehome.com/intel-xeon-phi-x200-knights-lan...

TheCondor · on July 25, 2018

Well, for good performance, no it’s not like your desktop. The first generation or two were more like packaging experiments, like a toolkit for developing specialized silicon parts. Like things you’d sort of prototype on a phi and then build custom silicon for and if you used intels custom shop, maybe your firmware would be done. Or something like that. That is my guess, I think if I was intel, I’d be thinking of ways to corner the custom accelerator market as that’s where it looks like things are going. Phi lacked fpga type stuff though.

I think the deep learning stuff really took over though and people wanted phis for other things that it just doesn’t fit. It’s an even more odd looking beast through those lenses.

shaklee3 · on July 25, 2018

Intel's problem has always been the libraries in the hpc world. Nvidia understood this early on, and they dumped an ungodly amount of money into getting people onto their platform by investing in libraries and tools. Intel thought that by just releasing a card with a bunch of Xeon cores and not a whole lot of accompanying Library support would do the trick.

Their mkl library is a good example of that. There are a bunch of different ways/libraries to accomplish the same task, and it's not always clear which to use. Further, the documentation is basically just an API printout with very few examples. You can't rely on stackoverflow to document things for you.

This isn't unique to Intel either. I'm looking at you AMD. It wasn't until recently that rocm started picking up steam and amd realizing is important -- long after their Enterprise GPUs were out.

dragontamer · on July 25, 2018

> Intel thought that by just releasing a card with a bunch of Xeon cores

Atom cores, actually.

> This isn't unique to Intel either. I'm looking at you AMD. It wasn't until recently that rocm started picking up steam and amd realizing is important -- long after their Enterprise GPUs were out.

Between 2010 and 2016, AMD reverse-mortgaged their headquarters and fired roughly 20% of their staff. The highly successful "Small Cat" line of CPUs was cancelled (basis for PS4 / XBox One / and Laptops). There was a significant chance they'd go bankrupt.

AMD finally makes money these days, which is probably why ROCm is able to get the proper investments that it deserves. But yes, its way late. But their company definitely couldn't afford to compete against NVidia before Ryzen.

shaklee3 · on July 25, 2018

It's a modified Atom core, and is closer to the Xeon cores. It was the first to have the AVX-512 instruction set, followed by the Xeon.

agumonkey · on July 25, 2018

reminds me of pre ps4 design, Sony was fond of overengineering with mild tooling, now they're on 90% standard hardware

biztos · on July 25, 2018

Nice to see Yolo County get some love.

https://en.wikipedia.org/wiki/Knights_Landing,_California

biztos · on July 25, 2018

I didn't expect upvotes for this but I'm surprised by the downvotes.

Anybody care to tell me what's wrong with commenting on the name of the processor series when that name is in the headline?

(And Knights Landing is a lovely little place which most people who aren't from Northern California may never have heard of. Worth a visit, especially if you fish.)

adwn · on July 25, 2018

Because it's off-topic. Same as commenting on Java the island on an article about the JVM.

(I didn't downvote you, though)