AVX-512 is in Alder Lake P-cores but not E-coree. AVX-512 is disabled in Alder L...

paulmd · on Nov 28, 2022

Early Alder Lake could but they fused it off in the later ones and newer microcode also blocks it on the older processors.

Raptor Lake is basically "v-cache alder lake" (the major change is almost doubling cache size) so it's unsurprising they still don't have an answer there, and if they did it could be backported into Alder Lake, but they don't seem to have an immediate fix for AVX-512 in either generation.

Nobody really knows why, or what's up with Intel's AVX-512 strategy in general. I have not heard a fully conclusive/convincing answer in general.

The most convincing idea to me is Intel didn't want to deal with heterogeneous ISA between cores, maybe they are worried about the long-term tail of support that a mixed generation will entail.

Long term I think they will move to heterogeneous cores/architectures but with homogeneous ISA. The little cores will have it too, emulated if necessary, and that will solve all the CPUID-style "how many cores should I launch and what type" problems. That still permits big.little architecture mixtures, but fixes the wonky stuff with having different ISAs between cores.

there are probably some additional constraints like Samsung or whomever bumped into with their heterogenous-ISA architectures... like cache line size probably should not vary between big/little cores either. That will pin a few architectural features together, if they (or their impacts) are observable outside the "black box".

StillBored · on Nov 28, 2022

Well, intel seems to sorta be proving a point I had a long time about arm's big little setup. Which is that its only needed because their big cores wern't sufficiently advanced to be able to scale their power utilization efficiently. If you look at the intel power/perf curves, it the "Efficient" cores are anything but. Lots of people have noticed this, and pointed out its probably not "power Efficient" but rather "die space efficient under constrained power" because they have fallen behind in the density race, and their big cores are quite large.

But i'm not even so sure about that, avx-512 is probably part of the size problem with the cores. We shall see, your probably right that hetrogenious might be here to stay, but I suspect a better use of the space long term is even more capable GPU cores offloading the work that might be done on the little cores in the machine. AKA, you get a number of huge/fast/hot cores for all the general purpose "CPU" workloads, and then offload everything that is trivially parallelized to a GPU that is more closely bound to the cores and shares cache/interconnect.

paulmd · on Dec 1, 2022

it's funny because this is sort of addressing a point I made elsewhere in the comments.

https://news.ycombinator.com/item?id=33778016

Like, serious/honest question, how do you see Gracemont as space-efficient here? It's half the size of a full Zen3 core yet probably at-best produces the same perf-per-area, and uses 1.5x the transistors of Blizzard for similar performance (almost 3x the size, bearing in mind 5nm vs 7nm). that's not really super small, it's just that Intel's P-cores are truly massive, like wow that is a big core even before the cache comes in.

For years I thought it would be cool to see an all-out "what if we forget area and just build a really fast wide core" and that's what Intel did. And actually, for as much as people say Apple is using a huge "spare no expenses" core, it's not really all that big even considering the area - you get around 1.5-1.6x area scaling between 5nm and 7nm as demonstrated by both Apple cores and NVIDIA GPUs, and probably close to AMD's numbers as well. So just looking at it at a transistor level, Apple is using 2.55 x 1.6 = 4.08mm2 equivalent of silicon and Intel is using 5.55mm2, so Apple is only using 75% of the transistors of Intel's golden cove p-core...

But in the e-core space, it's become a meme that Gracemont is "size efficient rather than power efficient" and I'm just not sure what that means in practical terms. Usually high-density libraries are low-power, so those two things normally go together... and it's certainly not like they're achieving unprecedented perf-per-area, they're probably no better than Zen3 in that respect. Where is the space efficiency in this situation if it's not libraries or topline perf-per-area?

zeusk · on Nov 28, 2022

Raptor lake is not basically v-cache alder lake.

And Intel is still deciding on future of AVX512, internally there is already a replacement that works with atom cores (which are size and power bound).

pantalaimon · on Nov 28, 2022

> AVX-512 is disabled in Alder Lake CPUs because the world is not ready for heterogenous ISA extensions

It was already opt-in (disabled unless you also disable efficiency cores), that is no justification to make it impossible to use for people who want to try it out.

But I suppose Intel just doesn't want people to write software using those new instructions.

fomine3 · on Nov 29, 2022

If Intel allow to enable AVX-512, they need to validate functionality on every chip. Some chips may dropped (or reused as i3) due to this. There's not much reason to do so for AVX-512 that only enthusiasts enable.