Will be interested to see how this first(ish) gen of Intel's disaggregated chips pans out. I've been needing to replace my laptop and these seem like they have the potential to be extremely nice for a mid-range machine with long battery life. The new scheduler hierarchy is especially interesting given how much of the physical chip they can avoid powering on at all for most simple tasks. For a lot of light use cases the entire "real" CPU and GPU parts of the silicon can stay completely dark, since the SoC tile has two tiny cores to run things and other necessary parts, like the video decode silicon, were separated from the GPU.
Eh, I have a sneaking suspicion the compute dies won't be shut down as much as you'd think, and that there will be some extra power usage from crossing the dies like desktop Ryzen parts (though hopefully not nearly as severe).
A good Process Lasso config is probably worth the time investment. Instead of "trusting" the scheduler, you could force everything non-time-sensitive onto the efficiency island, maybe by default.
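If you'd rather script it than click through the Process Lasso UI, here's a minimal sketch using psutil; the core indices and process names are made up, since the logical CPU numbering of the low-power island differs per chip and OS.

```python
# Rough sketch, not a Process Lasso config: pin some background processes to
# the low-power cores via CPU affinity. LP_E_CORES is a placeholder -- check
# your own machine's core enumeration before using anything like this.
import psutil

LP_E_CORES = [20, 21]  # hypothetical indices of the two SoC-tile E-cores

for proc in psutil.process_iter(["name"]):
    if proc.info["name"] in ("OneDrive.exe", "Dropbox.exe"):  # example background apps
        try:
            proc.cpu_affinity(LP_E_CORES)  # restrict scheduling to those cores
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            pass  # some processes need admin rights to touch
```

Affinity is a blunter tool than Process Lasso's rules (it won't follow newly spawned processes or lift the restriction under load), but it shows the basic idea.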
The 3D Foveros packaging technology is critical as it allows some path lengths to be much shorter than they would be if the same connection had to be routed purely in the horizontal 2D plane.
Very excited to see how this plays out in practice.
I thought Meteor Lake was tiled and 2D? Intel has EMIB and such for very good bridges, but they are still bridges.
If it is 3D stacked with TSVs, that's a whole other can of worms. AMD's X3D on Ryzen 7000 creates heat/clockspeed issues, and they reportedly canceled a 3D variant of the 7900 GPUs due to similar issues.
I'm not sure I follow. It's almost guaranteed that all chiplets are still on some global system bus just like they were on their monolithic dies. Unless Intel has taken sudden great strides with their SoC security architecture, there are likely still all the old problems (plus a bunch of new fun ones!). Taking it off die is a response to the physical scaling issue, it's really not meant as a security enhancement.
> [Meteor Lake] introduces the Intel Silicon Security Engine (ISSE), a dedicated component focused solely on securing things at a silicon level ... The Converged Security and Manageability Engine (CSME) has also been partitioned to further enhance platform security.
> The “South” IO fabric is ordered, but non-coherent and PCIe-based. It is home to Wi-Fi and Bluetooth, PCI Express connections, Sensing, USB 3/2, Ethernet, the Power Management Controller (PMC), and Security controllers.
Does "Sensing" refer to human presence based on camera and radio (Wi-Fi, UWB) imaging?
Intel Visual Sensing Controller (IVSC), codenamed "Clover Falls", is a companion chip designed to provide secure and low power vision capability to IA platforms. The primary use case of IVSC is to bring in context awareness. IVSC interfaces directly with the platform main camera sensor via a CSI-2 link and processes the image data with the embedded AI engine. The detected events are sent over I2C to ISH (Intel Sensor Hub) for additional data fusion from multiple sensors.
The company didn't detail how it goes about this, but technologies already exist to combine visual input from the PC's cameras, radio from its antennas, and audio from its mic array to form a picture of its surroundings.
With an initial focus on respiration detection, we hope to extend the technology to detect other physical activities as well. Intel Labs will demonstrate an early prototype of breathing detection ... The solution detects the rhythmic change in CSI due to chest movement during breathing ... The respiration rates gathered by this technology could play an important role in stress detection and other wellness applications.
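To make the mechanism concrete, here's a toy sketch (mine, not Intel's code) of how a breathing rate can be pulled out of a CSI amplitude stream: remove the static component and find the dominant frequency in the normal breathing band.

```python
# Toy illustration of the idea, not Intel's implementation: estimate a
# respiration rate from a Wi-Fi CSI amplitude time series by picking the
# dominant frequency in the typical breathing band (~0.1-0.5 Hz).
import numpy as np

def respiration_rate_bpm(csi_amplitude, sample_rate_hz):
    """csi_amplitude: 1-D array of CSI magnitudes for one subcarrier over time."""
    x = csi_amplitude - np.mean(csi_amplitude)     # drop the static path component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate_hz)
    band = (freqs >= 0.1) & (freqs <= 0.5)         # 6-30 breaths per minute
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return peak_hz * 60.0

# Synthetic check: a 0.25 Hz "chest movement" buried in noise -> ~15 bpm
t = np.arange(0, 60, 1 / 20)                       # 60 s of CSI sampled at 20 Hz
fake_csi = 1.0 + 0.05 * np.sin(2 * np.pi * 0.25 * t) + 0.01 * np.random.randn(len(t))
print(round(respiration_rate_bpm(fake_csi, 20)))   # ~15
```

A real system presumably does much more (multiple subcarriers, motion rejection, fusion with other sensors via the ISH), but the underlying signal really is a small periodic wiggle like this.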
Interesting to see how efficient these are for office/coding (e.g. typing into VS Code) tasks. Will the CPU tile be off most of the time, or will it take some years before applications and the OS are tuned to avoid CPU tile wakeups?
Also, how good will the P-cores be compared to the previous gen?
Are the AVX10 instructions going into this generation?
Per the AVX10.1 Instruction Set Reference, p. 1-2 (355989-001US rev 1.0)[0], AVX10 support will begin with 6th gen Xeon processors (based on Granite Rapids), which are due next year. Client support is not called out, so Lunar Lake (late 2024-early 2025) is probably a good guess.
No, the instruction set supported by Lunar Lake was published by Intel a few months ago and it is almost the same as that of Arrow Lake S, i.e. without AVX10 or AVX-512 support (Arrow Lake S supports more instructions than Arrow Lake, e.g. it has SHA-512 secure hash instructions).
Panther Lake, to be launched in 2025 (probably in the second half of the year), is likely to be the first Intel CPU supporting the 256-bit subset of the AVX10.2 ISA version.
The 2024 Granite Rapids will probably be the only Intel CPU supporting the AVX10.1 ISA version (with full 512-bit support), because everything that follows will start from AVX10.2.
AVX10.1 is just Sapphire Rapids' AVX-512 renamed, so arguably SPR (and early Alder Lake) already support AVX10.1, just without declaring the relevant CPUID bits.
Client support depends on the E-cores supporting it, and Intel have specified that they'll start with AVX10.2. I don't believe any core has been announced with AVX10.2 support yet, and the latest we know is Lunar Lake's ISA support.
Granite Rapids has a few extra AVX-512 instructions (including those added by Tiger Lake, but omitted in Sapphire Rapids and Alder Lake), so Sapphire Rapids does not support all of AVX10.1. Therefore neither Sapphire Rapids nor Emerald Rapids can turn on the AVX10 CPUID bits.
Nevertheless, the differences between the AVX-512 instruction sets of Granite Rapids and Sapphire Rapids are small and of little importance.
I couldn't find any new AVX* instructions added to Granite Rapids (I see PREFETCHI and some AMX additions, neither of which fall under the AVX category), and VP2INTERSECT isn't listed under AVX10 or Granite Rapids.
I still wonder how Apple was able to achieve such an incredible performance-per-watt ratio compared to Intel and AMD. Does anybody know how Apple pulled it off?
1. Arm is generally more efficient than x86.
2. Apple uses TSMC's latest nodes before anyone else.
3. Apple doesn't chase peak performance like AMD and Intel. The relationship between CPU speed and power consumption is not linear (rough numbers sketched below). Intel has been chasing 5 GHz+ speeds for the last few years, which consumes considerably more power. Apple keeps their CPUs under 3.5 GHz.
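A back-of-envelope sketch of why that last point matters so much, assuming dynamic power scales roughly as C·V²·f and that voltage has to rise roughly in step with frequency near the top of the curve (a simplification, but it shows the shape):

```python
# Back-of-envelope only: dynamic power ~ C * V^2 * f, and near the top of the
# V/f curve voltage rises roughly with frequency, so power grows roughly ~ f^3.
def relative_dynamic_power(f_ghz, f_base_ghz=3.5):
    v_scale = f_ghz / f_base_ghz              # crude assumption: V scales with f
    return (v_scale ** 2) * (f_ghz / f_base_ghz)

print(round(relative_dynamic_power(5.0), 2))  # ~2.92x the power for ~1.43x the clock
```

Real V/f curves aren't that clean, but it's why the last few hundred MHz are so expensive, and why "30% faster at twice the power" is entirely believable.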
This is not entirely true in a general sense. Yes, a typical ARM CPU is indeed more energy efficient, but theoretically nothing prevents x86 from being nearly as efficient.
The main reason Apple silicon is more efficient is that Apple silicon is basically a mobile chip, and competition on mobile is harsh, so all the producers had to optimize their chips heavily for energy efficiency.
On the other hand, until Apple silicon and AMD's recent ascension, Intel had a monopoly on the laptop market with no incentive to do anything. Just look at how fast Intel developed an asymmetric Arm-like P/E-core architecture right after Apple silicon emerged. Let's hope this new competition will eventually force Intel and AMD to produce more energy-efficient x86 chips.
> This is not entirely true in a general sense. Yes, a typical ARM CPU is indeed more energy efficient, but theoretically nothing prevents x86 from being nearly as efficient.
The very complex instruction set does. You can easily throw multiple decoders at Arm code, but x86 scales badly due to the variable instruction length. Current cores need predecoders to find instruction boundaries, which is just not needed with fixed-width instructions, and even then they can only decode the simpler instructions with the higher-numbered decoders.
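A toy model of the boundary problem (nothing like real x86 decode, just to show the dependency): with a variable-length encoding you can't know where instruction N+1 starts until you've at least length-decoded instruction N, whereas fixed-width boundaries are all known up front and can be handed to decoders in parallel.

```python
# Toy model, not real x86 decoding: each "instruction" starts with a length
# byte, so finding the boundaries is an inherently sequential walk.
def variable_length_boundaries(code: bytes):
    boundaries, i = [], 0
    while i < len(code):
        boundaries.append(i)
        i += code[i]                  # must look inside each instruction first
    return boundaries

def fixed_width_boundaries(code: bytes, width: int = 4):
    return list(range(0, len(code), width))   # every boundary known immediately

stream = bytes([2, 0, 3, 0, 0, 1, 4, 0, 0, 0])
print(variable_length_boundaries(stream))     # [0, 2, 5, 6]
print(fixed_width_boundaries(stream))         # [0, 4, 8]
```

Real x86 front ends throw extra hardware at this (predecode/length-marking across a fetch window), which is exactly the cost being described above.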
> With the op cache disabled via an undocumented MSR, we found that Zen 2’s fetch and decode path consumes around 4-10% more core power, or 0.5-6% more package power than the op cache path. In practice, the decoders will consume an even lower fraction of core or package power.
Which is funny, because people are always like "uh, why do I need to understand asymptotics when machines are so fast". Well, the answer is that the asymptotics catch up to you when the speed of light isn't infinite or when you're timing things down to the nanosecond.
Arm is practically as complex as x86... It supports multiple varieties (e.g. v7, Thumb, Thumb-2, Jazelle, v8, etc.), lots of historical mistakes, absurdly complex instructions even in the core set (LDM/STM), and a legacy that is almost as long as x86's. It even has variable-length instructions too...
Only Jazelle and Thumb v1 are dropped from most v8 non-ULP cores, and even then only half dropped: they still consume decoding resources (e.g. Jazelle mode is actually supported and the processor will parse JVM opcodes, it's just that all of them trap). We are stuck with the rest as much as Intel is stuck with the 8087: it's about time they did some culling, but that won't happen without backlash.
I'm not sure this holds. X64 decodes instructions (which is awkward) and stores the result in a cache, then interprets the opcodes from that cache. So the decoding cost only happens on a cache miss, and a cache miss on a deeply pipelined CPU is roughly game over for performance anyway.
One big thing is that Apple has (almost) bought out TSMC's N3 node, so they're the only one with chips made on the most advanced manufacturing process available.
It's difficult to compare because honestly most reviewers just suck at making meaningful comparisons.
You can't compare a chip running at 3 GHz with one running at 5 GHz. It just doesn't tell you anything useful about the architecture, only what the company configuring the chip thought mattered.
Being "only" 30% faster but using twice the power at 5 GHz, for example, is entirely expected. Chances are the M1 couldn't even run that fast, or it would end up using just as much power if it did.
Intel would squash an internal project like that, or drown it in politics. You could sit here all day with examples of "why did big company let little company become successful"
Little-ish? PA Semi was only 150 people and acquired for < $300 million back in 2008. Intel's market cap was $150 billion back then. Impossible to say how PA Semi would have fared, but as a division, it's still way smaller.
Most reviewers base it on Cinebench, which is a poor indication of CPU performance for anything except Cinema 4D. Cinebench uses Intel's Embree engine, which is hand-optimized for x86. In addition, Cinebench favors CPUs with many slow cores, which is not how most software will perform. This is why AMD heavily marketed Cinebench for the Zen 1 launch and why Intel heavily markets it now for Alder Lake/Raptor Lake. In fact, Intel's little cores are basically designed to win at Cinebench.
Furthermore, AMD CPUs will be rated at 25 W but can easily boost up to 40+ W. It's up to the laptop maker.
Well, in purely military terms, technically Intel and AMD are only a few miles from Apple and their engineering corps is likely far larger. They could all march over there with broadswords if they really wanted to.
Completely off-topic, but: I think the state of the art in castle design (pre-modern-explosives, anyway) was a star/bastion fort[1], since that allowed defenders to have overlapping fire zones, especially useful once an attacker reaches the walls. With a circular design like Apple's HQ, as attackers get closer to the walls fewer and fewer defensive positions can see them, until you can only see them from right above.
Clearly the move is to put all AMD and Intel engineers on the inside of the circle. That way they would be visible from all locations on the ring at all times.
Intel basically hit the clock speed limit and diverged to multiple cores. However, they still make x86-based chips, not ARM. They owned an ARM license for a while and got rid of it. For whatever reason, Intel felt like putting all their money on x86 was their only option. For a while they were making Atom chips for mobile, but at some point that design was hobbled because Intel has always been about the 60%+ margins on server chips. You cannot sell the cheaper chips at the same margins. It's not that Intel couldn't technically figure stuff out, it's that they couldn't see past those 60% margins.
For a while Intel's process knowledge was supposed to be better, even if the design was less efficient, but that turned out to be a mirage around 10nm or so. Intel, now without a process advantage, is probably never going to regain its monopoly, and so far hasn't really transformed itself to do anything other than build those high-margin chips.
Once upon a time, I wanted to use one of the chips from a networking company they had bought, but Intel's model is to make the chip and let other companies build a product to take it to market. Intel doesn't want to make a market, just sell into it. You can see that with their attempt at TV, where they stopped when they didn't want to spend money on content. So the chip I was interested in didn't get much R&D or a product, and it more or less disappeared, another wasted investment.
But your list of Intel codenames is fantastic. It's a great resource. Lots of beautiful names in there. Still, I wonder why they haven't gone with Crater Lake: such a beautiful lake, and right in their wheelhouse in terms of geography.
Intel is taking two pages from the Apple ARM book: smaller cores but bigger caches (for more performance and less power) and main memory on the chip (for more performance and less power).
Best guess is that someone in marketing thinks calling them Tiles will make Intel look better because people won't realize they're just following behind AMD in this respect.
AFAIK this will be the first chip where dies from multiple processes are combined on a base die into one package, at least for consumer devices. AMD's chiplets use separate dies from multiple process nodes sitting side by side on one substrate, so maybe they don't want to confuse it with that.
I just want a search that shows what I'm looking for when I've typed the first three characters of the search term (as, e.g., the Windows Start menu does now), but still shows that result when I type the 4th character before my brain processes the fact that the result is there (you know, since my responses aren't that fuckin fast) and all the results change up.
I choose to believe this behavior somehow drives ad revenue to Microsoft and is not incompetence. Otherwise, why would they throw away the previously working behavior?
> search bar that can finally find files properly like back in Windows 7
I don’t think that quality is ever coming back. No matter what, they’re going to be connecting to bing for the top results / ads, so you’ll always have a bunch of latency and will never get back to Win 7 levels of local only performance.
It’s sad and the AI, which is mostly useless based on my experience, is going to suck up even more CPU cycles and add even more latency.
For me, it takes 5 seconds for the start search to respond on first use. My 12th gen i5 with NVMe storage and Win 11 literally runs worse than my 4th gen i7 with a first gen SSD and Win 7.
Microsoft has usurped a decade of computing gains and spent them on ads and tracking. Don’t expect anything that benefits the user in the near future.
It's an embarrassment that sub-second feature-rich file search isn't built in to Windows.
Fortunately there's a truly excellent third-party utility that is probably the second thing I install on any new Windows install (after Chrome): https://www.voidtools.com/support/everything/
I think the Windows Shell team (hey, we got RAR support recently) just withered on the vine when the grand idea of a queryable file system built on top of SQL Server in post-XP Windows, called "Cairo", collided with the memory/CPU limitations of the time.
My desktop now has 24 cores (8P/16E); now is the right time to rethink the OS.
Yep. I just bought the latest AMD hotness in laptop form. After giving the abomination known as Windows 11 a spin for a few days since it's installed by default, for the first time ever I'm running Linux as my daily driver and I couldn't be happier.
For me it's just there to make sure my PC boots, gets regular patches for security issues, doesn't corrupt my file storage, and plays sound/video.
The bulk of my time is spent inside a JetBrains IDE, Visual Studio Code, or Chrome; my interaction with the OS is just launching programs and hitting the shutdown button, and Windows 11 is just great at that.
I think it matters. W11 OOTB is a horrible ad- and telemetry-ridden mess. You have to spend ages ripping out/disabling that rubbish, and it can suddenly turn back on after an update.
For a work OS, I want sane OOTB defaults, or a "disable all of this" option in the initial setup. These settings should be respected and not overridden in an update.
Obviously I think it does. Operating systems and the windowing environment should help you be efficient and get the maximum use out of your machine. Windows no longer does this; it has now primarily become a vehicle to resell Microsoft services and nag you ad nauseam. Even ChromeOS has fewer ads nowadays.
"Everything" should be a standard on every Windows computer. I've found files that I thought completely lost to the ether, including actual Ethereum after I had lost my key deep in my file directories after an accidental drag and drop.
Every machine I get my hands on gets Search Everything and TeraCopy. I usually start new machines by installing some stuff through https://ninite.com because Windows still doesn't have a proper package manager.
Often I think some of that stuff is strategically made to be just good enough to discourage competition and so it never actually becomes good enough to be mainstream.
Look at how WinGet was launched with just enough effort to kill AppGet. It was a big announcement that was the equivalent of “avoid this space or we’ll crush you” and then what? Nothing innovative has happened since they killed the innovator (AppGet).
Instead the AI will be made part of the unkillable core "security" services and actually be used to find ways to reroute Windows telemetry around DNS blockers, autoconnect to all smart appliances in the house and teach the dog to report on your most intimate habits.
In Win 11 I am unable to even find apps (properly installed via a signed MSI) by typing their full names.
Searching for settings screens is also a pain in the ass, especially if you use a different language. MS recognizes only its own translation, not the most intuitive wording, not the English text ... you just have to know it.
Or you could use literally any other search program that works wonderfully, without the indexing process using an eyebrow raising amount of CPU? Including Microsoft's own shockingly fast file search in VSCode.
> feature rich version of Ryzen's IO die
The interconnect Intel is using is more expensive/sophisticated than AMD's (but less expensive than the TSVs for the X3D chips), so hopefully it's pretty good in laptops?
AMD's IO die setup burns tons of idle power, which is why the laptop parts are still monolithic.
Well you've got Lunar and Arrow on the horizon. Allegedly Nova and Panther after that, but that's just Tech News speculation. There doesn't appear to be any sign that they're ditching the naming convention.
I'm more partial to the FPGAs. Sundance Mesa is a cool codename.
Last week Intel confirmed that Panther Lake is the code name of their desktop CPUs that will be launched in 2025 and will be made using the Intel 18A CMOS process.
Therefore this is no longer speculation.
It can be assumed that these are the first CPUs that will implement the 256-bit subset of the AVX10.2 ISA, finally extending the coverage of AVX-512 to all Intel products, but with the restriction to 256-bit registers and operations.
Panther Lake will be preceded by Lunar Lake for low-power mobile devices, either in late 2024 or in early 2025.
Before that, there will be Arrow Lake S for desktops and Arrow Lake for laptops. Despite the single name, these two implement different instruction sets and will be made with different manufacturing processes: Intel 20A for Arrow Lake and an undisclosed process for Arrow Lake S (which might be made at TSMC, according to rumors, because it needs bigger tiles than would be possible in Intel 20A).
While Meteor Lake seems to be just a shrink of Raptor Lake, which will have much better performance only due to the greater energy efficiency provided by the Intel 4 process and the much better GPU made at TSMC, Arrow Lake is expected to introduce a new, improved CPU microarchitecture, which is supposed to compete successfully with Zen 5.
Neither the article nor Intel discloses up to how many cores the new architecture is designed for, and I am certain Intel would say something like "With our P-, E-, and LP E-core architecture(tm), core count doesn't matter anymore".
Also the SoC with a built-in AI engine. Oh boy, I wonder how long it will take for AI-assisted malware or botnets to emerge. Exciting times!
They are using off-the-shelf cores that have to be good in everything from netbooks and industrial boxes to server workloads. Apple, meanwhile, is laser-targeting high-volume, premium, media-heavy, laptop-ish TDPs and workloads. And they can afford to burn a ton of money on die area, a bleeding-edge low-power process, and target modest clock speeds like no one else can.
This is such a weak argument. Just because it's not in a laptop does not mean that a CPU should be accepted as being a horrible waste of electricity. Making datacenters as efficient as laptops would not be a bad thing. I'm sure people operating at the scale of AWS and other cloud providers would be beyond happy to see their power bills drop for no loss in performance. I'm guessing their stockholders would be pleased as well.
Datacenters are actually exactly as efficient as laptops.
They consume more only because they do not stay idle, like laptops.
The CPU cores in the biggest server CPUs consume only 2.5 W to 3 W per core at maximum load, which is similar or less than what an Apple core consumes.
The big Apple cores are able to do more work per clock cycle, while having similar clock frequencies and power consumption to the server cores, but that is due almost only to using a newer manufacturing process (otherwise they would do more work while consuming proportionally more power).
The ability of the Apple CPU cores to do more work per clock cycle than anything else is very useful in laptops and smartphones, but it would be undesirable in server CPUs.
Server CPUs can do more work per clock cycle by just adding more cores. Increasing the work done per clock cycle in a single core, after a certain threshold, increases the area more than the performance, which diminishes the number of cores that could be used in a server CPU, diminishing the total performance per socket.
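A tiny worked example with made-up numbers (nothing to do with Apple's or Intel's actual core sizes), just to show why a bigger core can lose per socket even when it wins per thread:

```python
# Made-up numbers, only to illustrate the area-vs-IPC tradeoff described above.
# Assume a fixed die-area budget for cores and two hypothetical core designs.
AREA_BUDGET_MM2 = 400

designs = {
    "modest core": {"area_mm2": 2.5, "perf_per_core": 1.0},
    "fat core":    {"area_mm2": 5.0, "perf_per_core": 1.3},  # +30% per-thread perf for 2x area
}

for name, d in designs.items():
    cores = AREA_BUDGET_MM2 // d["area_mm2"]
    print(name, int(cores), "cores,", round(cores * d["perf_per_core"]), "total perf")
# modest core: 160 cores, 160 total perf
# fat core:     80 cores, 104 total perf
```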
It is likely that the big Apple cores are too big for a server CPU, even if they may be optimal for their intended purpose, so without the advantage of a superior manufacturing process they might be less appropriate for a server CPU than cores like Neoverse N2 or Neoverse V2.
Obviously, Apple could have designed a core optimized for servers, but they do not have any reason to do such a thing, which is why the Nuvia team split from them; they were not able to pursue that dream, though, and went back to designing mobile CPUs at Qualcomm.
> i'm sure people operating at the scale of AWS and other cloud providers would be beyond happy to see their power bills drop for no loss in performance
- The datacenter CPUs are not as bad as you'd think, as they operate at a fairly low clock compared to the obscenely clocked desktop/laptop CPUs. Tons of their power is burnt on IO and stuff other than the cores.
- Hence operating more Apple-like "lower power" nodes instead of fewer higher clocked nodes comes with more overhead from each node, negating much of the power saving.
- But also, beyond that point... they do not care. They are maximizing TCO and node density, not power efficiency, in spite of what they may publicly say. This goes double for the datacenter GPUs, which operate in hilariously inefficient 600W power bands.
It's all tradeoffs. Desktop users are happy to take 20% more performance at 2x the power draw, and they get the fastest processors in existence (at single thread) as a result.
Data centres want whatever gets them the most compute per dollar spent: if a GPU costs $20k you bet they want it running at max power, but if it's a $1k CPU then suddenly efficiency is more important.