I have seen far too many people commenting on the Mac Pro's "Pro" chip.
A hypothetical 32-core CPU and 64-core GPU is about the max Apple could make, in terms of the die-size reticle limit without going chiplet, and the TDP limit without some exotic cooling solution. Which means you can't have some imaginary 64-core CPU and 128-core GPU.
We could now finally have a Mac Cube, where the vast majority of the Cube is simply a heat sink, and it would still be faster than the current Intel Mac Pro. I think this chip makes the most sense with the A16 design on 4nm next year [1].
The interesting question would be memory: the current Mac Pro supports 1.5TB. 16-channel DDR5 would be required to feed the GPU, but that also means the minimum memory on the Mac Pro would be 128GB at 8GB per DIMM. One could use HBM on package, but that doesn't provide ECC memory protection.
[1] TSMC 3nm has been postponed, in case anyone wasn't paying attention, so the whole roadmap has shifted by a year. If that still doesn't give pause to those insisting Moore's law hasn't died yet, I don't know what will.
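The channel arithmetic above can be sanity-checked with a quick sketch. The figures are assumptions for illustration: DDR5-4800 as the speed grade, 64-bit channels, and 8GB as the smallest DIMM size.

```python
# Sanity check on the 16-channel DDR5 arithmetic above.
# Assumptions (not confirmed specs): DDR5-4800, 64-bit channels,
# 8 GB as the smallest available DIMM.
CHANNELS = 16
TRANSFERS_PER_S = 4800e6   # DDR5-4800 transfer rate
BYTES_PER_TRANSFER = 8     # 64-bit channel
MIN_DIMM_GB = 8

bandwidth_gb_s = CHANNELS * TRANSFERS_PER_S * BYTES_PER_TRANSFER / 1e9
min_memory_gb = CHANNELS * MIN_DIMM_GB

print(f"aggregate bandwidth: {bandwidth_gb_s:.1f} GB/s")  # 614.4 GB/s
print(f"minimum capacity:    {min_memory_gb} GB")         # 128 GB
```

So one DIMM per channel does force a 128GB floor, and even 16 channels of DDR5-4800 only reach about 614 GB/s, well short of what a big GPU would want.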
If someone had said in 2018, "Apple is going to release a MacBook Pro with an ARM chip that runs x86 applications faster than an x86 chip while getting 20 hours of battery life," a lot of people would have responded with very clever-sounding reasons why this couldn't possibly work, what the limitations would be on a hypothetical chip, and how a hypothetical chip that they could imagine would behave.
I think the evidence is that Apple's chip designers don't care about anyone's preconceived and very logical-sounding ideas about what can and can't be done.
So what we know is that there is going to be a Mac Pro, and that whatever it is, it is going to absolutely destroy the equivalent (i.e. absolute top-of-the-range) x86 PC.
If you want to be right, take that as a fact, and work backwards from there. Any argument that starts with "I don't think it can be done" is a losing argument.
I'm so old I remember the brief time when Alpha ran x86 code faster than x86 so history is repeating. I am eagerly awaiting M1 Plaid vs. Sapphire Rapids vs. Genoa next year.
"If an elderly but distinguished scientist says that something is possible, he is almost certainly right; but if he says that it is impossible, he is very probably wrong." — Arthur C. Clarke
> Any argument that starts with "I don't think it can be done" is a losing argument.
But plenty of interesting opinions start that way -- interesting because someone with expertise thinks so, and/or because the opinion is followed by well-reasoned arguments.
OTOH, "it can't be done" is often based on current assumptions about hard limits. Those assumptions have been broken routinely and consistently for hundreds of years now.
It's not hard to imagine that "Jade 2C-Die" means two compute-die chiplets and "Jade 4C-Die" means four, which would make 40 CPU cores and 128 GPU cores at about the same total size as Sapphire Rapids. It could have 256 GB of RAM on a 2048-bit bus with 1,600 GB/s of bandwidth.
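The quoted bandwidth figure checks out if you assume LPDDR5 at 6400 MT/s on that 2048-bit bus (the speed grade is an assumption, not a confirmed spec):

```python
# Bandwidth of the hypothetical 2048-bit bus suggested above.
# LPDDR5 at 6400 MT/s is an assumed speed grade, not a confirmed spec.
BUS_WIDTH_BITS = 2048
TRANSFERS_PER_S = 6400e6

bandwidth_gb_s = (BUS_WIDTH_BITS / 8) * TRANSFERS_PER_S / 1e9
print(f"{bandwidth_gb_s:.1f} GB/s")  # 1638.4 GB/s, i.e. roughly 1,600 GB/s
```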
Chiplets aren't a silver bullet, and they aren't without compromise. You then need to figure out the memory configuration and NUMA access. Basically, the "Jade 2C-Die as chiplets" reading doesn't make sense unless you make some significant changes to Jade-C.
Jade 4C-Die on a single die only makes sense up to 40 cores (4x the current CPU cores, or 32 high-performance and 8 high-efficiency cores); unless there is some major cache rework, there isn't any space for 128 GPU cores.
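A back-of-the-envelope area budget shows why a single-die 4C part runs into the reticle wall. Both figures here are assumptions for illustration: roughly 432 mm² for the M1 Max die (as reported in press coverage) and roughly 858 mm² for the EUV reticle limit. Real floorplans don't scale linearly, but the order of magnitude is what matters.

```python
# Back-of-the-envelope area budget for a single-die "Jade 4C-Die".
# Assumed figures: M1 Max die ~432 mm^2, reticle limit ~858 mm^2.
# Neither is an official number; they are illustrative estimates.
M1_MAX_AREA_MM2 = 432
RETICLE_LIMIT_MM2 = 858

# 4x the cores -> very roughly 4x the area, ignoring shared blocks
naive_4x_area = 4 * M1_MAX_AREA_MM2
print(naive_4x_area, "mm^2")                                       # 1728 mm^2
print("fits in one reticle:", naive_4x_area <= RETICLE_LIMIT_MM2)  # False
```

Even if shared blocks (media engines, I/O) aren't duplicated, a naive 4x scale-out lands at roughly twice the reticle limit, which is why the single-die version only works with far fewer GPU cores.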
But then, we also know Apple has no problem with only 2 high-efficiency cores on a MacBook Pro laptop, so why would they want to put 8 of them on a desktop?
> Interesting question would be memory, current Mac Pro support 1.5TB
I wonder whether it might have on-chip RAM and slotted RAM?
Not sure what the implications for software and performance would be, but the whole point of the new Mac Pro was an admission that customers in that market need components they can chop and change as required. They obviously still care about that market; otherwise they would have just discontinued the Mac Pro and left it at that. Instead they discontinued the iMac Pro (the intended successor to the Mac Pro) because a modular desktop is a better fit.
It would be super weird to do a total 180 on that now, especially considering they went to the effort of building things like the Afterburner card. The move to ARM was surely in motion before work started on the latest Mac Pro design, so I'm really interested to see where things go here. "Modular tower" and "tightly integrated SoC" seem completely at odds with each other.
The most elegant solution would be a motherboard with cartridges: you buy as many CPU/GPU/AI/RAM/SSD modules as you like and get exactly the machine you want. Want a crazy build farm? Put in plenty of CPUs and some RAM. Want a GPU farm? You've got it. Want terabytes of RAM? No problem.
Not sure if it's possible to implement. But I'd love to see it done this way.
> about the max Apple could make in terms of die size reticle limit without going chiplet
I probably have no idea what I'm talking about, as I'm a software guy. But I remember that company that made the crazy ML accelerator where an entire silicon wafer is one chip. How did they do that? Why can't Apple and others do the same, or something similar?
The Wafer Scale Engine doesn't break the reticle limit: if you search for a picture of the Cerebras chip, you can still clearly see the square dies within the wafer. At a very high level, they are building cross-die interconnect between adjacent dies on the wafer. Scribe lines used to be the physical separation between dies; now interconnect is built on top of them, so the whole wafer becomes one huge mesh of AI chips.
It works because you can think of each die as just lots of neural engines and cache, so you can easily route around each process defect and solve the yield problem. Then you still have to solve packaging, power, and cooling.
>Why can't Apple and others do the same/similar?
When you have a complex SoC, not every defect can be worked around; that is why the larger the die, the lower the yield. This is one reason GPUs ship with certain GPU cores and memory controllers fused off. It isn't market segmentation; it is a necessity of manufacturing. Chiplets exist to avoid this: make each part a small die and interconnect them. But as mentioned, chiplets aren't a silver bullet. The design must have chiplets in mind from the start, as with the interconnect in AMD Zen. You can't just dump two M1 Max dies together on the same interposer and expect it to work. They could, but it wouldn't be efficient, and that sort of defeats the purpose. An analogy would be taking the current AMD Ryzen APU, which is an SoC with a GPU and memory controller, sticking two of them together, and somehow expecting it to run 100% faster.
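The "larger die, lower yield" relationship can be sketched with the standard Poisson yield model. The defect density used here is an illustrative guess, not a published foundry figure.

```python
import math

def poisson_yield(area_mm2: float, defects_per_mm2: float = 0.001) -> float:
    """Fraction of dies that come out defect-free under a Poisson
    defect model: Y = exp(-D * A).

    The default density (0.001/mm^2, i.e. 0.1 defects per cm^2) is an
    illustrative guess, not a published foundry number.
    """
    return math.exp(-defects_per_mm2 * area_mm2)

# Growing the die shrinks the share of perfect dies faster than linearly:
for area in (100, 400, 800):
    print(f"{area:4d} mm^2 -> {poisson_yield(area):.0%} defect-free")
# 100 mm^2 -> ~90%, 400 mm^2 -> ~67%, 800 mm^2 -> ~45%
```

A design that can fuse off a defective core (or, like Cerebras, route around it) recovers most of those imperfect dies, which is exactly the lever a complex SoC lacks for its one-of-a-kind blocks.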
I'd bet it's the plumbing: getting data in and out, supplying something around 10-100 kiloamps, attaching it to anything without it ripping itself apart via thermal expansion, getting the heat out. All of that sounds miserably difficult.
Like sure you can do some outrageous and expensive things to make it all work in small quantities and for huge sums of money. But you can't build a mass-market laptop that way.
I'd guess that the upper limit with present-day and near-future packaging technology for commodity hardware is ~1000 mm^2.
I don't follow HBM closely, so I'm guessing it could be on-die ECC correction like DDR5, rather than full ECC in both the DIMM and the memory controller. But I could be wrong.