Preface: If you have actually encountered applications that must be run on 8-socket systems because those are literally the only fit for the application... I would love to hear about those experiences. With the advent of Epyc Rome, most use cases for these 8-socket systems vanished instantly. It would be fascinating to hear about use cases that still exist. Your experiences are obviously different than mine.
If you need more than 8TB of RAM, with the right application design you can probably do better with fast Optane Persistent Memory or Optane SSDs, and an effective caching strategy. You can have many dozens of terabytes of Optane storage connected to a single system, and Optane is consistently low latency (though not as low latency as RAM, obviously).
If you need more compute power, you can generally do better with multiple linked machines. You can only scale an 8-socket system up to 8 sockets. You can link way more machines than that together to get more CPU performance than any 8 socket system could dream of.
----------
I didn't expect you to read and respond so quickly, so I had edited my previous comment before you submitted your reply.
This was a key quote added to my previous comment:
>> So you really have to be in a very obscure situation which can't fit onto a dual socket Rome server, but can fit within a machine less than 2x larger. (28 cores * 8 sockets is less than 2x larger than 64 cores * 2 sockets)
In response to your current comment,
> If the interconnect is your bottleneck, you spend money on the interconnect to make it faster. Basic engineering: you attack the bottleneck.
Exactly. Using a dual-socket Epyc Rome system would be more than half as powerful as the biggest 8-socket Intel systems, but it would reduce contention over the interconnect dramatically, which means that many applications that are simply wasting money on an 8-socket system would suddenly work better.
This also goes back to my comment about using accelerators instead of an 8-socket system.
The odds of encountering a situation that just happens to work well with Intel's ridiculously complicated 8-socket NUMA interconnect, but can't work well over a network, and can't work well on a system half the size and requires enormous amounts of RAM to keep the cores fed, the odds seem vanishingly small... and in that case, we still have to consider whether an accelerator (GPU, FPGA, or ASIC) could be used to make a solution that is a better fit for the application anyways, and if so, you'll save large amount of money that way as well.
So, to make buying an 8-socket system make sense, the application must require performance that is...
- less than twice a dual socket Epyc Rome system, but greater than one dual socket Epyc Rome system can handle
- not dependent on transferring huge amounts of data around the interconnect
- dependent on very low latency communication between NUMA nodes
- needs enormous memory bandwidth for each NUMA node
- needs huge amounts of RAM on each memory channel (so you can't just use HBM2 on a GPU to get massive amounts of bandwidth, for example)
- etc.
It's a niche within a niche within a niche.
As I said in an earlier comment, I probably should be more impressed instead of being so cynical about the usefulness of such a machine. They are engineering marvels... but in almost every case, you can save money with a different approach and get equal or better results.
That's why 8-socket server sales made up such a small percentage of the market, even before Epyc Rome came in and completely obliterated almost all of the very little value proposition that remained.
Don't forget that Rome also has a wildly nonuniform interconnect between the core complexes, and the system integrator gets much less control over it than Intel's UPI links. When you really need to end up with a very large single system image at the application layer, the bigger architecture works out to be much cheaper than 256gb DIMMs or HPC networking.
8-socket CLX nets you 1.75x the cores, and 3x as many memory channels vs. a 2-socket Rome system. It also scales to a single system image with 32 sockets if you use a fabric to connect smaller nodes:
I mean, I buy AMD Threadripper for my home use and experimentation. I'm pretty aware of the benefits of AMD's architecture.
But I also know that in-memory databases are a thing. Nothing I've touched personally needs an in-memory database, but its a real solution to a real problem. A niche for sure, but a niche that's pretty common actually.
Whenever I see these absurd 8-socket designs with 48TBs of RAM, I instinctively think "Oh yeah, for those in-memory database peeps". I never needed it personally, but its not that hard to imagine why 48TBs of RAM beats out any other architecture (including Optane or Flash).
> its not that hard to imagine why 48TBs of RAM beats out any other architecture (including Optane or Flash).
Agree to disagree.
In-memory databases are common yes, but it is pretty hard to imagine practical situations where an in-memory database can't handle a few nanoseconds of additional latency.
All else being equal, of course more RAM is nice to have. All else is not equal, though, so this is all highly theoretical.
If you need more than 8TB of RAM, with the right application design you can probably do better with fast Optane Persistent Memory or Optane SSDs, and an effective caching strategy. You can have many dozens of terabytes of Optane storage connected to a single system, and Optane is consistently low latency (though not as low latency as RAM, obviously).
If you need more compute power, you can generally do better with multiple linked machines. You can only scale an 8-socket system up to 8 sockets. You can link way more machines than that together to get more CPU performance than any 8 socket system could dream of.
----------
I didn't expect you to read and respond so quickly, so I had edited my previous comment before you submitted your reply.
This was a key quote added to my previous comment:
>> So you really have to be in a very obscure situation which can't fit onto a dual socket Rome server, but can fit within a machine less than 2x larger. (28 cores * 8 sockets is less than 2x larger than 64 cores * 2 sockets)
In response to your current comment,
> If the interconnect is your bottleneck, you spend money on the interconnect to make it faster. Basic engineering: you attack the bottleneck.
Exactly. Using a dual-socket Epyc Rome system would be more than half as powerful as the biggest 8-socket Intel systems, but it would reduce contention over the interconnect dramatically, which means that many applications that are simply wasting money on an 8-socket system would suddenly work better.
This also goes back to my comment about using accelerators instead of an 8-socket system.
The odds of encountering a situation that just happens to work well with Intel's ridiculously complicated 8-socket NUMA interconnect, but can't work well over a network, and can't work well on a system half the size and requires enormous amounts of RAM to keep the cores fed, the odds seem vanishingly small... and in that case, we still have to consider whether an accelerator (GPU, FPGA, or ASIC) could be used to make a solution that is a better fit for the application anyways, and if so, you'll save large amount of money that way as well.
So, to make buying an 8-socket system make sense, the application must require performance that is...
- less than twice a dual socket Epyc Rome system, but greater than one dual socket Epyc Rome system can handle
- not dependent on transferring huge amounts of data around the interconnect
- dependent on very low latency communication between NUMA nodes
- needs enormous memory bandwidth for each NUMA node
- needs huge amounts of RAM on each memory channel (so you can't just use HBM2 on a GPU to get massive amounts of bandwidth, for example)
- etc.
It's a niche within a niche within a niche.
As I said in an earlier comment, I probably should be more impressed instead of being so cynical about the usefulness of such a machine. They are engineering marvels... but in almost every case, you can save money with a different approach and get equal or better results.
That's why 8-socket server sales made up such a small percentage of the market, even before Epyc Rome came in and completely obliterated almost all of the very little value proposition that remained.