The problem is that performance achievements on AMD consumer-grade GPUs (RX 7900 XTX) are not representative of, or transferable to, datacenter-grade GPUs (MI300X). Consumer GPUs are based on the RDNA architecture, while datacenter GPUs are based on the CDNA architecture; AMD is expected to release a unifying UDNA architecture only sometime around 2026 [1]. At CentML we are currently working on integrating AMD CDNA and HIP support into our Hidet deep learning compiler [2], which will also power inference workloads for all Nvidia GPUs, AMD GPUs, Google TPUs and AWS Inf2 chips on our platform [3].
The problem is that the specs of AMD consumer-grade GPUs do not translate into compute performance when you try to chain more than one together.
I have 7 NVidia 4090s under my desk happily chugging along on week long training runs. I once managed to get a Radeon VII to run for six hours without shitting itself.
The Radeon VII is special compared to most older (and current) affordable GPUs in that it used HBM, giving it memory bandwidth comparable to modern cards (~1 TB/s), and it has reasonable FP64 throughput (1:4 of FP32) instead of the usual 1:64. So this card can still be pretty interesting for running memory-bandwidth-intensive FP64 workloads. Everything affordable released afterward, by either AMD or Nvidia, crippled FP64 throughput to below what an AVX-512 many-core CPU can do.
On the other hand, for double precision a Radeon Pro VII is many times faster than an RTX 4090 (due to a 1:2 vs. 1:64 FP64:FP32 ratio).
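The ratio argument can be made concrete with back-of-envelope arithmetic. The peak-FP32 figures below are approximate spec-sheet values I'm assuming for illustration, not measured numbers:

```python
# Rough effective-FP64 comparison from peak FP32 and the FP64:FP32 ratio.
# Spec numbers are approximate assumptions, not benchmarks.
CARDS = {
    # name: (approx. peak FP32 TFLOPS, FP64:FP32 ratio)
    "Radeon Pro VII": (13.1, 1 / 2),
    "RTX 4090":       (82.6, 1 / 64),
}

for name, (fp32_tflops, ratio) in CARDS.items():
    fp64_tflops = fp32_tflops * ratio
    print(f"{name}: ~{fp64_tflops:.2f} FP64 TFLOPS")
```

Despite a ~6x deficit in peak FP32, the Pro VII comes out roughly 5x ahead in peak FP64, which is the whole point of the ratio.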
Moreover, for workloads limited by memory bandwidth, a Radeon Pro VII and an RTX 4090 will run at about the same speed, regardless of what kind of computations are performed. Being limited by memory bandwidth is reportedly common in ML/AI inference.
Even the single-precision numbers given by the previous poster are seldom relevant for inference or training.
Because the previous poster had mentioned only single precision, where the RTX 4090 is better, I had to complete the picture with double precision, where the RTX 4090 is worse, and memory bandwidth, where it is about the same; otherwise people may believe that progress in GPUs over five years has been much greater than it really is.
Moreover, memory bandwidth is very relevant for inference, much more so than FP32 throughput.
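A simple model shows why: in single-stream LLM decoding, every generated token has to stream the full weight set from VRAM, so peak bandwidth puts a hard ceiling on token rate. The model size and bandwidth figures below are rounded assumptions for illustration, not measurements:

```python
# Upper bound on single-stream decode speed for a bandwidth-bound LLM:
# each generated token reads all weights from VRAM once.
def max_tokens_per_s(model_gb: float, bandwidth_gb_s: float) -> float:
    return bandwidth_gb_s / model_gb

model_gb = 13.0  # e.g. a 13B-parameter model with 8-bit weights (assumption)
for name, bw in [("Radeon Pro VII", 1024), ("RTX 4090", 1008)]:
    print(f"{name}: <= {max_tokens_per_s(model_gb, bw):.0f} tokens/s")
```

With nearly identical bandwidth (~1 TB/s on both cards), the ceilings come out within a couple of percent of each other, regardless of the huge gap in FP32 throughput.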
You might find the journey of Tinycorp's Tinybox interesting: it's a machine with 6 to 8 4090 GPUs, and you should be able to track down many of their hardware choices, including pictures, on their Twitter, plus more info on George's livestreams.
EPYC + Supermicro + C-Payne retimers/cabling. 208-240V power is typically mandatory for the most affordable power supplies (chain a server/crypto PSU from ParallelMiner for the GPUs to an ATX PSU for general use).
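The 208-240V requirement falls out of simple power arithmetic. The per-card wattage, system overhead, and 80% continuous-load derating below are assumptions for illustration, not a wiring recommendation:

```python
# Why multi-4090 boxes want 208-240V: total draw exceeds a standard
# 120V/15A branch circuit. All wattages here are rough assumptions.
GPU_W, N_GPUS, REST_W = 450, 7, 500          # per-GPU, GPU count, CPU/rest
total_w = GPU_W * N_GPUS + REST_W            # ~3650 W

for volts, amps in [(120, 15), (240, 30)]:
    budget = volts * amps * 0.8              # 80% continuous-load derating
    ok = "OK" if budget >= total_w else "NOT enough"
    print(f"{volts}V/{amps}A circuit: {budget:.0f} W budget, {ok} for {total_w} W")
```

A single 120V/15A circuit tops out around 1440 W continuous, under half of what seven 4090s need, while a 240V/30A circuit clears it comfortably.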
How do you manage heat? I'm looking at a hashcat build with a few 5090s, and water cooling seems to be the sensible solution if we scale beyond two cards.
The ASRock Rack ROMED8-2T has seven PCIe x16 slots. They're too close together to directly put seven 4090s on the board, but you'd just need some riser cables to mount the cards on a frame.
It looks like AMD's CDNA GPUs are supported by Mesa, which ought to suffice for Vulkan Compute and SYCL support. So there should be ways to run ML workloads on the hardware without going through HIP/ROCm.
I strongly disagree with your comment. First of all, ALL kids must survive, not most. Safety and the reassurance of knowing are two different things. Lastly, this tech looks a lot less intrusive than the smartwatches everyone is currently getting for their kids; this one appears to have more activity-engaging features.
I don't think anyone would really want to live in a world where we've done what's necessary such that literally all kids survive the various accidents and perils they might face out in the world. Such a world would be sanitized into oblivion.
This is just a variation on the "security vs. freedom" stuff. You can have perfect security if you don't allow for any freedom. But hopefully we can agree that a world with no freedom isn't one we want to live in.
But sure, let's step back from the extreme that you introduced. Are the downsides of pervasive 24/7 tracking and surveillance worth the (possible and as-yet unproven) increase in good outcomes? I can see that many people here seem to think it is, but I don't agree.
For what it's worth, the last major earthquake here killed 63 people, most of whom were on freeways and bridges that collapsed. All of that infrastructure has undergone significant retrofitting since then.
[1] https://www.jonpeddie.com/news/amd-to-integrate-cdna-and-rdn....
[2] https://centml.ai/hidet/
[3] https://centml.ai/platform/