The $10k price is for an A100 with 40GB of RAM, so you need eight of those. If you can get your hands on the 80GB variant, four are enough.
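A rough back-of-the-envelope on the card count (my numbers, not theirs; I'm assuming a total footprint of roughly 320 GB, which is simply what eight 40GB cards give you):

    # Python sketch: cards needed for a given model footprint.
    # Assumption: ~320 GB total (a placeholder consistent with "eight 40GB A100s"),
    # ignoring activation/KV-cache overhead.
    import math

    model_gb = 320
    for card_gb in (40, 80):
        print(f"A100 {card_gb}GB: {math.ceil(model_gb / card_gb)} cards")
    # -> A100 40GB: 8 cards
    # -> A100 80GB: 4 cards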
Also, if you want a machine with eight of these cards, it will need to be a pretty high-spec rack-mounted server or a large tower. To feed those GPUs you will want a decent number of PCIe 4.0 lanes, which makes EPYC the logical choice. So that's $20k for an AMD EPYC server with at least 1.6 kW of PSUs, and so on.
You don't need a "decent number" of PCIe 4.0 lanes. You just need 16 of them, and they can be PCIe 3.0 and will work just fine. Deep learning compute boxes predominantly use a PCIe switch, e.g. the ASUS 8000 box, which handles eight cards just fine. You only need a metric tonne of PCIe bandwidth if you are constantly shuttling data in and out of the GPU, e.g. in a game or with exceedingly large computer vision training sets. A few hundred milliseconds of latency moving data to your GPU, in a training session that will take hours if not days to complete, is neither here nor there.

I suspect this model, with a little tweaking, will run just fine on an eight-way RTX A5000 setup, or completely unhindered on a five-way A6000 setup. That puts the price around $20,000 to $30,000. If I put two more A5000s in my machine, I suspect I could figure out how to get the model to load.
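To put numbers on the "neither here nor there" point, here is how long a one-off push of the weights over the bus takes (bandwidth figures are rough, and the 320 GB payload is the same placeholder as above):

    # Python sketch: transfer time of model weights over a single x16 link.
    # Bandwidths are approximate theoretical/practical figures, not measurements.
    payload_gb = 320  # placeholder footprint
    for label, gb_per_s in (
        ("PCIe 3.0 x16, theoretical", 16),
        ("PCIe 3.0 x16, realistic", 12),
        ("PCIe 4.0 x16, theoretical", 32),
    ):
        print(f"{label}: ~{payload_gb / gb_per_s:.0f} s")
    # Tens of seconds either way, against a training run measured in hours or days.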
It also sounds like they haven't optimized their model or done any kind of model split, but if they did, I suspect they could load it on fewer GPUs and accept slower inference by spilling into main memory (a sketch of one way to do that is below).
That approach will work just fine with NVIDIA's NVSwitch and a decent GPU compute case from ASUS or IBM, or even one you build yourself out of an off-the-shelf PCIe switch and a consumer motherboard.
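For what it's worth, one off-the-shelf way to do that kind of split today is Hugging Face accelerate's device-map dispatch, which shards a checkpoint across whatever GPUs you have and spills the remainder to CPU RAM or disk. A minimal sketch, assuming the checkpoint is in Hugging Face format (the model name here is a placeholder, not the model being discussed):

    # Sketch: load a large checkpoint across GPU(s) plus main memory with
    # transformers/accelerate, then run a short generation.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "some-org/some-large-model"  # placeholder
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name,
        device_map="auto",           # fill the GPUs first, overflow to CPU RAM
        torch_dtype=torch.float16,
        offload_folder="offload",    # last resort: spill to disk
    )

    inputs = tok("Hello", return_tensors="pt").to("cuda:0")
    print(tok.decode(model.generate(**inputs, max_new_tokens=20)[0]))

Inference gets slower the more layers end up in main memory, but it does run, which is the point.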