8 GPUs per box. This has been the data center standard for roughly the last 8 years.
Furthermore, they're usually NVLink-connected within the box (SXM modules instead of PCIe cards, although the data link to the host is still PCIe).
This is important because the daughterboard provides PCIe switches, which usually connect NVMe drives, NICs, and GPUs together such that within that subcomplex there isn't any PCIe oversubscription.
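For a rough feel of that layout, you can ask the CUDA runtime for each GPU's PCIe bus ID and whether pairs of GPUs can talk peer-to-peer. A minimal sketch (file name and the interpretation of bus numbers are my own; the exact grouping depends on the vendor's board):

```c
// topo_sketch.cu -- print each GPU's PCIe bus ID and pairwise P2P capability.
// Build: nvcc topo_sketch.cu -o topo_sketch
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);

    // PCIe bus IDs: GPUs that sit under the same PCIe switch typically show
    // up with nearby bus numbers (the exact numbering is vendor-specific).
    for (int i = 0; i < n; ++i) {
        char busId[32];
        cudaDeviceGetPCIBusId(busId, sizeof(busId), i);
        printf("GPU %d: %s\n", i, busId);
    }

    // Pairwise peer-to-peer: "yes" means the driver can route GPU-to-GPU
    // traffic directly (over NVLink or PCIe, depending on the box).
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, i, j);
            printf("P2P %d -> %d: %s\n", i, j, canAccess ? "yes" : "no");
        }
    }
    return 0;
}
```

For the full picture including NICs and NVMe drives, `nvidia-smi topo -m` on the host shows which devices hang off the same switch.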
Since last year, I'd argue, the standard for a lot of providers is the GB200.
Fascinating! So each GPU is partnered with disk and NICs such that there's no oversubscription for bandwidth within its 'slice'? (idk what the word is) And each of these 8 slices wires up over NVLink back to the host?
Feels like there's some amount of (software) orchestration for making data sit on the right drives or traverse the right NICs; I guess I never really thought about the complexity at this kind of scale.
I googled GB200; it's cool that Nvidia sells you a unit rather than expecting you to DIY a PC yourself.
Usually it's 2-2-2 (2 GPUs, 2 NICs, and 2 NVMe drives on a PCIe complex). No NVLink here, this is just PCIe: under this PCIe switch chip there is full bandwidth, above it the bandwidth is usually limited. So, for example, going GPU-to-GPU over PCIe will walk
GPU -> PCIe switch -> PCIe switch (most likely the CPU, with limited BW) -> PCIe switch -> GPU
NVLink comes into the picture as a separate, second link between the GPUs: if you need to do GPU-to-GPU transfers, you can use NVLink.
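A concrete sketch of that second link: once peer access is enabled, a plain device-to-device copy gets routed over NVLink when the GPUs are NVLink-connected, and otherwise falls back to the PCIe walk above. The buffer size and the 0/1 device pair are arbitrary choices on my part:

```c
// p2p_copy.cu -- copy a buffer from GPU 0 to GPU 1 directly, without
// staging through host memory. Build: nvcc p2p_copy.cu -o p2p_copy
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    if (n < 2) {
        printf("need at least 2 GPUs\n");
        return 0;
    }

    const size_t bytes = 256 << 20;  // 256 MiB, arbitrary
    void *src = nullptr, *dst = nullptr;

    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    // Enable peer access in both directions (fails if P2P is unsupported).
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    // The driver routes this over NVLink if available, otherwise over PCIe
    // (GPU -> switch -> CPU root complex -> switch -> GPU, as above).
    cudaError_t err = cudaMemcpyPeer(dst, 1, src, 0, bytes);
    printf("peer copy: %s\n", cudaGetErrorString(err));

    cudaSetDevice(0);
    cudaFree(src);
    cudaSetDevice(1);
    cudaFree(dst);
    return 0;
}
```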
You never needed to DIY your stuff, at least not for the last 10 years: most hardware vendors (Supermicro, Dell, ...) will sell you a complete system with 8 GPUs.
What's nice about GH200/GBx00/VR systems is that you can use chip-to-chip NVLink (NVLink-C2C) between the CPU and GPU, so the CPU can access GPU memory coherently and vice versa.
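A minimal sketch of what that coherence buys you, assuming a GH200-class coherent system (on a plain x86 box without HMM this would fault or need managed memory): the GPU dereferences an ordinary malloc'd buffer and the CPU reads the result back, with no explicit copies.

```c
// coherent_sketch.cu -- on a coherent CPU-GPU system (e.g. GH200 with
// NVLink-C2C), the GPU can dereference ordinary host allocations and the
// CPU sees the result without cudaMemcpy. Build: nvcc coherent_sketch.cu
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void incr(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1 << 20;
    // Plain malloc: no cudaMalloc, no cudaMallocManaged, no pinning.
    int *data = (int *)malloc(n * sizeof(int));
    for (int i = 0; i < n; ++i) data[i] = i;

    // The GPU accesses the CPU allocation directly over the coherent link;
    // on a non-coherent system this launch would fail or fault.
    incr<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();

    // The CPU reads the updated values back, again with no copy.
    printf("data[0] = %d, data[%d] = %d\n", data[0], n - 1, data[n - 1]);
    free(data);
    return 0;
}
```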