
GPU-to-CPU interface: >900 GB/s NVLink 4. What kind of interconnect is that? Is that even physically realistic?



Well, PCIe 6.0 x16 will do about 128 GB/s per direction. Of course, the real question is how many transactions per second you get. For PCIe 6.0 the raw signaling rate is 64 GT/s per lane.
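
Back of the envelope, ignoring FLIT/FEC and other protocol overhead:

    64 GT/s per lane ~= 64 Gb/s ~= 8 GB/s per lane
    8 GB/s x 16 lanes = 128 GB/s per direction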

Speaking in general terms, data rate and transaction rate don't necessarily match, because a transaction might require the transmitter to wait for the receiver to check packet integrity and issue an acknowledgement before a new packet can be sent.
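
As a toy illustration of that effect (the packet size and ACK latency here are made up, not taken from any spec):

    /* Toy stop-and-wait model: the sender idles for a full ACK round trip
       after every packet.  All numbers are made up for illustration. */
    #include <stdio.h>

    int main(void) {
        double link_rate = 128e9; /* raw link rate in bytes/s (assumed) */
        double packet    = 4096;  /* payload per transaction in bytes (assumed) */
        double ack_rtt   = 1e-6;  /* time to get the ACK back, in seconds (assumed) */

        double serialize = packet / link_rate;          /* time on the wire */
        double effective = packet / (serialize + ack_rtt);

        printf("effective: %.1f GB/s out of %.0f GB/s raw\n",
               effective / 1e9, link_rate / 1e9);
        return 0;
    }

With those made-up numbers the link spends most of its time waiting, and the effective rate collapses to a few GB/s.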

Yet another case, again speaking in general terms, is having to insert wait states to deal with memory access or other processor-architecture issues.

Simple example: on an STM32 processor you cannot toggle an I/O pin in software at anywhere near the CPU clock rate due to architectural constraints (including the instruction set). On a part running at 48 MHz you can only manage a maximum toggle rate of about 3 MHz (toggle rate = number of state transitions per second).
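
The kind of loop in question looks roughly like this (a sketch only; the register address is GPIOA->ODR on some STM32F0 parts and is assumed here, so check the datasheet for your part):

    /* Tight software GPIO toggle on an STM32 -- illustrative sketch only.
       The address below is GPIOA->ODR on some STM32F0 parts (assumed). */
    #include <stdint.h>

    #define GPIOA_ODR (*(volatile uint32_t *)0x48000014u)

    void toggle_forever(void) {
        for (;;) {
            GPIOA_ODR ^= (1u << 5);  /* read-modify-write of pin PA5 */
            /* Each pass costs several cycles: load, xor, store, branch,
               plus any bus wait states -- which is why the pin toggles at
               only a few MHz even with a 48 MHz core clock. */
        }
    }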


> Speaking in general terms, data rate and transaction rate don't necessarily match, because a transaction might require the transmitter to wait for the receiver to check packet integrity and issue an acknowledgement before a new packet can be sent.

PCIe has an optional "relaxed ordering" feature, allowing new packets to be sent before the ACK for preceding ones has arrived. Not sure precisely how this works, or whether there is some TCP-like sliding-window algorithm in play.
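
In general terms (a generic sliding-window sketch, not a description of PCIe's actual ACK/NAK or flow-control-credit machinery), keeping several packets in flight hides the ACK latency:

    /* Toy sliding-window model: up to `window` unacknowledged packets
       may be in flight at once.  Numbers are made up for illustration. */
    #include <stdio.h>

    int main(void) {
        double link_rate = 128e9; /* raw link rate in bytes/s (assumed) */
        double packet    = 4096;  /* bytes per packet (assumed) */
        double ack_rtt   = 1e-6;  /* ACK round trip in seconds (assumed) */
        int    window    = 64;    /* max packets in flight (assumed) */

        double serialize = packet / link_rate;
        /* Classic utilization bound: the link stays busy once the window
           covers one serialization time plus the round trip. */
        double util = (window * serialize) / (serialize + ack_rtt);
        if (util > 1.0) util = 1.0;

        printf("utilization: %.0f%% -> %.1f GB/s\n",
               util * 100.0, util * link_rate / 1e9);
        return 0;
    }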


Well, according to [1], NVIDIA lists NVLink 3.0 as being 50 Gb/s per lane per direction, and lists the total maximum bandwidth of NVSwitch for Ampere (using NVLink 3.0) as 900 GB/s each direction, so it doesn't seem completely out of reach.

[1] - https://en.wikipedia.org/wiki/NVLink


With 50 Gb/s per lane, that would be 144 lanes to reach 900 GB/s. Quite impressive.
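
Back of the envelope, taking the 50 Gb/s per-lane figure above:

    900 GB/s x 8 = 7200 Gb/s
    7200 Gb/s / 50 Gb/s per lane = 144 lanes per direction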


Fascinatingly, NVIDIA's own docs [1] claim GPU<->GPU bandwidth on that device of 600 GB/s (though they claim total aggregate bandwidth of 9.6 TB/s). Which would be what, 96 and 1536 lanes, respectively? That's quite the pinout.

[1] - https://www.nvidia.com/en-us/data-center/nvlink/
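
Same arithmetic, with the same 50 Gb/s per-lane assumption:

    600 GB/s -> 4800 Gb/s  / 50 Gb/s per lane = 96 lanes
    9.6 TB/s -> 76800 Gb/s / 50 Gb/s per lane = 1536 lanes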


Depends on how big you want to make it. If they're willing to go four inches, that'd do it with existing per-pin speeds from NVLink 3.



