A 1-bit SRAM cell consists of 6 transistors (or 4 transistors and 2 resistors). While a single transistor cannot hold state on its own, it is trivial to build small circuits of transistors that have memory.
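To make the "small circuits have memory" point concrete, here is a minimal sketch of an SR latch built from two cross-coupled NOR gates. This is not the 6-transistor SRAM cell itself (that uses two cross-coupled inverters plus two access transistors), but it illustrates the same idea: feedback between memoryless gates produces a stored bit. The function names `nor` and `settle` are made up for this illustration.

```python
def nor(a: bool, b: bool) -> bool:
    """A memoryless NOR gate: output depends only on the current inputs."""
    return not (a or b)


def settle(s: bool, r: bool, q: bool, q_bar: bool, max_steps: int = 10):
    """Re-evaluate two cross-coupled NOR gates until the outputs stop changing.

    s, r     -- set and reset inputs
    q, q_bar -- the latch outputs before the inputs were applied
    Returns the stable (q, q_bar) pair.
    """
    for _ in range(max_steps):
        new_q = nor(r, q_bar)        # gate 1 sees reset and the other gate's output
        new_q_bar = nor(s, new_q)    # gate 2 sees set and gate 1's output
        if (new_q, new_q_bar) == (q, q_bar):
            break                    # outputs are stable
        q, q_bar = new_q, new_q_bar
    return q, q_bar


state = (False, True)                       # latch initially holds 0
state = settle(True, False, *state)         # pulse "set"
after_set = settle(False, False, *state)    # inputs released: the 1 is remembered
state = settle(False, True, *after_set)     # pulse "reset"
after_reset = settle(False, False, *state)  # inputs released: the 0 is remembered
print(after_set, after_reset)
```

With both inputs released, the outputs depend only on what was last written, which is exactly what "having memory" means here.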

The fundamental problem in modern computers isn't memory: it's moving data around. The speed of light in a vacuum lets information travel only about 6 to 10 cm in a single clock cycle at clock rates of 3 to 5 GHz, and the actual electrical propagation inside a processor is substantially slower. In fact, the governing factor in the size of an L1 cache is the time it takes to read a value from it. At the scale of supercomputers, the topology of the interconnect has major implications for the actual performance of HPC applications.
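The speed-of-light bound is easy to check with back-of-the-envelope arithmetic. The sketch below only computes the vacuum upper bound; the clock rates are example values chosen for illustration, and real on-chip signals travel considerably slower than this.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def light_distance_per_cycle_cm(clock_hz: float) -> float:
    """Distance light travels in a vacuum during one clock period, in cm."""
    period_s = 1.0 / clock_hz
    return SPEED_OF_LIGHT_M_PER_S * period_s * 100.0


for ghz in (1, 3, 5):  # example clock rates, not tied to any particular CPU
    cm = light_distance_per_cycle_cm(ghz * 1e9)
    print(f"{ghz} GHz: {cm:.1f} cm per cycle")
```

At 3 GHz this comes out to roughly 10 cm, and a round trip (request out, data back) halves the usable distance again.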

Saying that bringing memory closer is the determining factor in speed ignores the fact that the size of a memory affects the time it takes to access it. Innovation in CPUs has been about minimizing latency, essentially by developing better heuristics for guessing what will be needed next. GPUs innovate by not trying to minimize latency at all; instead they overprovision cores and rely on batched memory access (consequently, GPUs are not good at handling codes that rely on irregular memory access patterns).