
Folding@home hit 2.43 exaflops earlier this year [0]. I'm surprised massively distributed computing isn't being looked into with more fervor. It looks like it had somewhere around 670,000 GPUs running in parallel with ~1.4 million CPUs.

Users would need to be incentivised to install distributed computing software, but I think it has promise.

[0]: https://archive.vn/20200412111010/https://stats.foldingathom...




Distributed computing has high flops, but very low connection speeds.

LUMI has 200Gbit connections between nodes, or roughly 25GBytes/sec: faster than PCIe 3.0 x16 (15.8 GBps).
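For anyone checking the arithmetic, here is a quick sketch of the unit conversion. The PCIe figure assumes the standard PCIe 3.0 parameters (8 GT/s per lane, 128b/130b encoding); the function names are made up for illustration.

```python
def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert gigabits/s to gigabytes/s (8 bits per byte)."""
    return gbit_per_s / 8


def pcie3_gbyte_per_s(lanes: int) -> float:
    """Usable PCIe 3.0 bandwidth in GB/s.

    Assumes 8 GT/s per lane and 128b/130b line encoding.
    """
    return 8 * lanes * (128 / 130) / 8


interconnect = gbit_to_gbyte(200)  # 200 Gbit/s link -> 25.0 GB/s
pcie_x16 = pcie3_gbyte_per_s(16)   # roughly 15.75 GB/s

print(f"interconnect: {interconnect:.1f} GB/s")
print(f"PCIe 3.0 x16: {pcie_x16:.2f} GB/s")
print(f"ratio: {interconnect / pcie_x16:.2f}x")
```

So the node-to-node link has more raw bandwidth than the slot a GPU of that generation would sit in.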

In effect: supercomputers can share "remote memory" as if it were local (RDMA protocol). As such, you can treat the entire RAM-space as if it were unified (your 64-bit pointers can be unified across the entire supercomputer, your data-structures can be distributed and always accessed through a 200Gbit-pipeline).

--------

As it turns out: you need a very high I/O connection to truly sustain supercomputing workloads. A lot of these things turn out to be just crazy big matrix-multiplication problems that require a fair amount of coordination between all nodes.

You can't share a problem like that on distributed compute. At best, you can only share problems that fit on one machine (under 32 GB of RAM). In contrast, these supercomputers can work on problems with 100+ TB of shared RAM and 100,000+ TB of shared storage (such as simulating quantum effects). The shared storage is accessed at 2 TB/s, and is accelerated with flash SSD cache layers.
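To put rough numbers on that gap, here is a back-of-the-envelope sketch. The 2 TB/s figure is the one quoted above; the 100 Mbit/s home connection is purely an assumption for illustration, as are the variable names.

```python
TB = 1e12  # bytes in a (decimal) terabyte


def transfer_seconds(size_bytes: float, bytes_per_s: float) -> float:
    """Time to stream size_bytes at a sustained rate, ignoring latency."""
    return size_bytes / bytes_per_s


dataset = 100 * TB          # a 100 TB working set
shared_storage = 2 * TB     # 2 TB/s, the figure quoted above
home_link = 100e6 / 8       # ASSUMPTION: 100 Mbit/s home link = 12.5 MB/s

on_supercomputer = transfer_seconds(dataset, shared_storage)
over_home_link = transfer_seconds(dataset, home_link)

print(f"shared storage: {on_supercomputer:.0f} s")
print(f"home link:      {over_home_link / 86400:.1f} days")
```

Under those assumptions, one pass over the data takes under a minute on the supercomputer and months over a single volunteer's connection.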

---------

As some people say: the job of a supercomputer is to turn everything into an I/O constrained problem. As such, a HUGE amount of money is poured into making I/O as fast as reasonably possible. You don't want your PFlop-scale machine to be throttled by slow storage or communications.


Broken record time, but it's the latency that's most important, not the bandwidth, which you might get with generic Ethernet. There are also trade-offs in the fabric topology; I don't remember if Cray are still using Dragonfly.
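A minimal sketch of why latency dominates, using the usual latency-plus-bandwidth (alpha-beta) cost model. The latency and bandwidth numbers are illustrative assumptions, not measurements of any real fabric, and the names are hypothetical.

```python
def message_time(size_bytes: float, latency_s: float,
                 bandwidth_bytes_per_s: float) -> float:
    """Alpha-beta model: fixed per-message latency plus size / bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def latency_share(size_bytes: float, latency_s: float,
                  bandwidth_bytes_per_s: float) -> float:
    """Fraction of the total message time spent on latency."""
    total = message_time(size_bytes, latency_s, bandwidth_bytes_per_s)
    return latency_s / total


# ASSUMED illustrative numbers, not measurements:
HPC = dict(latency_s=2e-6, bandwidth_bytes_per_s=25e9)     # HPC fabric
WAN = dict(latency_s=20e-3, bandwidth_bytes_per_s=12.5e6)  # home internet

for size in (8, 64 * 1024, 1024 ** 3):
    t_hpc = message_time(size, **HPC)
    t_wan = message_time(size, **WAN)
    print(f"{size:>10} B  hpc={t_hpc:.2e}s "
          f"(latency {latency_share(size, **HPC):.0%})  wan={t_wan:.2e}s")
```

For the small messages that tightly coupled codes exchange constantly, nearly all of the time is latency, so extra bandwidth barely helps.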


Depends on the problem. Codes that run on supercomputers need low latency communication and access to a lot of data. Think physical simulations.


Anyone know, or have ballpark guesses for, what % of physics computing is done on big HPC systems as opposed to medium/small HPC (or non-HPC) clusters and single machines? E.g. counting by number of simulated experiments that get published or written up.



