
The index type for CSR need only index nodes. It's the offsets (one per row) that need to index edges.
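
A minimal sketch of that layout (NumPy, with illustrative names and sizes, not taken from any particular library):

    import numpy as np

    num_nodes = 1_000_000
    num_edges = 50_000_000

    # One entry per node (+1). Each value is a position in the edge array,
    # so it must be wide enough to count edges.
    offsets = np.zeros(num_nodes + 1, dtype=np.uint64)

    # One entry per edge. Each value names a node, so 32 bits are enough
    # as long as there are fewer than ~4.3 billion nodes.
    targets = np.zeros(num_edges, dtype=np.uint32)

    # Out-neighbors of node v: targets[offsets[v]:offsets[v+1]]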

The blog's portrayal of NVMe vs distributed memory is a bit off:

> With modern NVMe drives random seeks aren’t slow anymore, much faster than distributed network calls like you do when scaling the linked list-based graph.

Distributed memory machines have 1 microsecond off-node latency and 15-20 microsecond reductions (with thousands of nodes). These latencies have been pretty flat for the past 20 years in supercomputing, and have been widely available at cloud providers for the past few years. NVMe is not as fast as it's made out to be. https://twitter.com/SwiftOnSecurity/status/99079794835421184...

The other issue is that NVMe is more expensive than DRAM or HBM when you care about bandwidth (and graph/network analysis is almost always memory bandwidth limited, even on HBM). Suppose you need to do 1000 traversals of a 1 TB data structure. This takes about 10 seconds with 100 HBM devices ($1 at on-demand pricing, assuming amortized startup), but days of saturating NVMe bandwidth.
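
Back-of-envelope version of that comparison, assuming ~1 TB/s per HBM device and a single ~3.2 GB/s NVMe drive (rough assumptions, not measurements):

    total_bytes = 1000 * 1e12               # 1000 traversals of a 1 TB structure
    hbm_bw = 100 * 1e12                     # 100 HBM devices at ~1 TB/s each
    nvme_bw = 3.2e9                         # one fast NVMe drive

    print(total_bytes / hbm_bw)             # ~10 seconds
    print(total_bytes / nvme_bw / 86400)    # ~3.6 days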



Thanks for the correction. I was aware of the speed of distributed cluster reads, but I might be overestimating the speed of NVMe random seeks.

> The other issue is that NVMe is more expensive than DRAM

How are you calculating that?

When I upgraded my home server the other week, 16GB of DDR3-1600 cost nearly as much as 1TB of decently fast NVMe (HP SX920).

Do you still think an on-disk mmap'ed CSR graph library would be useful for medium-scale use cases given modern SSDs? Or do you think such latencies kill any use?
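
Concretely, I'm picturing something roughly like this (hypothetical file names, numpy.memmap, assuming the offset and target arrays were already written to disk):

    import numpy as np

    # The OS pages these in from the SSD on demand.
    offsets = np.memmap("graph.offsets", dtype=np.uint64, mode="r")
    targets = np.memmap("graph.targets", dtype=np.uint32, mode="r")

    def neighbors(v):
        # A random access here may fault a page in from disk.
        return targets[offsets[v]:offsets[v + 1]]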


Unless your data is at rest, you're paying for bandwidth, not capacity. SRAM > HBM > DRAM > NVMe when you're buying bandwidth. Seriously, given a low-latency network, the most cost-effective way to run some simulations and graph algorithms is out of L3 cache on EPYC (where mid-tier parts have over 1 TB/s at a fraction of the energy and cost of an A100/MI100).

Your NVMe quotes 3.2 GB/s, versus 25 GB/s (theoretical) for a stick of DDR4-3200 (which is in the same price range). A standard 2-socket EPYC server can give you 300 GB/s from DRAM, but you'd need several servers packed with NVMe to provide that bandwidth.
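
Rough math behind that, using the same per-device numbers (assumptions, not vendor specs):

    dram_bw = 300e9             # ~300 GB/s, 2-socket EPYC server
    nvme_bw = 3.2e9             # ~3.2 GB/s per drive
    print(dram_bw / nvme_bw)    # ~94 NVMe drives to match one server's DRAM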

The rationale for persistent storage (like NVMe) as an algorithmic device is either (a) you need it to be persistent or (b) you have physical or algorithmic constraints that prevent you from using more parallelism and you're willing to pay 5-10x for the privilege of executing from NVMe.


When you say "you're paying for XXX", do you mean cloud prices (in terms of $ to complete a similar job)?

I was referring to running it on your own metal, but your argument might hold there.

Obviously from a cloud provider I'd just take a massive pile of RAM rather than NVMe drives.


Cloud pricing is competitive and thus a good proxy for relative on-prem costs if you keep the on-prem hardware busy. If you want to buy the capability to run the job on-prem, but the hardware sits idle most of the time, then NVMe might be optimal. However, you should be aware that those constraints greatly increase the cost of the job itself.


Yes, I had something mixed up here - you are completely right about CSR matrices; they should not use more memory than an adjacency list.



