I don't know much about graph neural networks although it is a topic that I want to study in the next few months.
But what bothered me in your article is what you wrote about graph data structures.
NetworkX is indeed very slow, and this is due to two facts:
- NetworkX is a pure Python implementation and does not rely on routines written in a faster language like C.
- It uses dictionaries to represent graphs, which may have some advantages when mutating a graph, but they of course have much worse locality than an adjacency list or some sparse matrix format (a rough sketch of the difference is below).
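To make the locality point concrete, here is a minimal sketch (ordinary NetworkX and SciPy calls on a made-up toy edge list) contrasting the dict-of-dicts adjacency NetworkX exposes with a contiguous CSR matrix built from the same edges:

```python
import networkx as nx
import numpy as np
from scipy.sparse import csr_matrix

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# NetworkX: adjacency is a dict of dicts, so each neighbor lookup
# chases pointers through hash tables scattered around the heap.
G = nx.Graph(edges)
print(dict(G.adj[2]))                        # {0: {}, 1: {}, 3: {}}

# CSR: the same structure packed into contiguous arrays, so scanning
# a row (one vertex's neighborhood) is cache friendly.
row, col = np.array(edges).T
n = 4
A = csr_matrix((np.ones(len(edges)), (row, col)), shape=(n, n))
A = (A + A.T).tocsr()                        # symmetrize for an undirected graph
print(A.indices[A.indptr[2]:A.indptr[3]])    # neighbors of vertex 2 -> [0 1 3]
```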
But even for Python there are much faster libraries, such as igraph. The underlying data structure in igraph is an edge list.
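For instance, with python-igraph (same toy edges; the exact ordering of the neighbor list may differ):

```python
import igraph as ig

# igraph keeps the graph as a flat edge list in C, so building and
# querying it avoids Python-level per-node dictionaries entirely.
g = ig.Graph(edges=[(0, 1), (0, 2), (1, 2), (2, 3)])
print(g.neighbors(2))    # e.g. [0, 1, 3]
print(g.degree())        # [2, 2, 3, 1]
```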
A lot of single-core graph libraries use an adjacency list internally, and while it is true that the list of neighbor indices for each vertex can live somewhere arbitrary in memory, they usually do not behave like a linked list unless your graph is really sparse.
One of the most used operations in graph algorithms is to iterate over the neighbors of a vertex, and for this, adjacency lists are very good.
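To be concrete about what "adjacency list" means here, a plain Python sketch of the arrays-of-arrays layout (not a linked structure):

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Each vertex owns one contiguous block of neighbor IDs.
adj = [[] for _ in range(n)]
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)
adj = [np.asarray(nbrs) for nbrs in adj]   # pack each neighbor list contiguously

for w in adj[2]:       # the hot loop in most graph algorithms: a linear scan
    pass               # visit / relax / accumulate over neighbor w
```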
They also have a small advantage over CSR matrices for adding or removing edges, and they might use slightly less memory, as their index type only needs to be able to index all vertices and not all edges, so the indices need half the space, which is better for caches.
The index type for CSR need only index nodes. It's the offsets (one per row) that need to index edges.
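In SciPy terms, with a toy directed graph (nothing here is specific to any library beyond scipy.sparse):

```python
import numpy as np
from scipy.sparse import csr_matrix

edges = np.array([(0, 1), (0, 2), (1, 2), (2, 3)])
n = 4
A = csr_matrix((np.ones(len(edges)), (edges[:, 0], edges[:, 1])), shape=(n, n))

# indptr: n + 1 row offsets into the edge arrays -> must be able to index edges.
# indices: one entry per edge holding a column (vertex) ID -> vertex-sized type.
print(A.indptr)     # [0 2 3 4 4]
print(A.indices)    # [1 2 2 3]
print(A.indptr.dtype, A.indices.dtype)   # typically int32 for a graph this small
```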
The blog's portrayal of NVMe vs distributed memory is a bit off:
> With modern NVMe drives random seeks aren’t slow anymore, much faster than distributed network calls like you do when scaling the linked list-based graph.
Distributed-memory machines have ~1 microsecond off-node latency and 15-20 microsecond reductions (across thousands of nodes). These latencies have been pretty flat for the past 20 years in supercomputing, and have been widely available at cloud providers for the past few years. NVMe is not as fast as it's made out to be. https://twitter.com/SwiftOnSecurity/status/99079794835421184...
The other issue is that NVMe is more expensive than DRAM or HBM when you care about bandwidth (and graph/network analysis is almost always memory bandwidth limited, even on HBM). Suppose you need to do 1000 traversals of a 1 TB data structure. This takes about 10 seconds with 100 HBM devices ($1 at on-demand pricing, assuming amortized startup), but days of saturating NVMe bandwidth.
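Roughly the arithmetic behind that comparison (assuming ballpark figures of ~1 TB/s per HBM device and ~3.2 GB/s per NVMe drive, not measurements):

```python
# Back-of-the-envelope bandwidth arithmetic.
total_bytes = 1000 * 1e12            # 1000 traversals of a 1 TB structure = 1 PB read

hbm_bw  = 100 * 1e12                 # 100 HBM devices at ~1 TB/s each
nvme_bw = 3.2e9                      # a single NVMe drive at ~3.2 GB/s

print(total_bytes / hbm_bw)          # ~10 seconds
print(total_bytes / nvme_bw / 86400) # ~3.6 days on one saturated drive
```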
Thanks for the correction. I was aware of the speed of distributed cluster reads, but I might be overestimating the speed of NVMe random seeks.
> The other issue is that NVMe is more expensive than DRAM
How are you calculating that?
When I upgraded my home server the other week, 16GB of DDR3-1600 cost nearly as much as 1TB of decently fast NVMe (HP SX920).
Would you still think an on-disk mmap'ed CSR graph library would be useful for medium-scale use cases given modern SSDs? Or do you think such latencies kill any use?
Unless your data is at rest, you're paying for bandwidth, not capacity. SRAM > HBM > DRAM > NVMe when you're buying bandwidth. Seriously, when we have a low-latency network, the most cost-effective way to run some simulations and graph algorithms is out of L3 on EPYC (where mid-tier parts have over 1 TB/s at a fraction of the energy and cost of an A100/MI100).
Your NVMe quotes 3.2 GB/s, versus 25 GB/s (theoretical) for a stick of DDR4-3200 (which is in the same price range). A standard 2-socket EPYC server can give you 300 GB/s from DRAM, but you'd need several servers packed with NVMe to provide that bandwidth.
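Taking those headline numbers at face value:

```python
# How many of those NVMe drives match one 2-socket EPYC's DRAM bandwidth?
dram_bw = 300e9    # ~300 GB/s from DRAM on a 2-socket EPYC server
nvme_bw = 3.2e9    # ~3.2 GB/s per drive (the drive quoted above)
print(dram_bw / nvme_bw)   # ~94 drives, i.e. several fully packed servers
```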
The rationale for persistent storage (like NVMe) as an algorithmic device is either (a) you need it to be persistent or (b) you have physical or algorithmic constraints that prevent you from using more parallelism and you're willing to pay 5-10x for the privilege of executing from NVMe.
Cloud pricing is competitive and thus a good proxy for relative on-prem costs if you keep the on-prem hardware busy. If you want to buy the capability to run the job on-prem, but the hardware sits idle most of the time, then NVMe might be optimal. However, you should be aware that these constraints have greatly increased the cost of the job itself.
I've found sparse formats good for many-column data (10K+ columns). However, for Graphistry's graph workloads, we do most SciPy-size graphs at much higher speeds via cudf + cugraph. Similar APIs, more speed.
Ex: preprocess many cols with scikit sparse (genomes, many-featured fraud, ...) -> cudf (= GPU dataframes) for most of the workload -> infer relationships with GPU UMAP, then GPU k-NN -> Graphistry for visual analysis.
We're starting to experiment with bigger-than-memory (TB+) scale via the GPU-enabled Dask libs, and I definitely recommend trying them. If you're interested in adding visual analytics here, these will be hitting our free + self-hosted tiers soon; drop a contact method if you'd like early access. Exciting times!
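A rough sketch of the middle of that pipeline (cudf + cugraph; the file name and column names are placeholders, not our production code):

```python
import cudf
import cugraph

# Edge list straight into GPU memory; "edges.csv" and its columns are placeholders.
edges = cudf.read_csv("edges.csv")            # columns: src, dst

G = cugraph.Graph()
G.from_cudf_edgelist(edges, source="src", destination="dst")

# Same kind of call you'd make in networkx, but executed on the GPU.
pr = cugraph.pagerank(G)                      # cudf DataFrame: vertex, pagerank
print(pr.sort_values("pagerank", ascending=False).head(10))
```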
Anyone who is serious about graph analysis in Python using linear algebra should check out GraphBLAS (graphblas.org). The inclusion of semirings makes things like SSSP and BFS extremely elegant on sparse adjacency matrices.
The main implementation is SuiteSparse::GraphBLAS, a C library which has two Python bindings (search for grblas or pygraphblas). Disclosure: I'm the author of grblas. Both are available on conda-forge for easy installation.
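For flavor, here is the semiring idea itself; this is a dense NumPy imitation of a (min, +) vector-matrix product, not the grblas or pygraphblas API, just to show why SSSP falls out of swapping the semiring:

```python
import numpy as np

INF = np.inf
# Weighted adjacency matrix; A[i, j] = weight of edge i -> j, inf if absent.
A = np.array([[INF, 2.0, 5.0, INF],
              [INF, INF, 1.0, 4.0],
              [INF, INF, INF, 1.0],
              [INF, INF, INF, INF]])

def min_plus_vxm(d, A):
    # (min, +) semiring: "multiply" is +, "add" is min.
    return np.min(d[:, None] + A, axis=0)

d = np.array([0.0, INF, INF, INF])      # distances from source vertex 0
for _ in range(len(d) - 1):             # Bellman-Ford style relaxation
    d = np.minimum(d, min_plus_vxm(d, A))
print(d)                                # [0. 2. 3. 4.]
```

In GraphBLAS the same relaxation is one masked mxv per step over a sparse adjacency matrix, which is what makes SSSP and BFS so compact there.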
NetworkX isn't a bad library, nor is it only suitable for "babies". Yeesh. Actual people work on this stuff, you know, and they may have different goals, requirements, etc than you. This whole post reeks of someone who's so in love with being a maverick speaking uncomfortable truths that they've lost sight of this human element.
If you're offended by (say) research papers that fail to break enough new ground to satisfy you, sorry about your luck, but again, not everyone's optimizing the same function as you. You'll probably do better bringing people around when you're not implying (or outright saying) their work is shit.
I have to say I agree. NX is totally fine for the applications it was built for. I appreciate the simple API and the flexibility of using pure Python. There's a reason it's so popular.
I'm not an expert in graph neural networks, so I can't really say much about the novelty of Node2Vec. But I do think it's often misguided to judge scientific work as trivial or incremental in retrospect, especially in a relatively young field like deep learning, where four years (Node2Vec is from 2016) is a long time.
I never said NX is bad, but it is for "baby" graphs. NX can't scale past a few hundred thousand nodes.
I appreciate NX's place in the ecosystem, but its implementation leaves a large gap to be filled by an intermediate library: a Pandas analogue for graphs.
Has the author looked at how GNNs are used in say predicting molecular properties[1] or molecule generation[2]? GNNs are the state of the art on those tasks and those problems are certainly much harder than the Cora/Citeseer/etc semisupervised node label prediction tasks.
I need to handle (for work) a graph with 40 million nodes and more than 130 million edges.
As expected, networkx couldn't handle more than a million nodes, so I had to search for Python libs that could handle that much data.
Having a graph with several million nodes isn't just some edge case; social graphs, for instance, grow way faster than anyone could expect.
So I totally agree that there is a gap between popular graph libs, which handle very small graphs, and libs for real-life use, which need to handle way bigger graphs.
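At that scale even the container matters; a sketch of how I'd load such an edge list into a CSR matrix with SciPy ("edges.bin" and the dtypes are illustrative):

```python
import numpy as np
from scipy.sparse import coo_matrix

# ~130M edges as two int32 columns fit comfortably in RAM (~1 GB),
# unlike a dict-of-dicts representation of the same graph.
edges = np.fromfile("edges.bin", dtype=np.int32).reshape(-1, 2)
n = int(edges.max()) + 1

A = coo_matrix(
    (np.ones(len(edges), dtype=np.int8), (edges[:, 0], edges[:, 1])),
    shape=(n, n),
).tocsr()
print(A.shape, A.nnz)
```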
Btw, thanks for the work you've done with 'nodevectors'.
PS: I'm not criticizing either networkx which is very handy and quite good when prototyping a solution.
nodevectors implements many more algorithms than fastnode2vec, which focuses solely on node2vec.
I'm pretty sure fastnode2vec is the fastest node2vec implementation because it uses the CSR format and JIT compilation, and it also supports multiprocessing. The sampling algorithm has been improved compared to the paper (and to all other implementations) and allows truly linear memory consumption.
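To illustrate why CSR plus JIT makes walk sampling fast, here is a generic sketch of uniform random walks over a CSR graph with Numba; it is not fastnode2vec's actual code, which as I understand it also handles the p/q biases:

```python
import numpy as np
from numba import njit

@njit
def random_walks(indptr, indices, starts, walk_length, seed=0):
    """Uniform random walks over a CSR graph given by (indptr, indices)."""
    np.random.seed(seed)
    walks = np.empty((len(starts), walk_length), dtype=np.int64)
    for i, v in enumerate(starts):
        walks[i, 0] = v
        for t in range(1, walk_length):
            lo, hi = indptr[v], indptr[v + 1]
            if lo == hi:              # dead end: stay put
                walks[i, t] = v
                continue
            v = indices[lo + np.random.randint(0, hi - lo)]
            walks[i, t] = v
    return walks
```

With `indptr` and `indices` taken straight from a `scipy.sparse.csr_matrix`, the inner loop is pure array indexing, which is exactly what Numba compiles well.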
On the other hand, nodevectors has a lot of very cool methods like ProNE, so you should definitely try it on your data. However, the original ProNE code is probably faster, as it is written in C++ and uses multiple cores, while nodevectors just uses the installed BLAS (you can probably get a massive speedup by installing OpenBLAS).
I suspect we will not see a revolution in deep learning from GNNs without a corresponding advance in hardware (FPGAs?) or optimization methods (HSIC? HTMs?).
GNNs trained by backprop are making many of the same mistakes that LSTMs did: they solve one important problem (exploding/vanishing gradients), but introduce a bunch of hyperparameters that bring along their own set of problems. GRUs are successful, in my opinion, because they remove some of those tunable parameters.
Of course, as the post suggested, not being able to tune something often loses out against a more tuned and curated solution, and GNNs, being the newest iteration, have tons of parameters to tweak. In the end, do we get a better, ultimately more generalizable solution? Or do we just get more hyperparameters to tune, and spend more time and money for a modest, unrepeatable gain?
By no means am I an expert in deep learning, but lately I've been considering graph NNs the snake oil of neural networks. There are no impressive results on any common tasks, and that picture is supported by a recent paper claiming transformers "contain a graph network inside".
In my opinion there aren't many deep learning applications that are fundamentally snake oil; it's more that people who don't quite understand the limitations and advantages of particular methods end up being (witting or unwitting) snake-oil salespeople.
Deep graph NNs are very useful for lots of data that are inherently graph-structured (another commenter above mentioned chemistry applications, and there are lots of other examples). Whether or not they give SOTA results on common datasets for the time being, being able to work directly with graph-structured data is quite appealing in many cases.
The post can't be accessed; I get an Internal Server Error. Or is it just my ISP? I also often mix these up with other representation learning methods such as graph kernels: https://en.wikipedia.org/wiki/Graph_kernel
This beauty got an outstanding paper award at COLING 2020.
GNNs are not applied to "common tasks" because they are primarily suited for environments where graph structure is present. In this case, predicting how the Indian congress will vote.
A bunch of fully connected layers also "contains a CNN inside", but empirically CNNs lead to better performance on image classification tasks. When you can cut away an enormous, and mostly irrelevant part of the parameter space, that has a lot of value.
You should look at label propagation. Combining Label Propagation and Simple Models Out-performs Graph Neural Networks: https://arxiv.org/abs/2010.13993
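The "simple" part really is simple; here is a minimal sketch of a generic label-spreading iteration of the kind that paper builds on (symmetric-normalized propagation, dense NumPy for brevity, not the exact Correct & Smooth pipeline):

```python
import numpy as np

def label_propagation(A, Y, alpha=0.9, iters=50):
    """Propagate label distributions Y (n x classes) over adjacency A (n x n)."""
    d = A.sum(axis=1)
    d[d == 0] = 1.0
    Dinv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = Dinv_sqrt @ A @ Dinv_sqrt            # symmetric normalization
    F = Y.copy()
    for _ in range(iters):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F

# Toy example: a path graph 0-1-2-3, labels known for the endpoints only.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Y = np.zeros((4, 2))
Y[0, 0] = 1.0      # vertex 0 is class 0
Y[3, 1] = 1.0      # vertex 3 is class 1
print(label_propagation(A, Y).argmax(axis=1))   # [0 0 1 1]
```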
This is like someone going "I installed NLTK and it's slow and doesn't support transformers or word2vec!". In fact, I think the author of the article may have actually done exactly that.