This is really well executed. I am always surprised that more graph engines are not designed around the internal architecture used here; similar designs also work well in distributed environments. It is close to a graph engine I wrote for a supercomputer in 2009, particularly in how the parallelization and scale-out work, and that engine performed well. The networking overhead is significant, but it can be partially hidden with aggressive latency-hiding techniques.
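For anyone curious what I mean by latency hiding, here is a minimal sketch, not the actual 2009 engine. fetch_remote_edges and process_partition are hypothetical stand-ins for whatever your engine's communication and compute steps are; the point is just to issue the fetch for the next partition before processing the current one, so the network round trip overlaps with useful work:

    from concurrent.futures import ThreadPoolExecutor

    def run_superstep(partitions, fetch_remote_edges, process_partition):
        if not partitions:
            return
        with ThreadPoolExecutor(max_workers=1) as pool:
            # Kick off the first fetch before any compute happens.
            pending = pool.submit(fetch_remote_edges, partitions[0])
            for i, part in enumerate(partitions):
                # Blocks only if the network is slower than the compute.
                edges = pending.result()
                if i + 1 < len(partitions):
                    # Issue the next fetch so it runs during compute.
                    pending = pool.submit(fetch_remote_edges, partitions[i + 1])
                process_partition(part, edges)

With deep enough pipelining you mostly pay for whichever of the two is the bottleneck, not their sum.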
The article demonstrates how efficiently a very large graph can be processed on a single machine, but the architecture has analogues that work nicely on large compute clusters too. It is just a good way to process graphs.
Stupid question by someone who finds this very interesting but lacks some background information: What kind of processing are we talking about? What do you do with the graph? Searching? Finding paths? Mutating contents? Changing the graph structure? Aggregating data about the graph?
The purpose seems to be presupposed, so I guess it's some common use case that I should already know about.
The major motivation for Giraph (Facebook) and GraphJet (Twitter) was the ability to do graph-based recommendations such as friend-of-friend scoring (Facebook) or Who To Follow (Twitter): basically, looking at the followers or friends of each of your followers or friends and using some kind of scoring algorithm to predict which users you are most likely to be interested in.
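A toy version of friend-of-friend scoring, to make the idea concrete (my simplification, not Facebook's or Twitter's actual ranking; real systems weight edges, sample, and personalize): count how many mutual friends vouch for each candidate and rank by that count.

    from collections import Counter

    def friend_of_friend_scores(graph, user):
        # graph: dict mapping user -> set of friends (toy adjacency list)
        direct = graph[user]
        scores = Counter()
        for friend in direct:
            for fof in graph[friend]:
                if fof != user and fof not in direct:
                    scores[fof] += 1  # one mutual friend vouches for fof
        return scores.most_common()  # best candidates first

    g = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "d"}, "d": {"b", "c"}}
    print(friend_of_friend_scores(g, "a"))  # [('d', 2)]

The expensive part is that the candidate set is two hops out, which is why these systems care so much about fast neighborhood traversal.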
Another common example is doing some kind of clustering (community detection) of vertices, for example labeling users in the social graph by likely political affiliation or likely hobbies. These labels can then be used as features for advertising recommendations.
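A bare-bones example of one such approach is label propagation (my choice of algorithm for illustration; production systems use far more elaborate methods): every vertex starts in its own community and repeatedly adopts the most common label among its neighbors.

    import random

    def label_propagation(graph, rounds=10, seed=0):
        # graph: dict mapping vertex -> list of neighbors
        rng = random.Random(seed)
        labels = {v: v for v in graph}  # each vertex starts as its own community
        for _ in range(rounds):
            order = list(graph)
            rng.shuffle(order)  # random order avoids update artifacts
            changed = False
            for v in order:
                if not graph[v]:
                    continue
                counts = {}
                for n in graph[v]:
                    counts[labels[n]] = counts.get(labels[n], 0) + 1
                best = max(counts, key=counts.get)
                if labels[v] != best:
                    labels[v] = best  # adopt the neighborhood's majority label
                    changed = True
            if not changed:
                break  # converged
        return labels

Vertices that end up with the same label form a community, and the labels can feed directly into downstream feature pipelines.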
This raises the question of whether it scales linearly. The trillion-edge graph here ran on Xeon Phi (Knights Corner) and Intel SSDs. How well would it do on Knights Landing, the current generation? Or on Knights Hill, the next generation on 10nm? And with Optane memory instead of SSDs?
Could we handle Facebook's edge graph on a single machine?
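Back-of-envelope: Facebook's own Giraph paper is literally titled "One Trillion Edges", so take roughly a trillion edges at 8 bytes per edge in a packed adjacency format (the 8-byte figure is my assumption):

    edges = 1_000_000_000_000   # ~Facebook scale, per their Giraph paper
    bytes_per_edge = 8          # assumed packed representation
    print(edges * bytes_per_edge / 1e12, "TB")  # -> 8.0 TB

That is on the order of what a handful of NVMe SSDs can hold, so at least capacity-wise it does not seem out of reach for one box.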