This is really well executed. I am always surprised that more graph engines are not designed around the internal architecture used here; similar designs also work well in distributed environments. It is close to a graph engine I wrote for a supercomputer in 2009, particularly in how the parallelization and scale-out work, and that engine performed well. The networking overhead is significant, but it can be partially hidden with aggressive latency-hiding techniques.
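For anyone curious what I mean by latency hiding, here is a minimal sketch, not the actual 2009 engine. fetch_remote_edges and process_partition are hypothetical stand-ins for whatever your engine's communication and compute steps are; the point is just to issue the fetch for the next partition before processing the current one, so the network round trip overlaps with useful work:

    from concurrent.futures import ThreadPoolExecutor

    def run_superstep(partitions, fetch_remote_edges, process_partition):
        if not partitions:
            return
        with ThreadPoolExecutor(max_workers=1) as pool:
            # Kick off the first fetch before any compute happens.
            pending = pool.submit(fetch_remote_edges, partitions[0])
            for i, part in enumerate(partitions):
                # Blocks only if the network is slower than the compute.
                edges = pending.result()
                if i + 1 < len(partitions):
                    # Issue the next fetch so it runs during compute.
                    pending = pool.submit(fetch_remote_edges, partitions[i + 1])
                process_partition(part, edges)

With deep enough pipelining you mostly pay for whichever of the two is the bottleneck, not their sum.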
The article demonstrates how efficiently a very large graph can be processed on a single machine, but the architecture has analogues that work nicely on large compute clusters too. It is just a good way to process graphs.
Stupid question by someone who finds this very interesting but lacks some background information: What kind of processing are we talking about? What do you do with the graph? Searching? Finding paths? Mutating contents? Changing the graph structure? Aggregating data about the graph?
The purpose seems to be presupposed, so I guess it's some common use case that I should already know about.
The major motivation for Giraph (Facebook) and GraphJet (Twitter) was the ability to do graph-based recommendations such as friend-of-friend scoring (Facebook) or Who To Follow (Twitter): basically, looking at the followers or friends of each of your followers or friends and using some kind of scoring algorithm to predict which users you are most likely to be interested in.
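A toy version of friend-of-friend scoring, to make the idea concrete (my simplification, not Facebook's or Twitter's actual ranking; real systems weight edges, sample, and personalize): count how many mutual friends vouch for each candidate and rank by that count.

    from collections import Counter

    def friend_of_friend_scores(graph, user):
        # graph: dict mapping user -> set of friends (toy adjacency list)
        direct = graph[user]
        scores = Counter()
        for friend in direct:
            for fof in graph[friend]:
                if fof != user and fof not in direct:
                    scores[fof] += 1  # one mutual friend vouches for fof
        return scores.most_common()  # best candidates first

    g = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "d"}, "d": {"b", "c"}}
    print(friend_of_friend_scores(g, "a"))  # [('d', 2)]

The expensive part is that the candidate set is two hops out, which is why these systems care so much about fast neighborhood traversal.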
Another common example is doing some kind of clustering (community detection) of vertices, for example labeling users in the social graph by likely political affiliation or likely hobbies. These labels can then be used as features for advertising recommendations.
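A bare-bones example of one such approach is label propagation (my choice of algorithm for illustration; production systems use far more elaborate methods): every vertex starts in its own community and repeatedly adopts the most common label among its neighbors.

    import random

    def label_propagation(graph, rounds=10, seed=0):
        # graph: dict mapping vertex -> list of neighbors
        rng = random.Random(seed)
        labels = {v: v for v in graph}  # each vertex starts as its own community
        for _ in range(rounds):
            order = list(graph)
            rng.shuffle(order)  # random order avoids update artifacts
            changed = False
            for v in order:
                if not graph[v]:
                    continue
                counts = {}
                for n in graph[v]:
                    counts[labels[n]] = counts.get(labels[n], 0) + 1
                best = max(counts, key=counts.get)
                if labels[v] != best:
                    labels[v] = best  # adopt the neighborhood's majority label
                    changed = True
            if not changed:
                break  # converged
        return labels

Vertices that end up with the same label form a community, and the labels can feed directly into downstream feature pipelines.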
This raises the question of whether it scales linearly. The trillion-edge graph here ran on Xeon Phi (Knights Corner) and Intel SSDs. How well would it do on Knights Landing, the current generation? Or on Knights Hill, the next generation on 10nm? And with Optane memory instead of SSDs?
Could we handle Facebook's edge graph on a single machine?
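Back-of-envelope: Facebook's own Giraph paper is literally titled "One Trillion Edges", so take roughly a trillion edges at 8 bytes per edge in a packed adjacency format (the 8-byte figure is my assumption):

    edges = 1_000_000_000_000   # ~Facebook scale, per their Giraph paper
    bytes_per_edge = 8          # assumed packed representation
    print(edges * bytes_per_edge / 1e12, "TB")  # -> 8.0 TB

That is on the order of what a handful of NVMe SSDs can hold, so at least capacity-wise it does not seem out of reach for one box.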