My concern was that the blog post ran on real-life, genuinely big data and outperformed everything on the market. That got me thinking: are we still not at the point on the problem-size/computing-power curve where distributed computing is justified?
Of course, distributed makes sense in a lot of places, especially on the web and wherever the problem is embarrassingly parallel, but running graph algorithms is a big chunk of the market, and it seems that for most firms a laptop will do.
I would wager that an appropriately-sized Spark, GraphLab, or Naiad cluster could outperform a laptop when running PageRank on this graph. Distributed graph processing research got stuck in a rut when the gold standards for evaluation were the Twitter crawl [1] and uk-2007-05 web graph [2]. Now that the Common Crawl has made a much bigger graph available, I hope that we'll see it used in more performance evaluations—with a COST baseline, of course!
I'd also wager that those Spark, GraphLab, or Naiad programs would be far from optimal, by at least an order of magnitude. My ideal would be to see someone attack the distributed graph processing problem with the same eye for systems performance issues that the TritonSort folks brought to the distributed sort problem [3] and MapReduce [4].
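For anyone unfamiliar with what a "COST baseline" means in practice, here is a rough sketch of one: a single-threaded PageRank over an in-memory edge list. This is my own toy illustration in Rust (function names, parameters, and the tiny graph are mine), not the blog's actual code:

    // A minimal sketch of a single-threaded, COST-style PageRank baseline.
    // Assumptions: the edge list of (src, dst) pairs fits in memory, and we
    // run a fixed number of iterations with the usual 0.85 damping factor.
    fn pagerank(edges: &[(u32, u32)], num_nodes: usize, iterations: usize) -> Vec<f32> {
        let alpha = 0.85f32;

        // Out-degree of each node, needed to split a node's rank among its edges.
        let mut degree = vec![0u32; num_nodes];
        for &(src, _) in edges {
            degree[src as usize] += 1;
        }

        let mut ranks = vec![1.0f32 / num_nodes as f32; num_nodes];
        for _ in 0..iterations {
            let mut next = vec![(1.0 - alpha) / num_nodes as f32; num_nodes];
            for &(src, dst) in edges {
                // Each node sends an equal share of its damped rank along each edge.
                next[dst as usize] += alpha * ranks[src as usize] / degree[src as usize] as f32;
            }
            ranks = next;
        }
        ranks
    }

    fn main() {
        // Tiny illustrative graph; a real run would stream edges from disk.
        let edges = vec![(0u32, 1u32), (1, 2), (2, 0), (2, 1)];
        let ranks = pagerank(&edges, 3, 20);
        println!("{:?}", ranks);
    }

The whole thing is a few dozen lines that scan memory sequentially, which is exactly why it is so hard for a cluster to beat on any graph that fits on one machine.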
Again, thanks for sharing.