
Anyone use Cayley in prod? An old job used Neo4j, and the graph concept was great for specific use cases. As a lightweight graph store, Cayley was really exciting when it came out, but I haven't had a need for it since I left that job. It strikes me as really well made, and I'd love to hear any war stories.


Tried to use it in production a couple of years ago hosting a mirror copy of Freebase with mixed results:

- There were a couple of issues loading the data, which we fixed and contributed back as patches

- Loading the data was really slow, and it got slower every time a new entry was added. (Loading the full Freebase dump took about a week on a very beefy machine with SSDs, using the LevelDB backend.)

- The queries were also relatively slow. Without going too much into detail: we were using the data to analyze texts and extract entities and the relationships between them, and even with parallelized queries, response times averaged between 0.1 and 1 second depending on complexity. We solved the issue by implementing a robust caching layer in front of it and carefully planning the queries.
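The commenter doesn't share their actual implementation, but the pattern they describe (a cache in front of the graph store, with misses fanned out in parallel) can be sketched in a few lines of Python. Everything here is hypothetical: `backend_query`, the TTL, and the worker count are illustrative placeholders, not Cayley APIs.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class QueryCache:
    """Tiny TTL cache keyed by the query string (illustrative only)."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}  # query -> (expiry, result)

    def get(self, query):
        entry = self._store.get(query)
        if entry and entry[0] > time.monotonic():
            return entry[1]
        return None

    def put(self, query, result):
        self._store[query] = (time.monotonic() + self.ttl, result)


def run_entity_queries(queries, backend_query, cache):
    """Serve each query from the cache when possible; send the
    misses to the (slow) graph store in parallel, then cache them."""
    results, misses = {}, []
    for q in queries:
        hit = cache.get(q)
        if hit is not None:
            results[q] = hit
        else:
            misses.append(q)
    with ThreadPoolExecutor(max_workers=8) as pool:
        for q, res in zip(misses, pool.map(backend_query, misses)):
            cache.put(q, res)
            results[q] = res
    return results
```

With sub-second queries and heavy repetition across documents (the same entities recur constantly in text analysis), even a simple TTL cache like this can absorb most of the backend load.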

- In general, it was stable and performant enough for a backend service. But we were really pushing the envelope of what it could do.

All in all, I would say I was happy with it. In comparison, a year earlier I had tried Neo4j in a similar role and gave up after 2 weeks because I wasn't even able to load part of the dataset without crashing on similar hardware.


What's the best way to load Freebase in 2017? Cayley with Postgres storage? Or some other RDF/graph DB? Or Elasticsearch? Or dump it into Postgres/MySQL? I'm not interested in complex queries, just simple queries that execute reasonably fast.


We have it loaded on a Dgraph instance, in case you want to play around with it: https://play.dgraph.io


The movie subset, or the whole Freebase?

The Freebase film data has only 21M facts; the full Freebase has 1.9 billion.


This is just the film data.


I would be interested if Dgraph can handle the full Freebase dataset. (250 GB RDF)

How long does it load? What's the avg query response for very simple searches (like who is the US president)?


(Dgraph author) That's a good point. I think I'll load one instance up with the entire Freebase data, run it on freebase.dgraph.io, and blog about the hows and whys. Expect that in the next couple of weeks.


How is Dgraph licensed? I see both Apache and AGPL in GitHub.


Dgraph follows MongoDB-style licensing. The clients are all Apache-licensed, and the server code is AGPL. This doesn't affect anyone using Dgraph for commercial purposes; but if they make changes to the server code, they'll have to release them under the AGPL. Blog post here: https://open.dgraph.io/post/licensing/


Looking at the commit, they switched from the Apache license to the AGPL.


Benchmarks for loading freebase data in Cayley vs Dgraph. https://discuss.dgraph.io/t/differences-between-dgraph-and-c...

Dgraph was 10X faster.


What's up with this toy dataset? The movie subset is just 21 million facts (21million.rdf.gz).

Can someone run the benchmark for the real Freebase (1.9 billion facts)?

Also, LevelDB/Bolt is not suitable for this; better to use MongoDB, Postgres, or MySQL as the Cayley data store.
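For reference, pointing Cayley at a SQL backend instead of LevelDB/Bolt looked roughly like this at the time — treat the exact flag names and the connection string as assumptions and check them against the docs for your Cayley version:

```shell
# Initialize the quad store schema in Postgres (connection string is a placeholder)
cayley init --db=sql --dbpath="postgres://user:pass@localhost/cayley?sslmode=disable"

# Bulk-load an N-Quads dump into that backend
cayley load --db=sql --dbpath="postgres://user:pass@localhost/cayley?sslmode=disable" \
    --quads=freebase.nq.gz
```

The trade-off the commenter is pointing at: the embedded key-value backends are convenient for small graphs, but a server-grade store tends to hold up better at billions of quads.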


Expect freebase.dgraph.io in a couple of weeks.



