Does anyone have a good list of books, tutorials, howtos and whatnot about graph databases in general, preferably with python examples (but any language is good really...)?
I've used graphdbs in the past but a nice collection of patterns and best practices would be nice - upping my game on this topic is a current interest of mine!
I've been meaning to update the Bulbs docs for Bulbs/Titan. It's essentially the same as Bulbs/Neo4jServer and Bulbs/Rexster, except Titan does indexing a bit differently.
Could anyone explain to me what "native" versus "non-native" graph processing means in that slide show? Ditto for "native" versus "non-native" graph storage. I simply have no idea what I'm supposed to picture when I see that.
Also, on the neo4j.org page, the claim that "graph data model['s] expressiveness supersedes the relational model" seems a little spurious. As I understand it, the relational model and graph data are both anchored in first-order predicate logic, and therefore should essentially be able to do the same things (although a Codd-style RDBMS needs a little more fuss over the necessary schemas).
>Could anyone explain to me what "native" versus "non-native" graph processing means in that slide show?
One of the leading native graph processing engines is GraphLab (http://graphlab.org/); however, the creator of GraphLab, Dr. Joey Gonzalez, is now focused on GraphX, which is essentially GraphLab built on Spark (http://spark.incubator.apache.org), which is a non-native analytics platform.
Building a graph-processing engine on a general processing system like Spark makes pre-processing and post-processing much easier.
So essentially, it's totally meaningless marketing bullshit? As much as I favor memory optimizations, I think that merely trying to linearize the access patterns is completely futile in the case of graph databases. At that level of brute-force speedup, you'll most likely gain more performance by using lower-latency memory modules, or simply by using different data structures to accommodate your specific cache line sizes and latencies, than by trying to linearize generic graphs.
Finally, yes -- there are no theoretical expressivity gains between RDBMS and property graphs (or RDF graphs). Nor is SQL (Turing Complete versions) any less expressive than Gremlin (Turing Complete path recognition). The only argument you can make is that graphs are more (or less) effective in terms of conciseness of expression and speed of execution at particular problems. Typically (as expected), it's the difference between problem datasets that look like networks (graphs) and those that look like spreadsheets (tables).
I just joined a project using neo4j. They're using the latest version, so we've had to build our own Python tooling. It's still very immature, but hopefully we'll get it into an open-sourceable state.
Modelling in graphs is new to me so I was wondering if anyone had any tips or pointers.
It shouldn't take too much to update Bulbs/Neo4j to Neo4j 2.0 -- add the Gremlin Plugin on Neo4j Server (which isn't installed by default anymore) or swap out the Bulbs built-in Gremlin scripts for Cypher equivalents, if Cypher will let you do everything you need...
I'm working on a project that requires a tree data structure (basically a graph, but with only single-direction parent-child relationships), and the number of nodes will stay under a thousand. I could have chosen a graph database, but for my level of complexity I just used postgres and a table that has foreign-key relationships to itself.
>I could have chosen a graph database, but for my level of complexity I just used postgres and a table that has foreign-key relationships to itself.
Good point. Relational databases have been used for BOM (Bill-Of-Material) modelling (a manufacturing application) for ages. DB records representing a manufactured product or component can have fields that point to other record(s) in the same table, which can be child components of the product. E.g. airplane -> engine, wings. Engine -> engine parts. Wings -> wing parts. Etc. And this can be recursive.
Another such example is when you want to model an employee entity, where a manager (who has employees - or reports) is also an employee.
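For anyone who wants to see the self-referencing table idea concretely, here is a minimal sketch. It assumes SQLite (via Python's built-in sqlite3 module) rather than postgres so that it runs without a server, and the table name `part`, its columns, the sample rows and the helper names are all made up for illustration. The recursive part is a standard `WITH RECURSIVE` query, which postgres supports as well.

```python
import sqlite3


def build_bom(conn):
    """Create a hypothetical self-referencing 'part' table and fill it."""
    conn.execute(
        "CREATE TABLE part ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " parent_id INTEGER REFERENCES part(id))"  # points back into the same table
    )
    rows = [
        (1, "airplane", None),       # root: no parent
        (2, "engine", 1),
        (3, "wings", 1),
        (4, "turbine blade", 2),
        (5, "flap", 3),
    ]
    conn.executemany("INSERT INTO part VALUES (?, ?, ?)", rows)


def descendants(conn, root_name):
    """Return (name, depth) for every component below root_name."""
    query = """
    WITH RECURSIVE sub(id, name, depth) AS (
        SELECT id, name, 0 FROM part WHERE name = ?
        UNION ALL
        SELECT p.id, p.name, sub.depth + 1
        FROM part AS p JOIN sub ON p.parent_id = sub.id
    )
    SELECT name, depth FROM sub WHERE depth > 0 ORDER BY depth, name
    """
    return conn.execute(query, (root_name,)).fetchall()


conn = sqlite3.connect(":memory:")
build_bom(conn)
airplane_parts = descendants(conn, "airplane")
```

The same shape works for the manager/employee case: replace `parent_id` with something like a `manager_id` column pointing at the same table.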
I had a look at the Python client library for neo4j a year ago, and couldn't find a way to perform multiple graph writes in a single transaction, because the only API available was the HTTP one. Has that changed since?
Here's how to use server-side Gremlin scripts in Python with Rexster, which is TinkerPop's open-source server that runs multiple graph databases, including Neo4j...
As @espeed said, Gremlin will work (or just Groovy + the Java API). Cypher can handle this as of Neo4j 2.0 using the transactional endpoint. I'm not sure whether the old(er) batch HTTP endpoint kept the writes in one tx. I believe it did, though batching them in one HTTP call was frustrating.
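To make the transactional-endpoint suggestion a bit more concrete, here is a rough sketch of how several Cypher writes could be bundled into one request body. It only builds the JSON and sends nothing. The path `/db/data/transaction/commit`, the `statements`/`statement`/`parameters` keys and the `{name}` parameter syntax are what I remember from the Neo4j 2.0 REST docs, so treat them as assumptions and check them against your server version. `build_commit_payload` is a made-up helper name.

```python
import json

# Assumed path of the Neo4j 2.0 transactional endpoint; verify for your version.
COMMIT_PATH = "/db/data/transaction/commit"


def build_commit_payload(statements):
    """Bundle (cypher, params) pairs into one JSON body.

    All statements in one body are meant to run in a single transaction
    when POSTed to the commit endpoint (assumption, see note above).
    """
    return json.dumps({
        "statements": [
            {"statement": cypher, "parameters": params}
            for cypher, params in statements
        ]
    })


payload = build_commit_payload([
    ("CREATE (p:Person {name: {name}})", {"name": "Alice"}),
    ("CREATE (p:Person {name: {name}})", {"name": "Bob"}),
])
```

You would then POST `payload` to `COMMIT_PATH` on the server with a JSON content type, using whatever HTTP client you prefer.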
The TinkerPop people are pushing the Gremlin DSL/API/whatever too hard. AFAIK it is only useful in somewhat complex situations, and is more or less a nice way to write some common queries. In simple situations, any language with the raw Graph API can do the job. And there are still no drivers for Python in Rexster. I tried, but it was too complicated. Rexster itself is too complicated.
Neo4j, with their own query language, made things even more complicated. Instead of a “Graph that can be queried with your preferred language” you get a “Graph that can be queried with something that looks like SQL but is not”.
ArangoDB is nice for people who want to do full-stack JavaScript, which is not the case for people doing Python.
Also, nobody is marketing graphdbs by just saying “it solves the general problem”. Period.
The only thing that may hold you back from using graphdbs is performance, but in a lot of situations you don't care, especially when you want to be flexible and move fast. That's where graphdbs shine. Of course there is also the graph/tree problem-solving space, but that is taken for granted.
GraphDB actors heavily market the specialized-database aspect of graphdbs; nonetheless, graphdbs are good even for solving generic webdev problems.
The one thing I thought was missing from the Python tooling for Neo4j: because *.cyp files are so new, they aren't yet handled by the standard documentation toolchain.
Tried neo4j, and I find it handy with Python and py2neo. Since my laptop is limited in memory, I couldn't visualize graphs properly from the web interface.
What is the secret ingredient of graph databases? The presentation linked from the presentation mentions physical addresses instead of IDs. I get that that would be a speedup, but I would expect it to be more like a constant factor?
Then maybe you can save all links from a node in the node, so you can get all the links with one read access. Fine. But as soon as you get to the second or third level, I would expect the magic to be gone. Say every node has 100 links. OK, so the first 100 links you get in constant time c. But to get the second level, you already need 100 requests (one for each node and its attached link list). So 100c time. For the third level you need 10000 reads, 10000c time. The next level would be 1000000 requests.
Just saying I'd expect things to get ugly with a graph database pretty fast, too (not as fast as with a relational db, but still).
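The back-of-the-envelope count above is easy to check in a few lines. This is only a sketch of that arithmetic, assuming a uniform branching factor and one read per node (each read returns the node together with its link list). `reads_per_level` is a made-up helper name.

```python
def reads_per_level(branching, levels):
    """Number of node reads needed at each level of a full traversal.

    Level 0 is the single read of the start node; each further level
    needs one read per node discovered on the previous level.
    """
    return [branching ** level for level in range(levels)]


reads = reads_per_level(100, 4)   # [1, 100, 10000, 1000000]
total_reads = sum(reads)          # 1010101 reads, i.e. 1010101 * c time
```

So with 100 links per node, visiting everything within four levels already costs about a million reads, whatever the constant c is.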
I haven't really coded a big graph-based app, but my expectation would be that to get really good performance, a hand-coded solution would always be required. For example, trying to squeeze as much of the relevant data into memory in a compressed way. Am I wrong?
Oh and also I am not sure how good relational DBs are at query optimization. Just because the visible model is "one row per link" doesn't mean the db couldn't do some intelligent caching internally.
Data models that use deep "JOIN"s are way faster on a graph database. You're right about branching factor- if you always traverse all relations, any database will be slow. In most cases, however, you don't.
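A toy illustration of the point about not traversing all relations: the sketch below does a depth-limited breadth-first traversal over an in-memory adjacency dict and only follows the relationship types it is asked for. The graph, the `FRIEND`/`LIKES` type names and the `traverse` helper are all invented for the example. It is not how any particular graph database implements traversal.

```python
from collections import deque


def traverse(graph, start, rel_types, max_depth):
    """Return nodes reachable from start within max_depth hops,
    following only edges whose type is in rel_types."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for rel, target in graph.get(node, []):
            if rel in rel_types and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    seen.discard(start)
    return seen


# Hypothetical graph: node -> list of (relationship type, target node)
graph = {
    "alice": [("FRIEND", "bob"), ("LIKES", "post1"), ("LIKES", "post2")],
    "bob": [("FRIEND", "carol"), ("LIKES", "post3")],
    "carol": [("FRIEND", "dave")],
}

friends_of_friends = traverse(graph, "alice", {"FRIEND"}, 2)
everything = traverse(graph, "alice", {"FRIEND", "LIKES"}, 2)
```

Restricting the traversal to `FRIEND` edges touches two nodes here, while following every edge type touches five. On a real graph with a high branching factor, that gap grows with every level.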