
I don't know anything about graph databases, but ... why do we want to use SQLite as one? Doesn't the graph database community have something better to offer?

SQL has a knack for doing really easy things well, and moderately complicated things badly. I would assume by default that anything involving graphs should not involve SQL. Recursive CTEs to me scream "you're about to spend hours debugging something trivial".



> I would assume by default that anything involving graphs should not involve SQL.

That would be a questionable assumption. SQL databases are a widely tested tool, and SQL itself can allow you to augment your "graph" with constraints and semantics that many graph-focused systems have trouble with. CTEs, while not entirely trivial, are not overly complex; they're not something you'd spend "hours" debugging.
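A hedged sketch of what such a recursive CTE looks like in practice - the schema and names are invented for illustration, using Python's built-in sqlite3 module:

```python
import sqlite3

# Hypothetical self-referencing table: each employee row points at its
# manager. A recursive CTE walks the chain from one row up to the root.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employee (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        manager_id INTEGER REFERENCES employee(id)
    );
    INSERT INTO employee VALUES
        (1, 'ceo', NULL),
        (2, 'vp',  1),
        (3, 'dev', 2);
""")

rows = con.execute("""
    WITH RECURSIVE chain(id, name, manager_id) AS (
        -- base case: the row we start from
        SELECT id, name, manager_id FROM employee WHERE name = 'dev'
        UNION ALL
        -- recursive step: follow manager_id upward
        SELECT e.id, e.name, e.manager_id
        FROM employee e JOIN chain c ON e.id = c.manager_id
    )
    SELECT name FROM chain
""").fetchall()

print([r[0] for r in rows])  # → ['dev', 'vp', 'ceo']
```

The whole traversal is one query; there is not much to debug beyond the join condition in the recursive step.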


I don't feel that way at all. I think a lot of that feeling comes from the fact that everyone's so dependent on ORMs they just rarely write SQL. Once you start doing it more you realize how powerful and elegant it can be.


> Doesn't the graph database community have something better to offer?

Probably not in the same kind of resource-constrained way, no. Whilst it might be a suboptimal graph database, it's also not going to require 2GB of RAM to run...


An interesting upcoming graph database is oxigraph. It's written in Rust and might be able to cope well in 2GB.

https://github.com/oxigraph/oxigraph/


Ooh, ta, that looks handy, especially with supporting SPARQL. If it can load my 1.1GB TTL, I'll be beyond happy.


I’d love to learn more about your use case... I think SPARQL and RDF are pretty interesting technologies, but haven’t seen many real applications using them.


I'd describe SPARQL and RDF as general interfaces focused on interop, not "technologies" per se. You could use any technology to provide a SPARQL/RDF endpoint, including a relational database.


My current use case is that I'm working on something which is basically things in groups (which may also be in groups) and seeing if things like "items in group A which are linked to group C via linkages to items in group B" are easier with SPARQL + graph database than they are with Postgres.

(Previous use case was converting music industry data into RDF and providing query interfaces on top. That 1.1GB TTL is a CWR file converted to RDF as a stress tester.)


> "items in group A which are linked to group C via linkages to items in group B"

Relational databases are actually quite convenient for such cases because you can model each "group" of items via a database table, including potentially an associative table which provides "linkage" between items in the database. Graph-based models are generally more limited than that, e.g. RDF and SPARQL are limited to simple triples which link a "source" and a "target" entity (both of which are essentially untyped) according to a fixed "predicate". You can sort of materialize triples and endow them with extra information, but it gets clunky.
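As an illustration of that modelling style (table and column names are made up here), the "items in group A linked to group C via items in group B" question becomes a couple of joins through an associative table:

```python
import sqlite3

# Illustrative schema: items tagged with a group, plus an associative
# "link" table connecting items to items.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, grp TEXT NOT NULL);
    CREATE TABLE link (
        src INTEGER NOT NULL REFERENCES item(id),
        dst INTEGER NOT NULL REFERENCES item(id)
    );
    INSERT INTO item VALUES (1,'A'), (2,'A'), (3,'B'), (4,'C');
    INSERT INTO link VALUES (1,3), (3,4);  -- item 1 reaches C via B; item 2 does not
""")

rows = con.execute("""
    SELECT DISTINCT a.id
    FROM item a
    JOIN link ab ON ab.src = a.id
    JOIN item b  ON b.id  = ab.dst AND b.grp = 'B'
    JOIN link bc ON bc.src = b.id
    JOIN item c  ON c.id  = bc.dst AND c.grp = 'C'
    WHERE a.grp = 'A'
""").fetchall()

print([r[0] for r in rows])  # → [1]
```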


> Relational databases are actually quite convenient for such cases

I believe that someone may have done this in a nice way but every time I've encountered it (3 times thus far), it's always ended with complex SQL and tables being bent out of shape to try and keep performance.

> RDF and SPARQL are limited to simple triples which link a "source" and a "target" entity according to a fixed "predicate"

But can also infer transitive relations based on those predicates - `A canSee B`, `B canSee C` => `A canSee C` - which is handy when you're trying to discover those relationships in your data.


> But can also infer transitive relations based on those predicates - `A canSee B`, `B canSee C` => `A canSee C` - which is handy when you're trying to discover those relationships in your data.

You can do this sort of inference in a view if you use relational databases. (A view is a sort of "virtual" table based on the result of some database query. Many databases can also materialize views for improved performance, though this can make it a bit challenging to manage updates.)
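A rough sketch of that idea in SQLite (the `can_see` table and the view name are invented for illustration): the view wraps a recursive CTE that computes the transitive closure of the predicate.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE can_see (a TEXT NOT NULL, b TEXT NOT NULL);
    INSERT INTO can_see VALUES ('A','B'), ('B','C');

    -- A view over a recursive CTE: the transitive closure of can_see.
    -- UNION (not UNION ALL) deduplicates, which also stops cycles.
    CREATE VIEW can_see_closure AS
        WITH RECURSIVE closure(a, b) AS (
            SELECT a, b FROM can_see
            UNION
            SELECT closure.a, can_see.b
            FROM closure JOIN can_see ON can_see.a = closure.b
        )
        SELECT a, b FROM closure;
""")

rows = con.execute(
    "SELECT a, b FROM can_see_closure ORDER BY a, b").fetchall()
print(rows)  # → [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

The inferred `A canSee C` row appears in the view without being stored anywhere.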


> You can do this sort of inference in a view if you use relational databases.

You'd need to use recursive queries though, I think, and there be dragons.

> [materialized views] can make it a bit challenging to manage updates

We're not using them with Postgres because refreshing materialized views is very much a blunt hammer and it'd cause more hassle than it would solve. Which is annoying because they've been great when I've used them previously.


If we are just going by resource usage, I'm sure that RedisGraph (as it's "just" a Redis module) would also fit the bill.


Isn't that keeping everything in memory?


It seems it can persist to disk just like normal Redis data, but it only works with in-memory graphs - tricky if the graph is too big to fit, I suppose.

cf https://github.com/RedisGraph/RedisGraph/issues/152


It's keeping everything in memory (index and entities) with disk persistence, but doesn't have a big memory overhead when running with little data.


> why do we want to use SQLite as one

Because SQLite is ubiquitous. It's simply everywhere. And it doesn't need a dedicated server to run.


I would also add another good reason: it's probably going to be the most reliable, best-tested part of any stack you decide to use.


My experience with graph databases so far has been that they are tuned more for nice schemas and queries than for graph analysis.

I've actually been considering migrating away from neo4j to sql with recursive queries for performance reasons.


> SQL has a knack for doing really easy things well, and moderately complicated things badly. I would assume by default that anything involving graphs should not involve SQL

A schema may have only one instance of recursion, or at any rate comparatively few.

In this case, using two separate data stores would be too much overhead; if the CTE doesn't do anything particularly fancy - that is, if it just retrieves records recursively associated via keys - it's not inherently hard to debug (actually, there isn't much to debug).

A case in point is GitLab's user management; GitLab dropped MySQL partly because of its lack of CTEs (at the time).


Some of the early RDF Graph Stores were indeed based on SQL databases (Jena RDB/SDB) but you're correct in that they didn't scale particularly well.

There are pure graph stores out there - AllegroGraph, OpenLink Virtuoso (although this is a strange hybrid of SQL + Graph technologies), and others - and for more advanced graph query constructs like path finding there are optimisations that are difficult or not well supported in SQL.


Yes, why not? If you have a data model where parts of the data are graph-like and others classic relational, why would you not want to store the entire thing in the same database? If your domain were completely graph-oriented, with complex requirements on your database to support graph-oriented operations, that would be one thing. But evaluating a well-known and mature SQL database is no bad starting point.


> Yes, why not? If you have a data model where parts of the data are graph-like and others classic relational, why would you not want to store the entire thing in the same database?

SQL is an implementation of relational algebra, and databases that implement it are relational databases for storing relational data.

Pointing at the name of a thing isn't an argument, I admit, but asking "what could go wrong if we use a system tailored to relational data to deal with non-relational data?" strikes me as the sort of question that might turn out to have a compelling answer six months in, when it is too late to easily change the database.

If I were involved in such a project, I would be arguing very hard that the data should come out of the relational DB using "SELECT * FROM table WHERE conditions", and only then should the clever graph things start happening. If the data needs to be read from disk using clever graph-based algorithms, then use a clever, graph-based database. There are a bunch out there, according to Wikipedia.


Graph data is not necessarily "non-relational". A lot of real-world data just can't be described as "purely graph-like" (no complex semantics or constraints) or "purely relational" (no recursive queries, etc.); often it's convenient to merge both feature sets. This is exactly what modern SQL with CTEs allows.


> Pointing at the name of a thing isn't an argument, I admit it, but asking "what could go wrong if we want to use a system tailored to relational data to deal with non-relational data?" strikes me as the sort of question that might turn out to have a compelling answer 6 months in when it is too late to easily change the database.

This would suggest that pretty much every existing RDBMS has made some very bad decisions since JSON/XML types, arrays, and all sorts of other non-relational features have long been supported.

Given the nature of SQLite you probably aren't dealing with petabytes of data.


If the project involves spending 6 months writing code involving navigating graphs, then SQLite probably isn't the right choice. If it's one or two queries, that's a different situation.


Sure, but (and I apologise for labouring this point) if you have a small one-off blob of data that needs to be processed as a graph, SQL is the worst language to do the processing and a database is the worst place to be doing it. Get it into memory and use a programming language that supports recursive functions.
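That in-memory approach might look something like this - a plain recursive depth-first traversal over an adjacency dict, no database involved (names and data are illustrative):

```python
def reachable(graph, start, seen=None):
    """Return the set of nodes reachable from start via depth-first recursion."""
    seen = set() if seen is None else seen
    for nxt in graph.get(start, ()):
        if nxt not in seen:
            seen.add(nxt)
            reachable(graph, nxt, seen)  # recurse into the neighbour
    return seen

# Tiny example graph: A -> B -> C
edges = {"A": ["B"], "B": ["C"], "C": []}
print(sorted(reachable(edges, "A")))  # → ['B', 'C']
```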

This is the basic argument here. If the situation is delicate enough we need the database to be doing graph operations, why would we pick SQLite? If it isn't, why would we pick SQL over Python?

I don't think it is bad that SQLite supports this, just ... what are the circumstances where this is a good idea? Is there a reason to do graph-style algorithms in SQL?


The problem that prompted me to make this extension to SQLite was a web-page request, so it needs to be processed with low latency. I initially tried loading the graph into memory and processing it that way. But there are approx 100K nodes in the graph, only a few dozen of which are relevant to the answer. It took a lot of time to pull in 100K nodes from disk. Running the query entirely in SQL is not only much less code to write, debug, and maintain, it is also much faster.


Thanks for adding it. I learned window functions using SQLite, and I'm looking forward to learning recursive queries next.


> SQL is the worst language to do the processing and a database is the worst place to be doing it. ... Is there a reason to do graph-style algorithms in SQL?

I don't think it's such a far-off fit from the relational database problem. Any time you have a table with a relation to itself, there's the potential to do graph-style algorithms on that data.

> If the situation is delicate enough we need the database to be doing graph operations, why would we pick SQLite?

Perhaps it has nothing to do with the situation being "delicate"; it is just a simple matter of it being less work and fewer lines of code to use a graph-style query in SQL, rather than re-implementing the graph algorithms in your application code or bringing in an entirely new database system just to process one query.


I agree. There is nothing wrong, relationally, with self-referencing tables. Support for traversing such data models may have been poor in the past, but now that the support has been added, what is the problem?


I write SQL for a living and am a part-time woodworker (not good at all). Sometimes I get pieces of wood cut by industrial machines at the shop, and sometimes I use my manual saw. I don't have the money or space to set up a professional wood-cutting machine in my workshop. A handheld saw is not the best tool for cutting wood, but sometimes it just makes sense. You are right that graph database things should be done in a graph database, but speaking for myself, I have never set up or used one. It is easier (not necessarily right) for me to learn recursive statements than to set up and learn to use a graph database.


It's an unlikely comparison in the first place; who is considering, on one hand, using SQLite right on the client device, and on the other, maybe spinning up a graph database that will almost certainly not just run on the client device? I submit that it's next to nobody. You can run SQLite on a server, but nearly nobody does that either (and not without reason). The idea that recursive CTEs are some sort of arcane technique that's impossible to debug is also not my experience with them at all.



