Another cool feature: recursive queries using the WITH common table expressions.
Say you have a table fs <id, name, parent_id> representing a hierarchical filesystem, and you want to print the full path of each file, here's how you can do it in a single query:
WITH RECURSIVE path(id, name, parent_id, path, parent) AS (
SELECT id, name, parent_id, '/', NULL FROM fs WHERE id = 1 -- base case
UNION
SELECT fs.id, fs.name, fs.parent_id, parentpath.path ||
CASE parentpath.path WHEN '/' THEN '' ELSE '/' END ||
fs.name as path, parentpath.path as parent
FROM fs INNER JOIN path AS parentpath ON fs.parent_id = parentpath.id
) SELECT id, name FROM path;
I don't know the details but won't that be quite inefficient? (it's not a construct I've seen before - never used Postgres in anger)
You're effectively working with a tree and there are much more relational friendly ways of doing that in SQL.
I know I'm just picking on this specific use but I can't help imagining that recursive querying will always be slow. Would love to hear how it's implemented if that's not the case.
But that could end up being quite deep couldn't it? Is it not like stacking up an unknown number of correlated queries? Could you even screw it up and have an infinite joining condition?
The main advantage of using common table expressions is the improved readability and ease in maintenance of complex queries, after a while using them coming up with a solution for a complex query is quite easy.
Regarding the performance of them, it depends on what you are trying to accomplish, some times theres some performance penalties but in my personal experience (using them in SQL Server) I have never run into a case where the performance isn't good, well that's not entirely true, in cases where you need to return large datasets CTE are never the best solution.
If you have self-referencing rows, you're going to wind up with two options: an inefficient recursive query or inefficiently issuing N+1 queries. The recursive query would wind up being faster simply because there's a lot less overhead. That said, I don't know what additional optimizations or penalties are going on in the system, but I have never converted a situation from N+1 queries into a recursive query and found a performance degradation.
Of course other options should always be considered. Joe Celko has a book on storing trees in the database I've been meaning to pick up.
No doubt - the various tree methods all have their drawbacks too (more to manage when manipulating the tree).
It's all going to depend on your usecase but in general these sorts of path operations tend to be more read and less manipulation. You'll almost certainly get much faster lookups if you're not using recursive queries (as you can normally just use an index).
With all due respect, this is one of those "it's impressive because it was done at all" sorts of things. Seriously, if this is what your storage system forces you to do to compute the equivalent of:
It's true, it is kind of a PITA. Users of our VLA observation preparation tool can nest scans inside scan loops, and this is represented in the database with the self-referencing PK. In the tool we never really need to do the nasty recursive select, but occasionally I need to do them to do reports, and it's never a great joy.
That said, I'm glad I have the power, and I wouldn't throw away Postgres and switch to something else just because something else might store hierarchies more naturally. Postgres is not the perfect tool for every use case, but having hierarchical data by itself isn't enough reason to throw it away.
Say you have a table fs <id, name, parent_id> representing a hierarchical filesystem, and you want to print the full path of each file, here's how you can do it in a single query: