Another cool feature: recursive queries using the WITH common table expressions....

aidos · on Sept 6, 2012

I don't know the details but won't that be quite inefficient? (it's not a construct I've seen before - never used Postgres in anger)

You're effectively working with a tree, and there are much more relational-friendly ways of doing that in SQL.

I know I'm just picking on this specific use but I can't help imagining that recursive querying will always be slow. Would love to hear how it's implemented if that's not the case.

dude_abides · on Sept 6, 2012

All that WITH RECURSIVE does is allow a WITH query to refer to its own output. Beyond that there is no overhead.

Here is the query plan I got for my above query:

  QUERY PLAN
  CTE Scan on path  (cost=292.79..304.41 rows=581 width=104)
    CTE path
      ->  Recursive Union  (cost=0.00..292.79 rows=581 width=72)
          ->  Index Scan using fs_pkey on fs  (cost=0.00..8.27 rows=1 width=40)
                Index Cond: (id = 1)
          ->  Hash Join  (cost=0.33..27.29 rows=58 width=72)
                Hash Cond: (public.fs.parent_id = parentpath.id)
                ->  Seq Scan on fs  (cost=0.00..21.60 rows=1160 width=40)
                ->  Hash  (cost=0.20..0.20 rows=10 width=36)
                      ->  WorkTable Scan on path parentpath  (cost=0.00..0.20 rows=10 width=36)
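
For readers who have not seen the construct, here is a minimal sketch of the kind of query that could produce a plan like this one. It is not the poster's original query: only `fs`, `id`, `parent_id`, `path`, and `parentpath` appear in the plan, so the `name` column, the `fullpath` column, and the sample rows are assumptions. It runs on SQLite through Python's `sqlite3` module rather than on Postgres, purely so that it is self-contained.

```python
import sqlite3

# Hypothetical schema: only `fs`, `id`, and `parent_id` appear in the plan;
# the `name` column and the sample rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fs (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES fs(id),
        name      TEXT NOT NULL
    );
    INSERT INTO fs (id, parent_id, name) VALUES
        (1, NULL, 'root'),
        (2, 1,    'usr'),
        (3, 2,    'local'),
        (4, 2,    'bin'),
        (5, 1,    'etc');
""")

PATH_QUERY = """
WITH RECURSIVE path(id, fullpath) AS (
    -- Non-recursive term: the single starting row (id = 1).
    SELECT id, name FROM fs WHERE id = 1
    UNION ALL
    -- Recursive term: join fs against the rows produced by the previous step.
    SELECT fs.id, parentpath.fullpath || '/' || fs.name
    FROM fs
    JOIN path AS parentpath ON fs.parent_id = parentpath.id
)
SELECT id, fullpath FROM path ORDER BY id;
"""

# Map each node id to the path assembled by the recursive CTE.
paths = dict(conn.execute(PATH_QUERY).fetchall())
for node_id, fullpath in paths.items():
    print(node_id, fullpath)
```

The two branches of the `UNION ALL` line up with the two children of the Recursive Union node in the plan: the lookup of the starting row, and the join between `fs` and the working table.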

aidos · on Sept 7, 2012

But that could end up being quite deep, couldn't it? Is it not like stacking up an unknown number of correlated queries? Could you even screw it up and have an infinite joining condition?
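
On the infinite-join question: yes, cyclic data can make a recursive query run forever, since the recursive term keeps producing rows. A rough sketch of two common guards follows, again using SQLite through Python's `sqlite3` so it is self-contained. The table contents, the `MAX_DEPTH` limit, and the CTE names `walk` and `reach` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fs (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    -- Deliberately broken data: 1 -> 2 -> 3 -> 1 forms a cycle.
    INSERT INTO fs VALUES (1, 3, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")

MAX_DEPTH = 10  # hypothetical safety limit, not a database setting

# Guard 1: carry a depth counter and stop recursing once it hits the limit.
GUARDED = """
WITH RECURSIVE walk(id, depth) AS (
    SELECT id, 0 FROM fs WHERE id = 1
    UNION ALL
    SELECT fs.id, walk.depth + 1
    FROM fs JOIN walk ON fs.parent_id = walk.id
    WHERE walk.depth < ?      -- without this, the cycle never terminates
)
SELECT id, depth FROM walk ORDER BY depth;
"""
rows = conn.execute(GUARDED, (MAX_DEPTH,)).fetchall()

# Guard 2: use UNION instead of UNION ALL so rows already seen are discarded.
# This only helps when the row carries no ever-changing column such as depth.
DEDUPED = """
WITH RECURSIVE reach(id) AS (
    SELECT 1
    UNION
    SELECT fs.id FROM fs JOIN reach ON fs.parent_id = reach.id
)
SELECT id FROM reach ORDER BY id;
"""
ids = [row[0] for row in conn.execute(DEDUPED)]

print(len(rows), ids)
```

The recursion is evaluated iteratively: each step joins the table against the rows produced by the previous step, so the number of steps tracks the depth of the tree rather than the number of rows.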

alfet · on Sept 6, 2012

The main advantage of common table expressions is improved readability and easier maintenance of complex queries; once you have used them for a while, coming up with a solution for a complex query becomes quite easy. As for performance, it depends on what you are trying to accomplish. Sometimes there are performance penalties, but in my personal experience (using them in SQL Server) I have rarely run into a case where the performance isn't good. The exception is returning large datasets, where CTEs are never the best solution.

fusiongyro · on Sept 7, 2012

If you have self-referencing rows, you're going to wind up with two options: an inefficient recursive query or inefficiently issuing N+1 queries. The recursive query would wind up being faster simply because there's a lot less overhead. That said, I don't know what additional optimizations or penalties are going on in the system, but I have never converted a situation from N+1 queries into a recursive query and found a performance degradation.
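
To make the two options concrete, here is a small sketch that walks from a node up to the root both ways and counts the statements issued. The `fs` table layout, the sample rows, and both function names are assumptions for illustration, and it uses SQLite through Python's `sqlite3` rather than Postgres. It shows only the difference in round trips, not any timing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fs (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO fs VALUES
        (1, NULL, 'root'), (2, 1, 'usr'), (3, 2, 'local'), (4, 3, 'bin');
""")

def path_n_plus_one(conn, node_id):
    """Walk up the tree with one query per ancestor; returns (path, queries)."""
    parts, queries = [], 0
    while node_id is not None:
        # Fetch this node, then continue with its parent on the next pass.
        node_id, name = conn.execute(
            "SELECT parent_id, name FROM fs WHERE id = ?", (node_id,)
        ).fetchone()
        queries += 1
        parts.append(name)
    return "/".join(reversed(parts)), queries

def path_recursive(conn, node_id):
    """Same walk in a single statement using a recursive CTE."""
    rows = conn.execute(
        """
        WITH RECURSIVE up(id, parent_id, name, depth) AS (
            SELECT id, parent_id, name, 0 FROM fs WHERE id = ?
            UNION ALL
            SELECT fs.id, fs.parent_id, fs.name, up.depth + 1
            FROM fs JOIN up ON fs.id = up.parent_id
        )
        SELECT name FROM up ORDER BY depth DESC
        """,
        (node_id,),
    ).fetchall()
    return "/".join(name for (name,) in rows), 1

loop_path, loop_queries = path_n_plus_one(conn, 4)
cte_path, cte_queries = path_recursive(conn, 4)
print(loop_path, loop_queries, cte_path, cte_queries)
```

Both calls build the same path; the loop issues one statement per level of the tree, while the recursive form issues one statement in total.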

Of course other options should always be considered. Joe Celko has a book on storing trees in the database I've been meaning to pick up.

justincormack · on Sept 7, 2012

It's a reasonable book, from memory, although there are problems with all the methods.

aidos · on Sept 7, 2012

No doubt - the various tree methods all have their drawbacks too (more to manage when manipulating the tree).

It's all going to depend on your use case, but in general these sorts of path operations tend to involve more reads than manipulation. You'll almost certainly get much faster lookups if you're not using recursive queries (as you can normally just use an index).
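
One example of such an alternative is the "materialized path" layout, where each row stores its full path. The comment does not name a specific method, so this choice is an assumption, as are the `fs_path` table, its rows, and the `descendants` helper. The sketch uses SQLite through Python's `sqlite3`, and it relies on SQLite's default byte-wise text comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: each row stores its full path. The UNIQUE
    -- constraint gives the path column an index.
    CREATE TABLE fs_path (id INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE);
    INSERT INTO fs_path (id, path) VALUES
        (1, 'root'), (2, 'root/usr'), (3, 'root/usr/local'),
        (4, 'root/usr/bin'), (5, 'root/etc'), (6, 'root/usr2');
""")

def descendants(conn, prefix):
    """Return every path strictly below `prefix` using one range predicate."""
    lo = prefix + "/"
    hi = prefix + "0"  # '0' is the character right after '/' in ASCII
    return [
        p for (p,) in conn.execute(
            "SELECT path FROM fs_path WHERE path >= ? AND path < ? ORDER BY path",
            (lo, hi),
        )
    ]

subtree = descendants(conn, "root/usr")
print(subtree)
```

A range predicate on an indexed column is the kind of lookup an ordinary B-tree index can serve, with no recursion involved. The cost shows up on writes: moving or renaming a directory means rewriting the stored path of every descendant.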

ajross · on Sept 6, 2012

With all due respect, this is one of those "it's impressive because it was done at all" sorts of things. Seriously, if this is what your storage system forces you to do to compute the equivalent of:

   path(x) { return x ? path(x->parent) + "/" + x->name : "" }

... then you're using the wrong storage system. Yikes.

fusiongyro · on Sept 7, 2012

It's true, it is kind of a PITA. Users of our VLA observation preparation tool can nest scans inside scan loops, and this is represented in the database with a self-referencing foreign key. In the tool we never really need to do the nasty recursive select, but occasionally I need to run one for reports, and it's never a great joy.

That said, I'm glad I have the power, and I wouldn't throw away Postgres and switch to something else just because something else might store hierarchies more naturally. Postgres is not the perfect tool for every use case, but having hierarchical data by itself isn't enough reason to throw it away.

willlll · on Sept 6, 2012

My favorite thing to do with recursive WITH is the Mandelbrot set. I used that one as an example on http://embedclip.herokuapp.com/
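
As a rough idea of how a Mandelbrot set can be computed with recursive WITH, here is a small sketch. It is not the query from the linked page: the grid size, the `MAX_ITER` cap, and all CTE names are assumptions, and it runs on SQLite through Python's `sqlite3`, with the final ASCII rendering done in Python instead of SQL.

```python
import sqlite3

MAX_ITER = 20  # hypothetical iteration cap

MANDELBROT = """
WITH RECURSIVE
  -- Integer grid indices: 31 columns and 21 rows.
  xs(xi) AS (SELECT 0 UNION ALL SELECT xi + 1 FROM xs WHERE xi < 30),
  ys(yi) AS (SELECT 0 UNION ALL SELECT yi + 1 FROM ys WHERE yi < 20),
  -- One row per (point, iteration): z -> z*z + c until |z|^2 >= 4 or the cap.
  m(xi, yi, cx, cy, x, y, iter) AS (
    SELECT xi, yi, -2.0 + xi * 0.1, -1.0 + yi * 0.1, 0.0, 0.0, 0
    FROM xs, ys
    UNION ALL
    SELECT xi, yi, cx, cy, x*x - y*y + cx, 2.0*x*y + cy, iter + 1
    FROM m
    WHERE x*x + y*y < 4.0 AND iter < ?
  )
SELECT xi, yi, MAX(iter) FROM m GROUP BY xi, yi;
"""

conn = sqlite3.connect(":memory:")
iters = {(xi, yi): n for xi, yi, n in conn.execute(MANDELBROT, (MAX_ITER,))}

# Points that never escaped within MAX_ITER steps are drawn as '#'.
rows = [
    "".join("#" if iters[(xi, yi)] == MAX_ITER else "." for xi in range(31))
    for yi in range(21)
]
print("\n".join(rows))
```

The recursive term here is doing iteration rather than tree traversal: each step advances every point that has not yet escaped by one iteration.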