Hacker News new | past | comments | ask | show | jobs | submit login
A Different Type of SQL Recursion with PostgreSQL (github.com/vb-consulting)
98 points by vbilopav on Sept 16, 2023 | hide | past | favorite | 10 comments



Good comparison but seems to be missing how fast they run. If they're about the same, the skill set and expectations of the devs tips the scales. However if one is noticeably faster than the other, especially on a frequently run query, devs need to adapt to understanding the faster variant despite their initial comfort level.


A follow up on this article was written today

[Recursion with PostgreSQL, Follup 1, Perfomances](https://github.com/vb-consulting/blog/discussions/4)

I managed to get some really good perfomances (757K records in 5 seconds).

Also, method 1 contained a nasty bug :(

That's why next time I'll try to focus on debugging and testing.


Wonderful!

Just some possibilities for optimization of the recursive CTE. (I haven't tested it and don't know if it'd automatically improve things.)

    WITH RECURSIVE _recursive_cte AS MATERIALIZED ( … )
https://www.postgresql.org/docs/current/queries-with.html#QU...

Basically makes the temp table from your other implementation for you.

Then you could use built-in cycle detection to be able to simply the syntax while also allowing UNION ALL instead of UNION for some extra speed.

    CYCLE id SET is_cycle USING path
https://www.postgresql.org/docs/current/queries-with.html#QU...

> As far as I know, SQL was never even intended to do that in the first place.

49 years ago, certainly not. Postgres still can't beat a dedicated graph database for non-trivial hierarchies, but a lot of work has already been done in the last five years to get it a lot closer to that milestone.


Toy examples struggle here because if things are simple it is much cleverer to use multiple queries from a real programming language that can be debugged. Recursion in SQL is more challenging to debug than a loop in a language with flow control. And if the problem isn't a toy problem, trying to solve it with Postgres as the graph engine is probably going to open up a new can of worms that need to be dealt with.

It takes a strange situation for Postgres' recursion features to be a good idea. All 3 solutions would be red flags to me if I saw them in a real codebase - and accompanied by of a large number of other unmaintainable SQL functions. This is the fast way to turn a small for loop into a week of Senior Dev billable hours. Someday I'll see an exception to that rule of thumb, so far no luck.


> Someday I'll see an exception to that rule of thumb, so far no luck.

I used to think so too, then I found a use case. TBF, there exists an alternative that doesn't use recursion, but you wouldn't like that either, because it'd require a stored procedure / SQL UDF.

Requirement: Split a large sequential key-space (of a SERIAL PK) into the smallest number of non-overlapping, exhaustive key ranges where no range contains more than N keys. Extracting a range is very fast, so communicating with the app layer for coordination between two successive ranges would introduce too much latency overhead. But the DB also has a small timeout (order of minutes) on how long any statement may execute.

Breaking it down: we need iteration to start extracting the next key range after we've extracted one key range. But this iterative process cannot synchronise with app layer at the end/beginning of each iteration.

Solution: Synchronise with app layer / transaction manager every M iterations. If M is too large for statement timeout, reduce M and try again.

This can be done by either procedural code or by iteration via recursion in a single pass over the data. There exist alternative methods that can do this, but require multiple passes. In fact, the first attempted solution used one of those alternatives: bucket based partitioning.

But runtime performance objectives demanded the lowest latency, and the redundancy of extra passes could not be optimised away by PG 12. The single pass iteration-via-recursion method was twice as fast, and the recursion skeleton code was simple enough — equivalent to a recursive function with one tail call and an accumulator — so it was shipped.


It seems like the first variant is overly complicated. When you write about set theory you should be able to figure out that you don't have to rely on the effects of the union operator, the result you want is a simple difference with previous levels of the recursive cte, or a terminal query that groups by the table and shows the minimum level


A follow up on this article was written today

[Recursion with PostgreSQL, Follup 1, Perfomances](https://github.com/vb-consulting/blog/discussions/4)

I managed to get some really good perfomances (757K records in 5 seconds).

Also, method 1 contained a nasty bug :(

That's why next time I'll try to focus on debugging and testing.


this is where you really want to consider something like datalog instead.

but wouldn't an abstraction (I think like 'connect by') that computes the fixed point or transitive closure be really useful in many of these cases and involve less cognitive overhead (even if its less general)


I hadn't written SQL in 10ish years (while working in embedded, etc where I didn't need it.) and when I came back SQL had added the recursive stuff to the standard and OMG, is it ever awkward.

Their rolled out linear English-like syntax makes stuff like this so awkward, and this article is evidence of that, including the reach for the imperative style with the procedure and loops and so on.

I'd love for someone to rewrite the examples in here in Datalog (or RAI's "Rel") just to show how much clearer and brief they become.


> Recursive CTE queries are powerful but confusing and limiting. This old-school procedural programming approach may be a much better fit for developers untrained with SQL, and that is, in my opinion, the majority. On the other hand, the procedural approach may be something most developers feel comfortable with.

No please. Recursive CTEs are really easy to understand, and they are tail-recursive loops, so they are just loops, and they are easy to understand as such if loops is what you understand. Writing a loop in procedural SQL is not better, just a) wordier, b) not easily amenable to relational algebra. (b) is a big deal.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: