If you use Postgres, CTEs are optimization fences that get materialized much like temp tables. You may end up inlining the CTEs by hand, or turning them into actual temp tables so you can put indexes on them.
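A sketch of the rewrite being described (the `orders` table, its columns, and the filter are made up for illustration):

```sql
-- CTE version: the planner materializes recent_orders behind the fence
WITH recent_orders AS (
    SELECT * FROM orders WHERE created_at > now() - interval '30 days'
)
SELECT * FROM recent_orders WHERE customer_id = 42;

-- Temp-table version: same intermediate result, but now it can carry an index
CREATE TEMP TABLE recent_orders AS
    SELECT * FROM orders WHERE created_at > now() - interval '30 days';
CREATE INDEX ON recent_orders (customer_id);
SELECT * FROM recent_orders WHERE customer_id = 42;
```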
I've heard this, but I haven't heard why. Do you know if it's deliberate, or does it just fall out of the implementation? I wonder if they'd consider an explicit syntax to have them inlined. I love CTEs for organization, and I'd love to someday not have to think about how they affect performance.
It's deliberate, and sometimes it's helpful to have a way to force the ordering of work in the database.
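Worth noting that PostgreSQL 12 and later expose exactly this choice as syntax; a minimal sketch, using generate_series so it runs anywhere:

```sql
-- PostgreSQL 12+: the fence is opt-in per CTE
WITH fenced AS MATERIALIZED (
    SELECT g AS n FROM generate_series(1, 1000) g  -- evaluated once, like a temp table
),
inlined AS NOT MATERIALIZED (
    SELECT g AS n FROM generate_series(1, 1000) g  -- folded into the outer query
)
SELECT n FROM fenced JOIN inlined USING (n) WHERE n < 10;
```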
One potentially simple option is to keep the database's temp space in RAM by registering a ramdisk as a tablespace. That also helps large sorts, which implicitly spill to temp files.
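A sketch of the setup, assuming a Linux host; the path /mnt/pg_ramdisk is a placeholder you would already have mounted as tmpfs and chowned to the postgres OS user:

```sql
-- Prerequisite (outside SQL): mount -t tmpfs -o size=4G tmpfs /mnt/pg_ramdisk

-- Register the RAM-backed directory as a tablespace
CREATE TABLESPACE ramdisk LOCATION '/mnt/pg_ramdisk';

-- Route temp tables and sort spill files there (can also go in postgresql.conf)
SET temp_tablespaces = 'ramdisk';
```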
You can also manually inline everything, but with a lot of joins you may end up tuning queries by changing the join order. By default, if there are more than 8 joins, it uses the order you give it. If you raise that limit, the planning step takes a little longer, which is fine for ETL but not ideal for a web app.
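The setting being described is `join_collapse_limit`; a session-level bump might look like this (the value 16 is arbitrary):

```sql
-- Let the planner reorder more explicit joins, at the cost of longer
-- planning time; reasonable for a long-running ETL session
SET join_collapse_limit = 16;
-- ... run the many-join query ...
RESET join_collapse_limit;
```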
> One potentially simple option is to keep the database's temp space in RAM by registering a ramdisk as a tablespace. That also helps large sorts, which implicitly spill to temp files.
PostgreSQL sorts small amounts of data in RAM by default, choosing to spill to disk only for larger amounts of data. This threshold is tunable:
If you want it to sort everything in memory all the time, a high value for `work_mem` in postgresql.conf should do it. Alternatively, like most parameters, you can set it more narrowly (a sketch after this list shows how to watch the effect):
* on a per-user basis (ALTER USER _ SET work_mem='2GB')
* on a per-session basis using SQL within the app (SET work_mem='2GB')
* on a per-session basis using `libpq` environment variables (export PGOPTIONS="-c work_mem=2GB"), or
* on a per-transaction basis using SQL (BEGIN; SELECT set_config('work_mem', '2GB', true); COMMIT;)
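To see which regime a sort lands in, `EXPLAIN ANALYZE` reports the sort method; a self-contained sketch (the row count and settings are arbitrary):

```sql
-- With a tiny work_mem the sort spills: look for "Sort Method: external merge"
SET work_mem = '64kB';
EXPLAIN ANALYZE SELECT g FROM generate_series(1, 100000) g ORDER BY g DESC;

-- With enough work_mem it stays in RAM: "Sort Method: quicksort"
SET work_mem = '64MB';
EXPLAIN ANALYZE SELECT g FROM generate_series(1, 100000) g ORDER BY g DESC;
```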
I've used each of these mechanisms to turn various knobs over the years. For example, one database has a deliberately small global `temp_file_limit`, which worked fine for years until a certain overnight job started failing. Rather than raise the limit globally, I changed it just for the single query in question.
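Something along these lines, assuming a role allowed to change `temp_file_limit` (it is superuser-settable) and with the value invented for illustration:

```sql
-- Raise the limit only for the failing job's transaction;
-- SET LOCAL reverts automatically at COMMIT or ROLLBACK
BEGIN;
SET LOCAL temp_file_limit = '100GB';
-- ... run the overnight query here ...
COMMIT;
```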
> You can also manually inline everything, but with a lot of joins you may end up tuning queries by changing the join order. By default, if there are more than 8 joins, it uses the order you give it.
This… isn't true. Pick two tables and compare:
```sql
EXPLAIN SELECT * FROM a JOIN b ON b.a_id = a.id;
EXPLAIN SELECT * FROM b JOIN a ON a.id = b.a_id;
```
PostgreSQL chooses the same plan for both queries. It may do `a` then `b`, or `b` then `a`, but in either case the plan will be stable given the set of analyzer statistics.
When the number of JOINs gets large, the query planner does not consider every join order, since the number of possible orderings grows too fast to search exhaustively. This is where GEQO, the genetic query optimizer, comes in: past `geqo_threshold` FROM items (12 by default), the planner switches to a genetic algorithm that samples join orders rather than enumerating them. The 8 in the parent comment is `join_collapse_limit`: up to that many explicit JOINs, the planner reorders them freely (which is why the two-table plans above are identical), and only beyond it does your written order get preserved.
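The relevant knobs can be inspected and adjusted per session; a sketch, with defaults noted in the comments:

```sql
SHOW join_collapse_limit;  -- 8 by default: above this, explicit JOIN order is kept
SHOW geqo_threshold;       -- 12 by default: at this many FROM items, GEQO takes over
SHOW geqo_effort;          -- 5 by default: trades planning time for plan quality

-- Disable the genetic optimizer entirely to force exhaustive search
SET geqo = off;
```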