Find all the coupons that are expired (90 day expiration):
SELECT * FROM coupon
WHERE created_at + INTERVAL '90 DAY' < now()
This will not use the index on the "created_at" column and will be slow.
You should rewrite the inequality to:
SELECT * FROM coupon
WHERE created_at < now() - INTERVAL '90 DAY'
and now the query will be much faster. There are a lot of cases in Postgres where simple, seemingly equivalent algebraic manipulations can completely change the query plan.
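If you want to see the difference yourself, EXPLAIN makes it visible. A minimal sketch, assuming a plain btree index on created_at (the index name here is hypothetical):

CREATE INDEX coupon_created_at_idx ON coupon (created_at);

-- The original form typically produces a sequential scan, because
-- the indexed column is buried inside an expression:
EXPLAIN SELECT * FROM coupon
WHERE created_at + INTERVAL '90 DAY' < now();

-- The rewritten form can use an index scan: the right-hand side is
-- computed once and compared directly against the indexed column:
EXPLAIN SELECT * FROM coupon
WHERE created_at < now() - INTERVAL '90 DAY';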
They are not strictly equivalent, since `created_at + INTERVAL '90 DAY'` has to be evaluated for every single row and can overflow, whereas `now() - INTERVAL '90 DAY'` is a constant for the purposes of query execution.
Yes - this is a common restriction in any DB I've used, certainly in MS SQL Server. The idea is that your queries need to be "SARGable": https://en.wikipedia.org/wiki/Sargable
The maximum value in a postgres timestamp is `294276-12-31 23:59:59.999999`. Overflow means that `created_at + interval '90 days'` exceeds this value. This causes an error.
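If you're curious, you can trigger the overflow directly; something like this should fail with a "timestamp out of range" error (the boundary value is from the Postgres docs):

-- Adding 90 days to a timestamp near the maximum overflows:
SELECT TIMESTAMP '294276-12-31 23:59:59' + INTERVAL '90 days';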
Ah, so a traditional overflow. It just hadn't occurred to me that dates that huge would actually be used, so I was wondering if it referred to something else. Thanks.
I was wondering the same thing, but after staring at it a bit I think the problem is that one of them is doing math on the values from every row and the other is doing the math once.
`created_at + INTERVAL '90 DAY' < now()` says: for every row, take the created_at column, add 90 days to it, and then see if the result is less than now().
`created_at < now() - INTERVAL '90 DAY'` says: take now(), subtract 90 days, and then see which rows are less than the result.
At least, that's my guess. I rarely do any DB stuff directly.
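One way to convince yourself of the "math once" part: now() returns the transaction start time, so it is fixed for the whole statement no matter how many rows are scanned.

-- Both calls see the same fixed value, so this is always true:
SELECT now() = now();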
Is "overflow" a term used to express "computed for every row"?
I can see where the optimization would come from when comparing `created_at` with a fixed value `now() - INTERVAL...` (assuming PostgreSQL is smart enough to evaluate it only once and reuse it for all the index comparisons), but the word "overflow" throws me off.
The predicate in the 1st statement is actually on the expression `created_at + INTERVAL '90 DAY'`, not on the column `created_at`.
Some databases allow users to create indexes on expressions. So if you want to write the 1st statement, you need an index on the expression, not a normal index.
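In Postgres that would look something like the sketch below (the index name is made up, and it assumes created_at is a plain timestamp; with timestamptz the addition is only STABLE, not IMMUTABLE, so it can't be indexed):

-- Note the extra parentheses Postgres requires around the expression:
CREATE INDEX coupon_expiry_idx
ON coupon ((created_at + INTERVAL '90 DAY'));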
Wow, is there any public list or documentation of these common cases and how to make them faster in PG? I would expect the PG query optimizer to fix this automatically, but since it doesn't, such documentation would be of great use to many developers. Thanks for sharing!
So write your product, then start monitoring it as you release it to production.
Postgres can track aggregate metrics for queries using the pg_stat_statements extension [1]. You then monitor this periodically to find queries that are slow, then use EXPLAIN ANALYZE [2] to dig in. Make improvements, then reset the statistics for the pg_stat_statements view and wait for a new crop of slow queries to arise.
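Roughly, the loop looks like this (pg_stat_statements has to be listed in shared_preload_libraries first, and the column is mean_exec_time on Postgres 13+, mean_time on older versions):

-- Enable the extension once per database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find the slowest queries by average execution time:
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- After making improvements, reset the statistics and wait:
SELECT pg_stat_statements_reset();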
When releasing a new application (or feature) I've always loaded each table in my development environment's database with a few million rows. Tools like Python's Factory Boy [1] or Ruby's Factory Bot [2] help get the data loaded.
After the data is loaded up, start navigating through the application and it will become evident where improvements need to be made. Tools like Django Debug Toolbar [3] help expose where the bad ORM calls are, as does tailing the Postgres log files.
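If you just need row volume rather than realistic objects, plain SQL with generate_series works too; a sketch against the coupon table from upthread (assuming created_at is its only required column):

-- Insert two million rows with timestamps spread over the past year,
-- so query plans start to resemble production:
INSERT INTO coupon (created_at)
SELECT now() - (random() * INTERVAL '365 days')
FROM generate_series(1, 2000000);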