An unexpected find that freed 20GB of unused index space in PostgreSQL (hakibenita.com)
375 points by haki on Feb 1, 2021 | 78 comments



Summary: if you have an index on a column which is mostly NULL, consider using a partial index covering only the records where it's non-NULL.
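
A minimal sketch of the idea, with made-up table and column names:

    -- a full index stores an entry for every row, including the ~99%
    -- where the column is NULL
    create index orders_cancel_token_full on orders (cancellation_token);

    -- a partial index only stores the non-NULL rows
    create index orders_cancel_token on orders (cancellation_token)
        where cancellation_token is not null;

Equality lookups like WHERE cancellation_token = 'abc' still qualify for the partial index, since the planner can prove that an equality match implies IS NOT NULL.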


Another benefit of partial indexes is limiting the scope of a unique constraint:

create unique index users_email on users (email) where status != 'delete';


Be very careful, then, as the optimizer will (usually?) not use the index if the condition is not part of the query.


You can create views for that; then it will always be part of the query.


That's the part they sold in the title, but there's a bunch of other useful stuff for someone operating Postgres in production, something I'll need to do in a few months.


Partial indexes are amazing, but you have to keep in mind some peculiarities.

If your query's predicate doesn't properly match the WHERE clause of the index, the index will not be used. It is easy to forget about this or to get it wrong in subtle ways. Here is an example from work.

There was an event tracing table which contained the event severity_id. Severity IDs 0-6 inclusive are user-facing events; 7 and up are debug events. In practice all debug events were 7 and there were no other values above 7. This table had a partial index with WHERE severity_id < 7. I tracked down a performance regression in which an ORM (due to programmer error) generated WHERE severity_id != 7. The database is obviously not able to tell that there will never be any values above 7, so the index was not used, slowing down event handling. Changing the query to match < 7 fixed the problem.

The database might also be unable to infer that the index can be used when prepared statements are involved, e.g. WHERE severity_id < ?. The database cannot tell that every binding of ? will satisfy < 7, so it will not use the index (unless you are running PG 12, where that might depend on the setting of plan_cache_mode [1], but I have not tested that yet).
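
A condensed sketch of that situation (the schema is made up):

    create table events (
        id bigserial primary key,
        severity_id int not null,
        payload text
    );

    create index events_user_facing on events (severity_id)
        where severity_id < 7;

    -- matches the index predicate, can use the partial index
    select count(*) from events where severity_id < 7;

    -- cannot use it: the planner can't prove that != 7 implies < 7,
    -- even if no value above 7 exists in practice
    select count(*) from events where severity_id != 7;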

Another thing is that HOT updates in PostgreSQL can't be performed if the updated field is indexed, and that also includes being referenced in the WHERE clause of a partial index. So you could have a site like HN and think that it would be nice to index stories WHERE vote > 100 to quickly find more popular stories. That index, however, would rule out HOT updates whenever the vote tally is updated. Again, not necessarily a problem, but you need to know the possible drawbacks.
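
A rough illustration of the HOT interaction (made-up schema):

    create table stories (
        id bigserial primary key,
        title text not null,
        vote int not null default 0
    );

    -- the partial index's predicate references "vote"...
    create index stories_popular on stories (id) where vote > 100;

    -- ...so this update can no longer be HOT, even for stories far
    -- below the threshold, because a column used by an index changed
    update stories set vote = vote + 1 where id = 1;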

That said, they are great when used for the right purpose. Kudos to the author for a nice article!

[1] - https://postgresqlco.nf/doc/en/param/plan_cache_mode/


> The database is obviously not able to tell that there will never be any values above 7

You say "obviously", but with updated statistics this is the exactly the kind of thing you might expect the planner to know and aid index decisions.

I'm a huge fan of Postgres, coming to it around 5 years ago from at least 10 previous years with SQL Server, but I have hit a few things like this in that time. IME the planner is much more fickle about how you specify your predicates than SQL Server is.


No, I don't think statistics can let you get away with this. Databases are concurrent; you can't guarantee that a different session won't insert a record that invalidates your current statistics.

You could argue that it should be able to use it if the table has a check constraint preventing severity_id values above 7 from ever being inserted. That is something that could be done; I don't know whether PostgreSQL does it (I doubt it) or how feasible it would be.
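
For reference, such a constraint would look something like this (table name made up); whether the planner would actually use it to prove the partial index applicable is exactly the open question:

    alter table events
        add constraint events_severity_range check (severity_id <= 7);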

Is SQL Server able to make an assumption like that purely based on statistics? Genuine question.


> No, I don't think statistics can let you get away with this. Databases are concurrent, you can't guarantee that a different session will not insert a record that invalidates your current statistics.

Of course you can't make a guarantee like that, but why would you need to? Statistics are there to guide planner choices, not make cast iron predictions.


Let us say that in our example table we have 100 000 records with severity_id < 7, 200 000 with severity_id = 7 and 3 records with severity_id = 8.

Statistics claim 100k with id < 7, 200k with id = 7 and 0 with id > 7. The last 3 records could have been inserted right before our query; the statistics just haven't been updated yet.

Let us assume that we blindly trust the statistics, which currently state that there are absolutely no values with severity_id > 7, and that you have a query with WHERE severity_id != 7 and a partial index on severity_id < 7.

If you trust the statistics and actually use the index, the rows containing severity_id = 8 will never be returned by the query even though they exist. By using the index you only scan 100k rows and never touch the remaining ~200k. However, this query can't be answered correctly without scanning all ~300k records. This means that on the same database you would get two different results for the exact same query if you decided to drop the index after the first run. The database can't fall back and change the plan during execution.

Perhaps I misunderstood you originally. I thought you were suggesting that the database should be able to know that it can still use the index because the statistics currently claim that there are no records that would make the result incorrect. You are of course correct that the statistics are there to guide the planner's choices, and that is how they are used within PostgreSQL - however, some plans will give different results if your assumptions about the data are wrong.


Because the database has to decide whether or not to use the index. If it decides to use the index, and there are values above 7, then it will misbehave (the query will miss those results). Now of course the database could then scan the rows for values above 7 that it missed, but at that point there's no point in using the index and you might as well have scanned the rows for the original query.

As a result, the database has to be 100% sure that there are no values _at all_ above 7 to safely and efficiently use the index, e.g. when there's a constraint.


I doubt it? At least the number of times the "last updated" column appears in SQL Server stats [1] leads me to believe it collects stats asynchronously with updates to the table.

The only system I've heard of that relies on up-to-date statistics for correctness is snowflake (long but interesting talk here [2]), where having accurate max/mins for each micro partition is really helpful for cutting down the amount of data in the large range scan queries common in BI. I'd guess that being a BI system, snowflake can get away with higher row update latency too.

[1] https://www.sqlshack.com/sql-server-statistics-and-how-to-pe...

[2] https://www.youtube.com/watch?v=CPWn1SZUZqE


All statistics in postgres are considered best effort guidance. Even if the statistics are wrong it can never impact the correctness of the results.


In your example, the WHERE clause of the query and the partial index didn't match logically, i.e. the query may return rows that are not indexed. There's nothing that postgres can do, and I wouldn't classify the behavior as peculiar.

On the other hand, with sqlite, the WHERE clause of the query and the partial index must match *literally*. So let's say you have a partial index with WHERE severity_id != 0, and a query with WHERE severity_id = 1. All the rows with severity_id = 1 are already indexed, but the engine is still not able to make use of the partial index. This one bit us hard.
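
A small illustration of that SQLite behaviour, with made-up names:

    create table events (id integer primary key, severity_id integer);

    create index events_nonzero on events (severity_id)
        where severity_id != 0;

    -- severity_id = 1 logically implies severity_id != 0, but (at least
    -- in the versions we hit) the planner doesn't make that inference,
    -- so this query doesn't use the partial index
    select id from events where severity_id = 1;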


Partial indexes can flip query plans if the covered part becomes so small that it won't be represented when sampled by the stats collector. The planner could then decide that the index scan isn't worth it and could try an alternative less efficient index if one exists.


Yeah, and sadly using the index in those scenarios could be even more worthwhile because of its high selectivity.

Is PG smart enough to avoid that if the query patterns are frequently or exclusively covered by the index?


> Coming from Oracle, I was always taught that NULLs are not indexed

That left me wondering how, if all indexes are by default partial in Oracle… how does one make an unpartial? nonpartial? index.

https://use-the-index-luke.com/sql/where-clause/null/index

Apparently, you add a computed column to the index that just computes a constant value. A single non-null column then causes the NULLs in the other columns to get indexed; it's only when the whole tuple is composed of NULLs that it gets left out.
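
If I read that page right, the trick looks roughly like this (column names made up):

    -- a plain Oracle index skips rows whose entire key is NULL, so
    -- "where last_login is null" can't use it
    create index users_last_login on users (last_login);

    -- appending a constant expression means the composite key is never
    -- entirely NULL, so every row gets an index entry
    create index users_last_login_nn on users (last_login, '1');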

That also seems like a bug waiting to happen; someone inverts a query to find unset (NULL) entries, and now you're doing a table scan.

…but it also seems like a form of brain rot, induced by a particular implementation, e.g., similar to how I've had MySQL users ask how to make a key on a table. Where a "key" is an index, it's just that MySQL by default uses the word "key" to mean index, instead of … key¹. (The query language even supports "INDEX" in place of "KEY", but things like "SHOW CREATE TABLE" default to the "wrong" (linguistically, not programmatically) word.) And then you might have to de-tangle why these two are different concepts, and how they're different. It's very Arrival, in the sense of language (mis-)shaping perception.

¹a key is a set of columns that are sufficient to identify a row. The primary such set of columns is … the primary key. An index can index a key (if more than one exists within a table), but it doesn't have to.


This has nothing to do with the content, but the design of this page really stuck out to me. It's very easy to read and doesn't have fluff. But it still feels modern (in the good way). It's perfectly balanced.


Agreed! It's a perfect example of how you can make a website with "modern" features (responsive design, accessible, mobile friendly, dark mode, static pages, embeds) without it being a bloated mess.


When I did my Oracle DBA training 15 years ago, I learnt about database reorgs.

It basically means exporting your database (or tables) and importing it again. What happens is that deleted data, which doesn't necessarily free up space (Oracle sometimes reuses the freed-up space), doesn't get exported.

https://www.iri.com/blog/vldb-operations/database-reorgs-why...

https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUEST...


A vacuum full basically does this for a table, copying the data from location A to location B, cleaning up junk. I think index rebuilding may take a separate command?


Vacuum full does an index rebuild automatically. Since a vacuum full builds an entire new heap table, the old indexes all point to incorrect locations for all tuples, so it has no choice but to rebuild.


Excellent, there you go.

Is there a way to just do the index build part, short of dropping an index and adding it back?


This is described about a quarter of the way into TFA.

> REINDEX INDEX CONCURRENTLY index_name;

(A vanilla REINDEX will lock the table, preventing writes, while it runs. The CONCURRENTLY variant creates a new index, replicates any updates to the original while it does so, and then does an atomic switcheroo at the end.)


I believe pg_repack can do that.


Dumping and reloading databases used to be mandatory for major postgresql updates, which is one of the reasons postgresql wasn't suitable for production workloads until recently and also why it resisted fixing bugs in vacuum, index, and compaction for many years.


Whoah, that's news to me.

I used PostgreSQL fairly recently (a year or so ago?) and ended up abandoning it after I was forced to do the export/import dance through a few version upgrades.

When did that requirement go away?


Since 9 there's pg_upgrade. Personally I never had an issue and it was very fast, so the downtime is on the order of a few minutes, which is OK for my use case. YMMV.


"pg_upgrade does its best to make sure the old and new clusters are binary-compatible, e.g., by checking for compatible compile-time settings, including 32/64-bit binaries. It is important that any external modules are also binary compatible, though this cannot be checked by pg_upgrade."

This makes me very nervous though; I have at least two extensions (trigrams and GiST). Maybe they work, maybe not - I just prefer the peace of mind of an old-fashioned dump.


Don't take my word for it, but I think those extensions aren't considered "external" as they're part of the PGSQL distribution.

It'd be something like nominatim.so, which is external to PGSQL and (AFAIK) has its own format for certain types of data and indexes.


Ever since pg_upgrade became available, around 8.4 or so - https://www.percona.com/blog/2019/04/12/fast-upgrade-of-lega...

Dump and reload is ok if you have a small database or can afford hours of downtime... if not, use pg_upgrade.


It never did.

The difference is that you can use logical replication since 10 to prevent downtime during upgrade.

Which, if you were using it a year ago, could have been done.


> REINDEX INDEX CONCURRENTLY index_name;

> If for some reason you had to stop the rebuild in the middle, the new index will not be dropped. Instead, it will be left in an invalid state and consume space.

Well, that sure sounds like a bug in PostgreSQL to me.


Well, you can't just delete it. It is an object that was created by some user and there's no good reason for the database to get rid of it automatically. The database keeps a record of the invalid thing, even though it is invalid.
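
If you want to find such leftovers yourself, something along these lines should do it (a sketch; the index name in the DROP is a placeholder):

    -- list invalid indexes, e.g. ones left behind by an aborted
    -- CREATE/REINDEX ... CONCURRENTLY
    select n.nspname, c.relname
    from pg_index i
    join pg_class c on c.oid = i.indexrelid
    join pg_namespace n on n.oid = c.relnamespace
    where not i.indisvalid;

    -- then drop them explicitly
    drop index concurrently if exists some_invalid_index;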


The good reason to get rid of it automatically: It takes up space.

Is there any good reason to keep it? (The fact that it was "created by some user" doesn't seem like much of a reason.)

IMHO, creating an index should be atomic: Either you end up with a valid index, or you end up with nothing.


It can't be atomic if it's done `CONCURRENTLY`. You can't even do that in a transaction, Postgres will tell you it won't work. By using `CONCURRENTLY`, you're making it clear that you will handle atomicity yourself, and as a result Postgres needs to provide you with the tools to do it.

Want real atomicity? Don't use `CONCURRENTLY`.


I had the same thoughts. Is there any reason to keep it? Is there any scenario where that invalid index could be useful?


The reason is to make sure aborting the command is fast and safe. PostgreSQL has no infrastructure for spawning background jobs to clean up stuff like this after aborted queries. A patch would probably be welcome if it could be done in a good way.


Pretty well documented behavior as far as CONCURRENTLY goes: https://www.postgresql.org/docs/current/sql-createindex.html...


Is the partial index technique to avoid indexed NULL data as effective for PostgreSQL 13+?

It looks like in v13+ PostgreSQL could create a single leaf for NULL data and just store row pointers within it, which should reduce data sizes at least a bit.


Not per se _as effective_, but it will still help a lot. NULL tuples pre-pg13 take ~ 14 bytes each, and 18 bytes when aligned. (= 2 (ItemID, location on page) + 6 (TID) + 2 (t_info) + 4 (NULL bitmap) + 4 bytes alignment). When deduplication is enabled for your index, then your expected tuple size becomes just a bit more than 6 bytes (~ 50 TIDs* in one tuple => 2 (ItemId) + 6 (alt tid) + 2 (t_info) + 4 (null bitmap) + 50 * 6 (heap TIDs) / 50 => ~ 6.28 bytes/tuple).

So, deduplication saves some 65% in index size for NULL-only index-tuples, and the further 35% can be saved by using a partial index (so, in this case, deduplication could have saved 13GB).

*note: last time I checked, REINDEX with deduplication enabled packs 50 duplicates in one compressed index tuple. This varies for naturally grown indexes, and changes with column types and update access patterns.


heh, my calculation was incorrect: ItemID is 4 bytes in size, so the calculations are slightly off:

pre-13 was 16 bytes each (20 when 64-bit compiled), and post-13 it is 6.32 bytes/heap tuple when deduplication has kicked in.


He actually mentioned index de-duplication earlier: https://hakibenita.com/postgresql-unused-index-size#activati...

If I had to guess, I would say that it doesn't accomplish anything (or as much as you'd think) for NULL values, simply because there is no real data to store in either approach; you just have a bunch of pointers either way.


NULL values are not special as far as deduplication is concerned. They use approximately as much disk space as a non-NULL integer column without deduplication, and compress just as well with deduplication. Deduplication is effective because it eliminates per-tuple overhead, so you see most of the benefits even with index tuples that naturally happen to have physically small keys. You'll still get up to a 3x decrease in storage overhead for the index provided there is low cardinality data (and not necessarily that low cardinality, ~10 or so tuples per distinct value will get you there).

The NULL issue is documented directly -- see the "Note" box here:

https://www.postgresql.org/docs/devel/btree-implementation.h...


> Clear bloat in tables

Ohh, we've had issues with this. We have this table that's mostly ephemeral data, so rows are constantly inserted and then deleted after a certain amount of time. Due to a bug the deletion didn't work for a while and the db grew very large. Fixed the deletion, but no amount of vacuuming actually allows us to fully reclaim that space so we don't have to pay for it.

At the same time, the extra cost is probably negligible compared to spending more energy fixing it.


The problem we always ran into with deletes is them triggering full table scans, because our indexes weren't set up correctly to check foreign key constraints properly. A constant game of whack-a-mole that everyone quickly grew tired of. Also, more indexes increase the slope of the line for insert operations as data size grows.

Another solution is tombstoning data so you never actually do a DELETE, and partial indexes go a long way to making that scale. It removes the log(n) cost of all of the dead data on every subsequent insert.


> The problem we always ran into with deletes is them triggering full table scans because our indexes weren't set up correctly to test foreign key constraints properly.

This is a classic case where partitioning shines. Let's say those are logs. You partition them monthly and want to retain 3 months of data:

M1, M2, M3

When M4 arrives you drop partition M1. This is a very fast operation and the space is returned to the OS. You also don't need to vacuum after dropping it. When you arrive at M5 you repeat the process by dropping M2.
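
A minimal sketch of that setup with declarative partitioning (PostgreSQL 10+; names made up):

    create table logs (
        id bigint not null,
        created_at timestamptz not null,
        payload text
    ) partition by range (created_at);

    create table logs_m1 partition of logs
        for values from ('2021-01-01') to ('2021-02-01');
    create table logs_m2 partition of logs
        for values from ('2021-02-01') to ('2021-03-01');

    -- retention: dropping a whole partition is fast, returns the space
    -- to the OS, and needs no vacuum afterwards
    drop table logs_m1;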

> Another solution is tombstoning data so you never actually do a DELETE, and partial indexes go a long way to making that scale. It removes the logn cost of all of the dead data on every subsequent insert.

If you are referring to PostgreSQL, then this would actually be worse than outright doing a DELETE. PostgreSQL is copy-on-write, so an UPDATE to an is_deleted column will create a new copy of the record and a new entry in all its indexes. The old one would still need to be vacuumed. You will accumulate bloat faster and vacuums will have more work to do. Additionally, since is_deleted would be part of partial indexes like you said, a deleted record would also incur a copy in all indexes present on the table.

Compare that to just doing the DELETE, which only stores the ID of the deleting transaction in the row's xmax; a subsequent vacuum would then be able to mark the space as reusable by further inserts.


Have a look at pg_repack. That'll solve it for you.

> but no amount of vacuuming actually allows us to fully reclaim that space

A full vacuum would, but it would also lock the table for the duration (which is something pg_repack won't do).


Yup, locking the table is off the table so to speak, heh. Will take a look, thanks.


The article includes a couple of useful queries unrelated to the "find", and it led me to these useful bloat-detection resources: https://wiki.postgresql.org/wiki/Show_database_bloat and https://github.com/ioguix/pgsql-bloat-estimation


Partial indexes can be useful in any case where one value is far more common than the others.

Indexing boolean columns is often only useful if one of the values is uncommon and the index is partial so that it only includes those uncommon rows.
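
For instance (made-up names), if only a small fraction of rows are unprocessed:

    create index tasks_pending on tasks (id) where processed = false;

    -- queries must repeat the predicate to be eligible for the index
    select id from tasks where processed = false;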


Agreed. To explain why this is the case, consider that table in the story that had 99% NULL values. If you were to run "SELECT * FROM table WHERE column IS NULL", then PostgreSQL wouldn't use the index anyway, because it would be faster to just read sequentially through the entire table and filter out the 1% that don't match.


That highly depends on what you select. If the query could be answered by the index alone, like a COUNT(*), it would probably use the index. You are right if you want to query any data from those rows that's not in the index.


I might be out of touch a little with Postgres (I last used it in 2010), but my impression was that COUNT(*) still needed to scan the actual table in order to exclude rows that had been deleted in a transaction, due to the way multi-version concurrency worked. Is this something that has been improved since then?


They support index-only scans now, so there is some sort of optimization which bypasses the table lookup, at least in certain cases.


>There are several ways to rebuild a table and reduce bloat:

>Re-create the table: Using this method as described above often requires a lot of development, especially if the table is actively being used as it's being rebuilt.

>Vacuum the table: PostgreSQL provides a way to reclaim space occupied by dead tuples in a table using the VACUUM FULL command. Vacuum full requires a lock on the table, and is not an ideal solution for tables that need to be available while being vacuumed:

This is confusing to me. I thought Postgres was supposed to be better than MySQL, yet MySQL has a non-locking command to recreate a table. It has like 3 that would fit here, AND they deal with the indexes in one command.


Too bad MySQL does not have partial indexes.

We have one huge table where I want to add some indexes for specific cases (for at most 1% of records), but the server will not have enough memory for it if I add those indexes for all records :/


As long as you've got primary keys on the huge table, there's a hacky solution -- create a second table with columns for just the first table's primary key and the columns you're indexing, plus your desired index, and ensure you always write/update/delete both tables simultaneously using transactions. Then when needed, use the index on the second table and join it to your first with the primary key.

Annoying, but it should work for most queries I'd expect without too much SQL.
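
One way to read that, sketched with made-up names (MySQL/InnoDB assumed): the side table holds only the rows you care about, plus the index the main table can't afford.

    create table events_flagged (
        event_id bigint not null primary key,
        flagged_at datetime not null,
        key (flagged_at),
        constraint fk_events_flagged
            foreign key (event_id) references events (id)
    ) engine=innodb;

    -- write/delete both tables in the same transaction, then query via
    -- the small indexed table and join back to the big one
    select e.*
    from events_flagged f
    join events e on e.id = f.event_id
    where f.flagged_at >= '2021-01-01';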

I've definitely "rolled my own indexing" like this in the past, though it's more often been duplicating strings into a custom "collation" or other transformations.

Another solution is simply to split your table in two, with the same columns in both, and the index only on one of the tables. But of course that really depends on your business logic -- queries that need to retrieve data from both tables together can get pretty hairy/slow, and if you've got auto-incrementing PKEY's then avoiding collisions between the two tables can be tricky on its own. So this is definitely the less general solution.

Of course it certainly would be nicer if MySQL supported partial indexes. It seems so useful, I'm surprised it didn't happen long ago.


The first approach is one of the steps towards normalising a database.


Actually it's the opposite of database normalization.

Normalizing removes data redundancy. This adds data redundancy.

When I design a database structure, it's common to start with the most normalized representation possible. And then to denormalize the minimum necessary for performance reasons -- duplicating rows and/or columns just like here so certain data can be retrieved more quickly, whenever indexes aren't powerful or featured enough.


I think what lucian1900 may be thinking is that instead of

    create table purchase_order (
        id int primary key,
        ordered_on timestamptz not null,
        customer_id int not null references customer,
        canceled_on timestamptz
    );
you could have

    create table purchase_order (
        id int primary key,
        ordered_on timestamptz not null,
        customer_id int not null references customer
    );

    create table order_cancelation (
        order_id int primary key references purchase_order,
        canceled_on timestamptz not null
    );
This is indeed a better normalised schema and it allows you to index order_cancelation.canceled_on without worrying about nulls.


Oh then, absolutely. I was assuming a constraint that columns couldn't be removed from the original table. But if you can, then your example is an even better solution.


Exactly, that’s what I thought was being described.


Could you create a temporary high-memory slave MySQL server, sync it from the master, add the index, sync back to the master, and decommission the temporary high-memory server? I don't know enough about master/slave operations to know if it would work.


'malinens doesn't have the storage space to store an index over a column if all values in the column are indexed. They want to index only some values, but not others. This feature does not exist in MySQL.


MySQL has pluggable storage engines. TokuDB does what you're after (adds indexes on the fly, as well as alter tables on the fly without overloading the server).


This page about TokuDB reads:

> TokuDB has been deprecated by its upstream maintainer. It is disabled from MariaDB 10.5 and has been been removed in MariaDB 10.6 - MDEV-19780. We recommend MyRocks as a long-term migration path.

https://mariadb.com/kb/en/tokudb/

Is MyRocks comparable?


Altering a table online without using pt-online-schema-change doesn't help if they want an index that covers only some of the keys but not others.


The included query for finding which indexes in your database could benefit from a partial index is amazing. Thanks for putting the extra effort into this post.


It seems like this is an optimization that Postgres should handle internally, doesn't it?


Graphing "free storage" is meaningless and confusing; it should be "used storage".

Available storage depends on usage and capacity.

Edit: I meant for this article; of course I believe it is useful to track this in practice.


Free storage is what matters because it makes it very obvious when you are getting close to a disk-full outage.


Used makes sense for getting a feeling for pure performance (the smaller the better, as it's more likely to be in memory).

Available makes sense for knowing when things will just plain break (reaching 0 = write failure for a DB).

>Every few months we get an alert from our database monitoring to warn us that we are about to run out of space.

In this case they were avoiding their DB server breaking. They didn't do this for performance reasons.


If you pay for a fixed amount of storage and only use it partially, you should also monitor free storage so you know when you are wasting it.


The chart seems to show an uptick of 2GB, not 20GB. Am I missing something?


The note is in a bad place; the whole slope is the 20GB.


Are there things you can do to check a MySQL/mariadb instance? I see a command called mysqlcheck I may have to investigate.


wow 20GB



