TFA is all about how to make changes where you drop elements of an enum, and how hard that is. The obvious thing though is not covered: don't do that! Instead you should:
a. Add CHECK() constraints on columns of
the relevant ENUM type checking that
the value is one of the "live" values.
b. RENAME "dead" ENUM values to indicate
that they're dead.
c. UPDATE ... SET column_of_that_enum_type = ...
WHERE column_of_that_enum_type IN (dead_values_here) ...
(c) can be done slowly, say, with a LIMIT clause to keep load and disruptions down. Yeah, you might have "broken" data for a while, but you can make your queries report better values for columns of that ENUM type in the same way that the UPDATE would fix them.
Glad to see this as the top comment, completely agree. After reading the article, was thinking that the only real downside of using enum types (but there is one more issue not mentioned, more on that below) is when you need to remove values. In reality, I've found removing enum values to be a very rare occurrence in prod. Removing an enum value fundamentally breaks backwards compatibility, so usually a better option is to essentially deprecate it, which is more like what you recommend.
Note one other thing I have found is that postgres enums are less "portable" than plain text columns. For example, if you're using BigQuery to query postgres tables as an external connection, BQ doesn't support enums, so your external connection query has to cast any enum return values (e.g. some_enum_column::text) before returning them to BQ, which can be a pain if you want to do "SELECT * FROM ..." some table that returns enum columns.
Apologies, I’m hallucinating features here. In our codebase we also have lots of JDBC extensions to map PG objects to language types and back, and this is the easiest way to handle queries over enums. But obviously some type info has to trigger the coercion, so SELECT * FROM might work in the context of an INSERT INTO but elsewhere you’d have to be explicit.
By the time you've done a generated column you might as well go with the articles solution of using text with a check constraint since you're now using the space anyway.
> Note one other thing I have found is that postgres enums are less "portable" than plain text columns. For example, if you're using BigQuery to query postgres tables as an external connection, BQ doesn't support enums, so your external connection query has to cast any enum return values (e.g. some_enum_column::text) before returning them to BQ, which can be a pain if you want to do "SELECT * FROM ..." some table that returns enum columns.
Yeah, one has to write more adapter queries to add those casts to text (or from text, for DMLs), but it's OK, and ENUMs are just really nice. And as u/faangsticle says, use VIEWs for that.
Good approach. And with Postgres `DOMAIN` types features to DRY up the `CHECK` the live constraint, this should be very palatable. I usually opt for enum tables and regularly find myself needing to declare `archived_at` columns for said tables. Did not know of the `RENAME` command either and what you describe renders native enums less warty.
Agree, PostgreSQL enum is really just for mapping with an enum in application code. And just like you should not remove an enum or change their ordinal in application code, don't do that to a PostgreSQL enum
I find it odd how many schema changes in modern RDBMS must be done on the whole table at once. You can split a table in chunks and recode each chunk gradually in a way which doesn't change the data in it (so no downtime) but removes dead entries like updated enums.
In a way you describe how we can emulate this process. The question is why the heck wouldn't databases do this themselves? Same with adding and dropping columns.
Consider how PostgreSQL encodes null for example, by skipping them in the row as fields, and adding them in a null bitmap in front of the row. Meaning... rows are not uniformly sized, there's no math like offset = row * rowsize + field_offset; kind of addressing for reading a field in PG where recoding some of the rows breaks the entire table.
And yet we have all those huge monolithic operations that need to be done atomically. So weird.