
TFA is all about how to make changes where you drop elements of an enum, and how hard that is. The obvious thing though is not covered: don't do that! Instead you should:

  a. Add CHECK() constraints on columns of
     the relevant ENUM type checking that
     the value is one of the "live" values.
  b. RENAME "dead" ENUM values to indicate
     that they're dead.
  c. UPDATE ... SET column_of_that_enum_type = ...
     WHERE column_of_that_enum_type IN (dead_values_here) ...
(c) can be done slowly, in batches (Postgres UPDATE has no LIMIT clause, but you can key the UPDATE on a LIMITed subquery) to keep load and disruptions down. Yeah, you might have "broken" data for a while, but you can make your queries report better values for columns of that ENUM type in the same way that the UPDATE would fix them.
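
A sketch of (a)-(c) with hypothetical names (an orders table whose status column has enum type order_status, with 'cancelled' as the dead value):

    -- (a) only "live" values allowed; NOT VALID so existing
    --     rows that still hold dead values don't block the ALTER
    ALTER TABLE orders
      ADD CONSTRAINT status_is_live
      CHECK (status IN ('pending', 'shipped')) NOT VALID;

    -- (b) rename the dead value so its deprecation is visible
    --     (ALTER TYPE ... RENAME VALUE needs PostgreSQL 10+)
    ALTER TYPE order_status RENAME VALUE 'cancelled' TO 'cancelled_dead';

    -- (c) migrate rows in small batches via a LIMITed subquery;
    --     repeat until zero rows are updated, then
    --     ALTER TABLE orders VALIDATE CONSTRAINT status_is_live;
    UPDATE orders SET status = 'pending'
    WHERE id IN (SELECT id FROM orders
                 WHERE status = 'cancelled_dead' LIMIT 1000);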


Glad to see this as the top comment, completely agree. After reading the article, I was thinking that the only real downside of using enum types is when you need to remove values (though there is one more issue the article doesn't mention; more on that below). In reality, I've found removing enum values to be a very rare occurrence in prod. Removing an enum value fundamentally breaks backwards compatibility, so usually a better option is to deprecate it instead, which is closer to what you recommend.

Note one other thing I have found is that postgres enums are less "portable" than plain text columns. For example, if you're using BigQuery to query postgres tables as an external connection, BQ doesn't support enums, so your external connection query has to cast any enum return values (e.g. some_enum_column::text) before returning them to BQ, which can be a pain if you want to do "SELECT * FROM ..." some table that returns enum columns.


You can create an implicit cast to and from text with CREATE CAST and it becomes transparent.


Can you explain a little more what you mean here? I'm not sure this helps my use case. E.g. if I run a BQ external query like the following

    SELECT * FROM EXTERNAL_QUERY("my-pg-connection-string", """
        SELECT enum_column FROM foo;
    """);
that always fails for me, even if I define "CREATE CAST (my_enum_type AS text) WITH INOUT AS IMPLICIT;"

The only way I can get it to work is if I use "SELECT enum_column::text FROM foo;" and I don't need a custom cast definition to do that.


Apologies, I’m hallucinating features here. In our codebase we also have lots of JDBC extensions to map PG objects to language types and back, and this is the easiest way to handle queries over enums. But obviously some type info has to trigger the coercion, so SELECT * FROM might work in the context of an INSERT INTO but elsewhere you’d have to be explicit.


I think you would need a way to trigger the cast either way? E.g. concatenate with a string?

Otherwise, maybe a function and a generated column (Ed: in a VIEW, I guess, if it chokes on the raw enum)?

https://dba.stackexchange.com/questions/276477/immutably-con...
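
Something along the lines of the linked answer, sketched with hypothetical names (foo.enum_column of type my_enum_type from upthread); the cast is wrapped in a function declared IMMUTABLE, since generated columns only accept immutable expressions and the bare enum-to-text cast isn't marked immutable:

    CREATE FUNCTION enum_to_text(v my_enum_type) RETURNS text
      LANGUAGE sql IMMUTABLE AS $$ SELECT v::text $$;

    ALTER TABLE foo ADD COLUMN enum_text text
      GENERATED ALWAYS AS (enum_to_text(enum_column)) STORED;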


By the time you've done a generated column you might as well go with the article's solution of using text with a CHECK constraint, since you're now using the space anyway.
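
For comparison, the article's approach sketched with hypothetical names: a plain text column, so there's nothing for external tools to choke on, with a CHECK standing in for the enum:

    CREATE TABLE foo (
      id     bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
      status text NOT NULL
        CHECK (status IN ('pending', 'shipped', 'cancelled'))
    );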


The CREATE CAST will help on DMLs, but not SELECTs.


Oh nice. I didn’t know about that feature.


> BQ doesn't support enums

Sounds like a reason to use views to work around the limitations of BigQuery
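
E.g. (hypothetical names, reusing foo.enum_column from upthread) a view that does the casting once, so the BigQuery side can SELECT * from it:

    CREATE VIEW foo_for_bq AS
      SELECT id, enum_column::text AS enum_column
      FROM foo;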


> Note one other thing I have found is that postgres enums are less "portable" than plain text columns. For example, if you're using BigQuery to query postgres tables as an external connection, BQ doesn't support enums, so your external connection query has to cast any enum return values (e.g. some_enum_column::text) before returning them to BQ, which can be a pain if you want to do "SELECT * FROM ..." some table that returns enum columns.

Yeah, one has to write more adapter queries to add those casts to text (or from text, for DMLs), but it's OK, and ENUMs are just really nice. And as u/faangsticle says, use VIEWs for that.


Good approach. And with Postgres `DOMAIN` types to DRY up the "live values" `CHECK` constraint, this should be very palatable. I usually opt for enum tables and regularly find myself needing to declare `archived_at` columns for said tables. Did not know of the `RENAME` command either, and what you describe renders native enums less warty.
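
A sketch of that, with hypothetical names: a DOMAIN over the enum carries the "live values only" CHECK once, and columns declare the domain instead of repeating the constraint:

    CREATE DOMAIN live_order_status AS order_status
      CHECK (VALUE IN ('pending', 'shipped'));

    -- columns then use the domain, e.g.:
    -- ALTER TABLE orders ALTER COLUMN status TYPE live_order_status;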


Agree, a PostgreSQL enum is really just for mapping to an enum in application code. And just like you should not remove an enum value or change its ordinal in application code, don't do that to a PostgreSQL enum.


I find it odd how many schema changes in modern RDBMS must be done on the whole table at once. You can split a table into chunks and recode each chunk gradually in a way that preserves its logical contents (so no downtime) while removing dead entries such as dropped enum values.

In a way you describe how we can emulate this process. The question is why the heck wouldn't databases do this themselves? Same with adding and dropping columns.

Consider how PostgreSQL encodes NULL, for example: NULL fields are simply omitted from the row, and recorded in a null bitmap in the row header. Meaning rows are not uniformly sized; there's no offset = row * rowsize + field_offset style addressing for reading a field in PG, no layout where recoding some of the rows would break the entire table.

And yet we have all those huge monolithic operations that need to be done atomically. So weird.



