Let's say for the sake of discussion that your MySQL db has a lot of tables encoded as "utf8." Are there any known drawbacks or gotchas to converting them en masse to "utf8mb4"? Is this a lengthy operation?
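The per-table mechanics are simple; the gotchas are mostly index key lengths and time. A minimal sketch, assuming a schema and table name of your own (`your_db` and `your_table` are placeholders, and `utf8mb4_0900_ai_ci` requires MySQL 8.0+):

```sql
-- Survey what you actually have first
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_db';

-- Convert one table. This rewrites the whole table and rebuilds
-- its indexes, so on a big table it runs for a long time.
ALTER TABLE your_table
  CONVERT TO CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;
```

One known gotcha: under the old 767-byte InnoDB index-prefix limit, a fully indexed VARCHAR(255) fits at utf8's 3 bytes per character but not at utf8mb4's 4, so the ALTER can fail with a "key too long" error; modern MySQL with the DYNAMIC row format raises the limit to 3072 bytes.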
Schema changes to large tables are always a huge pain in the ass in MySQL. Every team I've worked on goes to great lengths to avoid changing the schema, which leads to all kinds of exciting anti-patterns, like recycling old fields even when their names no longer match what's stored in them.
I suspect this is because, for large databases, schema changes lock the tables for extended periods, and if that time exceeds a reasonable service window, you need good database replication. And you probably don't have good MySQL replication unless you're on AWS or similar.
That’s exactly what it is. If you have a big table, you’re going to have a pretty long outage while you update that schema. At the last place I worked, our largest table would take more than an hour for a schema update. So we never did schema updates.
At my last workplace, we had a service window of one weekend. At one point an application had to be restructured just so a schema upgrade could even finish within that weekend.
So you're right, but also wrong, thanks to good tooling. If you have enough discipline not to put triggers on your tables, then pt-online-schema-change has got you sorted. Does it take forever? Yep. But does it matter that it takes forever? Nope, because it doesn't lock tables.
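For reference, a hedged sketch of what that looks like for this particular conversion (database, table, and collation names are placeholders; pt-online-schema-change itself keeps a shadow copy in sync using triggers, which is why pre-existing triggers are a problem):

```shell
# Dry run first: builds the plan without touching data
pt-online-schema-change \
  --alter "CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci" \
  D=your_db,t=your_table \
  --dry-run

# The real thing: copies rows into the new table in chunks,
# then atomically swaps it in
pt-online-schema-change \
  --alter "CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci" \
  D=your_db,t=your_table \
  --execute
```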
There are collation changes between utf8 and utf8mb4. In particular, MySQL 8.0's utf8mb4_0900_* collations use the (much better) UCA 9.0.0 algorithm, whereas the legacy utf8 collations are based on far older Unicode data.
But this implies that every index containing a column converted from utf8 will have to be regenerated: because values sort differently under the new collation, the index must be rebuilt from scratch. Depending on data size, this can take considerable time and effort.
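If you want to scope that rebuild work up front, a query along these lines (schema name is a placeholder) lists indexed columns still on utf8:

```sql
-- Find indexes that cover a utf8 column and will be rebuilt on conversion.
-- Note: on MySQL 8.0 the legacy charset may be reported as 'utf8mb3'.
SELECT s.TABLE_NAME, s.INDEX_NAME, s.COLUMN_NAME
FROM information_schema.STATISTICS s
JOIN information_schema.COLUMNS c
  ON  c.TABLE_SCHEMA = s.TABLE_SCHEMA
  AND c.TABLE_NAME   = s.TABLE_NAME
  AND c.COLUMN_NAME  = s.COLUMN_NAME
WHERE s.TABLE_SCHEMA = 'your_db'
  AND c.CHARACTER_SET_NAME IN ('utf8', 'utf8mb3');
```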