Hacker News new | past | comments | ask | show | jobs | submit login

Let's say for the sake of discussion that your MySQL db has a lot of tables encoded as "utf8." Are there any known drawbacks or gotchas to converting them en masse to "utf8mb4"? Is this a lengthy operation?



Schema changes to large tables are always huge pain in the ass in MySQL. Every team I’ve worked on goes to great lengths to avoid changing the schema, which leads to all kinds of exciting anti-patterns like recycling old fields even if their name no longer matches what is stored in them.


I suspect this is because for large databases, schema changes lock the tables for extended periods of time, and if this time exceeds a reasonable time for a service window, you need good database replication. And you probably don’t have good MySQL database replication unless you’re on AWS or similar.


That’s exactly what it is. If you have a big table you are gonna have a pretty long outage while you update that schema. Like in the last place I worked our largest table would take more than an hour for a schema update. So we never did schema updates.


At my last workplace, we had a service window of one weekend. Prior to this, an application had to be restructured to even make the schema upgrade able to finish in one weekend.


sounds like how I approach laundry


So you're right but also wrong thanks to good tooling. If you have enough discipline to not put triggers on your tables then pt-online-schema-change has got you sorted. Does it take forever? Yep, but does it matter that it takes forver? Nope because it doesn't lock tables.

And even if you do have triggers https://github.com/github/gh-ost you can still do it.


There are collation changes between utf8 and utf8mb4. Specifically, utf8mb4 uses the (much better) UCA 9.0 algorithm.

But this implies that all indexes will have to be regenerated, if they contain a column that is utf8 and will be converted. As they sort differently, this requires rebuilding the index from scratch. Depending on data size this can take measurable time and effort.


The author of original article links to this one in their post:

https://mathiasbynens.be/notes/mysql-utf8mb4#utf8-to-utf8mb4

And yeah, there's gotchas in there, the main one being that strings that barely fit now won't fit, and also maximum lengths on indexable strings.


I can’t imagine there would be too many, other than it is a tedious operation not easily automated. Depending on your data, might take a while.

Can’t believe that neither MySQL or MariaDB has created a utility to do the conversions for you automatically.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: