I don't have all the details anymore. But one of the non obvious things for me was that foreign key cascades where not in the binlogs. I also think that some changes in the database layout could lead to strange things on the replicas.
> I also think that some changes in the database layout could lead to strange things on the replicas.
I've been using MySQL for 23 years and have no idea what you're referring to here, sorry. But it's not like other DBs have quirk-free replication either. Postgres logical replication doesn't handle DDL at all, for example.
You can tune binlog_expire_logs_seconds to control how long old binlog files stay around. The default is 2592000 seconds (30 days) which is often too long.
The performance impact depends substantially on whether you've configured it to fsync the binlog on every group commit.
Also, it's important to consider that replication and backups serve different purposes. Backups alone are insufficient for high availability, change data capture, point-in-time recovery / undoing a bad change, etc.
They should have addressed it much earlier, but it makes way more sense in historical context: when MySQL added utf8 support in early 2003, the utf8 standard originally permitted up to 6 bytes per char at that time. This had excessive storage implications, and emoji weren't in widespread use at all at the time. 3 bytes were sufficient to store the majority of chars in use at that time, so that's what they went with.
And once they made that choice, there was no easy fix that was also backwards-compatible. MySQL avoids breaking binary data compatibility across upgrades: aside from a few special cases like fractional time support, an upgrade doesn't require rebuilding any of your tables.
Your explanation makes it sound like an incredibly stupid decision. I imagine what you're getting at is that 3 bytes were/are sufficient for the basic multilingual plane, which is incidentally also what can be represented in a single utf-16 byte pair. So they imposed the same limitation as utf-16 had on utf-8. This would have seemed logical in a world where utf-16 was the default and utf-8 was some annoying exception they had to get out of the way.
OK, but that makes perfect sense given utf-16 was actually quite widespread in 2003! For example, Windows APIs, MS SQL Server, JavaScript (off the top of my head)... these all still primarily use utf-16 today even. And MySQL also supports utf-16 among many other charsets.
There wasn't a clear winner in utf-8 at the time, especially given its 6-byte-max representation back then. Memory and storage were a lot more limited.
And yes while 6 bytes was the maximum, a bunch of critical paths (e.g. sorting logic) in old MySQL required allocating a worst-case buffer size, so this would have been prohibitively expensive.
That's an absolutely ridiculous assertion. Do you similarly think that the Battlestar Galactica reboot was a thinly-veiled racist show because they frequently called the Cylons "toasters"?
> I don't understand how you can run multiple postmasters.
I believe they're just referring to having several completely-independent postgres instances on the same host.
In other words: say that postgres is maxing out at 2000 conns/sec. If the bottleneck actually was fork rate on the host, then having 2 independent copies of postgres on a host wouldn't improve the total number of connections per second that could be handled: each instance would max out at ~1000 conns/sec, since they're competing for process-spawning. But in reality that isn't the case, indicating that the fork rate isn't the bottleneck.
Also check out ProxySQL [1][2], it's an extremely powerful and battle-tested proxy. Originally it was only for MySQL/MariaDB, where it is very widely used at scale, even despite MySQL already having excellent built-in scalable threaded connection management. But ProxySQL also added Postgres support too in 2024 and that has become a major focus.
+1 to ProxySQL, especially in RDS environments with huge monoliths attached that open a shitload of threads. RDS has fixed max_connections depending on the instance size so if you don't want to pay $$$$ for bigger but underused instances - and while you are trying to get the devs update all the hundreds old dependencies in the monolith to improve it, ProxySQL - can save your day. It did, for me.
And yes, it's a self-managed system but it's pretty easy to operate and very stable.
You can change the max_connections in RDS though. The default is insanely high and I have no idea what it is that way. 4vCPU instances running with 5k max connections iirc, I have never seen an instance this size handle more than 100-200 concurrent connections on a CPU bound workload.
There is still a max capped on the instance size [1] (well, RAM available) although now that I'm checking it again I swear it was different last time I read it. Anyway I did totally had issues with big replicated monoliths with their own connection pool (i.e. old Rails) maxing out connections on RDS.
Be sure to look at the actual sqldef command-line tool, not the trivial copy-and-paste demo on their website. Declarative schema management is best used combined with a Git repo.
In the big picture, declarative schema management has lots of advantages around avoiding/solving schema drift, either between environments (staging vs prod) or between shards in a sharded setup (among thousands of shards, one had a master failure at an inopportune time).
It's also much more readable to have the "end state" in your repo at all times, rather than a sequence of ALTERs.
Yes, we've used skeema for this for many years, and it is just plain lovely. Putting into source control your desired end state is so much more intuitive and understandable than accumulating migrations. In a way it's like the difference between jQuery and React -- you just say how you want it to look like in the end, and the computer does the work to figure out how to make it so.
Not sure what that means, but it's named partially as a nod to Skee-Ball. The town I grew up in was actually the home of the factory where all Skee-Ball machines were made.
I was using a location-related naming scheme in general at that time; similarly my automation library was called Go La Tengo because I was living in the town where the band Yo La Tengo was from.
Out of curiosity, the post you linked mentions that it won't work for renames. What's the approach for these and other types of procedural migrations, such as data transformations (ie: splitting a column, changing a type, etc.)
With a declarative model, would you run the migration and follow immediately with a one off script?
For both data migrations and renames, there isn't really a one-size-fits-all solution. That's actually true when doing data changes or renames in imperative (incremental) migrations tools too; they just don't acknowledge it, but at scale these operations aren't really viable. They inherently involve careful coordination alongside application deploys, which cannot be timed to occur at the exact same moment as the migration completion, and you need to prevent risk of user-facing errors or data corruption from intermediate/inconsistent state.
With row data migrations on large tables, there's also risk of long/slow transactions destroying prod DB performance due to MVCC impact (pile-up of old row versions). So at minimum you need to break up a large data change into smaller chunked transactions, and have application logic to account for these migrations being ongoing in the background in a non-atomic fashion.
That all said, to answer from a mechanical standpoint of "how do companies using declarative schema management also handle data migrations or renames":
At large scale, companies tend to implement custom/in-house data migration frameworks. Or for renames, they're often just outright banned, at least for any table with user-facing impact.
At smaller scale, yeah you can just pair a declarative tool for schema changes with an imperative migration tool for non-schema changes. They aren't really mutually exclusive. Some larger schema management systems handle both / multiple paradigms.
Personally I've always called this style "declarative schema management" since the input declares the desired state, and the tool figures out how to transition the database to that state.
sqldef is really cool for supporting many database dialects. I'm the author of Skeema [1] which includes a lot of functionality that sqldef lacks, but at the cost of being 100% MySQL/MariaDB-specific. Some other DB-specific options in this space include Stripe's pg-schema-diff [2], results [3], stb-tester's migrator for sqlite [4], among many others over the years.
The more comprehensive solutions from ByteBase, Atlas, Liquibase, etc tend to support multiple databases and multiple paradigms.
And then over in Typescript ORM world, the migrators in Prisma and Drizzle support a "db push" declarative concept. (fwiw, I originated that paradigm; Prisma directly copied several aspects of `skeema push`, and then Drizzle copied Prisma. But ironically, if I ever complete my early-stage next-gen tool, it uses a different deployment paradigm.)
pgroll is definitely interesting; it overlaps between both schema management tools and online schema change tools. At scale, I would want to have a better understanding on the performance implications of putting views in front of hot tables. I don't like its use of YAML. iiuc, it is an imperative migration tool, not a declarative tool.
Grate is also imperative, not declarative.
That last link (david.rothlis.net) is about a declarative tool for sqlite, not Postgres.
> Personally I've always called this style "declarative schema management" since the input declares the desired state, and the tool figures out how to transition the database to that state.
Personally I've called it a mistake, since there's no way a tool can infer what happened based on that information.
For schema changes, it absolutely can, for every situation except table renames or column renames.
That might sound like a major caveat, but many companies either ban renames or have a special "out-of-band" process for them anyway, once a table is being used in production. This is necessary because renames have substantial deploy-order complexity, i.e. you cannot make the schema change at the same exact instant as the corresponding application change, and the vast majority of ORMs don't provide anything to make this sane.
In any case, many thousands of companies use declarative schema management. Some of the largest companies on earth use it. It is known to work, and when engineered properly, it definitely improves development velocity.
Uh, any database of sufficient size is going to do migrations “out of band” as they can take hours or days and you never have code requiring those changes ship at migration start.
Small things where you don’t have DBA or whatever, sure use tooling like you would for auto-changes in a local development.
Very large tech companies completely automate the schema change process (at least for all common operations) so that development teams can make schema changes at scale without direct DBA involvement. The more sophisticated companies handle this regardless of table size, sharding, operational events, etc. It makes a massive difference in execution speed for the entire company.
Renames aren't compatible with that automation flow though, which is what I meant by "out-of-band". They rely on careful orchestration alongside code change deploys, which gets especially nasty when you have thousands of application servers and thousands of database shards. In some DBMS, companies automate them using a careful dance of view-swapping, but that seems brittle performance-wise / operationally.
Right, but my point was that renames in particular typically can't go out well before the corresponding application change [1]. Thus, renames are "out of band" relative to the company's normal schema change process. (This is orthogonal to how schema changes are always "out of band" relative to code deploys; that wasn't what I was referring to.)
[1] In theory a custom ORM could have some kind of dynamic conditional logic for table or column renames, i.e. some way to configure it to retry a query with the "new" name if the query using the "old" name fails. But that has a huge perf impact, and I'm not aware of any common ORMs that do this. So generally if you want to rename a table or column that is already used in prod, there's no way to do it without causing user-facing errors or having system downtime during the period between the SQL rename DDL and the application code change redeploy.
Not to mention apps that may have differing versions deployed on client infrastructure with different test/release cycles... this is where something like grate is really useful imo.
What "strange stuff around replication" are you referring to?
reply