Migra: Diff for PostgreSQL schemas

gavinray · on Feb 26, 2022

This tool is incredibly useful

I would also like to share a very similar tool that is much less known:

It's the same thing, except instead of requiring two DB URL's to do the diffing, you give it a "schema.sql" file and a folder path that has your migrations in it, and the path to the current DB.

Then it introspects the DB, looks at "schema.sql" + the migration files, and figures out what you've changed in "schema.sql" that isn't in your migrations yet and generates the diff as a new migration. Really amazing workflow.

If you have trouble visualizing it, the workflow would be something like:

    1. DB is empty, migrations empty, schema.sql has "CREATE TABLE todo (id int, description text);"

    2. You run the diff, it generates a migration that contains the CREATE TABLE statement

    3. You modify "schema.sql", maybe adding a new column like "boolean is_completed" to "todo"

    4. You re-run the diff, it sees the DB has the table, the migration for the table is present, but a new column is added.
       So it generates an "ALTER TABLE .. ADD COLUMN .." migration.

    5. Rinse, repeat.

evanelias · on Feb 26, 2022

I work heavily in this space and can add some more details :)

Tusker actually uses Migra to power its functionality: https://github.com/bikeshedder/tusker#how-does-it-work

Tusker's flow is somewhat similar to sqldef https://github.com/k0kubun/sqldef , although the internal mechanics are quite different. Migra/Tusker executes the SQL in a temporary location, introspects it, and diffs the introspected in-memory representation -- in other words, using the database directly as the canonical parser. In contrast, sqldef parses the SQL itself, builds an in-memory representation based on that, and then does the diff that way.

I'm the author of Skeema https://www.skeema.io which provides a similar declarative workflow for MySQL and MariaDB schema changes. Skeema uses an execute-and-introspect approach similar to Migra/Tusker, although each object is split out into its own .sql file for easier management in version control, with a multi-level directory hierarchy if you have multiple database instances and multiple schemas.

Skeema was conceptually inspired by Facebook's internal database schema change flow, as FB has used declarative schema management submission/review/execution company-wide for over a decade now. Skeema actually predates both Migra and sqldef slightly, although it did not influence them, all were developed separately.

In turn, Prisma Migrate and Vitess/PlanetScale declarative migrations were directly inspired by Skeema's approach, paradigms, and/or even direct use of source code in Vitess's case. (Although they're finally moving to a parser-based approach instead, which I recommended they do over a year ago, as it makes more sense for their use-case -- their whole product inherently requires a thorough SQL parser anyway... and ironically, sqldef is based on the Vitess SQL parser!)

hackernewds · on Feb 27, 2022

> Tusker actually uses Migra to power its functionality: https://github.com/bikeshedder/tusker#how-does-it-Work

What a twist! Might we ask what field you work in? Seems niche

evanelias · on Feb 27, 2022

I'm a backend software engineer who often focuses on database automation and developer tools. Maybe niche in terms of expertise, but not niche in terms of market/userbase... a huge number of companies have databases and use developer tools :)

Just to be clear, among the tools I mentioned, I'm the author of Skeema but not any of the others. Skeema is MySQL/MariaDB specific, but I often get asked about similar tools for Postgres, so I try to stay familiar with the landscape.

djrobstep · on Feb 27, 2022

Author here, as evanelias notes in another reply comment, tusker is built on top of migra. My intention with migra was always to keep it as a pure diff tool, as opposed to a tool that enforces a particular workflow. That means you can either use tools built on top like tusker if you want, or roll your own functionality with migra directly, as you prefer.

djrobstep · on Feb 27, 2022

Hi, author here. While this is on the front page, a couple of updates regarding migra that you might be interested in:

- Development slowed on migra for a quite a while, but I've recently returned to active development, and we also have some new maintainers coming on board - so some longstanding bugfixes and feature requests will finally get the attention they need.

- If you're interested in migra and better database workflows, you/your employer might also like migra's associated paid offering, https://databaseci.com/, which offers flexible subsetting for postgres databases </plug>

anacrolix · on Feb 27, 2022

Forgot to open <plug> :)

dang · on Feb 26, 2022

(Reposts are fine after a year or so: https://news.ycombinator.com/newsfaq.html)

Inviz · on Feb 27, 2022

I used this tool very successfully to auto-generate migrations. 1) I maintain "schema.sql" file that contains the schema I want to have 2) That file then is ran in a temporary empty database 3) Migra would run to compare old version of dev database against temporary database 4) Output of migra is wrapped into a conditional that has a guard check to see if database is already migrated to that version (as determined by timestamp in table description) and put into a file with timestamp as a name 5) Each time app restarts it scans all migrations in the directory and runs ones that it thinks it needs to run

Either way i had zero issues with this setup. And I used a lot of postgres things including triggers, complex indecies, extensions, etc. All migrations were source-controlled and could be verified before commits, so it was pretty safe. When I needed to migrate a data, i'd just add code into the same migration file as the schema change. Brilliant project! thanks @djrobstep

password4321 · on Feb 27, 2022

A tool that barely gets the job done for SQL Server: https://github.com/opendbdiff/opendbdiff

And something that can begin collecting a history of DDL changes in a SQL Server database to compare stored procedure versions: https://github.com/unruledboy/SQLMonitor (among many other administrative features).

electroly · on Feb 27, 2022

If you're using SQL Server, I'd just use SQL Server Data Tools (SSDT). It does the job handsomely, it has first party support, and it provides full schema management. I've never heard of OpenDBDiff but SSDT is very popular.

password4321 · on March 2, 2022

Thanks for the pointer, it definitely deserves a shout-out! I'm not super familiar but I believe SSDT has pretty much it's own 'this is the way'.

Of course the primary difference is the price of Visual Studio / the usage limitations of the Community edition.

morelish · on Feb 26, 2022

Sorry for being an idiot. I don't quite understand the point of the tool. You have to have two different versions (x and y) of the same database running? And you're trying to work out how the versions diverged (e.g. what sql statements you'd need to issue to change x into y)?

I'm sort of confused. Is this for people who don't know how to write database migrations? Surely not. What is this for? Sorry for being stupid.

digitxpc · on Feb 26, 2022

The main use case I found was that my production database had a bunch of changes we did manually (early stage startup and all), so I used Migra to figure out what changes we needed to make to keep the migrations in sync with what was actually in production.

The more common use case is this idea in development—-experiment with different schemas manually and then use a tool like Migra to figure out what migration to write, without keeping in your head what changes you’ve made.

ForHackernews · on Feb 26, 2022

This might explain it: https://djrobstep.com/talks/your-migrations-are-bad-and-you-...

It's a (IMHO) better alternative to something like Alembic https://alembic.readthedocs.org/

morelish · on Feb 28, 2022

Thanks that is a helpful link.

djrobstep · on Feb 27, 2022

Hello, I'm the author. There are a whole variety of uses for tools like this, some of the most common:

- Autogenerate migrations. A diff tool can (not always but very often) generate the migration script you need automatically without reference a history of migration scripts.

- Test migrations and other changes. "OK, I've run my migration script - but does prod actually match my intended production state now?"

- Quickly iterating on database designs in local development. "Just added an int column to a table in my local dev database but i meant for it to be a bigint with a slightly different name - all I need to do is update my models and run sync again."

nicoburns · on Feb 27, 2022

We had an issue with our migrations at one point where we were using a “create new table, populate, drop old table, rename new table to old table” strategy for syncing data from an external source. Unfortunately our implementation flog this caused issues with indexes and constraints not being correctly propagated to the new tables, which in turn causes issues with applying migrations as a fresh database now had different indexes to our production database.

I used a tool like this to produce a diff of our prod database vs. a freshly created database with our migrations applied. Saved me a ton of time.

jvolkman · on Feb 26, 2022

Tools like this allow you to keep a single desired schema up to date and then auto-generate commands to move from the current database state to desired state. So instead of manually writing 'alter table foo rename column bar to baz', you just change the column name in the 'create table' section of your golden schema and generate the alter statement.

gregplaysguitar · on Feb 27, 2022

IMO the best way to use this is as a way to determine what’s changed between your canonical dB schema (well organised and easily understood DDL) and migrations (which are what gets deployed, but don’t work well as a dB reference). So you edit the schema, then use migra to generate a candidate migration for the changes which you review before deploying.

Tusker (mentioned in other comment) does it this way; my company has used the technique very successfully for several years via some lightweight tooling built around migra.

dboreham · on Feb 27, 2022

Any time you have two or more supposedly consistent copies of any data set (in this case database schema), you need a tool to tell you if they are indeed consistent. Because sync schemes are seldom perfect.