Cool example. But, doing this will create a tight coupling between business logic , the state machine, and storage/persistence, Postgres.
If ever you decide you want to migrate away from Postgres, you'll need to rewrite all the business logic into some other out of process language. If ever you want to scale this, such that you want to calculate states in batches, you'd want to have the business logic somewhere else...
> If ever you decide you want to migrate away from Postgres, you'll need to rewrite all the business logic into some other out of process language.
Yeah, but I don't consider this a bad thing. IMO migrating between databases should always require a lot of rewriting. If it doesn't, you're most likely underutilizing the features provided by your database.
> If ever you want to scale this, such that you want to calculate states in batches, you'd want to have the business logic somewhere else...
I'm not sure I follow. The application I'm using this in has > 1 billion rows and we frequently re-compute the state of all our entities across the entire data set in batches after we make changes to our logic. Having the code that does this in the database avoids having to move large amounts of data between our db and the application.
> IMO migrating between databases should always require a lot of rewriting. If it doesn't, you're most likely underutilizing the features provided by your database.
Very likely true. But when you get locked into Oracle and it makes it nearly impossible to move because of the features your relying on, has a big effect on shaping your perspective on this.
> The application I'm using this in has > 1 billion rows and we frequently re-compute the state of all our entities across the entire data set in batches after we make changes to our logic.
That is very impressive. Mind sharing the rate of change across all of those rows?
Also, I'm curious. Is this all that DB instance does? Or is it responsible for other data as well?
> But when you get locked into Oracle and it makes it nearly impossible to move because of the features your relying on, has a big effect on shaping your perspective on this.
I hear you. Picking a database is major decision and shouldn't be made lightly. That being said, I hear a lot of companies are successfully migrating from Oracle to Postgres these days. And in fact, this application was actually migrated from Couchbase to Postgres, but that's a story for another day perhaps :).
> That is very impressive. Mind sharing the rate of change across all of those rows?
Nowadays its about 5 million inserts / day with peak rates of 150 inserts / sec. Not too crazy, but it adds up over time :).
> Also, I'm curious. Is this all that DB instance does? Or is it responsible for other data as well?
It does a few other things, but this is the main workload.
The only thing worse than paying through the nose for Oracle is paying through the nose for Oracle and treating it like a really expensive version of MySQL/Postgres. If you're paying for it, might as well get some value for your money and use those features.
Well, this is what the SQL standard is for: to reduce the amount of work to be done in migrating between databases.
That said, I much prefer to put all this logic into SQL, and to use PostgreSQL, than the alternatives. There's any number of reasons for this:
- direct access to the DB is not dangerous if the logic needed to keep it consistent... is in the DB
- you don't have to replicate this logic if you add new front-ends
- SQL is very expressive, so you'll end up having less code using SQL than anything else -- this also means that RDBMS migration costs need not be as high as one might think, since there will be less SQL than the equivalent for the front-end
- updating business logic is easier to do atomically
Yeah, the idea of having a SQL standard is great. But in practice a "standard" is somewhat meaningless without rigorous compliance testing, and unfortunately SQL hasn't had this for 20 years [1]. But I'll take SQL over any of poor reinvention of relational algebra any day, so it's still better than the alternatives by far :).
I might add that it's also fairly easy these days to unit test in the database itself, especially in pg, so moving logic down is a perfectly acceptable practice.
> If ever you decide you want to migrate away from Postgres[…]
it's one of the things to keep in mind. Depending on your application, migrating away from postgres will be more or less painful. If you're already deeply invested in postgres features, this is just one more problem to solve.
> If ever you want to scale this, such that you want to calculate states in batches, you'd want to have the business logic somewhere else
yes. This has the all the usual issues of putting (some) business logic into the database. On the other hand, by using this, you're basically just creating a data integrity constraint similar to a foreign key, just one not as widely supported.
Still. If you ever plan to move to a database that doesn't support foreign key constraints, you will have to implement the business logic somewhere else.
For me, data integrity is paramount. If ever I can put a data integrity check directly into the database, I will do it because bugs in the application logic can exist and when the database itself enforces integrity constraint, I'm protected from those.
I don't want to have to deal with, to stay in the framework of the article, a shipped, but unpaid order. Was this a bug in the application? Did it actually ship? Did the payment fail?
If the database blows up on any attempts to store invalid data, I'm protected from having to ask these questions.
> If the database blows up on any attempts to store invalid data, I'm protected from having to ask these questions.
I'm always surprised when people fight using constraints. During dev, I see them as a godsend for spotting problems - I don't know how many bugs having self-enforcing data structures has caught.
I will say that I have, under protest, turned them off in production once for performance. A particular heavily-used flow was annoying due to a large number of FK checks on an intermediate step. But never during development, and given the number of FK-violation errors I've seen in production code, preferably not even then.
Fixing code is almost always so much easier than fixing data.
I hear the argument about switching storage layers a lot, and I don't completely disagree with it, but in my 20+ years writing code, I've never found a case where switching storage engines didn't cause a massive rewrite even when the storage layer was used in an agnostic way. I'm sure there are examples where it has worked, but saying it as if it's a maxim just feels wrong to me.
It feels like you can go an entire career without switching databases. And if you still have most of the original dev team, the cost of rewrites are probably not as high as they're made out to be (by telling everyone to plan for switching databases at some undefined point in the future)
Loose coupling has a lot of other advantages but this seems to have become the biggest selling point for a loosely-coupled data layer.
> It feels like you can go an entire career without switching databases.
I wish I could say that. I've done more than I care to count. The issue is usually one of the following:
1) converting between types of storage layers (SQL/NoSQL/flat file). I've never seen an abstraction layer that could handle MongoDB and later be converted to Postgres for example (a real migration I had to do once) and still do justice to either backend.
2) Hopefully you pick a storage layer for a good reason. For example, say you picked Postgres because you have Geospatial needs. Business later dictates you have to use MySQL "for reasons", what abstraction layer is going to help there to not require code refactor?
3) If you are doing things right, there's more than App code that interfaces with the DB. You've got the entire devops chain (backups, automation, etc) that will likely need to be rewritten. This can take an immense amount of work as well.
Anyway, not saying it's not possible, but I've found it a good litmus test of another engineer if they think moving to another storage layer should be simple. It's not always, but still more often than not shows inexperience or over optimism.
That's interesting. I have successfully designed systems which could be easily migrated between different RDMS' with minimal changes. Obviously to go to no-sql or other less conventional storage is a different story.
It is an interesting question about how successful people are in migrating to different storage engines.
> I have successfully designed systems which could be easily migrated between different RDMS' with minimal changes.
Not that I doubt you, because I know with the right trade offs it's possible, but is the operative word "could" or did you actually ever migrate one of those architectures?
> Obviously to go to no-sql or other less conventional storage is a different story.
Ruling those out feels a bit disengenuous. There are plenty of good reasons to need to migrate to or from a relational DB.
Even if you stick to the SQL standard as much as possible engine specific syntax almost always creeps in unless you routinely test against all target systems from the start (which you might not be able to justify the time to do on many projects where being platform agnostic is not initially a core high-priority requirement).
And even where there are no feature or syntax issues there may well be optimisation differences. For instance going from MSSQL to postgres you might hit a significant difference with CTEs because MSSQL can perform predicate optimisations through them but postgres doesn't - this might mean core queries need to be refactored significantly for performance reasons (to avoid extra index or table scans) if not functional ones.
(not intending to pick on postgres here, I'm sure there are similar examples in the other direction and between other engines, but this is the most significant example that immediately springs to mind).
My thoughts exactly. I don't do a ton of DB programming, but I've only ever written one thing in a non-agnostic way. We got a requirement that users wanted to copy an entire "project" which was the top level of a hierarchy. I wrote an Oracle routine to do the deep copy. I did it because I imagined what the PL/SQL would look like (clean) vs. what the Java code would look like (considering Hello World in Java is ugly, you can see where I'm going ;-)
Mine too! Having logic embedded into SQL functions seems to be an anti-pattern to me (it's harder to maintain and harder to do release management). While it's great that Postgres can do this (btw, I love Postgres), I suspect there aren't many people will use this feature in production environment.
I used to think this, but have relaxed that view over time.
For constraints, validity checking, things like adding/updating timestamps, and other things that are about data integrity, about the only time I don't do that in the DB is when outside information is involved such that it can't be. Otherwise, to the extent possible, I want the datastore to only accept valid data. This goes to the notion that fixing code is easier than fixing data, so the store can not only defend itself, but also help catch bugs.
There are also times when dealing with huge amounts of data that doing whatever you're doing in an SP is the only way to get decent performance. Dragging enormous tables over the network to process is sometimes really wasteful. If you're tuning indexes and whatnot against that model, you're already changing the DB, and doing so at a level that is implicit rather than explicit, and in ways that can change out from under you (if the statistics change).
Actually, I'm under the impression it's way more common to migrate language or framework than data storage. I externalize most of my business logic to postgres for that reason (plus, it's incredibly performant, especially since you don't keep requesting connections from connection pool for each single part of the computation). My backend app handles request sanitizing, providing endpoints, etc. All data processing is made in the database. Love it.
> If ever you decide you want to migrate away from Postgres, you'll need to rewrite all the business logic into some other out of process language.
Of you follow this rule you'll never be able to use the most powerful features of postgres.
> If ever you want to scale this, such that you want to calculate states in batches, you'd want to have the business logic somewhere else...
Not due I'm totally on board with the premise here, but the need to calculate state outside the database is a generally valid one and so it might be a good idea to write the core of the transition function in JavaScript and implement the postgres functions in PLv8.
If you ever decide you want to migrate away from Postgres, it's because there's another platform that supports significant functionality that Postgres doesn't. Changing out the data tier is not a decision made on a whim.
If there is significant functionality we want to take advantage of in this new platform, then that means that we have to refactor anyway to use this, because by definition we either don't do it today, or we do it with an insufficient workaround in Postgres.
The migration necessitates large refactoring by its very justification.
This implies that a need to migrate away from Postgres will arise. For most companies it is very likely that such a need will never materialize. Meanwhile Postgres keeps getting better and better. https://wiki.postgresql.org/wiki/New_in_postgres_10
It's not just about migration. There are deployment issues as well.
I like to think of each layer in a tiered system as having differing deployment requirements and time tables. Front-end systems will be very frequent, backend systems possibly less so, though not necessarily, and then DBs ideally infrequent and they generally take longer.
At a minimum, keeping them decoupled is freeing for patching bugs and releasing features independently. It does raise the bar for keeping all changes compatible with existing systems.
There are good deployment tools for databases, e.g. ones for which all of the database 'code' objects (or just all of the objects period) are maintained in source control and updates are either automatic or scripted.
Of course updating a database is much harder than overwriting executable or library files, but a lot of the objects in a database should be safely updatable by simply dropping and recreating them.
The tricky changes are of course things like, e.g. splitting one column into two or merging two columns in two tables into one. But those changes are even harder to do the 'dumber' your database is, i.e. the less logic there is in it that enforces a certain level of quality in its data.
Yeah, I've seen that before and it looks promising.
My favorite, and the only one I've used extensively, is [DB Ghost](http://www.dbghost.com/). What I like about it compared to all others I've run across is that it, by default, will automatically sync your target DB (e.g. a production DB) with a model source DB that it also builds automatically.
So instead of scripting out every schema change as explicit SQL statements and queries you just maintain the scripts to build the model source DB, e.g. to add a column to a table, instead of creating a SQL script file to `ALTER TABLE Foo ADD COLUMN Bar ...` you just update the existing SQL script file with the `CREATE TABLE Foo ...` statement. When you deploy changes – 'sync' a target DB in the DB Ghost terminology – it automatically detects differences and modifies the target to match the source.
The benefit being that neither you nor the DB Ghost program needs to explicitly perform every single migration since the beginning of time. Only changes that need to be explicitly handled as migrations, but, with a little customization, doing that is pretty easy too.
The bar now for me working with databases is whether I can create a new 'empty' database (with test data) in a minute or two and whether I can automatically deploy changes (or, as I'm doing now, generate a deployment script automatically). Given that, I can actually do something like TDD for databases, which is really nice, especially if there's significant business logic in the database (which there almost always is in my experience to-date).
If ever you decide you want to migrate away from Postgres, you'll need to rewrite all the business logic into some other out of process languag
You'll rewrite your front end 20 times in 20 different "frameworks", and your app 5 times in 5 different languages, for everytime you actually change databases
If ever you decide you want to migrate away from Postgres, you'll need to rewrite all the business logic into some other out of process language. If ever you want to scale this, such that you want to calculate states in batches, you'd want to have the business logic somewhere else...