Is this about having application logic in multiple places, or having application logic co-located with the data? Does it make a difference if the procedures are database triggers vs being directly called by the application code?
Would splitting off some of the logic into a separate service (that does its own database accesses) have the same issues?
Say I have functionality to get customer data. If the SQL logic is in code, I can create a new branch and work on it, merge my changes, and do blue/green deployments. A rollback, if you’re using Kubernetes, is a simple matter of reverting to the configuration file that references the old Docker container, etc.
How do you roll back a dozen stored procedures quickly? How do you just create a new branch or test a regression?
With all of the logic in code, it’s a simple matter of checking out a previous commit.
Triggers are even worse - “spooky actions at a distance”.
> How do you roll back a dozen stored procedures quickly?
Liquibase’s rollback feature. The SQL changesets and rollback strategy are all defined in my .sql Liquibase files, which go inside my git repo alongside my Scala/Python code and the associated CI/CD automated integration tests + deployments.
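For instance, a Liquibase formatted SQL changeset can carry its own rollback inline. This is only a sketch: the author/id, table, and function names are invented, and the procedure syntax assumes PostgreSQL (it varies by database):

```sql
--liquibase formatted sql

--changeset alice:42 splitStatements:false
CREATE OR REPLACE FUNCTION get_customer(p_id BIGINT)
RETURNS TABLE (id BIGINT, name TEXT, email TEXT)
LANGUAGE sql AS $$
    SELECT id, name, email FROM customers WHERE id = p_id;
$$;
--rollback DROP FUNCTION get_customer(BIGINT);
```

With the rollback declared alongside the change, reverting is a single command such as `liquibase rollback-count 1`, or `liquibase rollback <tag>` if you tag each release.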
Blue/green deployment can be handled via bidirectional active-active database replication across your two disaster-recovery databases: you deploy the newer stored procedures on one but not the other. Or you have sharded databases for some but not all customers and deploy blue/green with appropriate blue/green sharding. Or you could put separate version numbers in the stored procedure names (perhaps with a wrapper stored procedure that behaves differently on different shards/database names to isolate the impact from your non-SQL codebase). Or just put a form of feature flags in your stored procedures; it's not 100% blue/green, but with minor integration-test coverage it can be quite safe and allows you to run two code bases at once even in a single database.
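From the caller's side, the versioned-name / feature-flag idea might look like this minimal Python sketch (the flag store, procedure names, and `fetch_customer` helper are all hypothetical):

```python
# Hypothetical feature-flag dispatch between two versions of a stored
# procedure, so old and new implementations can coexist in one database.
FEATURE_FLAGS = {"customer_proc_v2": False}  # in reality: config service, DB table, etc.

def customer_proc_name() -> str:
    """Pick which version of the stored procedure this code path calls."""
    return "get_customer_v2" if FEATURE_FLAGS["customer_proc_v2"] else "get_customer_v1"

def fetch_customer(cursor, customer_id: int):
    # DB-API 2.0 cursors expose callproc() for invoking stored procedures.
    return cursor.callproc(customer_proc_name(), [customer_id])
```

Flipping the flag (per shard, per customer, per environment) switches traffic between the two procedure versions without redeploying application code.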
Agree with you on triggers (and I agree, to a point, that stored procedures are often not the right approach despite my rebuttal above... but it is quite straightforward with the right tooling/mindset). For high-volume analytics with aggregations you might be better off leaving the data in a columnar SQL database and using some stored procedures if the code is complex enough; but for CRUD apps I would strongly discourage stored procedures.
I agree with Liquibase for source control of schema.
But don’t you see how much harder this is for developers, with feature flags in stored procedures, etc., than standard git? There is tooling around feature flags for code.
I think we are in violent agreement, though: for OLAP and analytics, I wouldn’t care as much. Because of the way Redshift works (columnar store) and how it compiles and optimizes queries across a cluster, I wouldn’t object to stored procedures there.
If I need to fork code, am I then going to make copies of all of the stored procedures corresponding to the code, end up with getCustomer1, getCustomer2…, and do the same with all of your procedures?
And then do I need to change all of my stored procedure references in my code? Do I merge my changes back to the main one when done?
Isn’t just doing a “git branch” easier?
Rolling back is just a matter of doing a git revert in your separate Kubernetes repository (you are doing proper GitOps, aren’t you?), recommitting your code, and your cluster is now using your old Docker container.
If you aren’t using K8s, you just push your old code.
I'm not really following what you are saying. Sprocs are in the same repo as any other code, a branch is a branch, why do you think you can't create a branch that contains sprocs?
Am I then going to deploy the stored procedures to a shared database that other developers are using with their own branch or am I going to have my own database instance?
Are all those developers running different branches of code with automated schema changes on the same database? What if someone adds or removes a field? Stored procedures are not the issue here.
If I have logic that returns customer data in code, I can branch the code, make and test my changes locally and other developers can do the same without any conflicts.
Now imagine the next major release requires 50 different sql changes. Would that be easier to push and maintain one git branch or 50 stored procedures?
Again, how do you roll back easily and effectively when your release depends on multiple stored procedures?
Adding a nullable field shouldn’t be a breaking change, removing a field would be.
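A quick way to see this, using SQLite as a stand-in for the real database (table and column names invented for illustration):

```python
import sqlite3

# Code that selects explicit columns survives an added nullable column,
# but breaks when a column it reads is removed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")

old_query = "SELECT id, name FROM customers WHERE id = 1"
print(conn.execute(old_query).fetchone())  # (1, 'Ada')

# Adding a nullable column: the old query is unaffected.
conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")
print(conn.execute(old_query).fetchone())  # still (1, 'Ada')

# Removing a column the old query reads: a breaking change.
try:
    conn.execute("ALTER TABLE customers DROP COLUMN name")  # needs SQLite >= 3.35
    conn.execute(old_query)
except sqlite3.OperationalError as exc:
    print("old query now fails:", exc)
```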
> Now imagine the next major release requires 50 different sql changes. Would that be easier to push and maintain one git branch or 50 stored procedures?
Why aren't those 50 stored procedure changes also in your one branch?
It kind of sounds like you only put some types of code in git, as opposed to everything. Is that correct?
And now we are getting back to having to deploy those 50 stored procedures to a database to test your code when you branch.
Are you going to use a shared dev database? What happens when other developers are also making changes to other branches?
I’m assuming you need to test with a representatively sized data set on a representatively sized database server.
Just like with traditional Kubernetes GitOps, where you keep your source code in one repo (with a pipeline that builds your Docker containers) and your K8s configuration in another repo (to manage your cluster and update the referenced Docker images), you would keep your schema-change SQL files in a separate repo, even if you don’t have stored procs, and “version” your database separately with its own pipeline.
> And now we are getting back to having to deploy those 50 stored procedures to a database to test your code when you branch.
> Are you going to use a shared dev database? What happens when other developers are also making changes to other branches?
Valid questions, but there are also valid answers, the primary one is of course "it depends".
Sometimes a shared DB works fine, as long as you manage other work being done to avoid conflicts.
Sometimes an isolated DB is needed for either just this work, or this work plus related project work isolated from other projects.
We do all of the above, and yes, it takes effort, but that effort exists with or without sprocs due to the nature of most of our projects and changes; rarely do we have small, isolated work.
But I'm guessing our environment is not the same as yours, based on your statements. This is our environment:
1-Medium sized enterprise, which means we run many apps from vendors (some on prem, some in cloud) as well as internally developed apps; in other words, it's not a homogeneous environment and we don't control all aspects.
2-Functionality that frequently sits either alongside the operational systems, in between the operational systems, or as a layer on top that unifies the business activity across systems and realizes a higher level of abstraction and activity.
3-Focus on end-to-end testing due to its higher bug-detection rate (around 70%) vs unit testing (around 30%), and due to work frequently flowing through multiple apps.
> even if you don’t have stored procs, you would keep your schema change sql files in another repo
Our schema changes are typically in the branch being merged and deployed. Our deployment process has pre and post DB actions sections.
But some changes are too big for that type of thing and require multi-day deployments, with various business and technical actions happening in a specific sequence across multiple different apps (e.g. hold new activity for a set of SKUs, let stores and the web site complete final transactions, clear DCs, then switch to the new process where those SKUs have a new, unique flow through the systems and the enterprise).