Hacker News new | past | comments | ask | show | jobs | submit login
PostgreSQL example of self-contained stored procedures (sivers.org)
209 points by weinzierl on Oct 27, 2019 | hide | past | favorite | 79 comments



I'm the author. Thanks to Ludwig for posting this here.

I've been doing all of my database work like this for 5+ years now, and love it. Many more examples here:

https://code.sivers.org/db-api/

It works great when you want to write simple shell scripts, too. They look as simple as this:

https://github.com/sivers/store/blob/master/getdb-example.rb...

People have been asking how I do unit testing. The main thing is to have a fixtures.sql file, which is a dump of data to test against: https://github.com/sivers/store/blob/master/store/fixtures.s...

Then in the setup before each unit test, just drop the database schema, and reload it:

https://github.com/sivers/store/blob/master/test_tools.rb?ts...

Examples of my API unit tests here:

https://github.com/sivers/store/blob/master/store/test-api.r...

And also testing things like database triggers, and private functions:

https://github.com/sivers/store/blob/master/store/test-db.rb...

Feel free to ask me anything here, or email me: https://sivers.org/contact

— Derek


Pretty much how we built client-server applications on Oracle RDBMS 25 years ago.

When your application grows, you'll have a large number of stored procedures. Oracle has packages which significantly eases the way you work with a large amount of stored procedures. Packages enable you to group related stored procedures into one object.

Packages have an interface (stored procedures and custom types available outside the package), and an implementation (package body). You can manage security at the package level, and you can hide the implementation of the stored procedures in the package body.

The postgres alternative, to use schema, isn't very good: a schema is more like a package in java, an oracle package like a class. Doing java with only packages, and no classes feels very limited.


I think the more accurate translation into modern Postgresql would be to develop as an extension. This should more or less cover same functionalities.

One downside is that you need to be careful around upgrade procedures and backup because postgresql extensions weren't initially developed for persistent data. So you have to declare clearly in the extension definition which table should be backed up and kept when updating the extension.

Also if you manage your repo wisely extensions are a great way to version control. I'm currently toying around with a mostly sql app delivered as an extension and so far it's working pretty nicely. But it's a side project and it's not really ready for prime time.


This was my introduction to server-side web programming as a kid; I was helping develop a very early online store (Australia, 94/95) for a local tabletop gaming retailer in Oracle 7 (IIRC). As well as being my first foray into web programming, it was the first time I got my hands on a grown-up RDBMS, Windows NT (still 3.5.1), and Java (I wrote a tetris knockoff as an applet for the site also). I was very surprised to find that my experience writing ADA was actually going to be useful out in the real world :)

Now I feel old :)


Hey Derek. For tests, have you tried pgTAP[1]?

Having all the db tests in SQL is very convenient and I think pgTAP could simplify your API tests.

Here are some examples of how testing with pgTAP looks like:

https://github.com/steve-chavez/socnet/blob/master/tests/fri...

(Loading schema/fixtures can be done with a `\i file.sql`)

[1]: https://pgtap.org/


I read this a few days ago. I was waiting for it to show on HN because I have a few questions.

I want to do the same, but with big databases (the TXID wraparound is a problem every few months). For now, the queries are stored in the code. They rarely evolve.

How do you manage versioning? Queries in code mean you can ensure everything match.

How easily other people interact with that? Not many developers know SQL well. Poor SQL skills can slow down a server for other users.

What did you gain in practice? It is nice, like CI/CD or version control, but I see little gains for something that does not change very often.

Currently, most of the queries are materialized views that can be refreshed separately. It seems simpler to me.

In practice, how is this different from refreshing materialized views?


Versioning: sorry I'm not sure what you mean. Same as if you had queries in an ORM and views using the results, you'd need to branch and make changes, then merge the branch together at once. I just do the same. If the database structure changes, I run the ALTER TABLE commands, when switching to that branch.

Other people : I guess like anyone choosing a language, you're excluding those who don't know it. I'm assuming more know SQL than Rust or Elixir or whatever. And more should.

Gain in practice : Two best improvements were:

(1) Having all data and data-logic (which might also be business-logic) in the same place. No worrying that some external code somewhere might have old data/business logic in it. One definitive source. Like Rich Hickey's classic "Simple/Complex Hard/Easy" talk - https://www.youtube.com/watch?v=rI8tNMsozo0 - I like that this is simple, un-complex. The data and external code don't need to be braided/complected together to work.

(2) The freedom do switch my external code from Ruby to Go or Elixir or whatever, and not have to rewrite all that functionality. It's all just simple API calls.

Sorry I haven't looked into materialized views yet, so I don't know how this compares.


Thanks a lot for your reply!

It seems the main gain for your is to move between languages and allow initial iterations/bugfixes without touching the application code.

The database has more maintenance issues (like rolling txids) than the application code. I am not sure I want to add more complexity and potential issues to the database.

FYI, a materialized view is a potentially long or complex query whose results are cached, so you can say 'select * from complex_query_result' to get them, and refresh the complex_query_result whenever you feel like it. You can also update the query that generates complex_query_result.

In practice, MV can give you speed (as you can refresh the MV when you need/want, while keeping the results) and also put the data-logic inside the database (as the MV is defined initially, and can later be updated) if you don't need super fresh results. If you do, use a regular view.

In either case, you can use the view approach when parameters are needed, iff you can reduce you query to where parameter=something on the view. Otherwise, you need to use languages like pl sql.

As the materialized views queries just return the results to be processed, and I have very little extra to do, your approach seems overkill for my use case.


Unless it changed very recently be aware that you can't update sql definition of a materialized as easily as you would update a view. There is currently no "CREATE OR REPLACE" option so any dependencie build onto a materialized view can quickly become a real pain (been there). Sometime, it might be easier to stick to the old trick of a table updated by a refresher function (possibly called as a trigger).


Dependencies are a general problem, like changing the type of a table that has dependent views. It's a good idea if your database update scripts/migration software can handle something like this. Other databases don't tend to be as strict as Postgres here (I only found out that some rarely-accessed views never quite worked after migrating from Oracle)

I've had good luck with these functions: https://gist.github.com/mateuszwenus/11187288

Allows you to save-and-drop dependent views (materialized or regular) and then restore them after your updates.


Deeply agree about dependencies being a general problem.

Thanks for the script that look pretty clever #bookmarking. Like particularly the approach of "drop what you saved, no more no less". DROP CASCADE is simpler but can have undetected side effect, if this script fail to backup all dependency, logically you will get an error when attempting to delete target and that rocks.


Personally I have update scripts for 'version control' of the MV and its dependancies, it is not very painful to do:

begin; drop ... cascade; create ...; commit;


You just talked about TXID wraparound, materialized views and then talking about not many developers know SQL well. You're all over the place with your questions.

This is very simple, you wrap all your logic as functions/stored procedures, have the DB do the work. This will work for 99% of businesses out there. How many people really wraparound ID? How many people really need to shard DB? Those 1% will know how to tackle the problem.

For most people by moving all these into the DB, your code is much simpler. Something like this should be encouraged more, it's far easier to implement something like this than deal with ORM.


im curious - why not insert from table value list instead of multiple insert statements in your fixtures population? Does this not work well in Postgres?


Only because sometimes I edit them directly, and remove columns or data there with vim.


One unmentioned reason this is powerful: It avoids concurrency bugs and inconsistencies introduced by write skew, in cases where the application layer (eg. Ruby/Python) makes business logic decisions based on data that has become stale.

These are hardest to detect and avoid in the case where business logic depends on no rows matching (count(*) == 0), eg. meeting room scheduling systems, because there is nothing for a transaction to lock!

Stored procedures can avoid this entirely, even under the weaker transaction isolation levels, by keeping all the logic in the DB and running it atomically. At Zipcar we took this approach to writing the web reservation system in the early 2000s. Later, it allowed us to easily add a telephone reservation system without duplicating business logic.

It worked great back then and it still works great, if you have good ways of managing the tradeoffs (version control, unit testing, etc).


> These are hardest to detect and avoid in the case where business logic depends on no rows matching (count(*) == 0), eg. meeting room scheduling systems, because there is nothing for a transaction to lock!

The predicate lock held by a read transaction at the serializable transaction level will, in fact, prevent this. If you need something that secured it between DB transactions because you need a longer-running business transaction but need to avoid long-running DB transactions, you can place a hold with a pending record in a reservations table and appropriate exclusion constraint without a stored proc.

> Stored procedures can avoid this entirely, even under the weaker transaction isolation levels, by keeping all the logic in the DB and running it atomically.

Isn't that just manually replicating, on an ad hoc basis, the machinery the server already has for implementing serializable isolation level? There may be times when doing this by hand is justified for performance or other reasons, but in general doing special-case replication of features for which the DB server had general solutions is a pretty poor use of development time.


Ive worked on several apps that tried this approach and each of them was a mess. I’ve seen two fundamental problems with building your app on top of stored procs:

* Tooling around sql is generally inferior to what’s available for <pick your favorite language>. I’ve yet to see a company with effective automated tests around their database... it’s far more common to have _no_ tests around the database. Even if you’re the unicorn that does have all of that figured out, it still tends to be more difficult for your devs to write and test code.

* Most devs are poor to mediocre when it comes to sql. You’re either going to have to hire more specifically for sql, or force your devs to do complex work (your business logic) in a toolset that they aren’t that good at.

IMO this is a case where a few applications may have significant concurrency issues that warrant the database business logic approach... but for your average app, it’s unnecessary and makes life more difficult for your team.


This has always struck me as odd, how can _any_ competent developer suck at SQL especially after spending maybe a week or two trying to write logic in it? As "languages" go it can't get any simpler, you're just directly forced to think about data in a model that's extremely close to the data itself. Seems to me that a dev that can't think well in SQL with minimal training is not a good dev at all, and this could actually be a nice filter.


> Stored procedures can avoid this entirely, even under the weaker transaction isolation levels, by keeping all the logic in the DB and running it atomically

I don't understand. Multiple SQL statements executed inside stored procedures are not run any more atomically than multiple SQL statements executed from application layer.


> Multiple SQL statements executed inside stored procedures are not run any more atomically than multiple SQL statements executed from application layer.

Yes, but not wrapping those multiple application-generated queries in a transaction is a common bug I've encountered at many of the places I've worked. It also requires a lot more round-trips to the database.


Can’t you just handle these sorts of consistency issues with uniqueness, foreign key, and check constraints?


I'm curious if anybody has practical experience developing a reasonably trafficked website in a similar manner.

I wrote apps which consisted mostly of oracle stored procedures when I first got out of college, and it was a pretty awful experience, but it was also at a place where the development knowledge was minimal.

I've since been getting closer and closer to writing postgres in this way. My current app (which I architected) sticks close to postgres and uses many of its features like custom types, enums and json, but we don't rely on it for stored procedures at all. Instead, all application logic lives in the go API app.

I've been considering moving some things directly into the database, but I'm nervous about it because the app runs well and I like the data/logic division. Also go seems easier to learn for new developers than SQL, and it's a lot easier to deploy a new app server than it is to reconfigure a database server.


Salesforce is built (or at least was ~7 years ago) heavily on stored procedures. Some projects, exclusively so.

Working there as a software engineer introduced me to a lot of dysfunctional things (as is true of all big tech cos) but pushing as much into the database as possible is one thing that worked beautifully and I fell in love with it. Since leaving Salesforce, I've embraced that pattern in three other projects and I've never regretted it.

It turns out that the database is the best place to work with data. It's verbose but has primitives that your application server simply can't have. And it's beautiful how querying data vs reading a variable have the same amount of cognitive overhead.

That is, the lines between what's in memory and what's coming from disk start to disappear. You just work with data, insert it when you're ready, and don't think about it again.

And the performance gains of not having to serialize data back and forth across the wire unnecessarily is huge.

I'd also be remiss if I didn't point out that if you are a fan of pub/sub patterns then triggers should be your favorite feature to use. They're just transactional pub/sub.


Every once in a while I encounter somebody who heavily promotes that way of working. There are a couple of arguments I've heard in favor of it. Often it's a mix of performance, ultimate data integrity, and "simplicity". I just don't buy it.

If performance is so critical, whatever you gain by moving everything to the database is lost by the fact that databases are harder to scale. You may be able to get by with a single instance a little longer, but when eventually you don't anymore, you've now worked yourself in a corner. It's an 80/20 game anyway, so just use an occasional stored procedure when it's really worth it.

For data integrity, I've always found databases not expressive enough. Sanity checks, okay, but you can't capture a non-trivial domain in some custom data types, check constraints and foreign keys. Even if you introduce stored procedures that will be responsible for keeping everything proper, you need to go crazy with permissions to block circumventing those. Might as well build your own API then. (I do find it difficult where to draw the line when it comes to what to enforce in the database still.)

"But you don't need another API! Just use the database as one!" Then I ask how they do testing, and the answer basically is: "We don't make mistakes or we find out about them (in production) soon enough." That pretty much ends the discussion for me. Surely there must be ways to devise a basic testing framework for stored procedures, but why bother? I don't want to spin up a database just to test some logic. Never mind testing, what about refactoring?

From a theoretical point of view, I can see the potential, but practically...? The tooling isn't there, and perhaps that's for a reason. Maybe databases are just not supposed to be used that way. Despite the extra functionality they offer, I treat them mostly as a data store.

What am I missing? I would love to be proven wrong.


We write most code in backend (go), but we DEFINITELY spin up a new, real SQL database in for every test in our automated test suite. ("CREATE DATABASE" is rather cheap; use a fresh docker DB instance for each suite run, or run it permanently while developing.

Mocking away DB is both an inefficient way to work and lets the tests cover less. I will never write software without running tests against a real DB again.

Our setup totally would let you test anything in stored procedures (although we do not use them much).


agree on the mocking the database - I wrote an entire test suite based on mocking the database responses, and then found a ton of live problems caused by the database - backend interaction (things like structs not aligning with the actual return records from functions, input format for arrays of ids being different from the mockup to the actual driver, and hstores).

Now I have a test database, which I create from a version-controlled schema.sql, and each test creates its own fixtures in the test db. Works way better.


The stored procedure can operate on foreign schemas, so could be scaled that way. Then you would have a bunch of lightweight PG engines with no data in them making calls to the upstream.

In practice though, the logic your application uses is going to generate the same queries anyway. So I’m skeptical that using stored procedures increases the workload of the database.

If you need to scale your database, you can do it with any standard replication strategy, and just make sure to bring the functions to the replica too. If you package them in extensions, this is easy.

To be clear, you’re not answering web requests from the database. There is still an application in between; it’s just much thinner, or can be some abstraction like postgrest or postgraphile. You scale that too.


That scaling dbs is hard is so perf argument makes no sense is a questionable claim. Almost nothing needs to scale horizontally-but almost all software is so slow that users notice. I agree with you that there is a big practical tooling problem.


> If performance is so critical, whatever you gain by moving everything to the database is lost by the fact that databases are harder to scale.

Performance and arbitrary scaling aren't problems that always occur together. Not every app with performance concerns is a mass audience consumer app.


I have worked in a team which maintained and improved a legacy piece of software generating a few hundred million euros of yearly revenue. Not technically a website but a critical backend piece of a very popular one. The application was written in Java and PostgreSQL, which leveraged stored procedures heavily. This was a _major pain_ for us.

While the separation of concerns was sometimes quite easy to understand between the business logic in the Java code and the business logic in the sprocs, there were times when it was an impossible, tangled mess. Debugging this thing was _hard_ and every deployment was fragile and hard to deal with.

There are of course many reasons as to why this particular piece of software evolved as it did. I can only say that if you plan to move any parts of the business logic to stored procedures, make sure that you have a good reason to do so and clear architectural patterns and rules to communicate and follow.


This used to be the standard architecture for “enterprise” apps. I’ve worked on lots of Java/Oracle apps like this, and there’s some pretty good reasons this isn’t widely used anymore. It makes your DB more of a bottleneck, which is the hardest part of your architecture to scale. You can’t use SQL for all of your business logic, so you end up fragmenting it across two codebases, which sucks, especially at scale. Databases are optimised for reading, writing and comparing large sets of data, and a typical app will have business logic that simply isn’t performant in a DB. This architecture is basically guaranteed to scale into performance bottlenecks and enormous amounts of complexity.

On the other hand, I don’t really buy the “most engineers aren’t familiar with SQL” line of reasoning. Maybe that’s true, but SQL is incredibly simple, I think any engineer could pick it up quite easily, and that what you learn from doing so will just make you a better engineer in general.


> I'm curious if anybody has practical experience developing a reasonably trafficked website in a similar manner.

YNAB's web app (YouNeedABudget.com) is architected in a similar manner, in that a lot of logic is performed in Postgres Functions. Rails is the API layer, (which lets Rails do authentication, authorization, and a number of other things that Rails does well), but many of the performance-sensitive calculations and sync algorithm are implemented as Postgres functions.

The SQL is version controlled of course, and the functions are DROPped/Redeclared when they change, so other than being written in PL/pgSQL the development experience is similar to any other deployed web app. We chose to do it not because we expect the wrapping language to change from Rails, but because our data is very relational, our calculation and sync logic is more "easily" and accurately expressed in PL/pgSQL, and we don't have to pay a performance penalty to bring a lot of data across the wire to Ruby, run logic on it, and then ship a lot of data back to the database.

I'm not advocating this as an approach for most web apps, but it's well worked for us. It's important to get your tooling right from the get-go so that the development experience is a good one.

> I've been considering moving some things directly into the database, but I'm nervous about it because the app runs well and I like the data/logic division. Also go seems easier to learn for new developers than SQL, and it's a lot easier to deploy a new app server than it is to reconfigure a database server.

Those are really valid concerns! It's difficult to find developers who are comfortable with _advanced_ SQL, so I believe that is especially true for new developers as you point out. I think it's easier to reason about higher level languages like Go than it is to perform advanced logic in a declarative language like SQL, but at least PL/PGSQL makes that easier with constructs like loops, "if" statements, etc. Although I don't regret the decision in this case, I would have to be presented with a _very_ similar problem to architect a future web app the same way. Otherwise, I'd stick with a language/framework the team is more comfortable with.


You may find Graphile interesting

https://www.graphile.org/


I prefer https://hasura.io

Instead of JS, it's written in Haskell (end user does not write haskell). It's incredibly performant and has hooks for database events.

Another similar tool is http://postgrest.org/en/v6.0/


Hadn't heard that before.

Why do you prefer it? Performance? Features? Ease-of-use?


Automated and consistent GraphQL APIs for all CRUD operations and a simple extension layer (through events or remote schemas) are our reasons to work with Hasura. It's fantastic. The performance is very good too because of how they optimise queries.


It's "cheating" a bit, but Postgrest is interesting in this space: https://github.com/PostgREST/postgrest


Yes definitely. I was involved in a project where we used Postgrest on top of a layered database. Postgrest served an api-schema consisting only of views where some had instead-of triggers needed for updates and inserts of data. Some functions and procedures were also present in the api-schema. The tables were kept in a separate data-schema and considered a private implementation detail in the same way one would do it with a more traditional architecture. During development we used tools like https://sqitch.org/ and https://pgtap.org/ to do schema-evolution and unit-testing.

The reason for these architectural choices was that we had a need to connect database-only clients such as https://qgis.org/ as well as web-front ends and wanted to route all requests and updates through the same logic. I think it worked out quite well. We were able to do quite heavy https://postgis.net/ lifting in the views with good performance.

Of course the tooling feels rather primitive during editing, but the round-trip and deployment is fast. Unit tests run faster than you might expect even with rollbacks and loading of test data between each test.


yes! I've seen that and I'm considering writing some sort of similar postgres -> protobuf layer so that I can automatically generate my grpc structs from postgres enums.

Right now a pain point for us is that we have a generated database layer, a generated protobuf layer, and we have to do lots of fiddly manual database -> grpc struct translations. Too easy to get off-by-one errors (because you usually need a null enum element in protobuf that you don't need in postgres, particularly)


If you have competent or better domain modeling skills, the database route is optimal. In general, Object layer domain modeling (typical in industry) is mainly addressing skill gaps in domain modeling, relational dbs, and SQL.

Of course, if you are writing something like Microsoft Office, where there will be pervasive reuse of object layer components, an Object Domain Model will be a better fit - it makes sense to define the system fundamentally at the object/component level.

In my experience in the industry, the rise of object layer modeling was due to multiple issues, some noted above and also including organizational factors such as the traditional DBA/Dev divide, licensing, etc.

So a pure technical approach to this question is, imo, non-optimal. It is ultimatley a business decision.


> I'm curious if anybody has practical experience developing a reasonably trafficked website in a similar manner.

I personally didn't develop but I've talked to people who did. The Estonian national web portal `eesti.ee` intially ran on stored procedures (and technically survived the DDoS of 2007 but was taken down "just in case" :S), later some Java garbage layer was added which significantly worsened the performance and made further development significantly harder.


This is an old pattern (like early 2000s MS SQL app development) and works wonderfully for the majority of applications.

It does become complex and awkward in two case, 1) variable option selection and 2) at a certain "scale" / quantity of inserts/deletes.

Variable option selection

Take for example a search interface of dependent items and item options, like cars. The query for car options for different car models generally returns a variable list of options. Further filtering results based on the variable list of options, when the list of options is all in one table, is most efficient with an "in clause". Trying to represent the "in clause" using functions/stored procedures is a little awkward as there is list construction and deconstruction involved, and sql is not great at list/string parsing.

Number of inserts/deletes

At some point, a successful app will reach a point where it's more efficient to batch insert & delete -- a couple hundred rows. This goes back again to using "in clauses' and the stored procedure approach makes things a bit awkward, as then the store procedure will need to argument parse and check.

Not saying what's done here is bad - it's actually really good for many, many reasons. I'd actually recommend the approach and avoid what I've described above unless it makes the desired functionality hard and complex.


This is how I do relational databases in 2019. My database is a self contained module more akin to an rpc api than a database. My code has virtually no database logic in it at all. I don't know but it feels like functional databases could be a thing. We should call Martin Fowler and ask.


SQL is already a pretty functional language. It's a nice low statefulness, transactional layer around a very stateful core of data. Fully exploiting this gives you many benefits. This is the reason noSQL databases tend to be subpar.

Also, huge are the performance benefits of not always pulling data across multiple layers of cache, across the network, back into multiple layers of cash of another computer to be processed by some script running far away from the original data. Processing and aggregating as much as you can as you are retrieving the data is often multiple orders of magnitude faster and more efficient than sending everything elsewhere to be processed.


A few years ago I wrote an accounting software using that approach. The C++ code was calling predefined set of stored procedures and most logic (especially constraints) was in db. Fixing bugs not related to GUI was easy and, in most cases, not visible to end users. We also had a request to let an external software to insert some data into certain tables. I just created the procedure for this particular case and guys from another company just integrated their software without any problems.


> I don't know but it feels like functional databases could be a thing.

This is how datomic ions work, i think.

https://docs.datomic.com/cloud/ions/ions.html


How is that not how databases are typically used?


Returning JSON to a client is relatively recent. The standard for web apps was to get data from the db and generate HTML on the server. A JSON layer would be a useless overhead.

Furthermore about everybody knows how to use an ORM. Not many people can write store procedures. Maybe the question is why is that so. I heard "I know [language], I don't know SQL, so I write queries in [language] " so many times even from people I would have expected to have a much broader knowledge. (That also applies to networking technology, they know HTTP and nothing more.)

By the way the code doesn't address authentication and authorization, those go in the layer between the client and the database.


Returning JSON might be recent, but multi-tier apps where one tier talks to the data storage and applies business logic served up to a presentation tier dates back to long before the web. The rise of the web caused a lot of developers without any history with multi-tier applications to push "flatter" apps, that is true, but in terms of software development in general it's not new.

Even in terms of web dev, well over a decade ago I worked on a system where we had a middleware layer that consisted of a service that handled the business logic and talked to the database. This middleware layer took requests from a web frontend layer did most of the presentation and served up in various formats. The web frontend never talked directly to the databases. It gave us flexibility in how we sharded things, and in applying different extents of access control (e.g. we could put different frontend services in front of the middleware) as well as use it to enforce constraints that were impractical to express in the database. How important that is certainly depends on the database you use.

It wasn't a new thing then either - I took inspiration for that system from much older systems.


Do you think it's an acceptable excuse for a dev to say they don't know SQL?


No, it's not acceptable. Still it's evident that it doesn't prevent getting a good job.


Stored procedures can also be a great way to scale out transactional workloads.

In Citus (a sharding extension for Postgres by Microsoft) we recently introduced a stored procedure call delegation feature.

If your tables are distributed by person_id and you have a stored procedure that takes person_id as a parameter, you can run e.g. "SELECT create_distributed_function('lineitem_add', 'person_id')". The procedure calls for a given person_id value will then be fully delegated to the PostgreSQL server that stores the data for that person_id without additional round-trips between statements or distributed planning overhead. It'll "magically" take care of procedures that access data from other servers as well.

I realize we're basically going full circle to PL/Proxy, except in Citus almost everything else (distributed queries, DML, DDL, transactions) works as well.


I’ve been using this pattern recently. Very early stages but I like it so far.

I wrote some boilerplate tools for easily loading schemata and extensions. This is a must have for being able to write isolated tests.

I keep all data definitions in schemata, which can be managed by standard migration scripts. I keep function definitions in extensions, which do not create any data stores but expose functions with well defined interfaces for querying the underlying data. I rely on standard Postgres extension loading to manage migrations.

I really like this pattern, but there is a degree of unapproachability to it. It takes a lot of effort to make sure tests work well - but it’s not so bad. Since your DB functions do everything, the app code is just a shell around those functions (few lines of code to call each function), so by testing those functions you are exercising their Postgres counterparts.

With tools like postgraphile and Postgrest available, this is becoming a more appealing option. What we need is more tooling for managing deployment, installation, migration, verification of and communication with the stored procedures from within application code. Combined with Postgres PL libraries like plv8, it should be possible to write JavaScript code within your application toolchain that you can test locally, but deploy to run in Postgres context.


I've been working on some projects and offloading as much as I can to the database. https://github.com/jimktrains/jskerp and https://github.com/jimktrains/jskplan being the two I'd like to get further along with.

Working in the DB like this is both very nice and ... very odd? It doesn't feel like most development environments and feels like going back to C or pre-object Ada (I believe Pl/SQL is based on Ada syntax-wise) or Pascal where you're writing lots of global functions that operate on structs.

Once I gain a little more experience developing like this, I'd love to think about that and see if there'd be a way to build a language that feels more "modern" and that can be transpiled to SQL (including building diffs from a current database).


How would somebody implement this for a team with versioning, source code view, and deployment?


One can create the stored procedure and store it in an .sql file which can be version controlled like any piece of code. These can then be deployed in a number of ways.


We keep our schema in an XML file which is kept our source control repository alongside the application code.

We have a tool to generate the XML and to take the XML and compare with a database and altering it to match the XML schema. Though we never delete data due to backwards compatibility, so no dropping of objects, reduction of column widths etc.

When deploying a new version the database is processed first, if successful the new executable is made available. If not it calls home so we can take a look.

To make changes in development, we commit a new XML to dev branch, we have a commit hook to upgrade the dev database.


I was this guy years ago in a waterfall style shop. We were a SQLServer c# all MS vertical and we did just this.

All stored procedures, table definitions, etc were checked in. There was a whole release process and QA to bundle the right versions of things into deployables which I did. It didn’t work flawlessly but it worked and when it failed it was straightforward enough that figure out why was easy.



If you are using SQL Server, you can create a SQL Server project in Visual Studio and keep all of the source code for your database objects (tables, views, functions, procedures, etc) in git.

The tools in Visual Studio will also help you create migration scripts and do schema/data diffs.


Why are you asking this? Just wondering why think it would not be doable.

Sql and stored procedures are just a textual source code. You can put it into version control, do code reviews, or build CICD pipelines for sql and stored procedures just like for any other language...

The only challenge is the deployment environment, as databases offer a stateful runtime engine, so you'll want to consider it's state. But state can also be managed in CICD pipelines, with scripting, sql-loader, backups, or similar.


This kind of approach can be good for getting data into the database, especially for non-trivial updates, but is not good for getting data out. The problem is if you want things like custom group-bys, sorts, joins, windowing, etc, you end up reinventing a lot of concepts, or copy+pasting procedures with slight variations. It basically takes SQL's biggest issue, query composability, and makes it even worse than it already is.


> The problem is if you want things like custom group-bys, sorts, joins, windowing, etc, you end up reinventing a lot of concepts, or copy+pasting procedures with slight variations.

I could see how you could get that impression if you've worked on a project with poorly planned functions, but our experience has actually been the opposite of that. On the contrary, using PostgreSQL functions has allowed our company to be more consistent and declarative with our return data and has helped us eliminate repetitive SQL code. Particularly so when used in conjunction with our RESTful APIs.

Don't forget that in Postgres you can define custom composite return types (i.e., a return type that includes another return type). You can have a Postgres function that returns nested objects like this (I'm using JSON for clarity):

[ "name":"Bob" ,"age":13 ,"address":"123 Main St" ,"cars":[ { "make":"Toyota" "model":"Camry" "license":"NF98549" }, { "make":"Toyota" "model":"Corolla" "license":"NF5649" }, ] ]

The ability to enforce that kind of return type for a given function is incredibly helpful

> It basically takes SQL's biggest issue, query composability, and makes it even worse than it already is.

I think that the bigger problems are ORMs like SQL Alchemy. Where it not only introduces a new layer of abstraction but also empowers engineers to request data from the database in a completely ad hoc "unregulated" manner. In large projects without a strong lead or code review process, it can quickly turn into a spaghetti mess that seriously complicates database re-factors.

When you use functions, you have clearly defined inputs and outputs, better re-usability and much more consistent interactions between the application layer and database layer.


The article basically discusses the OLTP side. You can always export from the DB to a more analysis-friendly OLAP system or just write a new set of views on the same DB that are geared for reporting.


I’m reminded of the fully-functional HTTP web server (!!) implemented in pure PostgreSQL for the PlaidCTF competition this year: https://cr0wn.uk/2019/plaid-triggered/

It does everything in SQL - header parsing, HTTP routing, and even includes a micro templating engine for generating pages.


As neat as this is, I'm not sure I'd recommend this - I'm not sure what the plan would be when it came to scaling.


Tell us about your scaling problem. Do you really have a scaling problem?


Ok, where does this go when it needs to grow beyond a single machine? Is it time to set up replication? Partitioning?


How many orgs do you know that need to grow beyond a single machine let alone an indie developer? This post was from an indie developer not some big corp engineering blog. I don't know what you mean by partitioning, I'm assuming sharding and not partition/splitting of tables, because this approach is not mutually exclusive to that. MySQL and PostgreSQL can scale up very vertical if designed well. 99.99% or even more never need to scale horizontally. Nevertheless, this can be scaled horizontally carefully a bit. You grow your replicas. Your main DB will be your transactional DB. All procedures that are doing inserts/updates/deletes and select will call that DB. All non transactional SELECTS will call the read only replicas and treat those as your analytical. You can easily have something like one main DB, 3-5 read only DB. Add caching, Add queues and you can have a system that can serve 1+ million users daily.


Why are stored procs still used? My experience with the black hole (or really I should say not in version control) pushed me away from them.


Why not put them in version control, and have your deployment process pull them out of the repo and apply them to the database?


It's a bit more complicated than just "upload all my stored procedures" in practice. I think we need a version control / deployment system that is designed for databases to make this work.

We want it to regard the database as the source of truth, and understand that while you can blow away a development database, you can't do that to prod.

We want it to be able to track which clients are using which stored procedures so it can prohibit dropping stored procedures that are in use. Ideally, when those clients upgrade to newer versions, it's then able to drop them automatically.

Stored procedures can bind to tables, views, types, etc. so it needs to be able to recommend migration plans.


I don't get it. You have a source file. It has your store procedures and what not. You deploy it. You have the server startup load it. Done.

What's the problem?

Well, you might complain that some schema changes are hard to make this way, and that's true, but for stored procedures it's straightforward, and even for other things you can manage and make the update atomic (though it'd be nicer to have better SQL language support for schema changes).

As with everything, you need to mind backwards compatibility. This is true for C, C++, Rust, Java, and SQL/PlPgSQL. There's nothing special about this case.


> You have the server startup load it.

We have to take down a production database to do this?

Or do you mean the app server is going to do it, in which case how do you handle different app servers with different versions of the stored procedure? Especially if we're doing phased deployments?


No, you can load it at any time. Just write (and test) SQL that is safe to load for upgrading a schema.


yep

CREATE OR REPLACE FUNCTION ...

what whatever it was, it now is what came out of version control


Thanks for the comments. They were helpful.


postgrest for those who dont know.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: