Going to take the risk and politely say I do not agree with this article at all.
Alternative advice: never allow more than one app to share the db; expose data through APIs, not queries. Then you can actually remove cruft and solve compatibility through API versioning, which you probably need to do anyway. Also, never maintain more than two versions at a time.
There are databases (and data warehouses) that need to provide ad-hoc query APIs, e.g., for reporting, analytics, etc. Databases can be considered to provide an API layer themselves, one far more flexible than most standard APIs. GraphQL and OData go a long way toward solving this. Also, database views can provide a data API layer for a database. Views provide a stable data model that allows the underlying tables to be changed.
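As a minimal sketch of that last point (SQLite via Python; the `customers` table and `customer_api` view are invented for illustration), a view can keep presenting the same data model while the underlying table is restructured:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Consumers query the customer_api view, never the table directly.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE VIEW customer_api AS SELECT id, name FROM customers")
conn.execute("INSERT INTO customers (name) VALUES ('Ada Lovelace')")

# Later the table is restructured (name split into two columns); the
# view is redefined so consumers of customer_api see no change at all.
conn.executescript("""
    DROP VIEW customer_api;
    ALTER TABLE customers RENAME TO customers_old;
    CREATE TABLE customers (id INTEGER PRIMARY KEY, first TEXT, last TEXT);
    INSERT INTO customers (id, first, last)
        SELECT id,
               substr(name, 1, instr(name, ' ') - 1),
               substr(name, instr(name, ' ') + 1)
        FROM customers_old;
    DROP TABLE customers_old;
    CREATE VIEW customer_api AS
        SELECT id, first || ' ' || last AS name FROM customers;
""")

rows = conn.execute("SELECT id, name FROM customer_api").fetchall()
print(rows)  # [(1, 'Ada Lovelace')]
```

Anything that only ever touched `customer_api` keeps working across the table rewrite.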
I'm inclined to agree, but just to provide a counterpoint: if the data is modelled properly then there are very few reporting queries that actually make sense. Except of course if you want people to be able to change definitions on the fly, but I'd argue that is an anti-feature when it comes to reporting.
Data gets messy, because the real world is messy. Once you start ingesting from other sources, it gets very messy. There is a reason ideas like the "data mesh" or semantic web (distributed schemas) were attempted.
Sure, but that complexity is something you do not want to deal with in your reporting API layer. A reporting API should report correct figures, you don't want to put the responsibility for ensuring the numbers add up on the person requesting the report.
It's already bad enough that people, by default, are not aware of the frame of reference in which they see the world, let alone the frames of the people they communicate with.
As just a very simple example, unless I am explicitly clear about speaking from a Living Systems world view and a regenerative paradigm, when I talk about permaculture design here in this forum, I end up talking past the vast majority of people here on HN.
But even staying with the same shared technological world views and paradigms -- the people across the Three Tribes of Programmers (https://josephg.com/blog/3-tribes/) will get into flamewars because the frame differs in distinct ways.
Because a schema makes distinctions on information in a particular way, it will always encode a frame in which to view and understand that information.
This is turning into a rather interesting discussion, but surely it would be better to decide on a schema with each other (or perhaps multiple schemas if absolutely necessary), as opposed to just giving people the ability to query whatever they want and present the results?
If the query itself cannot be explained simply, then what does its output even communicate? The more freedom you put in the API, the more room you leave for confusion; this might be appropriate for research, but not when you're reporting on something.
I agree that you have to make decisions somehow. My points were directed at the phrase “properly modeled”, rather than “very few queries that make sense”.
I see the purpose of reporting within an organization is to allow someone, somewhere to evaluate things and make decisions.
You can have an agreed upon report, but it doesn’t mean that the report itself will always lead to wise decisions. It can also become worse when those very decisions lead to forcing things around you to make things easier to report — that is the core thesis described in that essay, and the book, “Seeing Like a State”.
Another example — in the realm of strategic decision making, decisions will always be made with imperfect information. A core part of strategy is deliberately manipulating how the opposing force gathers and interprets information, thus influencing their actions (for example, the OODA loop). So what is already messy gets incredibly messy. This is the kind of stuff that falls way beyond reporting, and not something I think can ever be adequately modeled.
Views seem like the ideal solution in theory, but SQL implementations of views are often problematic in practice due to planning complexity & overhead. In theory they should also be a good mechanism for handling writes (see "updateable views" / "writable views") but the list of caveats is long and many developers are understandably nervous about pushing lots of logic into TRIGGERs.
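To make the "updateable views" point concrete — a hedged sketch, using SQLite via Python with invented names, since this is exactly the kind of trigger logic the comment warns about: a view is read-only until an INSTEAD OF trigger translates writes on the view into writes on the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, first TEXT, last TEXT);
    CREATE VIEW people_v AS
        SELECT id, first || ' ' || last AS full_name FROM people;

    -- The view is read-only unless an INSTEAD OF trigger translates
    -- writes against the view into writes against the base table.
    CREATE TRIGGER people_v_insert INSTEAD OF INSERT ON people_v
    BEGIN
        INSERT INTO people (first, last)
        VALUES (substr(NEW.full_name, 1, instr(NEW.full_name, ' ') - 1),
                substr(NEW.full_name, instr(NEW.full_name, ' ') + 1));
    END;
""")

conn.execute("INSERT INTO people_v (full_name) VALUES ('Grace Hopper')")
row = conn.execute("SELECT first, last FROM people").fetchone()
print(row)  # ('Grace', 'Hopper')
```

Even in this toy case the trigger silently encodes a parsing rule (split on first space), which is the sort of hidden logic that makes developers nervous.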
This reminds me of the microservices trend, where the main justification is modularity -- which is of course perfectly possible to implement in a programming language using standard language constructs.
Similarly: any separation of concerns you can implement with APIs and multiple databases you can also implement with a schema. The difference being you have to reimplement a bunch of capabilities that are baked into an RDBMS (and will probably never correctly implement something like a hash join).
It should be possible to implement modularity in a programming language using standard language constructs.
In practice, the programming language and software engineering communities have largely failed to provide usable modularity. IMNSHO due to our clinging to call/return (so procedural/functional/method-oriented) as our modularity mechanism.
It ain't working.
So the OS/systems guys and gals need to bail us out. Process boundaries are pretty hard, though of course we then manage to build distributed monoliths.
One thing that's interesting is that µservices, if actually REST-based, use data as the modularity mechanism, rather than procedures.
"Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious." -- Fred Brooks, The Mythical Man Month (1975)
So because we have bad programmers that don't use the stuff provided by the programming language, we are going to throw the baby out with the bathwater.
And, somehow, this is going to fix the fact that this happened because we have bad programmers. Bad programmers that now have to also deal with network complexity on top of the basic complexity they already can't deal with.
OK, in your opinion, what would the right stuff, as provided by a programming language, look like?
Microservices are a good solution for scaling a company into multiple sub-sections, i.e., a solution for an org-chart problem. If your org chart has fewer than 4 levels, most likely you are doing microservices too soon. At that point, you can have people in charge of dealing with network interactions exclusively, which allows your bad programmers to still be productive.
It is not necessary to be able to solve a problem in order to identify it.
That said, what I think is needed is general support for architectural connectors in programming languages. Our so-called "general purpose programming languages" (actually: domain-specific languages for the domain of algorithms, see ALGOL) effectively support one: the procedure call.
The central thesis revolves around continuous growth with no advice given for removal/cleanup. This is not a sound strategy for a database schema, at least on the SQL side. Column bloat, trigger bloat, index bloat... Schemas cannot continuously grow; there need to be DROPs along the way.
Datomic is not a SQL database. You can simulate tabular data, but it is a very different beast. Datomic schemas are always EAVT. If you add an attribute, or deprecate one, there may still be values tagged to that attribute. That should be fine.
That something as simple as change--a universal condition of all systems--defeats so many schema designs in SQL databases should suggest that there's something wrong with the SQL databases.
But the article claims it applies to SQL databases as well.
> Are these rules specific to a particular database?
>
> No. These rules apply to almost any SQL or NoSQL database. The rules even apply to the so-called "schemaless" databases.
Yes, you eventually need to do the things you mention, but probably less frequently than a normal application needs to add new columns or the like to support new use cases.
The article's thesis is essentially: make breaking changes as infrequently as possible. The easiest way to do that is to never change your data, but that’s a sure way to have your competitors crush you as you stagnate. The next best thing you can do is make sure existing producers and consumers are not impacted when you make changes. For most changes, the article gives a set of things you can do to achieve this goal.
For times when your database itself is not scaling, which are the types of things you’re mentioning, I think there are other things you can do to, if not eliminate backwards incompatibility, at least make the transition easier. For example, fronting your DB with an API and gating all producers/consumers through that. If you’re frequently having to handle scaling issues, perhaps it’s time to reevaluate your system design altogether.
Never break it
Never remove a name
Never reuse a name
Your point is a very reasonable statement, but you are really disrespecting the author by putting a reasonable statement in their mouth. They had every chance to say the reasonable thing, and they clearly made a choice to say the unreasonable thing. Respect that decision (and tell them that they're wrong).
Never removing a name is relatively straightforward with views and aliases. Never breaking it is somewhat harder, but still possible for well-designed tables -- especially if you use updateable views. Never reusing a name is also straightforward. It does require planning your schema evolution years, if not decades, in advance, of course.
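A small sketch of the views-and-aliases approach (SQLite via Python; the `orders` table and its columns are hypothetical, and column renames need a reasonably recent SQLite): rename freely, but leave the old names alive behind a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER);
    INSERT INTO orders (total_cents) VALUES (1999);

    -- Rename both the table and the column, but never remove the old
    -- names: a view called 'orders' keeps every existing query working.
    ALTER TABLE orders RENAME TO orders_base;
    ALTER TABLE orders_base RENAME COLUMN total_cents TO amount_cents;
    CREATE VIEW orders AS
        SELECT id, amount_cents, amount_cents AS total_cents
        FROM orders_base;
""")

old = conn.execute("SELECT total_cents FROM orders").fetchone()[0]
new = conn.execute("SELECT amount_cents FROM orders").fetchone()[0]
print(old, new)  # 1999 1999
```

Old queries keep using `orders.total_cents`, new ones use `amount_cents`, and neither name is ever broken or reused for something else.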
You've got more than one app sharing a db when you deploy a new version. Unless you're happy with downtime during deploys as the cost of not having to manage how your schema evolves.
These kinds of best practices make sense regardless of how many apps access a db.
Following the advice doesn't also prevent you from enforcing a strict contract for external access and modification of the data.
> You've got more than one app sharing a db when you deploy a new version. Unless you're happy with downtime during deploys as the cost of not having to manage how your schema evolves.
2 deploys is all it takes to solve this problem.
* 1 to deploy the new schema for the new version.
* 1 to remove the old schema.
This sort of "tick tock" pattern for removing stuff is common sense. Be it a database or a rest API, the first step is to grow with a new one and the second is to kill the old one which allows destructive schema actions without downtime.
2 deploys isn't enough for robustness. It depends on what the change is, but the full sequence is often more like
* Add the new schema
* Write to both the new and old schemas, keep reading from the old one (can be combined with the previous step if you're using something like Flyway)
* Backfill the new schema; if there are conflicts, prefer the data from the old schema
* Keep writing to both schemas, but switch to reading from the new one (can often be combined with the previous step)
* Stop writing to the old schema
* Remove the old schema
Leave out any one of those steps and you can hit situations where it's possible to lose data that's written while the new code is rolling out. Though again, it depends on the change; if you're, say, dropping a column that no client ever reads or writes, obviously it gets simpler.
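The sequence above can be sketched at the table level (SQLite via Python; the `users`/`users_v2` names are invented, and a real migration would spread these steps across separate deploys rather than run them in one script):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Old schema, with pre-existing data.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Old Row')")

# Step 1: add the new schema alongside the old one.
conn.execute("CREATE TABLE users_v2 (id INTEGER PRIMARY KEY, name TEXT)")

# Step 2: writes go to both schemas; reads stay on the old one.
conn.execute("INSERT INTO users (id, name) VALUES (2, 'New Row')")
conn.execute("INSERT INTO users_v2 (id, name) VALUES (2, 'New Row')")

# Step 3: backfill; on conflict, prefer the old schema's data.
conn.execute("""
    INSERT INTO users_v2 (id, name)
        SELECT id, name FROM users WHERE true
        ON CONFLICT (id) DO UPDATE SET name = excluded.name
""")

# Step 4: reads switch to the new schema (writes still go to both).
rows = conn.execute("SELECT name FROM users_v2 ORDER BY id").fetchall()

# Steps 5-6: stop writing to the old schema, then remove it.
conn.execute("DROP TABLE users")
print(rows)  # [('Old Row',), ('New Row',)]
```

The point of the dual-write window plus backfill is that rows written by not-yet-upgraded instances during the rollout still end up in the new schema.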
Yup, it depends on the change. Sometimes two deploys is enough — e.g. making a non-nullable column nullable — and sometimes you need a more involved process (e.g. backfilling).
Nonetheless, I agree with the OP that the article's advice is pretty bad. If you ensure that multiple apps/services aren't sharing the same DB tables, refactoring your schema to better support business needs or reduce tech debt is
a. tractable, and
b. good.
The rules from the article make sense if you have a bunch of different apps and services sharing a database + schema, especially if the apps/services are maintained by different teams. But... you really just shouldn't put yourself in that situation in the first place. Share data via APIs, not by direct access to the same tables.
Why wouldn't one deploy be enough to convert a non-nullable column to nullable? Going the other way takes two deploys, I can see, but this way seems entirely backwards compatible.
If apps are using the column, they might need code changes to handle the column suddenly having nulls. So you would have to change and redeploy the code, then make the column nullable.
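In other words, the first deploy is just making the app null-tolerant before the constraint is relaxed. A toy sketch (SQLite via Python; the `accounts` table and its `nickname` column are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, nickname TEXT)")
conn.execute("INSERT INTO accounts (nickname) VALUES (NULL)")

# Deploy 1: app code learns to tolerate NULL, even though the column is
# still NOT NULL in production; deploy 2 then relaxes the constraint.
row = conn.execute("SELECT nickname FROM accounts").fetchone()
nickname = row[0] if row[0] is not None else "(no nickname)"
print(nickname)  # (no nickname)
```

If the constraint were relaxed first, an old app instance reading a NULL mid-rollout could crash, which is why the order matters.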
I agree that the article is pretty bad, but it's not like "make this the API team's problem" is really an answer. API versioning is probably tougher than database schema versioning IME.
Aside from DBs, there are many other communication tools that use a schema. I can think of at least two: Kafka and serialization libraries like Protobuf and Thrift.
I'm a little stunned by this suggestion. I've worked in quite a few different contexts for application systems, e.g. retail, manufacturing of fiber optic cable, manufacturing of telecommunications equipment, laboratory information management, etc.
I wouldn't even know what you mean by "app" in this context. There may be a dozen or more classes of users who collectively have hundreds, even thousands, of different types of interaction with the system.
Sometimes there were natural divisions where you could separate things into a separate database. For example, the keep/dispose system for laboratory specimens, which tracked which specimens needed to be kept for possible further testing, where, and for how long. But most problem domains were not like that.
And sometimes we had to interact with other systems because they were for a separate division (because of mergers and acquisitions). But those kinds of separations made for more limited functionality and more difficulty in managing change, not less.
Most of this advice is this bad because it's entirely based on the idea that you interact with your database using an extremely dynamic language: Clojure. Like most of Hickey's advice, it's terrible and founded in dogma.