So, I've seen this go back and forth so many times, and I've come to the conclusion that the two extremes are irreconcilable. Not because they're incorrigible, but because they represent two very different ways of using a database.
For some, the database is not actually a database, per se, it's just a place to persist state. Usually folks in this camp think in an object-oriented way. They haven't read Codd's paper, they don't care about the relational algebra, they just want some way to take bags of properties and stick them somewhere that offers random access with good performance and is reasonably unlikely to spontaneously erase data. If you're in this camp, ORM works great, because there isn't really much of an object/relational impedance situation in the first place. The data store wasn't structured in a way that creates one, there's no need to burn a bunch of effort on doing so, and dragging in the whole relational way of doing things is like dragging a wooden rowboat on a hike through the forest because you'd like to use it to go fishing on the off chance you find a nice pond.
Others see a lot of value in the relational model. They're willing to put a bunch of time and effort into structuring the data and organizing the indexes in ways that will allow you to answer complex, possibly unanticipated questions quickly and efficiently. They, too, hate the syntax for recursive common table expressions, but are willing to hold their nose and use them anyway, for various reasons, mostly because people think you're really cool when you can spend 30 minutes and make something go 2-3 orders of magnitude faster than it used to. They don't think of the data in terms of bags of properties, they think of it in terms of tuples and relations and a calculus for rearranging them in interesting and useful ways. For them, there is potentially a huge problem with object/relational impedance; the data's organized in ways that just don't fit cleanly into Beans.
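For the unfamiliar, here's roughly the shape of the thing being complained about, as a minimal sketch using Python's built-in sqlite3 (the org-chart table is invented for illustration):

```python
import sqlite3

# Hypothetical org chart: each employee points at a manager.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES (1, 'Ada', NULL), (2, 'Brin', 1), (3, 'Cele', 2);
""")

# WITH RECURSIVE walks the whole reporting chain in one statement.
# Clunky syntax, but it answers "who is in the tree under the root?"
# in a single pass instead of issuing N round-trip queries.
rows = con.execute("""
    WITH RECURSIVE chain(id, name) AS (
        SELECT id, name FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name
        FROM employees e JOIN chain c ON e.manager_id = c.id
    )
    SELECT name FROM chain
""").fetchall()
print(rows)  # [('Ada',), ('Brin',), ('Cele',)]
```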
The thing is, neither of these ways of using a database is wrong. Each has its strengths and weaknesses. The trick is figuring out which way fits your business needs. Well, that's the easy trick. The hard trick comes when someone on your team doesn't understand this, believes there is one universal solution that will work for everyone, and is hellbent on jamming the One Righteous and Holy Peg, which happens to be square, into a round hole.
If you're thinking in the first terms, is a relational DB the right backing store, or would it be better to back your app with something more like Mongo?
Do they still obtain some benefit from the schema, since from time to time the semantics of a property will change, and a schema can tell them what the current shape of the data is and guide the migration? Or would they prefer to just write a new property which does a lazy conversion from the old terms? (To an extent, I suppose the answer to this question determines the answer to the first. But I guess there are other tradeoffs to Mongo I'm not aware of, since schemalessness fills me with fear and I just don't want to look there.)
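By lazy conversion I mean something like this sketch, assuming a document whose old `price_cents` field is being migrated to a new `price` field (all names invented):

```python
def price(doc: dict) -> float:
    """Read the new field if present; otherwise convert the old one on the fly."""
    if "price" in doc:
        return doc["price"]
    # Old documents stored integer cents; convert lazily at read time,
    # so no bulk migration ever has to run.
    return doc["price_cents"] / 100
```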
Regarding your final paragraph: I suspect that, for many apps, the choice between relational vs transparent persistence is largely determined by the team working on it. The "hard trick" is therefore trying to balance a strong personality with a strong view who disagrees with the rest of the team, who have weaker personalities and weaker views but who all agree on the other side of the fence. This is simply a standard management question with very little technical relevance.
I suppose it depends on whether you want schema-on-read or schema-on-write.
Even if you're working under the first model, there's still a lot an RDBMS can do to help you ensure data integrity. Largely by being less flexible. Databases like MongoDB allow for a more flexible schema, at the cost of pushing a lot of the work of ensuring data integrity into the application code.
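For example, a few lines of DDL push integrity checks into the database itself; here's a small sketch with sqlite3 (the schema is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The schema itself rejects bad rows: no NULL names, no negative balances.
con.execute("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
con.execute("INSERT INTO accounts (name, balance) VALUES ('alice', 100)")  # fine
try:
    con.execute("INSERT INTO accounts (name, balance) VALUES ('bob', -5)")
except sqlite3.IntegrityError as e:
    print(e)  # the database, not application code, rejected the row
```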
For my part, I do a fair bit of work with databases that were built on the MongoDB of the '90s, Lotus Notes, and I've seen what they can grow into over the course of 25 years. It's not pretty. That experience has left me thinking that, while there's certainly a lot of value in the document store model, I wouldn't jump to a document store just because I don't need everything an RDBMS does. I'd only do it if I actively needed a document store.
> If you're thinking in the first terms, is a relational DB the right backing store, or wouldn't it be better to back your DB with something more like Mongo?
Certainly relational, for 90% of the cases, if not all.
The relational model was THE answer to "NoSQL" from the start (i.e., it was the solution to the original "NoSQL" stores, the hierarchical and network databases that preceded it).
It's far more flexible, powerful, dynamic, and expressive... and that's without even talking about ACID!
You can model all "NoSQL" stores with tables. With limited exceptions, it will work just fine for most uses...
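For instance, a key-value store is just a two-column table; a minimal sketch with Python's sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A "NoSQL" key-value store modeled as a plain table.
con.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
con.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)",
            ("user:1", '{"name": "ada"}'))
print(con.execute("SELECT value FROM kv WHERE key = ?", ("user:1",)).fetchone()[0])
```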
> This is simply a standard management question with very little technical relevance.
I don't get what you're implying here...
But NoSQL solutions are the ones that should be viewed with suspicion, and the ones that require stronger qualifications and justifications to use. They're the wrong choice in the hands of the naive. "NoSQL" is for experts and for niche/specific workloads.
The advantage of using an ORM is that you can always bypass it in the places where you're doing things that aren't suited to its strengths (see the sketch below).
I tend to hand-write almost all of my migrations, and many of my queries that synthesize data from multiple tables to reach a conclusion. I can think of only a handful of times where it was worth writing custom code to persist state (usually only when there were a large number of records that needed a couple of specific fields updated).
Like many tools, it all depends on how well it is used and how well it fits its use-case.
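As a concrete sketch of that escape hatch, here's SQLAlchemy as a stand-in ORM (the model and data are invented; any ORM with a raw-SQL escape works the same way):

```python
from sqlalchemy import create_engine, select, text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, Session

class Base(DeclarativeBase):
    pass

class Order(Base):
    __tablename__ = "orders"
    id: Mapped[int] = mapped_column(primary_key=True)
    status: Mapped[str]
    total_cents: Mapped[int]

engine = create_engine("sqlite+pysqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Order(status="paid", total_cents=500),
                     Order(status="open", total_cents=300)])
    session.commit()

    # The ORM handles the plain persist-and-fetch cases...
    paid = session.scalars(select(Order).where(Order.status == "paid")).all()

    # ...and hand-written SQL takes over where that's the better fit.
    report = session.execute(
        text("SELECT status, SUM(total_cents) FROM orders GROUP BY status")
    ).all()
```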
> The trick is figuring out which way fits your business needs.
Rule of thumb - if you don't control the database, use an ORM; if you do control the database, work directly with it.
For example, let's say that your business is a software company that sells an on-prem product. Some of your customers have Postgres expertise, some have MySQL expertise, some MSSQL, some people are stuck on Oracle. Forcing customers to develop DBA expertise in a database they're not familiar with just for the privilege of buying your product is a sales disaster in the making. So you go with an ORM and set up QA testing that tests releases across all of the databases that you support, and the ORM helps you by making it much more likely that your development efforts will automagically succeed in working with each of the supported databases.
In most other situations, though, it makes much more sense to start with the data design. If your business grows, your databases are going to grow. You are almost guaranteed not to switch databases (absent overwhelming financial need, see: Oracle) over the lifetime of your company. Data analysts (data scientists now?) can extract serious value from your databases by getting into the weeds of the database schema, indexes, and queries and working with developers and DBAs to optimize them for business reporting. If you give up control to an automated tool that knows nothing about your business, your business will be less competitive as a result.
Data is far too valuable these days to refuse to develop expertise with the underlying databases.
Why do your customers even need to have expertise in the DBMS you're using? We just sell them a black box (usually VM images), with a few endpoints to extract data in standard formats. They can use whatever they want to connect to those.
Maybe one customer wishes to run their databases in a cluster distributed across two continents.
Maybe another customer has bought Oracle and the installation still has room. Also, they have a custom backup scheme that takes their load patterns into account.
Customers wish for many things; that doesn't mean they're relevant selling points. It just sounds like bad judgment to me to tie yourself down that way, unable to take advantage of the RDBMS to the full. And have you even tested running your (hypothetical) application in a distributed Oracle cluster across two continents? If not, how will you support it?
The most basic question I'd ask: is the core of your application the data or is it the business logic? Or to put it another way: does it make more sense to build the application around data, or to build your data around the application?
Another question I'd ask is whether you're expecting to deal with millions of rows/objects or hundreds of millions. Getting the relational modelling right (or wrong) can affect performance by orders of magnitude.
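As a toy illustration of why, compare the query plans with and without an index, again with sqlite3 (the table is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, ts TEXT)")

# Without an index on user_id, this lookup scans every row: O(n).
print(con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = 42").fetchall())
# plan says: SCAN events

con.execute("CREATE INDEX idx_events_user ON events(user_id)")

# With the index it becomes a B-tree lookup: O(log n). At hundreds of
# millions of rows, that's the difference between milliseconds and minutes.
print(con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = 42").fetchall())
# plan says: SEARCH events USING INDEX idx_events_user (user_id=?)
```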
When I look at most projects, I instinctively begin by modelling the data structure; then I think about why/when/how data can move from one state to another; then I think about how an application could prod the data between these states; then I build code which runs against the data.
If your application isn't data at its core (e.g. a document-based app) then it probably makes more sense to treat data elements as objects and use an ORM (or similar) to store and retrieve the objects.
An ORM does not usually limit how much you can model your data and create fast queries. The modeling you talk about can be done just as well in e.g. the Django ORM.
ORMs that I've experimented with tend to fall into one of two categories: either they treat the object model as prime, or they treat the relational model as prime.
The former almost invariably spit out inefficient queries, or too many queries, or both. They usually require you to let the ORM generate the tables. If you just want your object-oriented design to persist in a database, that's great.
The latter almost invariably result in trying to reinvent SQL syntax in a quasi-language-native, quasi-database-agnostic way. They almost never manage to replicate more than a quarter of the power of real SQL, and in order to do anything non-trivial (or performant at scale) they force you to become an expert in SQL and/or in how the ORM translates its own syntax into SQL.
And once you become more expert at SQL than your ORM, it's not long before you find the ORM is a net loss to productivity—in particular by how it encourages you to write too much data manipulation logic in code rather than directly in the database.
For the projects I've worked on, I've almost never wanted to turn data into objects. And on the occasions when I've thought otherwise, it has always turned out to be a mistake; de-objectifying it consistently results in simpler, shorter code with fewer data bugs.
I tend to find that the longer data spends being sieved through layers and tossed around inside your application, the more data bugs you'll end up having. It's much better to throw all data at the database as quickly as possible and do all manipulation within the database (where possible) or keep the turnaround into application code as short as possible. It means treating read-only outputs/reports more like isolated mini-applications; the false nirvana of code reuse be damned.
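For instance, rather than hauling rows into the application to total them up, hand the aggregation to the database; a small sqlite3 sketch (table invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("north", 10), ("south", 20), ("north", 5)])

# Anti-pattern: drag every row across the wire and aggregate in app code.
totals = {}
for region, amount in con.execute("SELECT region, amount FROM sales"):
    totals[region] = totals.get(region, 0) + amount

# Better: one round trip, and the database's planner does the work.
totals = dict(con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))
print(totals)  # {'north': 15, 'south': 20}
```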
And that doesn't mean swapping an OOP or ORM fetish for a stored procedure/trigger fetish. It means realising that if your application is data at its core, it's your responsibility as a programmer to become an expert at native SQL.
The problem is that far too few programmers realise how deeply complex SQL can be; it's treated as a little side-hustle, like regular expressions, when for so many programmers it's the most valuable skill to level up.