
> If you were to do this with joins, you'd have a ton of duplicated data returned (via returning every column from every table to have all the relevant data)

ARRAY_AGG for Postgres or GROUP_CONCAT for MySQL does what you’re asking without duplicating rows.
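
For example, a rough sketch with hypothetical orders/order_items tables (ARRAY_AGG is the Postgres spelling; GROUP_CONCAT or JSON_ARRAYAGG is the MySQL equivalent):

    SELECT o.id,
           o.created_at,
           ARRAY_AGG(i.sku) AS skus  -- one row per order, child values collapsed into an array
    FROM orders o
    JOIN order_items i ON i.order_id = o.id
    GROUP BY o.id, o.created_at;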

Re: JSON, don’t. RDBMSs were not designed to store JSON; by definition you can’t even get 1st Normal Form once it’s involved.

IMO, claims of DX regarding SQL just mean “I don’t want to learn SQL.” It’s not a difficult language, and the ROI is huge.



After writing a lot of SQL over the past 10 years, I'm ready to not do it anymore lol. If I could never worry about syncing models with schema changes, hand-writing migrations, etc., I would be happy. Most SQL work is CRUDy and monotonous to write. Prisma allows you to break out of the query builder with their "rawX" variants too and write plain SQL if you ever need to (and I do occasionally).

Again, not saying you _can't_ do it with current db constructs, but Prisma deals with all that for you while allowing escape hatches if you need them. Just like with anything related to software engineering, there are footguns aplenty. Being aware of them and taking the good while minimizing the bad is the name of the game.


I’m all for using a library to help you with migrations, and even a full ORM like Django has its niceties. As a DBRE, I just want the end result to be performant, scalable, and to encourage good SQL habits.

Knowing SQL can help inform the choices a dev makes in the ORM - for example, knowing about semi-joins may let you write code that causes the ORM to generate them, whereas if you don't, you may just write a join and then have to deal with the extra columns.
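
Roughly, with hypothetical customers/orders tables, the semi-join version versus the plain join:

    -- Semi-join: each customer appears at most once, no order columns dragged along
    SELECT c.id, c.name
    FROM customers c
    WHERE EXISTS (
        SELECT 1 FROM orders o WHERE o.customer_id = c.id
    );

    -- Plain join: duplicates each customer row once per matching order
    SELECT c.id, c.name
    FROM customers c
    JOIN orders o ON o.customer_id = c.id;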



Just because RDBMSs have added JSON support (MySQL has had it since 5.7 as well) doesn’t mean it’s a good fit. The language and its implementations are designed for relational data that can be normalized; JSON is neither.

Have you ever tested JSON vs. normalized data at scale? Millions+ of rows? I have, and I assure you, JSON loses.

Last I checked, Postgres does a terrible job at collecting stats on JSONB - the default type - so query plans aren’t great. Indexing JSON columns is also notoriously hard to do well (even more so in MySQL), as is using the correct operators to ensure the indices are actually used.
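
As a sketch of the operator issue, with a hypothetical events table and a jsonb column named data: a plain GIN index is used by the containment operator, but not by an ->> equality comparison, which needs its own expression index:

    CREATE INDEX idx_events_data ON events USING GIN (data);

    -- Can use the GIN index (jsonb containment)
    SELECT * FROM events WHERE data @> '{"status": "active"}';

    -- Generally can't; a ->> comparison needs a separate expression index
    SELECT * FROM events WHERE data->>'status' = 'active';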


Millions of rows with a JSON column? Yes, indeed I have. Recently, in fact. When all normalized fields are populated, you're absolutely right. However, that wasn't our dataset. We had entries with sparse keys, due to the nature of the data we were ingesting.

We ended up with multiple partial expression indexes on the JSON column because of the flexibility it provided. Each index ended up relatively small (again, sparse keys), we didn't need a boatload of NULL values in our tables, we stayed flexible as new "loose" data came in from the client, we didn't have to run a schema migration every time that "loose" data popped up, and we got the job done.
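
Something along these lines, with a hypothetical events table and a hypothetical client_ref key; the WHERE clause keeps the index small when the key is sparse:

    -- Partial expression index: only rows that actually contain the key are indexed
    CREATE INDEX idx_events_client_ref
        ON events ((data->>'client_ref'))
        WHERE data ? 'client_ref';

    -- Repeating the predicate lets the planner match the partial index
    SELECT * FROM events
    WHERE data ? 'client_ref'
      AND data->>'client_ref' = 'abc-123';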

In another case, a single GIN index made jsonpath queries trivially easy, again with loose data.
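
Roughly (again with a hypothetical events table), a single jsonb_path_ops GIN index covers jsonpath-style queries via the @? and @@ operators:

    CREATE INDEX idx_events_data_path ON events USING GIN (data jsonb_path_ops);

    -- "Does any item cost more than 100?" as a jsonpath exists query
    SELECT * FROM events
    WHERE data @? '$.items[*] ? (@.price > 100)';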

I would have loved to have normalized, strict data to work with. In the real world, things can get loose through no fault of my team or even the client. The real world is messy. We try to assert order upon it, but sometimes that just isn't possible on a deadline. JSON makes "loose" data workable without losing excessive amounts of development time.



