Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
How can you represent inheritance in a database? (2010) (stackoverflow.com)
35 points by stefankuehnel 8 months ago | hide | past | favorite | 22 comments


I recently read a lot about solutions people have come up with to put a class hierarchy into a relational DB, because it all mostly applies when the domain’s data model makes heavy use of ad hoc union union types (like TypeScript’s union types).

If you only skimmed the StackOverflow post or were about to bounce because “my models don’t form complex inheritance hierarchies,” ask whether you’ve tried to serialize something with type `A | B` to the database and been frustrated at there being no good solution. Then (re-)read the post in that framing and see if the solutions proposed and tradeoffs discussed look more relevant to you.

It’s made me realize that there’s probably a lot more collected wisdom locked up in writings from the 90’s and early 00’s that I disregard because it’s so heavily class inheritance focused. Alternatively: free blog post ideas by re-contextualizing this content in contemporary languages and technologies!


I might be biased because I do what you might call "data engineering", but my first thought was "there are three obvious ways to do this", and the accepted answer lists exactly those three ways. You'll find that none of them work very well when you have 100 classes rather than 2, because the "correct" answer is to model your data to be relational in the first place. Yes, sometimes it's unavoidable because the structure of the data is just like that, but more often than not someone somewhere is doing something very wrong.

> realize that there’s probably a lot more collected wisdom locked up in writings from the 90’s and early 00’s

Yes and no, it's the "design patterns" discussion all over again. Some of it was genuinely interesting, but most of it just was incidental complexity caused by early Java and the absolute horror of pre-standard C++. When you filter that out so that only timeless foundational concepts remain, you'll find that most of it had already been said before.


I was also genuinely shocked to see this article make it so far up in HN.


My main wish for postgres for a while has been to support ADT. I know it's probably incompatible with optimizing storage, but it would solve so many problems I have when designing dbs


I've had to clean up ORM mess in a few projects. ORM mess is what you get when you have people that have very little experience with databases abuse ORMs to pretend that what they have in their database is the same as they have in their domain model. Usually the signs are that they have transactional issues, severe performance issues, and lots of time wasted on analyzing why stuff isn't working as expected.

Not my favorite job. Learn to do a proper database design; it's not that hard. Balance denormalization against querying needs (disk space is cheap). Use json serialized objects to your advantage for complex things that you don't actually query on. The price you pay for adding complexity to a database is query time overhead, joins, and the resulting application level complexity, bugs, and overhead. You'll need lots of columns with indices to support all that. The queries get more complex and expensive. Etc. Needless complexity at the database level is not a good thing. Keep it simple.

Mostly, you don't need your database model to resemble your domain model. You just need it to store it. A simple database model could be a table with a key and a json blob. Perfectly valid for a lot of domain models. And databases like postgres support creating indices on things inside your json. So, you don't even lose the ability to query. If you really need extra columns (e.g. a timestamp), add them of course.

The mistake that people make is assuming that all the little objects in their domain are equally important. They are not. Most of it is just meta data that needs to be attached to something that is never/rarely used for querying. It needs to live somewhere. But that doesn't necessarily have to be a dedicated column or table in a database.


I think database modeling is a bit of a lost art, it’s so important though for building applications that do anything important. Love your distinction about the database not needing to be 1 and the same with your domain model. I usually do my first iteration with a few json columns then split them out into tables as the domain model comes into focus … it’s kind of an art.


excellent points


As usual, the most useful questions on S.O. get closed. What a terrible direction the website has taken


Oh, that started a while ago, I wanna say 10+ years ago. I always chuckle at when the blindingly obvious answer to a question is buried 18 sub-comments deep while the most upvoted highlighted comment is providing an answer the question didn't ask for, then closed for being "off topic" or some nonsense. Ironically, this is why I also think LLM's aren't very good at providing coding advice in certain domains, I think they tend to give the "upvoted" answer when often in my career the skill gap between engineers was just knowing when/where to throw the top answer out and look deeper.

I understand why strict moderation standards are needed on a site like SO, but IMO that site started huffing too much of its own supply long ago.


It's a silly reason given, too. I wouldn't call a question about techniques for representing inheritance, "asking for an opinion".


Indeed, it has a clear answer too. Of course new techniques might pop up!


This is one of my favourite interview questions to ask:

"You have an online shop with products that have some common attributes but each category of products also has it's specific attributes.

How would you model this in your code and database?"

The question is a bit more interesting if the database is relational.

Lots of answers exist, all with some trade-offs.


This is exactly what I ask for in my interviews for backend engineers. I had good success filtering for backend engineers who view database as a part of the business logic rather than purely just a storage. The “answer” to the question has many variables with no right solution, so it’s also an opportunity to assess the candidate’s communication skills and how they tackle uncertainty.


Honestly, this sounds like a super fun question for all involved. Gets someone thinking, there isn't a right answer (well, no perfect answer), and the back and forth conversation is instructive on how they'll work as a teammate in projects.


Hibernate documentation covers 3 strategies.

https://docs.jboss.org/hibernate/orm/6.6/introduction/html_s...

Technically most people I have seen just use a variant of 3, with @MappedSuperclass.


It's missing the one solution I see fairly often: a single table with a JSON-typed column that serves as a catch-all dumpster for subtype data.

The question and answers are from 2010 and the PostgreSQL JSON data type was introduced in 2012, so it makes sense that this rather awful solution hadn't caught on yet. The "NoSQL" movement was just gathering steam in 2010.


It had enough steam in 2010 for this classic gem to be created in response: https://www.youtube.com/watch?v=b2F-DItXtZs


In Edgedb, we have multiple inheritance.

type ConnectEvent extending RTCEvent, SoftDeletable { handle: str; }


I honestly don't like either of the three solutions. They don't really scale when you have a lot of combinations of attributes.

I have deployed this type of data using both solution #1 and the Entity Attribute Value model and I honestly prefer the latter.

We now have incorporated new data from multiple sources and the number of attributes has grown to around 1000. I don't see how deploying a table with 1000 columns can be considered a good solution.

Also, the data is used downstream in multiple places which means that If I had gone with solution #3 (which looks like the best one) we would have needed to edit tens of queries to add a new join with the new table for the class with custom attributes


Isn't this why postgres exists.

My understanding is that when Stonebraker went back to academia after the commercially successful ingres database project. He had some interesting ideas on how to apply object orientated principals to the relational database.

The result ended up being the postgres database management system.

https://en.wikipedia.org/wiki/Michael_Stonebraker


closed as opinion based, classic stackoverflow

I heard jonskeet write a book so stackoverflow mod doesn't consider his answer as a opinion


Correct solution: stop using inheritance.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: