I mean, for highly connected data, shouldn't the ideal solution be to use a graph database from the get-go instead of generating JSON aggregates in the database?
Don't get me wrong, JSONB is great, but I see a lot of people simply slap a JSON column onto anything that requires more complexity without first trying to solve it through RDBMS relations.
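To make that concrete, here's a toy sketch of the kind of thing I mean: a many-to-many relation modeled with a join table instead of a JSON array column. The table and column names are made up for illustration; I'm using stdlib SQLite so the snippet is self-contained, but the SQL is the same in Postgres.

```python
import sqlite3

# Hypothetical example: "a movie has many genres" modeled relationally,
# rather than stuffing a JSON array into a column on the movie table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE genre (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE movie_genre (
        movie_id INTEGER REFERENCES movie(id),
        genre_id INTEGER REFERENCES genre(id),
        PRIMARY KEY (movie_id, genre_id)
    );
""")
conn.execute("INSERT INTO movie VALUES (1, 'Alien')")
conn.executemany("INSERT INTO genre VALUES (?, ?)",
                 [(1, 'Horror'), (2, 'Sci-Fi')])
conn.executemany("INSERT INTO movie_genre VALUES (1, ?)", [(1,), (2,)])

# The relation is indexable and queryable from both directions,
# which a JSON array column doesn't give you for free.
rows = conn.execute("""
    SELECT g.name FROM genre g
    JOIN movie_genre mg ON mg.genre_id = g.id
    WHERE mg.movie_id = 1 ORDER BY g.name
""").fetchall()
print([r[0] for r in rows])  # ['Horror', 'Sci-Fi']
```

You can always project this into JSON at query time (e.g. with `json_agg` in Postgres) when a client wants a document shape, which is usually the better order of operations.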
PostgreSQL makes a perfectly fine graph database, at least for general workloads. You're not going to get vastly improved performance from a specialized solution unless you're doing something very specific, like math-heavy network analysis, which PostgreSQL is still a bad fit for.
I loaded this same IMDB dataset into a Kuzu DB the other day, just for fun. It was an interesting project, but I would agree that this dataset in particular lends itself to a graph database. Kuzu started to choke at > 10M nodes & 10M edges, but I was doing naive inserts & could probably fit the entire dataset in without trouble using their direct CSV import.
Interesting points here. I’ve found the “graph DB vs. relational DB” discussion usually gets framed as an either/or, but there’s a middle ground.
A lot of teams already have their data sitting in Postgres, Mongo, or a lakehouse. Spinning up a separate graph database just for traversals often means duplicating data, building pipelines, and keeping two systems in sync. That’s fine if you need deep graph algorithms, but for many workloads it’s overkill.
What some folks are exploring now is running graph queries directly on top of their existing data, without having to ETL into a dedicated graph DB. You still get multi-hop traversal and knowledge graph use cases, but avoid the “yet another database” tax.
So yeah...graph databases are great, but they’re not the only way to model or query graphs anymore.