Finding and optimizing N+1 queries on a relational database (mixpanel.com)



In iommi we built something imo much smarter: https://docs.iommi.rocks/en/latest/dev_tools.html#sql-trace

The idea is that:

1. We store all the stack traces and SQL calls

2. At the end of the request we group them to find N+1

3. If there are any N+1s (above some threshold), we print a big warning to stdout: the full traceback for one of the offending SQL calls plus a few example statements.

This is imo MUCH better. It means the console is silent when there is no issue. It also means the console is loud when there IS an issue. And the output you get is very helpful.

This approach is better than the one in the article, and superior to Django Debug Toolbar where you have to actively go look for problems instead of the machine telling you.
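
Roughly, you can do this kind of detection with a Django middleware. A minimal sketch of the idea (not iommi's actual implementation; the threshold and the grouping key are assumptions):

  import traceback
  from collections import defaultdict

  from django.db import connection

  N_PLUS_1_THRESHOLD = 10  # assumed cutoff, tune to taste

  class SqlTraceMiddleware:
      def __init__(self, get_response):
          self.get_response = get_response

      def __call__(self, request):
          calls = defaultdict(list)  # sql -> list of stack traces

          def recorder(execute, sql, params, many, context):
              calls[sql].append(traceback.format_stack())
              return execute(sql, params, many, context)

          with connection.execute_wrapper(recorder):
              response = self.get_response(request)

          for sql, stacks in calls.items():
              if len(stacks) >= N_PLUS_1_THRESHOLD:
                  print(f"*** Possible N+1: {len(stacks)} executions of: {sql}")
                  print("".join(stacks[0]))  # full traceback for one offending call
          return response

Because the ORM parameterizes queries, the repeated statements in an N+1 share the same SQL text, so grouping on the raw SQL string is usually enough to spot them.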


Heh, I wrote something similar (django-queryinspect) a while ago, then abandoned it since IIRC Django Debug Toolbar has it built in, and I moved on to other stuff... maybe I should dust it off.

N+1 is a fairly well-understood and documented gotcha, and Django has excellent tools that make it easy to avoid (as the article mentions).

To use Django ORM effectively in nontrivial apps, you do need to understand how relational databases work, and it's helpful to understand SQL. But it does make me a whole lot more productive (and happy!) than writing SQL by hand.


Not knowing anything about web development or Django or whatever, why isn't this just in a real database? Where you can do a join instead of 2000 individual queries?


Object Relational Mappers like the one Django uses are a whole topic of debate unto themselves.

They strive to map SQL concepts into a more code-friendly domain. Rows are represented as objects, references become attributes of those objects which can be eager-loaded by a join when you fetch the parent object or lazy-loaded with their own query when they are actually accessed. The N+1 problem comes from this lazy-loading scenario.
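
In Django terms (a toy sketch; the Report model and its board foreign key are made-up names), the lazy-loading trap looks like this:

  reports = Report.objects.all()  # 1 query when the queryset is evaluated
  for report in reports:
      print(report.board.name)    # 1 extra query per report: the board is
                                  # lazy-loaded on first attribute access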

This programming model abstracts away the SQL which can be really cool and easy until you try to do something off the beaten path. Junior devs love it for the ease, and diving into the internals once things go wrong turns them into Senior devs!


Coming from an area of programming where dataframes/tables are the first-class citizen this is mind blowing to me. Is there a reason for doing things this way by default? I guess it puts off having to think about complex queries, but it also means performance is degraded multiplicatively the more tables you're pulling in, right? Is the tradeoff really just "it's easier to set up any old garbage and fix performance later"?


Lazy loading is usually the default ORM setting, so you wouldn't pull in any extra tables by default. You can set up eager loading on a per-query basis (i.e. "I know I'll need it here") or configure certain relationships to always eager-load, depending on your needs.

You've hit the nail on the head with that tradeoff. ORM-based code is dead simple to write since everything just looks like an object. Anyone who knows the programming language can start querying the DB without really knowing anything about SQL. This lets you write the code quickly and in a domain that's really easy to reason about.

I've seen many times where an endpoint might really only require 1 SQL query but it turns into a handful because of the ORM. Very rarely does it matter though, because each query takes < 5 milliseconds and you barely notice the slowdown.

If you know your ORM library well, you can get the best of both worlds. Mature ORMs give you a ton of control (including letting you write SQL directly) when you need it; you just have to be OK with the learning curve.
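
For instance, Django lets you drop down to hand-written SQL when the query builder gets in the way (a sketch; the model, table and column names are illustrative):

  # Raw SQL mapped back onto model instances:
  reports = Report.objects.raw(
      "SELECT id, name, board_id FROM app_report WHERE created_at > %s",
      [cutoff],
  )

  # Or bypass the ORM entirely with a cursor:
  from django.db import connection
  with connection.cursor() as cursor:
      cursor.execute("SELECT board_id, count(*) FROM app_report GROUP BY board_id")
      counts = cursor.fetchall()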


Columns, though, are a different story: you do have to restrict the column set explicitly if you don't want or need them all.
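
In Django, restricting the columns is a one-liner (field names assumed):

  Report.objects.only("id", "name")    # model instances, only these columns selected
  Report.objects.values("id", "name")  # plain dicts instead of model instances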

My experience with the Django ORM is that the dead-simple things are dead simple, and then it quickly gets hard to impossible to do the complex things you need.

My favourite model personally is a very lightweight mapping into a model in the querying language, like Diesel or less. Django unnecessarily changes established terminology and introduces abstractions that are ultimately limiting.


IME, it’s more because most (citation needed) web devs are familiar with JSON and its nested format, and so it’s much easier to keep everything in that mental model, and let something else do the conversion.

SQL is of course not a difficult language, but I will give ORMs that it can be somewhat difficult to context switch between the two mental models.

Django, IME, does an extremely good job at producing decent SQL, most of the time. There are times when you can be faster, and there are certainly times where it’s massively slower, but the average is higher than if you just had everyone writing queries.

My counter to that is “have RDBMS SMEs create VIEWs for anything you want, and let devs just call those,” but then I get told I’m old-fashioned, and that no one wants to store any kind of logic in a DB, version controlled or not.
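
For what it's worth, the two approaches can coexist: Django will happily map an unmanaged model onto a DBA-maintained view (the names here are made up):

  from django.db import models

  class ReportSummary(models.Model):
      board_name = models.CharField(max_length=255, primary_key=True)
      report_count = models.IntegerField()

      class Meta:
          managed = False               # Django won't create or migrate this
          db_table = "report_summary"   # a VIEW owned by the database folks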


The best practice is to just use PostgREST.


The steelman of their argument (coming from things as a database person) is thinking about the problem in a different way for a different domain, which can beget a "simpler" approach.

Something like a document db approach (one big json entry to rule them all) gets significant benefits in setting certain levels of abstraction as even possible.

Hey, we just get the user details and then we update some JSON, and that's all we can do. It kinda sucks, but it means we're going to put most of our computation in this website and really make something that looks like a modern web app.

It has a bunch of downsides, but you don't have to teach anyone SQL; it's just a simple repository pattern + microservices + limiting the scope of any one thing.

Type checking is strong, tooling support is strong and you usually can stick in one language and go ham.


Django queries are an interface to a real relational database. It's not its own thing.

The tradeoff Django chose was lazy loading by default over automatically pulling the entire database by following every foreign key.

One of the standard fixes is to tell it to do JOINs for specific foreign keys instead of lazy loading (select_related(), mentioned but not shown in the article).
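
For the record, it looks like this (a sketch; the model and field names are assumptions, not the article's code):

  # Lazy loading: 1 query here, plus 1 more when .board is touched.
  report = Report.objects.get(id=report_id)
  report.board.name  # extra query

  # select_related() follows the foreign key with a JOIN,
  # so the related row comes back in the same query.
  report = Report.objects.select_related("board").get(id=report_id)
  report.board.name  # no extra query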


You can do joins in Django just fine, and you can even NOT write them and have Django automagically do the correct thing if you know how to use it.


The ORMs provided by full stack frameworks like Rails, Django and DotNet are basically engines in and of themselves. They’re doing a lot of work at runtime to reduce pressure on the DB and ensure connections are returned to the pool as soon as possible.

They’re not ‘automagic’; you can reason with them quite easily.


Then what's up with the pattern they show of querying every board and then iterating through to query again for every report associated with the board?


I was expecting them to say, after introducing this, that they started with an intentionally bad example so they could improve on it. But instead they went on to talk about a 2.5x speedup on something that would already be 100x faster in any other database. I don't get why they're doing this at all.


If you have 1 user asking for 2000 things that’s fine, but what if you have 2000 users asking for 2000 things?

You will need to write your API to batch database requests to avoid inefficiency anyway, so it makes sense to break up large joins into smaller queries that can batch well.
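
That's essentially what Django's prefetch_related() does: one query per relation instead of one big JOIN (a sketch assuming a related_name of "reports" on the foreign key):

  # Two queries total, regardless of how many boards there are:
  #   SELECT ... FROM board ...;
  #   SELECT ... FROM report WHERE board_id IN (<ids from the first query>);
  boards = Board.objects.prefetch_related("reports")
  for board in boards:
      for report in board.reports.all():  # already in memory, no extra query
          ...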


Database queries are likely to be the most constrained resource in a typical enterprise application. They are essential but also extremely slow.

Because of this, that limited resource needs to be carefully managed and it needs to be straightforward to reason about the I/O behavior of your application.

Attempting to hide them away and obscure what is actually happening in your application is extremely counter-productive.


One solution is to not use an ORM.

These problems are framework problems. I find the cognitive overhead of worrying whether a framework might do the wrong thing, or diagnosing problems after they arise, often overrides any cost savings frameworks provide in the first place.


Not using an ORM is a solution in the same way that the solution to changing a car's oil is to not own a car.

I've spent a good 20 years building web apps. The speed and productivity of having an ORM like Django's, well integrated into business logic, is unparalleled, with the possible exception of Rails and Laravel.

Systems that have 50+ tables become _trivial_ to maintain by one person. You can refactor an underlying database table without having to rewrite queries. There are entire classes of refactoring that go away with the ORM. Serving database objects with Django Rest Framework or building up CRUD interfaces with the Django admin would simply be impossible otherwise.

Getting the hang of the ORM took me all of 6 months to build up to the nastiest of edge cases, which are functionally as difficult to debug as any complex SQL query with window functions or esoteric functions.

The discourse is irrationally framed towards the first months of learning, for a technology that I've been using for 14 years with _no_ sign of finding a replacement.


But now I need a developer with >6 months of experience in that particular ORM to maintain my app, instead of just someone who knows SQL. Nothing is free.


You would have needed someone to get up to speed in all the things you had to reinvent that frameworks already did for you, except that frameworks do it better due to the thousands of eyeballs that have looked at the same problems over and over again.


You’ll get N+1 or quadratic queries whether or not you use an ORM. At the end of the day you are allocating time on the database and it doesn’t care about the way you build the query. Runtime compute tends to be a lot cheaper than DB compute too - might as well run those cycles before the DB does.


It is true that DB compute is relatively expensive, but doing a join on the DB itself is obscenely more efficient vs doing it manually in a nested loop on an app server.

Also, the DB is generally a lot smarter than doing quadratic operations. The usual worst case is a Merge Join which involves sorting both tables first, to do an efficient join in nlogn time. It’s just not feasible to compete with the DB for efficiency without reimplementing a DB, which is pretty yikes.

See https://www.cybertec-postgresql.com/en/join-strategies-and-p...


I don't get that. If you're not using an ORM, you're simply looking at writing the queries to satisfy your requirements - nothing is guiding you towards an N+1 path.

There's nothing inherent about how we access data in a relational world that means we have to adopt such an anti-pattern. It does mean that you have to think ahead a little bit though.


If you actually have to reckon with the SQL queries in your app, N+1 becomes less likely.

Consider `rows.map { row -> repository.fetchRelatedRow(row.id) }`. That's much more obviously going to cause issues than `rows.map { row.relatedRow }`. You don't need to cross-reference the model code for row to see if relatedRow is eager loaded or not, the answer is right there for you.


I totally share your frustrations over ORMs and the issues they cause, but I seriously doubt your cost calculations. The sheer number of working CRUD applications that don’t rely on fancy queries is mind-boggling, and most developers engaging in high-level system design seem mostly oblivious to this. The ORM may not make for pretty or efficient queries, but it enables armies of mediocre programmers to crunch out generic line of business applications very fast. The performance of these apps may not be what it could be, but while we are still busy discussing the finer details of query planner optimisation, folks are making a fortune with N+1 queries in production that virtually nobody cares about.


> but it enables armies of mediocre programmers to crunch out generic line of business applications very fast.

Or good programmers that see writing raw SQL as a premature optimization.

Someone on here once said that companies often end up reinventing their own ORM once the number of queries they have to maintain gets up into the thousands. But they've usually made a really shitty ORM by that point in time.


This is the best write-up of tools to use to avoid shooting yourself in the foot with Django N+1s that I have seen.

However, having used Django professionally for a while, I can say that it is really starting to show its age even if you deal with the N+1 problems. You still have to deal with:

1. Blocking database calls that cannot easily be used in an asyncio context

2. No types

3. 50% of package code being written without any regard for performance

If you're starting a project today and considering Django, I highly recommend you go with Nestjs and Prisma instead.


> If you're starting a project today and considering Django, I highly recommend you go with Nestjs and Prisma instead.

I recommend taking a look at PostgREST. I replaced all our Django, Flask, Falcon, FastAPI and custom Python HTTP APIs with it. It's phenomenal. It's faster than any of those solutions, and adding an endpoint with roles & permissions is just creating a view and executing a bunch of GRANT/CREATE POLICY statements (about 2 minutes of work). I'm genuinely surprised this hasn't replaced CRUD APIs in most Postgres shops.


Pretty rich to hold up Prisma as the answer to performance complaints, considering it launched without doing any JOINs – it just pulled every tuple back into the engine, then filtered.

Also, Django, being Python, supports type hints just fine.


What is this argument? It's not because something was bad 7 years ago that it is bad now, tech evolves.


This was observed at least in 2022 [0], and wasn’t fixed until earlier this year [1].

My point was that any company willing to ship a product that is lacking such an incredibly basic part of the language does not deserve respect. These are deeply unserious people.

[0]: https://github.com/prisma/prisma/discussions/12715

[1]: https://www.prisma.io/blog/prisma-orm-now-lets-you-choose-th...


Nestjs has no real ecosystem compared to Django, Rails, Laravel, ASP.NET or Spring. Prisma has some nice typing but it is not even half as capable as the Django ORM.


Prisma obviously isn't as fast as the .NET ORM or raw SQL, but the MIT undergrad can throw together an app without too much worry about joins or database stuff in general.

If you have a type setup for Django and Python, I am all ears. I've spent too long trying different combinations of language servers and type plugins to get type hints for Django in VSCode and have not had any luck.


In Rails you can make this 3 queries. One to retrieve the board, one to get all the report IDs associated with the board and a third to get all the reports with the associated report IDs.

An example in Rails would look like:

  board = Board.includes(:reports).find(board_id)


I install the Bullet gem in every Rails project because it not only detects these N+1 queries but tells you how to fix them, helping to ensure they never even make it to production.

https://github.com/flyerhzm/bullet


As of Rails 6.1 you can use Active Record's built-in strict_loading to solve this problem as well.


> Django Querycount

This seems pretty much like a must-have when you think about it: so where are the equivalents for .NET, Java, Node, Go, Rails and others?

I can’t believe that every web framework that has an ORM or just a connection to the database doesn’t provide summaries like this.

At best, you can look into some tracing options that might reveal where the execution time went.

I do hope that there are options out there (or might have to try and fail writing something myself) because I can immediately think of a few legacy projects where N+1 is prevalent, maybe even in the new code, depending on who wrote it; some people genuinely believe that using the service pattern and a lot of DB calls for a single API request is okay.


Every web framework I've used has had an equivalent, they're widely available. Specifically I've seen this same pattern for .NET MVC and for Rails. I've never looked for one of these and not found it.


That’s odd because even in .NET the common suggestion I’ve heard is to just crank the logging level and try to extract data from that, e.g. https://learn.microsoft.com/en-us/ef/core/logging-events-dia...

Or the equivalent in other stacks as well, for example with Hibernate or myBatis logging in Java, almost never a pretty summary table of how many DB actions of what kind were called per API request.

I might need to dig around more!


We built Joist, an ActiveRecord-style ORM in TypeScript, on top of Facebook's dataloader library, and haven't had a single N+1 in ~4 years:

https://joist-orm.io/docs/goals/avoiding-n-plus-1s

Any "query in a loop", whether a manually-written `for` loop, or an "emergent for loop" from 100 `after_update` lifecycle hooks firing at once, are always auto-batched, every time.

Joist is also likely why we actually like GraphQL, vs. it generally having a reputation for being a pita to implement (N+1s everywhere w/o tedious, manual batching).



You either spend your time learning SQL or debugging an ORM - I prefer the former...


Ever since the paper on the Vietnam of Computer Science came out, I've wondered why SQL was the only language. Good? Debatable. Composable? No. Spreadsheet-friendly, but not JSON-friendly.

There are what, 1000 programming languages, but only one SQL? Why?


> I basically grew to wonder why SQL was the only language

You were well ahead of the pack simply by realizing that other options are possible. It is only within the last year or two that the HN crowd has finally come to accept that SQL and relational calculus are not the same thing. If you want a good laugh, check out the older posts on HN where the comments are adamant that there cannot be anything other than SQL for relational querying because it is some kind of fundamental mathematical truth.

> There are what, 1000 programming languages, but only one SQL? Why?

There is the family of languages that get lumped together under the ORM umbrella. They are quite popular – maybe even more popular than SQL. There are plenty of other query languages that are less popular, but they are often impractical to use in any real-world scenario, which maintains their unpopular status.

> but not json friendly.

Not standard SQL, but if we're counting extensions duality views are JSON friendly. The Oracle tax may be a tough pill to swallow, though.


I see the same thing happening in the hardware language space. There exists a large amount of gatekeeping by people who believe that Verilog/VHDL can ever only be the one true language. So research into alternative languages die. And interfaces are obfuscated behind the language.


There are other query languages out there, many of which fix the glaring issues with SQL. For example, Microsoft's Kusto Query Language (KQL) has the "FROM" clause at the beginning to enable tab-complete, you can chain multiple "WHERE" clauses without needing to use "AND", etc...


That’s exactly what I was thinking while reading this.



