Hacker News new | past | comments | ask | show | jobs | submit login
PgBouncer is useful, important, and fraught with peril (jpcamara.com)
229 points by brunoluiz on Sept 8, 2023 | hide | past | favorite | 73 comments



Pgbouncer maintainer here. Overall I think this is a great description of the tradeoffs that PgBouncer brings and how to work around/manage them. I'm actively working on fixing quite a few of the issues in this blog though

1. Named protocol-level prepared statements in transaction mode has a PR that's pretty close to being merged: https://github.com/pgbouncer/pgbouncer/pull/845

2. SET/RESET tracking in transaction mode. For this we need some changes in the postgres protocol to ask for postgres to tell pgbouncer about setting updates. This is being worked on here: https://www.postgresql.org/message-id/flat/CAGECzQQOOhH1Rztu...

3. The single threading issue can be worked around by using multiple processes that listen on the same port using so_reuseport=1 https://www.pgbouncer.org/config.html#so_reuseport

4. The pg_dump issue can actually be solved already by installing the Citus extension and using track_extra_parameters (otherwise you have to wait for the same postgres protocol addition that's needed for the other SET/RESET commands) https://www.pgbouncer.org/config.html#track_extra_parameters


I mean... if you multiplex disparate statements into the same connection and session then, well... yes, that is fraught with an incredible amount of complexity. That stuff's for OLAP, read-only replica servers and so on. High-throughput "hey I just need this one thing."

Your large-scale app probably won't need that by default. You pool connections in pgBouncer (or your application) because they're slow and expensive to set up. If you run a standard N-tier architecture and you have fifty 'backend apps' (say web apps) that talk to a communal PG server, you want pgBouncer to avoid connection churn. If your scale's a lot smaller than that then you don't need pgBouncer and you can get by with your app pooling a handful of conns --- subject to the usual vagaries that come with hand-wavey explanations. Measure, monitor, etc.

It's an interesting and illuminating article, but the take-away is don't do weird statement-based transaction multiplexing unless you know exactly why you want that!

If you're running into "connection limits" on Heroku or whatever... maybe now is the time to stop letting other people manage your DB and just run it yourself? You're clearly big enough to run up against that.

Or, maybe, release your session back to the pool (app/bouncer/whatever) instead of sitting on it. For OLTP stuff like 99% of all web apps, that's key. With 100 connections and a p99 request-response cycle with DB hits being <50ms, you can handle a lot of traffic.


I've been running pgBouncer in large production systems for years (~10k connections to pgbouncer per DB, and 200-500 active connections to postgres). We have so many connections because microservices breed like rabbits in spring once developers make the first one, but I could rant about that in a different post.

We use transaction level sharing. Practically, this means we occasionally see problems when some per-connection state "leaks" from one client to another when someone issues a SQL statement that affects global connection state, and it affects the query of a subsequent client inheriting that state. It's annoying to track down, but given the understanding of behavior, developers generally know how to limit their queries at this point. Some queries aren't appropriate for going through pgbouncer, like cursor based queries, so we just connect directly to the DB for the rare cases where this is needed.

Why so many connections? Say you make a Go based service, which launches one goroutine per request, and your API handlers talk to the DB - the way the sql.dB connection pooling works in Go is that it'll grow its own pool to be large enough to satisfy the working parallelism, and it doesn't yield them for a while. Similar things happen in Java, Scala, etc, and with dozens of services replicated across multiple failure domains, you get a lot of connections.

It's a great tool. It allows you to provision smaller databases and save cost, at the cost of some complexity.


> microservices breed like rabbits in spring once developers make the first one

microservices talking to the same db... thats not microservices thats a disaster. you basically combine the negatives of the microservice world with the negatives of the monolith - tight coupling.


they can have their own databases, but still be on the same Postgres instance (aka cluster in Postgres parlance).


Databases are there to share data and provide transactional guarantees and even locking. Your data often must be tightly coupled like this, and most databases designed with this in mind and provide benefits when doing so. It doesn't mean your apps need to be, and there are still plenty of benefits in deployment and operations to be had with microservices. Silo the data when it makes sense, but force the issue you end up with a different problem trying to reimplement the benefits of a database in the app layer or with a fault tolerant, guaranteed delivery messaging system (itself a database under the hood).


> maybe now is the time to stop letting other people manage your DB and just run it yourself

> With 100 connections and a p99 request-response cycle with DB hits being <50ms, you can handle a lot of traffic.

You would be shocked and appalled at how little many devs know about DBs, or how long something _should_ take. I've had to explain that triple-digit msec latency on a simple SELECT is in fact not normal, and they should fix it. And that's just using it, not running it. There are a massive number of orgs that don't want to have to ever think about things like backup strategies or patching, and so they happily fork money over to a cloud provider to do it for them.


> If you're running into "connection limits" on Heroku or whatever... maybe now is the time to stop letting other people manage your DB and just run it yourself? You're clearly big enough to run up against that.

This is top advice actually. I would usually always suggest this! I've felt a lot other services too that can run better on the local/server deployment.


On-topic tangent: reminder or heads-up (depending on if you’ve already seen this) that Postgres is experimenting with thread-based instead of process-based connections (which would pretty much obviate the need for pgBouncer if it works out and becomes the connection model going forward).

HN discussion from a few months ago, with lots of commentary relevant to any pgBouncer scenarios: https://news.ycombinator.com/item?id=36393030


I follow the PG world very closely and would like to add that Postgres community essentially "soft" turned that down and its not going to be a thing any time soon, like in the next five years at least.


I’m wondering if they’ve considered less radical solutions? Something like adding a new thread-based front end to Postgres itself while keeping the processes, and then gradually fixing whatever makes that less than seamless.


> Postgres doesn’t have a concept of nested transactions

It has savepoints and those nest fine.

(perl's DBIx::Class can be configured to automatically convert transactions into savepoints if it's already inside another transaction; presumably any other ORM-like thing could also do that for you in theory but whether the one you're currently using -does- is left as an exercise to the reader)


beware of the issues related to postgres subtransactions https://postgres.ai/blog/20210831-postgresql-subtransactions...


Solid list of ways to shoot yourself in the foot with it, even if "use savepoints in moderation" is, for a suitable value of "moderation", sufficient to largely avoid them. (though if you're going to use them, read the article and maybe some of the things it links anyway to calibrate your sense of suitable)

And that article also lists a bunch of ORMs that handle subtransactions out of the box, for anybody curious.


Django does this too


This might be the best explanation I have seen of the benefits and pitfalls of deploying pgbouncer ( or any connection pooler as they are all similar ). The only thing I would add is from a deployment perspective, you should run two layers of pgbouncer. One that runs on the same host as your application ( sidecar container in k8s ), and one global layer that all connections to your DB must pass through. Every large scale application generally gets to this kind of architecture for various reasons, and it helps to just start with it. Some people will argue that connection pooling at the application layer is good enough to replace the sidecar proxy, but from my experience management is simplified by having the pooling done via the sidecar.


My large scale application certainly didn't run the pooler on the host. I don't see any reason to do this except for not wanting to become familiar with postgres itself.


What about running PHP, Django, Node, or Ruby based applications ? They all do connection per request so the pooler on host gives you an immediate improvement to connection latency.

Even if you are using a language with a built in connection pooler like Go or Java being able to manage your connections external to your application is enough of a benefit to keep the pooler separate.


I ran the Pooler on the host.. I ran a few hundred hosts, so having a central pool was not ideal.


> I also think community and industry enthusiasm around Postgres is at an all time high. There are more managed hosting options than ever (Crunchy Data, Render, Fly.io, and on and on), deep extensions like PostgresML, Citus and Timescale, serverless options like Neon, and real-time services like Supabase with Postgres at their center.

Please sell me why you'd use one vendor to host your DB and another one to host your app. If they are in different racks (forget about different data centers), that's going to kill you on latency. The only way around that we've seen is to vectorize all your queries.


If deploying on different racks is that big of problem for your latency then you’re severely limited in how far you can scale your application just by physical rack space. IME this just isn’t true.


And even if in different data centers (say different AZs), query latency is probably 1ms; in practice that’s quite low. How many queries are you doing in a request?


> query latency is probably 1ms

> This generally produces single digit millisecond roundtrip latency between AZs in the same Region.

https://aws.amazon.com/blogs/architecture/improving-performa...

So we're both wrong.


Crunchydata deploys in AWS, GCP or Azure if I recall correctly.


Yep, they do. Currently using them on AWS and pretty happy.


Ditto aiven.


what a great post, we have had a ton of issues with users using pgbouncer and it's not because things are "broken" per se, it's just the situation is very complicated, and pgbouncer's docs are also IMO in need of updating to be more detailed and in a few critical cases less misleading, specifically the prepared statements docs.

This blog post refers to this misleading nature at https://jpcamara.com/2023/04/12/pgbouncer-is-useful.html#pre... .

> PgBouncer says it doesn’t support prepared statements in either PREPARE or protocol-level format. What it actually doesn’t support are named prepared statements in any form.

IMO that's still not accurate. You can use a named prepared statement just fine in transaction mode. start a transaction (so you aren't in autocommit), use a named statement, works fine. you just can't use it again in another transaction, because it will be "gone" (more accurately, "unmoored" - might be in your session, might be in someone else's session). Making things worse, when the prepared statement is "unmoored", its name can then conflict with another client attempting to use the same name.

so to use named prepared statements, you can less ideally name them with random strings to avoid conflicts, or you can DEALLOCATE the prepared statement(s) you used at the end of your transaction. for our users that use asyncpg, we have them use a uuid for prepared statements to avoid these name conflicts (asyncpg added this feature for us here: https://github.com/MagicStack/asyncpg/issues/837). however, they can just as well use DEALLOCATE ALL, set this as their `server_reset_query`, and then so that happens in transaction mode, also set `server_reset_query_always`, so that it's called at the end of transactions. Where pgbouncer here IMO entirely misleadingly documents this as "This setting is for working around broken setups that run applications that use session features over a transaction-pooled PgBouncer." - which is why nobody uses it, because pgbouncer claims this is "broken". It's not any more broken than it is to switch out the PostgreSQL session underneath a connection that uses multiple transactions. Pgbouncer can do better here and make this clearer and more accommodating of real world database drivers.


> There are more managed hosting options than ever (Crunchy Data, Render, Fly.io, and on and on)

From fly.io's docs [1]:

> This Is Not Managed Postgres

[1] https://fly.io/docs/postgres/getting-started/what-you-should...


Why would someone downvote this? Seems pretty relevant.


My understanding now is that using pgBouncer falls under the category of YAGNI / “the best part is no part”: don’t use it unless you have a demonstrated need for it, because it adds complexity.

About 5 years ago we migrated to postgres and installed pgBouncer because of course you do. Sometime last year I started seeing some weird issues like the article points out, and the more I looked into the gory details, the more shocked I became. (The fact that statement level isolation even exists is… wow)

So I did an experiment and deleted it. Things kept working just fine. We’re not living in a microservices world so application-level pooling is enough — I don’t think I’ve ever seen us use half of available connections.


PgBouncer has always left me confused in the world of application level connection pooling. I have never quite understand the value of it if we are already using connection pools in our applications. I don't want to pool connections to another pool.


Application level connection pools are not enough if you're using something like k8s and have your "application" running across hundreds of pods, each with their own application level connection pool. pgBouncer helps tremendously in that situation because all those pods will use a single pool. We cut down avg open connections dramatically by doing that from over 1000 to less than 400.


Also IoT


I would hope you are not allowing IoT devices direct access to your database. There is no saving that haha.


Unfortunately some of my customers are. Hopefully they're setting up isolated roles which can only access stored procedures to log readings so that at worst they're opening themselves up to DDoS


This still doesn't really make sense to me. You can't scale an application that relies on a database heavily to this level because your fundamental constraint IS the database. If you are already hitting your max number of connections with a small number of applications there are no benefits to further horizontal scaling. You are just passing the buck around because only a limited number can hit the database at any one time.


The constraint is less often the database in reality then in theory. Expecially when you consider many large scale applications are doing complex things. A normal request trace might only spend 30% or 40% of its time in the DB call. When you consider that a single postgres database can clear 200k QPS, you start to get to a world where you have thousands of hosts. If you tried to tune the in application connection pool to suit the various scale of deployments you would quickly find that having a proxy is both simpler and safer.

I would confidently state that most large scale applications have run into the situation where they have scale up their application worker fleet, and then crashed their database with too many connections. Coordinating the size of the worker connection pool in a world where we have elastic application workers is enough of a task that deploying a proxy is really the simplest and best solution


Most large scale web applications spend their time reading and writing data, both to/from clients and to/from other remote services such as databases. You don't need thousands of hosts. Stackoverflow famously ran 9 server instances in 2016 with 7 dedicated to the primary sites.

Unlike postgres, Oracle and Sql server can support thousands of connections but they see performance degradation at a certain point. So I have never seen them crash from too many connections(Although they definitely get slower.).


Stackoverflow is very much the exception not the rule. Most of your top tier software companies have server fleets that scale well past the 10,000's of nodes level, and for container based workloads I don't think its uncommon to have even medium sized companies running 100k+ containers.


Tldr; aim to be the exception!


The real problem is working with postgres’ “each connection is a process” model. Pgbouncer puts an evented pool in front of postgres to deal with this. Apps that are aggressive with DB will not benefit from having an evented pool in front. However, web apps (think rails) will have connections checked out even if they don’t need them. Pgbouncer helps here. If your app recycles DB connections when not in use, that leads to connection thrashing and higher latency, which pgbouncer can help with. But you’re right that at some point, the DB is the bottleneck. For most people, it’s the number of connections, because postgres makes a process for each connection.


> You can't scale an application that relies on a database heavily to this level because your fundamental constraint IS the database.

I pool ~10.000 connections down to under 500. I can't do application level pooling because the application is just thousands of individual processes, you often see this with Python, PHP or NodeJS applications.

500 open connections is way less overhead than 10.000 on the Postgres server. I'm very happy to "pass the buck" of connection pooling to separate machines with pgBouncer.


The point is that if you have 50 k8s pods that each have their individual connection pool, some of them will be holding idle connections while others are hitting their max connection limit. A single pool is much more flexible.

Additionally, the "transaction" mode of PgBouncer can increase the utilization of connections further by making them available to a different application when one application holds a connection while doing something different (e.g. waiting for a call to an external service).


If one has CPU load for 50 pods, then it is not a usual project I guess? Average load should be thousands of requests per second I believe.

So the question is how many applications have CPU load for 50 pods but also cannot saturate a single database, so sharding is not considered yet? Gut feeling is that they are few, more like exceptions.

From my personal experience pgBouncer is used a lot when framework/library does not support application level pooling in the first place, and not so much when it does.


This. For the web service workloads most of us run, I’ve always run out of postgres connections before exceeding the vertical limit of a postgres server.


Is this because your default approach is to horizontally scale or because you have tried other options? Logically scaling vertically a single server with pooled connections should have a better effect than horizontally scaling to multiple servers that need to share connections through a proxy. The proxy doesn't magically increase the number of connections available to the database. It will add overhead to all your database interactions though.


Vertically scaling has a cost component. If you're able to use pgbouncer with transaction pooling to better utilize but not exceed your existing vertical postgres limit, then you're set. If you do, then you must scale up. Pgbouncer can help you lengthen your runway. Tuning your pgbouncer pool sizes can help minimize the overhead but as with anything we do, it's about weighing trade-offs for your situation.


Don't spin up 50 pods. You have outscaled your database. You can't make IO operations faster by throwing more workers at it and you can only have so many connections working at once. As a side note if your application is a typical IO bound web app its very unlikely you can process enough transactions to effectively use 50 workers in a single region.


We're not necessarily talking about 50 pods for the same application, could also be a zoo of smaller low traffic applications sharing a Postgres instance.


In those cases, does one keep the application-level pool at each pod, in addition to the "communal" PgBouncer pool, or does not offer any advantage?


If you limit yourself to a subset of Postgres' features, connections can become your bottleneck. I work with a production system where the major scaling constraints are 1) the VM capacity of the cloud region it runs in and 2) available connections to Postgres


To be clear pgbouncer does not add connections to postgres or remove the connection bottleneck. Its still there under the covers. If you are saturating your connections it will not be able to improve on throughput. It sounds like you need a different architecture to allow for queueing work. The approach pgbouncer takes may actually reduce performance overall as it will intermix work on the pg instance which, if you are already saturating the database, will slow down other operations overall.


Yup, one of the things we're doing is moving parts of the system away from Postgres into queues.


If you have 10 application instances each with a pool of 20 connections (half of which are idle), you have 100 active connections and 100 idle connections.

If you have a single "real" connection pool, the idle pool is effectively shared among all application instances. You can have 100 active connections and 10 (or maybe 20) idle.

I have run into this problem, but I solved it with careful tuning of automatic instance scaling parameters, connection pool size and timeout. But this only gets you so far; a single real connection pool would be more effective (at the cost of added complexity).


I think it's an issue with a microservice setup. TFA suggests ~300 connections being optimal for Postgres and I've seen microservice setups which have more than 300 processes, which means even limiting each process to a single DB connection might not be enough.

But yeah, for less distributed applications, just have N worker threads and don't close the DB connection after each job.


> You can't scale an application that relies on a database heavily to this level because your fundamental constraint IS the database.

That's a big assumption. If your app is just an HTTP wrapper for SQL queries against a poorly-optimized database, sure.

But there are plenty of applications that spend time the majority of their time doing other stuff (http requests, FFI calls, g/cpu-intensive processing, etc.) where the database interaction is a small part of the overall performance profile - certainly not the bottleneck, even with hundreds of concurrent clients.

In those cases, rather than artificially throttling the concurrency to stay within your global max_connections, you would ideally run as many as you can (ie fully utilize your infrastructure) and grab database connections from an external pool when you need them.


it lets me create a load balancer in front of multiple database instances without having to do a bunch of application-level BS

additionally with microservices, managing connection pooling can be difficult across legacy software, service versions, teams, etc.

PGBouncer lets me have a front-end and manage that at an infra level.


Lambdas


If you are running aws lambdas I believe you should be using the RDS proxy product(This is a very similar product though).


Super annoying that RDS Proxy doesn’t support IAM auth against the DB.

We moved all our DB users to use IAM auth based on instance roles and then found out that RDS proxy doesn’t support it.


Great article. Should be part of the official PgBouncer documentation.


If a Database needs to support more than 500 connections at a given time (due to scale), what's the way around?


What exactly do you want to scale? But in the end the answer is likely "connection pooling", either integrated in the application or with a dedicated pooler like PgBouncer which this article is about.

For Postgres a good way to scale is to just use a bigger server. You can get a lot of very fast storage and lots of CPU cores inside a single server today. And if that isn't enough you're far, far into territory where you can't give generic answers and it depends a lot on your specific use case and what you do with that database.


I mean let's say I have to support 10000 updates per second and 100000 reads per second. Surely 500 connection pool won't suffice.


I don't think 10K updates per second are really that difficult for the db (Unless you have locking issues).

But at that point your reads should be probably be sent to read-only replica's. So you write to a master but all your read-heavy apps and queries run against replicas.


Each of those reads may only be 1ms, which would translate into only 100 connections needed.

So the workload you describe — without more detail — may in fact be just fine with that connection count.

But plenty of people also run Postgres with well above 500 max connections.


I think you would look at

1. "Server level connection pooling" in the article (all clients share one pgbouncer instance) and

2. "Statement mode" if you can live without transactions (connection is re-pooled after every statement) otherwise "Transaction Mode" (connection is repooled upon transaction commit or rollback).


If you require 500 instances of your app to scale, how do you get away with just a single DB instance?


By having relatively simple queries. You can push 50k queries per second on a 64 CPU Postgres machine.

This is real world experience from a moderately complex application.


It all depends on your workload. Those 500 instances may be make calls to other services, services, for example, or burning CPU in various ways.


simple queries, grouping writes, minimizing updates, caching, connection pooling


Let's hope PG team will drop thread-based connections for the superior lightweight task-based connections. Then PgBouncer will not be needed.


PG doesn’t have a thread per connection; it has a process per connection.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: