
Application-level connection pools are not enough if you're using something like k8s and have your "application" running across hundreds of pods, each with its own application-level connection pool. pgBouncer helps tremendously in that situation because all those pods share a single pool. We cut average open connections dramatically that way, from over 1000 to under 400.
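For a concrete picture, here's a minimal sketch of what the switch looks like from a pod's point of view (host name and credentials are hypothetical; 6432 is PgBouncer's default listen port). Since PgBouncer speaks the Postgres wire protocol, the only application change is the DSN:

    import psycopg2

    # Before: each pod dialed Postgres directly (port 5432), so e.g.
    # 100 pods x 10 pooled connections = 1000 server backends.
    # After: every pod dials the shared PgBouncer, which multiplexes
    # them all onto a much smaller pool of server connections.
    conn = psycopg2.connect(
        host="pgbouncer.internal",  # hypothetical PgBouncer service
        port=6432,                  # PgBouncer's default listen port
        dbname="app",
        user="app",
    )
    with conn, conn.cursor() as cur:
        cur.execute("SELECT 1")
        print(cur.fetchone())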



Also IoT


I would hope you are not allowing IoT devices direct access to your database. There is no saving that haha.


Unfortunately some of my customers are. Hopefully they're setting up isolated roles that can only access stored procedures to log readings, so that at worst they're opening themselves up to a DDoS.
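A sketch of what that role isolation could look like, run once by an admin (the role name, function signature and DSN are all hypothetical):

    import psycopg2

    # A login role for devices that can only call the hypothetical
    # log_reading() function -- no table access at all. Note that new
    # functions are executable by PUBLIC by default, hence the REVOKE.
    SETUP_SQL = """
    CREATE ROLE iot_logger LOGIN PASSWORD 'device-secret';
    REVOKE ALL ON ALL TABLES IN SCHEMA public FROM iot_logger;
    GRANT USAGE ON SCHEMA public TO iot_logger;
    REVOKE EXECUTE ON FUNCTION log_reading(int, numeric) FROM PUBLIC;
    GRANT EXECUTE ON FUNCTION log_reading(int, numeric) TO iot_logger;
    """
    admin = psycopg2.connect("dbname=app user=admin")  # hypothetical DSN
    with admin, admin.cursor() as cur:
        cur.execute(SETUP_SQL)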


This still doesn't really make sense to me. You can't scale an application that relies on a database heavily to this level because your fundamental constraint IS the database. If you are already hitting your max number of connections with a small number of application instances, there is no benefit to further horizontal scaling. You are just passing the buck around, because only a limited number of workers can hit the database at any one time.


The constraint is less often the database in reality than in theory, especially when you consider that many large-scale applications are doing complex things. A normal request trace might only spend 30% or 40% of its time in the DB call. When you consider that a single Postgres database can clear 200k QPS, you start to get to a world where you have thousands of hosts. If you tried to tune the in-application connection pool to suit deployments at various scales, you would quickly find that having a proxy is both simpler and safer.

I would confidently state that most large-scale applications have run into the situation where they scaled up their application worker fleet and then crashed their database with too many connections. Coordinating the size of the worker connection pool in a world of elastic application workers is enough of a task that deploying a proxy is really the simplest and best solution.


Most large-scale web applications spend their time reading and writing data, both to/from clients and to/from other remote services such as databases. You don't need thousands of hosts. Stack Overflow famously ran 9 server instances in 2016, with 7 dedicated to the primary sites.

Unlike Postgres, Oracle and SQL Server can support thousands of connections, but they see performance degradation at a certain point. So I have never seen them crash from too many connections (although they definitely get slower).


Stack Overflow is very much the exception, not the rule. Most top-tier software companies have server fleets that scale well past the tens-of-thousands-of-nodes level, and for container-based workloads I don't think it's uncommon for even medium-sized companies to run 100k+ containers.


tl;dr: aim to be the exception!


The real problem is working with Postgres' "each connection is a process" model. Pgbouncer puts an evented pool in front of Postgres to deal with this. Apps that are aggressive with the DB will not benefit from having an evented pool in front. However, web apps (think Rails) will have connections checked out even when they don't need them; pgbouncer helps here. If your app instead recycles DB connections when not in use, that leads to connection thrashing and higher latency, which pgbouncer can also help with. But you're right that at some point, the DB is the bottleneck. For most people, it's the number of connections, because Postgres makes a process for each connection.
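That process-per-connection model is easy to observe: every client connection is served by its own backend process. A tiny sketch (the DSN is hypothetical):

    import psycopg2

    dsn = "dbname=app user=app"  # hypothetical DSN
    a = psycopg2.connect(dsn)
    b = psycopg2.connect(dsn)

    # Each connection gets its own dedicated server process.
    print(a.get_backend_pid(), b.get_backend_pid())  # two different PIDs

    # Put PgBouncer (an evented, single-process pooler) in front and
    # thousands of client sockets map onto a handful of these backends.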


> You can't scale an application that relies on a database heavily to this level because your fundamental constraint IS the database.

I pool ~10,000 connections down to under 500. I can't do application-level pooling because the application is just thousands of individual processes; you often see this with Python, PHP or NodeJS applications.

500 open connections is way less overhead on the Postgres server than 10,000. I'm very happy to "pass the buck" of connection pooling to separate machines running pgBouncer.


The point is that if you have 50 k8s pods that each have their individual connection pool, some of them will be holding idle connections while others are hitting their max connection limit. A single pool is much more flexible.

Additionally, the "transaction" mode of PgBouncer can increase the utilization of connections further by making them available to a different application when one application holds a connection while doing something different (e.g. waiting for a call to an external service).


If one has CPU load for 50 pods, then it is not a usual project, I guess? The average load should be thousands of requests per second, I believe.

So the question is: how many applications have CPU load for 50 pods but also cannot saturate a single database, so that sharding is not yet on the table? My gut feeling is that they are few, more like exceptions.

From my personal experience, pgBouncer is used a lot when the framework/library does not support application-level pooling in the first place, and not so much when it does.


This. For the web service workloads most of us run, I’ve always run out of postgres connections before exceeding the vertical limit of a postgres server.


Is this because your default approach is to scale horizontally, or because you have tried other options? Logically, vertically scaling a single server with pooled connections should work better than horizontally scaling to multiple servers that need to share connections through a proxy. The proxy doesn't magically increase the number of connections available to the database; it does add overhead to all your database interactions, though.


Vertical scaling has a cost component. If you're able to use pgbouncer with transaction pooling to better utilize, but not exceed, your existing vertical Postgres limit, then you're set. If you do exceed it, then you must scale up. Pgbouncer can help you lengthen your runway. Tuning your pgbouncer pool sizes can help minimize the overhead, but as with anything we do, it's about weighing the trade-offs for your situation.
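As a back-of-envelope for those pool sizes (an illustration with made-up numbers, not PgBouncer's tuning guide): by Little's law, the server connections you actually need is roughly query throughput times mean query duration, and everything above that is headroom.

    # Illustrative numbers only.
    qps = 20_000          # queries/second across all clients
    mean_query_s = 0.005  # avg time a query occupies a server conn

    busy_conns = qps * mean_query_s    # Little's law: L = lambda * W -> 100
    pool_size = int(busy_conns * 1.5)  # ~150, with headroom for bursts
    print(busy_conns, pool_size)

    # Thousands of client connections can then be funneled through
    # ~150 server connections in transaction mode.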


Don't spin up 50 pods. You have outscaled your database. You can't make IO operations faster by throwing more workers at them, and you can only have so many connections doing work at once. As a side note, if your application is a typical IO-bound web app, it's very unlikely you can process enough transactions to effectively use 50 workers in a single region.


We're not necessarily talking about 50 pods for the same application, could also be a zoo of smaller low traffic applications sharing a Postgres instance.


In those cases, does one keep the application-level pool in each pod in addition to the "communal" PgBouncer pool, or does that not offer any advantage?


If you limit yourself to a subset of Postgres' features, connections can become your bottleneck. I work with a production system where the major scaling constraints are 1) the VM capacity of the cloud region it runs in, and 2) available connections to Postgres.


To be clear, pgbouncer does not add connections to Postgres or remove the connection bottleneck; it's still there under the covers. If you are saturating your connections, it will not be able to improve throughput. It sounds like you need a different architecture that allows work to be queued. The approach pgbouncer takes may actually reduce performance overall, since it will intermix work on the pg instance, which, if you are already saturating the database, will slow down other operations.


Yup, one of the things we're doing is moving parts of the system away from Postgres into queues.


If you have 10 application instances each with a pool of 20 connections (half of which are idle), you have 100 active connections and 100 idle connections.

If you have a single "real" connection pool, the idle pool is effectively shared among all application instances. You can have 100 active connections and 10 (or maybe 20) idle.

I have run into this problem, but I solved it with careful tuning of automatic instance scaling parameters, connection pool size and timeout. But this only gets you so far; a single real connection pool would be more effective (at the cost of added complexity).


I think it's an issue with microservice setups. TFA suggests ~300 connections is about optimal for Postgres, and I've seen microservice setups with more than 300 processes, which means even limiting each process to a single DB connection might not be enough.

But yeah, for less distributed applications, just have N worker threads and don't close the DB connection after each job.
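A minimal sketch of that pattern, with each worker thread opening one connection up front and reusing it for every job (DSN, table and queue contents are hypothetical):

    import queue
    import threading
    import psycopg2

    jobs = queue.Queue()
    N_WORKERS = 8  # N threads == N connections, held for the process lifetime

    def worker():
        conn = psycopg2.connect("dbname=app user=app")  # opened once, not per job
        while True:
            job_id = jobs.get()
            with conn, conn.cursor() as cur:  # one transaction per job
                cur.execute("UPDATE jobs SET done = true WHERE id = %s",
                            (job_id,))
            jobs.task_done()

    for _ in range(N_WORKERS):
        threading.Thread(target=worker, daemon=True).start()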


> You can't scale an application that relies on a database heavily to this level because your fundamental constraint IS the database.

That's a big assumption. If your app is just an HTTP wrapper for SQL queries against a poorly-optimized database, sure.

But there are plenty of applications that spend the majority of their time doing other stuff (HTTP requests, FFI calls, GPU/CPU-intensive processing, etc.) where the database interaction is a small part of the overall performance profile - certainly not the bottleneck, even with hundreds of concurrent clients.

In those cases, rather than artificially throttling concurrency to stay within your global max_connections, you would ideally run as many workers as you can (i.e. fully utilize your infrastructure) and grab database connections from an external pool only when you need them.
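A sketch of that shape, with psycopg2's built-in thread pool standing in for the external pool (DSN and table are hypothetical): the expensive non-DB work happens without a connection, and a connection is borrowed only around the query itself.

    import psycopg2.pool
    import requests  # stand-in for the slow non-DB work

    # Small shared pool; workers can far outnumber these connections.
    pool = psycopg2.pool.ThreadedConnectionPool(
        minconn=2, maxconn=20, dsn="dbname=app user=app")

    def handle(url):
        body = requests.get(url).text  # slow part, no connection held
        conn = pool.getconn()          # borrow only for the DB call
        try:
            with conn, conn.cursor() as cur:
                cur.execute("INSERT INTO pages (url, body) VALUES (%s, %s)",
                            (url, body))
        finally:
            pool.putconn(conn)         # return it immediately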



