PostgreSQL 15: Stats Collector Gone? What’s New? (percona.com)
153 points by _bohm on Aug 28, 2022 | 39 comments



I look forward to trying this out and seeing how this architecture change works in practice. For large Postgres databases (tens of TB in my experience), the old stats collector architecture has long been a reliable source of operational bugs with no real fixes. If those issues start being addressed by these changes, that is a huge boon for people with large Postgres instances and will significantly improve its scalability story.

This could be a really important change, especially for people with large instances.
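For the curious, the visible knob on the SQL side (if I'm reading the PG 15 docs right) is the new stats_fetch_consistency setting, since the stats now live in shared memory instead of going through the collector's files. A rough sketch of poking at it; the monitoring views themselves are queried the same way as before:

    -- PostgreSQL 15: control how a session reads the shared-memory stats
    -- ('cache' is the default; 'snapshot' gives a consistent view per transaction).
    SET stats_fetch_consistency = 'snapshot';

    -- Monitoring queries are unchanged.
    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10;

    -- Flush this backend's pending stats immediately instead of waiting
    -- for the normal reporting interval.
    SELECT pg_stat_force_next_flush();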


We are running pg11 for fairly big databases (a data directory of almost a terabyte).

I was waiting to upgrade to pg12 then 13 then 14.

With this change I seriously think I’ll just upgrade to 15 at the end of the year.


Do you really want to upgrade to a .0 release?

Now is the perfect time to update to pg14 because it’s on rev 5 (14.5).


To give some anecdata: I have been bitten twice by going to a .0 release with Postgres, once by index corruption and once by some subqueries returning wrong results in some cases.

I have since decided to always wait for a .1 with Postgres before updating.

The good news is that this is only a few weeks past the initial release.

Now, if only more people upgraded during the RC phase, then everybody could go to a .0 release.

And conversely, if everybody follows my (and your) advice, then .2 will be the new .1.

It’s never easy


Depends on how much you trust the quality of those releases. `.1` is usually good enough for PostgreSQL, except for the following issue.

14.4 was released with a fix for silent index corruption when using the CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY commands.

https://www.postgresql.org/about/news/postgresql-144-release...
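If you suspect an index built concurrently on an affected 14.x, one way to verify and rebuild it looks roughly like this (a sketch; amcheck is the contrib extension, and my_index is just a placeholder name):

    -- Verify a btree index; the second argument (heapallindexed = true) also
    -- checks that every heap tuple has a matching index entry.
    CREATE EXTENSION IF NOT EXISTS amcheck;
    SELECT bt_index_check('my_index'::regclass, true);

    -- Rebuild without holding an exclusive lock for the whole run.
    REINDEX INDEX CONCURRENTLY my_index;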


That was a nasty one, but then again, my recommendation would still be to use `pg_repack` over REINDEX CONCURRENTLY, because it also gets you concurrent clustering, and it was not affected by this bug.

I'm not downplaying the issue, and index corruption is really bad, but I would wager a guess that admins who do need concurrent reindexing would also be aware of `pg_repack` and would prefer it anyway because of the other benefits it provides.

This is probably why it took 6 months for the issue to be reported and fixed.


What advantage does pg_repack have when you only rebuild an index? Or do you mean it has advantage when pg_repack is run on the entire table?


pg_repack has some significant downsides in its implementation; I question whether it’s really a default over re-indexing concurrently. I’ve certainly not gotten that impression, and we maintain many very large Postgres clusters.


Interesting! Were the queries reliably returning wrong results?


It was in the 2012/2013 time frame, so I can't find the relevant release note any more, but it was reliably returning wrong results for a specific subquery pattern.

Not all of them were broken, but the broken one was returning wrong results 100% of the time.

Index corruption shows the same symptoms but, of course, has a different cause.


It sounds like Postgres's test coverage could be better ...


Yes, PostgreSQL needs more tests, but no, it is not really a coverage issue. Most of the serious bugs have been related to concurrency or other things which simple test coverage can find. Finding these bugs is usually not trivial.


This is something FoundationDB does pretty well. They built a simulator that tests such things. Doubt you could port it to Postgres easily, though.


That is exactly the kind of tool PostgreSQL needs more of. There are some tools, but more are needed. More plain old test coverage will not help much, if at all.


You meant "cannot" right:

> which simple test coverage cannot find.


Major version jumps are always fun. Recently I discovered that RDS recommends pglogical over built-in replication for a reason: the latter doesn't work well in RDS with larger-than-RAM replica logs.


Aurora doesn't work with a query that computes a temporary table larger than memory either. Amazon's Postgres things are not Postgres.


I wasn't using Aurora at the time. And I don't expect non-Aurora RDS to be exactly the same as vanilla Postgres either. Still, it was surprising that the older solution (pglogical) was supported while the newer built-in one wasn't yet, at least for my version jump.


I have never used RDS.

Anyway here is a post for using built-in logical replication.

https://dev.to/pikachuexe/postgresql-logical-replication-for...
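The built-in flavour boils down to a publication on the source and a subscription on the target, roughly like this (a sketch; the names and connection string are placeholders, wal_level = logical is required on the publisher, and the schema has to be copied over first, e.g. with pg_dump --schema-only):

    -- On the source (publisher):
    CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

    -- On the target (subscriber), after loading the schema:
    CREATE SUBSCRIPTION upgrade_sub
        CONNECTION 'host=old-primary dbname=mydb user=replicator'
        PUBLICATION upgrade_pub;

    -- Watch it catch up on the subscriber.
    SELECT * FROM pg_stat_subscription;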


Posts like this didn't help in my case. Though depending on versions being replicated and RDS settings it could work.


Yeah, there are many managed services, and limitations around those.

Only throwing this out as a reference.


Hey, we use logical replication on RDS. Never considered pglogical. Do you have a link to Amazon’s recommendation of pglogical?


So I was using their guide for minimal-downtime upgrades [0], see option 'D', and I was replicating Pg 11 to 14. Built-in logical replication appeared to work in that case, yet big tables would never catch up, or would appear to get truncated. If you're replicating among nodes at the same Pg version, then built-in replication may work for you.

Once I switched to pglogical, jumping major versions worked as intended. Now, it's possible that in the rush I used the wrong setting somewhere and built-in logical replication can work between major versions. But I lost enough time experimenting that I'll keep using pglogical until AWS officially recommends something else.

[0] https://aws.amazon.com/blogs/database/part-1-upgrade-your-am...
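For comparison, pglogical drives everything through SQL functions rather than DDL. Roughly, from memory and hedged accordingly (parameter names can differ slightly between pglogical versions; the DSNs are placeholders):

    -- On the provider (old major version):
    CREATE EXTENSION pglogical;
    SELECT pglogical.create_node(
        node_name := 'provider',
        dsn := 'host=old-primary dbname=mydb user=replicator');
    SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);

    -- On the subscriber (new major version), after copying the schema:
    CREATE EXTENSION pglogical;
    SELECT pglogical.create_node(
        node_name := 'subscriber',
        dsn := 'host=new-primary dbname=mydb user=replicator');
    SELECT pglogical.create_subscription(
        subscription_name := 'upgrade_sub',
        provider_dsn := 'host=old-primary dbname=mydb user=replicator');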


Thank you!


What are you doing for replication, just curious?

Reads are easy with replicas, right? What are you doing to handle "writes" across all your apps/regions/etc.?


Just standard replication on 2 other nodes.

We don’t have huge constant loads. Over-provisioned on massive bare metal so we can take a lot.

Also a daily pg_basebackup, which does not eat too many resources. A standard pg_dump is impossible.

About to also add log shipping when I upgrade. We don't do super critical things like banking or anything like that, but I do want to get to the point where at worst we only lose a few minutes of data.

Now we can lose up to 24 hours if all the replicas die at once.
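In case it helps anyone planning the same move: the log-shipping part is mostly just enabling WAL archiving, something along these lines (a sketch; the cp archive_command is a placeholder, in practice you'd use a proper tool like pgBackRest or WAL-G):

    -- Enable WAL archiving (archive_mode needs a server restart).
    ALTER SYSTEM SET archive_mode = 'on';
    ALTER SYSTEM SET archive_command = 'cp %p /mnt/wal_archive/%f';
    -- Bound the data-loss window by forcing a WAL segment switch regularly.
    ALTER SYSTEM SET archive_timeout = '300s';
    SELECT pg_reload_conf();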


So that's 3 nodes total, one permanently master and the other two permanently read-only slaves? Is there any kind of "automatic rollover" if the master goes down where one of the slaves automatically promotes itself?


Not necessary. As I said, we are not at all a critical service. If we are down an hour, it's not a big deal (like 95% of the rest of the inter-webs). We are not banksters.

If it goes down, we get a page, and we can have it switched by hand in a few minutes.

I guess if our service was super-critical, I would do that, but since it's not, the three of us that work as developers and sys-admins can deal with it very quickly.


I'm sure Postgres is full of these inefficiencies and suboptimal system designs. The process model is known to be pretty horrible.

Considering the huge engineering teams SQL Server and Oracle have, I'm always amazed at how well Postgres works, despite the tiny number of full-time developers.


Oracle is 25 million lines of code vs. PostgreSQL's 1.3 million, yet Oracle still doesn't have all the basic isolation levels; only Read Committed works perfectly. And DDL operations are still not transactional. We should ask which is really the inefficient design. The process model may not be as "efficient" as a thread model, but it is more "stable" and more "secure". Benchmark results are not bad either.


Oracle, in the past, also used a multi-process model on Linux. It looks like the multi-threaded model became an option at some point around Oracle 12.

To get around the inefficiencies of spawning many processes, I put pgbouncer in front of PostgreSQL.


> The process model is known to be pretty horrible.

isn't this still processes, just with shared memory?


Any recommended resources explaining the pitfalls of multiprocess vs multithreaded?


Postgres forks off a new process for each connection.

This introduces lots of overhead across the board:

* Cross-process communication is a lot more expensive (done via shared memory or, as here, via the file system).

* Switching between processes is a lot more expensive because each process has its own memory space, so a switch flushes the TLB. There is also more bookkeeping for the OS.

This is especially bad for a DB, which usually spends most of its time waiting for IO and so switches execution context all the time.

* Each process also has a distinct set of file descriptors, so those need to be cloned as well.

* A DB needs lots of locks, and cross-process locks are more expensive.

* ...

These things add up.


The biggest one I'm aware of is that connections aren't threaded, and Pg tries to preallocate resources at connection time. Together these make connections expensive compared to MySQL's threaded connections. Many folks run a connection pooler in front of Pg for this reason.

One pro of non-threaded is simplicity and no need for thread safety everywhere.
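You can see the process-per-connection model directly from SQL; every row in pg_stat_activity for a client backend is a separate OS process (sketch):

    -- pid is the operating-system process id of that connection's backend.
    SELECT pid, state, backend_start, query
    FROM pg_stat_activity
    WHERE backend_type = 'client backend';

    -- Compare the number of rows against the configured hard limit.
    SHOW max_connections;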


This makes me want to profile Postgres if they have this kind of thing in version 14.


Weird to see this kind of problem in a database that used to be considered the best.


I think if you were to dive into the mailing lists of other databases/software you'd realize that every piece of software has its lingering issues.


Every piece of software has imperfections; that's why new versions come with improvements. There is no end to it.



