I look forward to trying this out and seeing how this architecture change works in practice. For large Postgres databases (tens of TB, in my experience), the old stats collector architecture has long been a reliable source of operational bugs with no real fix. If these changes start addressing those issues, that is a huge boon for people running large Postgres instances and will significantly improve its scalability story.
This could be a really important change, especially for people with large instances.
To give some anecdata: I have been bitten twice by going to a .0 release with Postgres, once by index corruption and once by some subqueries returning wrong results in some cases.
I have since decided to always wait for a .1 with Postgres before updating.
The good news is that the .1 comes only a few weeks after the initial release.
Now if only more people upgraded during the RC phase, then everybody could go straight to a .0 release.
And conversely, if everybody follows my (and your) advice, then .2 will be the new .1.
That was a nasty one, but my recommendation would still be to use `pg_repack` over REINDEX CONCURRENTLY: it also gets you concurrent clustering, and it was not affected by this bug.
I'm not downplaying the issue; index corruption is really bad. But I would wager that admins who need concurrent reindexing are also aware of `pg_repack` and would prefer it anyway because of the other benefits it provides.
This is probably why it took 6 months for the issue to be reported and fixed.
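For reference, this is roughly what the two approaches look like side by side; the database, table, and index names are placeholders, and the `pg_repack` flags are from memory, so check `pg_repack --help` before copying:

```python
import subprocess

import psycopg2

# Built-in approach (PostgreSQL 12+): rebuild a single index online.
# REINDEX CONCURRENTLY cannot run inside a transaction block, hence autocommit.
conn = psycopg2.connect("dbname=mydb")                  # placeholder DSN
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("REINDEX INDEX CONCURRENTLY my_index")  # placeholder index name
conn.close()

# pg_repack approach: rebuilds the whole table and its indexes online,
# which is also where the concurrent clustering mentioned above comes from.
subprocess.run(["pg_repack", "--dbname=mydb", "--table=my_table"], check=True)
```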
pg_repack has some significant downsides in its implementation; I question whether it’s really a default over re-indexing concurrently. I’ve certainly not gotten that impression, and we maintain many very large Postgres clusters.
It was in the 2012/2013 time frame, so I can't find the relevant release note anymore, but it was reliably returning wrong results for a specific subquery pattern.
Not all subqueries were broken, but the broken pattern returned wrong results 100% of the time.
Index corruption shows the same symptoms, though of course the cause is different.
Yes, PostgreSQL needs more tests, but no, it is not really a coverage issue. Most of the serious bugs have been related to concurrency or other things which simple test coverage cannot find. Finding these bugs is usually not trivial.
That is exactly the kind of tool PostgreSQL needs more of. Some such tools exist, but more are needed. More plain old test coverage will not help much, if at all.
Major version jumps are always fun. Recently I discovered that RDS recommends pglogical over built-in logical replication for a reason: the latter doesn't work well in RDS when the replica logs grow larger than RAM.
I wasn't using Aurora at the time, and I don't expect non-Aurora RDS to be exactly the same as vanilla Postgres either. Still, it was surprising that the older solution (pglogical) was supported while the newer one wasn't yet, at least for my version jump.
So I was using their guide for minimal-downtime upgrades [0] (see option 'D'), replicating Pg 11 to 14. Built-in logical replication appeared to work in that case, yet big tables would never catch up or would appear to get truncated. If you're replicating among nodes at the same Pg version, built-in replication may work for you.
Once I switched to pglogical, jumping major versions worked as intended. It's possible that in the rush I used the wrong setting somewhere and built-in logical replication can work between major versions, but I lost enough time experimenting that I'll keep using pglogical until AWS officially recommends something else.
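For anyone weighing the two, the built-in publication/subscription setup looks roughly like this (hosts, names, and credentials are placeholders, and I'm driving it through psycopg2 just for illustration); pglogical uses its own `pglogical.create_node` / `pglogical.create_subscription` functions instead:

```python
import psycopg2

# On the source cluster: publish the tables to replicate.
src = psycopg2.connect("host=old-primary dbname=app")   # placeholder DSN
src.autocommit = True
with src.cursor() as cur:
    cur.execute("CREATE PUBLICATION upgrade_pub FOR ALL TABLES")
src.close()

# On the target cluster (the schema must already exist there): the
# subscription copies the initial data and then streams changes.
# CREATE SUBSCRIPTION cannot run inside a transaction block, hence autocommit.
dst = psycopg2.connect("host=new-primary dbname=app")   # placeholder DSN
dst.autocommit = True
with dst.cursor() as cur:
    cur.execute(
        "CREATE SUBSCRIPTION upgrade_sub "
        "CONNECTION 'host=old-primary dbname=app user=repl' "  # placeholder
        "PUBLICATION upgrade_pub"
    )
dst.close()
```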
We don’t have huge constant loads. Over-provisioned on massive bare metal so we can take a lot.
We also take a daily pg_basebackup, which does not eat too many resources; a standard pg_dump is impossible for us.
I'm about to add log shipping as well when I upgrade. We don't do super-critical things like banking or anything like that, but I do want to get to the point where, at worst, we only lose a few minutes of data.
Right now we can lose up to 24 hours of data if all the replicas die at once.
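For context, the daily backup boils down to a scheduled `pg_basebackup` call along these lines; the host, role, and backup path are placeholders:

```python
import datetime
import subprocess

# Hypothetical nightly job: take a compressed, tar-format base backup and
# stream the WAL needed to make it consistent on restore.
target = f"/backups/base-{datetime.date.today():%Y%m%d}"  # placeholder path
subprocess.run(
    [
        "pg_basebackup",
        "-h", "db-primary.internal",  # placeholder host
        "-U", "replicator",           # placeholder replication role
        "-D", target,
        "-F", "tar",                  # tar-format output
        "-z",                         # gzip compression
        "-X", "stream",               # stream WAL alongside the backup
        "-P",                         # progress reporting
    ],
    check=True,
)
```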
So that's 3 nodes total, one permanently the master and the other two permanently read-only slaves? Is there any kind of automatic failover if the master goes down, where one of the slaves automatically promotes itself?
Not necessary. As I said, we are not at all a critical service. If we are down an hour, it's not a big deal (like 95% of the rest of the inter-webs). We are not banksters.
If it goes down, we get a page, and we can have it switched by hand in a few minutes.
I guess if our service were super-critical I would do that, but since it's not, the three of us who work as developers and sysadmins can deal with it very quickly.
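For the curious, the promotion half of that manual switch can be as small as the sketch below (PostgreSQL 12+; the host is a placeholder), with the application repointed at the new primary afterwards:

```python
import psycopg2

# Connect to the replica that should take over and promote it to primary.
# pg_promote() waits up to 60 seconds by default for the promotion to finish.
conn = psycopg2.connect("host=replica-1.internal dbname=postgres")  # placeholder
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("SELECT pg_promote()")
conn.close()
```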
I'm sure Postgres is full of these inefficiencies and suboptimal system designs. The process model is known to be pretty horrible.
Considering the huge engineering teams SQL Server and Oracle have, I'm always amazed at how well Postgres works despite its tiny number of full-time developers.
Oracle is 25 million lines of code vs. PostgreSQL's 1.3 million.
Still, Oracle doesn't have all the basic isolation levels; only Read Committed works properly. And DDL operations are still not transactional. We should ask which design is really the inefficient one. The process model may not be as "efficient" as the thread model, but it is more "stable" and more "secure", and the benchmark results are not bad either.
Oracle also used a multi-process model on Linux in the past. It looks like the multi-threaded model became an optional mode at some point around Oracle 12.
To get around the inefficiencies of spawning many processes, I put pgbouncer in front of PostgreSQL.
Postgres forks off a new process for each connection.
This introduces a lot of overhead across the board:
* Cross-process communication is a lot more expensive (it's done via shared memory or, as here, via the file system).
* Switching between processes is a lot more expensive because each process has its own address space, so a switch flushes the TLB, and there is more bookkeeping for the OS. This is especially bad for a DB, which usually spends most of its time waiting on IO and therefore switches execution context all the time.
* Each process also has a distinct set of file descriptors, so those need to be cloned as well.
* A DB needs lots of locks, and cross-process locks are more expensive.
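From the application side, a minimal sketch of that setup, assuming pgbouncer is listening on its default port 6432 (the DSN and names are placeholders):

```python
import psycopg2

# The app connects to pgbouncer (port 6432 by default) instead of PostgreSQL
# directly (port 5432). pgbouncer hands queries to a small, pre-established
# pool of server connections, so PostgreSQL does not have to fork a new
# backend process for every client connection.
conn = psycopg2.connect("host=127.0.0.1 port=6432 dbname=app user=app")
with conn.cursor() as cur:
    cur.execute("SELECT 1")
    print(cur.fetchone())
conn.close()
```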
The biggest one I'm aware of is that connections aren't threaded, and Pg preallocates resources at connection time. Together these make connections expensive compared to MySQL's threaded connections. Many folks run a connection pooler in front of Pg for this reason.
One pro of the non-threaded model is simplicity: there's no need for thread safety everywhere.