PostgreSQL 14 (postgresql.org)
950 points by jkatz05 on Sept 30, 2021 | 286 comments



These changes look fantastic.

If I may hijack the thread with some more general complaints though, I wish the Postgres team would someday prioritize migration. Like make it easier to make all kinds of DB changes on a live DB, make it easier to upgrade between postgres versions with zero (or low) downtime, etc etc.

Warnings when the migration you're about to do is likely to take ages because for some reason it's going to lock the entire table, instant column aliases to make renames easier, instant column aliases with runtime typecasts to make type migrations easier, etc etc etc. All this stuff is currently extremely painful for, afaict, no good reason (other than "nobody coded it", which is of course a great reason in OSS land).

I feel like there's a certain level of stockholm syndrome in the sense that to PG experts, these things aren't that painful anymore because they know all the pitfalls and gotchas and it's part of why they're such valued engineers.


We currently use MongoDB and while Postgres is attractive for so many reasons, even with Amazon Aurora's Postgres we still need legacy "database maintenance windows" in order to achieve major version upgrades.

With MongoDB, you're guaranteed single-prior-version replication compatibility within a cluster. This means you spin up an instance with the updated version of MongoDB, and it catches up to the cluster. Zero downtime, seamless transition. There may be fewer than a handful of cancelled queries (which are retryable), but no loss of writes thanks to retryable writes and write concern preferences. e.g. MongoDB 3.6 can be upgraded to MongoDB 4.0 without downtime.

Edit: Possibly misinformed but the last deep dive we did indicated there was not a way to use logical replication for seamless upgrades. Will have to research.


The problem with MongoDB though is that you're on MongoDB


As someone who mostly shared that opinion for the last decade or more, I recently set up a cluster for work, and everything seems much more production quality than I remember or assumed it would be. I'm not the one using it for queries every day, but I did do a bunch of testing for replication and failed nodes to confirm that I understood (and could rely on) the claims of robustness, and it seemed stable, with good documentation of what to expect in different scenarios and how to configure it (which is not what I experienced doing the same testing back in 2010-2011).

All in all, my impression of MongoDB now is that they're one of those "fake it till you make it" success stories, where they leveraged their popularity into enough momentum to fix most of their major problems.


One thing that turned me away from MongoDB was their utter lack of care for your data integrity that they displayed for years. Some of those instances were even documented. Then there were some bad defaults - some could _also_ cause data loss.

For any component that's viewed as a database (as opposed to, say, cache), data integrity is one of the most important metrics (if not THE most).

In contrast, PostgreSQL data loss bugs are rare - and are treated extremely seriously. Defaults are sane and won't lose data. It's one of the few databases I'm pretty confident that data will be there even if you yank a server power cord mid writes.

Has MongoDB improved? Yes, leaps and bounds (it seems to still fail Jepsen tests, though). But I can't help but feel that it should have been released as a beta product, instead of claiming it was production ready. It wasn't. Maybe it is now. I'd still evaluate other alternatives before considering it.

That said, one thing that always amuses me is how MongoDB gets mentioned in the same context as PostgreSQL. If PostgreSQL would meet your needs, it's unlikely that MongoDB would. And vice versa (but maybe something else like Cassandra would).


Postgres with tables that are just an ID and a JSONB column nowadays give you practically everything you'd want out of MongoDB.

You can add deep and customized indices as desired, you can easily shard with Citus, and if you want to live without transactions you'll see equally good if not better performance - with the option to add ACID whenever you want. The developer experience argument, that the ->> operator was more confusing than brackets, is now moot with the new subscript syntax.
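A minimal sketch of that pattern (table, column, and key names here are made up for illustration):

    -- An ID plus a JSONB payload
    CREATE TABLE documents (
        id  bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        doc jsonb NOT NULL
    );

    -- A GIN index covers containment queries over the whole document
    CREATE INDEX documents_doc_idx ON documents USING GIN (doc);

    -- Served by the GIN index
    SELECT id FROM documents WHERE doc @> '{"status": "active"}';

    -- PostgreSQL 14 subscript syntax next to the older ->> operator
    SELECT doc['customer']['name'], doc->>'status' FROM documents;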

As a former MongoDB user, there were good synergies between MongoDB and Meteor back in the day, and I loved that tech, but between Materialize and Supabase, you have vastly more options for realtime systems in the Postgres ecosystem.


Although MongoDB claims in an undated article entitled "MongoDB and Jepsen"[65] that their database passed Distributed Systems Safety Research company Jepsen's tests, which it called “the industry’s toughest data safety, correctness, and consistency Tests”, Jepsen published an article in May 2020 stating that MongoDB 3.6.4 had in fact failed their tests, and that the newer MongoDB 4.2.6 has more problems including “retrocausal transactions” where a transaction reverses order so that a read can see the result of a future write.[66][67] Jepsen noted in their report that MongoDB omitted any mention of these findings on MongoDB's "MongoDB and Jepsen" page.

from https://en.wikipedia.org/wiki/MongoDB#Bug_reports_and_critic...


Those defaults were changed a decade ago and were never an issue if you used a driver, e.g. the Python one.

And the Jepsen tests are part of the core test suite, so do you have any evidence they are still failing?

It’s so ridiculous and pointless to be rehashing the same issues a decade later.

Actually more a testament to the company that it’s still hugely successful and depended on by some very large applications.


Perhaps, but mongodb was responsible for something I have bookmarked as "the worst line of code ever".

Which decided whether or not to log connection warnings based on Math.random()

https://github.com/mongodb/mongo-java-driver/blob/1d2e6faa80...


a) This is a line of code from 2013 and was fixed weeks after.

b) Based on the JIRA [1] it was designed to only log 10% of subsequent failures where there is no connection to prevent log flooding. You would still get the initial failure message.

Pretty reasonable technique and hardly the worst code ever.

[1] https://jira.mongodb.org/browse/JAVA-836



Jepsen test suite completely tore them a new one. I don't trust any allegedly distributed database that gets excoriated that badly by Aphyr.

https://jepsen.io/analyses/mongodb-4.2.6

That's just a bit more than a year ago. Come on.

MongoDB is like Mysqldb. I am so so so tired of hearing "that's been fixed, it's great now", doing a paper-thin dive into things, and seeing there are massive problems still.

I used MongoDB with Spring Data, it is impressively seamless.

It's just that there are way too many people who have sold snake oil for a decade-plus now, and I don't trust what they say anymore, and won't for a long long time.


Even worse, MongoDB lied about having fixed these bugs.

https://en.wikipedia.org/wiki/MongoDB#Bug_reports_and_critic...


Let's be clear, I definitely don't think it's great. It's just that my immediate response prior to six months ago was to laugh at the mere suggestion it be put into production.

The only reason it actually was put into production is because we had a vendor requirement on it (and why they thought it was sufficient, I'm not sure).

There's a difference between "not suitable for anything because it's so buggy and there's been so many problems over the years" and "not suitable as a replacement for a real RDBMS for important data". For the former, I think my opinion was possibly a little harsh for the current state of it. For the latter, yeah, I'm not going to blindly trust it for billing data and processing yet, that's for sure.


So did you do app-level code to verify writes? Double checking, etc?


I wrote a few small test programs to run doing continuous inserts to the master, and tested shutting down, firewalling off, and killing the process of different members of the cluster and how it recovered and if data loss was experienced by comparing data sets.

It was sufficient for me to not feel like we were taking on undue risk by using it, and since our use case is not one where we're in major trouble if a problem does come about (restoring from daily backups should be sufficient) and we're not doing anything transactional, that's good enough. As I mentioned earlier, it was a vendor requirement, so we just wanted to make sure it wasn't something that was problematic enough to make us question the vendor's decision making.


>All in all, my impression of MongoDB now is that they're one of those "fake it till you make it" success stories, where they leveraged their popularity into enough momentum to fix most of their major problems.

The downside being that their reputation is now somewhat charred.


> All in all, my impression of MongoDB now is that they're one of those "fake it till you make it" success stories, where they leveraged their popularity into enough momentum to fix most of their major problems.

That's not all bad. The same could be said of MySQL. Both DBMS prioritized ease of use over data integrity in the early going.


And yet PostgreSQL making the exact opposite choice has really paid off in the longer run. People used to dismiss it as simply a toy for academics to play with, and look where the project is today. It can easily surpass most NoSQL databases on their home turf.


To be fair, PostgreSQL 15 years ago also had a lot of problems storing data reliably. Some of them manifested as performance issues. I also heard a fair number of war stories about corruption with "large" databases (e.g., 1TB+). PG replication lagged MySQL for many years as well. These seem to be non-issues today.

At this point there's effectively no difference in the durability of data stored in MySQL or PostgreSQL, so it's hard to argue that one or the other made a better choice. They just got there by different paths.

In fact, PostgreSQL is winning back share in part because of licensing. GPLv2 is limiting for a lot of applications, and there continue to be concerns about Oracle ownership. It's also absorbed a lot of features from other databases like JSON support. That's not special to PostgreSQL though. It's been a trend since the beginning for SQL RDBMS and explains why they have stayed on top of the OLTP market for decades.


And how exactly is that a problem?


Some things seem to have changed from 2018, but MongoDB was by far the worst database I ever had the displeasure of using (and Amazon DocumentDB was even worse).

https://jepsen.io/analyses/mongodb-3-6-4

https://jepsen.io/analyses/mongodb-4.2.6


Posting old Jepsen analyses is like pointing at old bug reports. Every time Jepsen finds a bug we fix it lickety-split. I know it's not cool to focus on that fact, but it is a fact. The Jepsen tests are part of the MongoDB test suite, so when we fix those problems they stay fixed.

I would love to hear your personal experience of MongoDB as opposed to reposting old Jepsen reports. Perhaps there is something that we can address in 5.1 that is still a problem?

(I work in developer relations at MongoDB)


The latest "old Jepsen report" is barely a year old. It's not like digging up dirt from years ago.

It also seems like there was quite a lot wrong even a year ago, quoting from there:

> Roughly 10% of transactions exhibited anomalies during normal operation, without faults.

It's just not a very reassuring response to say "when someone goes to dig a bit and finds a lot of show-stopping bugs, we address those specific bugs quickly".

To me it sounds like the architecture and care just isn't there for a robust data storage layer?


Something that was drilled into me decades ago is that there is no such thing as fixing multi-threaded (or distributed) code via debugging or patching it "until it works".

You either mathematically prove that it is correct, or it is wrong for certain.

This sounds like an oddly strong statement to say, but the guy who wrote the textbook that contained that statement went on to dig up trivial looking examples from other textbooks that were subtly wrong. His more qualified statement is that if a professor writing simplified cases in textbooks can't get it right, then the overworked developer under time pressure writing something very complex has effectively zero chance.

The MongoDB guys just don't understand this. They're convinced that if they plug just one more hole in the wire mesh, then it'll be good enough for their submarine.

PS: The professor I was referring to is Doug Lea, who wrote the "EDU.oswego.cs.dl.util.concurrent" library for Java. This was then used as the basis for the official "java.util.concurrent".


If you use MongoDB as a document store, arguably its core functionality, you're not exposed to any of the shortcomings Jepsen rightly identified and exploited.

Transactions are new to MongoDB and they are not necessary for most. Structure your data model so you only perform single-document atomic transactions ($inc, $push, $pull) rather than making use of multi-document ACID transactions. It's possible, we're doing it for our ERP.

Sharding is something we've intentionally avoided opting for application-layer regional clusters. We specifically were avoiding other complexities related to shards that are not a concern for replica sets. Durability and maximum recovery time during emergency maintenance caused us to avoid them.


Where are the latest Jepsen test results published?


That was sarcasm. But yeah, you can search for MongoDB and you'll come across many many posts criticizing it.

It can be said MongoDB is hated as much as Postgres is loved.

Personally I have no opinion about mongodb.


Yes, and most of these love/hate memes are blown out of proportion by people who don't actually have any real expertise in those technologies, but just parrot whatever they've read in some memes.


You're exactly correct. Tons of "XYZ is bad" because of some meme that they don't even understand or have context on that hasn't been relevant for years.

I have no idea if MongoDB is good or bad at this point, but the comments of "haha it's mongo" are completely devoid of meaningful content and should be flagged.


I was part of a team that operated a large Mongo cluster for most of the last decade. I would not have advised anyone to use Mongo as their durable source of truth database then, and I still don't think it's advisable to do so now. On numerous occasions, Mongo demonstrated the consequences of poor engineering judgment and an addled approach to logic in critical components responsible for data integrity. In addition, Mongo internalized many poor patterns with respect to performance and change management. Mongo did not, and does not provide the data integrity or performance guarantees that other databases internalize by design (the WiredTiger transition helped, but did not cure many of the issues).

PostgreSQL introduced JSONB GIN index support sometime around 2015, making Postgres a better fit for most JSON-based applications than Mongo.


My issue isn't with people not liking Mongo. It's with contentless meme posts. Your post has real information that adds value to the conversation, and I appreciate that you took the time to write it out.


Because it's (or at least it definitely WAS) true.

There are valid use-cases for mongo, but for the vast majority of things you're better off starting with postgres. And I say that as an early adopter - I really wanted mongo to succeed but it just failed all of my expectations. All of them.

BTW: this is a post about postgres.


You can find posts criticising every database.

Most of the ones for MongoDB are from a decade ago and not at all relevant today.


It's easier to ask how that is NOT a problem, because that list will be much, much shorter.


> Edit: Possibly misinformed but the last deep dive we did indicated there was not a way to use logical replication for seamless upgrades. Will have to research.

It is possible since PG10

https://severalnines.com/database-blog/how-upgrade-postgresq...
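For reference, a rough sketch of the built-in mechanism (host and object names are placeholders, and the target cluster needs the schema restored separately, e.g. with pg_dump --schema-only, since logical replication copies rows, not DDL):

    -- On the old cluster (needs wal_level = logical)
    CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

    -- On the new major-version cluster
    CREATE SUBSCRIPTION upgrade_sub
        CONNECTION 'host=old-db dbname=app user=replicator'
        PUBLICATION upgrade_pub;

    -- Once the subscriber has caught up, stop writes briefly and cut over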


We seem to have been misled by all of the Amazon RDS and Aurora documentation. It seems Amazon prefers to implement postgres logical replication through their Database Migration Service. All upgrades are typically done through pg_upgrade, which does require downtime.

Interesting. I can't wait to see how PG12 influences future offerings from the cloud providers for more seamless major version upgrades.


MongoDB and Postgres are like apples and oranges tho.

I'm not gonna choose MongoDB if I need a relational model… even if it offers zero downtime upgrades out-of-the-box.


You might choose Postgres with JSON as an alternative to MongoDB though. There are plenty of people pushing the limits of MongoDB who are researching it if not just for access to a larger pool of DBAs who can work for them.


Logical replication across major releases for seamless upgrades has been supported and documented since pgSQL 10.


Migrations are quite highly prioritized in PostgreSQL. PostgreSQL has by far the best migration support of any database I have worked with. There is of course a lot of work left to do, but that is true for many other areas in PostgreSQL.

Also I can't agree at all with "nobody coded it"; patches in this area are generally welcome. My first real patch for PostgreSQL was about improving migrations. And there has been talk about several of the ideas you propose, but nobody is working on any of them right now. So I mostly think it is a lack of resources in general.


Sorry, I didn't mean offense. What I meant with "nobody coded it" is that the migration DX features that don't exist yet, likely don't exist simply because they haven't been made yet (and not because eg they're architecturally impossible or because the postgres team are stupid or sth).

It's hard to complain about OSS without attacking the creators; I tried to do that right but clearly I failed nevertheless :-) Thanks for your contributions!


[flagged]


What does SJP mean?


"Social Justice" something, probably, like in "SJW" = "Social Justice Warrior".

Some people see politeness or compassion as weakness, and therefore use terms acknowledging it as pejoratives. The funny thing is, they think this makes objects of derision of their targets, and don't realise that it's themselves they end up deriding.


Ah, I suppose that makes sense. It's weird to me how 'Social Justice Warrior' is derogatory, because it puts you against people fighting for justice. I get the mind games (I think) you have to play to arrive at the conclusion that SJW is a good insult, but it's got this "are we the baddies?" energy, to me.


DDL migrations are amazing. Migrations between different postgres versions not so much.


I love postgresql, as long as I don’t have to do upgrade. I have yet to see a successful zero downtime upgrade. That being said, other databases aren’t that much better. Maybe except SQLite.


> far best migration support of any database I have worked with

BS, have you never worked with mysql or sqlite?


MySQL migrations are beyond a joke.

It’s absolutely impossible that you’re not trolling with such a statement.


Copy/hardlink the data files and start the new engine, instead of having to make a dump and import it?

It's a joke that you have to make an offline dump and import for upgrades with pgsql.



Works one out of three without hiccups.


Sounds like a fun bug then.

I’ve never had it fail for me. :/


They are slowly getting there. For example, postgres 12 added REINDEX CONCURRENTLY. Under the hood it's just recreating the index and then name swapping and dropping the old one. Basically what pg_repack was doing.

There's a huge collection of tricks out there that just need to become formal features. The trick I'm working with today is adding a check constraint with NOT VALID and then immediately calling VALIDATE, because otherwise it takes a very aggressive lock that blocks writes. That could easily become ALTER TABLE CONCURRENTLY or something.
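For anyone who hasn't seen the trick, it looks roughly like this (table and constraint names are made up):

    -- Adding the constraint as NOT VALID skips the full-table scan,
    -- so the ALTER TABLE lock is only held briefly
    ALTER TABLE orders
        ADD CONSTRAINT orders_amount_positive CHECK (amount > 0) NOT VALID;

    -- Validation scans the table but only takes a SHARE UPDATE EXCLUSIVE
    -- lock, so normal reads and writes keep flowing
    ALTER TABLE orders VALIDATE CONSTRAINT orders_amount_positive;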


Do you know a good resource with these tricks? I often struggle to predict exactly which schema migrations will lock tables aggressively and what the smartest workaround is.


We've had good success with https://github.com/sbdchd/squawk to lint migrations. It tells you if a query is going to lock your table (as long as it's written in SQL, not some ORM DSL)


An orthogonal migration issue which I'm hitting right now: we need to migrate from heroku postgres to aws rds postgres, and I'm stressed about the risk and potential downtime in doing so. If there was a way to make a replica in rds based on heroku, promote the rds replica to be the primary, hard switch our apps over to rds, that'd be a lifesaver.

I'm working through this blog post [1] now, but there is still a bit to be defined (including a dependency on heroku's support team) to get this rolling.

Why is the migration required? Heroku postgres doesn't support logical replication, and logical replication is required for any ELT vendor (Fivetran, Stitch, Airbyte) to use Change Data Capture to replicate data from postgres to snowflake (while replicating deleted rows efficiently).

Note: I've also read this ebook [2], but this approach requires downtime.

Note 2: I reached out to heroku support and asked if logical replication was on their short term roadmap. They said they've heard this quite a bit, but nothing tangible is on the roadmap.

If anyone has any thoughts on the above migration, I'd be all ears. :)

1) https://vericred.com/how-we-migrated-a-1tb-database-from-her...

2) https://pawelurbanek.com/heroku-migrate-postgres-rds


I did that exact migration. Unfortunately, to my knowledge, there's no way to do it with zero downtime. You need to make your app read only until the RDS instance has ingested your data, then you can cut over. For me, that was roughly one gigabyte of data and took about forty seconds.

My best advice is to automate the whole thing. You can automate it with the Heroku and AWS CLIs. Test on your staging site until you can run through the whole process end to end a few times with no interruptions.


Yep, absolutely garbage that these clouds (Azure is another one) don't allow you to replicate with external systems. Pretty much devalues their entire hosted postgresql offering if you ask me, since it's just designed to keep you locked in (duh).

If you have any significant amount of data where you're worried about a migration, stay far away from hosted postgres offerings. You'll never get your data out without significant downtime.


There are other ways to handle this at the application level, to be clear, using dual read & write and backfill. More relevant when you have TB+++ of data.


Interesting. I've done dual-writes at the application level to migrate the datastore for a smaller feature (branch by abstraction), but never for an entire application. And the code path was quite simple, so it was easy to think about all of the edge cases at one time in your head.

Do you have any resources which talk through the read/write/backfill approach?

Here's what I found so far:

* https://medium.com/google-cloud/online-database-migration-by...

* https://aws.amazon.com/blogs/architecture/middleware-assiste...


Jumping in again... Your post reminded me that I actually typed my migration up!

https://mattbasta.medium.com/migrating-from-heroku-to-aws-6d...

Hopefully it's somewhat helpful!


Thank you for this - extremely helpful in validating the current approach and de-risking the developer time.


So, basically, Postgres would have a replication port which can be used for both replication/clustering and transfer across cloud providers. And sharding. </dreaming>


I mean, it essentially does. Heroku's managed postgres has it disabled.


We've moved a number of customers from Heroku over to Crunchy Bridge with essentially no down time, am currently helping one customer with 7TB through that process. It's not over to RDS, but would be happy to talk through process if helpful. And we do support logical replication and have many people using wal2json/logical replication with us.


> Why the migration is required? Heroku postgres doesn't support logical replication

You could possibly hack together some form of higher-layer logical replication via postgres_fdw and database triggers. A comment ITT references this as a known technique.
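If you go down that road, the rough shape of it is something like the sketch below (all names are made up, and a real setup also has to handle updates, deletes, conflicts, and an initial backfill):

    -- On the source database
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    CREATE SERVER target_pg FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'target.example.com', dbname 'app');
    CREATE USER MAPPING FOR CURRENT_USER SERVER target_pg
        OPTIONS (user 'app', password 'secret');

    CREATE FOREIGN TABLE events_remote (
        id      bigint,
        payload jsonb
    ) SERVER target_pg OPTIONS (table_name 'events');

    -- Mirror new rows to the target as they are written locally
    CREATE FUNCTION mirror_events() RETURNS trigger AS $$
    BEGIN
        INSERT INTO events_remote (id, payload) VALUES (NEW.id, NEW.payload);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER events_mirror
        AFTER INSERT ON events
        FOR EACH ROW EXECUTE PROCEDURE mirror_events();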


One possible solution for the ETL stuff might be to use Heroku Kafka for the Change Data Capture and then from that Kafka you can move it someplace else.

See https://blog.heroku.com/streaming-data-connectors-beta Heroku's own Kafka seems to have slightly more native support than if you use a 3rd party like Confluent.

We've not yet tried any of this, but it's been bookmarked as a possible solution to explore.


Interesting that you bring this up. I looked into heroku's streaming connectors to facilitate an integration with materialize.com, but Heroku's support team wasn't confident we could sync all 187 postgres tables under 1 connection.

I thought about using Debezium and Kafka to roll my own micro-batch ETL solution, but listening to this podcast made me walk away slowly: https://www.dataengineeringpodcast.com/datacoral-change-data...


Interesting, what was it from that podcast that made you reconsider? Always eager to learn about opportunities for improving the experience of using Debezium.

Disclaimer: I work on Debezium


Oh wow, by "work on" you mean "the core maintainer of". Thank you for replying. :)

The main part I reconsidered was the level of effort of taking the data from Kafka and landing it in Snowflake, especially around handling postgres schema changes safely. I also have no experience with Kafka, so I'd be out of my depth pretty quickly for a critical part of the architecture. He also expressed the need for building quality checks into the Kafka-to-Snowflake code, but those details were a bit sparse (if I recall correctly).

Note: all of the above are probably outside the scope of debezium. :)

Note 2: your article [1] on using cdc to build audit logs w/ a "transactions" table blew my mind. Once I listened to your data engineering podcast interview [2], I knew there was some implementation of "event sourcing lite w/ a crud app" possible, so I was excited to see you had already laid it out.

1) https://debezium.io/blog/2019/10/01/audit-logs-with-change-d...

2) https://www.dataengineeringpodcast.com/debezium-change-data-...


Gotcha, yeah, there's many things to consider indeed when setting up end-to-end pipelines. Thanks for the nice feedback, so happy to hear those resources are useful for folks. As far as event sourcing is concerned, we got another post [1] which might be interesting to you, discussing how "true ES" compares to CDC, pros/cons of either approach, etc.

[1] https://debezium.io/blog/2020/02/10/event-sourcing-vs-cdc/


We faced this migration, too. My sympathies.

Adding to your list of options that still require _some_ downtime: we used Bucardo [0] in lieu of logical replication. It was a bit of a pain, since Bucardo has some rough edges, but we made it work. Database was ~2 TiB.

[0] https://bucardo.org/


When you subscribe to managed services instead of running the software yourself, these are the kinds of trade-offs that get made


logical replication, but this is one of the walls that heroku creates.


Coming from the outside, with zero understanding of the internal details, my hunch is the same: lack of support for logical replication is more of a business decision than a technical decision. (But again, this a hunch -- partially based on how good heroku is from a technical perspective)


It's absolutely an evil business decision, and all the clouds are playing this game. Don't ever use a hosted database solution if you're thinking about storing any significant amount of data. You will not be able to get it out without downtime.


It looks like gcp supports logical replication now: https://cloud.google.com/blog/products/databases/you-can-now...


The big question is: Can you enter arbitrary IP addresses, or do you have to replicate to another GCP instance? Azure does the latter.



> I feel like there's a certain level of stockholm syndrome in the sense that to PG experts, these things aren't that painful anymore

I don't know if I’m a PG expert, but I just prefer “migration tool” to be a separate thing, and for the DB server engine to focus on being an excellent DB server engine, with the right hooks to support a robust tooling ecosystem, rather than trying to be the tooling ecosystem.


As a developer I fully support the notion of splitting the tools out from the server engine, like things are today.

But, realistically, pg_upgrade's functionality would need to be integrated into the server itself if we're ever going to have zero-downtime upgrades, right?

I don't know how other RDBMSs handle this, if at all


Yeah, the comment I was responding to addressed two different kinds of migration—schema migration and version upgrades—and my comment really applies more to schema migration than version upgrades; more support for smoothing the latter in the engine makes sense.


Agreed: Postgres' transactional schema migration is freaking sublime.

I used and abused it pretty hard at my previous gig and now it's hard to imagine ever living without it.

At my gig before THAT, we had MySQL and schema migrations were so very very painful.


> If I may hijack the thread with some more general complaints though, I wish the Postgres team would someday prioritize migration.

A thing I'm interested in is a 'simple' replication setup to reduce single points of failure. We currently use Galera with My/MariaDB/Percona and it's quite handy for HA-ish needs: we can have two DBs and the garbd running on the web app server.

Pointers to tutorials for Debian/Ubuntu to accomplish something similar would be appreciated. (We run things on-prem in a private cloud.)



There's no such thing as "simple" when it comes to HA setups, the requirements are simply too varied. PostgreSQL has great documentation for their HA featureset, but when it comes to systems-level concerns (detecting that a primary is down and arranging promotion of a replica to primary) you're expected to address those on your own.


With our Galera setup we have a keepalived health check look at the local system, and if it fails/times out it stops sending heartbeats so the other side takes over the vIP. If one system crashes, the vIP fails over as well.


Doesn’t PG already support in-place version upgrades?

Also PG is one of the few that support schema/DDL statements inside a transaction.


"one of the few" is a pretty low bar though, I don't know a DB that doesn't suck at this.


Or maybe it is you who are underestimating the technical complexity of the task? A lot of effort has been spent on making PostgreSQL as good as it is on migrations. Yes, it is not as highly prioritized as things like performance or partitioning but it is not forgotten either.


My complaint would be that there is no standard multi-master solution for Postgres, whereas MySQL now has group replication as a native multi-master solution.


There’s a Ruby gem called strong_migrations that does this. It’s fantastic and I include it on all of my Ruby projects.


The gem does not solve these issues, merely tells you about them (and even then it can't catch all of them, only the ones that Rails defines).


It gives you mitigation strategies.

The mere presence of it on the dev workflow keeps your developers thinking about these types of issues as well, which goes a long way.


PostgreSQL is one of the most powerful and reliable pieces of software I've seen run at large scale, major kudos to all the maintainers for the improvements that keep being added.

> PostgreSQL 14 extends its performance gains to the vacuuming system, including optimizations for reducing overhead from B-Trees. This release also adds a vacuum "emergency mode" that is designed to prevent transaction ID wraparound

Dealing with transaction ID wraparounds in Postgres was one of the most daunting but fun experiences for me as a young SRE. Each time a transaction modifies rows in a PG database, it increments the transaction ID counter. This counter is stored as a 32-bit integer and it's critical to the MVCC transaction semantics - changes made by a transaction with a higher ID should not be visible to a transaction that started earlier. If the counter wraps around after roughly 2 billion transactions, disaster strikes, as past transactions now appear to be in the future. If PG detects it is approaching that point, it complains loudly and eventually stops further writes to the database to prevent data loss.

Postgres avoids getting anywhere close to this situation in almost all deployments by performing routine "auto-vacuums" which mark old row versions as "frozen" so they are no longer using up transaction ID slots. However, there are a couple situations where vacuum will not be able to clean up enough row versions. In our case, this was due to long-running transactions that consumed IDs but never finished. Also it is possible but highly inadvisable to disable auto-vacuums. Here is a postmortem from Sentry who had to deal with this leading to downtime: https://blog.sentry.io/2015/07/23/transaction-id-wraparound-...
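If you want to keep an eye on this yourself, a query along these lines (standard catalog columns, nothing app-specific) shows how old the oldest unfrozen transaction ID is per database:

    -- autovacuum_freeze_max_age defaults to 200 million; PostgreSQL starts
    -- complaining long before the ~2 billion hard limit is reached
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;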

It looks like the new vacuum "emergency mode" functionality starts vacuuming more aggressively when getting closer to the wraparound event, and as with every PG feature highly granular settings are exposed to tweak this behaviour (https://www.postgresql.org/about/featurematrix/detail/360/)


> Also it is possible but highly inadvisable to disable auto-vacuums.

When I was running my first Postgres cluster (the reddit databases), I had no idea what vacuuming was for. All I knew was that every time it ran it slowed everything down. Being dumb, I didn't bother to read the docs, I just disabled the auto vacuum.

Eventually writes stopped and I had to take a downtime to do a vacuum. Learned a few important lessons that day. I also then set it up to do an aggressive vacuum every day at 3am, which was the beginning of low traffic time, so that the auto-vacuuming didn't have as much work to do during the day.


Every time I've seen people having "vacuuming too expensive" problems, the solution was "more vacuum"!


"vacuum during the times when I want instead of randomly at peak traffic"


As it turns out, yes!


> Each time a transaction modifies rows in a PG database, it increments the transaction ID counter.

It's a bit more subtle than that: each transaction that modifies, deletes or locks rows will update the txID counter. Row updates don't get their own txID assigned.

> It looks like the new vacuum "emergency mode" functionality starts vacuuming more aggressively when getting closer to the wraparound

When close to wraparound, the autovacuum daemon stops cleaning up the vacuumed tables' indexes, yes. That saves time and IO, at the cost of index and some table bloat, but both are generally preferred over a system-blocking wraparound vacuum.


What is wrong with using a 64 bit, or even 128 bit transaction id?


It would increase disk usage by a significant amount, since transaction IDs appear twice in tuple headers (xmin/xmax). Essentially they are overhead on every database row. This submission has a discussion on it: https://news.ycombinator.com/item?id=19082944
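You can actually see those two IDs on any row via the hidden system columns (the table name below is just an example):

    -- xmin is the inserting transaction's ID, xmax the deleting/locking one
    SELECT xmin, xmax, * FROM accounts LIMIT 3;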


I wonder how MS SQL handles this differently.


By default MS SQL uses pessimistic locking, depending on isolation levels. There's only one version of the data on disk and the isolation level of a transaction determines what happens - for example if a transaction in SERIALIZABLE reads a row SQL Server takes a shared read lock on that row preventing any other transaction from writing to it.

MS SQL also has snapshot (and read committed snapshot) isolation levels. These are much more like the Postgresql isolation levels - Postgres implements three distinct levels (read committed, repeatable read, and serializable), and asking for read uncommitted gets you upgraded to read committed, as permitted by the spec.

In snapshot isolation, instead of taking a lock, SQL Server copies the row to a version store in TempDB when it would be overwritten; additionally it adds a 14-byte row version to each row written. There's a lot of detail here: https://docs.microsoft.com/en-us/sql/relational-databases/sq...

This is also why MS SQL maintains a clustered index on the main table - the main table only contains the latest globally-visible version, so it can be sorted. Postgres stores all versions (until vacuum removes dead rows), so the main table is a heap, only indexes are sorted.


It has a similar concept if you need MVCC (with InnoDB). It also has a concept of transaction IDs. And also need to clean them up (purge). They will both have table bloat if not done.

Since details matter, there's a post that explains it far better than I could:

https://www.enterprisedb.com/blog/mysql-vs-postgresql-part-2...


I asked about MS not My though.


Yeah, making this not optional is the issue. Only some workloads can't get by with 32-bit txids, and imposing the cost on everyone would be bad.

Oh, and C codebases make such changes far harder than the more rigid newtyping in e.g. Rust would, which is why I assume no one has made the necessary patches yet.


I believe mostly how much code needs to be changed, which they are working towards slowly, but there is more overhead (memory / disk) associated with those larger data types which are used everywhere:

https://wiki.postgresql.org/wiki/FullTransactionId


SRE?



Thank you for this explanation!


Once again, thanks to all the contributors that provided these awesome new features, translations and documentation.

It's amazing what improvements we can get through public collaboration.


If you want to test the new features on a Mac, we've just uploaded a new release of Postgres.app: https://postgresapp.com/downloads.html


I love this app on Mac, but I wonder if there is a similar app for Windows (i.e. portable Postgres)?


This app is amazing. Highly recommend it.


I know this isn't even a big enough deal to mention in the news release, but I am massively excited about the new multirange data types. I work with spectrum licensing and range data types are a godsend (for representing spectrum ranges that spectrum licenses grant). However, there are so many scenarios where you want to treat multiple ranges like a single entity (say, for example, an uplink channel and a downlink channel in an FDD band). And there are certain operations like range differences (e.g. '[10,100)' - '[50,60)'), that aren't possible without multirange support. For this, I am incredibly grateful.
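For the curious, a tiny sketch of what PG 14 multiranges make possible (the frequency values here are invented):

    -- One licence value covering two separate ranges (e.g. uplink + downlink)
    SELECT int4multirange('[10,100)'::int4range, '[200,300)'::int4range);

    -- Range difference can now leave a hole in the middle
    SELECT int4multirange('[10,100)'::int4range)
         - int4multirange('[50,60)'::int4range);
    -- => {[10,50),[60,100)}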

Also great is the parallel query support for materialized views, connection scalability, query pipelining, and jsonb accessor syntax.


Multiranges are one of the lead items in the news release :) I do agree that they are incredibly helpful and will help to reduce the complexity of working with ranges.


Suppose I had a "friend" with a PostgreSQL 9.6 instance (a large single node)... what's the best way to upgrade to PostgreSQL 14?


If you're on a single node, probably something like this. This is what my "friend" is doing later this week. He's migrating from 9.5 to 13:

    sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
    wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
    sudo apt-get update
    sudo apt install postgresql-13
    sudo service postgresql stop
    sudo mkdir -p /secure/pgsql13/data
    sudo chown -R postgres:postgres /secure/pgsql13/data
    sudo -u postgres /usr/lib/postgresql/13/bin/initdb -D /secure/pgsql13/data
    sudo rm -rf /var/lib/postgresql/13/main
    sudo ln -s /secure/pgsql13/data /var/lib/postgresql/13/main
    sudo chown -hR postgres:postgres /var/lib/postgresql/13/main
    sudo ln -s /etc/postgresql/9.5/main/postgresql.conf /var/lib/postgresql/9.5/main/postgresql.conf
    sudo chown -hR postgres:postgres /etc/postgresql/9.5/main/postgresql.conf
    cd /tmp
    sudo -u postgres time /usr/lib/postgresql/13/bin/pg_upgrade --old-bindir /usr/lib/postgresql/9.5/bin --new-bindir /usr/lib/postgresql/13/bin --old-datadir /var/lib/postgresql/9.5/main --new-datadir /var/lib/postgresql/13/main --link --check
    sudo -u postgres time /usr/lib/postgresql/13/bin/pg_upgrade --old-bindir /usr/lib/postgresql/9.5/bin --new-bindir /usr/lib/postgresql/13/bin --old-datadir /var/lib/postgresql/9.5/main --new-datadir /var/lib/postgresql/13/main --link
    sudo service postgresql start 13
    sudo -u postgres "/usr/lib/postgresql/13/bin/vacuumdb" --all -j 32 --analyze-only
    sudo -u postgres ./delete_old_cluster.sh


> wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -

Is there a reason to do this rather than putting the key in /etc/apt/trusted.gpg.d/?


Putting the key in /etc/apt/trusted.gpg.d/ gives it too much power over all the repositories. A malicious (maybe compromised) third-party repository could publish a package that replaces an official Debian package.

So trusted.gpg.d/ is not the recommended method. For more information, see the official Debian wiki which states "The key MUST NOT be placed in /etc/apt/trusted.gpg.d" https://wiki.debian.org/DebianRepository/UseThirdParty


Actually it states:

> The key MUST NOT be placed in /etc/apt/trusted.gpg.d or loaded by apt-key add.

And yet the snippet that I quoted has the latter command.

Further: the link has the example "[signed-by=/usr/share/keyrings/deriv-archive-keyring.gpg]". Perhaps it's my BSD upbringing showing through, but shouldn't only/mostly OS-provided stuff generally live in /usr/share? Shouldn't locally-added stuff go into /usr/local/share? Or perhaps creating a /etc/apt/local.gpg.d would be appropriate?


The idea is that you put the keyring in the same place where the -keyring package will install it, so the package overwrites your wget-downloaded copy and you don't have a leftover temporary keyring to delete.


Unless you are auditing all third-party packages before installing them, any package can modify /etc/apt/trusted.gpg.d/ by dropping a file in there, or from its postinst/etc scripts at installation time etc. So using the signed-by mechanism isn't much of a security enhancement without the auditing.


And while you're there, pay attention to the signed-by option. It's what allows you (the admin) to limit APT's trust in the keys you add.


I second this; I upgraded 100-odd instances using pg_upgrade. And it's lightning fast, taking less than a minute whether the db cluster is a couple GB or a few TB. Just make sure that you always run the check mode to catch incompatibilities between versions.


I think you can use -o '-D /etc/postgresql/9.5/main' -O '-D /etc/postgresql/13/main' to avoid making the symlink to postgresql.conf. This should also make it easier if postgres.conf tries to includes files from a conf.d subdirectory.


Looks like the best approach might be to use the in-place upgrade tool that ships with Postgres to upgrade to v10 (use the Postgres v10 binaries to perform this upgrade). From there you'd be able to create a fresh v14 instance and use logical replication to sync over the v10 database, then briefly stop writes while you swap over to the v14 database.

EDIT: Looks like pg_upgrade supports directly upgrading multiple major versions. So maybe just use that if you can afford the downtime.


Why would you first upgrade to PG 10?


Native logical replication (making it possible to upgrade without downtime) was introduced in pgSQL 10. But if you're going to have downtime anyway, there's no reason not to do the upgrade in a single step. pg_upgrade should support that quite easily.


Also, logical replication has some issues that make it not necessarily perfect for cross-version migration: schema changes and sequence values are not synced.

While schema changes are probably not much of a problem, because that's something you can easily prevent during the migration, sequence values are: once you fail over to the logically replicated secondary, you will have to reset all sequences to their current values or all further inserts into tables with sequences will fail.
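The usual fix is a setval per sequence on the new primary once writes have stopped on the old one, along these lines (table and column names are placeholders):

    -- Bump the sequence past the highest replicated value
    SELECT setval(pg_get_serial_sequence('orders', 'id'),
                  (SELECT COALESCE(max(id), 1) FROM orders));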

The other option, of course, is to not use sequences but rather UUIDs for your primary keys, but those have their own collection of issues.


Because version below 10 don't support logical replication. The alternative would be to use a 3rd party extension for replication, which may well be a good option but I don't have any experience with that so I can't really comment.


With downtime, I guess the pg_upgrade tool works fine.

Without downtime / near-zero downtime is more interesting though. Since this is an old version, something like Bucardo maybe? It can keep another pg14 instance in sync with your old instance by copying the data over and keeping it in sync. Then you switch your app over to the new DB and kill the old one.

Newer versions make this even easier with logical replication - just add the new DB as a logical replica of the old one, and kill the old one and switch in under 5 seconds when you're ready.


I think I remember a talk where someone manually set up logical replication with triggers and postgres_fdw to upgrade from an old server with zero downtime.


pg_upgrade has a `link` option that, when used, reduces the time taken to ~15 seconds.

IMO, that counts as "near-zero downtime".


Sort of, yeah. I usually consider zero-downtime to be ‘imperceptible to users’. This option is interesting, though, first I’ve heard of it. Thanks!


If you can afford a short downtime I would recommend just running pg_upgrade and upgrade directly to 13. Preferably test it out first before doing it in production.


Pglogical, replicate to a new database server. There will be minimal downtime and it allows you to test the process thoroughly.


Yeah this is how we do it... Even with terabytes of data it seems to be pretty efficient and robust, and it's easy to just keep the replicating version running until you're satisfied it all looks good.


Depends on whether you can shut down the server for some time or not. If you can live with some downtime, just use pg_upgrade if you install the new server on the same machine: https://www.postgresql.org/docs/current/pgupgrade.html


We are currently using pg_upgrade to go from 9.6 to 13 in our systems. It's supported and works well. We are using the 'hardlinks' feature, which makes it extremely fast as well.


Do be aware that using the "hardlinks" feature will mean that restarting the old database version will not work properly and may not be safe. You should make sure to copy the old directory beforehand if you might need that, or simply restore from backup.


I'd argue you shouldn't be performing an upgrade at all without a proper backup. But yes, absolutely do not use a hardlinks upgrade unless you have a backup laying around.


This is one of the major pain points with PostgreSQL. Unless you absolutely need the new features, and if uptime is important, you can just continue using PostgreSQL 9.6 even though it is EOL. https://www.postgresql.org/support/versioning/ It will most likely work great for many more years.

I wish future versions of PostgreSQL would have some backwards compatibility for old system/data tables/data structures and be able to do a live migration when running a newer release.

pg_upgrade will not work if the internal data structure for your data changes. It only recreates the system tables. https://www.postgresql.org/docs/14/pgupgrade.html


This is not really true for a number of reasons. pg_upgrade is absolutely the preferred method if you are able to spare small downtime. It will absolutely work to upgrade your existing data structures to be compatible with the new Postgres version, so not sure what you even mean.

It doesn't automatically re-create indexes (to take advantage of new features) for example, but that is likely something you don't want to do right away (incurring extra downtime) when you are doing an upgrade anyways. You can easily just REINDEX after the upgrade is complete.


If you read the link about pg_upgrade, you're not guaranteed that pg_upgrade will work if the underlying data storage format changes. pg_upgrade may not work in a future version of PostgreSQL. It works for version 14 though.


I converted from MySQL (before the whole MariaDB fork), and I've been happier with every new version. My biggest moment of joy was JSONB and it keeps getting better. Can we please make the connections lighter so that I don't have to use stuff like pgbouncer in the middle? I would love to see that in future versions.


FWIW Mysql 8 has gotten a lot better in standards compliance and ironing out legacy quirks, with some config tweaks. While my heart still belongs to PostgreSQL things like no query hints, dead tuple bloat (maybe zheap will help?), less robust replication (though getting better!), and costly connections dampens my enthusiasm.


> maybe zheap will help?

According to Robert Haas, the zheap project is dead.


Where did Robert Haas say this? Quick search didn't surface much, except for a blog post from July by Hans-Jürgen Schönig: https://www.cybertec-postgresql.com/en/postgresql-zheap-curr... Doesn't sound like they discontinued the project.


I'd love to see that info, been interested in why it was taken up by Cybertec and EDB seemingly dropped it.


There has been a lot of improvement in this release.

https://www.depesz.com/2020/08/25/waiting-for-postgresql-14-...


The benchmark linked in the comments shows 7-38% improvements. Nothing to scoff at, but if you need PGBouncer that probably won't make enough of a difference.


It certainly isn't much of an improvement in connection latency (the connections are still pretty heavy), but it is a massive improvement in transaction throughput with higher numbers of connections. If you scroll down a bit, there is now higher TPS even at 5000 connections than previously could be had at 100 connections. That fixes a massive amount of pain that previously only could be solved with pgbouncer.


I'd be curious to see if the concurrency improvements in PostgreSQL 14 help with increasing the threshold for when you need to introduce a connection pooler.


Check out the Azure team's benchmarks. Pretty damn impressive.

https://techcommunity.microsoft.com/t5/azure-database-for-po...


Lighter connections would finally allow for using lambda functions that access a Postgres database without needing a dedicated pgbouncer server in the middle.


Yes, but this patch does not make startup cheaper, it only decreases the performance cost of concurrent open connections.


Not sure if you’re using this, but AWS has RDS Proxy as a service, in case you’re hosting your own


PostgreSQL is one of those tools I know I can always rely on for a new use-case. There are very few cases where it can't do exactly what I need (large scale vector search/retrieval).

Congrats on the 14.0 release.

The pace of open source has me wondering what we'll be seeing 50 years from now.


I recall seeing some library that adds vector search to Postgres. Maybe https://github.com/ankane/pgvector?

Also there's Pinecone (https://www.pinecone.io) which can sit alongside Postgres or any other data warehouse and ingest vector embeddings + metadata for vector search/retrieval.


ElasticSearch has "dense_vector" datatype and vector-specific functions.

  https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html
  https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-script-score-query.html#vector-functions
ZomboDB integrates ElasticSearch as a PG extension, written in Rust:

  https://github.com/zombodb/zombodb
I dunno what exactly a "dense_vector" is, but if you can't use the native "tsvector" maybe you could use this?


I think a dense vector is the opposite of a sparse vector

i.e. in a dense vector every value in the vector is stored

whereas sparse vectors exist to save space when you have large vectors where most of the values are usually zero - they reconstruct the full vector by storing only the non-zero values, plus their indices


  "a dense vector is the opposite of a sparse vector"
I think there's another thing besides vectors that are a bit dense in the room here, eh? Yeah that makes sense hahaha -- thank you.


ZomboDB has always seemed really interesting to me.

Elastic's search, with PostgreSQL's everything else, sounds like a perfect match, but I've never seen it used anywhere, which is kind of spooky to me.

Do you know of any non-toy use cases I can look at?


You can ask Eric Ridge on the ZomboDB Discord, he's really nice:

https://discord.gg/hPb93Y9

Also probably one of the most knowledgeable people about both Postgres and Rust I've ever met.

He makes his living primarily from supporting enterprises using ZomboDB (I think), not sure how much he can say, but it's worth asking.


You're not kidding, they were incredibly friendly and helpful.

Makes me wish I had a big deployment so I could justify sponsoring them.

I paid in cat pictures for now.


I need a self managed solution, so I'm not sure Pinecone is feasible and I don't think pgvector scales well enough for my use-case (hundreds of millions of vectors).

So far I think I'm going to go with Milvus[1], ideally I'd just have a foreign data wrapper for Milvus or FAISS.

[1] https://github.com/milvus-io/milvus


Fantastic piece of software. The only major missing feature that I can think of is Automatic Incremental Materialized View Updates. I'm hoping that this good work in progress makes it to v15 - https://yugonagata-pgsql.blogspot.com/2021/06/implementing-i...
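For context, today the refresh has to be kicked off manually (or from cron/triggers) and recomputes everything; a rough sketch with made-up names of what incremental maintenance would automate:

    CREATE MATERIALIZED VIEW daily_sales AS
        SELECT order_date, sum(amount) AS total
        FROM orders
        GROUP BY order_date;

    -- CONCURRENTLY needs a unique index so rows can be matched up
    CREATE UNIQUE INDEX ON daily_sales (order_date);

    -- Recomputes the whole view on every refresh; IVM would apply only deltas
    REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;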


This has been a major one I've wanted for a long time too, but sadly its initial implementation will be too simple to allow me to migrate any of my use cases to use it.


This looks like an amazing release! Here are my favorite features in order:

• Up to 2x speed up when using many DB connections

• ANALYZE runs significantly faster. This should make PG version upgrades much easier.

• Reduced index bloat. This has been improving in each of the last few major releases.

• JSON subscript syntax, like column['key']

• date_bin function to group timestamps to an interval, like every 15 minutes (quick sketch of this and the JSON subscripting below)

• VACUUM "emergency mode" to better prevent transaction ID wraparound
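A quick sketch of those two (table and column names are invented):

    -- date_bin: bucket timestamps into 15-minute bins from a chosen origin
    SELECT date_bin('15 minutes', created_at, TIMESTAMP '2021-01-01'), count(*)
    FROM events
    GROUP BY 1;

    -- JSON subscripting, for reads and in-place updates
    SELECT details['shipping']['city'] FROM orders;
    UPDATE orders SET details['status'] = '"shipped"';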


Somewhat related, but does anybody have suggestions for a quality PostgreSQL desktop GUI tool, akin to pgAdmin3? Not pgAdmin 4, whose usability is vastly inferior.

DBeaver is adequate, but not really built with Postgres in mind.


I've been trying out Beekeeper Studio [https://www.beekeeperstudio.io/] recently, and like that it supports both MySQL and PostgreSQL (and others I don't use). The interface takes a little bit to get used to, but it's been pretty powerful for me.

Before that, or when I'm in a rush to just get something done as I adjust to Beekeeper, I use Postbird (an Electron app) [https://github.com/Paxa/postbird]


Beekeeper Studio maintainer here. It's 100% FOSS and cross platform (Windows, MacOS, Linux).

I love Postgres and started Beekeeper because I wanted 'Sequel Pro for PSQL on Linux'. We have a lot of users so seems like we're doing something right.

Very happy for new PSQL.

If you have questions, let me know!


Thank you for your work on it! I used Sequel Pro for a long time for MySQL databases and occasionally use its successor Sequel Ace as well, but I'm moving my workflow to Beekeeper Studio as much as I can.


Glad to hear! Please file issues if anything is confusing or missing, it's a pretty friendly GitHub community.


In addition to the excellent Beekeeper, you might want to check out https://dbeaver.io/.

And azure data studio now has some pg support: https://docs.microsoft.com/en-us/sql/azure-data-studio/exten...

Personally I've landed on just using pgcli.com (and pg_dump etc).


I pay for jetbrains datagrip, worth every penny.


Seconded. DataGrip is terrific and supports every database type I have ever come into contact with. And it's all JDBC-based so you can add new connectors pretty easily (from within the app, no less. No fiddling with files necessary). I had to do that to do help on a proposal a few years ago for a project that had a Firebird database and Datagrip didn't natively support it.


Going to second this, however I will warn, at least in my experience it is a little bit different from most DB IDEs. I didn't like it at all first time I used it, then a friend told me to give it another try. I've never looked back, fantastic tool.


One of my coworkers uses DataGrip. Needing to install MySQL-specific tooling so that they can take a full database dump is kind of frustrating. Many other tools can do it out of the box, why not DataGrip?


PgModeler [0], cannot recommend it enough. You have to compile the free version yourself (it's open source - you pay if you want to directly download the precompiled binaries), and it's a bit of work if you want to include the query optimizer plugin, but there's documentation and GitHub issues with solutions already in place. Once you compile it for the first time and start using it, you'll keep doing it again with each new version. I haven't found a better Postgres tool yet.

edit: it's a cross-platform tool, supporting Linux, Mac and Windows (32/64)

[0]: https://pgmodeler.io/


I really like the diff function to generate migrations. Make changes to your model and generate a sql file that syncs your database to the model.


Especially cool that you can buy a license using Bitcoin! Do you know if the backup/restore functionality is reliable?


Commercial, but worth every penny and then some in my opinion: https://tableplus.com/

It looks good, performs very well, and supports WAY more than just pg. It's an indispensable part of my dev tooling. Bought it last year, or maybe the year before? Very happy customer here, zero problems, all gravy.


How does it compare to Postico for working with PG databases? TablePlus seems very similar to Postico, but with support for other DBs, which we don’t need.


Tableplus is fantastic software. I have it open all day every day.


I like Postico, but it is mac-only and not free.


Ah, I should have clarified that I’m using Linux and Windows.


Postico is my favorite desktop software of any kind



I second Postico on Mac. I usually recommend TablePlus for other platforms.


Interesting. I’m currently using Postico on Mac, but considering TablePlus. Why do you find Postico superior?


I've been using SQL Workbench/J [https://www.sql-workbench.eu/] for quite a while now. Uses JDBC so it'll connect to anything, good SQL formatter, builtin scripting commands, support for XLS import/export, headless mode, and most importantly fast even when loading massive result sets.


Happy with dbeaver, what are its shortcomings re pg?


I guess I'm still getting used to it, but it always takes a few extra steps to open things like views and functions, autocomplete needs configuring to work properly, and there are a couple of other things I'm blanking on at the moment. It's all small stuff, but it can add up.


Yes, they are in separate folders, but I don't think that has to do with PG per se. My autocomplete works well; I don't think I did anything special. Maybe I added a schema to the "search path"? Or that may have been in pgcli (which I also recommend).


Yeah, ultimately it’s the best alternative I’ve used so far, and it also supports other languages, so I’m likely to continue using it.


When did you last use pgAdmin 4? It recently went through big changes; in my opinion it's the best client for PostgreSQL.


A few months ago. I had semi-frequent troubles with starting up and freezing (on both Linux and Windows).

I would also easily make accidental GUI changes that I could only revert by reloading original settings.

A less galling example was the introduction of a highlight that marked the line in a query where an error occurred. It was a bright blue color, hard on the eyes, almost completely obscuring the text. It’s a comparatively minor issue, but illustrative of how the tool’s usability was steadily declining.


https://dbeaver.io/ is pretty good IMO.


There’s plenty of GUI tools, but what about tools for writing SQL inside VS Code?

We usually write our DB migrations in VS Code along with any other code changes, but the PG support in VS Code seems to be lacking. Just some naïve validation of PostgreSQL code inside VS Code would be awesome!


I used pgsanity at the CLI for a while when I unfortunately had coding and testing split between two machines. Can be used with any editor. https://github.com/markdrago/pgsanity


Interesting, thanks! I wonder how easy it would be to write a VS Code plug-in using that.


If you can run a linter, you can run it as well.


Ah, good point! I might try to create an eslint rule that incorporates it.


DataGrip is cool.


pgAdmin 4 has made rapid strides. If you haven't checked it out recently (~6 mos?) you may be pleasantly surprised. Many of the rough edges have been sanded off.


Interesting, it’s been roughly that amount of time since I’ve last used it. I might check it out again.


It's still not the most polished thing in the world, but few IDE-ish tools are... particularly free ones.

I've used a lot of very very expensive ones that aren't as polished as pgAdmin4.

The whole "native app with a web interface" thing is... probably not anybody's idea of "ideal," but I 100% respect it. It's free and it's multiplatform. Choices and compromises were made to achieve that and I don't know that I would have done differently when faced with the same constraints.


For Windows https://www.heidisql.com/ is great


Yeah, that is a good one. It's lightweight and supports a lot of databases, but it lacks some of the nicer features of Postgres-specific tools. I think I had difficulty listing things by schema.


TablePlus


The best is psql, really - learn to use it.
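
A few of the built-in meta-commands that make it worth learning (the table name below is made up):

    -- list databases, then list tables in the current search_path
    \l
    \dt
    -- describe a table, including its indexes
    \d+ my_table
    -- expanded output for wide rows, and query timing
    \x auto
    \timing on
    -- edit the current query buffer in $EDITOR
    \e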


TablePlus is the only thing I've used cross-platform that is even close to Postico. Postico is amazing.


Interesting. I’m currently using Postico on Mac, but considering TablePlus. Why do you find Postico superior?


Sorry for the late response, missed this somehow. I'm realizing as I try to write this that it is a bunch of small things, and maybe it boils down to having used it a long time or it being my first non-garbage (SSMS, PGAdmin) UX db tool. The autocomplete seems to work exactly as I'm expecting, the filtering seems to work a little better than TablePlus. Copying and pasting is a bit more consistent (I do this a lot while dev'ing). The whole UI is lightning fast and always stays responsive.

That all being said, TablePlus is 80 - 90% of that and the fact that it also works with multiple databases is huge. I also love the vertical tabs for multiple connections, editor sidebar pane for single records and tabbed queries vs a single query pane.

Both are fantastic, but if I was postgres only I'd probably stick with postico. As-is I just paid for both.


It's paid, but I like SQL Pro for Postgres. Simple, lightweight, etc.


I use Valentina Studio (free license). Does everything I need.


another vote for TablePlus


If you’d like to try out PostgreSQL in a nice friendly hosted fashion then I highly recommend supabase.io

I came from MySQL and so I’m still just excited about the basic stuff like authentication and policies, but I really like how they’ve also integrated storage with the same permissions and auth too.

It's also open source, so if you want to just host it yourself, you still can.

And did I mention they’ll do your auth for you?


[Supabase cofounder] thanks for the kind words.

We're about to wrap up Row Level Security on our real-time APIs too[0] - soon you'll be able to use Policies universally on all APIs.

[0] https://github.com/supabase/walrus


Congrats on an awesome product. It was exactly what I was looking for.

You also get a feeling about a company pretty quick from their docs (I think it’s the best way to judge a company) and I have to say I had the same feeling as using Twilio or Cloudflare for the first time - so you’re in good company.


I cannot even begin to tell you how excited I am for this!


Would you mind expanding on what's so appealing about Supabase (i.e. Firebase)?

I feel like I live in a cave, because I haven't quite understood what problem Supabase/Firebase is solving.


[supabase cofounder] While we position ourselves as a Firebase alternative, it might be simpler for experienced techies to think of us as an easy way to use Postgres.

We give you a full PG database for every project, and auto-generated APIs using PostgREST [0]. We configure everything in your project so that it's easy to use Postgres Row Level Security.

As OP mentions, we also provide a few additional services that you typically need when building a product - connection pooling (pgbouncer), object storage, authentication + user management, dashboards, reports, etc. You don't need to use all of these - you can just use us as a "DBaaS" too.

[0] https://postgrest.org/
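
For anyone who hasn't used Row Level Security before, a minimal sketch of what a policy looks like (table and column names are made up):

    ALTER TABLE todos ENABLE ROW LEVEL SECURITY;

    -- Each role only sees (and can only modify) its own rows
    CREATE POLICY owners_only ON todos
        FOR ALL
        USING (owner = current_user);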


Thanks so much and really appreciate you taking the time to respond here.

I think it's fantastic to make deploying existing software/tools easier, and people are definitely willing to pay for that ease. Curious, though: what prevents the Postgres team from taking Supabase contributions (since it's Apache 2.0) and including them in core Postgres?


> what prevents the Postgres team from taking supabase contributions

Absolutely nothing - we are open source and we would encourage it. However, we're not modifying Postgres itself at this stage; we're just providing tooling around it. We have hired developers to work specifically on Postgres, but we will always attempt to upstream (if the PG community wants it) rather than maintain a fork.


As a product designer, I've been with Supabase since its inception. You guys make it so easy to start prototyping projects, without me having to think about spinning up a Postgres droplet or whatever. Thank you so much for making Supabase better every day!!


Thanks for joining us for the ride!


All those reasons + not owned by Google.

I prefer to use companies where you're using their bread and butter service. That way you know they won't suddenly lose interest in it, which Google has a reputation for, especially with anything to do with Google Cloud.

I just find that what Supabase is trying to do matches up exactly with what I wanted - it solves 90% of my developer headaches and it all works together nicely.

Plus I just get good vibes from Supabase in general.


Supabase looks nice. I'm also using Postgres hosted by https://nhost.io/ which also does auth and storage. It looks like Supabase is focused on REST and Nhost is focused on GraphQL?


Love the JSONB subscripts! It will be so much easier to remember! I may not even have to reference the docs!
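
For anyone who hasn't seen it yet, roughly what the new subscript syntax looks like (table and column names are invented):

    -- Reading: equivalent to details -> 'browser' -> 'name'
    SELECT details['browser']['name'] FROM user_sessions;

    -- Writing: the assigned value must itself be jsonb (note the inner quotes)
    UPDATE user_sessions SET details['last_page'] = '"/pricing"';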


What is JSONB... https://stackoverflow.com/questions/22654170/explanation-of-...

(Don't upvote me, I just googled it)


Now if I only could remember how to convert a JSONB string to a normal string (without quotes)...
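
Noting it here for next time - as far as I know, the text-returning operators do exactly that:

    SELECT '{"name": "Ada"}'::jsonb ->  'name';   -- "Ada"  (jsonb, quoted)
    SELECT '{"name": "Ada"}'::jsonb ->> 'name';   -- Ada    (text, no quotes)

    -- For a bare jsonb string value, #>> with an empty path does the trick
    SELECT '"hello"'::jsonb #>> '{}';             -- hello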


And now the wait for RDS to support it. Thanks PG team!


Any ideas on the typical delay before it's supported in RDS?


https://www.brianlikespostgres.com/rds-aurora-release-dates....

I gathered release dates a few weeks ago because I was curious. There aren't that many data points, but 150 days might be a good guess


Don't quote me on this, but it used to take quite a while; since 12, though, things have gotten a lot better. I think they generally wait for the .1 patch and then get it in pretty quickly.


Postgres 13 was released 2020-09-24

13.1 was released 2020-11-12

13.2 was released 2021-02-11 https://www.postgresql.org/docs/13/release-13-2.html

AWS supported 13 as of 2021-02-24 https://aws.amazon.com/about-aws/whats-new/2021/02/amazon-rd...


Well that's a lot worse than I was expecting.


Any suggestions to learn and go deep in PostgreSQL for someone who worked mostly on NoSQL (MongoDB)?

From the few days I have explored it, it is absolutely incredible, so congratulations for the work done and good luck on keeping the quality so high!


If you are a novice or even if just a bit rusty with relational databases I recommend:

- Grab a good book or two. The Art of PostgreSQL is one. There are _many_ others. Take a bit of time to find out what suits you. I typically like reading a more theoretical one and a more practical one (with actionable, concrete advice, actual code in it and so on).

- Get a nice tool/IDE/editor plugin that can interactively work with databases, create them, query them etc. Almost all of the editor utilities that you expect from general purpose programming also applies to programming with SQL.

- PostgreSQL is a very powerful, primarily relational database, but there is very little it cannot do. For example it is perfectly feasible to implement a document (JSON) store with it, and convenient to boot. There are also great extensions such as for temporal tables.

- There is a ton of advice, discussion, etc. to be found, especially around data modelling. Word of caution: some of it is religious, at least slightly - the book I mentioned above is too, to a certain degree, but it is still a very good book. Realize that a relational DB can give you a ton of stuff in a performant, declarative manner, but not everything should live in the DB, nor is everything relational in the strictest sense. Be pragmatic.

Source: I've been diving deeper into this topic for a few years now and still am. The more I learn and am able to apply, the more I understand why the relational paradigm is so dominant and powerful.
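
To make the document-store point above concrete, a minimal sketch (table name is invented):

    CREATE TABLE documents (
        id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        body jsonb NOT NULL
    );

    -- A GIN index makes containment queries fast
    CREATE INDEX ON documents USING gin (body);

    SELECT id FROM documents WHERE body @> '{"status": "published"}';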


As a young developer I stayed away from traditional relational databases because I thought they were boring and restrictive. I quickly fell in love with them. I realized they painlessly provided all the things I was already doing with data and they were doing it much faster and more reliably.

I hope you have the same kind of experience and enjoy Postgres!

As far as actual advice...

1. The key concept of an RDBMS is that each table typically specifies relationships to other tables. So in order to query data you will typically be joining multiple tables. Sometimes new DB programmers get a lot of enlightenment from the simple Venn diagrams that illustrate the basic types of JOINs: https://www.google.com/search?q=inner+join+diagram ...IMHO, once you get this (fairly simple) paradigm the rest is easy.

2. Database constraints are your friends. At a minimum you specify the types of the columns, obviously, but you can go much farther - specifying NOT NULL constraints, foreign keys, and more complex check conditions. Do as much of this as feasible at the database level. Otherwise it is something you need to code and enforce at the application level, where it will generally be less performant and more prone to coder error. Additionally, any constraints not enforced at the database level will need to be implemented across multiple applications if more than one app uses the database.

The second one above is an example of something that sounds boring and restrictive but really frees you up to do other things as a developer. About a decade ago, the "trend" was to treat RDBMSs as "dumb" storage and do all that stuff at the application level, in the name of being database-agnostic. Opinions remain divided, but I think that was an objectively bad trend. For one thing, folks noticed they hardly ever needed to suddenly switch databases, but there are other reasons as well.
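
A small sketch of point 2 (plus the JOIN from point 1); table and column names are invented, and a customers table is assumed to already exist:

    CREATE TABLE orders (
        id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        customer_id bigint NOT NULL REFERENCES customers (id),
        quantity    integer NOT NULL CHECK (quantity > 0),
        status      text NOT NULL DEFAULT 'pending'
    );

    -- Each order row joined to its customer
    SELECT c.name, o.quantity, o.status
    FROM orders o
    JOIN customers c ON c.id = o.customer_id;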


Pg docs are so good I reference them whenever I want to check the SQL standards, even if I'm working on another DB. (I prefer standard syntax to minimize effort moving DBs.)

Otherwise maybe try it with a toy project.


I stick to standard SQL syntax/features whenever possible as well, but...

Honest question: how often do you switch databases?

I've never really found myself wanting or needing to do this.

Only time I could really see myself wanting to do this is if I was writing some kind of commercial software (eg, a database IDE like DataGrip) that needed to simultaneously support various multiple databases.

> MySQL
It feels particularly limiting to stick to "standard" SQL for MySQL's sake, since they frequently lag behind on huge chunks of "standard" SQL functionality anyway. For example, window functions (SQL2003 standard) took them about a decade and a half to implement.
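
For reference, the kind of SQL:2003 window function I mean (a running total per customer; names are made up):

    SELECT customer_id,
           created_at,
           amount,
           sum(amount) OVER (PARTITION BY customer_id
                             ORDER BY created_at) AS running_total
    FROM payments;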


We've got an application which has been developed for over 20 years with the mentality of "we're never switching DBs".

Yet now we are, because some core customers demand MSSQL support...

Of course the db we're using supports all kind of non-standard goodness that has been exploited all over. Gonna be fun times ahead...


Yikes. Good luck with that.

(I don't mean that sarcastically. I truly wish you luck.)


I've done two moves, one from MySQL to Pg and the other from Pg to MySQL. I'm not opposed to leveraging their nonstandard parts where necessary.

Some companies end up with a mix of different DBs and it can help to consolidate to share expertise or resources.

Though at this point both have grown much closer together in capabilities and performance.


I found "The Art of PostgreSQL" quite helpful: https://theartofpostgresql.com/


I'm trying to understand whether, with v14, I will be able to connect Debezium to a "slave" node rather than the "master" in order to read the WAL, but I can't figure it out. Can someone help me with this?


I was talking to someone about this just yesterday; they mentioned that Patroni leverages a way of setting up replication slots on replicas [1]. Haven't tried it myself yet, but it seems worth exploring.

Something I'd like to dive into within Debezium is usage of the pg_tm_aux extension [2], which supposedly allows setting up replication slots "in the past", so you could use it for seamless failover to replicas without missing any events. In any case, this entire area is of high importance for us and we're keeping a close eye on any improvements, so I hope it will be sorted out sooner or later.

[1] https://twitter.com/cyberdemn/status/1443130986116624388 [2] https://github.com/x4m/pg_tm_aux


That extension is not safe / correct afaict.


Unfortunately that didn't make it into 14, there were too many rough edges to file off.


Whoa, I thought the proper terminology was more like primary/replica. Are we talking Postgres databases or decades-old clunky IDE hard drives?


Yeah, I should have used "primary"/"replica", I agree. Somehow those words did not come to my mind. Thanks


Both are valid. Master/slave is an older way of describing how the two are related, but most DBMSes used this terminology up until recently.


Postgres is my bread and butter for pretty much every project. Congratulations to the team, you work on and continue to improve one of the most amazing pieces of software ever created.


Congratulations and thanks to all involved! Do I understand correctly that, at this time, while PG has data sharding and partitioning capabilities, it does not offer some related features found in Citus Open Source (shard rebalancer, distributed SQL engine and transactions) and in Citus on Azure aka Hyperscale (HA and streaming replication, tenant isolation - I'm especially interested in the latter one)? Are there any plans for PG to move toward this direction?


Streaming replication is supported as per https://www.postgresql.org/docs/current/warm-standby.html#ST... . You can likely build shard rebalancing and tenant isolation on top of the existing logical replication feature set. There are some groundwork features for distributed transactions (PREPARE TRANSACTION, COMMIT PREPARED, ROLLBACK PREPARED), but distributed transactions as such aren't supported out of the box.
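
The two-phase-commit building blocks look like this (requires max_prepared_transactions > 0; the table is hypothetical, and an external coordinator has to drive the protocol across nodes):

    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    PREPARE TRANSACTION 'txn_42';

    -- later, once every participating node has prepared:
    COMMIT PREPARED 'txn_42';
    -- or, to abort:
    -- ROLLBACK PREPARED 'txn_42';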


I see. Thank you for clarifications.


Does anyone know the status of the zheap project? I always hope to see news in the Postgres release notes, but nothing so far.


https://www.cybertec-postgresql.com/en/postgresql-zheap-curr... The last status update is from July, so it seems like things are progressing.


Disappointed by the release. No big changes. Still using processes instead of threads for connections. No built-in sharding/high availability (like SQL Server Always On Availability Groups). No good way to pass session variables to triggers (like the username). No scheduled tasks like in MySQL. Temporal tables are still not supported 10 years after the spec was ready.


the sense of pride is palpable (and well deserved)


Can someone who uses Babelfish for PostgreSQL compatibility with SQL Server commands please describe their experience, success, hurdles, etc. We would move to PostgreSQL if made easier by such a tool. Thanks!


I haven't been able to play with it yet.

This was partly due to my lack of SQL Server projects, but mostly due to the lack of availability of the supposedly Apache-licenced sources and/or binaries (and not wanting to configure AWS for 'just testing').


It's currently in a limited preview - it's not even accessible outright to every AWS customer.


Yep. I'm quite interested in when the 'somewhere in 2021' will be, as the last mention of Babelfish from an Amazon employee on the mailing lists was at the end of March and I haven't heard news about the project since.


How does everyone do postgresql upgrades with the least amount of downtime?


Fastest non-intrusive way I know in RDS or any other environment which allows you to spin up a new box:

* Set up a replica (physical replication)

* Open a logical replication slot for all tables on your old master (WAL will start accumulating)

* Make your replica a master

* Upgrade your new master using pg_upgrade, run analyze

* On your new master, subscribe to the logical replication slot from your old master (the slot you created earlier); logical replication will now replicate all changes that have occurred since you created the slot

* Take down your app, disable logical replication, switch to the new master

You can do the upgrade with zero downtime using Bucardo multi-master replication, but the effort required is much, much higher and I'm not sure it is really feasible for a big instance.
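
A rough sketch of the logical-replication legs of the steps above (publication/slot names and the connection string are placeholders, not a tested recipe):

    -- On the old master: publish all tables and pre-create a slot so changes accumulate
    CREATE PUBLICATION upgrade_pub FOR ALL TABLES;
    SELECT pg_create_logical_replication_slot('upgrade_slot', 'pgoutput');

    -- On the upgraded new master: attach to the pre-created slot and skip the
    -- initial copy, since the data already arrived via physical replication
    CREATE SUBSCRIPTION upgrade_sub
        CONNECTION 'host=old-master dbname=app user=replicator'
        PUBLICATION upgrade_pub
        WITH (create_slot = false, slot_name = 'upgrade_slot', copy_data = false);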


For Pg and MySQL I usually have to resort to replicating to a newer instance then cutting over. Tools like pg_upgrade offer promise but I rarely have the time or access to test with a full production dataset. Hosting provider constraints sometimes limit my options too, such as no SSH to underlying instances.


Bucardo is great for replication.


The query parallelism for foreign data wrappers brings PostgreSQL one step closer to being the one system that can tie all your different data sources together in one place.

Really exciting stuff.


Anyone know where to download it? Looks like they haven't updated the download pages, even though they link to them at the end of this post.



PostgreSQL is my favorite database, but I wish it was possible to use it as a library like sqlite. That would let me use it in a lot more places.


Anyone know when it will be available on Azure Flexible Server? Also, does anyone know when Flexible Server will leave Preview status?


Congrats to the team. I cannot wait to work on v14.


Here we are, at a fantastic version 14, and still no sign of a MySQL AB-like company able to provide support and extensions for a great piece of open source software. There are a few small ones, yes, but nothing at the billion-dollar size.

I am still unable to understand why.


I thought this is what Enterprise DB was? Or am I misinformed? I concede that is indeed a possibility.


Which extensions do you think would be enough of a value add to support a billion dollar company?


Microsoft and Citus are "a little" over a billion?


I hope you don’t think Microsoft/Citus is going to continue to support PostgreSQL installations on anything other than Azure.

This is all about the battle of the clouds for them.


Doesn't matter what I think, it's an option though.



