If I may hijack the thread with some more general complaints though, I wish the Postgres team would someday prioritize migration. Like make it easier to make all kinds of DB changes on a live DB, make it easier to upgrade between postgres versions with zero (or low) downtime, etc etc.
Warnings when the migration you're about to do is likely to take ages because for some reason it's going to lock the entire table, instant column aliases to make renames easier, instant column aliases with runtime typecasts to make type migrations easier, etc etc etc. All this stuff is currently extremely painful for, afaict, no good reason (other than "nobody coded it", which is of course a great reason in OSS land).
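To make the first point concrete, here's a minimal sketch (table and column names invented): a type change typically rewrites the whole table under an ACCESS EXCLUSIVE lock, while a rename is just a catalog update - and nothing warns you up front which of the two you're about to run.

```sql
-- Rewrites every row and holds an ACCESS EXCLUSIVE lock for the duration:
ALTER TABLE orders
    ALTER COLUMN total TYPE numeric(12,2) USING total::numeric(12,2);

-- By contrast, a rename is a near-instant, catalog-only change:
ALTER TABLE orders RENAME COLUMN total TO total_amount;
```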
I feel like there's a certain level of stockholm syndrome in the sense that to PG experts, these things aren't that painful anymore because they know all the pitfalls and gotchas and it's part of why they're such valued engineers.
We currently use MongoDB and while Postgres is attractive for so many reasons, even with Amazon Aurora's Postgres we still need legacy "database maintenance windows" in order to achieve major version upgrades.
With MongoDB, you're guaranteed single-prior-version replication compatibility within a cluster. This means you spin up an instance with the updated version of MongoDB and it catches up to the cluster. Zero downtime, seamless transition. There may be a handful of cancelled queries, but those are retryable, and with retryable writes and the right write concern preferences there is no loss of writes. e.g. MongoDB 3.6 can be upgraded to MongoDB 4.0 without downtime.
Edit: Possibly misinformed but the last deep dive we did indicated there was not a way to use logical replication for seamless upgrades. Will have to research.
As someone that mostly shared that opinion for the last decade or more, I recently set up a cluster for work, and everything seems much more production-quality than I remember or what I assumed it was going to be like. I'm not the one using it for queries every day, but I did do a bunch of testing for replication and failed nodes to confirm that I understood (and could rely on) the claims of robustness, and it seemed to be stable, with good documentation of what to expect in different scenarios and how to configure it (which is not what I experienced doing the same testing back in 2010-2011).
All in all, my impression of MongoDB now is that they're one of those "fake it till you make it" success stories, where they leveraged their popularity into enough momentum to fix most of their major problems.
One thing that turned me away from MongoDB was their utter lack of care for your data integrity that they displayed for years. Some of those instances were even documented. Then there were some bad defaults - some could _also_ cause data loss.
For any component that's viewed as a database (as opposed to, say, cache), data integrity is one of the most important metrics (if not THE most).
In contrast, PostgreSQL data loss bugs are rare - and are treated extremely seriously. Defaults are sane and won't lose data. It's one of the few databases I'm pretty confident that data will be there even if you yank a server power cord mid writes.
Has MongoDB improved? Yes, leaps and bounds (it seems to still fail Jepsen tests, though). But I can't help but feel that it should have been released as a beta product, instead of claiming it was production ready. It wasn't. Maybe it is now. I'd still evaluate other alternatives before considering it.
That said, one thing that always amuses me is how MongoDB gets mentioned in the same context as PostgreSQL. If PostgreSQL would meet your needs, it's unlikely that MongoDB would. And vice versa (but maybe something else like Cassandra would).
Postgres with tables that are just an ID and a JSONB column nowadays gives you practically everything you'd want out of MongoDB.
You can add deep and customized indices as desired, you can easily shard with Citus, and if you want to live without transactions you'll see equally good if not better performance - with the option to add ACID whenever you want. The developer experience argument, where the ->> operator was more confusing than brackets, is now moot.
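For anyone who hasn't seen the pattern, a rough sketch (names are illustrative):

```sql
-- A MongoDB-style "collection": a surrogate key plus a JSONB document
CREATE TABLE events (
    id  bigserial PRIMARY KEY,
    doc jsonb NOT NULL
);

-- GIN index so containment queries stay fast as the table grows
CREATE INDEX events_doc_idx ON events USING GIN (doc jsonb_path_ops);

-- Query documents by a nested predicate, pull fields out with ->>
SELECT id, doc->>'user' AS username
FROM events
WHERE doc @> '{"type": "login", "meta": {"source": "mobile"}}';
```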
As a former MongoDB user, there were good synergies between MongoDB and Meteor back in the day, and I loved that tech, but between Materialize and Supabase, you have vastly more options for realtime systems in the Postgres ecosystem.
Although MongoDB claims in an undated article entitled "MongoDB and Jepsen"[65] that their database passed Distributed Systems Safety Research company Jepsen's tests, which it called “the industry’s toughest data safety, correctness, and consistency Tests”, Jepsen published an article in May 2020 stating that MongoDB 3.6.4 had in fact failed their tests, and that the newer MongoDB 4.2.6 has more problems including “retrocausal transactions” where a transaction reverses order so that a read can see the result of a future write.[66][67] Jepsen noted in their report that MongoDB omitted any mention of these findings on MongoDB's "MongoDB and Jepsen" page.
a) This is a line of code from 2013 and was fixed weeks after.
b) Based on the JIRA [1] it was designed to only log 10% of subsequent failures where there is no connection to prevent log flooding. You would still get the initial failure message.
Pretty reasonable technique and hardly the worst code ever.
MongoDB is like MySQL. I am so so so tired of hearing "that's been fixed, it's great now", doing a paper-thin dive into things, and seeing there are massive problems still.
I used MongoDB with Spring Data, it is impressively seamless.
It's just that there are way too many people who have sold snake oil for a decade-plus now, and I don't trust what they say anymore, and won't for a long long time.
Let's be clear, I definitely don't think it's great. It's just that my immediate response prior to six months ago was to laugh at the mere suggestion it be put into production.
The only reason it actually was put into production is because we had a vendor requirement on it (and why they thought it was sufficient, I'm not sure).
There's a difference between "not suitable for anything because it's so buggy and there's been so many problems over the years" and "not suitable as a replacement for a real RDBMS for important data". For the former, I think my opinion was possibly a little harsh for the current state of it. For the latter, yeah, I'm not going to blindly trust it for billing data and processing yet, that's for sure.
I wrote a few small test programs that ran continuous inserts against the master, and tested shutting down, firewalling off, and killing the process of different members of the cluster, checking how it recovered and whether data loss was experienced by comparing data sets.
It was sufficient for me to not feel like we were taking on undue risk by using it, and since our use case is not one where we're in major trouble if a problem does come about (restoring from daily backups should be sufficient) and we're not doing anything transactional, that's good enough. As I mentioned earlier, it was a vendor requirement, so we just wanted to make sure it wasn't something that was problematic enough to make us question the vendor's decision making.
> All in all, my impression of MongoDB now is that they're one of those "fake it till you make it" success stories, where they leveraged their popularity into enough momentum to fix most of their major problems.
The downside being that their reputation is now somewhat charred.
> All in all, my impression of MongoDB now is that they're one of those "fake it till you make it" success stories, where they leveraged their popularity into enough momentum to fix most of their major problems.
That's not all bad. The same could be said of MySQL. Both DBMS prioritized ease of use over data integrity in the early going.
And yet PostgreSQL making the exact opposite choice has really paid off in the longer run. People used to dismiss it as simply a toy for academics to play with, and look where the project is today. It can easily surpass most NoSQL databases on their home turf.
To be fair, PostgreSQL 15 years ago also had a lot of problems storing data reliably. Some of them manifested as performance issues. I also heard a fair number of war stories about corruption with "large" databases (e.g., 1TB+). PG replication lagged MySQL for many years as well. These seem to be non-issues today.
At this point there's effectively no difference in the durability of data stored in MySQL or PostgreSQL, so it's hard to argue that one or the other made a better choice. They just got there by different paths.
In fact, PostgreSQL is winning back share in part because of licensing. GPLv2 is limiting for a lot of applications, and there continue to be concerns about Oracle ownership. It's also absorbed a lot of features from other databases like JSON support. That's not special to PostgreSQL though. It's been a trend since the beginning for SQL RDBMS and explains why they have stayed on top of the OLTP market for decades.
Some things seem to have changed from 2018, but MongoDB was by far the worst database I ever had the displeasure of using (and Amazon DocumentDB was even worse).
Posting old Jepsen analyses is like pointing at old bug reports. Every time Jepsen finds a bug we fix it lickety-split. I know it's not cool to focus on that fact, but it is a fact. The Jepsen tests are part of the MongoDB test suite, so when we fix those problems they stay fixed.
I would love to hear your personal experience of MongoDB as opposed to reposting old Jepsen reports. Perhaps there is something that we can address in 5.1 that is still a problem?
The latest "old Jepsen report" is barely a year old. It's not like digging up dirt from years ago.
It also seems like there was quite a lot wrong even a year ago, quoting from there:
> Roughly 10% of transactions exhibited anomalies during normal operation, without faults.
It's just not a very reassuring response to say "when someone goes to dig a bit and finds a lot of show-stopping bugs, we address those specific bugs quickly".
To me it sounds like the architecture and care just isn't there for a robust data storage layer?
Something that was drilled into me decades ago is that there is no such thing as fixing multi-threaded (or distributed) code via debugging or patching it "until it works".
You either mathematically prove that it is correct, or it is wrong for certain.
This sounds like an oddly strong statement to say, but the guy who wrote the textbook that contained that statement went on to dig up trivial looking examples from other textbooks that were subtly wrong. His more qualified statement is that if a professor writing simplified cases in textbooks can't get it right, then the overworked developer under time pressure writing something very complex has effectively zero chance.
The MongoDB guys just don't understand this. They're convinced that if they plug just one more hole in the wire mesh, then it'll be good enough for their submarine.
PS: The professor I was referring to is Doug Lea, who wrote the "EDU.oswego.cs.dl.util.concurrent" library for Java. This was then used as the basis for the official "java.util.concurrent".
If you use MongoDB as a document store, arguably its core functionality, you're not exposed to any of the weaknesses Jepsen rightly identified and exploited.
Transactions are new to MongoDB and they are not necessary for most. Structure your data model so you only perform single-document atomic transactions ($inc, $push, $pull) rather than making use of multi-document ACID transactions. It's possible, we're doing it for our ERP.
Sharding is something we've intentionally avoided, opting instead for application-layer regional clusters. We specifically wanted to avoid the complexities related to shards that are not a concern for replica sets; durability and maximum recovery time during emergency maintenance caused us to avoid them.
Yes, and most of these love/hate memes are blown out of proportion by people who don't actually have any real expertise in those technologies, but just parrot whatever they've read in some memes.
You're exactly correct. Tons of "XYZ is bad" because of some meme that they don't even understand or have context on that hasn't been relevant for years.
I have no idea if MongoDB is good or bad at this point, but the comments of "haha it's mongo" are completely devoid of meaningful content and should be flagged.
I was part of a team that operated a large Mongo cluster for most of the last decade. I would not have advised anyone to use Mongo as their durable source of truth database then, and I still don't think it's advisable to do so now. On numerous occasions, Mongo demonstrated the consequences of poor engineering judgment and an addled approach to logic in critical components responsible for data integrity. In addition, Mongo internalized many poor patterns with respect to performance and change management. Mongo did not, and does not provide the data integrity or performance guarantees that other databases internalize by design (the WiredTiger transition helped, but did not cure many of the issues).
PostgreSQL introduced JSONB GIN index support sometime around 2015, making Postgres a better fit for most JSON-based applications than Mongo.
My issue isn't with people not liking Mongo. It's with contentless meme posts. Your post has real information that adds value to the conversation, and I appreciate that you took the time to write it out.
Because it's (or at least it definitely WAS) true.
There are valid use-cases for Mongo, but for the vast majority of things you're better off starting with Postgres. And I say that as an early adopter - I really wanted Mongo to succeed but it just failed all of my expectations. All of them.
> Edit: Possibly misinformed but the last deep dive we did indicated there was not a way to use logical replication for seamless upgrades. Will have to research.
We seem to have been misguided by all of the Amazon RDS and Aurora documentation. It seems Amazon prefers to implement postgres logical replication through their database migration service. All upgrades are typically done through pg_upgrade which does require downtime.
Interesting. I can't wait to see how PG12 influences future offerings from the cloud providers for more seamless major version upgrades.
You might choose Postgres with JSON as an alternative to MongoDB though. There are plenty of people pushing the limits of MongoDB who are researching it if not just for access to a larger pool of DBAs who can work for them.
Migrations are quite highly prioritized in PostgreSQL. PostgreSQL has the by far best migration support of any database I have worked with. There is of course a lot of work left to do but that is true for many other areas in PostgreSQL.
Also I can't agree at all with "nobody coded it", patches in this area generally welcome. My first real patch for PostgreSQL was about improving migrations. And there has been talk about several of the ideas you propose, but nobody is working on any of them right now. So I mostly think it is a lack of resources in general.
Sorry, I didn't mean offense.
What I meant with "nobody coded it" is that the migration DX features that don't exist yet, likely don't exist simply because they haven't been made yet (and not because eg they're architecturally impossible or because the postgres team are stupid or sth).
It's hard to complain about OSS without attacking the creators; I tried to do that right but clearly I failed nevertheless :-) Thanks for your contributions!
"Social Justice" something, probably, like in "SJW" = "Social Justice Warrior".
Some people see politeness or compassion as weakness, and therefore use terms acknowledging it as pejoratives. The funny thing is, they think this makes objects of derision of their targets, and don't realise that it's themselves they make objects of derision.
Ah, I suppose that makes sense. It's weird to me how 'Social Justice Warrior' is derogatory, because it puts you against people fighting for justice. I get the mind games (I think) you have to play to arrive at the conclusion that SJW is a good insult, but it's got this "are we the baddies?" energy, to me.
I love postgresql, as long as I don’t have to do upgrade. I have yet to see a successful zero downtime upgrade. That being said, other databases aren’t that much better. Maybe except SQLite.
They are slowly getting there. For example, postgres 12 added REINDEX CONCURRENTLY. Under the hood it's just recreating the index and then name swapping and dropping the old one. Basically what pg_repack was doing.
There's a huge collection of tricks out there that just need to become formal features. The trick I'm working with today is adding a check constraint with NOT VALID and then immediately calling VALIDATE, because otherwise it takes a very aggressive lock that blocks writes. That could easily become ALTER TABLE CONCURRENTLY or something.
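Concretely, the trick looks something like this (illustrative table/constraint names):

```sql
-- NOT VALID skips the full-table scan, so the ACCESS EXCLUSIVE lock is
-- only held for an instant:
ALTER TABLE payments
    ADD CONSTRAINT payments_amount_positive CHECK (amount > 0) NOT VALID;

-- Validation scans the table but only takes a SHARE UPDATE EXCLUSIVE
-- lock, so reads and writes keep flowing:
ALTER TABLE payments VALIDATE CONSTRAINT payments_amount_positive;
```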
Do you know a good resource with these tricks? I often struggle to predict exactly which schema migrations will lock tables aggressively and what the smartest workaround is.
We've had good success with https://github.com/sbdchd/squawk to lint migrations. It tells you if a query is going to lock your table (as long as it's written in SQL, not some ORM DSL)
An orthogonal migration issue which I'm hitting right now: we need to migrate from heroku postgres to aws rds postgres, and I'm stressed about the risk and potential downtime in doing so. If there was a way to make a replica in rds based on heroku, promote the rds replica to be the primary, hard switch our apps over to rds, that'd be a lifesaver.
I'm working through this blog post [1] now, but there is still a bit to be defined (including a dependency on heroku's support team) to get this rolling.
Why is the migration required? Heroku Postgres doesn't support logical replication, and logical replication is required for any ELT vendor (Fivetran, Stitch, Airbyte) to use Change Data Capture to replicate data from Postgres to Snowflake (while replicating deleted rows efficiently).
Note: I've also read this ebook [2], but this approach requires downtime.
Note 2: I reached out to heroku support and asked if logical replication was on their short term roadmap. They said they've heard this quite a bit, but nothing tangible is on the roadmap.
If anyone has any thoughts on the above migration, I'd be all ears. :)
I did that exact migration. Unfortunately, to my knowledge, there's no way to do it with zero downtime. You need to make your app read only until the RDS instance has ingested your data, then you can cut over. For me, that was roughly one gigabyte of data and took about forty seconds.
My best advice is to automate the whole thing. You can automate it with the Heroku and AWS CLIs. Test on your staging site until you can run through the whole process end to end a few times with no interruptions.
Yep, absolutely garbage that these clouds (Azure is another one) don't allow you to replicate with external systems. Pretty much devalues their entire hosted postgresql offering if you ask me, since it's just designed to keep you locked in (duh).
If you have any significant amount of data where you're worried about a migration, stay far away from hosted postgres offerings. You'll never get your data out without significant downtime.
There are other ways to handle this at the application level, to be clear, using dual read & write and backfill. More relevant when you have TB+++ of data.
Interesting. I've done dual-writes at the application level to migrate the datastore for a smaller feature (branch by abstraction), but never for an entire application. And the code path was quite simple, so it was easy to think about all of the edge cases at one time in your head.
Do you have any resources which talk through the read/write/backfill approach?
So, basically, Postgres would have a replication port which can be used for both replication/clustering and transfer across cloud providers. And sharding. </dreaming>
We've moved a number of customers from Heroku over to Crunchy Bridge with essentially no down time, am currently helping one customer with 7TB through that process. It's not over to RDS, but would be happy to talk through process if helpful. And we do support logical replication and have many people using wal2json/logical replication with us.
> Why is the migration required? Heroku Postgres doesn't support logical replication
You could possibly hack together some form of higher-layer logical replication via postgres_fdw and database triggers. A comment ITT references this as a known technique.
One possible solution for the ETL stuff might be to use Heroku Kafka for the Change Data Capture and then from that Kafka you can move it someplace else.
Interesting that you bring this up. I looked into heroku's streaming connectors to facilitate an integration with materialize.com, but Heroku's support team wasn't confident we could sync all 187 postgres tables under 1 connection.
Interesting, what was it from that podcast that made you reconsider? Always eager to learn about opportunities for improving the experience of using Debezium.
Oh wow, by "work on" you mean "the core maintainer of". Thank you for replying. :)
The main part I reconsidered was the level of effort in taking the data from Kafka and landing it into Snowflake, especially around handling Postgres schema changes safely. I also have no experience with Kafka, so I'd be out of my depth pretty quickly for a critical part of the architecture. He also expressed the need for building quality checks into the Kafka-to-Snowflake code, but those details were a bit sparse (if I recall correctly).
Note: all of the above are probably outside the scope of debezium. :)
Note 2: your article [1] on using cdc to build audit logs w/ a "transactions" table blew my mind. Once I listened to your data engineering podcast interview [2], I knew there was some implementation of "event sourcing lite w/ a crud app" possible, so I was excited to see you had already laid it out.
Gotcha, yeah, there's many things to consider indeed when setting up end-to-end pipelines. Thanks for the nice feedback, so happy to hear those resources are useful for folks. As far as event sourcing is concerned, we got another post [1] which might be interesting to you, discussing how "true ES" compares to CDC, pros/cons of either approach, etc.
Adding to your list of options that still require _some_ downtime: we used Bucardo [0] in lieu of logical replication. It was a bit of a pain, since Bucardo has some rough edges, but we made it work. Database was ~2 TiB.
Coming from the outside, with zero understanding of the internal details, my hunch is the same: lack of support for logical replication is more of a business decision than a technical decision. (But again, this is a hunch -- partially based on how good Heroku is from a technical perspective.)
It's absolutely an evil business decision, and all the clouds are playing this game. Don't ever use a hosted database solution if you're thinking about storing any significant amount of data. You will not be able to get it out without downtime.
> I feel like there's a certain level of stockholm syndrome in the sense that to PG experts, these things aren't that painful anymore
I don't know if I’m a PG expert, but I just prefer “migration tool” to be a separate thing, and for the DB server engine to focus on being an excellent DB server engine, with the right hooks to support a robust tooling ecosystem, rather than trying to be the tooling ecosystem.
As a developer I fully support the notion of splitting the tools out from the server engine, like things are today.
But, realistically, pg_upgrade's functionality would need to be integrated into the server itself if we're ever going to have zero-downtime upgrades, right?
I don't know how other RDBMSs handle this, if at all
Yeah, the comment I was responding to addressed two different kinds of migration—schema migration and version upgrades—and my comment really applies more to schema migration than version upgrades; more support for smoothing the latter in the engine makes sense.
> If I may hijack the thread with some more general complaints though, I wish the Postgres team would someday prioritize migration.
A thing I'm interested in is a 'simple' replication setup to reduce single points of failure. We currently use Galera with My/MariaDB/Percona and it's quite handy for HA-ish needs: we can have two DBs and the garbd running on the web app server.
Pointers to tutorials for Debian/Ubuntu to accomplish something similar would be appreciated. (We run things on-prem in a private cloud.)
There's no such thing as "simple" when it comes to HA setups, the requirements are simply too varied. PostgreSQL has great documentation for their HA featureset, but when it comes to systems-level concerns (detecting that a primary is down and arranging promotion of a replica to primary) you're expected to address those on your own.
With our Galera setup we have a keepalived health check look at the local system, and if it fails/times out it stops sending heartbeats so the other side takes over the vIP. If one system crashes, the vIP fails over as well.
Or maybe it is you who are underestimating the technical complexity of the task? A lot of effort has been spent on making PostgreSQL as good as it is on migrations. Yes, it is not as highly prioritized as things like performance or partitioning but it is not forgotten either.
My complaint would be that there is no standard multi-master solution for Postgres, whereas MySQL now has group replication as a native multi-master solution.
PostgreSQL is one of the most powerful and reliable pieces of software I've seen run at large scale, major kudos to all the maintainers for the improvements that keep being added.
> PostgreSQL 14 extends its performance gains to the vacuuming system, including optimizations for reducing overhead from B-Trees. This release also adds a vacuum "emergency mode" that is designed to prevent transaction ID wraparound
Dealing with transaction ID wraparounds in Postgres was one of the most daunting but fun experiences for me as a young SRE. Each time a transaction modifies rows in a PG database, it increments the transaction ID counter. This counter is stored as a 32-bit integer and it's critical to the MVCC transaction semantics - changes made by a transaction with a higher ID should not be visible to a transaction with a lower ID. If the counter hits roughly 2 billion and wraps around, disaster strikes, as past transactions now appear to be in the future. If PG detects it is reaching that point, it complains loudly and eventually stops further writes to the database to prevent data loss.
Postgres avoids getting anywhere close to this situation in almost all deployments by performing routine "auto-vacuums" which mark old row versions as "frozen" so they are no longer using up transaction ID slots. However, there are a couple situations where vacuum will not be able to clean up enough row versions. In our case, this was due to long-running transactions that consumed IDs but never finished. Also it is possible but highly inadvisable to disable auto-vacuums. Here is a postmortem from Sentry who had to deal with this leading to downtime: https://blog.sentry.io/2015/07/23/transaction-id-wraparound-...
It looks like the new vacuum "emergency mode" functionality starts vacuuming more aggressively when getting closer to the wraparound event, and as with every PG feature highly granular settings are exposed to tweak this behaviour (https://www.postgresql.org/about/featurematrix/detail/360/)
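If you want to keep an eye on how close a cluster is getting to wraparound, a query along these lines against the standard catalogs does the trick:

```sql
-- Age (in transactions) of the oldest unfrozen XID per database.
-- autovacuum_freeze_max_age defaults to 200 million; the hard stop
-- is at roughly 2 billion.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```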
> Also it is possible but highly inadvisable to disable auto-vacuums.
When I was running my first Postgres cluster (the reddit databases), I had no idea what vacuuming was for. All I knew was that every time it ran it slowed everything down. Being dumb, I didn't bother to read the docs, I just disabled the auto vacuum.
Eventually writes stopped and I had to take a downtime to do a vacuum. Learned a few important lessons that day. I also then set it up to do an aggressive vacuum every day at 3am, which was the beginning of low traffic time, so that the auto-vacuuming didn't have as much work to do during the day.
> Each time a transaction modifies rows in a PG database, it increments the transaction ID counter.
It's a bit more subtle than that: each transaction that modifies, deletes or locks rows will update the txID counter. Row updates don't get their own txID assigned.
> It looks like the new vacuum "emergency mode" functionality starts vacuuming more aggressively when getting closer to the wraparound
When close to wraparound, the autovacuum daemon stops cleaning up the vacuumed tables' indexes, yes. That saves time and IO, at the cost of index and some table bloat, but both are generally preferred over a system-blocking wraparound vacuum.
It would increase disk usage by a significant amount, since transaction IDs appear twice in tuple headers (xmin/xmax). Essentially they are overhead on every database row. This submission has a discussion on it: https://news.ycombinator.com/item?id=19082944
By default MS SQL uses pessimistic locking, depending on isolation levels. There's only one version of the data on disk and the isolation level of a transaction determines what happens - for example if a transaction in SERIALIZABLE reads a row SQL Server takes a shared read lock on that row preventing any other transaction from writing to it.
MS SQL also has snapshot (and read committed snapshot) isolation levels. These are much more like the Postgresql isolation levels - in fact Postgres only has two 'real' isolation levels, read committed and serializable, you get upgraded to the next higher level as permitted in the spec.
In snapshot isolation, instead of taking a lock, SQL Server copies the row to the version store in TempDB when it would be overwritten; additionally it adds a 14-byte row version to each row written. There's a lot of detail here: https://docs.microsoft.com/en-us/sql/relational-databases/sq...
This is also why MS SQL maintains a clustered index on the main table - the main table only contains the latest globally-visible version, so it can be sorted. Postgres stores all versions (until vacuum removes dead rows), so the main table is a heap, only indexes are sorted.
It has a similar concept if you need MVCC (with InnoDB). It also has a concept of transaction IDs. And also need to clean them up (purge). They will both have table bloat if not done.
Since details matter, there's a post that explains it far better than I could:
I believe it's mostly how much code needs to be changed, which they are working towards slowly, but there is also more overhead (memory / disk) associated with those larger data types, which are used everywhere:
I know this isn't even a big enough deal to mention in the news release, but I am massively excited about the new multirange data types. I work with spectrum licensing and range data types are a godsend (for representing spectrum ranges that spectrum licenses grant). However, there are so many scenarios where you want to treat multiple ranges like a single entity (say, for example, an uplink channel and a downlink channel in an FDD band). And there are certain operations like range differences (e.g. '[10,100)' - '[50,60)'), that aren't possible without multirange support. For this, I am incredibly grateful.
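As a quick illustration of that last point:

```sql
-- Plain ranges cannot represent the split result:
--   SELECT int4range(10, 100) - int4range(50, 60);
--   ERROR:  result of range difference would not be contiguous

-- With PostgreSQL 14 multiranges the difference is expressible:
SELECT '{[10,100)}'::int4multirange - '{[50,60)}'::int4multirange;
-- => {[10,50),[60,100)}
```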
Also great is the parallel query support for materialized views, connection scalability, query pipelining, and jsonb accessor syntax.
Multiranges are one of the lead items in the news release :) I do agree that they are incredibly helpful and will help to reduce the complexity of working with ranges.
Putting the key in /etc/apt/trusted.gpg.d/ gives it too much power over all the repositories. A malicious (maybe compromised) third-party repository could publish a package that replaces an official Debian package.
So trusted.gpg.d/ is not the recommended method. For more information, see the official Debian wiki which states "The key MUST NOT be placed in /etc/apt/trusted.gpg.d" https://wiki.debian.org/DebianRepository/UseThirdParty
> The key MUST NOT be placed in /etc/apt/trusted.gpg.d or loaded by apt-key add.
And yet the snippet that I quoted has the latter command.
Further: the link has the example "[signed-by=/usr/share/keyrings/deriv-archive-keyring.gpg]". Perhaps it's my BSD upbringing showing through, but shouldn't only/mostly OS-provided stuff generally live in /usr/share? Shouldn't locally-added stuff go into /usr/local/share? Or perhaps creating a /etc/apt/local.gpg.d would be appropriate?
The idea is that you put the keyring in the same place where the -keyring package will install it, so the package overwrites it and you don't end up with a leftover temporary keyring (downloaded with wget) that needs deleting.
Unless you are auditing all third-party packages before installing them, any package can modify /etc/apt/trusted.gpg.d/ by dropping a file in there, or from its postinst/etc scripts at installation time etc. So using the signed-by mechanism isn't much of a security enhancement without the auditing.
I second this, upgraded 100 odd instances using pg_upgrade. And it's lightning fast, takes less than a min whether the db cluster is a couple GB or a few TB.
Just make sure that you always run the check mode to catch incompatibilities between versions.
I think you can use -o '-D /etc/postgresql/9.5/main' -O '-D /etc/postgresql/13/main' to avoid making the symlink to postgresql.conf. This should also make it easier if postgresql.conf tries to include files from a conf.d subdirectory.
Looks like the best approach might be to use the in-place upgrade tool that ships with Postgres to upgrade to v10 (use Postgres v10 to perform this upgrade). From there you'd be able to create a fresh v14 instance and use logical replication to sync over the v10 database, before briefly stopping writes while you swap over to the v14 database.
EDIT: Looks like pg_upgrade supports directly upgrading multiple major versions. So maybe just use that if you can afford the downtime.
Native logical replication (making it possible to upgrade without downtime) was introduced in pgSQL 10. But if you're going to have downtime anyway, there's no reason not to do the upgrade in a single step. pg_upgrade should support that quite easily.
also, logical replication has some issues that make it not necessarily perfect for cross version migration: schema changes and sequence usages are not synced.
While schema changes are probably not much of a problem because that's something you can easily prevent during migration, sequence usage is because once you fail over to the logically replicated secondary, you will have to reset all sequences to their current values or all further inserts into tables with sequences will fail.
The other option, of course, is to not use sequences but rather use UUIDs for your primary keys, but those come with their own collection of issues.
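For what it's worth, the sequence fix-up after failover is usually a one-liner per table, something like this sketch (assuming plain serial/identity keys; names invented):

```sql
-- Bump each sequence past the highest value already replicated:
SELECT setval(pg_get_serial_sequence('orders', 'id'),
              (SELECT COALESCE(MAX(id), 1) FROM orders));
```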
Because versions below 10 don't support logical replication. The alternative would be to use a 3rd-party extension for replication, which may well be a good option, but I don't have any experience with that so I can't really comment.
With downtime, I guess the pg_upgrade tool works fine.
Without downtime / near-zero downtime is more interesting though. Since this is an old version, something like Bucardo maybe? It can keep another pg14 instance in sync with your old instance by copying the data over and keeping it in sync. Then you switch your app over to the new DB and kill the old one.
Newer versions make this even easier with logical replication - just add the new DB as a logical replica of the old one, and kill the old one and switch in under 5 seconds when you're ready.
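In its simplest form that's just a publication and a subscription, roughly like this (assuming all tables have primary keys / replica identities; connection details are placeholders):

```sql
-- On the old primary:
CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

-- On the new PG 14 server, after restoring the schema there:
CREATE SUBSCRIPTION upgrade_sub
    CONNECTION 'host=old-primary dbname=app user=replicator'
    PUBLICATION upgrade_pub;

-- Once it has caught up, switch the application over and drop the
-- subscription on the new server.
```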
I think I remember a talk where someone manually set up logical replication with triggers and postgres_fdw to upgrade from an old server with zero downtime.
If you can afford a short downtime I would recommend just running pg_upgrade and upgrade directly to 13. Preferably test it out first before doing it in production.
Yeah, this is how we do it... Even with terabytes of data it seems to be pretty efficient and robust, and it's easy to just keep the replicating version up until you're satisfied it all looks good.
Depends on whether you can shut down the server for some time or not. If you can live with some downtime, just use pg_upgrade if you install the new server on the same machine: https://www.postgresql.org/docs/current/pgupgrade.html
We are currently using pg_upgrade to go from 9.6 to 13 in our systems. It's supported and works well. We are using the 'hardlinks' feature, which makes it extremely fast as well.
Do be aware that using the "hardlinks" feature will mean that restarting the old database version will not work properly and may not be safe. You should make sure to copy the old directory beforehand if you might need that, or simply restore from backup.
I'd argue you shouldn't be performing an upgrade at all without a proper backup. But yes, absolutely do not use a hardlinks upgrade unless you have a backup laying around.
This is one of the major pain points with PostgreSQL.
If you don't absolutely need any new features and uptime is important, you can just continue using PostgreSQL 9.6 even though it is EOL. https://www.postgresql.org/support/versioning/
It will most likely work great for many more years.
I wish future versions of PostgreSQL would have some backwards compatibility for old system/data tables/data structures and be able to do a live migration when running a newer release.
This is not really true for a number of reasons. pg_upgrade is absolutely the preferred method if you are able to spare small downtime. It will absolutely work to upgrade your existing data structures to be compatible with the new Postgres version, so not sure what you even mean.
It doesn't automatically re-create indexes (to take advantage of new features) for example, but that is likely something you don't want to do right away (incurring extra downtime) when you are doing an upgrade anyways. You can easily just REINDEX after the upgrade is complete.
If you read the link about pg_upgrade, you're not guaranteed that pg_upgrade will work if the underlying data storage format changes. pg_upgrade may not work in a future version of PostgreSQL. It works for version 14 though.
I converted from MySQL (before the whole MariaDB fork), and I've been happier with every new version. My biggest moment of joy was JSONB, and it keeps getting better. Can we please make the connections lighter so that I don't have to use stuff like pgbouncer in the middle? I would love to see that in future versions.
FWIW Mysql 8 has gotten a lot better in standards compliance and ironing out legacy quirks, with some config tweaks. While my heart still belongs to PostgreSQL things like no query hints, dead tuple bloat (maybe zheap will help?), less robust replication (though getting better!), and costly connections dampens my enthusiasm.
The benchmark linked in the comments shows 7-38% improvements. Nothing to scoff at, but if you need PGBouncer that probably won't make enough of a difference.
It certainly isn't much of an improvement in connection latency (the connections are still pretty heavy), but it is a massive improvement in transaction throughput with higher numbers of connections. If you scroll down a bit, there is now higher TPS even at 5000 connections than previously could be had at 100 connections. That fixes a massive amount of pain that previously only could be solved with pgbouncer.
I'd be curious to see if the concurrency improvements in PostgreSQL 14 help with increasing the threshold for when you need to introduce a connection pooler.
Lighter connections would finally allow for using lambda functions that access a Postgres database without needing a dedicated pgbouncer server in the middle.
PostgreSQL is one of those tools I know I can always rely on for a new use-case. There are very few cases where it can't do exactly what I need (large scale vector search/retrieval).
Congrats on the 14.0 release.
The pace of open source has me wondering what we'll be seeing 50 years from now.
Also there's Pinecone (https://www.pinecone.io) which can sit alongside Postgres or any other data warehouse and ingest vector embeddings + metadata for vector search/retrieval.
I think a dense vector is the opposite of a sparse vector
i.e. in a dense vector every value in the vector is stored
whereas sparse vectors exist to save space when you have large vectors where most of the values are usually zero - they reconstruct the full vector by storing only the non-zero values, plus their indices
I need a self managed solution, so I'm not sure Pinecone is feasible and I don't think pgvector scales well enough for my use-case (hundreds of millions of vectors).
So far I think I'm going to go with Milvus[1], ideally I'd just have a foreign data wrapper for Milvus or FAISS.
Fantastic piece of software.
The only major missing feature that I can think of is automatic incremental materialized view updates.
I'm hoping that this good work in progress makes it to v15:
https://yugonagata-pgsql.blogspot.com/2021/06/implementing-i...
This has been a major one I've wanted for a long time too, but sadly its initial implementation will be too simple to allow me to migrate any of my use cases to it.
This looks like an amazing release! Here are my favorite features in order:
• Up to 2x speed up when using many DB connections
• ANALYZE runs significantly faster. This should make PG version upgrades much easier.
• Reduced index bloat. This has been improving in each of the last few major releases.
• JSON subscript syntax, like column['key']
• date_bin function to group timestamps to an interval, like every 15 minutes.
• VACUUM "emergency mode" to better prevent transaction ID wraparound
Somewhat related, but does anybody have suggestions for a quality PostgreSQL desktop GUI tool, akin to pgAdmin3? Not pgAdmin 4, whose usability is vastly inferior.
DBeaver is adequate, but not really built with Postgres in mind.
I've been trying out Beekeeper Studio [https://www.beekeeperstudio.io/] recently, and like that it supports both MySQL and PostgreSQL (and others I don't use). The interface takes a little bit to get used to, but it's been pretty powerful for me.
Before that, or when I'm in a rush to just get something done as I adjust to Beekeeper, I use Postbird (an Electron app) [https://github.com/Paxa/postbird]
Beekeeper Studio maintainer here. It's 100% FOSS and cross platform (Windows, MacOS, Linux).
I love Postgres and started Beekeeper because I wanted 'Sequel Pro for PSQL on Linux'. We have a lot of users so seems like we're doing something right.
Thank you for your work on it! I used Sequel Pro for a long time for MySQL databases and occasionally use its successor Sequel Ace as well, but I'm moving my workflow to Beekeeper Studio as much as I can.
Seconded. DataGrip is terrific and supports every database type I have ever come into contact with. And it's all JDBC-based so you can add new connectors pretty easily (from within the app, no less. No fiddling with files necessary). I had to do that to do help on a proposal a few years ago for a project that had a Firebird database and Datagrip didn't natively support it.
Going to second this, however I will warn, at least in my experience it is a little bit different from most DB IDEs. I didn't like it at all first time I used it, then a friend told me to give it another try. I've never looked back, fantastic tool.
One of my coworkers uses datagrip. Needing to install mysql specific tooling so that they can take a full database dump is kind of frustrating. Many other tools can do it out of the box, why not datagrip?
PgModeler [0], cannot recommend it enough. You have to compile the free version yourself (it's open source - you pay if you want to directly download the precompiled binaries), and it's a bit of work if you want to include the query optimizer plugin, but there's documentation and GitHub issues with solutions already in place.
Once you compile it for the first time and start using it, you'll keep doing it again with each new version. I haven't found a better Postgres tool yet.
edit: it's a cross-platform tool, supporting Linux, Mac and Windows (32/64)
It looks good, performs very well, and supports WAY more than just pg. It’s an indispensable part of my dev tooling. Bought it what last year, year before maybe? Very happy customer here, zero problems, all gravy.
How does it compare to Postico for working with PG databases? TablePlus seems very similar to Postico, but with support for other DBs, which we don’t need.
I've been using SQL Workbench/J [https://www.sql-workbench.eu/] for quite a while now. Uses JDBC so it'll connect to anything, good SQL formatter, builtin scripting commands, support for XLS import/export, headless mode, and most importantly fast even when loading massive result sets.
I guess I’m still getting used to it, but it always takes a few extra steps to open things like views and functions, autocomplete needs configuring to work properly, and a couple of other features I’m blanking on at the moment. It’s all small stuff, but it can add up.
Yes, they are in separate folders, but don't think it has to do with PG per se. My autocomplete works well, don't think I did anything special. Maybe added a schema to the "search path?" May have been pgcli (which I also recommend).
A few months ago. I had semi-frequent troubles with starting up and freezing (on both Linux and Windows).
I would also easily make accidental GUI changes that I could only revert by reloading original settings.
A less galling example was the introduction of a highlight that marked the line in a query where an error occurred. It was a bright blue color, hard on the eyes, almost completely obscuring the text. It’s a comparatively minor issue, but illustrative of how the tool’s usability was steadily declining.
There’s plenty of GUI tools, but what about tools for writing SQL inside VS Code?
We usually write our DB migrations in VS Code along with any other code changes, but the PG support in VS Code seems to be lacking. Just some naïve validation of PostgreSQL code inside VS Code would be awesome!
I used pgsanity at the CLI for a while when I unfortunately had coding and testing split between two machines. Can be used with any editor. https://github.com/markdrago/pgsanity
pgAdmin 4 has made rapid strides. If you haven't checked it out recently (~6 mos?) you may be pleasantly surprised. Many of the rough edges have been sanded off.
It's still not the most polished thing in the world, but few IDE-ish tools are... particularly free ones.
I've used a lot of very very expensive ones that aren't as polished as pgAdmin4.
The whole "native app with a web interface" thing is... probably not anybody's idea of "ideal," but I 100% respect it. It's free and it's multiplatform. Choices and compromises were made to achieve that and I don't know that I would have done differently when faced with the same constraints.
Yeah, that is a good one. It's lightweight and supports a lot of languages, but it lacks some of the nicer features of Postgres-specific tools. I think I had difficulty listing things by schema.
Sorry for the late response, missed this somehow. I'm realizing as I try to write this that it is a bunch of small things, and maybe it boils down to having used it a long time or it being my first non-garbage (SSMS, PGAdmin) UX db tool. The autocomplete seems to work exactly as I'm expecting, the filtering seems to work a little better than TablePlus. Copying and pasting is a bit more consistent (I do this a lot while dev'ing). The whole UI is lightning fast and always stays responsive.
That all being said, TablePlus is 80 - 90% of that and the fact that it also works with multiple databases is huge. I also love the vertical tabs for multiple connections, editor sidebar pane for single records and tabbed queries vs a single query pane.
Both are fantastic, but if I was postgres only I'd probably stick with postico. As-is I just paid for both.
If you’d like to try out PostgreSQL in a nice friendly hosted fashion then I highly recommend supabase.io
I came from MySQL and so I’m still just excited about the basic stuff like authentication and policies, but I really like how they’ve also integrated storage with the same permissions and auth too.
It’s also open source, so if you want to just host it yourself you still can.
Congrats on an awesome product. It was exactly what I was looking for.
You also get a feeling about a company pretty quick from their docs (I think it’s the best way to judge a company) and I have to say I had the same feeling as using Twilio or Cloudflare for the first time - so you’re in good company.
[supabase cofounder] While we position ourselves as a Firebase alternative, it might be simpler for experienced techies to think of us as an easy way to use Postgres.
We give you a full PG database for every project, and auto-generated APIs using PostgREST [0]. We configure everything in your project so that it's easy to use Postgres Row Level Security.
As OP mentions, we also provide a few additional services that you typically need when building a product - connection pooling (pgbouncer), object storage, authentication + user management, dashboards, reports, etc. You don't need to use all of these - you can just use us as a "DBaaS" too.
Thanks so much and really appreciate you taking the time to respond here.
I think it's fantastic to make deploying existing software/tools easier, and people are definitely willing to pay for the ease, curious though - what prevents the Postgres team from taking supabase contributions (since it's Apache 2.0) and including it in core Postgres?
> what prevents the Postgres team from taking supabase contributions
Absolutely nothing - we are open source and we would encourage it. However we're not modifying Postgres itself at this stage, we're just providing tooling around it. We have hired developers to work specifically on Postgres but we will always attempt to upstream (if the PG community wants it) rather than maintain a fork
as a product designer, I've been with Supabase since its inception. You guys make projects so easy to start prototyping, without me having to think about starting a Postgres droplet or whatever. Thank you so much for making Supabase better every day!!
I prefer to use companies where you're using their bread and butter service. That way you know they won't suddenly lose interest in it, which google has a reputation for, especially anything to do with google cloud.
I just find that what Supabase is trying to do matches up exactly with what I wanted - it solves 90% of my developer headaches and all works together nicely.
Plus I just get good vibes from Supabase in general.
Supabase looks nice. I’m also using Postgres hosted by https://nhost.io/ which also do auth and storage. It looks like Supabase is focused on REST and Nhost is focused on GraphQL?
Don't quote me on this, but I think it used to take quite a while; since 12, things have gotten a lot better. I think they generally wait for the .1 patch and then get it in pretty quick.
If you are a novice or even if just a bit rusty with relational databases I recommend:
- Grab a good book or two. The Art of PostgreSQL is one. There are _many_ others. Take a bit of time to find out what suits you. I typically like reading a more theoretical one and a more practical one (with actionable, concrete advice, actual code in it and so on).
- Get a nice tool/IDE/editor plugin that can interactively work with databases, create them, query them etc. Almost all of the editor utilities that you expect from general purpose programming also applies to programming with SQL.
- PostgreSQL is a very powerful, primarily relational database, but there is very little it cannot do. For example it is perfectly feasible to implement a document (JSON) store with it, and convenient to boot. There are also great extensions such as for temporal tables.
- There is a ton of advice, discussion etc. to find, especially around data modelling. Word of caution: Some of it is religious, at least slightly. The book I mentioned above too to a certain degree, but it is still a very good book. Realize that a relational DB can give you a ton of stuff in a performant, declarative manner. But not everything should live in the DB, nor is everything relational in the strictest sense. Be pragmatic.
Source: I've been diving deeper into this topic for a few years now and still am. The more I learn and am able to apply, the more I understand why the relational paradigm is so dominant and powerful.
As a young developer I stayed away from traditional relational databases because I thought they were boring and restrictive. I quickly fell in love with them. I realized they painlessly provided all the things I was already doing with data and they were doing it much faster and more reliably.
I hope you have the same kind of experience and enjoy Postgres!
As far as actual advice...
1. The key concept of an RDBMS is that each table typically specifies relationships to other tables. So in order to query data you will typically be joining multiple tables. Sometimes new DB programmers get a lot of enlightenment from the simple Venn diagrams that illustrate the basic types of JOINs: https://www.google.com/search?q=inner+join+diagram ...IMHO, once you get this (fairly simple) paradigm the rest is easy.
2. Database constraints are your friends. At a minimum you specify the types of the columns, obviously, but you can go much farther - specifying NOT NULL constraints, foreign keys, and more complex check conditions. Do as much of this as feasible at the database level. Otherwise it is something you need to code and enforce at the application level, where it will generally be less performant and more prone to coder error. Additionally, any constraints not enforced at the database level will need to be implemented in every application if more than one app uses the database.
The second one above is an example of something that sounds boring and restrictive but really frees you up to do other things as a developer. About a decade ago, the "trend" was to treat RDBMSs as "dumb" storage and do all that stuff at the application level, in the name of being database-agnostic. Opinions remain divided, but I think that was an objectively bad trend. For one thing, folks noticed they hardly ever needed to suddenly switch databases, but there are other reasons as well.
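To make both points concrete, a tiny sketch (schema invented for illustration):

```sql
-- Constraints declared once at the database level, enforced for every app:
CREATE TABLE customers (
    id    bigserial PRIMARY KEY,
    email text NOT NULL UNIQUE
);

CREATE TABLE orders (
    id          bigserial PRIMARY KEY,
    customer_id bigint NOT NULL REFERENCES customers (id),
    total       numeric(12,2) NOT NULL CHECK (total >= 0)
);

-- A typical INNER JOIN across the relationship:
SELECT c.email, sum(o.total) AS lifetime_value
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.email;
```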
Pg docs are so good I reference them whenever I want to check the SQL standards, even if I'm working on another DB. (I prefer standard syntax to minimize effort moving DBs.)
I stick to standard SQL syntax/features whenever possible as well, but...
Honest question: how often do you switch databases?
I've never really found myself wanting or needing to do this.
Only time I could really see myself wanting to do this is if I was writing some kind of commercial software (eg, a database IDE like DataGrip) that needed to simultaneously support various multiple databases.
> MySQL
It feels particularly limiting to stick to "standard" SQL for MySQL's sake, since they frequently lag behind on huge chunks of "standard" SQL functionality anyway. For example, window functions (SQL2003 standard) took them about a decade and a half to implement.
I'm trying to understand if with v14 I will be able to connect Debezium to a "slave" node and not to the "master" in order to read the WAL but can't figure it out.
Can someone help me with this?
I was just yesterday talking to someone about this; they mentioned that Patroni leverages some way for setting up replication slots on replicas [1]. Haven't tried my self yet, but seems worth exploring.
Something I'd like to dive into within Debezium is usage of the pg_tm_aux extension, which supposedly allows setting up replication slots "in the past", so you could use this to have seamless failover to replicas without missing any events. In any case, this entire area is of high importance for us and we're keeping a close eye on any improvements, so I hope it will be sorted out sooner or later.
Postgres is my bread and butter for pretty much every project. Congratulations to the team, you work on and continue to improve one of the most amazing pieces of software ever created.
Congratulations and thanks to all involved! Do I understand correctly that, at this time, while PG has data sharding and partitioning capabilities, it does not offer some related features found in Citus Open Source (shard rebalancer, distributed SQL engine and transactions) and in Citus on Azure aka Hyperscale (HA and streaming replication, tenant isolation - I'm especially interested in the latter one)? Are there any plans for PG to move toward this direction?
Streaming replication is supported as per https://www.postgresql.org/docs/current/warm-standby.html#ST... . You can likely build shard rebalancing and tenant isolation on top of the existing logical replication featureset. There are some groundwork features for distributed transactions (PREPARE TRANSACTION, COMMIT PREPARED, ROLLBACK PREPARED), but they're not supported as such.
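For reference, the two-phase commit primitives mentioned there look like this (a minimal local sketch; the table is invented and max_prepared_transactions must be > 0):

```sql
BEGIN;
INSERT INTO transfers (account_id, amount) VALUES (42, -100.00);
-- Step 1: park the transaction on disk instead of committing it
PREPARE TRANSACTION 'xfer-42';

-- Step 2, possibly from another session once every node has prepared:
COMMIT PREPARED 'xfer-42';
-- or, to abort everywhere: ROLLBACK PREPARED 'xfer-42';
```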
Disappointed by the release. No big changes. Still using processes instead of threads for connections. No built-in sharding/high availability (like SQL Server Always On Availability Groups). No good way to pass session variables to triggers (like username). No scheduled tasks like in MySQL. Temporal tables are still not supported 10 years after the spec was ready.
Can someone who uses Babelfish for PostgreSQL compatibility with SQL Server commands please describe their experience, success, hurdles, etc. We would move to PostgreSQL if made easier by such a tool. Thanks!
This was partly due to my lack of SQL Server projects, but mostly due to the lack of availability of the supposedly Apache-licenced sources and/or binaries (and not wanting to configure AWS for 'just testing').
Yep. I'm quite interested when the 'somewhere in 2021' will be, as the last mention of babelfish from an Amazon-employee on the mailing lists was at the end of March and I haven't heard news about the project since.
Fastest non-intrusive way I know in RDS or any other environment which allows you to spin up a new box:
* Set up a replica (physical replication)
* Open a logical replication slot for all tables on your old master (WAL will start accumulating)
* Make your replica a master
* Upgrade your new master using pg_upgrade, run analyze
* On your new master subscribe to the logical replication slot from your old master using the slot you created earlier, logical replication will now replicate all changes that occurred since you created the slot
* Take down your app, disable logical replication, switch to the new master
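The SQL side of those steps looks roughly like this (a sketch only; slot, publication and connection strings are placeholders, and the pg_upgrade step itself happens outside SQL):

```sql
-- On the old master, before promoting the replica:
CREATE PUBLICATION upgrade_pub FOR ALL TABLES;
SELECT pg_create_logical_replication_slot('upgrade_slot', 'pgoutput');

-- On the promoted and pg_upgrade'd new master: reuse that slot and skip
-- the initial copy, since the data already came over via physical replication.
CREATE SUBSCRIPTION upgrade_sub
    CONNECTION 'host=old-master dbname=app user=replicator'
    PUBLICATION upgrade_pub
    WITH (create_slot = false, slot_name = 'upgrade_slot', copy_data = false);
```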
You can do the upgrade with zero downtime using Bucardo Multi-Master replication but the effort required is much much higher and I'm not sure if this is really feasible for a big instance.
For Pg and MySQL I usually have to resort to replicating to a newer instance then cutting over. Tools like pg_upgrade offer promise but I rarely have the time or access to test with a full production dataset. Hosting provider constraints sometimes limit my options too, such as no SSH to underlying instances.
The query parallelism for foreign data wrappers brings PostgreSQL one step closer to being the one system that can tie all your different data sources together into one source.
Here we are, at a fantastic version 14, and still no sign of a MySQL AB-like company able to provide support and extensions to a great piece of open source software. There are a few small ones, yes, but nothing at the billion-dollar size.