We currently use MongoDB and while Postgres is attractive for so many reasons, e...

TedShiller · on Sept 30, 2021

The problem with MongoDB though is that you're on MongoDB

kbenson · on Sept 30, 2021

As someone that mostly shared that opinion for the last decade or more, I recently set up a cluster for work, and everything seems much more production-quality than I remember or what I assumed it was going to be like. I'm not the one using it for queries every day, but I did do a bunch of testing for replication and failed nodes to confirm that I understood (and could rely on) the claims of robustness, and it seemed to be stable and with good documentation of what to expect in different scenarios and how to configure it (which is not what I experienced doing the same testing back in 2010-2011).

All in all, my impression of MongoDB now is that they're one of those "fake it till you make it" success stories, where they leveraged their popularity into enough momentum to fix most of their major problems.

outworlder · on Sept 30, 2021

One thing that turned me away from MongoDB was their utter lack of care for your data integrity that they displayed for years. Some of those instances were even documented. Then there were some bad defaults - some could _also_ cause data loss.

For any component that's viewed as a database (as opposed to, say, cache), data integrity is one of the most important metrics (if not THE most).

In contrast, PostgreSQL data loss bugs are rare - and are treated extremely seriously. Defaults are sane and won't lose data. It's one of the few databases where I'm pretty confident that data will be there even if you yank a server power cord mid-write.

Has MongoDB improved? Yes, by leaps and bounds (seems to still fail Jepsen tests though). But I can't help but feel that it should have been released as a beta product, instead of claiming it was production ready. It wasn't. Maybe it is now. I'd still evaluate other alternatives before considering it.

That said, one thing that always amuses me is how MongoDB gets mentioned in the same context as PostgreSQL. If PostgreSQL would meet your needs, it's unlikely that MongoDB would. And vice versa (but maybe something else like Cassandra would).

btown · on Sept 30, 2021

Postgres with tables that are just an ID and a JSONB column nowadays gives you practically everything you'd want out of MongoDB.

You can add deep and customized indices as desired, you can easily shard with Citus, and if you want to live without transactions you'll see equally good if not better performance - with the option to add ACID whenever you want. The developer experience argument, where the ->> operator was more confusing than brackets, is now moot.

As a former MongoDB user, there were good synergies between MongoDB and Meteor back in the day, and I loved that tech, but between Materialize and Supabase, you have vastly more options for realtime systems in the Postgres ecosystem.

TedShiller · on Oct 1, 2021

Although MongoDB claims in an undated article entitled "MongoDB and Jepsen"[65] that their database passed Distributed Systems Safety Research company Jepsen's tests, which it called “the industry’s toughest data safety, correctness, and consistency Tests”, Jepsen published an article in May 2020 stating that MongoDB 3.6.4 had in fact failed their tests, and that the newer MongoDB 4.2.6 has more problems including “retrocausal transactions” where a transaction reverses order so that a read can see the result of a future write.[66][67] Jepsen noted in their report that MongoDB omitted any mention of these findings on MongoDB's "MongoDB and Jepsen" page.

from https://en.wikipedia.org/wiki/MongoDB#Bug_reports_and_critic...

threeseed · on Sept 30, 2021

Those defaults were changed a decade ago and were never an issue if you used a driver, e.g. Python.

And the Jepsen tests are part of the core test suite, so do you have some evidence they are still failing?

It’s so ridiculous and pointless to be rehashing the same issues a decade later.

Actually more a testament to the company that it’s still hugely successful and depended on by some very large applications.

dralley · on Sept 30, 2021

Perhaps, but mongodb was responsible for something I have bookmarked as "the worst line of code ever".

Which decided whether or not to log connection warnings based on Math.random()

https://github.com/mongodb/mongo-java-driver/blob/1d2e6faa80...

threeseed · on Sept 30, 2021

a) This is a line of code from 2013 and was fixed weeks after.

b) Based on the JIRA [1] it was designed to only log 10% of subsequent failures where there is no connection to prevent log flooding. You would still get the initial failure message.

Pretty reasonable technique and hardly the worst code ever.

[1] https://jira.mongodb.org/browse/JAVA-836
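
For readers who don't want to follow the links, the technique described in (b) can be sketched roughly like this. It is a Python illustration only, not the Java driver's code: the class name, method names and the 0.1 default are assumptions made for the example. The first failure is always logged, and later failures are logged with some probability until a success resets the state.

```python
import logging
import random

logger = logging.getLogger("connection_monitor")  # hypothetical logger name


class SampledFailureLogger:
    """Log the first failure always, then roughly `sample_rate` of repeats."""

    def __init__(self, sample_rate=0.1, rng=None):
        self.sample_rate = sample_rate
        # An injectable RNG makes the behaviour reproducible in tests.
        self.rng = rng or random.Random()
        self.in_failure = False

    def on_failure(self, message):
        """Return True if the failure was logged, False if it was suppressed."""
        if not self.in_failure:
            # First failure after a healthy period: always log it.
            self.in_failure = True
            should_log = True
        else:
            # Repeated failure: log only a random sample to avoid flooding.
            should_log = self.rng.random() < self.sample_rate
        if should_log:
            logger.warning(message)
        return should_log

    def on_success(self):
        """A successful connection resets the state, so the next failure is logged."""
        self.in_failure = False
```

A deterministic variant (log every Nth repeat, or at most once per time window) gives the same flood protection and is easier to reason about when reading logs afterwards, which is arguably what people objected to in the random version.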

__turbobrew__ · on Sept 30, 2021

https://github.com/mongodb/mongo-java-driver/commit/d51b3648...

AtlasBarfed · on Sept 30, 2021

Jepsen test suite completely tore them a new one. I don't trust any allegedly distributed database that gets excoriated that badly by Aphyr.

https://jepsen.io/analyses/mongodb-4.2.6

That's just a bit more than a year ago. Come on.

MongoDB is like Mysqldb. I am so so so tired of hearing "that's been fixed, it's great now", doing a paper-thin dive into things, and seeing there are massive problems still.

I used MongoDB with Spring Data; it is impressively seamless.

It's just that there are way too many people who have sold snake oil for a decade-plus now, and I don't trust what they say anymore, and won't for a long long time.

TedShiller · on Oct 1, 2021

Even worse, MongoDB lied about having fixed these bugs.

https://en.wikipedia.org/wiki/MongoDB#Bug_reports_and_critic...

kbenson · on Oct 1, 2021

Let's be clear, I definitely don't think it's great. It's just that my immediate response prior to six months ago was to laugh at the mere suggestion it be put into production.

The only reason it actually was put into production is because we had a vendor requirement on it (and why they thought it was sufficient, I'm not sure).

There's a difference between "not suitable for anything because it's so buggy and there's been so many problems over the years" and "not suitable as a replacement for a real RDBMS for important data". For the former, I think my opinion was possibly a little harsh for the current state of it. For the latter, yeah, I'm not going to blindly trust it for billing data and processing yet, that's for sure.

AtlasBarfed · on Oct 1, 2021

So did you do app-level code to verify writes? Double checking, etc?

kbenson · on Oct 2, 2021

I wrote a few small test programs to run doing continuous inserts to the master, and tested shutting down, firewalling off, and killing the process of different members of the cluster and how it recovered and if data loss was experienced by comparing data sets.

It was sufficient for me to not feel like we were taking on undue risk by using it, and since our use case is not one where we're in major trouble if a problem does come about (restoring from daily backups should be sufficient) and we're not doing anything transactional, that's good enough. As I mentioned earlier, it was a vendor requirement, so we just wanted to make sure it wasn't something that was problematic enough to make us question the vendor's decision making.
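
The kind of check described above (continuous inserts, then comparing data sets after a fault) could be sketched along these lines. This is not the actual test program; `run_inserts`, `compare_data_sets` and the `insert` callable are hypothetical names, and the database call is injected so the comparison logic stands on its own.

```python
def run_inserts(insert, count):
    """Call insert(i) for each i in range(count).

    Returns the ids whose insert returned without raising, i.e. the writes
    the cluster acknowledged. A failed call is not retried here; such a write
    may or may not have landed, so it is simply left out of the list.
    """
    acknowledged = []
    for i in range(count):
        try:
            insert(i)
        except Exception:
            continue
        acknowledged.append(i)
    return acknowledged


def compare_data_sets(acknowledged_ids, recovered_ids):
    """Compare what was acknowledged with what is readable after recovery."""
    acked = set(acknowledged_ids)
    found = set(recovered_ids)
    return {
        # Acknowledged but missing afterwards: this is real data loss.
        "lost": sorted(acked - found),
        # Present but never acknowledged: usually tolerable, worth knowing.
        "unacknowledged_but_present": sorted(found - acked),
    }
```

In a real run, `insert` would wrap the driver's insert call with whatever write concern is being evaluated, and `recovered_ids` would be read back from the cluster once the killed or firewalled member has rejoined.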

dralley · on Sept 30, 2021

> All in all, my impression of MongoDB now is that they're one of those "fake it till you make it" success stories, where they leveraged their popularity into enough momentum to fix most of their major problems.

The downside being that their reputation is now somewhat charred.

hodgesrm · on Sept 30, 2021

> All in all, my impression of MongoDB now is that they're one of those "fake it till you make it" success stories, where they leveraged their popularity into enough momentum to fix most of their major problems.

That's not all bad. The same could be said of MySQL. Both DBMS prioritized ease of use over data integrity in the early going.

zozbot234 · on Sept 30, 2021

And yet PostgreSQL making the exact opposite choice has really paid off in the longer run. People used to dismiss it as simply a toy for academics to play with, and look where the project is today. It can easily surpass most NoSQL databases on their home turf.

hodgesrm · on Oct 2, 2021

To be fair, PostgreSQL 15 years ago also had a lot of problems storing data reliably. Some of them manifested as performance issues. I also heard a fair number of war stories about corruption with "large" databases (e.g., 1TB+). PG replication lagged MySQL for many years as well. These seem to be non-issues today.

At this point there's effectively no difference in the durability of data stored in MySQL or PostgreSQL, so it's hard to argue that one or the other made a better choice. They just got there by different paths.

In fact, PostgreSQL is winning back share in part because of licensing. GPLv2 is limiting for a lot of applications, and there continue to be concerns about Oracle ownership. It's also absorbed a lot of features from other databases like JSON support. That's not special to PostgreSQL though. It's been a trend since the beginning for SQL RDBMS and explains why they have stayed on top of the OLTP market for decades.

salil999 · on Sept 30, 2021

And how exactly is that a problem?

tpxl · on Sept 30, 2021

Some things seem to have changed from 2018, but MongoDB was by far the worst database I ever had the displeasure of using (and Amazon DocumentDB was even worse).

https://jepsen.io/analyses/mongodb-3-6-4

https://jepsen.io/analyses/mongodb-4.2.6

jd_mongodb · on Sept 30, 2021

Posting old Jepsen analyses is like pointing at old bug reports. Every time Jepsen finds a bug we fix it lickety-split. I know it's not cool to focus on that fact, but it is a fact. The Jepsen tests are part of the MongoDB test suite so when we fix those problems they stay fixed.

I would love to hear your personal experience of MongoDB as opposed to reposting old Jepsen reports. Perhaps there is something that we can address in 5.1 that is still a problem?

(I work in developer relations at MongoDB)

jenny91 · on Sept 30, 2021

The latest "old Jepsen report" is barely a year old. It's not like digging up dirt from years ago.

It also seems like there was quite a lot wrong even a year ago, quoting from there:

> Roughly 10% of transactions exhibited anomalies during normal operation, without faults.

It's just not a very reassuring response to say "when someone goes to dig a bit and finds a lot of show-stopping bugs, we address those specific bugs quickly".

To me it sounds like the architecture and care just isn't there for a robust data storage layer?

jiggawatts · on Oct 1, 2021

Something that was drilled into me decades ago is that there is no such thing as fixing multi-threaded (or distributed) code via debugging or patching it "until it works".

You either mathematically prove that it is correct, or it is wrong for certain.

This sounds like an oddly strong statement to say, but the guy who wrote the textbook that contained that statement went on to dig up trivial looking examples from other textbooks that were subtly wrong. His more qualified statement is that if a professor writing simplified cases in textbooks can't get it right, then the overworked developer under time pressure writing something very complex has effectively zero chance.

The MongoDB guys just don't understand this. They're convinced that if they plug just one more hole in the wire mesh, then it'll be good enough for their submarine.

PS: The professor I was referring to is Doug Lea, who wrote the "EDU.oswego.cs.dl.util.concurrent" library for Java. This was then used as the basis for the official "java.util.concurrent".

bmcahren · on Oct 1, 2021

If you use MongoDB as a document store, arguably its core functionality, you're not exposed to any of the weaknesses Jepsen rightly identified and exploited.

Transactions are new to MongoDB and they are not necessary for most. Structure your data model so you only perform single-document atomic updates ($inc, $push, $pull) rather than making use of multi-document ACID transactions. It's possible, we're doing it for our ERP.

Sharding is something we've intentionally avoided, opting instead for application-layer regional clusters. We specifically were avoiding other complexities related to shards that are not a concern for replica sets. Concerns about durability and maximum recovery time during emergency maintenance caused us to avoid them.
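
To make the single-document approach above concrete, here is a minimal sketch, assuming a PyMongo-style collection and a made-up inventory schema (the `stock` and `reservations` fields and both function names are hypothetical, not anything from our ERP). Each function issues one `update_one` call, so the check and the change apply to one document atomically and no multi-document transaction is involved.

```python
def reserve_stock(collection, item_id, quantity, order_id):
    """Decrement stock and record the reservation in a single update.

    `collection` is assumed to behave like a pymongo Collection. The filter
    only matches when enough stock is left, so the check and the decrement
    cannot be separated by a concurrent writer.
    """
    result = collection.update_one(
        {"_id": item_id, "stock": {"$gte": quantity}},
        {"$inc": {"stock": -quantity}, "$push": {"reservations": order_id}},
    )
    return result.modified_count == 1


def release_stock(collection, item_id, quantity, order_id):
    """Undo a reservation, again as one single-document update."""
    result = collection.update_one(
        {"_id": item_id, "reservations": order_id},
        {"$inc": {"stock": quantity}, "$pull": {"reservations": order_id}},
    )
    return result.modified_count == 1
```

The catch is that everything that must change together has to live in the same document; once an operation spans two documents, this pattern no longer gives you atomicity.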

sigmonsays · on Sept 30, 2021

Where are the latest Jepsen test results published?

mayankkaizen · on Sept 30, 2021

That was sarcasm. But yeah, you can search for MongoDB and you'll come across many many posts criticizing it.

It can be said MongoDB is hated as much as Postgres is loved.

Personally I have no opinion about MongoDB.

The_Colonel · on Sept 30, 2021

Yes, and most of these love/hate memes are blown out of proportion by people who don't actually have any real expertise in those technologies, but just parrot whatever they've read in some memes.

staticassertion · on Sept 30, 2021

You're exactly correct. Tons of "XYZ is bad" because of some meme that they don't even understand or have context on that hasn't been relevant for years.

I have no idea if MongoDB is good or bad at this point, but the comments of "haha it's mongo" are completely devoid of meaningful content and should be flagged.

operator9A · on Sept 30, 2021

I was part of a team that operated a large Mongo cluster for most of the last decade. I would not have advised anyone to use Mongo as their durable source of truth database then, and I still don't think it's advisable to do so now. On numerous occasions, Mongo demonstrated the consequences of poor engineering judgment and an addled approach to logic in critical components responsible for data integrity. In addition, Mongo internalized many poor patterns with respect to performance and change management. Mongo did not, and does not provide the data integrity or performance guarantees that other databases internalize by design (the WiredTiger transition helped, but did not cure many of the issues).

PostgreSQL introduced JSONB GIN index support sometime around 2015, making Postgres a better fit for most JSON-based applications than Mongo.

staticassertion · on Sept 30, 2021

My issue isn't with people not liking Mongo. It's with contentless meme posts. Your post has real information that adds value to the conversation, and I appreciate that you took the time to write it out.

cztomsik · on Oct 1, 2021

Because it's (or at least it definitely WAS) true.

There are valid use-cases for Mongo but for the vast majority of things, you're better off starting with Postgres. And I say that as an early adopter - I really wanted Mongo to succeed but it just failed all of my expectations. All of them.

BTW: this is a post about Postgres.

threeseed · on Sept 30, 2021

You can find posts criticising every database.

Most of the ones for MongoDB are from a decade ago and not at all relevant today.

TedShiller · on Oct 1, 2021

It's easier to ask how that is NOT a problem, because that list will be much, much shorter.

outworlder · on Sept 30, 2021

> Edit: Possibly misinformed but the last deep dive we did indicated there was not a way to use logical replication for seamless upgrades. Will have to research.

It is possible since PG10

https://severalnines.com/database-blog/how-upgrade-postgresq...

bmcahren · on Oct 1, 2021

We seem to have been misguided by all of the Amazon RDS and Aurora documentation. It seems Amazon prefers to implement postgres logical replication through their database migration service. All upgrades are typically done through pg_upgrade which does require downtime.

Interesting. I can't wait to see how PG12 influences future offerings from the cloud providers for more seamless major version upgrades.

ggregoire · on Sept 30, 2021

MongoDB and Postgres are like apples and oranges tho.

I'm not gonna choose MongoDB if I need a relational model… even if it offers zero downtime upgrades out-of-the-box.

bmcahren · on Oct 1, 2021

You might choose Postgres with JSON as an alternative to MongoDB though. There are plenty of people pushing the limits of MongoDB who are researching it if not just for access to a larger pool of DBAs who can work for them.

zozbot234 · on Sept 30, 2021

Logical replication across major releases for seamless upgrades has been supported and documented since pgSQL 10.