Mango seems interesting, thanks for that.
I got thrown into a tight spot at work: I'm building a lease calculator for the accountants, and I need to build the whole thing as a self-contained application, due on Monday.
Long story short, it wasn't as easy as I thought, and I ended up having to replicate journal entries, a trial balance and financial statements. Even though Mongo doesn't have transactions, I found a good way to make my journals balance. I'm using the aggregation framework to post journals to the TB and FS.
The last part I was worried about is the self-contained bit, so I'm going to give CouchDB a try and use Mango to avoid redoing everything.
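For anyone curious how journals can balance without multi-document transactions, here is a minimal sketch of one possible approach. It is a guess, not necessarily what the poster built. It assumes each journal is stored as a single document holding all of its debit and credit lines, so one atomic single-document write either stores a balanced journal or nothing. The account names and the `make_journal` / `trial_balance` helpers are made up for illustration, and the roll-up is done in plain Python to mirror what an `$unwind` on the lines followed by a `$group` per account would do.

```python
from collections import defaultdict
from decimal import Decimal


def make_journal(description, lines):
    """Build one journal document; refuse it unless debits equal credits."""
    debits = sum(Decimal(line["debit"]) for line in lines)
    credits = sum(Decimal(line["credit"]) for line in lines)
    if debits != credits:
        raise ValueError(f"unbalanced journal: {debits} != {credits}")
    return {"description": description, "lines": lines}


def trial_balance(journals):
    """Plain-Python stand-in for: $unwind the lines, then $group by account."""
    totals = defaultdict(lambda: Decimal("0"))
    for journal in journals:
        for line in journal["lines"]:
            totals[line["account"]] += Decimal(line["debit"]) - Decimal(line["credit"])
    return dict(totals)


# Hypothetical lease journals; amounts are strings so Decimal stays exact.
journals = [
    make_journal("Recognise lease", [
        {"account": "right_of_use_asset", "debit": "1000.00", "credit": "0"},
        {"account": "lease_liability", "debit": "0", "credit": "1000.00"},
    ]),
    make_journal("First payment", [
        {"account": "lease_liability", "debit": "100.00", "credit": "0"},
        {"account": "cash", "debit": "0", "credit": "100.00"},
    ]),
]
tb = trial_balance(journals)
```

Because every stored journal nets to zero, the trial balance nets to zero as well, which makes a cheap sanity check after each posting run.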
No mention of Mongo "/dev/null as a service" DB's infamous issues with data loss?
Apparently their new replication protocol isn't as utterly broken as the old one, but my understanding is that data recovery after server crashes is still a problem.
I ran into a doozy in our production Mongo 3.2 replica set just this week, where initial sync completely failed because the oplog was growing too fast. It seems this class of errors was actually fixed ( https://docs.mongodb.com/manual/core/replica-set-sync/#initi... ) in 3.4, which also fixed a number of the data reliability issues you mentioned (see https://jepsen.io/analyses/mongodb-3-4-0-rc3 ). But since we hadn't upgraded, the only way to fix things was to throttle back our write-heavy background tasks during a period of low usage - definitely not the kind of thing you want to have to explain to users.
At the end of the day (see my other comment in this thread), I'm bullish about the developer experience of Mongo/Meteor, and I think that 363 days a year of significantly increased developer productivity is worth the 1 day of devops hell that might ensue from that stack choice, and 1 day of realizing that your performance problems all stem from Mongo query performance being much more reliant on manually creating indices on fields vs. an unindexed Postgres table of the same size. Sigh. On the reliability end, it helps that our (largely append-only) data model is such that inconsistencies are possible to clear up manually if needed. But on those couple days a year, I'm definitely not a Mongo fan.
I've seen this 'increased productivity' argument for Mongo over and over but I just don't see it in practice.
I've dealt with many apps and teams that use Mongo, and the amount of time spent tracking annoying bugs because of a lack of schema, writing migration scripts and the abysmal performance of the database leads me to believe that the only reason Mongo devs think they are more productive is because they can get 3 lines of code to make a new collection + record in the database in 5 seconds, and ignore the 5 months of time they spent down the line.
Developer productivity vs devops hell is a compelling argument for MongoDB for fast iteration when building out. I'm fortunate to deal with relatively low scale with my Mongo deployment, but I'm hoping by the time the seams show in Mongo for our needs, we'll be large enough to abide the long road to a hardened and reliable data infrastructure.
MongoDB 3.4 seems like it has a lot of great stuff and I hope to move to it soon.
> Mobile support: CouchDB stands out, in that it can run on an Android or iOS mobile device. In addition to being mobile, the database can also synchronize with a remote master database, allowing the data to be shared easily between mobile devices and servers.
Meteor actually provides exactly this for MongoDB; it has a "minimongo" package in the browser that supports Mongo's query language, running it synchronously against an in-memory copy of the collection [0]. And with Meteor, you can specify "subscriptions" declaratively that enable bidirectional synchronization while their owner components are in scope.
Mongo certainly has some reliability issues (see other comments here) but I've yet to find a full-stack system so painless to develop in, especially if you need realtime support. With things like ToroDB Stampede [1] and a general approach of "write all your code in React with Meteor dependencies factored out into containers," there's a clear migration path towards the relational-based separate-backend-frontend world when you need to go there.
Plus the CouchDB ecosystem has Couchbase for iOS/Android (with their own tradeoffs versus mainline CouchDB) and PouchDB which is a JS version of CouchDB that runs in the browser on just about any platform and can store directly in IndexedDB.
> Snapshots: Any changes to a document occur as a revision and appends the information to the file. This means you can grab a “snapshot” of the file and copy it to another location even while the database is running without having issues with corruption.
This is the main feature I sell when pushing CouchDB.
Use it to project events and you'll see what I mean.
I want to make an obvious point, but I know it confused me quite a bit for a while, so just in case it helps anyone else: There's CouchDB and then there's Couchbase which is similarly named but completely different.
I recommend that anyone shopping for a database with easy master/master replication, eventual consistency and no single point of failure consider Couchbase as well. It's not the same, and it will have its own set of pros and cons.
> MongoDB is schema-free, allowing you to create documents without having to first create the structure for that document. At the same time, it still has many of the features of a relational database, including strong consistency and an expressive query language.
The author clearly has a different definition of "strong consistency" than most. I don't see how any claims of consistency (in a data usage, not CAP sense) can be made of a database that can't properly store a number as a number or even guarantee that it's a number at all.
Also, does anyone actually like the Mongo query language? It was cute when I first saw it but I pity anyone trying to do anything complicated by manually writing those JSON strings.
MongoDB absolutely stores numbers as numbers. Always has.
And yes in a schemaless database you are required to manage the schema in your application layer as opposed to within the database. If we wanted a database with a rigid schema we would just use a SQL database.
Schemaless by definition means this is not guaranteed. The same property name can hold different types of data from one object to the next. Several other key-value/wide-column stores have the exact same setup.
Why would it guarantee that? It's a schemaless database. It's not supposed to do that.
And my point is that if you query a single document and the field is a number then it will be returned as a number i.e. it physically stores and understands numbers.
Elasticsearch is not schemaless. It typically requires you to define a mapping (schema) upfront. You can set it to automatically infer the mapping based on the available data but this does not make it truly schemaless.
MongoDB allows a free form schema for every document. You're again comparing apples and oranges.
Not having an enforced schema means that there is no guarantee a given key exists or that, if it does exist, the data type matches what you'd expect it to be. One errant piece of code could insert a bad record that could break some other piece of code.
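To make the "manage the schema in your application layer" idea concrete, here is a rough sketch of a check you could run before every insert. The field names and the `LEASE_SCHEMA` / `validate` helpers are hypothetical, and this is plain Python rather than any particular database driver's validation feature.

```python
from numbers import Number

# Hypothetical schema: field name -> (expected type, required?)
LEASE_SCHEMA = {
    "tenant": (str, True),
    "monthly_rent": (Number, True),
    "notes": (str, False),
}


def validate(doc, schema):
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    for field, (expected, required) in schema.items():
        if field not in doc:
            if required:
                problems.append(f"missing field: {field}")
            continue
        value = doc[field]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        wrong_bool = expected is Number and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            problems.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


good = {"tenant": "Acme", "monthly_rent": 1200.5}
bad = {"tenant": "Acme", "monthly_rent": "1200.5"}  # number stored as a string

good_problems = validate(good, LEASE_SCHEMA)
bad_problems = validate(bad, LEASE_SCHEMA)
```

This only protects writes that go through the check, which is exactly the weakness being pointed out: one code path that skips it can still insert a bad record.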
PouchDB is one of CouchDB's killer features. During development you can effortlessly switch between storing things locally, in the browser, and then sync with a remote database.
The reason I rejected PG's JSON store was its inability to update fields inside a JSON doc without replacing the whole document. In your case, does this constraint push you to use more types of smaller documents, or do you just read the whole doc, update it in the application, and then write it back to the DB?
Have you ever had an issue with conflicts, where multiple instances of the app read, modify and write different things to the same document at the same time?
> Have you ever had an issue with conflicts, where multiple instances of the app read, modify and write different things to the same document at the same time?
This problem is what pessimistic (select... for update) or optimistic (using a version column) locking is for. If you don't want any race conditions to sneak into your code, as a rule you should probably be using one or the other regardless of whether or not you use postgresql JSON.
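A minimal sketch of the optimistic variant, for illustration. It uses Python's built-in `sqlite3` as a stand-in for Postgres so that it runs anywhere; the `docs` table and the `read_doc` / `write_doc` helpers are made up, but the same `UPDATE ... WHERE version = ?` pattern should carry over to Postgres.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, version INTEGER NOT NULL, body TEXT NOT NULL)"
)
conn.execute("INSERT INTO docs VALUES (1, 1, ?)", (json.dumps({"status": "draft"}),))
conn.commit()


def read_doc(doc_id):
    """Return (version, parsed JSON body) for one document."""
    version, body = conn.execute(
        "SELECT version, body FROM docs WHERE id = ?", (doc_id,)
    ).fetchone()
    return version, json.loads(body)


def write_doc(doc_id, expected_version, body):
    """Write only if nobody bumped the version since we read it."""
    cur = conn.execute(
        "UPDATE docs SET body = ?, version = version + 1 WHERE id = ? AND version = ?",
        (json.dumps(body), doc_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # zero rows updated means our copy was stale


# Two writers read the same version of the document...
version_a, doc_a = read_doc(1)
version_b, doc_b = read_doc(1)
doc_a["status"] = "approved"
doc_b["owner"] = "bob"

first = write_doc(1, version_a, doc_a)   # wins the race
second = write_doc(1, version_b, doc_b)  # stale version: rejected
```

The losing writer has to re-read, re-apply its change and try again; the pessimistic alternative would take a row lock with `SELECT ... FOR UPDATE` before modifying instead.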
Does PG's JSON support multi-master sync? Native db level support for that feature simplifies a lot of my use cases. It'd be interesting to see a SQLite/WebSQL <-> Postgres multi-master syncing system. It'd be the equivalent of CouchDB <-> PouchDB. Maybe even using the same CouchDB protocol! :-)
CouchDB and Couchbase both support only whole document updates. So you get document conflicts in those document stores as well, meaning your app needs to understand and handle 409s. But those conflicts are relatively easy to handle in most cases, at the cost of a new round trip. Mostly it's a matter of downloading the new document state, merging your change into it and re-posting. If you're using Redux/Vuex/Event Sourcing this becomes trivial to support. Another way to handle it is to split a single large document into smaller pieces and write a map/reduce view that returns a composite document. That should be possible in Postgres as well with a prepared statement.
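The fetch, merge and re-post loop can be sketched like this. To keep it runnable without a server, it uses a made-up in-memory `FakeStore` that rejects writes carrying a stale `_rev`, with a `Conflict` exception standing in for the HTTP 409. Real CouchDB revisions are opaque strings rather than integers, so treat this as an illustration of the control flow only.

```python
class Conflict(Exception):
    """Stands in for an HTTP 409 response."""


class FakeStore:
    """In-memory stand-in for a document store that checks revisions on write."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        return dict(self._docs[doc_id])  # copy, like a fresh download

    def put(self, doc):
        current = self._docs.get(doc["_id"])
        if current is not None and current["_rev"] != doc.get("_rev"):
            raise Conflict(doc["_id"])
        stored = dict(doc)
        stored["_rev"] = (current["_rev"] if current else 0) + 1
        self._docs[doc["_id"]] = stored
        return stored["_rev"]


def update_with_retry(store, doc_id, change, max_attempts=5):
    """Fetch the latest state, re-apply the change, and retry on conflict."""
    for _ in range(max_attempts):
        doc = store.get(doc_id)
        change(doc)
        try:
            return store.put(doc)
        except Conflict:
            continue  # someone else wrote first; download again and re-merge
    raise Conflict(f"gave up on {doc_id}")


store = FakeStore()
store.put({"_id": "lease-1", "rent": 1000})
stale = store.get("lease-1")  # another client holds revision 1
update_with_retry(store, "lease-1", lambda d: d.update(rent=1100))
final_rev = update_with_retry(store, "lease-1", lambda d: d.update(term=12))
```

Expressing the change as a function of the current document (much like a reducer) is what makes the retry cheap: the same change can simply be replayed against whatever state comes back.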
Mongo really isn't about storing JSON. At least that shouldn't be its selling point. The selling point is that, since it's basically just a glorified key-value store, it's very replicable and distributable. Postgres is very much not either of those things. The JSON is nice for a few things and I use it sometimes, but Postgres is and will always be extremely difficult to cluster and replicate.
Until you need to reseed the original master, want to do quick rolling restarts or want any automation in automatic failover. PG has a long way to go.
Logical replication by default doesn't even handle DDL statements so something as simple as adding a new column requires extra process. Postgres is probably the weakest of all relational databases when it comes to scalability, both vertical and horizontal.
I don't agree with the article saying MongoDB is better at reading.
From my experience MongoDB is fast, but CouchDB really shines when you have a read heavy application.
Also the article didn't mention Mango queries, which are a blessing (fast indexing as Erlang views), but in my opinion this feature could be a lot better with stale results, for instance.
Partly because MySQL requires a full table lock and full table copy to implement those operations. Much of the nosql movement originated from implementing schema changes on production MySQL databases. In many if not most cases, NoSQL is really NoMySQL.
Many other engines, PostgreSQL for example, can add a new column without constraints as a nearly instant metadata change only. Data type changes that do not require validation (expanding a VARCHAR vs CHAR to INT) are also rapid.
I find MongoDB easier to deploy and maintain because it's easier to build and monitor. CouchDB with Erlang not so much (for our environment, it's totally alien).
EDIT: I made a statement about our preference, for our environment. I did not make broad claims about these DBs for other people. How am I upsetting HN?
It fails to mention that CouchDB now has Mango, which is a MongoDB-compatible query language.
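For readers who haven't seen Mango, a query is a JSON body with a Mongo-style `selector`, sent as a POST to the database's `_find` endpoint. Below is a small sketch that only builds the request; the database name, the field names and the `mango_find_request` helper are hypothetical, and nothing is sent over the network.

```python
import json


def mango_find_request(db_name, selector, fields=None, limit=25):
    """Build the path and JSON body for a CouchDB Mango query (POST /{db}/_find)."""
    query = {"selector": selector, "limit": limit}
    if fields is not None:
        query["fields"] = fields
    return f"/{db_name}/_find", json.dumps(query)


# Hypothetical documents with "type", "account" and "amount" fields.
path, body = mango_find_request(
    "ledger",
    selector={"type": "journal_line", "amount": {"$gt": 100}},
    fields=["_id", "account", "amount"],
)
```

The selector here uses the same operator style (`$gt` and friends) that a MongoDB `find` filter uses, which is where the similarity lies; whether a given Mongo query ports over unchanged depends on the operators involved.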
Since 2.0, CouchDB also has Dynamo-like clustering thanks to Cloudant's open sourcing of the BigCouch code.
I wonder if the MongoDB side of the comparison is more up-to-date, or equally stale.