I think a lot of the "hating" is a side effect of MongoDB being consistently oversold in terms of its capabilities and architecture. Many people who are not experts on databases discover this the hard way later. If the claims about it were qualified a little better and the limitations acknowledged more openly by its proponents it would help to mitigate this outcome.
I am indifferent to MongoDB but I do caution people that the internals and implementation are quite primitive for a database engine. It is the sort of database engine that a programmer who knows little about database engines would build. Over the years that has repeatedly manifested as problems no one should ever expect a properly designed database engine to have. There is a reasonable point at which you should not have to read the documentation to check whether your database has a basic design flaw; there is an assumption of competent architecture for its nominal use cases.
MongoDB has improved in this regard over time and many other popular NoSQL (and SQL) databases have similar problems. Users are being asked to cut the database some slack for less than competent engineering choices but users just want a database to work. Being "simple" isn't enough.
I like the concept of MongoDB. My main problem (which perhaps is not totally clear from my post, linked in the article) is that there are certain historical conventions about what a database should be. In particular, a database will not have the following default behavior:
- return from a write call silently, when the data wasn't written and will not be.
If you are going to break conventions on a multi-decade tradition, there should be warnings everywhere. Not just the Downloads page, but during installation (which is harder to ignore, as the Download page is not seen by people like me who use package managers).
The 32-bit issue is worse. There's no excuse for not warning people that the server they just started is a crippled data store limited to 2GB.
Again, I like MongoDB. Given my experience with software development, those issues are telltale signs of a product in its infancy. Hence the "10 years" title of my post, which of course is hyperbole. But it will take time for Mongo to be user-friendly enough.
By the way, those who say that a piece of software has no need to be as user-friendly as possible, and you should thoroughly read the manual before even experimenting with it simply live in a different world. The more of a head start the software can give you, the better. The fewer the gotchas, the better. That was the philosophy of IndexTank, and it worked.
I haven't got a 32-bit version of Mongo, but people pointed out in the other thread that the 32-bit version of mongod displays a little warning every time you start it, saying it can handle only about 2GB of data.
Can you confirm this (because I don't have the 32-bit version installed)? It still stinks, especially because it "silently failed after hitting the threshold", but I personally would feel better about 10gen if this little story is true (i.e. they warn you about it not only in downloads page, but every time you run mongod).
This is true, in the Ubuntu packages this is printed to the default log at /var/log/mongodb/mongodb.log. It is also abundantly clear from the documentation. I struggle to understand how one could deploy a new datastore in production without reading the "getting started" level of documentation or looking in the log at some point.
The 2GB 32-bit limit of MongoDB seems like a complete non-issue to me.
Wed Sep 19 17:29:21 [initandlisten] MongoDB starting : pid=3765 port=27017 dbpath=/var/lib/mongodb 32-bit host=deepthought
Wed Sep 19 17:29:21 [initandlisten]
Wed Sep 19 17:29:21 [initandlisten] ** NOTE: when using MongoDB 32 bit, you are limited to about 2 gigabytes of data
Wed Sep 19 17:29:21 [initandlisten] ** see http://blog.mongodb.org/post/137788967/32-bit-limitations
Wed Sep 19 17:29:21 [initandlisten] ** with --journal, the limit is lower
Wed Sep 19 17:29:21 [initandlisten]
Wed Sep 19 17:29:21 [initandlisten] db version v2.2.0, pdfile version 4.5
Wed Sep 19 17:29:21 [initandlisten] git version: f5e83eae9cfbec7fb7a071321928f00d1b0c5207
Wed Sep 19 17:29:21 [initandlisten] build info: Linux domU-12-31-39-01-70-B4 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:39:36 EST 2008 i686 BOOST_LIB_VERSION=1_49
Wed Sep 19 17:29:21 [initandlisten] options: { config: "/etc/mongodb.conf", dbpath: "/var/lib/mongodb", journal: "true", logappend: "true", logpath: "/var/log/mongodb/mongodb.log" }
Wed Sep 19 17:29:21 [initandlisten] journal dir=/var/lib/mongodb/journal
Wed Sep 19 17:29:21 [initandlisten] recover : no journal files present, no recovery needed
Wed Sep 19 17:29:21 [initandlisten] waiting for connections on port 27017
Wed Sep 19 17:29:21 [websvr] admin web console waiting for connections on port 28017
Normally people look at logs when things go wrong. Do you look at the log any time something starts successfully and seems to be working? You must spend lots of time looking at logs.
This is a message that MUST be displayed on the console when you install the server for the first time. It's too important.
Also, you learn a tool before going into production. I never went into "production" with mongodb. All I did was experiment with a toy project. I never needed to look at the log.
How do you know that you're going to rely on it if it's new?
I install something new. I kick the tires. It takes a long time until I decide I'm going to rely on it. I'll look at the log at some point, but not necessarily the first time I install something. There are better things to do at that point.
Maybe you should make constructive comments instead of making assumptions about what people learn or don't (or what they know or don't, or what has worked for them or hasn't).
There are some really disrespectful people in this community.
> There are some really disrespectful people in this community.
Your article began and ended with sarcastic remarks about the product. Realistically, what kind of response did you expect? The issues described in your article are very real, and very worthy of repeated discussion, but the article itself eschews discussion in favor of pontification, sarcasm and flamebait.
I didn't say you didn't learn anything. But you seem to be making excuses for why you shouldn't know some very basic things about MongoDB. It uses a memory-mapped file. How can that be larger than 2GB on a 32-bit system?
It has async writes...this is pretty well documented by 10gen and is also something noted by a lot of tutorials, blog articles etc. You should have known something this basic about a database so important to your business.
None of that would bother me in the slightest if you were not still here defending such basic mistakes and blaming them on 10gen.
Come on, a database (or any other system) that just silently fails to add new data?
The 2GB limit is clearly mentioned in the logs, that's fine, but anyone who sees this would expect the DB to start "screaming" loudly wherever it can (in the logs, in the response to the user on EVERY communication with the server, during any select, insert and the rest) that it has reached this limit.
Async writes are async, so there cannot be a response to the user on failure unless you explicitly check (getLastError). This is very well documented behavior.
So important to my business? What business are you talking about? You are just confused.
All this time I've been talking about a toy app that I wrote to kick MongoDB's tires. If you don't understand that, there is no point in having a conversation.
I never had or have any plans to use MongoDB for any business.
You have been a "technology manager" at Bank of America in North Carolina since 1998. Do you try new technologies such as MongoDB professionally? What are your qualifications to make such a vague statement? More importantly, what is the lesson to be learned?
I made them in other replies, sorry for being so opaque.
The fact that a 32-bit memory-mapped file is limited to 2GB is basic comp-sci. It is also reported by MongoDB every time it starts. Furthermore it is noted in the 10gen docs in several places.
Async writes...also extremely well documented.
So the failure here is twofold: failure to properly research and understand a technology critical to their business, and then failure to take personal responsibility for the first failure, instead blaming the vendor for not building a traditional DBMS despite the documentation about the stark differences.
I use MongoDB in a side-project if that is important to understand.
Then this whole post sounds like a non-issue. If you couldn't be bothered to even look at the log file for your app because it wasn't important enough, then you shouldn't be irate when it doesn't work as expected.
Every time I start something I tail -f the relevant log to make sure it starts correctly. Think of it as isolating a possible failure point as a sanity check.
The issue is not the limit; it's the silent failure. 32-bit Oracle doesn't do that, it tells you it can't extend the tablespace by raising an exception.
I'm not trying to defend Mongo, but the reason it doesn't tell you is that you didn't ask for it. If you care about the data (i.e. it's not a log or something else that's not very important), you ought to always use "getLastError" to see whether your data was actually stored (some drivers, like mongoose (for Node.js), just let you specify a simple flag ("safe:true") that does this automatically).
A shitty default, no doubt. But it can be changed easily. And "most" drivers offer that. And you usually connect to MongoDB using a driver.
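For illustration, the difference between the fire-and-forget default and a checked ("safe") write can be sketched with a toy in-memory store. This only simulates the semantics being discussed; the class, its capacity limit, and the method names are invented for the sketch, not the real driver API:

```python
class ToyStore:
    """Toy store whose unacknowledged writes can fail silently."""

    def __init__(self, capacity):
        self.capacity = capacity   # stands in for the 2GB cap
        self.docs = []
        self.last_error = None

    def insert(self, doc):
        # Fire-and-forget: returns immediately whether or not the
        # write succeeded, like the old default driver behavior.
        if len(self.docs) >= self.capacity:
            self.last_error = "over capacity, write dropped"
            return  # caller gets no signal that the write was lost
        self.docs.append(doc)
        self.last_error = None

    def insert_safe(self, doc):
        # "Safe" write: insert, then check the error status, which is
        # what issuing getLastError after every write amounts to.
        self.insert(doc)
        if self.last_error is not None:
            raise RuntimeError(self.last_error)


store = ToyStore(capacity=2)
store.insert({"n": 1})
store.insert({"n": 2})
store.insert({"n": 3})      # silently dropped: no exception, no signal
print(len(store.docs))      # -> 2: the third document never made it

try:
    store.insert_safe({"n": 4})
except RuntimeError as e:
    print("caught:", e)     # the checked write surfaces the failure
```

The usage at the bottom is the whole argument in miniature: the unchecked insert of `{"n": 3}` looks exactly like a success at the call site, while the checked variant turns the same condition into an error the caller can handle.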
> If you care about the data (i.e. it's not a log or something that's not very important), you ought to always use "getLastError"
Wait, they are promoting this as a database. It is right there in black on yellow: "database". I don't know about you, but I suspect most people expect databases to try their hardest to protect data. That means also having safe defaults. As in: when I install it and put data in it, by default it should try its hardest to make sure that data doesn't get corrupted. If it doesn't, it doesn't deserve to call itself a database.
> A shitty default, no doubt.
This is not an "oopsie"; this is a deliberate lie and misinformation in order to produce fast benchmarks.
> Can you confirm this (because I don't have the 32-bit version installed)? It still stinks, especially because it "silently failed after hitting the threshold", but I personally would feel better about 10gen if this little story is true (i.e. they warn you about it not only in downloads page, but every time you run mongod).
Just as a data point, I installed MongoDB on 32-bit Ubuntu 12.04 from the Ubuntu repo. At no point during install or service startup does it say anything.
Just installed it on ubuntu 32-bit (apt-get install mongodb). I now have a 32-bit mongodb process running. Minus all of the noise from apt about setting up xulrunner and the 101(!!!) other dependent packages, here's the entirety of the startup messages I see:
Any app that has a warning message will fail to warn you if its output is rerouted to a log file. This sounds more like an issue with the package maintainers not understanding the limitation and thus not putting in a warning message post-install.
Yes, that is true for me, at least using the 10gen binaries for Linux (not the Ubuntu package). You get this warning every time you start the engine. You'd probably have to look at your logs to see it. It is also referenced in the installation documentation.
Look guys, MongoDB uses a memory-mapped file. How can it be larger than 2GB on a 32-bit system? There should not need to be "warnings everywhere".
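The back-of-the-envelope arithmetic looks roughly like this (the kernel and per-process overhead figures below are illustrative assumptions, not measurements of any particular system):

```python
# A 32-bit process has 2^32 bytes of virtual addresses in total.
# The kernel typically reserves a slice of that, and the process's own
# code, heap, stacks, and shared libraries eat into the rest, leaving
# only part of the address space free for memory-mapped data files.

ADDRESS_SPACE = 2 ** 32            # 4 GiB of virtual addresses total
KERNEL_RESERVED = 1 * 1024 ** 3    # ~1 GiB reserved for the kernel (assumed)
PROCESS_OVERHEAD = 1 * 1024 ** 3   # binary, heap, stacks, libs (assumed)

mappable = ADDRESS_SPACE - KERNEL_RESERVED - PROCESS_OVERHEAD
print(mappable // 1024 ** 3)       # -> 2 (GiB left for mapped data files)
```

Shift the assumed overheads around and the exact figure moves, but on a 32-bit build an mmap-based store is boxed in at roughly 2GB no matter what.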
I think expecting end-users to know or care about the implementation detail of a memory-mapped file is even sillier than all the other silly expectations floating around this thread.
Maybe a stupid question, but aren't there mmap64 and mmap2, with which you can map larger files? You are limited to 2 gigs at a time, but that shouldn't be a complete showstopper.
Comparing e.g. 32-bit Firefox and the 64-bit one, the former runs much smoother on my system. So if I didn't have 8GB I would install a 32-bit system. (Not to mention that all the binaries take less space.)
Both are clearly stated in the documentation. Anyone who puts any DB into production without reading all the documentation beforehand is foolish and has no room to complain. If they are only "experimenting", then at what point will they read the documentation?
Then I would guess almost every DBA and sysadmin in the world is foolish. I doubt anybody has read the entire documentation for PostgreSQL, Oracle or MySQL.
Still they manage to get those databases up and running without any data loss, with just some generic knowledge about OSes and hardware and reading choice parts of the documentation. Strange, huh?
If the default configuration of a _database_ product doesn't try its best to keep the data safe, it shouldn't be just clearly stated in the documentation. It should be a flashing red banner on the front page and every time its service process starts.
> In particular, a database will not have the following default behavior:
> - return from a write call silently, when the data wasn't written and will not be
There isn't some defined set of rules for how a database should operate. This attitude implies that an asynchronous database should never exist. If that's the case, how could I ever use a database for HTTP logging? I can't have every single HTTP request block on a database write; that's absurd. HTTP logging is impossible with MySQL or PostgreSQL for exactly this reason.
PostgreSQL allows you to select this behavior on a per-transaction basis using the synchronous_commit variable. The default (which is what we are discussing here) is "don't return until the data hits disk", but you can set it as strict as "don't return until the data has not only hit a local disk but has been acknowledged by a standby slave" or as lax as "return immediately: sync my data when you get around to it, it isn't important". (So don't claim something is impossible with someone's tool without first looking into it deeply.)
You're right, that's my fault. You can set synchronous_commit to off, which makes this possible. However, this is a database-wide variable, so presumably you can't use this database server for anything else if you set synchronous_commit to off.
Again, no: it is per each and every individual transaction. (edit:) To be very clear, this means that a single HTTP log table in a single database could have some requests marked synchronous_commit (as they are to a resource that you charge for, and for which you need accurate logs), while for others it is not set (as you just want the fastest performance).
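Concretely, the per-transaction switch looks like this (a sketch; the `http_log` table and its columns are invented for illustration):

```sql
-- Billable request: wait for the commit record to reach disk (the default).
BEGIN;
INSERT INTO http_log (path, billable) VALUES ('/api/charge', true);
COMMIT;

-- Bulk access-log write: return immediately, let the server flush later.
BEGIN;
SET LOCAL synchronous_commit TO OFF;  -- applies to this transaction only
INSERT INTO http_log (path, billable) VALUES ('/favicon.ico', false);
COMMIT;
```

SET LOCAL reverts at COMMIT/ROLLBACK, so the lax setting never leaks into other transactions on the same connection.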
MySQL has INSERT DELAYED which is async. Of course, it's not the default - you have to specifically choose to call INSERT DELAYED when you know the data you're inserting isn't life-and-death.
INSERT DELAYED does not work for InnoDB tables and is now deprecated since MySQL 5.6. I am not a MySQL user though so I have no idea if there are any alternatives to INSERT DELAYED.
The reason for the deprecation seems to be the many gotchas with it combined with the doubtful performance gains. It seems to have been an ugly hack.
But even an async database should be able to write something to its log after it discards a write request. From what I understand from the original article, this did not happen.
> This attitude implies that an asynchronous database should never ever exist.
Is a process that sends data from a socket to /dev/null a database as well, then? Why not call that a database too?
> If that's the case, how could I ever use a database for HTTP logging?
In a normal database, you would possibly switch 'durable writes' off and the 'time expiration' feature on in the configuration file, instead of getting those defaults silently.
> I think a lot of the "hating" is a side effect of MongoDB being consistently oversold in terms of its capabilities
A large part of that is that 10gen are much better at marketing than they are at tech. For example, they like to pitch against the Oracle database, and for certain use cases, MongoDB is better than Oracle RDBMS. But if you were to take those cases to Oracle, they would say well don't use the database for that, use our other product Coherence (which was Tangosol before they bought them). And Coherence spanks MongoDB in every possible way. You could write a bestselling novel about it, 50 Shards Of Grey.
And if you hate Oracle, there are a bunch of free things that do what MongoDB does a hell of a lot better. Why waste time with its cheesy MapReduce when you could have ICE http://www.zeroc.com/overview.html for example?
"Oversold" is the key word here. And that's a marketing concept. To dig deeper, the problem has been and always will be that MongoDB straddles the line between a caching product and a database product, yet 10gen markets it simply as a database product.
MongoDB is flexible and fast, but practically speaking it's a hybrid product that defies simple categorization. That means it must be well understood to be leveraged. Ultimately that may also be its Achilles heel, but in the short term, it's extremely frustrating that 10gen can't embrace this fact. Instead they perpetuate marketing that causes the product to be perceived as flawed by their target audience.
Once you've discovered this, you can use it appropriately (either by overriding the default silent failure to use it for durable persistence, or by only using it for caching or as an eventually consistent store) to reap its flexibility and speed.
> I think a lot of the "hating" is a side effect of MongoDB being consistently oversold in terms of its capabilities and architecture.
Agreed, though I think it's also coupled with just a lack of understanding of what MongoDB does beyond "it's NoSQL!". So many people are used to working with an RDBMS. They just seem to make incorrect assumptions about MongoDB based on what they thought were universal rules about databases.
Mongo's a tool. It's not right for all situations and it definitely has some maturing to do still, but it's good to use in certain situations, provided you're properly informed about it.
I would only add a lack of thinking on the part of developers who jump on the NoSQL bandwagon for the buzzword without understanding the implications of not having a relational database.
I'm not a RoR fan but the ActiveRecord model is probably enough for 75% of people to get their projects started without having to dig too much.
I agree. In some regards it is "oversold", but I also think part of that is from a lot of the marketing efforts we've seen with newer databases like Mongo and even Riak.
I am not saying those are bad, but with more established DBs like MySQL and PostgreSQL, you never really saw the same kind of marketing efforts towards developers, startups, etc. It is kind of a newer concept.
> ...with more established DBs like MySQL and PostgreSQL, you never really saw the same kind of marketing efforts towards developers, startups, etc. It is kind of a newer concept.
That's funny because MySQL used to do exactly the same kind of marketing fifteen years ago: comparing itself to Oracle, "it's a thousand times faster" when it obviously didn't do one thousandth of what a RDBMS does.
History repeats... and has a strange sense of humour.
I don't think that is correct. They never compared themselves to Oracle. I spent some time with Monty and parts of the team, and also tried working with their business side, and they never went after the Oracle workloads.
Their huge popularity came with the LAMP stack and their ease of use. They went after this along with data warehousing.
I was going to say I remember MySQL doing pretty much the same thing back in the 3.x days. Ok, a different style (I don't remember them comparing themselves to Oracle, but I do remember comments like "who needs foreign keys anyway?").
Actually I think that MySQL is essentially the forerunner of NoSQL. People who cut their teeth using MySQL as a single-app persistence store, or were trying to code to the lowest common denominator in order to be portable, now see benefits in ditching the idea of a shared database altogether because, well, they never used a shared database.
My problems were to do with data-loss and unrecoverable corruption. I didn't write a blog-rant, but maybe I should have.
I have since learned that I could have changed some configuration options to make MongoDB less likely to corrupt and/or lose data. I'm still wary though - the fact that the default configuration was prone to unrecoverable data loss suggests that any time I use MongoDB, I must carefully research the feature I'm using to make sure I don't do it in a way that causes data loss.
I believe dangerous configurations should never be the default, and dangerous features should be clearly labeled. The default method for writing data should not fail silently, for example. The fast, fire-and-forget write should be called something like "unchecked_write".
It's not just that the defaults are dangerous, but that if you change the default, you give up much of the performance benefit of choosing Mongo in the first place.
They did it more when it first came around. Redis wasn't in the same category. It was usually compared to CouchDB, another one I don't remember, and then traditional SQL DBs. Couch for example looked much worse in benchmarks because it tried a bit harder to take care of users' data and didn't just fling it over the fence. But it mostly lost out because OMG benchmarks!
> I didn't write a blog-rant, but maybe I should have.
Yes, do. Every time I tell people how Mongo can lose their data and silently corrupt it, they say "ha, but you never experienced it, how do you know?". Well, apart from explaining the way writes work in theory, I have the "there is this one guy on the internet... let me find his blog". But I suspect there is more than one guy.
This is a nice summary of many of the problems with MongoDB, and a good set of fumbled excuses as to why they might not actually be problems. Most of the answers boil down to "Well, yeah, but if you know how to [long complicated solution here], then it's not a problem," while simultaneously admitting that the problem is present and that the solution (if it exists) is extremely difficult and requires specialized knowledge.
All I can do is laugh and keep on using Postgres...
Interestingly I just wrote an article on dzone (published in their NoSQL zone ;-) ) about building encapsulated data models on PostgreSQL. I discuss a bunch of ways of doing this. I totally agree with the idea that one should encapsulate data in application, but that's what an RDBMS is for if you know how to use it.
My approaches to db interfaces have been greatly inspired by REST and SOAP. The thing is giving up on the RDBMS is usually the wrong choice. If using NoSQL, usually it is best as an adjunct to the traditional RDBMS (particularly for pre- and post- processing).
We've been using MongoDB in production for our main content store at Conversocial for over a year now. We've gone from 1.8 to 2.0 to 2.2 and we're happy. We have ~400GB of data, 220 million documents, all on Amazon's SSD instances.
We have definitely had our moments where we've screamed that we hate Mongo and are going to rip it out. That's normally where we've overlooked a detail of how it works... and we've had this same experience with every database technology we use - including MySQL and Redis.
The day after, when we've cleared our heads... we're happy again. The same as the other technologies.
I think that with all technologies you're going to get bitten by some detail you didn't know about or had forgotten. The trick is to mitigate these disasters by thinking about your failure cases.
They're not particularly exciting. Example: we accidentally left detailed logging enabled on a secondary server and weren't monitoring the space left on the drive for logs. The disk got full and the secondary failed. Annoying that it failed but it was our fault for not monitoring that space and also leaving detailed logging on!
Well of course it's not exciting, but that's exactly the kind of examples I like hearing about because now I know I should always have detailed logging off.
Anyways, I'm sure a lot of us would appreciate it if you wrote about your experiences, regardless of how mundane they are.
+1 to share your experience. Kiip and other companies do that, and if you need to decide on a certain technology for your upcoming development, these are really exciting and far-from-boring posts.
At our company we have a much bigger MongoDB dataset in production. I can relate well to your comments, since it is the same stuff that I hear from our developers as they defend Mongo after yet another horrific situation. They have a tough time admitting that Mongo is the problem, and instead always find ways to say that it was their fault.
This is such a strange thing for me to observe, but I think that it has to do with the fact that MongoDB works so smoothly initially with a default install that it hooks you in, and then later when you have a large dataset and it stops working well, it's hard to understand how what was such an amazing technology can now fail so badly.
It's also a huge problem that once you run MongoDB at scale, you desperately need experts to fix things, but it's so hard to find any so-called experts who can help. 10gen did a great job of marketing to developers, but unfortunately they seemed to spit in the face of DBAs and Ops people a long time ago by proclaiming them to be unnecessary and archaic. Running any significant MongoDB database in production requires as much expertise as someone running a big MySQL instance, but there isn't a community of database lovers around MongoDB who you can hire.
The DBAs and operations people I know dislike or even hate MongoDB, often for valid reasons that 10gen should address such as the as-yet-unfixed 'write lock' issue, code instability and inaccurate/misleading documentation. 10gen has done a great job at evangelizing to developers and making features that developers love. Now that 10gen has so much money, I hope that they can now afford to start making MongoDB a database that Ops people and DBAs can love.
As a final note, this ServerDensity guy is clearly looking at the world through mongo-colored glasses. It makes sense I suppose since he is all-in with MongoDB for his company. But we were using his service up until three months ago, and had a lot of problems with intermittent performance issues that seemed to be database-related, since the site was still working and only certain pages would take 30 seconds or so to load. It's possible that the problems were temporary and no longer exist, but it brings home my point. If MongoDB experts can't make their own services 100% reliable, what hope does a regular startup have of getting MongoDB to work well at scale?
A developer may feel comfortable making the decision to go with MongoDB, but if they are wrong it won't cost them their job and they won't need to pull their hair out dealing with Ops issues all the time. If a DBA or Ops guy is being hired to manage a company's datastores, I don't see MongoDB (even 2.2) being a contender. At this point there are simply other DBs available that can perform the same or nearly the same without all the fussiness. Developers may be unhappy since nothing yet is as easy to develop on, but they'll be happier in the end when stuff 'just works'.
This is an excellent anecdote and one that others should pay attention to, regardless of the technology.
Developers often have confirmation bias for their choices—it's a common human fault. We do it for purchases, life decisions, and software decisions alike. But it's important to watch out for it and be aware when you might be in the thick of your own bias.
A friend of mine (let's call him "me") used Adobe Flex for a relatively large computationally-expensive project, and advocated for it because of many small details and relatively quick learning curve, and ease of UI prototyping. After we got entrenched, we started running into many shortcomings which I (er, my friend) should have realized early on were due to the nature of the platform itself, but I continued to defend Flex because it was my decision and I had put so much time into making it work.
In the end, we realized it wasn't the right fit for this specific job and moved to a more capable platform with fewer issues (aka, "any other platform than Flex").
Before making big decisions, I try to remember that one mistake, and get out of my own way first to look at it from an outside perspective. I try to differentiate whether I'm making decisions based on fact, or just trying to justify patching holes in the Titanic. This "looking at the world through [insert technology]-colored glasses" idea is spot on, and everyone—not just those using or debating MongoDB—should be aware of it.
Exactly. ..and while I can appreciate the frustrations and tales of woe, I read the #*%#k out of the docs as well as build a number of prototypes before I go all in. I also read all the top questions on StackOverflow on that tech, as well as any blogs that mention it.
Also: google "technology name" sucks to find out all the naysayers so you know what the downsides are from people in the field and make sure you are prepared to deal with them.
If you do the five minute quickstart guide, and read the feature list, and throw it into the mix, well...
Of course. But this isn't a question about what is or isn't in the documentation. Of course everything you need to know should be in the documentation, and you should RTFM.
No, it's about obfuscation vs clarity, and to an extent, about how good the overall design really is. Look at it as a measure of quality: having to read every word of the documentation to find the tiny part on page 18 that tells me about the one flag I need to start it up with in order to make one feature work the way I want, versus a sensible default, clear documentation, and even a clear design where perhaps that flag isn't required at all (think automated memory management that Just Works, versus a dozen command-line or config-file switches for memory buffer sizes and such).
When you run into issues, it's not a useful excuse to say "Well, you should have read the documentation, it was all there." That's like a shady credit card contract where the rate goes up 30% if you miss a payment. It's easy, even if you read it, to say "well I won't ever miss a payment," and even easier to miss that clause completely. Can you blame people for not reading the fine print? Sure, it was their responsibility. Will people still do it? Of course. Will people who read the contract fully still run into issues if they make one mistake? Probably.
And the point: is it a crappy contract? Yes. It has a crappy feature to begin with, and it's made worse by its obfuscation. This is almost unarguable: a credit card with a lower rate penalty is better. A piece of software with good defaults and sensible design is better.
The requirement for asinine and lengthy documentation isn't just a big warning sign that you should read it—it's a sign of poor design, or at least the word everyone's been using to describe Mongo: immature.
Good design includes the whole experience of using the software, and takes into account good integrated systems and human interaction. Bad design requires you to read the documentation extremely carefully. These are not hard and fast rules, but they're certainly warning signs. Really clear and obvious warning signs. And the overall point is that the well-documented issues that come up with Mongo are not just stupid people who don't read documentation—they are statistically relevant pieces of evidence pointing to some poor design decisions.
You should really think of this whenever you see a trend. You have a choice. You can blame individuals for "not reading the documentation"—or you can look at the systematic trend and statistically evaluate the problem. The former allows you to quickly dismiss issues on an emotional basis and make yourself feel better, while doing absolutely nothing to solve the issue. The latter lets you collect useful information and make real changes that have a real impact on the issue at hand. Your call.
> But unfortunately they seemed to spit in the face of DBAs and Ops people a long time ago by proclaiming them to be unnecessary and archaic.
WTF? I have never heard 10gen say anything remotely like this. And it surely hasn't come across in their marketing. I mean, seriously. Which developer thinks that in a production environment they are going to be the ones supporting the database? Nobody.
> If MongoDB experts can't make their own services 100% reliable, what hope does a regular startup have of getting MongoDB to work well at scale.
The fact that you base your impression of MongoDB on ZERO evidence, just conjecture that the slowness of their site is database-related, says a lot about you too. There are many other things it could equally be: the app server, the network, etc.
> If a DBA or Ops guy is being hired to manage a company's datastores, I don't see MongoDB (even 2.2) being a contender.
Have you even worked at an enterprise company before? DBAs/Ops aren't the ones in control. If the development team wants MongoDB installed and has business justification, it gets installed.
> Developers may be unhappy since nothing yet is as easy to develop on, but they'll be happier in the end when stuff 'just works'.
But it does 'just work'; that's the whole point. Developers aren't stupid, and MongoDB is not the only database around. You just need to (god forbid) understand how the thing works.
This means that MongoDB works right out of the box, and you can dive right into developing your application, instead of spending a lot of time fine-tuning obscure database configurations.
There goes the indirect jab at the RDBMS DBAs wasting their time.
How on earth do you equate that statement with "we don't need a DBA when we get into production" ?
Do you need a DBA to get MySQL running? No. Oracle? No. SQL Server? No. That's all it means. Normal people understand that there is a difference between getting something running and deploying it into production.
Point me to a doc (or a tiny little red note link, like on the 32-bit installer download page) where 10gen says that you will need a dedicated MongoDB DBA to maintain a production server. You see, the 10gen marketing machine has successfully created the perception that MongoDB is so magical that you don't have to worry about your data. New programmers with little RDBMS knowledge are buying this argument, and most of the time they jump into development right after reading a basic tutorial. This is the problem the GP is trying to address.
> The fact that you base your impression of MongoDB based on ZERO evidence just conjecture that the slowness of their site is database related says a lot about you too. There are many other reasons it could equally be: app server, network etc.
Those other things are trivial to fix (e.g. add more app servers), so I too think that it's safe to assume it's probably the data store's fault.
None of what you said makes ANY sense. Have you ever actually scaled an app before?
Adding more app servers increases the number of requests you can handle; it doesn't make a slow app faster or reduce latency. Both of those are possible causes of slow requests. It's not necessarily a database problem.
> Did you read the article?
Yes. I read the documentation before installing MongoDB so I haven't had any problems (so far).
There's no reason other than the data store that would make their site that slow. Yes I have experience in this area. Until they come out and tell me otherwise I'm gonna bet that the data store is at fault. I could be wrong but I'm willing to give you 5 to 1 odds against.
Could you elaborate on the things that MongoDB brings to the table versus a database like MySQL? If you assume that the entire dataset fits in memory for both MongoDB and MySQL, as well as SSD backing for both, what particular advantages does MongoDB have for you in production?
With 400GB of data I'd imagine you are going to be ripping your hair out at some point or another regardless of what database you use ... there is no way to completely avoid tech issues over the lifetime of a product/service.
Headline Problem: Too many people are ranting about MongoDB.
Mistake: Too few data points. Overly defensive of MongoDB.
Comments: You should probably learn more about Riak. The "From MongoDB to Riak" post is light on details but isn't really hand-waving. The author is saying that having a masterless database results in less fretting. http://wiki.basho.com/Riak-Compared-to-MongoDB.html
Also, why two rage faces? Can't you get your point across without resorting to silly memes?
Now that Couchbase Server 2.0 is beta, it's a reliable alternative to MongoDB. Same JSON goodness, different set of user stories (hint: fewer rants). Get the beta download here: http://www.couchbase.com/couchbase-server/beta
> Comments: The 32 bit limit is noted (perhaps it should be a warning) on the download page but the main problem was the author did not know when writes started to fail. MongoDB uses unsafe writes by default in the sense that from the driver, you do not know if the write has succeeded without a further call to getLastError. This is because one of the often cited use cases for MongoDB is fast writes, which is achieved by fire and forget queries.
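The semantics described above can be sketched with a toy model (plain Python, not real driver code; the class and its limits are made up purely to illustrate why fire-and-forget writes plus a 2GB ceiling fail silently):

```python
# A toy model of "fire and forget" vs. acknowledged writes: the unsafe
# path returns before the caller knows whether the write stuck, which is
# why a separate getLastError-style round trip existed at all.

class ToyStore:
    def __init__(self, capacity):
        self.capacity = capacity          # stand-in for the 32-bit 2GB ceiling
        self.docs = []
        self.last_error = None

    def _apply(self, doc):
        if len(self.docs) >= self.capacity:
            self.last_error = "over capacity, write dropped"
            return
        self.docs.append(doc)
        self.last_error = None

    def insert_unsafe(self, doc):
        self._apply(doc)                  # caller gets no status back at all
        return None

    def insert_safe(self, doc):
        self._apply(doc)                  # "safe" path checks the error state
        if self.last_error:
            raise RuntimeError(self.last_error)

store = ToyStore(capacity=1)
store.insert_unsafe({"a": 1})
store.insert_unsafe({"a": 2})             # silently dropped
assert len(store.docs) == 1               # the caller never found out
try:
    store.insert_safe({"a": 3})
except RuntimeError as e:
    print(e)                              # over capacity, write dropped
```

The only difference between the two paths is one extra status check per write, which is the whole tradeoff being debated here.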
I read the referenced article and the HN discussion... but wasn't quite able to tell how much of the debate was focused on the reasonableness of this default rather than on a "developers should read the docs, or accept catastrophic failure" attitude (which is not inherently wrong).
What immediately comes to mind is the Rails ActiveRecord update_attributes vulnerability...the default was to allow the updating of all specified attributes with the assumption that no competent developer would trust unsanitized input from the browser. After a good Samaritan performed a spectacular hack on Github, the Rails team immediately changed the default.
Is that the situation here with the 32-bit silent fail default? That it is a sensible default, but could be changed if it's shown that competent devs will nonetheless screw it up?
> Is that the situation here with the 32-bit silent fail default? That it is a sensible default, but could be changed if it's shown that competent devs will nonetheless screw it up?
It's kind of sad that we'd need an example to show that it really could happen in every single specific case. It should be common knowledge by now that competent devs screw things up all the time.
For example, I'm sure the MongoDB devs are extremely competent. But all the same, having a database management system default to letting writes fail silently is a pretty spectacular screw-up. I can't really blame other competent devs for taking it for granted that a DBMS wouldn't do something like that.
Let's not conflate "hating" (which is always pointless) with pointing out flaws in a design.
"Mongo sucks f*cking balls" is hating it.
"Mongo silently does not store data in some situations" is pointing out a flaw.
It is easy to dismiss valid criticism as hate.
It is also unfortunately true that some topics degenerate into flamewars on HN. Why Mongo is such a catalyst is a phenomenon in itself. One that I find fascinating.
"Getting your working set in memory is one of the most difficult things to calculate and plan for with MongoDB."
I am a little bothered when I see a working set described as though it were a fixed property of the user or workload.
A large part of database research has gone into allowing the user or the system to make the working set smaller. An obvious example is an index, which makes the working set smaller if you don't mind a random I/O or two per lookup (of course, that only works for indexable queries).
There are also many operations in a database which try to work within a limited amount of memory, and therefore must have a small working set regardless of the data size. Sort and HashJoin are two examples. HashJoin doesn't tell you what the working set of your data is, you tell it and it works as efficiently as it can in that amount of memory.
And you can design your data layout to have a smaller working set (again, so long as you allow a few disk accesses outside the working set). Normalization and vertical partitioning (i.e. splitting a table up into several tables with fewer columns each) can help here.
So, the "working set" isn't some passive constant that can't be managed.
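The index point above can be made concrete with a toy sketch (plain Python, nothing database-specific; the numbers are made up): for a point lookup, an index shrinks the data you must touch from "every record" to "index plus one record".

```python
# Toy illustration: a full scan makes the whole dataset hot, while an
# index (here just a dict of id -> position) touches one record.

records = [{"id": i, "payload": "x" * 100} for i in range(10_000)]

def lookup_scan(target_id):
    touched = 0
    for rec in records:              # full scan: every record is "working set"
        touched += 1
        if rec["id"] == target_id:
            return rec, touched
    return None, touched

index = {rec["id"]: pos for pos, rec in enumerate(records)}

def lookup_indexed(target_id):
    pos = index.get(target_id)       # one index probe
    if pos is None:
        return None, 0
    return records[pos], 1           # one record touched

rec, scanned = lookup_scan(9_999)
rec2, touched = lookup_indexed(9_999)
assert rec == rec2
print(scanned, touched)              # 10000 vs. 1
```

Same data, same query, radically different working set, which is exactly why describing the working set as an immovable input is misleading.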
Isn't one of the defining characteristics of Mongo-like systems that they tend to aggressively favor using RAM to achieve performance? I.e., their strategy is to complete each query as quickly as possible rather than attempt to optimize for fairness of long-running transactions.
I wasn't referring only to long running transactions. Indexes are clearly a way of reducing the working set (at least in a practical sense) but also vital for workloads with many short transactions.
I was not criticizing mongo per se, I was criticizing this post and other misguided statements about the working set as it relates to database software.
OK, it just seemed to me that the stated goal of Mongo-like servers was to take some of our accepted notions of the "working set as it relates to database software" and put them up for fresh discussion.
MongoDB is really easy to get started with. It's easy to install on a VM, and it's easy to try it out on Heroku. There's drivers for a bunch of languages, and its API works in a way that fits well with the way programmers already think. It dispenses with pretty much everything that makes getting started with a new database hard (schemas, new syntax, unfamiliar concepts, needing to edit server configs).
MongoDB "just works" out of the box for testing and development. It's practically the perfect database for any testing and development work. And their marketing documentation tells you that it'll just work in production and will scale really well. That makes a lot of people try MongoDB, and they're mostly very happy with it.
...the problems arise later. Because while MongoDB lives up to the hype for testing and development, it doesn't completely live up to the hype for production uses. It doesn't always scale that well. It isn't always as durable as you might hope. Performance isn't always very good. For some use cases, the default settings aren't appropriate; for other use cases the database design itself isn't appropriate. Backups and replication aren't quite as easy as you'd hope.
In short, the problem comes because MongoDB is as good or better than, say, Postgres or Riak during development in basically every way, but it's only better than Postgres or Riak in a few specialized ways during production.
Example: Riak basically won't work at all in a multi-tenant setup. Two competing services will provide you with a free development MongoDB instance on Heroku, but nothing of the kind exists for Riak; you basically need to roll your own cluster. That makes MongoDB really easy for someone thinking "hey, I wonder if Mongo would work for this proof of concept I'm working on" (hint: it will). Of course, MongoDB replication and scaling is honestly mediocre, while Riak's is basically magic. This makes Riak a better choice for someone who needs to scale a cluster of database servers - but you only find out that sort of thing first-hand during production. At which point, you make an angry blog post that hits Hacker News. And while plenty of people never run into MongoDB's weak areas, they never post anything that hits the Hacker News frontpage. :)
TL;DR: A lot of people hate MongoDB because it's amazingly easy to get started with, but harder later on, and they then feel betrayed. Other databases have smoother learning curves.
I had to use it a few years back, for some data reporting that had been added to a project after the fact, and... I found it to be awful, because the task was very much a join-heavy operation. Something that would have been a breeze with Postgres, or even Mysql got turned into this big, ugly, heavy, slow process. I was not impressed, especially since the quantity of data in the DB was also something that could have very, very easily been handled by a very conventional relational DB running on a VPS, to say nothing of dedicated hardware.
Basically, they had chosen MongoDB due to the 'cool/new' factor, and it came back to bite them on the ass.
>" I found it to be awful, because the task was very much a join-heavy operation. Something that would have been a breeze with Postgres, or even Mysql got turned into this big, ugly, heavy, slow process. "
If you can't be assured of the structure of the input data, you can't reliably and gracefully transform that data on output. That's a pretty fundamental tradeoff between SQL and NoSQL (and historically between PostgreSQL and MySQL too).
You beat me into writing this =) Great article, btw!
It seems to me that there are three camps of people when it comes to dbs - (1) The "I hate anything new" camp, (2) The "I hate anything old" camp, and finally (3) The "I will pick the best tool suited to my needs"
Those from camp (1) are the ones who hate anything that IS NOT relational or sql. The mongodb bashers would fall in this camp.
Those from camp (2) are those who hate anything that IS relational or sql. The "web-scale" people would fall in this camp.
Those from camp (3) are the silent majority who do their independent research, pick the right type of DB for their job, and live life happily.
For my startup (Semantics3), we deal primarily with JSON strings (which have no fixed structure). We use MongoDB because a really good use case for it is to just store JSON documents with a unique id. We only run simple queries on it - basically existential (does an id exist) or count.
We are aware that its query performance is not as fast as that of a relational db. So we then index that data into ElasticSearch and run our advanced queries through that.
We are really happy with our current system and it has been working great.
If we had some sort of "relational" data with a fixed structure I sure as hell would have picked MySQL or PostgreSQL and used Sphinx on it for indexing.
Does everyone hate infomercial salesmen who sell crap at 4am on TV? I don't know. Some do, some don't.
Those that see through the lies are probably laughing at them, those that already bought the product and found out it is useless and doesn't live up to touted capabilities probably hate them.
Now don't get me wrong. In a practical sense (disregarding ethical considerations) they took a risk. They quickly gained a lot of customers because their benchmarks looked very good.
So in a certain way people probably already forgot about the competitors of that time, because those competitors never made it this far. Now MongoDB is here years later, and stories of hidden data corruption (one of the worst things that can happen to a database) have started to emerge. Yes, by this time Mongo has the mind-share, but now it is time to pay the price for that design choice. They are probably still ahead.
This article isn't exactly inspiring me to give MongoDB a shot. I wasn't even aware Map-Reduce in MongoDB was single-threaded; doesn't that defeat the entire purpose? Isn't Map-Reduce an embarrassingly parallel algorithm?
If they're using a single-thread JS engine, can't they just multiprocess? No shared state, after all...
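The parent's point can be sketched in plain Python (a toy word count, nothing MongoDB-specific): each mapped chunk is fully independent, which is exactly what would let the map phase be handed to separate processes.

```python
# Word count as map-reduce: chunks are mapped independently (no shared
# state), then the partial results are merged in the reduce step.
from collections import Counter
from functools import reduce

docs = ["the quick fox", "the lazy dog", "the fox"]

def map_chunk(chunk):
    # per-chunk work touches only its own data, so chunks could run
    # in a process pool instead of this sequential list comprehension
    return Counter(word for doc in chunk for word in doc.split())

def reduce_counts(a, b):
    return a + b

chunks = [docs[0:2], docs[2:3]]            # pretend each goes to a worker
partials = [map_chunk(c) for c in chunks]  # trivially parallelizable step
total = reduce(reduce_counts, partials, Counter())
print(total["the"])                        # 3
```

Whether multiprocessing would have been practical inside the server at the time is another question, but the algorithm itself imposes no shared state.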
That is very trivial to do in your ORM if you so choose - and something like the Li3 PHP framework (lithium) would handle this quite nicely.
To be honest, I'd rather have on-the-fly compression of data fields. At the moment I need to do this on my end, but it would be nice to be able to mark fields as compressible. Ah well.
It's not about "hating", or at least I hope it's not.
NoSQL, in many ways with MongoDB leading the charge, came on to the scene very quickly with a lot of support on HN. When action outpaces education, then it's natural to expect education to play catch-up for a while, which is what's happening now.
Some of it is education by 10gen, some by users who made mistakes (the leading cause of education), and some by traditional database people who want to highlight the lessons learned that might have been missed by 10gen or mongo users.
Generalizing all of this as "hate" is not productive.
You raise a valid point. If this essay by pg holds true, you might be better off sticking to PostgreSQL (the other pg): http://paulgraham.com/javacover.html
"Many suggest this default is a good way to get favourable benchmarks but there are no official ones so I don’t think that’s relevant."
Sure it is. People download MongoDB, run some simple workload against it, and see how fast it is. It's not an official benchmark, but it's still a strong part of the marketing message.
So, if those people don't understand the significance of sync/async, they may be getting a false impression of the speed.
I really appreciated this post for its thoughtful quality. It provided a bunch of "in our experience" analysis of situations that others presented, noting both the mistakes of users and Mongo itself, without playing the blame game.
The presentation of mitigation strategies for both sources of issues was actually constructive for potential and current users of Mongo. As with any tool, Mongo is useful in some scenarios but not without its drawbacks. It's nice to have a point-by-point assessment of pain points.
Falling into the potential user category, I've opted not to use MongoDB for my current project because the particular benefits don't align nicely with my problem, but it was useful to see this and know a bit of what to expect with Mongo, good and bad.
They don't all hate MongoDB, they're just singling themselves out as people we shouldn't work with. Ranting about a product without knowing much about it or even reading the documentation simply points out what type of engineer you wouldn't want to deal with.
YES!!!!
We stopped using it in production and moved to Solr.
I don't hate MongoDB but I think it is overhyped. The fact that 10gen raised 42m is crazy. I would be surprised if those investors got any return greater than the money they invested.
I think it looks promising - it's not a panacea, there are things SQL works better for and there are things I can see Mongodb would really rock with.
With anything there are gotchas so you have to live and learn with em.
When I was researching Mongodb before going into it, I saw forum posts about the 2GB limit on 32-bit systems; I would say the author of that article didn't see them, which could be a lack of due diligence on the ins and outs of the system, or on the reference material he had originally been using to build his system.
I'm not sure I "hate" it. But every project I've used MongoDB on that grew the least bit of data complexity left me wishing for (and often converting to) Postgres. I'll still like MongoDB if I'm doing something very simple or rapid prototyping, etc. But for real work, forget it.
I wouldn't say I hate it, but I took MongoDB into production two years ago only to have it be a consistent source of problems for me. We integrated it to offload some activity from MySQL, it did a great job for a while, but it's a constant chore to maintain.
MongoDB powers my LaTeX doc database, and the CMS my team and I use. It's awesome.
I have an event logger that logs "interesting" MySQL server events to a tab-separated file and then out to a central mongodb database, so I can query several servers' results at once.
> The way MongoDB yielded was improved in 2.0 in a very significant way and this was taken further in 2.2 with the complete removal of the global lock as a step towards more granular concurrency.
Yes, with 2.2, write locks have improved drastically. However, I'm still waiting for the point where a write locks only a document within the collection (the closest thing you can call a record in db terms - with tons of caveats) rather than the whole collection (the near equivalent of a table - with caveats).
This being said, Mongodb does have really really awesome documentation and it is the right way to go for many applications and developers.
I've used mongodb for a recent prototype. It was brilliant, I iterated through the data model as I coded.
However, I am now very aware that I have an implicit schema in my code that will be harder to understand in 6 months than a database schema.
I traded fantastic flexibility in initial development against easily maintained longevity.
I suspect I will move to something more relational but have also taken on the need to really understand the trade-offs I'm making. I've started reading Seven Databases in Seven Weeks to fill in some of my gaps.
I am a huge fan of MongoDB and have been using it for years. That said, my default data store is PostgreSQL. I have to have a good reason for choosing to not use PostgreSQL.
I especially like MongoDB for data analytics: really handy for storing data and then doing experimental data mining. Putting a read replica on each server that does analytics/data mining works especially well (very good performance when MongoDB's indices fit in memory and the read mongo is on localhost to programs doing analytics).
I love that MongoDb allows me to get stuff done without much fuss even when I have not figured out the final data requirements.
I have been using RDBMS + ORM ( Hibernate) for a very long time; however I found this to be very "rigid" in a start up environment where the requirements are far from frozen. MongoDB is much more flexible and forgiving than traditional RDBMS in that respect.
If you haven't figured out the final data requirements, then you have work to do. It's a corollary of the "give me your data structures and I'll understand your program; give me just your code and I'll still be clueless" mantra. The database should have a strict schema before code is written. A schemaless datastore is only flexible in the sense that it lets you shoot yourself in the foot easily.
I don't hate MongoDB: I use it on scenarios where it is useful (eg: forms apps collecting semi-structured data, ETL apps pushing/transforming data to achieve geo-near queries etc).
For a regular web-app with less exotic needs, my store of choice is PostgreSQL these days.
And next time I have to handle large scale data, I will probably use Riak.
Outstanding article, and well overdue for the Mongo community. There are always haters who find themselves ignoring the documentation and then complaining about how things are set up.
Thank you for going back through the mongo articles to lay out the key points of misunderstanding.
No, I like it if you know its limits. Schema-free supports agile development much better than e.g. MySQL. Compared to key/value stores, it's easier if you need to query data (and don't have Twitter-scale amounts of it).
I'll say this. I like MongoDB, but I've never built anything at big scale with it. For a small project, it's much less hassle to get rolling on than MySQL, so it's cool for that.
MongoDB is kind of like assembler, if you read processor errata sheets for fun you can make some really fast websites. Get that OS, and everything out of the way, just you and the network card.
However, if you just want to write a web app that doesn't lose data, it's a really crappy idea, and you should probably use something created for solving that problem, like a compiler and web framework.
hell no, it's been a breath of really fresh air for us, we've been building all sorts of things, no problems so far. (including logging several millions of requests per day and performing distributed map reduce for all sorts of statistical models on a tiny 3 machine "cluster" we run on AWS, I hope I never have to go back to modifying schemas again, so much simpler to run and build with this mindset)
> I hope I never have to go back to modifying schemas again
Honest question. Do you actually leverage the schemaless nature of MongoDB? Not 'worrying' about managing schemas is one thing, but actually leveraging the fact that your objects can have different fields is another. I have worked on projects that did need to store objects with different fields together but it seems to be a rare case.
From what I've seen, most projects actually have pretty standard objects with specific fields. People just don't like feeling constrained but in the end I don't think managing a schema is difficult at all. I actually like the safety and performance you get when you put in the effort.
I think the single most useful thing to see when starting a new dev job somewhere are the database schemas. If done properly, it will essentially tell you the story of that company and their data. You can think up questions and get answers such as "Is it possible for a Customer to have multiple Orders? Can a product have more than one Review?" (Stupid examples but they illustrate the kinds of relationships which are asked about all the time).
What I like is not the absence of a schema (because as you say there usually is one), but not having to fit it into the 2D SQL table model. An obvious case is when you have a multi-valued field that only ever contains a small number of values per record (e.g. a person's nationalities); in a traditional SQL database my choices are to make a separate table for it, or use a database-specific custom datatype (which never seem to be fully supported). In something like mongoDB I can just put a list in that field of the document.
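The tradeoff reads something like this side by side (a toy sketch using stdlib `sqlite3` for the relational half; table and field names are made up):

```python
# A multi-valued field two ways: a join table in SQL vs. a list inside
# the document.
import sqlite3

# Relational: an extra table, plus a lookup against it for every read.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE nationality (person_id INTEGER, country TEXT);
    INSERT INTO person VALUES (1, 'Ada');
    INSERT INTO nationality VALUES (1, 'UK'), (1, 'FR');
""")
rows = db.execute(
    "SELECT country FROM nationality WHERE person_id = 1 ORDER BY country"
).fetchall()
sql_countries = [r[0] for r in rows]

# Document model: the list simply lives in the record.
person_doc = {"_id": 1, "name": "Ada", "nationalities": ["UK", "FR"]}

assert sql_countries == ["FR", "UK"]
assert sorted(person_doc["nationalities"]) == ["FR", "UK"]
```

Of course the join table buys you things the embedded list doesn't (referential integrity, easy "who has nationality FR?" queries), which is the other side of the tradeoff.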
Flexibility is useful in development. Once you deploy then things will change less frequently so the real benefit is when you do have to change something, you're not running a big ALTER statement.
In the enterprise world (at least in the Microsoft world) we have tools to manage these sorts of things. The one I'm currently using even for my small pet projects is called SQL Server Data Tools[1]. You basically just have to modify the original table creation scripts (add a column or whatever) and it actually generates all the change scripts for you. Even if the change is complex it handles making temp tables and transferring the data into the new table. It also notifies you and lets you handle cases where the change may lead to data loss.
I guess my point is that the SQL world is still innovating but it doesn't get that much attention. Unless I have a specific use-case for storing similar entities together with different fields then a schemaless DB is not the right tool for the job.
> Once you deploy then things will change less frequently so the real benefit is when you do have to change something, you're not running a big ALTER statement.
Instead, you're writing a bunch of code to deal with data that may be in the old format, or may be in the new format, or may be in the new-new format.
In practice, you're writing code that has to deal with records that were updated by versions 3, 5, and 7 of your code but not 4 or 6. After a few years you can be sure somebody got it wrong at some point, and now you have records in a few out of the 2^n states where not even the developers can anticipate the system's behavior. The fix is to make ALTER incremental and less painful, not to stop writing down and checking your schema.
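That "code instead of ALTER" burden looks roughly like this in miniature (hypothetical field names; this is the app-side upcasting a schemaless store pushes onto every read path):

```python
# Documents written by different code versions coexist in the store, so
# reads must bring each one up to the current shape before using it.

def upcast(doc):
    """Bring a document to the current (v3) shape, whatever version wrote it."""
    version = doc.get("schema_version", 1)
    if version < 2:
        # v2 split a single "name" field into first/last
        first, _, last = doc.pop("name", "").partition(" ")
        doc["first_name"], doc["last_name"] = first, last
    if version < 3:
        # v3 turned the single email into a list
        doc["emails"] = [doc.pop("email")] if "email" in doc else []
    doc["schema_version"] = 3
    return doc

old = {"name": "Ada Lovelace", "email": "ada@example.org"}
new = upcast(old)
assert new["first_name"] == "Ada"
assert new["emails"] == ["ada@example.org"]
```

An ALTER statement does this once, up front, for every row; the upcast function does it forever, on every read, and silently breaks the first time a migration step is forgotten.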
And both MySQL and PostgreSQL have put in work to make ALTER less painful by reducing the situations where a full table rewrite is necessary. In PostgreSQL you can both add and remove columns without the table being rewritten as long as you do not set a default value. You can also increase the lengths of varchars without any rewrite.