
Seriously, another case of using Mongo incorrectly? I want to believe all the Mongo hate, but I can't because I always find out that the actual problem was one or more of:

* didn't read the manual

* poor schema

* didn't maintain the database (compactions, etc.)

In this case, they hit several:

" Its volume on disk is growing 3-4 times faster than the real volume of data it store;"

They should be doing compactions and are not. Using PostgreSQL does not avoid administration; it simply changes which administration needs to be done.
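
For reference, running a compaction from the mongo shell looks roughly like this ("mydb" and "events" are placeholder names):

    // Compact one collection; note that compact blocks writes on the
    // affected database while it runs, so schedule it accordingly.
    db.getSiblingDB("mydb").runCommand({ compact: "events" })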

"it eats up all the memory without the possibility to limit this"

That's the idea -- that memory isn't actually "used" in the usual sense; mongod just memory-maps the data files. The kernel will evict those pages for anything else that needs the space, unless you are actively touching all of your data, in which case you really are using all of your memory. Which is why you should put it on its own server...
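
You can see the distinction in the shell; serverStatus reports both the mapped size and what's actually resident (sizes in MB on the mmap-based engine):

    // "mapped" is the total size of the memory-mapped data files;
    // "resident" is the portion currently sitting in physical RAM.
    var mem = db.serverStatus().mem;
    print("mapped: " + mem.mapped + " MB, resident: " + mem.resident + " MB");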

"it begins to slow down the application because of frequent disk access"

"Finally we sleep quietly, and don’t fear that mongodb will drive out redis to swap once again."

You should be running Mongo on a server by itself. At the very least, if you're having disk contention issues, don't run it on the same server as your other database.

I'm not sure you always need to read the manual for everything, but for your production database, it's probably worth it.




> Seriously, another case of using Mongo incorrectly?

If a large proportion of MongoDB users are using it incorrectly, then I'd argue that it is a MongoDB problem, if only a documentation and messaging one. Clarity on what is and is not an appropriate use should be prominent.

So, what is this proportion?


Or, to be even more specific -- if there's a Right Way to use a program, that Right Way should be encoded as defaults you have to override (if you know what you're doing), and automated actions you have to disable (if you know what you're doing).


The primary "unreasonable" default seems to be that in the beginning, writes were not safe by default. Although they were explicit about it (if you read the docs), it probably was a bad decision. This has been changed, thankfully.
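
For the curious, here's roughly what the difference looks like in a 2.6-era shell ("orders" is a made-up collection):

    // Acknowledged write: the shell waits for the server to confirm it.
    db.orders.insert({ sku: "abc" }, { writeConcern: { w: 1 } });
    // The old fire-and-forget behavior, now something you must ask for:
    db.orders.insert({ sku: "abc" }, { writeConcern: { w: 0 } });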

As far as maintenance goes, it's unreasonable to expect a zero-maintenance configuration. What other software offers that? Your operating system doesn't. Your browser doesn't either. Nothing is immune to entropy.


It has nothing to do with the defaults. It is all about people forgetting that MongoDB is a document database and not a relational one. I can write apps that will be 10x faster with MongoDB and 10x faster with PostgreSQL. It's all about matching your domain model to your database.


> It is all about people forgetting that MongoDB is a document database and not a relational one.

It's been a while since I looked into Mongo, but that was a Mongo marketing problem. They used to (still?) advertise themselves as an RDBMS replacement, literally.


Yes, but it's more complex than, for example, replacing MySQL with PostgreSQL, where the basic structure is going to be the same.

You have to think and adapt to convert from an RDBMS to MongoDB.


What did you expect them to do? Of course they are going to position themselves as an RDBMS solution. They want people to switch in order to make money. It doesn't mean that, as a developer, you should be stupid or incompetent enough to ignore every piece of technical documentation and just focus on the marketing lines.


> If a large proportion of MongoDB users are using it incorrectly, then I'd argue that it is a MongoDB problem

Hey, that sounds a lot like the logic of Java haters!

Kidding aside, I'm afraid your logic isn't convincing to me, but that's a debate for another time.


Actually, it sounds like the logic of Java proponents.


> * didn't read the manual

> * poor schema

> * didn't maintain the database (compactions, etc.)

The real world dictates that this happens more often than not. You know why I like Postgres? When I don't read the manual, create a crappy schema, and forget to maintain the database, it STILL seems to work okay.


To be fair, Postgres has automatic vacuuming now, but it is a relatively new feature. Both projects seem to agree that it is not a high-priority item, though there is certainly something to be said about using a mature product, which Mongo is most certainly not.

Your comment has made me quite curious to know what people using mature databases of the time were saying about Postgres 19 years ago, when it was roughly the same age Mongo is today.


Yeah, autovacuum was added in 7.4 (2003) and not really to be trusted to do its job without monitoring until 8.3 (2008). But the priorities of the PostgreSQL project have changed in the last five years or so. Today usability is important.


There's a huge difference. PostgreSQL was formally correct first, then it became fast, then it became easy to use. I always had the feeling I could trust it with my data. PostgreSQL, if wrong, would err on the side of correctness and data safety, at the expense of speed.

I definitely don't get the same vibe from MongoDB.


Back when Postgres didn't have automatic vacuuming, the need for manual vacuuming was one of the commonly-suggested reasons why MySQL was a better option - after all, MySQL just worked out of the box.


> They should be doing compactions and are not.

https://jira.mongodb.org/browse/SERVER-11763

It looks like compaction is an offline process. That really puts the user between a rock and a hard place.


In a proper production environment, you just compact each slave one at a time because you have a replica set rather than a single instance.

Of course, if you aren't replicating your business's production database, you have a whole world of problems.
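
A rolling compaction sketched in the mongo shell ("events" is a placeholder collection name):

    // 1. Connect directly to each secondary in turn and compact it:
    db.runCommand({ compact: "events" })
    // 2. Once every secondary is done, step the primary down so a
    //    compacted node takes over, then compact the former primary:
    rs.stepDown()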


> Of course, if you aren't replicating your business's production database, you have a whole world of problems.

Not really. With an RDBMS you can take offline backups (assuming you aren't building for high availability). I don't know if that exists for MongoDB.

Even pre-auto-vacuuming Postgres (i.e., before Postgres 8) would let you vacuum the database while online. You took a performance hit, but there was no need to switch to a backup server or anything.


Good point, but that would make me a little nervous. What happens if the compaction takes a while and the replica gets far behind? And wouldn't the compaction time just keep getting longer and longer as data grows?

Can you repartition the data online to keep the cleanup work bounded?


> In a proper production environment, you just compact each slave one at a time because you have a replica set rather than a single instance.

That's the solution?? That sounds like a workaround in a production environment.


Yes and no. Deleting is a hard problem that many databases don't handle well. Explicitly deciding to punt the problem like this is a much better approach than allowing performance to degrade unpredictably.


In all fairness, the compaction is a major pain in Mongo. I get a little worked up about this because I can't think of another database that handles compaction this poorly, but feel free to correct me if I'm wrong.


Have you tried turning on power of 2 allocation? In general, it makes compaction much less important. Though online compaction is definitely needed.
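
If anyone wants to try it, enabling it for an existing collection is roughly this (collMod has supported this since 2.2; "events" is a placeholder):

    // Switch the collection's record allocation to powers of two so
    // freed slots are more likely to be reusable by later documents.
    db.runCommand({ collMod: "events", usePowerOf2Sizes: true })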


Yes, we switched to it a month ago, which improved things, but like you said, we're still having to compact frequently and deal with the hassle of switching the primary. Cassandra has turned out to be much more performant and easier to maintain for our use case.


> Seriously, another case of using Mongo incorrectly?

If everyone uses Mongo incorrectly, the problem is not the users; it is Mongo. It is like the person crying out that everyone else in the world is crazy.


I seem to have been able to use it correctly. In fact, I ran a cluster for years in production without any issues. I know of several other groups that have used it successfully as well.

As far as I can tell, a lot of people assumed it worked like a SQL database. It doesn't, which disappointed them. I'll even admit that some of the original defaults like the write concerns didn't really make sense as defaults. But that was all in the introductory documentation. Major subsystems like databases deserve at least a skim of the documentation if not a full read; if not up front then at least before putting them into production.


> Seriously, another case of using Mongo incorrectly? I want to believe all the Mongo hate, but I can't because I always find out that the actual problem was one or more of:

> * poor schema

You're right. If people read the awesome MongoDB docs before using it, they'd figure out that MongoDB's ideal, performance-friendly schema has limitations that don't fit a lot of projects. Of course, this may have changed, since MongoDB evolves pretty quickly.


> "Finally we sleep quietly, and don’t fear that mongodb will drive out redis to swap once again."

MongoDB and Redis on the same box? Two data stores that need the working set / all of the data to reside in RAM for performance? That is a recipe for failure.

Everyone seems to learn about physical separation the hard way.


What about locking? I've heard that Mongo only has locking at database granularity.


MongoDB, therefore, is not a general purpose database. I recommend http://www.amisalabs.com


For what it is worth, I would think people actually try different things in the existing setup before they decide on a switch like this. It is not easy to pull off at all. My guess would be that if you have way more Postgres knowledge in-house, then it is more sensible to run Postgres.

This also drives the amount of administrative overhead needed.


The current stable node driver silently throws away exceptions. Seriously, MongoDB Inc. acknowledges it. Is this also a case of not using Mongo correctly?


Mongo's disk format is extremely wasteful; the database files are gigantic. That is a real problem, and there is no way to compact them to anywhere near the size something like Postgres would need for the same data.

Mongo is very bad at managing the memory it uses. In fact, it doesn't actually manage memory at all, since it just mmaps its database files.

It also touches disk much more often than would be reasonable, especially for how much memory it uses.

It's a terrible database and it is perfectly legitimate to be annoyed at it being this terrible.


Although it is true that MongoDB uses a lot of disk compared to your average RDBMS, there are reasons for that.

1) MongoDB (and various other NoSQL solutions) are schemaless and thus have to store field names along with the values for each document. This alone usually results in roughly twice as much actual disk space being used compared to an RDBMS (see the sketch after this list).

2) MongoDB preallocates fairly large chunks of disk for its mmap-based storage (2 GB per database file by default). This means there will be up to 2 GB * N of "wasted" (more accurately, unused) space, where N is the number of logical databases. This can be addressed somewhat through the --smallfiles option.

3) The biggest issue, which I actually consider a design flaw, is the ineffective reuse of disk space previously occupied by deleted or, more commonly, moved documents. MongoDB reserves a bit of padding for each document, but since a lot of documents grow over time, they get moved around on disk, leaving holes in the data files. These holes are not adequately reused, and a compaction is required to make that space available again. Compaction is NOT a background process at the moment and blocks writes. The "usePowerOf2Sizes" option helps with this issue, at the expense of always allocating a power of 2 bytes per document.
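
To make point 1 concrete, here is a sketch of the per-document key overhead (field names are made up):

    // Every document carries its own field names on disk, so long
    // names are paid for once per document...
    { "customerFirstName": "Ada", "customerLastName": "Lovelace" }
    // ...which is why space-conscious schemas abbreviate them:
    { "cfn": "Ada", "cln": "Lovelace" }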

The above are factual reasons why MongoDB uses a lot of disk space. It's certainly a relatively young database and some issues do need to be addressed, but this whole polarizing "it's terrible, booo!" nonsense has to stop. Inform yourself, choose the tech appropriate for your project, and do a post mortem afterwards.

Small note on the mmap thing: a lot of people consider the mmap-based storage engine a big issue (I tend to agree). Tokutek offers what seems to be a better storage engine, though it lags behind a bit on releases. I'm not affiliated with them, but if you're interested you can check out http://www.tokutek.com/products/tokumx-for-mongodb/


"Small note on the mmap thing; a lot of people consider the mmap based storage engine a big issue (I tend to agree)."

This. Aside from general reliability issues I've had in the past, which are definitely fixable and might be fixed by now, this design decision is the thing that will cripple, and continue to cripple, the db. The idea that somehow the kernel is going to be better at managing a db's memory is ridiculous. Kernel paging was designed for a completely different use case; it's not optimized to manage the memory of a database. The idea that the line of reasoning "oh hey guys, those Linux kernel hackers are smart and they can do it way better than us, so let's just use their memory management system" was OK in the minds of the MongoDB creators really puts me off. It doesn't matter how many geniuses work on kernel memory management when they are solving a completely different problem than the one you need to solve; not all memory management schemes are equal. When providing a db you need full control so you can properly tune memory management to fit your needs. There is no magical fix for this.


Why "continue to cripple"? It's actually one of the things that can be changed quite easily at some point in time and doesn't introduce insurmountable backwards compatibility issues. And as I mentioned (and you cut out of your quote) there is at least one alternative available already.


> "it's terrible booo!" nonsense has to stop.

Nope. Pedal to the metal with that, I say. They marketed the shit out of their database. Up until two years ago it shipped with unacknowledged writes as the default. There is no excuse for that.
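
For context, under that old default the acknowledgment was a separate, opt-in round trip; roughly ("orders" is a made-up collection):

    // The insert itself returned immediately, whether or not it stuck:
    db.orders.insert({ sku: "abc" });
    // You had to explicitly ask the server whether the write succeeded:
    db.runCommand({ getLastError: 1, w: 1 });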

Don't want negative publicity? Stop making unrealistic marketing statements. That is the problem with MongoDB: vis-a-vis their marketing, the underlying technology sucks. It is rather relative, you see.

All those things you highlighted, plus compaction not being a background process, just mmapping a 2GB file for each DB, a global write lock. And the unacknowledged writes problem -- sorry, to me that screams "we don't know what we are doing, please don't let us near your data".


1. That's not necessarily the only possible implementation. It would be trivial to assign a number to each key and keep this map in the header of the db file.

2. That's not really the issue, I don't care about the size of small dbs. Large dbs have gigantic sizes.

3. That is absolutely abysmal, yes.

It doesn't just use 2x as much space as other dbs; in practice that can be up to 20-30x as much in bad cases. It's commonly at least 5x.

It really is absolutely terrible when compared to pretty much anything. It's slower than safer dbs that can do the same and more things. There is no selling point.

Just don't use it.


1) Not trivial at all. As documented in a rather broad range of papers and whatnot. You're almost certainly oversimplifying the problem.

2) For gigantic DBs the preallocation overhead is almost non-existent.

3) Fair enough.

Your last point, again, is not fact. It's not slower but on average measurably faster than most commonly used RDBMSes with equal "tuning" effort when used for the same task. I can't help but feel you're not entirely objective here ;)


1) I don't think I am. The problem of (structural) subtyping is well known and researched and applying it to a database would not break new ground either.

2) The preallocation overhead is not what is the problem. The files are simply extremely large for the amount of data in them.

There has been plenty of measurement, and I have done some of my own. With unacknowledged writes it's quite fast, but no other database client I know of does that, because it's stupid. With the highest durability and safety mode it's slower than at least Postgres, and Mongo still doesn't have transactions or generic indexes.
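
For reference, the durable end of the spectrum looks roughly like this in a 2.6-era shell ("orders" is a made-up collection):

    // Wait for the journal flush and for a majority of replica set
    // members to acknowledge before the write is considered done.
    db.orders.insert({ sku: "abc" }, { writeConcern: { w: "majority", j: true } });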


PostgreSQL and MySQL both support compression of data, so even if you do not solve the duplicated-key problem you can still reduce disk usage.

The only benchmarks where I have seen MongoDB winning have been those where MongoDB was configured with different consistency requirements than MySQL and PostgreSQL. My suspicion is that MongoDB is slowest, but I have never seen any high quality benchmarks.


Just want to harp on your first point: MongoDB's storage problem is uniquely Mongo's and not indicative of NoSQL design. Compactions can be a pain, and the lack of compression plus some design features (such as no symbolized keys) can lead to a very bloated database.

We ran a MongoDB cluster with 5 nodes on 1TB SSDs that were 80% full and growing (and we had no replica sets, a nightmare). We switched to Cassandra, and even with the replication factor set to 3 we managed less than 30% disk usage.

Now when I think about MongoDB administration, I'm also thinking about cost. Pound-for-pound (same number of replicas and nodes, and not even counting wanting to keep your nodes under 50%), our MongoDB cluster would cost 5x more while delivering worse performance.

It really isn't an issue of MongoDB using a large amount of space; it's an issue of MongoDB using an order of magnitude more space, leading to unnecessarily higher production costs.


And let's not forget the fact that it has a per-database lock, which is a really strange choice for a document database.


The guys at MongoDB are working on moving this to a more fine-grained lock in the future. In practice, it's not usually a problem but it'll be less so going forward.


It is a problem in practice if you have any amount of load. You'll see latency quickly go up with lock times, which go up with concurrency.
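
If you want to see whether the lock is actually biting, a rough check against a 2.x serverStatus:

    // currentQueue shows operations waiting on the (db-granular) lock;
    // non-trivial numbers here line up with the latency creep above.
    var q = db.serverStatus().globalLock.currentQueue;
    print("queued readers: " + q.readers + ", queued writers: " + q.writers);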

Why bother, when you can use another db without this problem, right now?



