Seriously, another case of using Mongo incorrectly? I want to believe all the Mongo hate, but I can't because I always find out that the actual problem was one or more of:
* didn't read the manual
* poor schema
* didn't maintain the database (compactions, etc.)
In this case, they hit several:
" Its volume on disk is growing 3-4 times faster than the real volume of data it store;"
They should be doing compactions and are not. Using PostgreSQL does not avoid administration; it simply changes the administration to be done.
"it eats up all the memory without the possibility to limit this"
That's the idea -- that memory isn't actually "used", though; Mongo just memory-maps its data files. Those pages will be evicted for something else that needs the space unless you are actively touching all the data, in which case you really are using all your memory. Which is why you should put it on its own server...
"it begins to slow down the application because of frequent disk access"
"Finally we sleep quietly, and don’t fear that mongodb will drive out redis to swap once again."
You should be running Mongo on a server by itself. At the very least, if you're having disk contention issues, don't run it on the same server as your other database.
I'm not sure you always need to read the manual for everything, but for your production database, it's probably worth it.
> Seriously, another case of using Mongo incorrectly?
If a large proportion of MongoDB users are using it incorrectly, then I'd argue that it is a MongoDB problem, if only a documentation and messaging one. Clarity on what is and is not an appropriate use should be prominent.
Or, to be even more specific--if there's a Right Way to use a program, that Right Way should be encoded as defaults you have to override (if you know what you're doing), and automated actions you have to disable (if you know what you're doing.)
The primary "unreasonable" default seems to be that in the beginning, writes were not safe by default. Although they were explicit about it (if you read the docs), it probably was a bad decision. This has been changed, thankfully.
As far as maintenance, it's unreasonable to expect that you could have a zero maintenance configuration. What other software does that? Your operating system doesn't. Your browser doesn't either. Nothing is immune to entropy.
It has nothing to do with the defaults. It is all about people forgetting that MongoDB is a document database and not a relational one. I can write apps that will be 10x faster with MongoDB than with PostgreSQL, and apps that will be 10x faster with PostgreSQL than with MongoDB. It's all about matching your domain model to your database.
> It is all about people forgetting that MongoDB is a document database and not a relational one.
It's been a while since I looked into Mongo, but that was a Mongo marketing problem. They used to (still do?) advertise themselves as an RDBMS replacement, literally.
What did you expect them to do? Of course they are going to position themselves as an RDBMS solution. They want people to switch in order to make money. It doesn't mean that as a developer you're going to be stupid or incompetent enough to ignore every piece of technical documentation and just focus on the marketing lines.
> * didn't read the manual
> * poor schema
> * didn't maintain the database (compactions, etc.)
The real world dictates that this happens more often than not. You know why I like Postgres? When I don't read the manual, create a crappy schema, and forget to maintain the database, it STILL seems to work okay.
To be fair, Postgres has automatic vacuuming now, but it is a relatively new feature. Both projects seem to agree that it is not a high-priority item, though there is certainly something to be said about using a mature product, which Mongo is most certainly not.
Your comment has made me quite curious to know what people using mature databases of the time were saying about Postgres 19 years ago, when it was roughly the same age Mongo is today.
Yeah, autovacuum was added in 7.4 (2003) and not really to be trusted to do its job without monitoring until 8.3 (2008). But the priorities of the PostgreSQL project have changed in the last five years or so. Today usability is important.
There's a huge difference. PostgreSQL was formally correct first, then it became fast, then it became easy to use. I always had the feeling I could trust it with my data. PostgreSQL, if wrong, would be wrong towards correctness and towards data safety, at the expense of speed.
I definitely don't get the same vibe from MongoDB.
Back when Postgres didn't have automatic vacuuming, the need for manual vacuuming was one of the commonly-suggested reasons why MySQL was a better option - after all, MySQL just worked out of the box.
Of course, if you aren't replicating your business's production database, you have a whole world of problems.
Not really. In RDBMS you can have offline backups (assuming you aren't building for high availability). I don't know if that exists for MongoDB.
Even pre-auto-vacuuming Postgres (ie, before Postgres v8) would allow you to vacuum the database while online. You had a performance hit, but there was no need to switch to a backup server or anything.
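For reference, the difference looks roughly like this (table name made up):
    -- Plain VACUUM runs online: it marks dead row versions as reusable
    -- without blocking normal reads and writes (just some extra I/O).
    VACUUM ANALYZE rides;

    -- VACUUM FULL actually shrinks the files and gives space back to the
    -- OS, but it takes an exclusive lock, which is much closer to the
    -- pain of a Mongo-style compaction.
    VACUUM FULL rides;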
Good point, but that would make me a little nervous. What happens if the compaction takes a while and the replica gets far behind? And wouldn't the compaction time just keep getting longer and longer as data grows?
Can you repartition the data online to keep the cleanup work bounded?
Yes and no. Deleting is a hard problem that many databases don't handle well. Explicitly deciding to punt the problem like this is a much better approach than allowing performance to degrade unpredictably.
In all fairness, the compaction is a major pain in Mongo. I get a little worked up about this because I can't think of another database that handles compaction this poorly, but feel free to correct me if I'm wrong.
Yes, we switched to it a month ago, which helped, but like you said we are still having to compact frequently and deal with the hassle of switching the primary. Cassandra has turned out to be much more performant and easier to maintain for our use case.
I seem to have been able to use it correctly. In fact, I ran a cluster for years in production without any issues. I know of several other groups that have used it successfully as well.
As far as I can tell, a lot of people assumed it worked like a SQL database. It doesn't, which disappointed them. I'll even admit that some of the original defaults like the write concerns didn't really make sense as defaults. But that was all in the introductory documentation. Major subsystems like databases deserve at least a skim of the documentation if not a full read; if not up front then at least before putting them into production.
> Seriously, another case of using Mongo incorrectly? I want to believe all the Mongo hate, but I can't because I always find out that the actual problem was one or more of:
> * poor schema
You're right. If people read the awesome MongoDB docs before using it, they'd figure out that MongoDB's ideal, performance-friendly schema has limitations that don't fit a lot of projects. Of course this may have changed, since MongoDB evolves pretty quickly.
> "Finally we sleep quietly, and don’t fear that mongodb will drive out redis to swap once again."
MongoDB and Redis on the same box? Two data stores that need the working set / all of the data to reside in RAM for performance? That is a recipe for failure.
Everyone seems to learn about physical separation the hard way.
For what it is worth, I would think people actually try different things in the existing setup before they decide on doing a switch like this. It is not easy to pull off at all. My guess would be that if you have way more Postgres knowledge in the house, then it is more sensible to run Postgres.
This also drives the amount of administrative overhead needed.
The current stable node driver silently throws away exceptions. Seriously, MongoDB Inc. acknowledges it. Is this also a case of not using Mongo correctly?
Mongo's disk format is extremely wasteful, the database files are gigantic. That is a real problem and there is no way to compact this to anywhere near the size something like Postgres would have for the same data.
Mongo is very bad at managing used memory. In fact it doesn't actually manage memory since it just mmaps its database file.
It also touches disk much more often than would be reasonable, especially for how much memory it uses.
It's a terrible database and it is perfectly legitimate to be annoyed at it being this terrible.
Although it is true that MongoDB uses a lot of disk space compared to your average RDBMS, there are reasons for that.
1) MongoDB (and various other NoSQL solutions) are schemaless and thus have to store field names along with the values in each document. This alone usually results in roughly twice as much actual disk space being used compared to an RDBMS.
2) MongoDB preallocates fairly large chunks of disk for its mmap based storage (2GB per database file by default). This means there will be up to 2GB * N, where N is the number of logical databases, in "wasted" (more accurately, unused) space. This can be addressed somewhat through the --smallfiles option.
3) The biggest issue, which I actually consider a design flaw, is the ineffective reuse of disk space previously occupied by deleted or, more commonly, moved documents. MongoDB reserves a bit of padding for each document, but since a lot of documents grow over time, these documents get moved around on disk, leaving holes in the data files. These holes are not adequately re-used, and a compaction is required to make that space available again. Compaction is NOT a background process at the moment and blocks writes. The "usePowerOf2Sizes" option helps with this issue at the expense of always using a power-of-2 size in bytes per document.
The above are factual reasons why MongoDB uses a lot of disk space. It's certainly a relatively young database and some issues do need to be addressed, but this whole polarizing "it's terrible, booo!" nonsense has to stop. Inform yourself, choose the tech appropriate for your project and post-mortem afterwards.
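For completeness, the knobs mentioned in 2) and 3) look roughly like this in the mongo shell (collection name made up; check the docs for your version):
    // Rewrites the collection's extents to reclaim the holes left by moved
    // and deleted documents. Blocks operations on that database, so it is
    // usually run against secondaries which are then rotated in.
    db.runCommand({ compact: "events" })

    // Allocate record space in power-of-2 sizes so freed slots are easier
    // to reuse later (trades a bit of disk for less fragmentation).
    db.runCommand({ collMod: "events", usePowerOf2Sizes: true })

    // The preallocation in 2) can be reduced by starting mongod with --smallfiles.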
Small note on the mmap thing; a lot of people consider the mmap based storage engine a big issue (I tend to agree). Tokutek offers what seems to be a better storage engine but does lag behind a bit on releases. I'm not affiliated with them but if you're interested you can check out http://www.tokutek.com/products/tokumx-for-mongodb/
"Small note on the mmap thing; a lot of people consider the mmap based storage engine a big issue (I tend to agree)."
This. Aside from general reliability issues I've had in the past, which are definitely fixable and might be fixed by now, this design decision is the thing that will cripple, and continue to cripple, the db. The idea that somehow the kernel is going to be better at managing a db's memory is ridiculous. Kernel paging was designed for a completely different use case; it's not optimized to manage the memory of a database. The idea that this line of reasoning ("oh hey guys, those Linux kernel hackers are smart and they can do it way better than us, so let's just use their memory management system") was OK in the minds of the MongoDB creators really puts me off. It doesn't matter how many geniuses work on kernel memory management when they are solving a completely different problem than the one you need to solve; all memory management schemes are not equal. When providing a db you need full control so you can properly tune memory management to fit your needs; there is no magical fix for this.
Why "continue to cripple"? It's actually one of the things that can be changed quite easily at some point in time and doesn't introduce insurmountable backwards compatibility issues. And as I mentioned (and you cut out of your quote) there is at least one alternative available already.
Nope. Pedal to the metal with that, I say. They marketed the shit out of their database. Up until 2 years ago it shipped with unacknowledged writes as the default. There is no excuse for that.
Don't want negative publicity -- stop making unrealistic marketing statements. That is the problem with MongoDB -- vis-a-vis their marketing, their underlying technology sucks. It is rather relative you see.
All the things you highlighted, plus compaction not being a background process, just mmapping a 2GB file for each DB, a global write lock, and the unacknowledged writes problem. Sorry, to me that screams "we don't know what we are doing, please don't let us near your data".
1. That's not necessarily the only possible implementation. It would be trivial to assign a number to each key and keep this map in the header of the db file.
2. That's not really the issue, I don't care about the size of small dbs. Large dbs have gigantic sizes.
3. That is absolutely abysmal, yes.
It doesn't just use 2x as much space as other dbs; in practice it can be up to 20-30x as much in bad cases. It's commonly at least 5x.
It really is absolutely terrible when compared to pretty much anything. It's slower than safer dbs that can do the same and more things. There is no selling point.
1) Not trivial at all. As documented in a rather broad range of papers and whatnot. You're almost certainly oversimplifying the problem.
2) For gigantic DBs the preallocation overhead is almost non-existent.
3) Fair enough.
Your last point, again, is not fact. It's not slower but on average measurably faster than most commonly used RDBMSs with equal "tuning" effort when used for the same task. I can't help but feel you're not entirely objective here ;)
1) I don't think I am. The problem of (structural) subtyping is well known and researched and applying it to a database would not break new ground either.
2) The preallocation overhead is not what is the problem. The files are simply extremely large for the amount of data in them.
There has been plenty of measurement and I have done some of my own. With unacknowledged writes it's quite fast, but no other database client I know of does that because it's stupid. With the highest durability and safety mode it's slower than at least Postgres and Mongo still doesn't have transactions or generic indexes.
PostgreSQL and MySQL both support compression of data, so even if you do not solve the duplicated key problem you still can reduce disk usage.
The only benchmarks where I have seen MongoDB winning have been those where MongoDB was configured with different consistency requirements than MySQL and PostgreSQL. My suspicion is that MongoDB is the slowest, but I have never seen any high quality benchmarks.
Just want to harp on your first point - MongoDB's storage problem is uniquely Mongo's and not indicative of NoSQL design. Compactions can be a pain, and the lack of compression plus some design features (such as no symbolized keys) can lead to a very bloated database.
We ran a MongoDB cluster with 5 nodes on 1TB SSDs that were 80% full and growing (and we had no replica sets, a nightmare). We switched to Cassandra, set the replication factor to 3, and still came in at less than 30% disk usage.
Now when I think about MongoDB administration, I'm also thinking cost. Pound-for-pound (same number of replicas and nodes, and not even counting wanting to keep your nodes under 50%) our MongoDB cluster would cost 5x more while delivering worse performance.
It really isn't an issue of MongoDB using a large amount of space; it's an issue of MongoDB using an order of magnitude more space, leading to unnecessarily higher production costs.
The guys at MongoDB are working on moving this to a more fine-grained lock in the future. In practice, it's not usually a problem but it'll be less so going forward.
Maybe I am just incredibly lucky, but mongodb has worked fine for ridewithgps.com - we are sitting at 670gb of data in mongo (actual DB size, indexes included) and haven't had a problem. Replica sets have been fantastic, I wish there was another DB out there that did auto-failover as cleanly/easily as mongo does. We've had a few server crashes of our primary, and aside from 1-2 seconds or so of errors as requests come in before the secondary is promoted, it's transparent.
With that being said, we are using it to store our JSON geo track data, most everything else is in a mysql database. As a result we haven't run into limitations around the storage/query model that some other people might be experiencing.
Additionally, we have some serious DB servers so haven't felt the pain of performance when exceeding working memory. 192gb of ram with 8 RAID10 512gb SSDs probably masks performance issues that other people are feeling.
Final note: I'll probably be walking away from mongo, due to the natural evolution of our stack. We'll store high fidelity track data as gzipped flat files of JSON, and a reduced track inside of postgis.
tl;dr - using mongo as a very simple key/value store for data that isn't updated frequently, which could easily be replaced by flat file storage, is painless. YMMV with other use cases.
It's actually very much over built. I had the budget, so I built for growth. This setup should easily last us through this year, assuming 4x growth over last year. I am actually putting together another similarly spec'd machine with 384gb of ram (why not, it's cheap) because the current secondary is on 15k disks with only 24gb of ram. Still more than enough to handle the load through this last year, but probably not this coming year, at least without a bunch more ram.
In regards to actual iops, not sure what this thing can peak at off the top of my head, but we'll easily be doing 100 queries a second this year, with a considerable portion of those queries pulling out ~1mb documents.
Playing it conservative, so I am moving towards gzipping those large documents (never need to access anything but the full data, > 90% of accesses are directly served to clients that can handle inflating the data). For now they will stay in mongo, but I am building out an evaluation of using a flat file structure and just letting nginx pass them out.
One thing to note that I left out above: I also use the same servers for mysql. Our mysql working set is something like 30gb now, so I have a decent chunk of that ram apportioned to mysql.
Additionally our mysql db sees many more queries than mongo, so the overbuilt hardware is a bit less overbuilt when taking that into consideration :)
670 gigabytes is a puny database size. You should be able to push an enormous amount of work through a disk system like the one you have. I would seriously consider a Postgres setup for a data set of that size. Additionally, I would probably just store the JSON data directly inside PostgreSQL.
See comment below. Definitely a small database size. You hit the nail on the head. It's going to grow fast this next year, and I'd like to put off sharding as long as possible, hence gzipped storage of the data.
postgis isn't a good fit for the data we store in full fidelity, since it's not just geo data but also sensor data (heartrate, cadence, power in watts, temperature etc). However I'll be storing a point-reduced version of the full track in postgis, so I can move to using actual intersection queries for matching tracks, instead of the current brute force approach (check every point in every track sharing a bounding box) that works now. All bets are out the window, though, with 2-4x the traffic and data we currently have, using that brute force approach.
I already run another beefy postgis setup (192gb ram, though spinning disks not SSDs) for serving OSM maps, and eventually OSM routing hence the ram.
> postgis isn't a good fit for the data we store in full fidelity, since it's not just geo data but also sensor data (heartrate, cadence, power in watts, temperature etc).
Because it's faster to serve the full fidelity track, with all sensor data for every lat/lng pair, straight from disk as a gzipped file. I can store a point-reduced version in postgis for actual geo operations, since that only needs lat/lng pairs, and not at 1hz sampling.
Pulling 21k points (1.8mb of JSON) for a single recording out of disk into memory, turning it into JSON, sending it over the wire to the client, is slow (500ms just to query, pull off disk and turn into JSON). Serving the same data pre-compressed, is 2ms. This isn't even considering what it takes nginx to compress it on the fly for every request.
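Concretely, the serving side can be as simple as nginx's gzip_static module handing out a .gz file that sits next to the original (a sketch, not our exact config; paths made up):
    location /tracks/ {
        root /srv;
        # If /srv/tracks/1234.json.gz exists and the client accepts gzip,
        # nginx sends the pre-compressed file as-is instead of compressing
        # anything on the fly.
        gzip_static on;
    }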
So it's because you don't want to recreate the track from point data on each request. I can see why that's advantageous.
OK, new dumb question. It sounds like you're serving gzip'd JSON straight off the disk with nginx (serving gzip'd files direct from disk is one of my favourite nginx features). Where does Mongo come into the picture?
I think I peaked at 5% lock utilization this year, so I haven't seen any real issues.
Our actual track data isn't updated frequently. Mostly it serves as an archive for a user, and is only seen by 1-2 other people. Most people use our service to store all their activities, which for the most part are really boring. They are interested in aggregate metrics like "I've ridden 200 miles this month".
A smaller portion of our data is from planning a route using google maps, which has much more modest storage requirements, since it's optimized data (one point every mile if it's a straight line) instead of 1hz logging from a GPS unit. This stuff is edited, but I'd say only 10% of planned routes are ever modified, so actual updates on the track data are small.
> using mongo as a very simple key/value store for data that isn't updated frequently, which could easily be replaced by flat file storage, is painless. YMMV with other use cases.
This is our use case as well and MongoDB has been fine. We had some initial pain as we learned the product but it's great for this use case. Currently sitting around 1TB of data.
We build our own machines and host them out of a carrier neutral facility in Portland, OR. $200/mo unmetered unlimited 100mbit connection and $1100 for a full rack with redundant 20 amp feeds. It would cost us something like $5k/mo to run our 9 servers on amazon.
The title is a bit misleading. This is basically an announcement of a fork of Errbit that has Postgres support. Additionally, the fork was announced as an issue on Errbit, with no prior discussion, rather than as an official pull request.
I would not consider this good etiquette. If you fork a project (especially without discussing the intention first), announcing it by opening an issue on the original project isn't a very nice thing to do.
An official pull request would be nicer or, even better, don't bother the original project, but just announce your fork over other channels.
Even better would be to at least discuss the issue with the original project - maybe they agree and you can work together.
>even better, don't bother the original project, but just announce your fork over other channels
This is a rather bizarre interpretation of nice behavior: Make a very cool modification to a project, but don't even bother to tell the original maintainers/authors?
Github Issues is a perfectly reasonable place for this. Maybe the mailing list would be better, but, shrug. Issues != Bugs, by the way. There's a reason it's called Issues. And it's basically the only way to have a discussion on github about anything whether it's an issue or not.
Also, some maintainers get mad if you send a pull request without doing an issue first, so there's no right way.
I disagree, I think that opening an issue on github is a good way to start a discussion about a feature. Many projects accept feature requests this way and if anyone did the same for one of my projects, this would be the way I would prefer them to handle it.
... after thoroughly testing it in production for 11 months and verifying that they had a point.
This is perfectly valid. A pull req might have been better, but this starts a discussion as well, and might prompt the project owner to say "Sounds great! Submit a pull request," or alternatively, "Sounds cool! We'll provide a link to your fork on our page." Both good.
If you go back and look at it now, you'll see that this is a non-issue:
> @realmyst Will you put up a Pull Request?
> realmyst commented 19 minutes ago
> @21croissants yes, I will.
It sounds like MongoDB has no future indeed:
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
Let's just say that PostgreSQL answers the criticisms of relational databases that led to NoSQL. The complaints all boiled down to saying that the RDBMS forced you to do things one way and that it was cumbersome. PostgreSQL evolved and fixed the most annoying gaps, adding things like JSON support and schemaless key-value store support. That's the way open source is supposed to work. Now folks are learning that throwing out the baby with the bathwater leads to more complexity than just learning how to use a relational database. The pendulum has swung back.
Mongo doesn't really do that in a way you can reliably use in practice either, though. Its sharding offers a subset of operations over an inconsistent view of your dbs.
You can do that with Postgres trivially, and even automatically with postgres_fdw and writeable views.
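A minimal sketch of the idea (9.3 syntax; server, table and column names are made up):
    CREATE EXTENSION postgres_fdw;

    CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'shard1.internal', dbname 'app');
    CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
        OPTIONS (user 'app', password 'secret');

    -- Foreign tables are readable and, as of 9.3, writeable.
    CREATE FOREIGN TABLE events_shard1 (id bigint, payload json)
        SERVER shard1 OPTIONS (table_name 'events');

    -- events_local is an ordinary table holding this node's share; a view
    -- (made writeable with triggers or rules) presents the shards as one
    -- table to the application.
    CREATE VIEW events AS
        SELECT * FROM events_local
        UNION ALL
        SELECT * FROM events_shard1;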
This story has played out before. Last time, it was Object Oriented databases. What happens each time is that the traditional RDBMS's pick up a few of the features, and then we keep using them until the next contender comes along.
This is not true at all. The actual realization of the past few years is that strictly enforced relationality (is that a word?) and transactions are constructs that are not always needed, and in some domains rarely needed. Eventual consistency, schemaless data modelling and so on picked up steam, and for good reasons. Every technology that survives the "Oh, new toy!" stage has a place or it wouldn't still exist. It is up to developers to choose the appropriate technology for themselves and their projects. That isn't to say that a lot of persistence problems can't be solved equally well by an RDBMS, a k/v store or a document store; in that case just base your decision on other drivers (comfort level, cost, and so on).
...may be possible, but almost always requires domain-specific concurrency-level understanding in your datastore, and is almost always harder to work with than strong consistency.
Saying that transactions are 'rarely' needed boggles my mind. Working inside transactions (where feasible, which is in the large majority of situations) vastly simplifies data storage.
Agreed. The first rule of computing is that anything can do everything -- it's just a matter of how challenging it is to implement.
The notion of transactions was invented not for performance, but because they are easier to reason about. So much easier that it often means the difference between a project that never finishes and a project that finishes so early that you have time to spend on optimization and caching.
You had me until the "like Cassandra" bit ;) If there's one thing where Cassandra loses the battle with other NoSQL tech, then cluster management is probably it. Also, it's a bit unfair to compare RDBMSs with Cassandra. Clustering is inherently going to be more complicated for RDBMSs. Incidentally, it's actually where I feel MongoDB deserves a bit of credit.
Right. Just like MySQL/Oracle was put out to pasture.
And if you think MongoDB is only popular because it is a JSON store, then it shows just how little you know about the database landscape and about how developers actually use databases.
For those of us that use all 3 of those, your statement is in large part true. I'd also hazard that Mongodb isn't actually that popular outside the HN buzz bubble.
All the philosophical issues and /(No)?SQL/ discussions aside, as a heavy user of Postgres and a user of Errbit, this is very good news to me. I don't have much experience with running Mongo, but I have a ton of experience with running Postgres.
Even better: The application I'm using Errbit the most for is already running in front of a nicely replicated and immensely powerful postgres install.
Being able to put the Errbit data there is amazing.
If you want to use MongoDB in a project and you don't intend to rely heavily on the aggregation framework, then consider TokuMX (http://www.tokutek.com/products/tokumx-for-mongodb/) as it alleviates many of the shortcomings of MongoDB (data compression, document-level locking for writes, ...) and adds transactions.
It's a drop-in replacement, so it will work with current drivers. (If you have a running Mongo cluster, however, expect quite a bit of work if you want to migrate.)
(I have no affiliation with TokuTek whatsoever except that I use their product)
I'm no MongoDB expert, but I recently started to look into this db. Can anyone tell me (from experience, not from promo materials) which use cases MongoDB is a good fit for and which ones it's not? It's clear that it can't fit everyone. That's why it would be good to know in advance what it's most likely to fit and what it's most likely not to.
It has the benefits and ease of use of a json document store, it allows you to do SQL style where clauses, it takes about a minute to install and start using, there are a wide range of drivers available for many languages, and it has a simple javascript map/reduce.
on the flip side, it implements database level locking, uses more disk/RAM than it probably should, and can start to give you headaches if you try to do a lot of writes at once.
edit: to give you a real world example, we use mariadb for storing everything persistently. however, a lot of data like "number of teachers in school A" is aggregated and too difficult to run in real time when we render paged results. to get around that, we use mongo as a document store and use its SQL like querying to generate the paged search results. this lets us sort/filter on the data without having to do everything in SQL.
> to give you a real world example, we use mariadb for storing everything persistently. however, a lot of data like "number of teachers in school A" is aggregated and too difficult to run in real time when we render paged results. to get around that, we use mongo as a document store and use its SQL like querying to generate the paged search results. this lets us sort/filter on the data without having to do everything in SQL.
This use case should be possible to solve with the JSON type in PostgreSQL. The indexing in PostgreSQL is just as advanced in 9.3 and will be better than MongoDB in 9.4 if a couple of patches land.
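Roughly, something like this already works in 9.3 (table and keys made up for illustration):
    CREATE TABLE school_stats (doc json NOT NULL);

    -- Expression index on a field inside the JSON document; the planner
    -- can use it for the paged listing below.
    CREATE INDEX ON school_stats ((doc ->> 'school_id'));

    SELECT doc
    FROM school_stats
    WHERE doc ->> 'school_id' = 'A'
    ORDER BY (doc ->> 'teacher_count')::int DESC
    LIMIT 20 OFFSET 40;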
It definitely is possible with PostgreSQL, and it's honestly probably better to do with Postgres, but we're using MariaDB for relational data and it was just quicker/easier to dump everything to mongodb.
> however, a lot of data like "number of teachers in school A" is aggregated and too difficult to run in real time when we render paged results.
Does this mean you're using MongoDB as a kind of query cache? Was there a compelling reason to prefer it to other common caches? Or even building an ETL/DW into your existing database infrastructure?
Not a query cache so much as a document cache. I'm actually implementing a query cache in Redis which is, imo, the greatest tool on earth :P
To be honest the decision to use mongo as a document cache was made before I realized the scope of our data needs. Mongo was the quickest/easiest solution at the time (early 2012). If I could do this over again, and we actually will be in the next 6 months, I'd either set it up in PostgreSQL/json, or throw everything in a wide column store.
This is more like how it is in theory. I was hoping for more real life stories. "We had problem X, tried MongoDB, but failed because of ...", "We had problem Z, MongoDB works better than R, because..."
PostgreSQL has native json support now. What else is missing? Just a protocol implementation?
I'd love to see MongoDB give up and become a PostgreSQL consultancy.
Everybody I talk to in the field has the exact same Mongo story: "We love JSON! We use JSON everywhere! We just wanted a DB with native JSON support. We didn't look at the implementation details. We only looked at their marketing. Now we wake up at 3am to fix it every night and lose data every day. Somebody help us. We love JSON."
It is probably not as simple as "supports json now". Imagine if HN comments were stored as a JSON document:
Client A: Read JSON.
Client B: Read JSON.
Client A: Append new comment to json document.
Client B: Append new comment to json document.
Client A: Save JSON
Client B: Save JSON
A's comment will get deleted. My understanding is that Mongo DB does have a way to append a record within a document, but Postgres does not.
I am in no way advocating for MongoDB (I dislike it). I am just saying that I understand that MongoDB has much more sophisticated updates capability than Postgres.
Mongo also doesn't have a concept of a transaction, so it needs these types of atomic update mechanisms in order to be able to do much of anything sanely. This isn't a problem in SQL databases, which have ACID transactions. I'm not saying that's always the right thing or a good thing, just that your example and conclusion aren't technically correct. Mongo's update capabilities are not more sophisticated but are necessarily different.
That being stated, I'll agree that similar mechanisms for the JSON fields would make sense for Postgres to consider in the future.
> That being stated, I'll agree that similar mechanisms for the JSON fields would make sense for Postgres to consider in the future.
I think they are working on adding modification functions for hstore right now, and the plan is to also add them to JSON once that is done.
You could easily implement your own such functions right now, since PostgreSQL already supports atomic modification of columns with UPDATE, with no lost updates. This statement is safe to use to increment a counter.
UPDATE c SET counter = counter + 1 WHERE id = 42;
And since that is the case this should also be safe.
UPDATE c SET data = my_json_array_append(data, 7) WHERE id = 42;
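For illustration, the my_json_array_append used above doesn't ship with PostgreSQL, but a 9.3-era sketch of it in plain SQL could look something like:
    CREATE FUNCTION my_json_array_append(arr json, elem anyelement)
    RETURNS json LANGUAGE sql AS $$
        -- Unnest the existing array, tack the new element onto the end and
        -- re-aggregate. Because it runs inside a single UPDATE, the whole
        -- read-modify-write is atomic. (A real version would carry an
        -- explicit ordinal, since UNION ALL ordering is not guaranteed.)
        SELECT json_agg(value)
        FROM (
            SELECT value FROM json_array_elements(arr)
            UNION ALL
            SELECT to_json(elem)
        ) AS combined;
    $$;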
In that case you can (should) just "SELECT ... FOR UPDATE" in your transaction. This should prevent the issue. Client B will wait with the read until client A commits.
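Something along these lines (table name made up, $1 being the re-serialized document):
    BEGIN;
    -- Locks the row: a concurrent writer doing the same SELECT ... FOR UPDATE
    -- blocks here until we commit, and then sees the document including our
    -- newly appended comment.
    SELECT doc FROM threads WHERE id = 42 FOR UPDATE;
    -- the application appends the new comment to the JSON it just read
    UPDATE threads SET doc = $1 WHERE id = 42;
    COMMIT;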
... or you can include an ETag (optimistic lock) in your rows/mongo documents and do a conditional update to ensure no writes are interposed with your reads.
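E.g. a conditional update along these lines (version column and values made up):
    -- Only succeeds if nobody has written since we read version 7; zero rows
    -- updated means "someone else got there first: re-read and retry".
    UPDATE threads
    SET doc = $1, version = version + 1
    WHERE id = 42 AND version = 7;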
Postgres CAN do exactly this. In fact, this is the sort of thing it excels at; transactions in Postgres can do this, and so much more.
I think what you're talking about is appending with a single statement, which Postgres can also do. You can deserialize JSON inside a postgres query, alter parts of it, serialize it again, then store it in a single atomic transaction using an UPDATE.
This is exactly what people call an ACID TRANSACTION, which Mongo completely lacks. MySQL also lacked this, so people had to deal with data loss until they got InnoDB.
Next time you choose a database, always check the ACID compliance. It will make your life far easier.
I really would love it if somebody could go back in HN-Time and track the data on Mongo posts & comments. I've always been slightly skeptical about it, but it always seemed to me that there was a long love affair with it overall. Then people started voicing their frustrations and the community was divided and now it looks acrimonious for everyone.
Isn't that true for most of those 'flavour of the month' technologies?
First a shiny new piece of technology shows up that promises to solve all the issues you have with a mature and widely adopted solution. People get excited and at some point the media picks up on it and starts the hype cycle. More and more decision makers hear about the technology (probably aided by marketing) and decide to adopt it. Implementation takes place and the new technology is deployed into the live environment. Some time goes by and the first issues appear; workarounds and tweaks are devised to mitigate those. After even more time the technology's inherent flaws become apparent. At this point either someone else develops a new iteration of that technology with the promise of solving those issues, or the technology is abandoned altogether as it is unable to deliver sufficient value and cannot be fixed.
Is there an algorithm which measures negativity in a given text? Considering the level of grammatical errors and sarcasm involved, one would need a really complicated system, I'd bet. A google search brought me to Sentiment Analysis page on Wikipedia[0], which, after the initial skimming, doesn't seem to link to any implementations.
> What else is missing? Just a protocol implementation?
Yes. Actually that's why I said that I'm nearly sure.
As a side note: We may also need some rumors on being "web scale" (Actually I don't even know much about the events/comments/whatever which lead to that famous video but I still find it funny)
Funny you mention this, we've written one as we transition from MongoDB to PostgreSQL. It was actually much easier than I expected, because MongoDB doesn't really do anything - all the JOIN logic is in code and it mostly consists of "find" and "findOne" calls with minor filtering that is easily translatable to SQL.
The hardest part is re-training all the devs to stop thinking like Mongo devs (ie. "I must make five queries and join the info in code") and let the DB do the heavy lifting it was designed to do.
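A tiny illustration of the shift (hypothetical schema):
    -- Instead of a find() on users followed by a findOne() per user in
    -- application code, one statement does the joining server-side:
    SELECT u.name, count(o.id) AS orders_last_week
    FROM users u
    LEFT JOIN orders o
        ON o.user_id = u.id
        AND o.created_at > now() - interval '7 days'
    GROUP BY u.name;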
Yeah, it would be cool if AR came with a `field` method that could be made non-optional through configuration, and enforce that only those fields are accessible.
Not exactly what you were looking for, but you can connect from PostgreSQL to MongoDB via Foreign Data Wrappers - https://github.com/citusdata/mongo_fdw
I know about Foreign Data Wrappers and use them extensively. To clarify, what I want is to have a quick engine swap for some legacy apps without touching anything but config files.
> That's sensible when you consider that postgresql used to be a SQL layer over postgres.
Were they ever actually separate layers? I thought that PostgreSQL was a rename of Postgres that happened shortly (one-two versions) after they swapped query languages from the Ingres-derived QUEL to SQL.
> Finally we sleep quietly, and don’t fear that mongodb will drive out redis to swap once again.
Well duh, Mongo was designed to live on its own server as it tries to claim all of the free memory available. Putting it on the same server with Redis makes no sense.
The case that caused you sleepless nights does not apply to 99% of projects out there.
In case you missed it, this submission is not about PostgreSQL vs MongoDB. It's about the crazy GIF parade in the comments interleaved with thumbs up emojis. You don't see such stuff often on github :)
This just makes me wonder why they chose Mongo in the first place. It sounds like they didn't really consider their needs when initially choosing databases. Mongo has some benefits that when properly implemented far outweigh the negatives. At the same time, it's still relatively young, and doesn't have the "maturity of process" that makes older SQL engines so easy to manage/implement. Eventually, I'm sure, Mongo will solve these issues and be a great database for those who need to utilize its many virtues.
MongoDB is easy. I'll be the first one to spit-roast MongoDB with war stories, but the biggest benefit I keep coming back to is ease of use for a developer. It's very easy to change your data model and rapidly iterate.
As soon as your project starts to solidify, the main benefit of MongoDB is gone.
It still lives in some of my personal projects (e.g. <100mb of data, because even flat files can't mess that up).
Because 1) a lot of startups seem to choose "startup technology", i.e. whatever famous startups are using, just because it seems fashionable and/or they don't consider if it'll actually solve their specific problem; or 2) they're technically curious and end up using it just for fun, even if it's not a good fit for their problem.
I've seen people using Redis for their MVPs, which is hardly necessary to serve 100 or 1000 or 10000 users. When you have a hammer at hand, everything looks like a nail.
As far as Redis goes, is there really much of anything in the space between bdb style KV stores and Redis? If you have design reasons for wanting your KV store in a separate process, why not use Redis?
I agree with your point, my example was about using Redis for performance reasons when designing/building an MVP, a stage where you are unlikely to have a (real) performance problem.
Does anyone have a recommendation for an authoritative guide to either Postgres or Mongodb? One that does more than show you where the levers are, that is.