SQLite is an embeddable SQL implementation that has been ported to dozens of platforms with no dependencies.
Mongita is a Python library.
I like Python as much as the next guy, but the comparison is pretty far off whack. SQLite is popular because it embeds everywhere easily. This doesn't. I can't use this in my iPhone app, and it's likely way too fat and awkward at best on Android.
So, by their own benchmarks, unless you are doing totally random lookups of documents by identifier--and mostly reads, with very few writes--you should absolutely use SQLite with JSON values, which absolutely destroys this project in performance?...
The project isn't about performance, it's about providing an embedded version of MongoDB - it clearly states that if you grow too large, then you can easily migrate to the full MongoDB. It also clearly states not to use it if you want a relational database.
I think it would be very difficult to completely beat SQLite with a Python library. My goal wasn't to beat it but to have performance that's within an order of magnitude which I think I've achieved.
In my opinion, the MongoDB interface has a lot of advantages over SQL that make sense in a lot of use cases. Certainly, there are times when a traditional relational database is the right choice. But I do think Mongita fills a niche.
Separately, the JSON1 extension, which the top-level comment refers to, is nice technically but has a challenging interface IMHO https://www.sqlite.org/json1.html
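To illustrate the interface being discussed, a filtered lookup through the JSON1 functions looks something like the sketch below (assuming an SQLite build with the JSON functions compiled in; the table and fields are invented for the example):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
docs = [{"name": "ada", "age": 36}, {"name": "grace", "age": 45}]
conn.executemany("INSERT INTO docs (body) VALUES (?)",
                 [(json.dumps(d),) for d in docs])

# Filter on a JSON field -- note the '$.age' path syntax, which is
# what reads as awkward next to a Mongo-style {"age": {"$gt": 40}}.
rows = conn.execute(
    "SELECT json_extract(body, '$.name') FROM docs "
    "WHERE json_extract(body, '$.age') > 40").fetchall()
print(rows)  # [('grace',)]
```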
This is an excellent idea and your library is a proof of concept.
Surely performance can be improved in the future, and the core logic can be rewritten in a compiled language that can be used from other runtimes.
The main criticism seems to be the flawed connection to sqlite. As useful as Mongita might be, and I don't think anyone is denying that this is good work, it doesn't make sense to compare it to sqlite.
Yup. Considered it. I initially settled on no because it might compromise the design goal of 1:1 MongoDB/PyMongo compatibility. Now the idea is being revived as an option alongside the two existing backends. I am concerned there might be an issue with some square-peg high level MongoDB operations simply not fitting into the SQLite round-hole but it's certainly something that's being investigated.
SQLite is a production-grade DBMS: though it has fewer features compared to e.g. PostgreSQL, if those features suffice, you can throw very sizable workloads at it.
The comparison is apt in that it's a library, not a separate server, and it can store data in memory, which is normally only available in Mongo enterprise.
I am not even sure what is going on now, as I thought the only reason people had ever cared about MongoDB in the first place was in an attempt to get performance they thought was impossible using a relational database (notably without realizing all of the tradeoffs... some inexcusable, such as simply coming with defaults that didn't call fsync, leading to a ton of memes about "database administrators running with scissors" and years of having to convince people that MongoDB was only a project you use if you don't care about data).
The irony, of course, was that it turned out that if you used PostgreSQL and simply stored json in it, you could get better performance without giving up on a relational database... and even more damning was that the translation from MongoDB's query syntax to PostgreSQL's query syntax was trivial, leading to people building adapter layers that were drop-in compatible (and yet still faster and safer than MongoDB). So I guess there was something fitting about seeing someone post a project that is trying to be the SQLite of MongoDB, but with benchmarks right off the bat showing it slower than SQLite ;P.
I thereby continue to feel that if you want to use MongoDB, but want a library version of it, it would seem--based on this project's own benchmarks, unless you are doing a read-heavy workload of random documents by index--that what you probably want is a query translator to go from MongoDB's syntax to SQL (working with the SQLite json API), and then store your workload in SQLite... which is (critically) a battle-tested production-grade database engine used as a foundational storage layer for lots of projects.
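As a sketch of what such a query translator might look like, here is a toy version that handles only flat, equality-only filters (all names are invented for illustration; a real shim would need the full MongoDB operator set and would be a much bigger project):

```python
import json
import sqlite3

def mongo_find_to_sql(collection, filter_doc):
    """Translate a flat, equality-only MongoDB-style filter into SQL
    over a (body TEXT) table of JSON documents. Toy sketch only --
    operators like $gt/$in and nested paths are not handled."""
    clauses, params = [], []
    for field, value in filter_doc.items():
        clauses.append(f"json_extract(body, '$.{field}') = ?")
        params.append(value)
    where = " AND ".join(clauses) or "1=1"
    return f"SELECT body FROM {collection} WHERE {where}", params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (body TEXT)")
conn.execute("INSERT INTO users VALUES (?)",
             (json.dumps({"name": "ada", "role": "admin"}),))
sql, params = mongo_find_to_sql("users", {"role": "admin"})
found = [json.loads(b) for (b,) in conn.execute(sql, params)]
print(found)  # [{'name': 'ada', 'role': 'admin'}]
```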
That said, it isn't like SQLite was a drop-in replacement for other database engines: when I think of "X is to SQLite as MongoDB is to PostgreSQL" I picture something that is attempting to bring benefits over SQLite -- in way of performance or scalability -- at the cost of losing the power of being a full relational database (and, because of Mongo's legacy, probably a lot of safety and stability guarantees ;P). (FWIW, I remembered there being a project like this: UnQLite-- https://news.ycombinator.com/item?id=18101689 --where people ironically seem to have wanted to get it benchmarked against SQLite with its json API ;P.)
It sounds like maybe this project is just trying to provide MongoDB's query layer? You simply don't use SQLite until you "migrate to the full PostgreSQL"... they are designed for different scenarios, and while SQLite is good enough you might be able to use it in a place where PostgreSQL "was called for", a migration might be brutal (as the syntax and type system are different). The project tagline is thereby leading to the wrong mental space--as seemingly multiple other people have since now mentioned on this thread--particularly given that it is written in Python.
Having the best performance isn't necessary for a lot of use cases. Sometimes you just want to store and search a bunch of json objects, and for that the mongo api is way more convenient than the postgres/sqlite json options.
There isn't a relationship between the two in terms of API. However, the team that designed the Berkeley DB storage engine went on to commercialise it via a company called SleepyCat. That company was acquired by Oracle. The team subsequently left Oracle to found WiredTiger (SleepyCat, WiredTiger, geddit? :-)). WiredTiger was focussed on building a modern database storage engine for high-CPU-count, high-memory-footprint servers. In 2014 MongoDB acquired WiredTiger and that is now the main storage engine for MongoDB. Interesting footnote: Michael Cahill is the primary author of "Serializable Isolation for Snapshot Databases" [0], the paper that introduced Serializable Snapshot Isolation (SSI). SSI is the concurrency control mechanism used in Postgres [1][2].
[0] - https://courses.cs.washington.edu/courses/cse444/08au/544M/R...
SQLite performs very well and it scales vertically to millions of ops pretty easily. Then you can say "well, we'll worry about more performance later". In this case, perhaps not so much.
Cool project, but I have to admit I sure wish this was written on top of SQLite, rather than just mentioning it: implementing a query language shim for MongoDB on SQLite would be an amazing project. In the absence of such a project, though, this is a pretty great alternative to have.
Thank you! I actually did consider doing exactly what you said. Early on, I decided one of my goals for the project would be to make it easy to swap back and forth between Mongita and PyMongo/MongoDB. For me, that meant getting as close as possible to their implementation and using things like BSON, ObjectIds, etc. For that, I sacrificed some performance, but I think most people who would use something like Mongita would prefer a faithful reproduction over a few clock cycles.
Oh thanks for the insight, that makes a lot of sense -- pretty sure mongita is one of the first projects I've seen even attempt this, and it makes sense to target the platform that fits your use case (and obviously everyone who uses PyMongo) the tightest.
Your project looks very high quality -- benchmarks, tests, and comparisons are basically an indicator of that in my mind. Looking through the code, you've also already left me (or someone else) space to try the SQLite approach as long as we implement it as an Engine[0] -- am I understanding that right? If I trace the code from database.py to engines/.py it looks like that. I really like the balance you've picked between pragmatism and space for expansion/modification.
A couple questions:
- How do you feel about type annotations in* the code (as opposed to just the comments, as far as I can see)
I like your idea. As I understand it, since SQLite performs better on most benchmarks anyway, why not have an engine subclass that utilizes SQLite itself?
This would be a little trickier than you're envisioning. The engine class is the lowest layer but doesn't handle the slow bits like the finds, managing the indices, etc. So to really get the benefit of SQLite you would have to pull those slow bits in.
I do really like the idea of offering a third option so that you have
- Memory (fastest)
- SQLite (almost as fast but compromises on perfect MongoDB reproduction)
- Disk (slower but is faithful to MongoDB)
Happy to discuss it more if you want to email me.
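For readers following along, the division of labor being discussed might be sketched like this (the method names are hypothetical, not Mongita's actual Engine API -- the point is that the engine only moves raw documents in and out, while finds and index maintenance live in the layer above it):

```python
import abc
import json

class Engine(abc.ABC):
    """Hypothetical storage-engine interface: the engine stores and
    retrieves raw documents; the slow bits (finds, index maintenance)
    live in the layer on top of it."""
    @abc.abstractmethod
    def put(self, collection, doc_id, doc): ...
    @abc.abstractmethod
    def get(self, collection, doc_id): ...

class MemoryEngine(Engine):
    """The 'Memory (fastest)' option from the list above, in miniature."""
    def __init__(self):
        self._data = {}
    def put(self, collection, doc_id, doc):
        self._data[(collection, doc_id)] = json.dumps(doc)
    def get(self, collection, doc_id):
        raw = self._data.get((collection, doc_id))
        return None if raw is None else json.loads(raw)

eng = MemoryEngine()
eng.put("users", 1, {"name": "ada"})
print(eng.get("users", 1))  # {'name': 'ada'}
```

A SQLite-backed subclass would implement the same two methods over a table, which is why it slots in as "almost as fast but compromises on perfect MongoDB reproduction."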
> How do you feel about type annotations in* the code (as opposed to just the comments, as far as I can see)
I used type annotations for a while when they came out but didn't find them to offer advantages over docstrings. The world might have moved over to type annotations, though, and I haven't kept up.
> PyPy? I wonder if you'd get a ~free speedup
I'm getting a lot of suspicion right now from other commenters who are convinced I have my thumb on the benchmarks and I don't want to give them any more excuses :tears-of-joy:. But joking aside, you're probably right and it's not something I had considered.
Funny enough I started something like that as a ruby library and stopped when I got to the query language shim because I just ran out of time and motivation. https://github.com/wa9ace/squongo
Thanks for sharing -- yeah, honestly I think the shim is the huge chunk of work here -- you would basically become a mongo expert/core committer by the end of it. The same can be said of SQL, but it feels like SQL is a bit less ad hoc (because it is, if only for the number of committees involved), and there are some libraries that already do the parsing and such.
I created something like this for PostgreSQL. It's a Node.js service that implements the MongoDB wire protocol and converts the query to use jsonb. It works pretty well, but it also got increasingly hard to make it compatible because MongoDB has a lot of quirks.
As a F/OSS freeloader, I would very much like a link to this project... Feels like something I'd contribute to, or do something crazy like try to start a business (2ndquadrant/edb style) on.
Thanks! I didn't realize I hadn't put the link. Here it is: http://github.com/thomas4019/pgmongo. I'd love to hear your thoughts on it and am happy to answer any questions.
SQL is a query language, MongoDB has a query language. I think that the tagline is reasonable and is made more precise in the first sentence of the README.
> Mongita is a lightweight embedded document database that implements a commonly-used subset of the MongoDB/PyMongo interface.
Yeah, a big benefit of SQLite is the fact that it's embeddable, and easily can be communicated with from any language that can FFI out to C (which is to say, essentially any language)
Author here. My goal with that one-liner was to convey immediately what it is - an embedded database that implements the PyMongo/MongoDB API.
Is it semantically perfect? Probably not. Will Mongita be useful to Python developers who are accustomed to saving their data in idiosyncratic JSON files because they don't want the overhead of a MongoDB server? Maybe! I hope so.
Thanks for sharing your project. It's probably not a good comparison. SQLite is held in such high esteem and is such a well-developed, high-quality, mature product. So the comparison sort of invites criticism that gets in the way of people appreciating what is good about your project.
This is good and helpful feedback. Thank you. I'm humbled by how well this has done so far but definitely feeling the criticism. A title like, "Mongita is embedded MongoDB for Python", while being a statement without ambiguity, doesn't immediately convey the vision for why this might be useful to people. SQLite by contrast is a nice rhetorical anchor that people understand. I'll have to think through whether there is a better way to convey it that fits that goal.
Right, a little controversy never hurts to attract attention. You will have to balance the positive attention from comparison to a known good against the potentially unmet expectations that known good brings.
If your goal is to reach pythonistas with this, then suffering the “this isn’t really like SQLite” critique will be worth it. On the contrary, if the vision is really to be like SQLite, i.e. broadly portable to many languages and platforms and very robust high quality code then the disappointment in its current implementation may outweigh the near term attention.
In either case this is probably a good measure of demand for such a thing.
I am astonished that this is the sentiment. I've always thought of SQLite as a small, self-contained implementation of a SQL database. And so the comparison makes perfect sense to me.
To me it was a good analogy. But I think the key for me thinking it's reasonable is that I go into Show HN posts with milder expectations than other submissions.
I would never pick MongoDB over a traditional relational database, so I can't judge this well, but I can still see how people would find it useful. Nice work!
I don't think it is whatever it's written in, because I can't imagine what it would mean? Unless there's a DBMS implementation called simply 'SQL' that I'm not aware of? (Or if Mongo calls its query language 'MongoDB'?)
Seems like '.. to (My|Postgre)SQL' would be a better fit?
SQLite is a portable DBMS library that exposes a SQL language interface that is built and runs under every OS and language under the sun, for the most part. It is probably the single-most distributed library on the planet (in aggregate). And likely one of the most used libraries (top 10) on just about every given OS and platform/language.
> I question that it is the most distributed library on the planet. Is there something backing that up? It would be cool but very shocking.
I recall seeing that claim made ~2 years ago, with data to back it up. I think many people forget that it is shipped (or statically compiled into an application that is shipped) on every android device (handset, TVs, etc), every Linux installation, every Windows installation and every OSX installation.
I do not know of any other library that has that penetration.
I don't know about iOS.
I can't say it's the most, but it does get used a lot. Chromium and descendants, Firefox, bunch of chat apps (Telegram, WeChat), and so on. You probably have multiple copies per machine.
I guess the idea relates to "a SQL database" - might be technically incorrect, but with Mongo it's sort of embedded in NoSQL - that also originally meant "no SQL [language]", but has turned to mean "no SQL and not a classical RDBMS with databases, tables and columns, rows, views, procedures..."
Yeah, but, at least, with python, devs can prototype and verify, early and fast. Imagine implementing this in C/C++ or even Rust. You get extra layers of problems to handle in your brain.
The point is not using MongoDB here, and that means the database logic has to be reimplemented, likely in a different way because of different requirements. Using python does hurt embedding and performance, but at least one can easily get how such library should be designed, which is, in my opinion, the hard part.
It's a good question, and to be accurate, depending on the benchmark, Mongita is about the same speed as SQLite to several times slower.
There is less happening algorithmically than you would think. Where the tricky slow bits do exist, they have largely fallen into the happy-path of fast data structures in the Python language/stdlib. I also use sortedcontainers for indexes which helped quite a bit (http://www.grantjenks.com/docs/sortedcontainers/).
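The index idea can be illustrated with the stdlib bisect module, which uses the same technique sortedcontainers optimizes: keep field values sorted so both point and range lookups avoid a full scan (a simplified sketch, not Mongita's actual index code):

```python
import bisect

class SortedIndex:
    """A minimal sketch of a field index kept in sorted order, the
    idea a sorted-container gives Mongita: O(log n) insertion-point
    lookup plus a contiguous slice for range queries."""
    def __init__(self):
        self._keys = []   # sorted field values
        self._ids = []    # doc ids, kept parallel to _keys

    def insert(self, key, doc_id):
        i = bisect.bisect_left(self._keys, key)
        self._keys.insert(i, key)
        self._ids.insert(i, doc_id)

    def range(self, lo, hi):
        """Doc ids with lo <= key <= hi, without scanning everything."""
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_right(self._keys, hi)
        return self._ids[i:j]

idx = SortedIndex()
for doc_id, age in [(1, 33), (2, 51), (3, 27), (4, 40)]:
    idx.insert(age, doc_id)
print(idx.range(30, 45))  # [1, 4]
```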
Very cool. I'm looking for MongoDB to be able to run on AWS Lambda and be more serverless. This seems like a step in the right direction.
Personally I think JS would have been a better choice to implement this in than Python, given that all the mongo queries are in javascript: db.collection('users').find({_id:'AN_ID'})
Also, having it made in javascript would have opened the option to embed it in the browser later on.
This would be super nice to have when developing mobile apps. I’ve used SQLite when developing with react-native, but can see myself using a Mongita-type solution.
If stuff like Mongita and SQLite existed for all kinds of databases (graph, KV, XML; for document and SQL it already exists), couldn't we "just make distributed versions" by putting stuff on top of them? Like with dqlite/rqlite for SQLite?
Or does there have to be some inherent mechanism within the database to support distributed versions?
There are generic distributed consensus algorithms out there. The most famous are Paxos and Raft. In theory, you can jam those on top of any system you like, as long as it has well-defined state transitions.
Making it fast - or usable at all in the presence of heavy contention - is another story. Distributing a write-heavy workload over a cluster is useless if the cluster ends up rejecting most updates because they get preempted by some other write. Solving that problem usually means analyzing the underlying system to figure out which parts need to be truly atomic and which you can get away with doing in parallel. That job is a) really complex and b) filled with opportunities to make significant performance gains in exchange for weaker safety guarantees, like losing committed writes in a crash, or allowing individual nodes to reorder independent writes.
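The "well-defined state transitions" requirement can be shown in miniature: once consensus (Raft, Paxos) fixes the order of a log, every node applies it deterministically and ends up with identical state. The operations below are invented for illustration:

```python
# A replicated state machine in miniature: consensus only has to
# agree on the *order* of a log; every node then applies the same
# entries to identical local state.
def apply_log(entries):
    """Apply an agreed-upon, ordered log of well-defined state
    transitions to an empty key-value store."""
    state = {}
    for op, key, value in entries:
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state

log = [("set", "a", 1), ("set", "b", 2), ("delete", "a", None)]
# Any node that applies this log in this order reaches the same state.
print(apply_log(log))  # {'b': 2}
```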
I think as I understand it, your question is whether we can't just put a distributed layer on top of basic embedded databases.
This is really interesting and is something I came across while writing this. It turns out that concurrency is actually quite difficult because either you have global locks, which means only one process can write to the database/indices at once and slows things down considerably, or you have to do a lot of clever things to avoid those locks.
> I think as I understand it, your question is whether we can't just put a distributed layer on top of basic embedded databases.
Exactly!
> This is really interesting and is something I came across while writing this. It turns out that concurrency is actually quite difficult because either you have global locks, which means only one process can write to the database/indices at once and slows things down considerably, or you have to do a lot of clever things to avoid those locks.
Well, that would also be the case with traditional DB services; the question is whether they can have more granular locking mechanisms than embedded databases. But perhaps they can only have less granular locking?
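The granularity trade-off in this exchange can be sketched with per-collection locks: writers to different collections proceed in parallel where a single global lock would serialize all of them (the structure is invented for illustration, not Mongita's actual code):

```python
import threading

class Store:
    """Granular locking in miniature: one lock per collection, so
    only writers to the *same* collection contend. A global-lock
    design would take a single lock around every write instead."""
    def __init__(self, collections):
        self._data = {c: {} for c in collections}
        self._locks = {c: threading.Lock() for c in collections}

    def write(self, collection, key, value):
        with self._locks[collection]:
            self._data[collection][key] = value

store = Store([f"c{i}" for i in range(4)])
threads = [threading.Thread(target=store.write, args=(f"c{i % 4}", i, i))
           for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(len(bucket) for bucket in store._data.values()))  # 100
```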
It's sad that the author did not mention that the disk engine basically stores a copy of the data in memory. And it seems from the benchmark code that reading doesn't hit the disk at all.
Okay, so looking at the first two tests - "Retrieve all documents" and "Get 1000 documents by ID" ...
If you switch the order around, does it make a difference to the benchmark? Because I suspect that the first test preloads all records into RAM, and the second test simply searches RAM, which is not what we usually do with SQLite. We don't cache all records before searching.
Switch those first two tests around, and let's see if it makes a difference.
You're right. Fixed. SQLite is more clearly the winner, but Mongita appears to squeak ahead in id lookups https://github.com/scottrogowski/mongita/blob/master/assets/... (to be fair, it might be the thin translation layer I built). Regular MongoDB struggles a lot and I don't think I'm being unfair to it in any way afaik.
O(n) cache size is fine (and IMHO, preferable) for small datasets. For large datasets, you're correct. Cache eviction is important when memory usage gets too high. I didn't have it explicitly on the README as something that needs to get done but now it is. So thank you for pointing it out.
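The eviction discussed above is typically an LRU policy; a textbook OrderedDict version (not Mongita's actual implementation) looks like this:

```python
from collections import OrderedDict

class LRUCache:
    """A bounded document cache: an O(n)-sized cache is fine for
    small datasets, but once memory matters you evict the least
    recently used entry."""
    def __init__(self, max_items):
        self.max_items = max_items
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)        # mark as recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_items:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # touch "a" so "b" becomes the eviction candidate
cache.put("c", 3)    # evicts "b"
print(cache.get("b"), cache.get("a"))  # None 1
```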
afaict, the disk_engine does not lock the file when reading / writing. Also, the storage engine is bson and relies on cached offsets. There's also this non-atomic defrag method.
I dunno.. wouldn't touch it with a pole wearing a hazmat suit, sorry.
Using sqlite for storage and querying would've been better. Heck, that would be pretty great for moving a few smaller (server) applications off of mongodb. Although they're using ruby.
Unfortunately every time I see the misspelling in this thread I involuntarily cringe. I suppose "MongoDB" is named after a slur used to insult people with Down syndrome, so maybe calling this project the Spanish equivalent of "Magolia", "Mamalian", or "Meercat" is a clever reversal of the insult into a form of self-deprecation on the part of the author, who is wittily feigning illiteracy? Or perhaps it is intended to ridicule the speling of Spainards and other speekers of Spansh? Or programmers who decided to yoke their applications to fake open source?
Even if correctly spelled, perhaps the name would be more appropriate to a debugging tool than to a hash table implementation.
You're welcome! I hope the information is useful in helping scottrogowski carefully consider how many people he wants to insult. However, the DLE is not a Wiki.
It seems more likely to me that it's not a misspelling of "monjita" but rather just a naive application of the Spanish -ito/-ita diminutive suffix to "Mongo".
These two possibilities are not mutually exclusive.
While your guess about the thought processes of the originator may well be correct, it is still the case that the result, "Mongita", is ① unambiguously Spanish and ② unambiguously pronounced in Spanish as [monxita], which is a real Spanish word, the diminutive of the common word monja, meaning "nun". But [monxita] is spelled "monjita".
The result is that what may well have been an incorrect application of the diminutive suffix (the correct result would be "Monguito") produced a misspelling of "monjita". It's just as clearly misspelled Spanish as "Ke keres aser?" or "yerba maté", if not more so. So you can expect most Spanish speakers to read it as ridiculing the literacy of an unspecified person—more so if they also know English, given that "mongo" has been an English word used for ridiculing someone's intelligence for many generations.
Anyone who reads Spanish fluently will react to the misspelling "Mongita"—whether with distaste, amusement, pity, or defensiveness—long before they have time to think through the possible motivations of the MongoDB developers.