I have a database with 15k documents, each with around 70 pages of text, HTML formatted.
I'm using ElasticSearch currently, with the Searchkick gem.
30 min playing with MeiliSearch. So far:
- Blazing fast to index, like 10x more performant than using ElasticSearch / Searchkick;
- Blazing fast to search, at least 3x faster in all my random tests so far;
- Literally zero config;
- Uses 140MB of RAM currently, while in my experience ElasticSearch would crash with anything less than 1GB, and needs at least 1.5GB to be usable in production.
Since this got upvoted and I see the devs are replying to questions, here are some! I'm also going to point out how ElasticSearch works for comparison.
- The docs state that `Only a single filter is supported in a query`. This is kind of a dealbreaker for my use case, since I need at least a `user_id` and a `status` filter. ElasticSearch can work with multiple filters. Also, I don't understand why it's called `filters` instead of `filter`, then. Are multiple filters on the roadmap?
- My search UI has a sort by `<select>`, where you can choose, for instance, `last updated asc` or `last updated desc`, amongst others. In my understanding, that would be cumbersome with MeiliSearch, since it would require either (1) a settings change to alter the ranking rules order beforehand [0], which would not even work in production due to race conditions, or (2) maintaining multiple indexes, each with a pre-defined ranking rule order, and switching between them depending on the UI criteria?
- As an extension of the last question, I see that a lot of what you call "search settings" are treated by ElasticSearch as query parameters. For instance, I can easily query ES on the title or description fields just by setting that as a parameter. In MeiliSearch that would require a change in the index settings beforehand, right?
PS: The docs, especially for the Ruby SDK, could use some work in the filters section. It took me a while to understand that I should pass a string, like `index.search("query", filters: "user_id:3")`. I was trying a hash like `filters: { user_id: 3 }`.
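In case it helps the next person, here is a minimal sketch of the difference (the client setup and index name are made up, and the method names are from memory, so double-check the meilisearch-ruby README):

```ruby
require "meilisearch"

client = MeiliSearch::Client.new("http://127.0.0.1:7700") # local instance, default port
index  = client.index("documents")                        # hypothetical index uid

# Works: `filters` takes a single filter expression as a string.
index.search("invoice", filters: "user_id:3")

# What I tried first; a hash is NOT accepted here:
# index.search("invoice", filters: { user_id: 3 })
```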
Hi, many answers to these questions below. But first, here's the link to the public roadmap. A lot of the stuff we're working on is in there. If you need/love a feature, please add a heart emoji to it. https://github.com/orgs/meilisearch/projects/2
- Custom ranking rules on the fly are something we can imagine for our solution. We haven't done it yet because it complicates the search query parameters. We are waiting for feedback like yours to implement this kind of feature.
- Returning only the fields you need is already possible at search time: https://docs.meilisearch.com/guides/advanced_guides/search_p.... As for restricting which attributes are searched at query time: we had this feature in a previous version, but as with the last answer, no one used it, and it complicates the search query.
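For example, from the Ruby SDK that looks roughly like this (given an `index` handle from the client, and assuming the search parameter is still named `attributesToRetrieve`, as in the docs linked above):

```ruby
# Only return the title and description of each hit;
# other stored fields are left out of the response.
index.search("query", attributesToRetrieve: ["title", "description"])
```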
But... what happens if I need more than one instance? I'm genuinely curious, and I hope this doesn't come off as an asshole comment. Isn't the whole point of ES versus plain ol' Lucene or Solr its horizontal scalability?
I went looking, but found nothing regarding any operations management.
* How does this scale?
* How is it monitored? Where do I get the metrics for it? (indexing performance, search performance, etc.. Stuff not found in the OS)
* Are there any kind of throttling or queueing capabilities?
* What's the redundancy/HA approach?
* I'll ask about backups, though it's the least of my worries, since indexing databases like this and ES should be able to be rehydrated from source. However, snapshots may be faster to restore than reindexing.
This might be a nice local dev tool for something, but I'm not sure how you'd run a business-critical application with it. I'm wondering if I'm missing something.
- Vertical scale: we use LMDB as the key-value store, which relies on memory mapping. This makes our search engine work mainly from disk, so it does not need a machine with terabytes of RAM.
- Horizontal scale: we are working on sharding and replication (Raft). Development is progressing well, and the functionality should come out soon.
* As I said previously, we are working on HA with a Raft consensus.
* We will add snapshots very soon (the disk folder saved to S3). Backups (version-agnostic, requiring reindexing) will take a little more time.
We are already working with Louis Vuitton on an application in production. The app has been in production for 9 months, and there hasn't been a single problem.
You’re always bounded by max single shard latency AND by coordination latency.
Ignoring how expensive it would be, over-sharding and over-scaling (i.e. low volumes of data per shard and few shards per host) could reduce max single-shard/host latency; however, it'll increase coordination latency and also memory usage (which directly or indirectly will cause more coordination latency).
Perfect data-per-shard and shards-per-host numbers are currently an unsolved problem. They depend heavily on the domain, i.e. data types, data volume, data ingest, mappings, query types, and query load.
:) if anyone has found a way to consistently add hosts to reduce latency, please let me know!
More accurately, 99% of the use cases ES is appropriate for don't need sharding. Every time I've needed to shard, ES has been a bad enough nightmare that it was abandoned.
I had a typical case of ingesting a ton of logs into ES. I needed sharding to keep up with multi-threaded writes while something else is doing intensive search queries. I think sharding was very useful in processing a lot of data efficiently.
Yes, it's marked for Q3 because it's a pretty complex feature, and we had a lot of other features to do at the same time. But the good news is that it's very well advanced and is likely to be released in mid-Q2.
I know the project doesn't claim it, but the title somewhat implies this: I honestly don't understand people claiming ElasticSearch is hard to operate, especially at small scales. If anything, ElasticSearch has been one of the easiest pieces of infrastructure for me to operate, pretty much "zero-config". Let me elaborate: you can run ElasticSearch via the Docker command line; if you want a cluster, you just supply the IPs of the other nodes. Then you start indexing documents with simple HTTP calls. You can add or remove nodes at any time and don't have to do anything except start another ElasticSearch instance. If you run out of space or performance, just start another node. Everything needed for management, indexing, and search is available through HTTP APIs; no tools needed.
Clustered ElasticSearch has been rock-solid for me and I've used it in anger many times. The level of maintenance needed is close to zero, both initially and long-term. Compare that with the abysmal experience of setting up a sharded MongoDB cluster for example...
Please enlighten me how ElasticSearch is "a lot of work to operate" (heard that one multiple times), and what you're comparing it to.
I've been bitten by elasticsearch twice in my career, and I've seen others bitten by it as well. Once you put it in production, you can't just run it from docker on your workstation. You have to set up a cluster with enough capacity for whatever load you're going to throw at it, gracefully handle failures, updates, scaling up as load increases, etc.
There are so many switches and dials to tune, and unless you really learn it in depth, you won't know which ones you need. It's difficult to even determine what hardware requirements you have. And it's a hard sell to tell your business guys, "I think elasticsearch will work better if we give it more... CPU? Memory? Disk speed? I'm not really sure," when you can't provide any concrete metrics to back that up.
Another place where footguns abound is upgrading from one version to another, especially if you've got plugins installed. There are tricks that you have to learn the hard way.
At this point, I think long and hard before reaching for a solution like elasticsearch. If I've got a DBA whose entire job it is to master the tech and wield it expertly, that's one thing. But if I'm part of an early stage startup, I just can't justify the lost time and potential for catastrophe.
Not all data stores. You can go quite far with an out-of-the-box Redis instance or even PostgreSQL. No fiddling needed unless you are in triple-digit QPS ranges.
I once wasted a whole day trying to get two instances up on GKE. Permissions problems, about ten configs for the JVM alone, many more for ElasticSearch. You would fix one, restart, wait ten minutes, browse 50 pages of logs, google for half an hour, add a config, and goto 1. Never got it going in the end.
Personally I don't understand why there are so few search libraries/systems to choose from, given that "search" is one of the fundamental pillars of CS.
MeiliSearch is "zero-config" compared to ElasticSearch in terms of the setup needed to make it work as an instant, relevant end-user search engine. Our engine follows the Algolia engine in terms of typo-tolerance, relevancy, and speed.
Thanks, hadn't seen that, that makes a lot more sense. I agree that ElasticSearch is definitely not "zero-config" when it comes to building certain bespoke applications on it that go beyond simple filtering or query-relevance document search.
Elasticsearch is easier than mongo in some ways and harder in others.
I run a few 10TiB ES clusters (which is not much, to be fair) but infrequently find that I have to reindex or reshard the cluster because I can't just add another node. There's something to be said for understanding index rotation and access patterns, too.
It’s easy to make an ES cluster, it’s difficult to maintain one, it’s nearly impossible to debug one.
- if you consider that “it’s slow” is what you have to debug.
That's approximately how large our clusters are. Fortunately, ours are read-only, so our admin story is:
- Hey, a node died!
- Run Terraform to stand up a whole new cluster and restore it from a snapshot.
- Update the app to point at the new cluster.
- Run Terraform to delete the old cluster.
> if you consider that “it’s slow” is what you have to debug.
This is exactly it. This is a problem you encounter with every database engine, but in most of them you can quickly find the bottleneck and fix it. With elasticsearch... it's a frustrating and expensive game of trial and error.
We built a "Yelp for Colleges" product several years ago. The product needed a unified search where students could search for either a course or a college or a question from the forums with typeahead / autocomplete to get them to where they wanted to go quickly, with support for misspellings.
In all there were about 50k documents, and we mostly cared about the title field. Elasticsearch would randomly bloat up to occupy a huge amount of RAM. Restarting it would make it work for a few days. It would also occasionally crash.
We got rid of it and went with some Levenshtein-distance-based database query.
I'd love to use it again sometime, but the experience was not good, and Googling for information brought up all kinds of very complex use cases shared by others.
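For reference, that kind of fallback can be a single fuzzy-match query; here's a rough sketch (not our exact query), assuming PostgreSQL with the fuzzystrmatch extension and a hypothetical `courses` table:

```ruby
require "pg"

conn  = PG.connect(dbname: "colleges") # hypothetical database
query = "calculas"                     # a misspelled "calculus"

# levenshtein() comes from PostgreSQL's fuzzystrmatch extension
# (CREATE EXTENSION fuzzystrmatch;). Tolerate up to 3 edits.
rows = conn.exec_params(<<~SQL, [query])
  SELECT id, title, levenshtein(lower(title), lower($1)) AS distance
  FROM courses
  WHERE levenshtein(lower(title), lower($1)) <= 3
  ORDER BY distance
  LIMIT 10
SQL

rows.each { |r| puts "#{r["title"]} (distance #{r["distance"]})" }
```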
Go ahead and try MeiliSearch: 50k documents are easily handled by the engine, and without much RAM usage.
It will take you something like 10 minutes to start and populate MeiliSearch, and you will be able to test it just by going to the server's HTTP URL in no time!
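For instance, a first run with the Ruby SDK can be as small as this sketch (the index name and documents are invented; check the SDK README for the exact method names):

```ruby
require "meilisearch"

client = MeiliSearch::Client.new("http://127.0.0.1:7700") # a locally running MeiliSearch
index  = client.index("colleges")                         # hypothetical index uid

# Push a tiny batch of documents; indexing happens asynchronously.
index.add_documents([
  { id: 1, title: "Introduction to Calculus" },
  { id: 2, title: "Organic Chemistry" }
])

# Typo-tolerant search out of the box.
puts index.search("calculs")
```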
I implemented the student-facing course catalog web interface for a single org. One of the most fun parts was the heuristics in the query parser, like patterns for recognizing course numbers and boosting those exact-match results. It really helps you appreciate all the fit and finish that goes into proper search engines.
This was the olden days, when we just used Lucene directly.
Elasticsearch is very memory-intensive, and it's difficult to know exactly how much memory it will actually use, so you just have to throw a lot of RAM at it to avoid OOMing, then monitor it carefully, and hope your query concurrency won't accidentally blow the limits. Understanding why Elasticsearch is caving unpredictably under load is difficult, and GC pausing can be a significant performance sink.
> I honestly don't understand people claiming ElasticSearch is hard to operate, especially not at small scales.
The problem is that ES is deceptively simple to operate. As millions of people who have found things like their medical records shared with the world can attest.
I wanted to mention Sonic [1] as another lightweight document indexing alternative written in Rust, when I found MeiliSearch to provide a thoughtful comparison page [2].
Mostly "made in Rust", but from the GitHub README[0]: "MeiliSearch uses LMDB as the internal key-value store. The key-value store allows us to handle updates and queries with small memory and CPU overheads." So a lot of the credit goes to LMDB, and the safety implied by "made in Rust" is not, in fact, guaranteed.
Not that I'm complaining. I love LMDB, and it's been rock solid and bug-free in my experience (thanks, Howard!), but it's low-level C, not Rust, and if you expect the certainty that Rust provides w.r.t. security, race conditions, and leaks, be aware that you are not completely getting it.
But other than that: Thanks! This looks like a great project!
True, but there are significant components in pure Rust, such as `fst` (full disclosure: I wrote it), which is written in purely safe Rust.
> and safety implied by "made in Rust" is not, in fact, guaranteed
Just about every Rust program depends on some C code, usually at least in the form of a libc. So you could lodge this criticism against almost every Rust project.
> and if you expect the certainty that Rust provides w.r.t to security, race conditions and leaks
Rust's safety story covers neither race conditions nor leaks.
I think you've disagreed with this in the past, and I don't know how to resolve that. But certainly, I think we can agree that saying that Rust's safety story prevents race conditions is, at minimum, very imprecise.
Ah yeah, it could have been someone else... Not sure. It was a while ago. Anyway, I don't personally have a strong opinion on definitions here. (I'm not qualified to.)
(I find this whole post ironic. We made LMDB a standalone library so other projects could use it, of course. But for applications like this (scalable, full-text search), nothing comes anywhere close to OpenLDAP. Somebody else in this thread mentioned "triple-digit queries per second" as if that were a difficult achievement. OpenLDAP handles queries with complex filters at millions of queries per second. It also has a complete security model, providing fine-grained access control, something none of these newer projects have even begun to think about. You guys all need to study existing tech better before starting to write your own solutions...)
The goal of ElasticSearch, I always thought, was that it scales horizontally and can handle the loss of multiple nodes without availability or data loss. It's interesting to build a single-server replacement, and this can likely work for many use cases, but it's definitely a different approach from ElasticSearch itself.
Replication for MeiliSearch is on its way :) The main differentiator is that MeiliSearch's algorithms are made for end-user search, not for complex queries. MeiliSearch focuses on site search and app search, not analytics on very large datasets.
It's a dataset of 107M songs: 7.6 GB of compressed files, which represents 250 GB of disk usage by MeiliSearch. We are indexing the release, song, and artist names.
We also work with a dataset of 2M cities that we can index in less than 2 minutes when the db uses 3 shards.
We are working on both replication (for high availability; we may use Raft consensus) and distribution (sharding to scale horizontally while keeping latency low).
If only one could set up Elasticsearch and Kibana using infrastructure-as-code (IaC). I spent several days trying and still haven't succeeded. Elasticsearch config is full of foot-guns.
There are tons of easy setup examples but they lack access control and encryption. All of my servers must write logs. When one of them gets cracked, the attacker must not be able to read all the other servers' logs and steal all the PII. An attacker can use an ARP attack to MITM server connections to Elasticsearch. Without encryption, that attack yields all the PII.
I hope Meilisearch can someday help fill this gap in the free DevOps toolset.
OK, I just looked through things a bit, but the phrase "zero config" worries me. First off, I could conceivably run ElasticSearch with zero configuration, but then it needs to make decisions about what types things are and how they should be analyzed, and sometimes those decisions are not what I want.
Often ElasticSearch makes a mistake in typing because the programmer has made a mistake in the data format; if you fixed that mistake, your data would no longer fit the type that ElasticSearch has chosen for it. (I actually don't know if this is still a problem, because it has been years since I've run without all my fields being mapped first, but I don't see how it couldn't be.)
So theoretically, if you didn't want to go through the trouble of defining a mapping, you could just reindex all your data, fixed in such a way that ElasticSearch will choose a better type for individual fields, but why would you do this?
And what does MeiliSearch do, I wonder? Looking through the code here https://github.com/meilisearch/MeiliSearch/blob/master/meili... (not being a Rust guy, my understanding of it is probably off), it seems like maybe it is "no configuration" because it expects you to follow its semantics.
Which, to be fair, lots of things do; at the base level, everything has a title, description, and date.
But if I have a domain with different or more advanced semantics, what happens?
Search engines are generally configurable because you want to add other fields and rank hits in those fields higher than other things, or maybe do a specific search that only targets those fields, like, say, brand-based search.
On preview: lots of other people with similar views, it seems. I got maybe a bit ranty just because the title sets me off; it is so wrong it even seems like lying.
Awesome, glad to see all the competition in the search space now. There are other projects like Sonic, Tantivy, Toshi and more that have more functionality if you need alternatives.
Hardly an "alternative to ElasticSearch", if only because the latter is scalable beyond a single machine.
This overhyped description coupled with on-by-default analytics suggests to me MeiliSearch should be dismissed regardless of potential usefulness or technical merit.
"We send events to our Amplitude instance to be aware of the number of people who use MeiliSearch. We only send the platform on which the server runs once by day. No other information is sent. If you do not want us to send events, you can disable these analytics by using the MEILI_NO_ANALYTICS env variable."
It'd still be zero-config for providing its primary function. I don't think anyone would say anything against MeiliSearch, or not consider it zero-config, had they decided to enable analytics via an env var rather than having analytics sent by default.
I would like to clarify that by analytics we are only talking about 1 ping per day that sends a hash that allows us to uniquely identify a user. The privacy of the users is kept. It just serves us to know if our work is being used.
> I would like to clarify that by analytics we are only talking about 1 ping per day that sends a hash that allows us to uniquely identify a user.
This is already pretty invasive: it discloses activity, the number of machines deployed, and the IP address, which identifies location and often organizations and individuals (and which is considered protected personal data per the GDPR, AFAIK).
> The privacy of the users is kept.
No, it isn't; see above. Even as the authors, you don't get to decide what data does or doesn't infringe upon users' privacy.
> It just serves us to know if our work is being used.
This is irrelevant. If you want to condition the use of your work on being told where and how it is being used, then license it accordingly and abide by the applicable laws.
Currently we think this kind of security can be enabled by a simple nginx setup, allowing easy auto-refresh of certificates (e.g. certbot). But in the future we will probably handle that in the engine itself.
I was thinking it might be nice to have an HMAC'd token with an expiry as an option, so e.g. my main HTTP-serving thing could provide one of those to allow the frontend to read for a bit, but kick the user off after half an hour or whatever if the token isn't refreshed.
I've no issue with offloading SSL to a different process, though; I tend to prefer doing that anyway a lot of the time.
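To make the token idea concrete, here's a rough sketch of the issuing/verifying side (the secret name, TTL, and token format are all invented; the engine, or whatever proxy sits in front of it, would have to do the matching verification):

```ruby
require "openssl"

SECRET = ENV.fetch("SEARCH_TOKEN_SECRET") # shared with whatever verifies tokens

# Issued by the main HTTP-serving app: lets the frontend query search until expiry.
def issue_search_token(user_id, ttl: 30 * 60)
  expires_at = Time.now.to_i + ttl
  payload    = "#{user_id}:#{expires_at}"
  signature  = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), SECRET, payload)
  "#{payload}:#{signature}"
end

# Checked by whatever sits in front of the search engine.
def valid_search_token?(token)
  user_id, expires_at, signature = token.split(":", 3)
  expected = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), SECRET, "#{user_id}:#{expires_at}")
  # NB: real code should use a constant-time comparison here.
  expected == signature && Time.now.to_i < expires_at.to_i
end
```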
I understand what you mean, but is it for a specific usage of a search engine? I was thinking that this kind of time-restricted token could also be managed by the nginx instance; our engine doesn't support that for the time being.
Was just thinking for "simplest possible deployment" it would be nice to be able to have clients hit the meili instance basically directly to take maximum advantage of the speed.
Note that I'm not saying "this should be priority 1" or anything; I'm already thinking about how to configure nginx to handle the HMAC crap if I try Meili out :)
MeiliSearch appears to be more of an alternative to Lucene than it is to Elasticsearch. Lucene is the search engine that runs on a single instance; ES is the horizontally-scalable distribution and aggregation layer atop the instances. Absent a similar aggregation layer, MeiliSearch isn't "elastic" as the comparison implies.
Actually, Lucene is the search library that Elastic uses under the hood. Lucene does not provide any HTTP API; Elastic does. Before using Lucene, you have to build the interface around it.
In this way MeiliSearch is comparable to ES, especially for site search and app search, working out of the box with its HTTP API as standard.
MeiliSearch does not offer distribution yet, but it is something the team is working on :)
My concern is that by comparing it to Elasticsearch, you implicitly minimize the amount of engineering effort required to go from single-node to a distributed system. It is a non-trivial exercise that you will undoubtedly realize once you get into the dirty details.
While this might be an alternative for that one specific use case (a search bar), it does not feel like a viable alternative to ES. I am sure it is great at that specific case, and I don't want to knock them on that. But I have never used ES for a simple search like they are targeting. When I use ES, I want to store billions of records redundantly and search them by text, time, and/or location, and then create visualizations with the results.
When I first read the title I thought it might be a Rust-based Lucene engine or something, and thought that would be pretty cool, though I have no idea how that would work. On its own, this is a pretty nifty little tool; however, I think the framing as an ES alternative is what feels wrong to me, and apparently to others in the comments as well.
I've seen ES used for MeiliSearch's precise use case quite a few times before now.
So it's not "an alternative to ES in general", it's "a thing designed to be an alternative for a subset of ES use cases", and the comparison document is pretty clear about this.
Seconding.
Text searching is a horribly hairy problem. I know two businesses whose main source of income is tuning ES/Solr to particular user needs, from performance, through templating case-specific queries, to custom plugins.
> MeiliSearch can serve multiple indexes, with different kinds of documents, therefore, it is required to create the index before sending documents to it.
Indexes are config. This is not really zero-config if you require API calls before it can receive data.
Also, there's nothing about TLS or access control. These will be required for any production deployment. At a minimum, let us specify TLS key.pem and cert.pem files and create write-only and read-only access tokens.
Does anyone know if this supports bulk indexing? My team has a lot of data in S3 in parquet format. (We could change the format to something else if that helps).
It would be really nice to be able to point tools like MeiliSearch or ElasticSearch at a data location and have them index all the data without me writing code to send individual records to the API.
This is not something MeiliSearch currently supports, but I am working on making the engine able to index formats other than JSON; I saw great performance improvements when indexing simple CSVs.
We will probably make MeiliSearch accept different indexable formats (i.e. CSV, JSON, JSON-lines) in a future version.
Looks promising! Are there any docs coming on a production-ready setup? Reading below, it looks like you're working on high availability, but even in the single-machine scenario, do you have recommendations for persistence, fault tolerance, etc.?
I would say that you should put your own nginx (or something else) in front of our HTTP-only engine; in terms of fault tolerance, we are working on high availability.
To make MeiliSearch expose the documents stored in your PostgreSQL (or any other database), you must extract them and store them in our engine using the HTTP API we provide: https://docs.meilisearch.com/references/documents.html#add-o...
For that, you will also need to define the different attributes your documents are composed of.
We have thought about providing a simple tool to extract documents from an SQL table into MeiliSearch directly.
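Until such a tool exists, a small script is usually enough. A rough sketch (table, column, and index names are made up) that pulls rows from PostgreSQL and pushes them to the add-documents endpoint linked above:

```ruby
require "pg"
require "json"
require "net/http"

# Pull the rows you want to make searchable (hypothetical table and columns).
conn = PG.connect(dbname: "app_production")
docs = conn.exec("SELECT id, title, body FROM documents").to_a

# Push them to the add-documents endpoint for a "documents" index.
uri = URI("http://127.0.0.1:7700/indexes/documents/documents")
res = Net::HTTP.post(uri, docs.to_json, "Content-Type" => "application/json")
puts res.body # indexing is asynchronous; the response references the pending update
```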
Hey B! Funny seeing you here. I'm now running product at http://sajari.com
You will find that most tools provide document level permissions to some degree by storing user/group IDs on the document and adding filters to the query. However, it generally requires custom implementation work to integrate it into your systems and prevent spoofing of the filters.
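As a rough sketch of that pattern (the names are invented, in the same vein as the Ruby snippet earlier in the thread): the backend, not the client, decides the permission filter, so it cannot be spoofed.

```ruby
# `index` is a handle on the search index (e.g. from the MeiliSearch Ruby client).
# The client only ever sends the query text; the user_id filter is added server-side.
def search_for(index, current_user, query)
  index.search(query, filters: "user_id:#{current_user.id}")
end
```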