MeiliSearch: Zero-config alternative to Elasticsearch, made in Rust (github.com/meilisearch)
437 points by qdequelen on March 25, 2020 | 113 comments



I'm impressed.

I have a database with 15k documents, each with around 70 pages of text, HTML formatted.

I'm using ElasticSearch currently, with the Searchkick gem.

30 min playing with MeiliSearch. So far:

- Blazing fast to index, like 10x more performant than using ElasticSearch / Searchkick;

- Blazing fast to search, at least 3x faster in all my random tests so far;

- Literally zero config;

- Uses 140MB of RAM currently, while in my experience ElasticSearch would crash with anything less than 1GB, and needs at least 1.5GB to be usable in production.


Since this got upvoted and I see the devs are replying to questions, here are some! I'm also going to point out how ElasticSearch works for comparison.

- The docs state that `Only a single filter is supported in a query`. This is kind of a dealbreaker for my use case, since I need at least a `user_id` and a `status` filter. ElasticSearch can work with multiple filters. Also, I don't understand why you call it `filters` instead of `filter` then. Are multiple filters on the roadmap?

- My search UI has a sort by `<select>`, where you can choose, for instance, `last updated asc` or `last updated desc`, amongst others. In my understanding, that would be cumbersome with MeiliSearch, since it would require either (1) a settings change to alter the ranking rules order beforehand [0], which would not even work in production due to race conditions, or (2) maintaining multiple indexes, each with a pre-defined ranking rule order, and switching between them depending on the UI criteria?

- As an extension of the last question, I see that a lot of what you call "search settings" are treated by ElasticSearch as query parameters. For instance, I can easily query ES for the title or description fields just by setting that as a parameter. In MeiliSearch that would require a change in the index settings beforehand, right?

PS: The docs, especially in the Ruby SDK, could use some work in the filters section. It took me a while to understand I should pass a string, like `index.search("query", filters: "user_id:3")`. I was trying a hash like `filters: { user_id: 3 }`.

[0] https://docs.meilisearch.com/references/ranking_rules.html#u...
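To make the string-versus-hash point in the PS concrete, here is a minimal Python sketch (not the Ruby SDK, and not an official helper). `to_filter_string` is a hypothetical name; it only assumes the `field:value` string form and the single-filter limit described in this comment, and says nothing about how values containing spaces or special characters would need to be quoted.

```python
def to_filter_string(criteria):
    """Turn a one-entry dict such as {"user_id": 3} into "user_id:3".

    Hypothetical helper: the name and behaviour are illustrative only.
    """
    if len(criteria) != 1:
        # Mirrors the "only a single filter is supported" limitation.
        raise ValueError("exactly one filter is supported")
    (field, value), = criteria.items()
    return f"{field}:{value}"


print(to_filter_string({"user_id": 3}))  # user_id:3
```

A wrapper like this would let calling code keep a hash-like interface while still sending the string the engine expects.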


Hi, many answers to these questions. But first, I'll point you to the public roadmap. A lot of the stuff we're working on is in there. If you need/love a feature, please add a heart emoji on it. https://github.com/orgs/meilisearch/projects/2

- Currently, we only support single filters. The multiple filters option is coming soon. https://github.com/meilisearch/MeiliSearch/issues/425

- Custom ranking rules on the fly are something we could imagine for our solution. We didn't do it yet because it complicates the search query parameters. We are waiting for feedback like yours to implement this kind of feature.

- Returning only the fields you need is already possible at search time: https://docs.meilisearch.com/guides/advanced_guides/search_p... As for restricting which attributes to search in at query time, we had this feature in a previous version. But, as with the previous answer, no one used it, and it complicates the search query.


Just to add a little note here, we are currently working on the functionality of multi-filter queries, because we are aware of our community!


But... what happens if I need more than one instance? I'm genuinely curious. I hope this doesn't come off as an asshole comment. Isn't the whole point of ES versus just plain ol' lucene or solr the horizontal scalability of it?


We are currently working on the sharding and the replication (Raft). Development is progressing well and the functionality should come out soon.


I agree with all your points, but a minor nitpick: Solr has been horizontally scalable for quite a while now.


> in production.

I went looking, but found nothing regarding any operations management.

* How does this scale?

* How is it monitored? Where do I get the metrics for it (indexing performance, search performance, etc.; stuff not found in the OS)?

* Is there any kind of throttling or queueing capability?

* What's the redundancy/HA approach?

* I'll ask about backups, though it's the least of my worries, as indexing databases like this and ES should be able to be rehydrated from source. However, snapshots may be faster to restore than reindexing.

This might be a nice local dev tool for something, but I'm not sure how you run a business critical application with it? I'm wondering if I'm missing something.

Edit: formatting

Edit2: also wondering about security too


Hi, to answer your questions.

* 2 parts.

- Vertical scale: We use LMDB as a key-value store, which relies on memory mapping. This lets our search engine work mainly from disk, so it does not need a machine with terabytes of RAM.

- Horizontal scale: We are working on sharding and replication (Raft). Development is progressing well, and the functionality should come out soon.

* Currently, it is not monitored at all. This feature is planned. https://github.com/meilisearch/MeiliSearch/issues/523

* We use a queue for updates. You can find here the complete guide https://docs.meilisearch.com/guides/advanced_guides/asynchro...

* As I said previously, we are working on HA with Raft consensus.

* We will add snapshots in no time (disk folder saved in S3). Backups (version agnostic, requiring reindexing) will take a little more time.

We are already working with Louis Vuitton on an application in production. The app has been in production for 9 months, and there hasn't been a single problem.


If you are looking for alternatives, check out Typesense as well:

https://github.com/typesense/typesense

It supports multiple filters and has HA for reads as well.


You can add more nodes to scale search speed with ES, can you do the same with this?


More nodes is more throughput, not lower latency.

You’re always bounded by max single shard latency AND by coordination latency.

Ignoring how expensive it would be, over-sharding and over-scaling (i.e. low volumes of data per shard and few shards per host) could reduce max single shard/host latency; however, it’ll increase coordination latency and also memory use (which directly or indirectly will cause more coordination latency).

Perfect data per shard and perfect shard per host numbers are currently an unsolved problem. They heavily depend on the domain, i.e. data types, data volume, data ingest, mappings, query types, query load.

:) if anyone has found a way to consistently add hosts to reduce latency, please let me know!
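A toy Python model of the latency argument above: a fan-out query waits for its slowest shard and then pays a coordination cost. The per-shard latencies and the linear per-shard coordination cost are made-up assumptions for illustration, not measurements of Elasticsearch.

```python
def query_latency_ms(shard_latencies_ms, coordination_ms_per_shard):
    """Fan-out query latency: the slowest shard plus coordination overhead.

    Toy model only; a linear per-shard coordination cost is an assumption.
    """
    slowest = max(shard_latencies_ms)
    coordination = coordination_ms_per_shard * len(shard_latencies_ms)
    return slowest + coordination


# Few large shards: slower shards, little coordination.
few = query_latency_ms([40, 42, 45], coordination_ms_per_shard=1)
# Many small shards: faster shards, more coordination.
many = query_latency_ms([10] * 40, coordination_ms_per_shard=1)

print(few, many)  # 48 50
```

With these invented numbers, over-sharding makes every shard faster yet the query as a whole slightly slower, which is the trade-off described above.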


That's marked for Q3: https://github.com/meilisearch/MeiliSearch/issues?q=is%3Aiss...

But probably 99% of users using ES don't need sharding.


More accurately, 99% of the use cases ES is appropriate for don't need sharding. Every time I've needed to shard, ES has been a nightmare bad enough that it was abandoned.


I had a typical case of ingesting a ton of logs into ES. I needed sharding to keep up with multi-threaded writes while something else is doing intensive search queries. I think sharding was very useful in processing a lot of data efficiently.


Yes, it's marked for Q3 because it's a pretty complex feature, and we had a lot of other features to do at the same time. But the good news is that it's very well advanced and is likely to be released in mid-Q2.


I know the project doesn't claim it, but the title somewhat implies this: I honestly don't understand people claiming ElasticSearch is hard to operate, especially not at small scales. If anything, ElasticSearch for me has been one of the easiest pieces of infrastructure to operate, for me pretty much "zero-config". Let me elaborate: You can run ElasticSearch via Docker command-line, if you want a cluster you just supply IPs of the other nodes. Then you start indexing documents with simple HTTP calls. You can add or remove nodes at any time and don't have to do anything but to start another ElasticSearch instance. If you run out of space or performance just start another node. Everything needed for management, indexing, search is available through HTTP APIs, no tools needed.

Clustered ElasticSearch has been rock-solid for me and I've used it in anger many times. The level of maintenance needed is close to zero, both initially and long-term. Compare that with the abysmal experience of setting up a sharded MongoDB cluster for example...

Please enlighten me how ElasticSearch is "a lot of work to operate" (heard that one multiple times), and what you're comparing it to.


I've been bitten by elasticsearch twice in my career, and I've seen others bitten by it as well. Once you put it in production, you can't just run it from docker on your workstation. You have to set up a cluster with enough capacity for whatever load you're going to throw at it, gracefully handle failures, updates, scaling up as load increases, etc.

There are so many switches and dials to tune, and unless you really learn it in depth, you won't know which ones you need. It's difficult to even determine what hardware requirements you have. And it's a hard sell to tell your business guys "I think elasticsearch will work better if we give it more... CPU? Memory? Disk speed? I'm not really sure." and can't provide any concrete metrics to back that up.

Another place where footguns abound is upgrading from one version to another, especially if you've got plugins installed. There are tricks that you have to learn the hard way.

At this point, I think long and hard before reaching for a solution like elasticsearch. If I've got a DBA whose entire job it is to master the tech and wield it expertly, that's one thing. But if I'm part of an early stage startup, I just can't justify the lost time and potential for catastrophe.


> Once you put it in production, you can't just run it from docker on your workstation.

But that's true for any data store. This isn't any different. Nor is an RDBMS. They all need HA/replication. And that is rarely trivial.

Honestly, I think this is why managed/hosted solutions (AWS RDS for example) are so popular - they remove a large part of the complexity for you.


Not all data stores. You can go quite far with an out of the box Redis instance or even PostgreSQL. No fiddling needed unless you are in triple digit QPS ranges.


I once wasted a whole day trying to get two instances up on GKE. Permissions problems, about ten configs for the JVM alone, many more for ElasticSearch. You would fix one, restart, wait ten minutes, browse 50 pages of logs, google for half an hour, add a config, and goto 1. Never got it going in the end.


My experience is roughly the same, unfortunately.

Personally I don't understand why there are so few search libraries/systems to choose from, given that "search" is one of the fundamental pillars of CS.


Just been bitten by the plugin issue after an apt upgrade.

First time it happened for me and I was pretty angry at it


MeiliSearch is "zero-config" compared to ElasticSearch in terms of the setup needed to get an instant, relevant end-user search engine working. Our engine follows the Algolia engine in terms of typo-tolerance, relevancy, and speed.

Here is a little comparison that may answer your questions: https://docs.meilisearch.com/resources/comparison_to_alterna....


Thanks, hadn't seen that, that makes a lot more sense. I agree that ElasticSearch is definitely not "zero-config" when it comes to building certain bespoke applications on it that go beyond simple filtering or query-relevance document search.


Maybe add Vespa [1] to the comparison?

[1] https://vespa.ai


Elasticsearch is easier than mongo in some ways and harder in others.

I run a few 10TiB ES clusters (which is not much, to be fair) but infrequently find that I have to reindex or reshard the cluster because I can’t just add another node. There’s something to be said for understanding the index rotation too, and access patterns.

It’s easy to make an ES cluster, it’s difficult to maintain one, it’s nearly impossible to debug one.

- if you consider that “it’s slow” is what you have to debug.


That's approximately how large our clusters are. Fortunately, ours are read-only, so our admin story is:

- Hey, a node died!

- Run Terraform to stand up a whole new cluster and restore it from a snapshot.

- Update the app to point at the new cluster.

- Run Terraform to delete the old cluster.

I'm pretty happy with this arrangement.


> if you consider that “it’s slow” is what you have to debug.

This is exactly it. This is a problem you encounter with every database engine, but in most of them you can quickly find the bottleneck and fix it. With elasticsearch... it's a frustrating and expensive game of trial and error.


> I run a few 10TiB ES clusters

For information, what does "10TiB" refer to in this context?

Is it the size of what ES takes in RAM, or the size of ES' index, or is it the total size of the corpus that ES must index? Or corpus size + index ?


Would be interested to hear why you can't add another node in some cases.


If you have 5 shards on an index, adding a 6th node isn’t going to improve performance.

(Contrived example for the sake of illustrating the point)
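The contrived example can be sketched in a few lines of Python. This assumes one copy of each shard, assigned round-robin, and ignores replicas, so it is a simplification rather than a description of how Elasticsearch actually allocates shards.

```python
def busy_nodes(num_shards, num_nodes):
    """Assign shards round-robin and count nodes holding at least one shard."""
    assignment = {node: [] for node in range(num_nodes)}
    for shard in range(num_shards):
        assignment[shard % num_nodes].append(shard)
    return sum(1 for shards in assignment.values() if shards)


print(busy_nodes(5, 5))  # 5
print(busy_nodes(5, 6))  # 5: the sixth node has nothing to serve
```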


We built a "Yelp for Colleges" product several years ago. The product needed a unified search where students could search for either a course or a college or a question from the forums with typeahead / autocomplete to get them to where they wanted to go quickly, with support for misspellings.

In all there were about 50k documents, and we mostly cared about the title field. Elasticsearch would randomly bloat up to occupy a huge amount of RAM. Restarting it would make it work for a few days. It would also occasionally crash.

We got rid of it and went with a Levenshtein distance-based database query.

I'd love to use it again sometime, but the experience was not good, and Googling for information brought up all kinds of very complex use cases shared by others.
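The commenter's actual query is not shown, so the following is only a sketch of the general idea: compute Levenshtein (edit) distance between the typed text and each title, and keep the close ones. It runs in memory in Python rather than inside a database, and the names `fuzzy_titles` and `max_distance` are illustrative.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def fuzzy_titles(query, titles, max_distance=2):
    """Return titles within max_distance edits of the query, closest first."""
    scored = [(levenshtein(query.lower(), t.lower()), t) for t in titles]
    return [t for d, t in sorted(scored) if d <= max_distance]


titles = ["Calculus I", "Chemistry", "Chemical Engineering"]
print(fuzzy_titles("chemistri", titles))  # ['Chemistry']
```

Comparing against every title is linear in the number of documents, which is tolerable for tens of thousands of short titles but is not how a dedicated search engine handles typos.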


Go on and try MeiliSearch, 50k documents are easily handled by the engine and with not much RAM usage.

It will take you something like 10 minutes to start and populate MeiliSearch, you will be able to test it just by going to the server HTTP url in no time!


Neat.

I implemented the student-facing course catalog web interface for a single org. One of the most fun parts was the heuristics in the query parser, like patterns for recognizing course numbers and boosting those exact-match results. It really helps you appreciate all the fit & finish that goes into proper search engines.

This was the olden days, when we just used Lucene directly.


Elasticsearch is very memory-intensive, and it's difficult to know exactly how much memory it will actually use, so you just have to throw a lot of RAM at it to avoid OOMing, then monitor it carefully, and hope your query concurrency won't accidentally blow the limits. Understanding why Elasticsearch is caving unpredictably under load is difficult, and GC pausing can be a significant performance sink.


> I honestly don't understand people claiming ElasticSearch is hard to operate, especially not at small scales.

The problem is that ES is deceptively simple to operate. As millions of people who have found things like their medical records shared with the world can attest.


I've had issues scaling writes to it. You can get around it, but maybe this would be better in a high write environment.


I wanted to mention Sonic [1] as another lightweight document indexing alternative written in Rust, when I found that MeiliSearch provides a thoughtful comparison page [2].

[1] https://github.com/valeriansaliou/sonic

[2] https://docs.meilisearch.com/resources/comparison_to_alterna...


Mostly "made in Rust", but from the github readme[0] "MeiliSearch uses LMDB as the internal key-value store. The key-value store allows us to handle updates and queries with small memory and CPU overheads."; so a lot of the credit goes to LMDB, and safety implied by "made in Rust" is not, in fact, guaranteed.

Not that I'm complaining - I love LMDB, and it's been rock solid and bug free in my experience (thanks, Howard!) - but it's low-level C, not Rust, and if you expect the certainty that Rust provides w.r.t to security, race conditions and leaks, be aware that you are not completely getting it.

But other than that: Thanks! This looks like a great project!

[0] https://github.com/meilisearch/MeiliSearch#how-it-works


True, but there are significant components in pure Rust, such as `fst` (full disclosure: I wrote it), which is written in purely safe Rust.

> and safety implied by "made in Rust" is not, in fact, guaranteed

Just about every Rust program depends on some C code, usually at least in the form of a libc. So you could lodge this criticism against almost every Rust project.

> and if you expect the certainty that Rust provides w.r.t to security, race conditions and leaks

Rust's safety story covers neither race conditions nor leaks.


> Rust's safety story covers neither race conditions nor leaks.

It covers a type of race condition, namely unsynchronized concurrent access to memory.


Data races and race conditions are orthogonal, according to some: https://blog.regehr.org/archives/490

I think you've disagreed with this in the past, and I don't know how to resolve that. But certainly, I think we can agree that saying that Rust's safety story prevents race conditions is, at minimum, very imprecise.
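A small Python illustration of the distinction being discussed (Python rather than Rust, and a common textbook-style example rather than anything taken from MeiliSearch). Every access to the balance is protected by a lock, so there is no data race, yet the check-then-update sequence is not atomic, so a race condition remains. The interleaving is written out by hand instead of using real threads so the result is repeatable.

```python
import threading


class Account:
    """Every read and write is locked, so there is no data race."""

    def __init__(self, balance):
        self._balance = balance
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._balance

    def set(self, value):
        with self._lock:
            self._balance = value


account = Account(100)

# Two withdrawals of 80, interleaved by hand to make the outcome repeatable:
# both perform the check before either performs the update.
check_a = account.get() >= 80
check_b = account.get() >= 80
if check_a:
    account.set(account.get() - 80)
if check_b:
    account.set(account.get() - 80)

print(account.get())  # -60
```

Holding one lock across the whole check-and-update would remove the race condition; locking each individual access, as here, only removes the data race.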


Oops, I wasn't trying to reopen an argument. I didn't recall us having discussed it before (still don't, but I forget things easily).

And yeah, I'd agree with your last sentence.


Ah yeah, it could have been someone else... Not sure. It was a while ago. Anyway, I don't personally have a strong opinion on definitions here. (I'm not qualified to.)


(I find this whole post ironic. We made LMDB as a standalone library so other projects could use it, of course. But for applications like this, meaning scalable full-text search, nothing comes anywhere close to OpenLDAP. Somebody else in this thread mentioned "triple-digit queries per second" as if that was a difficult achievement. OpenLDAP handles queries with complex filters at millions of queries per second. It also has a complete security model, providing fine grained access control, something none of these newer projects have even begun to think about. You guys all need to study existing tech better before starting to write your own solutions...)


> You guys all need to study existing tech better before starting to write your own solutions

that would be a first


The goal of ElasticSearch, I always thought, was that it scales horizontally and can handle the loss of multiple nodes without availability- or data-loss. It's interesting to build a single-server replacement, and this can likely work for many use-cases, but it's definitely a different approach from ElasticSearch itself.


Replication for MeiliSearch is on its way :) The main differentiator is that MeiliSearch algorithms are made for end-user search, not for complex queries. MeiliSearch focuses on site search or app search, not analytics on hyper-large datasets.


What is the size of the largest dataset that you have indexed with MeiliSearch?


We are currently working with this dataset: https://data.discogs.com/?prefix=data/2020/

It's a dataset of 107M songs: 7.6 GB of compressed files, which represents 250 GB of disk usage by MeiliSearch. We are indexing the release, song, and artist names.

We also work with a dataset of 2M cities that we can index in less than 2 minutes when the db uses 3 shards.


Is it just replication (can sustain node failures) or also sharding the data?


We are working on both replication (for high availability; we may use the Raft consensus) and distribution (sharding to scale horizontally and keep latency low).


Is there a point of contact for this work? A GitHub issue open? This is an area I'd be interested in.


You might want to talk to kero (https://github.com/Kerollmops/). He is currently working on it!


The real power of Elasticsearch for me is the ability to filter logs by:

1. exact match this nested JSON field (with support for lists of values)

2. negative match this nested JSON field (with support for lists of values)

coupled with the ability to filter by "timeframe", then pump it through to visualizations (tables/graphs) in Kibana

MeiliSearch would be cool if it spoke the API Kibana expects from Elasticsearch
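The three kinds of log filtering listed above (exact match on a nested field against a list of values, negative match, and a time window) can be sketched in plain Python. This illustrates the filtering logic only; it is not the Elasticsearch query DSL, and the record layout and function names are invented for the example.

```python
from datetime import datetime


def get_nested(record, dotted_path):
    """Follow a dotted path such as "http.status" through nested dicts."""
    value = record
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def filter_logs(records, start, end, include=None, exclude=None):
    """Keep records in [start, end) that pass the include/exclude filters.

    include/exclude map a dotted field path to a list of values.
    """
    include = include or {}
    exclude = exclude or {}
    kept = []
    for record in records:
        if not (start <= record["timestamp"] < end):
            continue
        if any(get_nested(record, p) not in vals for p, vals in include.items()):
            continue
        if any(get_nested(record, p) in vals for p, vals in exclude.items()):
            continue
        kept.append(record)
    return kept


logs = [
    {"timestamp": datetime(2020, 3, 25, 10, 0), "http": {"status": 500}, "service": {"name": "api"}},
    {"timestamp": datetime(2020, 3, 25, 10, 5), "http": {"status": 200}, "service": {"name": "api"}},
    {"timestamp": datetime(2020, 3, 25, 10, 7), "http": {"status": 502}, "service": {"name": "cron"}},
    {"timestamp": datetime(2020, 3, 26, 9, 0), "http": {"status": 500}, "service": {"name": "api"}},
]
result = filter_logs(
    logs,
    start=datetime(2020, 3, 25),
    end=datetime(2020, 3, 26),
    include={"http.status": [500, 502]},
    exclude={"service.name": ["cron"]},
)
print(len(result))  # 1
```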


If only one could set up Elasticsearch and Kibana using infrastructure-as-code (IaC). I spent several days trying and still haven't succeeded. Elasticsearch config is full of foot-guns.

There are tons of easy setup examples but they lack access control and encryption. All of my servers must write logs. When one of them gets cracked, the attacker must not be able to read all the other servers' logs and steal all the PII. An attacker can use an ARP attack to MITM server connections to Elasticsearch. Without encryption, that attack yields all the PII.

I hope Meilisearch can someday help fill this gap in the free DevOps toolset.


OK, I just looked through things a bit, but the phrase "0 config" worries me. First off, I could conceivably run ElasticSearch with 0 configuration, but then it needs to make decisions as to what types things are and how things should be analyzed, and sometimes those decisions are not what I want.

Often ElasticSearch makes a mistake in typing because the programmer has made a mistake in data format. If you fixed that mistake, your data would no longer fit the format that ElasticSearch has chosen for it. (I actually don't know if this is still a problem, because it has been years since I have run without all my fields being mapped first, but I don't see how it couldn't be.)

So theoretically, if you didn't want to go through the trouble of defining a mapping, you could just reindex all your data, fixed in such a way that ElasticSearch will choose a better type for individual fields. But why would you do this?

And I mean, what does MeiliSearch do? I wonder, because looking through this code here https://github.com/meilisearch/MeiliSearch/blob/master/meili... (and not being a Rust guy, my understanding of it is probably off), it seems like maybe it is no configuration because it expects you to follow its semantics. Which, to be fair, lots of things do; at the base level, everything has a title, description, date.

But if I have a domain with different or probably more advanced semantics, what happens?

Search engines are generally configurable because you want to add other fields and rank hits in those fields higher than other things, or maybe do a specific search that only targets those fields, like, say, brand-based search.

On preview: lots of other people with similar views, it seems. I got maybe a bit ranty just because the title sets me off; it is so wrong it even seems like lying.


Awesome, glad to see all the competition in the search space now. There are other projects like Sonic, Tantivy, Toshi and more that have more functionality if you need alternatives.

Here's a public list of search projects (in Rust, C, Go): https://gist.github.com/manigandham/58320ddb24fed654b57b4ba2...



That's a good overview of the alternatives. Nice work on this.


Are there any that fit the log searching use case, apart from Loki, which doesn't do full-text searching?


Do you mean you want full-text search against logs? In that case they all work, you just have to ingest the logs as documents in each one.

Or try Seq which is a log-focused system: https://datalust.co/seq


I think vector[0] does what you are asking for.

[0] https://github.com/timberio/vector


Vector is a log/event forwarding system. It moves data between sources and sinks with optional transforms. It doesn't store or search it.


Thanks, but I'm looking for alternatives to the ElasticSearch in that diagram.

Vector looks like a good choice to fill the index, and should be easy enough to modify to support Meili.


Hardly an "alternative to Elasticsearch", if only because the latter is scalable beyond a single machine.

This overhyped description coupled with on-by-default analytics suggests to me MeiliSearch should be dismissed regardless of potential usefulness or technical merit.


The analytics seem pretty benign.

"We send events to our Amplitude instance to be aware of the number of people who use MeiliSearch. We only send the platform on which the server runs once by day. No other information is sent. If you do not want us to send events, you can disable these analytics by using the MEILI_NO_ANALYTICS env variable."


The practice itself is malignant, either explicitly ask upon first run or require a MEILI_YES_ANALYTICS env variable to enable it.


That would require configuration. This is zero-config.


It'd still be zero-config to provide its primary function. I don't think anyone would say anything against MeiliSearch or not consider it zero-config had they decided to enable analytics off an env var rather than having analytics be sent by default.


I would like to clarify that by analytics we are only talking about 1 ping per day that sends a hash that allows us to uniquely identify a user. The privacy of the users is kept. It just serves us to know if our work is being used.


> I would like to clarify that by analytics we are only talking about 1 ping per day that sends a hash that allows us to uniquely identify a user.

This is already pretty invasive - it discloses activity, number of machines deployed, IP address which identifies location and often organizations and individuals (and which is considered protected personal data per the GDPR afaik).

> The privacy of the users is kept.

No, it isn't; see above. Even as the authors, you don't get to decide what data does or doesn't infringe upon the users' privacy.

> It just serves us to know if our work is being used.

This is irrelevant. If you want to condition the use of your work on being informed where and how it is being used, then license it accordingly and abide by the applicable laws.


Nice. This looks promising. Very clean API. I like the focus on a narrow use case.

Do you have any information on security topics like using TLS, client authentication, etc?


Currently we think this kind of security can be enabled by a simple nginx setup, which also makes auto-refresh of certificates easy (e.g. certbot). But in the future we will probably handle that in the engine itself.


I was thinking it might be nice to be able to have an HMACed token with an expiry as an option - so e.g. my main http-serving thing could provide one of those to allow the frontend to read for a bit but kick the user off after half an hour or whatever if the token isn't refreshed.

I've no issue with offloading SSL to a different process though, I tend to prefer doing that anyway a lot of the time.
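The HMAC-signed, expiring token idea above could look roughly like this in Python. This is a sketch of the general technique using the standard library, not a feature of MeiliSearch; the token layout (`user_id.expiry.signature`), the `SECRET` value, and the function names are all assumptions made for the example.

```python
import base64
import hashlib
import hmac
import time

SECRET = b"replace-me"  # hypothetical shared secret, for illustration only


def _sign(payload):
    """URL-safe base64 of the HMAC-SHA256 of the payload."""
    digest = hmac.new(SECRET, payload.encode(), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode()


def issue_token(user_id, ttl_seconds, now=None):
    """Return "user_id.expiry.signature" signed with HMAC-SHA256."""
    expiry = int(now if now is not None else time.time()) + ttl_seconds
    payload = f"{user_id}.{expiry}"
    return f"{payload}.{_sign(payload)}"


def verify_token(token, now=None):
    """Return the user id if the token is authentic and unexpired, else None."""
    try:
        user_id, expiry, signature = token.rsplit(".", 2)
        expiry_ts = int(expiry)
    except ValueError:
        return None
    expected = _sign(f"{user_id}.{expiry}")
    # Constant-time comparison to avoid leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    current = now if now is not None else time.time()
    return user_id if current < expiry_ts else None


token = issue_token("42", ttl_seconds=1800, now=1_000)
print(verify_token(token, now=1_500))  # 42
print(verify_token(token, now=3_000))  # None
```

The backend would issue a fresh token before the half hour is up; whichever component sits in front of the search engine would run the verification step.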


I understand what you mean, but is it for a specific usage of a search engine? I was thinking that these kinds of time-restricted tokens could also be managed by the nginx instance; our engine doesn't support that for the time being.


I can make the nginx instance do that.

Was just thinking for "simplest possible deployment" it would be nice to be able to have clients hit the meili instance basically directly to take maximum advantage of the speed.

Note that I'm not saying "this should be priority 1" or anything, I'm already thinking about how to configure nginx to handle the hmac crap if I try meili out :)


MeiliSearch appears to be more of an alternative to Lucene than it is to Elasticsearch. Lucene is the search engine that runs on a single instance; ES is the horizontally-scalable distribution and aggregation layer atop the instances. Absent a similar aggregation layer, MeiliSearch isn't "elastic" as the comparison implies.


Actually Lucene is the library for search that Elastic uses under the hood. Lucene does not provide any HTTP API, which Elastic does. Before using Lucene, you have to build the interface around it.

In this way MeiliSearch is comparable to ES, especially for site search and app search, working out of the box with its HTTP API.

MeiliSearch does not offer distribution yet, but it is something the team is working on :)


My concern is that by comparing it to Elasticsearch, you implicitly minimize the amount of engineering effort required to go from single-node to a distributed system. It is a non-trivial exercise that you will undoubtedly realize once you get into the dirty details.


You might be thinking of Solr, which is the server developed by the Lucene team. Lucene is used in most full-text search systems written in Java.

Also for bonus points there is a distributed version of Solr called Solr Cloud.


Looks pretty good. The single filter approach is restrictive though.

We're currently leaning towards Manticore Search[1], which is a fork of Sphinx Search[2].

[1] https://manticoresearch.com

[2] http://sphinxsearch.com



While this might be an alternative for that one specific use case (search bar), it does not feel like a viable alternative to ES. I am sure it is great at that specific case, and don't want to knock them on that. But I have never used ES for a simple search like they are. When I use ES, I want to store billions of records redundantly and search them by text, time, and/or location, and then create visualizations with the results.

When I first read the title I thought it might be a Rust-based Lucene engine or something, and thought that would be pretty cool, though I have no idea how that would work. On its own, this is a pretty nifty little tool; however, I think the framing as an ES alternative is what feels wrong to me, and apparently to others in the comments as well.


https://github.com/tantivy-search/tantivy is a Rust based Lucene-alike.


I've seen ES used for meilisearch's precise use case quite a few times before now.

So it's not "an alternative to ES in general", it's "a thing designed to be an alternative for a subset of ES use cases", and the comparison document is pretty clear about this.

https://docs.meilisearch.com/resources/comparison_to_alterna...


Seconding. Text searching is a horribly hairy problem. I know 2 businesses for which the main source of income is tuning ES/Solr to particular user needs. Starting from performance, through templating case-specific queries to custom plugins.


Wow, seeing this pop up is strange, as I was just implementing this yesterday.

It is blindingly fast and easy to set up.


> MeiliSearch can serve multiple indexes, with different kinds of documents, therefore, it is required to create the index before sending documents to it.

https://github.com/meilisearch/MeiliSearch#create-an-index-a...

Indexes are config. This is not really zero-config if you require API calls before it can receive data.

Also, there's nothing about TLS or access control. These will be required for any production deployment. At the minimum, let us specify a TLS key.pem and cert.pem file and create write-only and read-only access tokens.


How does it compare to Sonic? https://github.com/valeriansaliou/sonic


Are the documents stored on disk or only in memory?


We are using LMDB as the key/value store, so the documents are memory-mapped (usually on disk, and in memory when needed)


Does anyone know if this supports bulk indexing? My team has a lot of data in S3 in parquet format. (We could change the format to something else if that helps).

It would be really nice to be able to point tools like MeiliSearch or ElasticSearch to a data location and have them index all the data without me writing code to send individual records to the API.


This is not something that MeiliSearch supports currently, but I am working on making the engine able to index formats other than JSON; I saw great performance improvements when indexing simple CSVs.

We will probably make MeiliSearch accept different indexable formats (i.e. CSV, JSON, JSON-lines) in a future version.


looks really easy to use. will use this instead of resorting to Postgres full text search for my next app(s)


Looks promising! Are there any docs coming on a production ready setup? Reading below it looks like you're working on high availability, but even in the single machine scenario, do you have recommendations for persistence, fault tolerance etc?


I would say that you must add your own nginx (or similar) in front of our HTTP-only engine. In terms of fault tolerance, we are working on high availability.


We use Algolia and use the public API keys with search filters encoded so they can only search their data (i.e. account_id:123)

Is there anything similar here? Otherwise all the queries need to go through our servers first to ensure the filter is present.


The current API key system is a simple and temporary solution.

We will work on a more full-featured API key system, including the one you are talking about. This is on our roadmap IIRC.


Sounds more like a potential alternative to Sphinx than Elastic Search.

sphinxsearch.com/


Pretty heavy user of ES here, and one cannot compare the two products.


What is the rationale for not comparing the two?


Only one filter. Very fast limited search. Not great for anything remotely complex like searching with two conditions.


But Rust! Lol


Is there already a browser library that can talk to MeiliSearch?


Yes, there is, you can find all clients on this documentation page: https://docs.meilisearch.com/resources/sdks.html

Note that we are reworking the js library and there will probably be React integration too!


I've never used elasticsearch and only had a brief toy project with Algolia. The demo on the github repo looks awesome.

Can this run on top of my postgres database?


To make MeiliSearch expose the documents that are stored in your PostgreSQL (or any other database), you must extract them and store them in our engine using the HTTP API we provide. https://docs.meilisearch.com/references/documents.html#add-o...

For that you will also need to define the different attributes your documents are composed of.

We have thought about providing a simple tool to extract the documents from an SQL table directly into MeiliSearch.
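The extract-and-send step described above might be sketched as follows. This is not an official tool: the route `/indexes/<index_uid>/documents`, the base URL, and the index name `articles` are assumptions to be checked against the documents reference linked above, and the sample rows stand in for whatever a database driver would return. The code only builds the requests and never sends them.

```python
import json
import urllib.request


def rows_to_documents(columns, rows):
    """Pair each row tuple with column names to get JSON-ready documents."""
    return [dict(zip(columns, row)) for row in rows]


def batches(documents, batch_size):
    """Yield successive slices of at most batch_size documents."""
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]


def build_request(base_url, index_uid, batch):
    """Build (but do not send) a POST request carrying one batch as JSON."""
    # Assumed route; confirm against the documents reference linked above.
    url = f"{base_url}/indexes/{index_uid}/documents"
    body = json.dumps(batch).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


# Rows as a database driver might return them (made-up sample data).
columns = ["id", "title"]
rows = [(1, "Intro to Search"), (2, "Ranking Rules"), (3, "Typo Tolerance")]

docs = rows_to_documents(columns, rows)
requests_to_send = [
    build_request("http://localhost:7700", "articles", batch)
    for batch in batches(docs, batch_size=2)
]
print(len(requests_to_send))  # 2
```

Each request could then be passed to `urllib.request.urlopen`; since updates are queued asynchronously (as mentioned earlier in the thread), a real script would also want to check the status of each update.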


Does any tool in this category (This, Elastic, or whatever else) support something like permissions on a per document level?


Hey B! Funny seeing you here. I'm now running product at http://sajari.com

You will find that most tools provide document level permissions to some degree by storing user/group IDs on the document and adding filters to the query. However, it generally requires custom implementation work to integrate it into your systems and prevent spoofing of the filters.

Hope you're doing well!
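The approach described in this reply (store user/group IDs on each document and add a filter to every query) can be sketched in Python. The field names `allowed_users` and `allowed_groups` are hypothetical, and the substring match stands in for a real search engine; the point is only that the permission check is applied on the server, where the client cannot spoof it.

```python
def visible_to(document, user_id, group_ids):
    """A document is visible if it lists the user or one of the user's groups."""
    if user_id in document.get("allowed_users", []):
        return True
    return bool(set(group_ids) & set(document.get("allowed_groups", [])))


def search(documents, text, user_id, group_ids):
    """Naive substring search with the permission check applied server-side."""
    return [
        doc for doc in documents
        if text.lower() in doc["title"].lower()
        and visible_to(doc, user_id, group_ids)
    ]


documents = [
    {"id": 1, "title": "Payroll report", "allowed_users": ["alice"], "allowed_groups": []},
    {"id": 2, "title": "Team report", "allowed_users": [], "allowed_groups": ["eng"]},
    {"id": 3, "title": "Board report", "allowed_users": ["carol"], "allowed_groups": ["board"]},
]

hits = search(documents, "report", user_id="bob", group_ids=["eng"])
print([doc["id"] for doc in hits])  # [2]
```

The `user_id` and `group_ids` arguments must come from the authenticated session, never from the request body, which is the "prevent spoofing" part of the integration work.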


Thanks for including performance metrics right up front!



