
Replication for MeiliSearch is on its way :) The main differentiator is that MeiliSearch's algorithms are made for end-user search, not for complex queries. MeiliSearch focuses on site search or app search, not analytics on hyper-large datasets.



What is the size of the largest dataset that you have indexed with MeiliSearch?


We are currently working with this dataset: https://data.discogs.com/?prefix=data/2020/

It's a dataset of 107M songs: 7.6 GB of compressed files, which represents 250 GB of disk usage by MeiliSearch. We are indexing the release, song, and artist names.
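A dataset that size is typically pushed to the search engine in batches rather than in one request. A minimal sketch of the batching side, assuming hypothetical field names shaped like the Discogs fields mentioned above (the index name and server URL are assumptions, not MeiliSearch specifics):

```python
import json

def batches(docs, size=10_000):
    """Split a large document list into fixed-size batches for indexing."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

# Hypothetical sample records with the fields mentioned above.
songs = [{"id": n, "release": f"r{n}", "song": f"s{n}", "artist": f"a{n}"}
         for n in range(25_000)]

# Serialize each batch; in practice each payload would be POSTed to the
# engine's document-addition endpoint, e.g. http://127.0.0.1:7700/indexes/songs/documents
payloads = [json.dumps(b) for b in batches(songs)]
```

Smaller batches trade indexing throughput for lower peak memory on both client and server.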

We also work with a dataset of 2M cities that we can index in under 2 minutes when the database uses 3 shards.


Is it just replication (so it can sustain node failures), or does it also shard the data?


We are working on both: replication (for high availability; we may use the Raft consensus algorithm) and distribution (sharding to scale horizontally while keeping latency low).
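Hash-based routing is one common way to implement that kind of sharding: each node hashes a document's primary key to decide which shard owns it, so no coordinator lookup is needed. A sketch of the idea under that assumption (this is not MeiliSearch's actual design):

```python
import hashlib

def shard_for(doc_id: str, n_shards: int) -> int:
    """Route a document to a shard by hashing its primary key.

    Uses SHA-1 so the placement is stable across processes and machines
    (unlike Python's built-in hash(), which is salted per process).
    """
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

# Every node computes the same placement for the same key.
placement = {doc: shard_for(doc, 3) for doc in ["paris", "tokyo", "lima"]}
```

One known trade-off: plain modulo hashing reshuffles most keys when the shard count changes, which is why real systems often layer consistent hashing on top.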


Is there a point of contact for this work? Is there an open GitHub issue? This is an area I'd be interested in.


You might want to talk to kero (https://github.com/Kerollmops/). He is currently working on it!



