
Replication for MeiliSearch is on its way :) The main differentiator is that MeiliSearch's algorithms are made for end-user search, not for complex queries. MeiliSearch focuses on site search or app search, not analytics on hyper-large datasets.



What is the size of the largest dataset that you have indexed with MeiliSearch?


We are currently working with this dataset: https://data.discogs.com/?prefix=data/2020/

It's a dataset of 107M songs: 7.6 GB of compressed files, which represents 250 GB of disk usage by MeiliSearch. We are indexing the release, song, and artist names.
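A dataset that size is typically pushed to the search engine in batches rather than in one request. A minimal sketch of the batching side, assuming hypothetical field names shaped like the Discogs fields mentioned above (the index name and server URL are assumptions, not MeiliSearch specifics):

```python
import json

def batches(docs, size=10_000):
    """Split a large document list into fixed-size batches for indexing."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

# Hypothetical sample records with the fields mentioned above.
songs = [{"id": n, "release": f"r{n}", "song": f"s{n}", "artist": f"a{n}"}
         for n in range(25_000)]

# Serialize each batch; in practice each payload would be POSTed to the
# engine's document-addition endpoint, e.g. http://127.0.0.1:7700/indexes/songs/documents
payloads = [json.dumps(b) for b in batches(songs)]
```

Smaller batches trade indexing throughput for lower peak memory on both client and server.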

We also work with a dataset of 2M cities that we can index in under 2 minutes when the database uses 3 shards.


Is it just replication (so it can sustain node failures), or does it also shard the data?


We are working on both: replication (for high availability; we may use the Raft consensus algorithm) and distribution (sharding to scale horizontally while keeping latency low).
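Hash-based routing is one common way to implement that kind of sharding: each node hashes a document's primary key to decide which shard owns it, so no coordinator lookup is needed. A sketch of the idea under that assumption (this is not MeiliSearch's actual design):

```python
import hashlib

def shard_for(doc_id: str, n_shards: int) -> int:
    """Route a document to a shard by hashing its primary key.

    Uses SHA-1 so the placement is stable across processes and machines
    (unlike Python's built-in hash(), which is salted per process).
    """
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

# Every node computes the same placement for the same key.
placement = {doc: shard_for(doc, 3) for doc in ["paris", "tokyo", "lima"]}
```

One known trade-off: plain modulo hashing reshuffles most keys when the shard count changes, which is why real systems often layer consistent hashing on top.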


Is there a point of contact for this work? Is there an open GitHub issue? This is an area I'd be interested in.


You might want to talk to kero (https://github.com/Kerollmops/). He is currently working on it!



