Hacker News: snikolaev's comments


Meilisearch doesn't support BM25, does it?


Nope, it doesn't. It's based on Cascade Ranking, also called [bucket sorting][1]. We've also released a new hybrid search ranking system that combines the best full-text search results (our Cascade Ranking) with semantic results (from arroy, our pure-Rust vector store). You can try it at https://wheretowatch.meilisearch.com.

[1]: https://en.wikipedia.org/wiki/Bucket_sort
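A rough illustration of the cascade (bucket-sort) idea, as opposed to a single BM25-style score: documents are first split into buckets by the first ranking rule, each bucket is split again by the next rule, and so on, so a later rule only breaks ties left by earlier ones. This is a minimal sketch, not Meilisearch's implementation; the `Doc` fields and the three rules are hypothetical simplifications loosely named after Meilisearch's words/typo/proximity rules.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, List

@dataclass
class Doc:
    id: str
    words_matched: int  # hypothetical: number of query words found in the document
    typos: int          # hypothetical: typos needed to match
    proximity: int      # hypothetical: summed distance between matched words

# Each rule maps a document to a key; a lower key means a better bucket.
Rule = Callable[[Doc], int]

RULES: List[Rule] = [
    lambda d: -d.words_matched,  # more matched words first
    lambda d: d.typos,           # then fewer typos
    lambda d: d.proximity,       # then matched words closer together
]

def cascade_rank(docs: List[Doc], rules: List[Rule]) -> List[Doc]:
    """Bucket documents rule by rule instead of summing one combined score."""
    if not rules or len(docs) <= 1:
        return list(docs)
    first, rest = rules[0], rules[1:]
    ranked: List[Doc] = []
    # Sort by the current rule, then walk the buckets of equal key in order.
    for _, bucket in groupby(sorted(docs, key=first), key=first):
        # Only ties inside this bucket are resolved by the remaining rules.
        ranked.extend(cascade_rank(list(bucket), rest))
    return ranked

docs = [
    Doc("a", words_matched=2, typos=1, proximity=3),
    Doc("b", words_matched=3, typos=2, proximity=9),
    Doc("c", words_matched=2, typos=0, proximity=8),
    Doc("d", words_matched=2, typos=0, proximity=1),
]
print([d.id for d in cascade_rank(docs, RULES)])  # ['b', 'd', 'c', 'a']
```

Note that "b" wins despite having the most typos and the worst proximity, because the first rule already separates it. The recursive form gives the same order as sorting by the tuple of all rule keys; it is written this way only to mirror the bucket description.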


Check out Manticore Search for your use case. It's open-source, cost-effective, and doesn't require keeping everything in memory.

Key points:

- Columnar Storage: Efficiently handles large datasets on disk, ideal for terabyte-scale data. It's not enabled by default but can be set up easily with "CREATE TABLE ... ENGINE='columnar'".

- Faceted Search: Probably easier than anywhere else: just add "FACET <field name>" to your "SELECT" query.

- MySQL Protocol and SQL Support: If you're familiar with SQL and MySQL, getting started is easier than with other search engines.
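To show what a facet adds to a query: alongside the matching rows, it returns each distinct value of a field among those rows together with a count. The pure-Python sketch below mimics that extra result set on in-memory rows. The `products` rows and field names are hypothetical, the substring-free word match is a naive stand-in for full-text matching, and the SQL in the comment only illustrates the shape of such a Manticore query; it is not output from a real server.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical rows; in Manticore the equivalent query might look like
#   SELECT * FROM products WHERE MATCH('phone') FACET brand;
products: List[Dict[str, object]] = [
    {"id": 1, "title": "budget phone", "brand": "acme"},
    {"id": 2, "title": "phone case", "brand": "globex"},
    {"id": 3, "title": "flagship phone", "brand": "acme"},
    {"id": 4, "title": "laptop stand", "brand": "initech"},
]

def search(rows: List[Dict[str, object]], term: str) -> List[Dict[str, object]]:
    """Naive full-text stand-in: keep rows whose title contains `term` as a word."""
    return [r for r in rows if term in str(r["title"]).split()]

def facet(rows: List[Dict[str, object]], field: str) -> List[Tuple[object, int]]:
    """Count each distinct value of `field` among the matched rows, most common first."""
    return Counter(r[field] for r in rows).most_common()

hits = search(products, "phone")
print([r["id"] for r in hits])  # [1, 2, 3]
print(facet(hits, "brand"))     # [('acme', 2), ('globex', 1)]
```

The facet is computed over the matched rows only, which is why "initech" does not appear. The "most common first" ordering is a choice made for this sketch; Manticore's FACET clause has its own ordering and limit options, documented in its manual.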


Thanks for your recommendation. I ended up going with Quickwit, since it lets me store data on S3.



> We filtered out PISA Search and Manticore because neither of them offers search-as-you-type and facet search features

Manticore does support faceted search, and it's quite powerful:

- docs - https://manual.manticoresearch.com/Searching/Faceted_search#...

- interactive course - https://play.manticoresearch.com/faceting/

Search-as-you-type depends more on the client than on the backend. That said, Manticore provides autocomplete and fuzzy search features (both still in beta). More info here: https://github.com/manticoresoftware/manticoresearch/issues/...
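On the backend side, the core of search-as-you-type is completing the last, partially typed word. The sketch below does prefix completion over a sorted vocabulary with `bisect`. It is a generic illustration, not how Manticore's autocomplete is implemented, and the `vocabulary` list is hypothetical.

```python
from bisect import bisect_left
from typing import List

def complete(sorted_vocab: List[str], prefix: str, limit: int = 5) -> List[str]:
    """Return up to `limit` words from a sorted vocabulary that start with `prefix`."""
    start = bisect_left(sorted_vocab, prefix)  # index of the first word >= prefix
    out: List[str] = []
    for word in sorted_vocab[start:]:
        # Words sharing the prefix are contiguous in sorted order, so stop at the first miss.
        if not word.startswith(prefix) or len(out) == limit:
            break
        out.append(word)
    return out

def suggest(sorted_vocab: List[str], typed: str, limit: int = 5) -> List[str]:
    """Complete only the last word of what the user has typed so far."""
    head, _, last = typed.rpartition(" ")
    return [(head + " " + w).strip() for w in complete(sorted_vocab, last, limit)]

vocabulary = sorted(["manticore", "match", "mysql", "facet", "fuzzy", "full-text", "search"])
print(suggest(vocabulary, "fast ma"))  # ['fast manticore', 'fast match']
```

Because all words with a given prefix sit next to each other in a sorted list, one binary search plus a short scan is enough per keystroke; the client then only decides how often to send requests.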


Manticore Search


MySQL's full-text ranking capabilities are quite limited, and AFAIK full-text search hasn't been a priority for them lately. A related article is "Rankings with InnoDB Full-Text Search" [1].
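For context on what "limited" means: the MySQL reference manual describes InnoDB's natural-language relevance as a TF-IDF variant where, for each query word, IDF = log10(total_records / matching_records) and the word adds TF × IDF × IDF to the document's rank. The sketch below reproduces that documented formula on a toy corpus. It assumes naive whitespace tokenization and ignores stopwords and minimum token length, so it approximates the formula rather than reproducing InnoDB's exact scores.

```python
import math
from typing import List

def innodb_style_rank(docs: List[str], doc: str, query: str) -> float:
    """Score `doc` for `query` with the TF * IDF * IDF formula described in the MySQL docs.

    Simplified: lower-cased whitespace tokens, no stopwords, no minimum token length.
    """
    total = len(docs)
    tokens = doc.lower().split()
    rank = 0.0
    for word in set(query.lower().split()):
        matching = sum(1 for d in docs if word in d.lower().split())
        if matching == 0:
            continue  # word occurs in no document: contributes nothing
        idf = math.log10(total / matching)
        tf = tokens.count(word)  # raw term frequency, no saturation
        rank += tf * idf * idf
    return rank

corpus = [
    "mysql full text search",
    "search engine search ranking",
    "columnar storage",
    "faceted navigation",
]
for d in corpus:
    print(f"{innodb_style_rank(corpus, d, 'search'):.4f}  {d}")
```

As written, the formula has no term-frequency saturation and no document-length normalization (both of which BM25 adds), and a word present in every row gets IDF = log10(1) = 0, so it contributes nothing to the rank.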

If it works for you, great. If you need more flexibility in data tokenization, matching, and ranking, you could consider Manticore Search [2] instead of Elasticsearch. It's a continuation (a fork made in 2017) of the Sphinx search engine mentioned in the article on mysql.com, and it integrates with MySQL better than Elasticsearch does (e.g., you can query Manticore with the Linux mysql client or any programming language's MySQL connector).

[1] https://dev.mysql.com/blog-archive/rankings-with-innodb-full...

[2] https://github.com/manticoresoftware/manticoresearch
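Because Manticore speaks the MySQL protocol, an ordinary MySQL driver can send it SQL. A minimal sketch, assuming a local Manticore instance on its default MySQL-protocol port 9306, the PyMySQL driver, and a hypothetical `articles` table; the search helper only needs a DB-API cursor, so PyMySQL is imported lazily and nothing connects at import time.

```python
from typing import Any, List, Tuple

def connect(host: str = "127.0.0.1", port: int = 9306):
    """Open a MySQL-protocol connection to Manticore (assumes PyMySQL is installed)."""
    import pymysql  # imported lazily so the helper below works without the driver
    return pymysql.connect(host=host, port=port)

def full_text_search(cursor, table: str, query: str, limit: int = 10) -> List[Tuple[Any, ...]]:
    """Run a full-text query through any DB-API cursor that uses %s placeholders."""
    # Table names cannot be bound as parameters, so accept plain identifiers only.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    cursor.execute(
        f"SELECT id, WEIGHT() AS score FROM {table} WHERE MATCH(%s) LIMIT %s",
        (query, limit),  # the driver escapes and quotes these values
    )
    return list(cursor.fetchall())

# Usage against a running server (hypothetical table name):
# conn = connect()
# try:
#     with conn.cursor() as cur:
#         print(full_text_search(cur, "articles", "mysql integration"))
# finally:
#     conn.close()
```

Passing the cursor in, rather than creating it inside the helper, keeps the query logic independent of which MySQL connector is used.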


> I decided to experiment with this setup and the NY Taxi Dataset. The initial goal was to populate ElasticSearch with ~14 million rows, loading data from a compressed parquet file of ~350 MB.

> I tried multiple times, but the operation failed continuously, due to JVM memory constraints

Here's a script https://github.com/db-benchmarks/db-benchmarks/blob/main/tes... which loads 1.7B NYC taxi ride documents into Elasticsearch.
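One common way to keep memory bounded when loading a large dataset into Elasticsearch is to stream it in fixed-size batches through the `_bulk` API instead of building one huge request. A minimal sketch of the batching step, assuming documents arrive as an iterator of dicts (for example, read incrementally from the Parquet file). The index name `taxi_rides` and the batch size are hypothetical, and this is not necessarily how the linked script works.

```python
import json
from itertools import islice
from typing import Dict, Iterable, Iterator

def bulk_bodies(docs: Iterable[Dict], index: str, batch_size: int = 5000) -> Iterator[str]:
    """Yield newline-delimited `_bulk` request bodies of at most `batch_size` documents."""
    it = iter(docs)
    while True:
        batch = list(islice(it, batch_size))  # only one batch is held in memory
        if not batch:
            return
        lines = []
        for doc in batch:
            lines.append(json.dumps({"index": {"_index": index}}))  # action line
            lines.append(json.dumps(doc))                            # document source line
        # The bulk API expects the body to end with a newline.
        yield "\n".join(lines) + "\n"

rides = ({"trip_distance": i * 0.5, "passenger_count": 1} for i in range(12))
bodies = list(bulk_bodies(rides, "taxi_rides", batch_size=5))
print(len(bodies))  # 3 requests: 5 + 5 + 2 documents
```

Each body would be sent as `POST /_bulk` with `Content-Type: application/x-ndjson`. The official Python client's `helpers.streaming_bulk` performs similar chunking, so in practice you may not need to build the bodies yourself.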


> Meilisearch focuses on simplicity, relevancy, and performance.

> excellent relevance out of the box

> if ease of use, performance, and relevancy are important to you, Meilisearch was made for you

Is there a benchmark that shows Meilisearch outperforming Elasticsearch in terms of relevance score? I couldn't find Meilisearch listed on https://github.com/beir-cellar/beir.
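For reference, BEIR's headline metric is nDCG@10, which rewards placing highly relevant documents near the top of the first ten results. A minimal sketch of the computation with linear gains (some implementations use 2^rel − 1 instead); the `qrels` judgments and ranked lists below are made up for illustration.

```python
import math
from typing import Dict, List

def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain: the gain at 1-based rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

def ndcg_at_k(ranked_ids: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    """nDCG@k for one query; returns 0.0 when the query has no relevant documents."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids[:k]]  # unjudged docs count as 0
    ideal = sorted(qrels.values(), reverse=True)[:k]             # best possible ordering
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0

qrels = {"d1": 2, "d2": 1, "d3": 0}  # hypothetical graded judgments for one query
print(ndcg_at_k(["d1", "d2", "d9"], qrels))            # 1.0 (ideal order)
print(round(ndcg_at_k(["d9", "d2", "d1"], qrels), 4))  # 0.6199
```

A benchmark score is then the mean of this value over all queries in a dataset, which is what makes results comparable across engines.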


We are not in contact with BEIR or the owner of the beir-cellar organisation.

However, we've started tracking our relevancy with the TREC 4 and TREC 5 data provided by NIST [1]. For now I can only say that the results are very good and that we keep improving them. We'll cover this in a blog post.

[1]: https://nist.gov


It hasn't been open source since 2017. The open-source fork is https://github.com/manticoresoftware/manticoresearch

