Algolia Acquires Search.io

jabo · on Sept 15, 2022

Note: I work on Typesense [1], an open source alternative to Algolia.

This is an interesting acquisition from my perspective because we also just started working on adding vector search to Typesense about a month ago.

So you can now do nearest neighbor searches by bringing your vectors into Typesense. This lets you do things like similarity searches, recommendations, etc.
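
"Bringing your vectors" means computing a vector (embedding) for each document with a model of your choice and storing it alongside the document. The engine then ranks documents by how close their vectors are to the query's vector. Below is a minimal brute-force sketch in plain Python. It is not Typesense's API. The document IDs and 3-dimensional vectors are made up, and a real engine would use an approximate index rather than scoring every document.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, docs, k=2):
    """Exact k-nearest-neighbor search: score every stored vector, keep the top k."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in docs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical vectors, as if produced by some external embedding model.
docs = {
    "red-shoes": [0.9, 0.1, 0.0],
    "crimson-sneakers": [0.8, 0.2, 0.1],
    "blue-jacket": [0.0, 0.3, 0.9],
}
print(nearest_neighbors([0.85, 0.15, 0.05], docs, k=2))  # → ['red-shoes', 'crimson-sneakers']
```

Cosine similarity is used here because it ignores vector length. Dot product and Euclidean distance are also common choices.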

I’d love to have more beta testers use the feature and give us feedback. If you’d like to try it out, please send me an email: jasonb at typesense dOt org

In any case, congratulations Search.io / Sajari team!

[1] https://typesense.org/

theallan · on Sept 15, 2022

Just wanted to say thank you for Typesense. I use it for my own open source software [1] and integration with it was really easy.

More generally, I think it is great to see development in this area, from Algolia and Search.io to Typesense and others. Being able to have a customisable search that is really fast can make a big difference on a website.

[1] https://datatables.net

jabo · on Sept 15, 2022

Thank you for saying that and thank you for building DataTables!

I've heard about DataTables in various contexts over the years, so it's really cool to hear that you've integrated it with Typesense.

slivanes · on Sept 15, 2022

Thank you for building DataTables - very useful web software.

flurly · on Sept 15, 2022

Typesense is so good (and affordable!). I'm so thankful y'all are providing an alternative to Algolia.

It feels similar to Render vs Vercel where clearly players like Vercel & Algolia make the big bucks from enterprise clients and thus make their services less accessible to small companies like mine.

Quick question with regard to vector search: do y'all intend to expose some basic embedding service on your platform? I think it'd be pretty powerful to add a basic word2vec embedding model, so that users who want to play around with vector search can simply send some text and Typesense would do the rest (convert the text to an embedding, index the embedding, etc.).
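
The pipeline described above (text in, embedding out, then index and search) can be sketched as follows. This is only an illustration under stated assumptions. `WORD_VECTORS` is a hypothetical toy table standing in for a trained word2vec model, and averaging word vectors is just one simple way to embed a whole string. It is not how Typesense does, or would, do it.

```python
import math

# Hypothetical toy word vectors standing in for a trained word2vec model.
WORD_VECTORS = {
    "car": [0.9, 0.1],
    "automobile": [0.85, 0.15],
    "vehicle": [0.8, 0.2],
    "banana": [0.05, 0.95],
    "fruit": [0.1, 0.9],
}


def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# "Send some text": embed each document at index time and the query at search time.
index = {doc: embed(doc) for doc in ["used car for sale", "fresh banana fruit"]}
query_vec = embed("automobile")
best = max(index, key=lambda doc: cosine(index[doc], query_vec))
print(best)  # → used car for sale
```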

karterk · on Sept 15, 2022

Yes, we plan to do that, but we will start off with first supporting raw vector data type and search on that.

slig · on Sept 15, 2022

Thank you very much for working on vector search. Looking forward to trying it out as soon as it's stable. Thanks again!

bilalq · on Sept 15, 2022

We're currently A/B testing Typesense and Algolia, but the difference in pricing model alone almost makes me want to skip the whole process and just go with Typesense. Algolia's per-search pricing model is a little ridiculous for people building UIs with live search.

Beefin · on Sept 16, 2022

Awesome, I'll add it to the comparison folder in this repo: https://vectorsearch.dev

ashish01 · on Sept 15, 2022

Any books or papers you would recommend that cover the basics of search engine implementation? I mean practicalities like Elias-Fano encoding and WAND searching.
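
Elias-Fano compresses a sorted list of integers, such as the document IDs in a posting list. It splits each value into low bits, which are stored verbatim, and high bits, which are stored in unary in a bitvector. This takes roughly 2 + log2(u/n) bits per element for n values below a universe u. The sketch below keeps the bits in plain Python lists for readability. Real implementations pack them into machine words and add select/skip structures for fast random access.

```python
def elias_fano_encode(values, universe):
    """Encode a non-decreasing list of ints in [0, universe) as (low_bits, lows, highs)."""
    n = len(values)
    # Number of low bits per value: floor(log2(universe / n)), at least 0.
    low_bits = max(0, (universe // n).bit_length() - 1) if n else 0
    mask = (1 << low_bits) - 1
    lows = [v & mask for v in values]
    # High parts go into a bitvector in unary: value i sets bit (v >> low_bits) + i.
    highs = [0] * (n + (universe >> low_bits) + 1)
    for i, v in enumerate(values):
        highs[(v >> low_bits) + i] = 1
    return low_bits, lows, highs


def elias_fano_decode(low_bits, lows, highs):
    """Recover the original list by scanning the set bits of the high bitvector."""
    values = []
    for pos, bit in enumerate(highs):
        if bit:
            i = len(values)
            high = pos - i  # zeros before the i-th set bit = high part of value i
            values.append((high << low_bits) | lows[i])
    return values


doc_ids = [3, 4, 7, 13, 14, 15, 21, 43]  # sorted doc IDs below universe 44
low_bits, lows, highs = elias_fano_encode(doc_ids, universe=44)
print(low_bits, len(lows) * low_bits + len(highs))  # → 2 36
assert elias_fano_decode(low_bits, lows, highs) == doc_ids
```

Storing the same eight IDs at a fixed 6 bits each would take 48 bits, compared with the 36 bits used here.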

flatdog · on Sept 18, 2022

Out of all possible compression codecs and search algorithms, why did you ask about EF and WAND? Have you read about these elsewhere?
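
WAND ("weak AND"), which the question above also asks about, speeds up top-k retrieval over an inverted index. Each term keeps an upper bound on the score it can contribute. Any document whose summed upper bounds cannot beat the current k-th best score is skipped without being scored. The sketch below is a simplified, single-process version. It assumes a document's score is the sum of its per-term scores, and the posting lists and scores are made up.

```python
import heapq
from bisect import bisect_left


def wand_top_k(postings, k):
    """Top-k retrieval (k >= 1) with WAND-style pivoting over doc-ID-sorted posting lists.

    postings: term -> list of (doc_id, score) sorted by doc_id, scores > 0.
    A document's total score is the sum of its per-term scores.
    """
    upper = {t: max(s for _, s in plist) for t, plist in postings.items() if plist}
    cursors = {t: 0 for t in upper}
    heap = []  # min-heap of (score, doc_id): the current top k

    def doc_at(t):
        return postings[t][cursors[t]][0]

    while True:
        active = sorted((t for t in cursors if cursors[t] < len(postings[t])), key=doc_at)
        if not active:
            break
        threshold = heap[0][0] if len(heap) == k else 0.0
        # Pivot: first term (in doc order) where summed upper bounds exceed the threshold.
        acc, pivot = 0.0, None
        for i, t in enumerate(active):
            acc += upper[t]
            if acc > threshold:
                pivot = i
                break
        if pivot is None:
            break  # no remaining document can enter the top k
        pivot_doc = doc_at(active[pivot])
        if doc_at(active[0]) == pivot_doc:
            # All terms up to the pivot sit on pivot_doc, so score it fully.
            matching = [t for t in active if doc_at(t) == pivot_doc]
            score = sum(postings[t][cursors[t]][1] for t in matching)
            for t in matching:
                cursors[t] += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, pivot_doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, pivot_doc))
        else:
            # Documents before pivot_doc cannot beat the threshold: skip ahead.
            t = active[0]
            cursors[t] = bisect_left(postings[t], (pivot_doc,), lo=cursors[t])
    return sorted(heap, reverse=True)


# Made-up posting lists: term -> [(doc_id, score), ...]
postings = {
    "vector": [(1, 1.2), (3, 0.4), (7, 2.0)],
    "search": [(1, 0.8), (2, 1.5), (7, 0.3), (9, 0.9)],
    "engine": [(3, 0.7), (9, 1.0)],
}
print([doc for _, doc in wand_top_k(postings, k=2)])  # → [7, 1]
```

In this run, document 2 is never scored when k=1, because its only term ("search") cannot reach the threshold on its own.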

swyx · on Sept 16, 2022

Love Typesense, but just some thoughts on the messaging: most of us have no idea what it means to "bring your vectors". I don't have any vectors to bring, so when you launch this feature, please do a little more handholding than usual.

Good luck!

jabo · on Sept 16, 2022

Thank you for the feedback! Will make sure we have good documentation around this.

swyx · on Sept 16, 2022

Thank you for working on it! FWIW, I listened to your latest Changelog episode and it was good, but I also left with the exact same question: what the hell does it mean to "bring my own vectors"? So, handholding needed.

swyx · on Sept 15, 2022

Some interesting points from this other piece:

"While the acquisition price was undisclosed, media reports suggest Algolia paid more than $100 million for Search.io"

https://www.businessnewsaustralia.com/articles/french-unicor...

"Search.io’s mission is to “enable every organization to build smart search and discovery solutions.” The company was founded in 2014 by Hamish Ogilvy and David Howden (originally named Sajari, and recently rebranded to Search.io)."

Contra the business news article: "Search.io was founded in 2020 by Hamish Ogilvy, who will remain with the merged company in the new role of vice president of artificial intelligence."

---

Alright, can some non-marketing person explain, with practical use cases, why this "hybrid search" is so disruptive? I feel like the article is trying really hard to communicate how big a deal it is, but it falls flat for me because I only have pedestrian search knowledge.

dustincoates · on Sept 15, 2022

Vector search is incredibly powerful at matching on context or similarity. For example, "automobile" and "car" are semantically similar, and one will rank well for the other in a search.

Vector search, though, isn't as good at handling typos, and it is not good at all for as-you-type searching. "Vehic" won't match on "auto", for example.
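
For contrast, here is roughly how a keyword engine copes with the two cases above. It uses prefix matching for as-you-type input and a small edit-distance budget for typos. This is a minimal sketch with a made-up vocabulary, not Algolia's implementation. Real engines also allow typos inside prefixes and use tries or automata instead of comparing the query against every word. Neither rule connects "vehic" to "auto", and that lexical gap is what the semantic side covers.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def keyword_matches(query, vocabulary, max_typos=1):
    """Words that start with the query (as-you-type) or are within max_typos edits of it."""
    q = query.lower()
    return sorted(w for w in vocabulary if w.startswith(q) or levenshtein(q, w) <= max_typos)


vocab = ["vehicle", "vehicles", "auto", "automobile", "car"]
print(keyword_matches("vehic", vocab))    # → ['vehicle', 'vehicles']
print(keyword_matches("vehicke", vocab))  # → ['vehicle']
```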

We believe there is a use for each of these approaches, and a use for combining them in a single search, rather than choosing which one to use ahead of time or through after-the-fact heuristics.
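
One common, publicly documented way to merge a keyword ranking and a vector ranking into a single result list is reciprocal rank fusion (RRF). Each document earns 1 / (k + rank) from every list it appears in. This is a swapped-in illustration of the general idea, not a description of how Algolia or Search.io combine the two. The result lists are made up, and k = 60 is the constant commonly used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of doc IDs: each list adds 1 / (k + rank) to a document's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Made-up results for the query "car".
keyword_results = ["car-wax", "car-seat", "car-rental"]           # lexical matches
vector_results = ["car-rental", "automobile-rental", "car-seat"]  # semantic neighbors
print(reciprocal_rank_fusion([keyword_results, vector_results]))
# → ['car-rental', 'car-seat', 'car-wax', 'automobile-rental']
```

Documents found by both retrievers rise to the top. A purely semantic hit like "automobile-rental" still makes it into the merged list, even though it contains no matching keyword.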

(I'm a Principal PM for Semantic Search and Search Ranking at Algolia.)

marcinzm · on Sept 15, 2022

Aren't typos just a question of how you generate your vectors/embeddings? I'd be surprised if a transformer with a character-level tokenizer, trained on a representative source of data (i.e., with typos), couldn't make sense of typos.

evrydayhustling · on Sept 15, 2022

Can confirm. We use sentence-level transformer embeddings for (vector) search, clustering, and classification tasks. As an old-school ML guy, I've been amazed at how robust they are to typos, slang, punctuation, etc.

However, I'm sure there are still applications where you don't have access to a robust embedding for your domain but can apply other techniques to deal with that domain's noise.

O__________O · on Sept 15, 2022

Here is a decent intro to sentence-level transformers and embeddings:

https://www.pinecone.io/learn/sentence-embeddings/

dustincoates · on Sept 15, 2022

Yes, good point. I still believe that net-net you're going to get better results on typos with a keyword-based search, but I didn't mean to imply that vector searching won't handle typos at all.

dnc · on Sept 15, 2022

> Vector search, though, isn't as good on handling typos and not good at all when it comes to as you type searching. Vehic won't match on auto, for example.

This is incorrect in the general case; it depends entirely on the model used to produce the word vectors and on the text corpus that model is trained on.

For instance, the fastText model is trained not only on words but also on their parts (character n-grams). It should therefore produce word vectors that are close (in cosine distance) to the vectors of their corresponding typos and partials, even if the training corpus doesn't contain those same typos and partially typed words verbatim.
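
The subword effect described above can be seen without training anything. fastText represents a word by its character n-grams, by default of lengths 3 to 6 with "<" and ">" marking the word boundaries, and combines learned vectors for them. The sketch below skips the learning and just compares bags of n-grams. That is enough to show why a typo or a prefix stays close to the full word. It is an illustration, not fastText itself.

```python
import math
from collections import Counter


def char_ngrams(word, min_n=3, max_n=6):
    """fastText-style subwords: wrap the word in '<' and '>', then take all n-grams."""
    w = f"<{word}>"
    return Counter(w[i:i + n] for n in range(min_n, max_n + 1) for i in range(len(w) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    return dot / (math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values())))


target = char_ngrams("vehicle")
for other in ["vehicel", "vehic", "banana"]:
    print(other, round(cosine(target, char_ngrams(other)), 2))
# → vehicel 0.45
# → vehic 0.57
# → banana 0.0
```

Overlap alone does not link "vehic" to "auto"; that connection has to come from training on a corpus.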

fzliu · on Sept 15, 2022

I'd like to add that vector search works not just for natural language but also for a variety of other types of unstructured data: images, video, user profiles, and pretty much anything else that can be vectorized. Here's an example of image search: https://milvus.io/docs/image_similarity_search.md

tomrod · on Sept 15, 2022

Do y'all have a technical blog? I would love to understand both the problem and the methods, the domains y'all cross (e.g., biometrics and fuzzy matching?), and how y'all integrate in different industries.

A good search partner is hard to find. PageRank is fun and all, but I believe better methods exist these days.

marcinzm · on Sept 15, 2022

There are two related problems here: finding relevant results and ranking those results. The first is historically done with massive inverted indexes. PageRank is for the second one: ranking those relevant results.
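
The first step can be sketched with a toy inverted index. It maps each term to the documents containing it, then intersects the posting lists of the query terms. This is a minimal illustration with made-up documents. Real indexes also store positions and term frequencies and compress their posting lists, and a separate ranking step then orders the candidates.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def retrieve(index, query):
    """Candidate generation: IDs of documents that contain every query term."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


docs = {1: "cheap used car", 2: "used car parts", 3: "new bike"}
index = build_inverted_index(docs)
print(retrieve(index, "used car"))  # → [1, 2]
```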

For the first part, you can look into "embeddings" and "approximate nearest neighbor lookup" for the modern approaches. That said, inverted indexes are still very popular.
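
Random-hyperplane hashing (SimHash-style LSH) is one concrete example of approximate nearest-neighbor lookup. It buckets vectors by which side of a few random hyperplanes they fall on, so a query is only compared against vectors in its own bucket. This is a swapped-in illustration with made-up parameters. Many production systems use graph-based indexes such as HNSW instead, and a single hash table like this one will miss some true neighbors.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class RandomHyperplaneLSH:
    """Approximate nearest neighbors for cosine similarity via random-hyperplane buckets."""

    def __init__(self, dim, n_planes=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets = defaultdict(list)  # signature -> doc IDs
        self.vectors = {}

    def _signature(self, vec):
        # One bit per hyperplane: which side of that plane the vector lies on.
        return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in self.planes)

    def add(self, doc_id, vec):
        self.vectors[doc_id] = vec
        self.buckets[self._signature(vec)].append(doc_id)

    def query(self, vec, k=1):
        # Score exactly only the candidates in the query's bucket, not every vector.
        candidates = self.buckets.get(self._signature(vec), [])
        return sorted(candidates, key=lambda d: cosine(self.vectors[d], vec), reverse=True)[:k]


lsh = RandomHyperplaneLSH(dim=3, n_planes=6)
lsh.add("cat", [0.9, 0.1, 0.0])
lsh.add("dog", [-0.8, -0.2, 0.1])
print(lsh.query([0.9, 0.1, 0.0]))  # → ['cat']
```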

The second one is generally called "learning to rank", so you can find a lot written on that topic. The biggest issue here, IMHO, is which training data you use to get examples of good rankings. The best algorithm trained on garbage will give you garbage.
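
Here is a minimal pairwise approach to learning to rank, in the spirit of RankNet. For every pair of results where one was judged more relevant than the other, it nudges a linear scoring function so the better one scores higher. The features, relevance labels, and hyperparameters are hypothetical. As noted above, the quality of those labels matters more than the algorithm.

```python
import math


def train_pairwise_ranker(examples, n_features, epochs=200, lr=0.1):
    """Fit linear weights so that more relevant results score higher than less relevant ones.

    examples: list of (features, relevance) for the results of a single query.
    Minimizes the pairwise logistic loss log(1 + exp(-(s_better - s_worse))) with SGD.
    """
    w = [0.0] * n_features
    # Every (better, worse) pair implied by the relevance labels.
    pairs = [(a, b) for a, ra in examples for b, rb in examples if ra > rb]
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [x - y for x, y in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # The loss gradient w.r.t. w is -sigmoid(-margin) * diff; step against it.
            step = lr / (1.0 + math.exp(margin))
            w = [wi + step * di for wi, di in zip(w, diff)]
    return w


def score(w, features):
    return sum(wi * xi for wi, xi in zip(w, features))


# Hypothetical judged results for one query: ([text_match, popularity], relevance).
training = [([0.9, 0.2], 2), ([0.4, 0.9], 1), ([0.1, 0.1], 0)]
w = train_pairwise_ranker(training, n_features=2)
ranked = sorted(training, key=lambda ex: score(w, ex[0]), reverse=True)
print([rel for _, rel in ranked])  # → [2, 1, 0]
```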

dustincoates · on Sept 15, 2022

Here's a link to our engineering blog posts: https://www.algolia.com/blog/engineering/

And our CTO, Julien, wrote an "Inside the Engine" series on how our search engine works. It doesn't have the new "hybrid search" but it shows you the base of how we do search: https://www.algolia.com/blog/engineering/inside-the-algolia-...

kartoolOz · on Sept 15, 2022

These are relatively easy to build and can be used for a variety of tasks like Entity Resolution, https://news.ycombinator.com/item?id=32825679

peterstjohn · on Sept 15, 2022

It really does give you the best of both worlds - resistant to typos, handling synonyms without all the usual hand-written rules, but still able to handle direct searches like ISBNs.

(disclaimer: I work on Semantic Search at Lucidworks)

O__________O · on Sept 15, 2022

From the Search.io homepage, “we are the only search technology supporting full upserts. Your updates are instantly live in milliseconds, no matter the scale.“

Anyone able to speculate how they were able to achieve this? Or, for that matter, beyond good sales and marketing, what technically gave them an edge that the market actually needed?
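
An upsert inserts a document, or replaces the existing document with the same ID. The claim here is that the new version is searchable immediately. The toy, single-process sketch below only illustrates those semantics on a combined keyword-and-vector index. It says nothing about how Search.io achieved this at scale, where concurrency, replication, and updating ANN structures are the hard parts. All names are hypothetical.

```python
from collections import defaultdict


class ToyHybridIndex:
    """Hypothetical in-memory keyword + vector index; upsert = insert or replace by ID."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> doc IDs containing it
        self.docs = {}                    # doc ID -> (terms, vector)

    def upsert(self, doc_id, text, vector):
        # If the ID exists, drop the old version's terms so stale keywords stop matching.
        if doc_id in self.docs:
            old_terms, _ = self.docs[doc_id]
            for term in old_terms:
                self.postings[term].discard(doc_id)
        terms = set(text.lower().split())
        for term in terms:
            self.postings[term].add(doc_id)
        # The stored vector is replaced together with the text.
        self.docs[doc_id] = (terms, vector)

    def keyword_search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


idx = ToyHybridIndex()
idx.upsert("p1", "red shoes", [0.9, 0.1])
idx.upsert("p1", "blue shoes", [0.1, 0.9])  # same ID: replaces the first version
print(idx.keyword_search("red"), idx.keyword_search("blue"))  # → [] ['p1']
```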

gk1 · on Sept 15, 2022

This is easily disproved: Pinecone (disclaimer: I work there) also supports live index updates... "no matter the scale." I think their marketing folks have gone a little far, and I say that as a marketing person.

mish15 · on Sept 16, 2022

Last I heard, Pinecone doesn’t even support full-text search, let alone hybrid indexes. What do you think you are disproving, exactly?

Real-time upserts on hybrid and vector indexes are very unusual; please link to how you do this.

peterstjohn · on Sept 15, 2022

Heh, my eyes did pop at that one, considering we've also been doing that over here since 2020 at least ;)

kartoolOz · on Sept 15, 2022

Even self-hosted Milvus supports this.

mish15 · on Sept 16, 2022

On hybrid indexes with full text and vector support?

ddorian43 · on Sept 15, 2022

> Anyone able to speculate how they were able to achieve this?

vespa.ai does it, and it's open source.

chairmanwow1 · on Sept 15, 2022

This article is tripping over itself to tell you how great they are.

PR like this just feels like it was written by a college kid, or makes me feel like they are compensating for technical inadequacy.

No thanks.

Cthulhu_ · on Sept 15, 2022

It's not written for technical people; it's for business, management, and financial people, the SEC, and (potential) investors.

mrwnmonm · on Sept 15, 2022

These days, I get to know new services only when they get acquired.

shermozle · on Sept 15, 2022

Congrats Hamish, Dave, Alex and team!

csmpltn · on Sept 15, 2022

> "... both keyword and semantic search in a single API. This new API platform is blazing fast, massively scalable, and, importantly, cost effective. No other vendor offers this today."

Where's the proof that "no other vendor offers this today"?

tullie · on Sept 16, 2022

Congrats to the team at Search.io! Looking forward to seeing what Algolia does with vector search.

Beefin · on Sept 15, 2022

I've been interviewing users of vector search and am cataloguing my findings in this repo:

https://github.com/esteininger/vector-search

feel free to watch for updates :)

julienfr112 · on Sept 15, 2022

What about the competition in the relevant market?

donretag · on Sept 15, 2022

Here is a comparison of various dense vector solutions: https://dmitry-kan.medium.com/how-to-choose-a-vector-databas...

What is missing is Lucene's implementation, which helps power Solr/OpenSearch/Elasticsearch.

Beefin · on Sept 15, 2022

working on a comparison table: https://github.com/esteininger/vector-search/tree/master/fou...

dsmmcken · on Sept 15, 2022

Does this improve or impact DocSearch?

esher · on Sept 15, 2022

I stopped reading after three words: "Algolia disrupts market …". Goodbye. I am still a fan of DocSearch.