Algolia Acquires Search.io (algolia.com)
226 points by daolf on Sept 15, 2022 | 46 comments



Note: I work on Typesense [1], an open source alternative to Algolia.

This is an interesting acquisition from my perspective because we also just started working on adding vector search to Typesense about a month ago.

So you can now do nearest neighbor searches by bringing your vectors into Typesense. This lets you do things like similarity searches, recommendations, etc.

I’d love to have more beta testers use the feature and give us feedback. If you’d like to try it out, please send me an email: jasonb at typesense dOt org

In any case, congratulations Search.io / Sajari team!

[1] https://typesense.org/


Just wanted to say thank you for Typesense. I use it for my own open source software [1] and integration with it was really easy.

More generally, I think it is great to see development in this area, from Algolia and Search.io to Typesense and others. Having a customisable search that is really fast can make a big difference on a website.

[1] https://datatables.net


Thank you for saying that and thank you for building DataTables!

I've heard about DataTables in various contexts over the years, so it's really cool to hear that you've integrated it with Typesense.


Thank you for building DataTables - very useful web software.


Typesense is so good (and affordable!). I'm so thankful y'all are providing an alternative to Algolia.

It feels similar to Render vs Vercel where clearly players like Vercel & Algolia make the big bucks from enterprise clients and thus make their services less accessible to small companies like mine.

Quick question with regard to vector search: do y'all intend to expose some basic embedding service on your platform? I think it'd be pretty powerful to add a basic word2vec embedding model so that users who want to play around with vector search can simply send some text and Typesense would do the rest (convert text to embedding, index embedding, etc).


Yes, we plan to do that, but we will start off by first supporting a raw vector data type and search on that.


Thank you very much for working on Vector Search. Looking forward to trying it out as soon as it gets stable. Thanks again!


We're currently A/B testing Typesense and Algolia, but the pricing model difference alone makes me almost want to skip the whole process and just go with Typesense. Algolia's price-per-search model is a little ridiculous for people building UIs with live search.


Awesome, I'll add it to the comparison folder in this repo: https://vectorsearch.dev


Any books / papers you would recommend to cover the basics of search engine implementation? Practicalities like Elias-Fano encoding and WAND searching.


Out of all possible compression codecs and search algorithms, why did you ask about EF and WAND? Have you read about these elsewhere?


love typesense but just some thoughts on the messaging - most of us have no idea what it means to "bring your vectors", i don't have any vectors to bring - so when you launch this feature please do a little more handholding than usual.

good luck!


Thank you for the feedback! Will make sure we have good documentation around this.


thank you for working on it! fwiw i listened to your latest changelog episode and it was good but i also left with the exact same question - what the hell does it mean to "bring my own vectors" - so need handholding
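

For anyone with the same question, here is a rough sketch of what "bring your own vectors" tends to mean: you compute an embedding for each document outside the search engine (with whatever model you like), store it alongside the document, and at query time the engine returns the documents whose vectors are closest to the query vector. The toy 3-dimensional vectors, the `documents` list and the `nearest` helper below are all made up for illustration. This is plain brute-force Python, not Typesense's API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical documents. In practice "vec" would come from an embedding
# model you run yourself; these toy 3-d values are invented.
documents = [
    {"id": "car",        "vec": [0.9, 0.1, 0.0]},
    {"id": "automobile", "vec": [0.8, 0.2, 0.1]},
    {"id": "banana",     "vec": [0.0, 0.1, 0.9]},
]

def nearest(query_vec, docs, k=2):
    """Brute-force k-nearest-neighbor search by cosine similarity."""
    ranked = sorted(
        docs,
        key=lambda d: cosine_similarity(query_vec, d["vec"]),
        reverse=True,
    )
    return [d["id"] for d in ranked[:k]]

# A query vector that sits close to the two car-like documents.
print(nearest([0.85, 0.15, 0.05], documents))
```

Real engines replace the brute-force loop with an approximate nearest neighbor index, but the inputs and outputs have the same shape: vectors in, closest document ids out.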


some interesting points from this other piece

"While the acquisition price was undisclosed, media reports suggest Algolia paid more than $100 million for Search.io"

https://www.businessnewsaustralia.com/articles/french-unicor...

"Search.io’s mission is to “enable every organization to build smart search and discovery solutions.” The company was founded in 2014 by Hamish Ogilvy and David Howden (originally named Sajari, and recently rebranded to Search.io). "

Contra the business news article: "Search.io was founded in 2020 by Hamish Ogilvy, who will remain with the merged company in the new role of vice president of artificial intelligence."

---

alright can some non-marketing person explain in terms of practical use cases why this "hybrid search" is so disruptive? i feel like the article is trying really hard to communicate how big a deal it is, but it falls flat on me because i simply only have pedestrian search knowledge


Vector search is incredibly powerful at matching on context or similarity. For example, automobile and car are semantically similar, and one will rank well for the other in a search.

Vector search, though, isn't as good at handling typos and not good at all when it comes to as-you-type searching. Vehic won't match on auto, for example.

We believe there is a use for each of these approaches, and a use for combining them in a single search, rather than choosing between them ahead of time or through heuristics after the fact.

(I'm a Principal PM for Semantic Search and Search Ranking at Algolia.)
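

One common way to combine a keyword ranking and a vector ranking in a single search is reciprocal rank fusion (RRF). The sketch below is only an illustration of that general technique. Nothing in this thread says Algolia or Search.io use it, and the document ids and the two hit lists are invented.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document ids into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

# Hypothetical results for one query from two separate retrievers.
keyword_hits = ["vehicle-parts", "vehicle-hire", "auto-loans"]
vector_hits = ["car-rental", "vehicle-hire", "auto-loans"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that both retrievers return rise to the top, while documents only one of them found still appear further down, so neither approach has to be picked ahead of time.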


Aren't typos just a question of how you generate your vectors/embeddings? I'd be surprised if a transformer with a character-level tokenizer trained on a representative source of data (i.e., with typos) wouldn't be able to make sense of typos.


Can confirm. We use sentence-level transformer embeddings for (vector) search, clustering, and classification tasks. As an old school ML guy I've been amazed at how robust they are to typos, slang, punctuation, etc.

However, I'm sure there are still applications where you don't have access to a robust embedding for your domain but can apply other techniques to deal with that domain's noise.


Here is a decent intro to sentence-level transformers & embeddings:

https://www.pinecone.io/learn/sentence-embeddings/


Yes, good point. I still believe that net-net you're going to get better results on typos with a keyword-based search, but I didn't mean to imply that vector searching won't handle typos at all.


> Vector search, though, isn't as good on handling typos and not good at all when it comes to as you type searching. Vehic won't match on auto, for example.

This is incorrect in the general case; it depends entirely on the model that is used to produce the word vectors and the text corpus the model is trained on.

For instance, the fastText model is trained on words but also on their parts (character n-grams), so it should produce word vectors that are close (in cosine distance) to the vectors of their corresponding typos and partials, even if the text corpus used to train the model doesn't contain the same typos and partially typed words verbatim.
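

A small sketch of the subword idea, under heavy simplification: it compares raw character trigram counts, whereas fastText learns an embedding for each n-gram during training. The `char_ngrams` and `ngram_cosine` helpers are hypothetical names, not part of the fastText library. It only shows why typos and partially typed words end up sharing most of their building blocks with the intended word.

```python
import math
from collections import Counter

def char_ngrams(word, n=3):
    """Character n-grams of a word padded with boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def ngram_cosine(a, b, n=3):
    """Cosine similarity between the n-gram count vectors of two words."""
    ca, cb = Counter(char_ngrams(a, n)), Counter(char_ngrams(b, n))
    dot = sum(ca[g] * cb[g] for g in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0

print(ngram_cosine("vehicle", "vehicel"))  # typo: shares 4 of 7 trigrams
print(ngram_cosine("vehicle", "vehic"))    # partially typed word
print(ngram_cosine("vehicle", "auto"))     # no trigrams in common
```

Note that "vehicle" and "auto" score zero here because raw n-gram overlap carries no meaning. In a trained model the semantic closeness comes from the learned n-gram embeddings, which this sketch deliberately leaves out.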


I'd like to add that vector search works not just for natural language, but also for a variety of other types of unstructured data as well. Images, video, user profiles, and pretty much anything else that can be vectorized. Here's an example of image search: https://milvus.io/docs/image_similarity_search.md


Do y'all have a technical blog? I would love to understand the problem and methods, the domains y'all cross (e.g. biometrics and fuzzy matching?), and how y'all integrate in different industries.

A good search partner is hard to find. PageRank is fun and all, but I believe better methods exist these days.


There are two related problems here: finding relevant results and ranking those results. The first is historically done with massive inverted indexes. PageRank addresses the second, ranking those relevant results.

For the first part you can look into "embeddings" and "approximate nearest neighbor lookup" for the modern approaches. That said, inverted indexes are still very popular.

The second one is generally called "learning to rank" so you can find a lot of things written on that topic. The biggest issue here imho is which training data you use to provide examples of good rankings. The best algorithm trained on garbage will give you garbage.
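

To make the two stages concrete, here is a minimal sketch with a toy inverted index for retrieval and a hand-written score for ranking. The three documents and the scoring rule (count of distinct query terms matched) are invented for illustration. A real system would use a learned ranker or a signal like PageRank in the second stage, which this does not attempt.

```python
from collections import defaultdict

# Hypothetical corpus: document id -> text.
docs = {
    1: "fast open source search engine",
    2: "vector search for images",
    3: "cooking with cast iron",
}

# Stage 1 data structure: term -> set of ids of documents containing it.
inverted_index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        inverted_index[term].add(doc_id)

def retrieve(query):
    """Candidates: every document containing at least one query term."""
    candidates = set()
    for term in query.split():
        candidates |= inverted_index.get(term, set())
    return candidates

def rank(query, candidates):
    """Stage 2: order candidates by distinct query terms matched."""
    terms = set(query.split())

    def score(doc_id):
        return len(terms & set(docs[doc_id].split()))

    # Highest score first; ties broken by document id for determinism.
    return sorted(candidates, key=lambda d: (-score(d), d))

query = "open source search"
print(rank(query, retrieve(query)))
```

The retrieval step never looks at document 3 for this query, which is the point of the index: ranking only has to score the small candidate set.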


Here's a link to our engineering blog posts: https://www.algolia.com/blog/engineering/

And our CTO, Julien, wrote an "Inside the Engine" series on how our search engine works. It doesn't have the new "hybrid search" but it shows you the base of how we do search: https://www.algolia.com/blog/engineering/inside-the-algolia-...


These are relatively easy to build and can be used for a variety of tasks like Entity Resolution, https://news.ycombinator.com/item?id=32825679


It really does give you the best of both worlds - resistant to typos, handling synonyms without all the usual hand-written rules, but still able to handle direct searches like ISBNs.

(disclaimer: I work on Semantic Search at Lucidworks)


From the Search.io homepage, “we are the only search technology supporting full upserts. Your updates are instantly live in milliseconds, no matter the scale.”

Anyone able to speculate how they were able to achieve this? Or for that matter, beyond good sales & marketing, what technically gave them an edge that the market actually needed?


This is easily disproved: Pinecone (disclaimer: I work there) also supports live index updates... "no matter the scale." I think their marketing folks have gone a little far, and I say that as a marketing person.


Last I heard Pinecone doesn’t even support full text search, let alone hybrid indexes, what do you think you are disproving exactly?

Real-time upserts on hybrid and vector indexes are very unusual, please link to how you do this.


Heh, my eyes did pop at that one, considering we've also been doing that over here since 2020 at least ;)


Even self-hosted Milvus supports this.


On hybrid indexes with full text and vector support?


> Anyone able to speculate how they were able to achieve this?

vespa.ai does it and it's open source


This article is tripping over itself to tell you how great they are.

PR like this just feels like it was written by a college kid or makes me feel like they are compensating for technical inadequacy.

No thanks.


It's not written for technical people, it's for business, management, financial people, the SEC and (potential) investors.


These days, I get to know new services only when they get acquired.


Congrats Hamish, Dave, Alex and team!


> "... both keyword and semantic search in a single API. This new API platform is blazing fast, massively scalable, and, importantly, cost effective. No other vendor offers this today."

Where's the proof that "no other vendor offers this today"?


Congrats to the team at Search.io! Looking forward to seeing what Algolia does with vector search.


I've been interviewing users of vector search and am cataloguing my findings in this repo:

https://github.com/esteininger/vector-search

feel free to watch for updates :)


What about the competition in the relevant market?


Here is a comparison of various dense vector solutions: https://dmitry-kan.medium.com/how-to-choose-a-vector-databas...

What is missing is Lucene's implementation, which helps power Solr/OpenSearch/Elasticsearch



Does this improve or impact DocSearch?


I stopped reading after three words: "Algolia disrupts market …". Goodbye. I am still a fan of DocSearch.



