Hacker News
How not to break a search engine (sourcegraph.com)
57 points by akpa1 on July 4, 2021 | 17 comments



If you want to see a very bad example of a broken search engine, use the Google voice Assistant to search for "how many raccoons can fit"


I'm not sure that's "broken" so much as inappropriate. I read an article with a list of bad default results (as in, with no filter bubble) that are way worse. For example, searching for "tween" on Bing shows auto-suggestions like "tween swimsuits inappropriate" and "tween budding images". The same goes for all search engines that use Bing, like Brave and DDG (at least at the time I read about it). It seems pretty obvious what those searches are looking for.


Replying to myself here, but I reported the issue to Bing. If they don't fix this then not only are they serving inappropriate search suggestions, they also run the risk of a huge shitstorm. I'm amazed it hasn't been fixed yet.


I had to see what the outrage was about. What's the big deal with a Sears-like store selling back-to-school clothes?

Something tells me your results are a little more personalized or you are in a country specific search that isn't the US.

Shitstorms these days blow over quickly. Someone is always ready for the next big outrage.


I don't know what you mean by Sears? Do you mean a search result from Bing? Because I'm not talking about results but about Microsoft taking what I'm guessing are frequently used searches from people looking for underage girls in inappropriate clothing or pictures of underage girls' breasts (budding, pokies, etc.) and using them as default auto-suggestions for all bing.com users (the suggestions that show up in the search bar when you type).

Are you saying you think Microsoft suggesting to people that they might want to search for images of 9-to-12-year-old girls (tweens) in inappropriate clothing, or pictures of their breasts, is a good choice of suggestions?

The article I read about it in was from the US, 5000+ km from where I'm at, and the results were exactly the same as what I get (and I don't use Bing). It has nothing to do with me or where I'm at, but nice try implying otherwise.


I just did a regular text search for it and the top results were "how many raccoons can fit up your butt"... no voice assistant involved, and the destination URLs were not Google's.


For something a little more innocuous, type "Cicero" in the address bar in Chrome or Firefox and see what it autocompletes it to.


You mean the 4.5 millimeters? I believe that's a legitimate unit in use in the publishing business.


Yes, and it is; however, my (intended) point was that it is what almost no one is looking for when they type "Cicero", and if you simply search for Cicero by itself, you get nothing telling you that it's a typographic unit.


Wow, I was not expecting _that_ result.


Neither were the raccoons


@rijnard (the blog post author) is awesome, and all of the code changes he talks about in the blog post are public. You can see all of his recent changes to search code in https://sourcegraph.com/search?q=context:global+repo:%5Egith... in case you want to follow along by just reading the code (that query shows all of his diffs that touch paths containing `search`).


I want to know how to benchmark a search engine.


A good place to start might be the "Evaluation" chapter of Manning et al.'s "Introduction to Information Retrieval": https://nlp.stanford.edu/IR-book/html/htmledition/evaluation...

The whole book is superb, and while some of the content is a bit long in the tooth at this point, the evaluation chapter has aged extremely well.

Oh, and if you’re interested in how to evaluate search engine user interfaces, the equivalent chapter in Hearst’s book on Search UI has you covered: https://searchuserinterfaces.com/book/sui_ch2_evaluation.htm...
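To make the IR-book's evaluation chapter a bit more concrete, here's a minimal sketch of two of the classic offline metrics it covers, precision@k and mean reciprocal rank (MRR). The ranked lists and relevance judgments below are made-up example data, not from any real benchmark.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are judged relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def reciprocal_rank(ranked, relevant):
    """1/rank of the first relevant result, 0 if none is found."""
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1 / i
    return 0.0

# Hypothetical system output and relevance judgments for two queries.
runs = {
    "q1": (["d3", "d1", "d7"], {"d1", "d2"}),
    "q2": (["d5", "d9", "d2"], {"d9"}),
}

mrr = sum(reciprocal_rank(r, rel) for r, rel in runs.values()) / len(runs)
print(precision_at_k(*runs["q1"], k=3))  # 1 relevant doc in top 3 -> 0.333...
print(mrr)  # first relevant at rank 2 in both queries -> (1/2 + 1/2) / 2 = 0.5
```

The key point the chapter makes is that both metrics need human relevance judgments; the metrics themselves are the easy part.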


I'm sure there's tonnes of research and prior art on this subject, but it's an interesting inquiry.

Off the top of my head, there are two meanings - performance benchmarking (how fast the search results come back), and accuracy/fit-for-purposeness benchmarking (how good it is at finding what the user intends).

Performance is easy. It's the accuracy/fit-for-purposeness that would be an interesting benchmark.
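"Performance is easy" in the sense that you can measure it directly. A minimal sketch, assuming a hypothetical `run_query` standing in for the real search call: time it over many iterations and report percentiles, since tail latencies (p95/p99) matter more than the mean for interactive search.

```python
import time
import statistics

def run_query(q):
    """Stand-in for a real search call (simulated 1 ms of work)."""
    time.sleep(0.001)
    return []

def bench(query, n=200):
    """Run the query n times and report latency percentiles in ms."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        run_query(query)
        samples.append((time.perf_counter() - start) * 1000)
    qs = statistics.quantiles(samples, n=100)  # 99 cut points
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}

print(bench("error handling"))
```

In practice you'd also want warm-up runs and separate cold-cache vs. warm-cache numbers, but the shape of the harness is the same.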

I wonder if you have to use an empirical measurement for accuracy - that is, give a random sample of people a target piece of code (or file) to find, and see how long or how many queries it takes to find it.


For quality, you really do need human qualitative measures to get the full picture, with all of the fun that involves.

However, you can do things like generate search terms from your top N documents through some method, and then do the queries and confirm the document you generated the term from shows up in the top M results.

This can be circular though if you're not careful; the top N documents may not include important documents that nobody could find.
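The generate-and-verify loop described above can be sketched end to end. Everything here is illustrative: a toy in-memory corpus, a crude term-frequency "engine", and the simplest possible query generator (pick a term unique to each document), then a check that each document lands in the top M results for its own term.

```python
from collections import Counter

# Toy corpus standing in for the "top N documents".
docs = {
    "auth.go":   "func validateToken parses the jwt auth token header",
    "search.go": "func rankResults scores and ranks search results by tf",
    "cache.go":  "func evictStale removes stale cache entries by ttl",
}

def search(query, m=2):
    """Toy engine: rank docs by query-term frequency, return top m names."""
    terms = query.lower().split()
    scores = {name: sum(Counter(text.lower().split())[t] for t in terms)
              for name, text in docs.items()}
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [name for name, score in ranked if score > 0][:m]

def distinctive_term(name):
    """Crude query generation: the first term unique to this document."""
    others = " ".join(t for n, t in docs.items() if n != name).split()
    return next(t for t in docs[name].split() if t not in others)

# The benchmark: every document should be findable via its own term.
failures = [n for n in docs if n not in search(distinctive_term(n))]
print(failures)  # [] means every generated query found its source document
```

The circularity caveat applies here too: this only ever tests documents you already decided were important, so it catches regressions but says nothing about documents nobody can find.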


No live traffic experiment? Seems risky to launch without one.



