Fun to see this here again. Love the controversy and discussion this brings. I wrote it in good humor and I still giggle at some of the reactionary responses. “How dare you call this a falsehoods list” and so forth.
Anyway things are getting better now. More people got into search and info retrieval since I dropped this list. And there’s a great growing community out there of people who find and adore the problem space. For those who enjoy reading, I’m glad you do! For the people who don’t, … :)
Just a PSA: "reactionary" doesn't mean "in reaction to". It specifically refers to opposition to social change ("reject modernity, embrace tradition"), and opposition to liberalism and progressivism in particular, like a conservative over-correction (not just rejecting further changes but a desire to revert to an idealized previous state that never was).
Of course the descriptivist in me would say that "reactionary" can also just mean "in reaction to" if enough people use it that way. The fact that people have picked up the political term through cultural osmosis and apply it to reactions in general validates this meaning, so make of that what you will.
I guess social politics are not the only field where you can have a progressive - conservative divide, each in their literal sense (note: these do IMO not cleanly map to political parties in general, i.e. you could be a conservative left or a progressive right). Thus you can be a reactionary in your field of profession in how you react to criticism.
These lists are a great way to produce a big list of strawmen that most programmers don't actually believe.
They seem to have picked statements that are often true, or true in certain situations, so that they are "false" only in the sense that they are not always true, drawing zero distinction between "mostly true/situationally true" and "completely false" in a field where the answer to most questions about system design is "it depends".
Some of them definitely are more like "Falsehoods the client/marketing/sales believe about search." e.g. "Search can be added as a well performing feature to your existing product quickly." Have definitely gotten questions about "why can't you just quickly add search" based on this falsehood before.
I think the name implies some frustration with stupid technical limitations many systems have. "What do you mean my password can't contain special characters? What are you doing? What do you mean my first name can't be more than 15 characters?" Stuff like that.
You definitely can add search pretty easily and without a lot of thought and get something perfectly usable. As with anything, you can spend an arbitrary amount of time optimizing for specific use cases. It's like asking how complicated a contact form is. Does it have 3 fields and send an email? Or does it have 30 cascading fields prefilled based on user behavior and synced to a CRM?
I’ve written a few rudimentary regex-based search engines and thought I was a badass until I encountered the need for ANDing and ORing, realizing that "and" and "or" are often (at least in English) interchangeable and also opposite in meaning, depending on context. That’s when you give up and use an actual search engine.
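To make the AND/OR ambiguity concrete, here is a minimal sketch of a regex-based matcher with both semantics. The document set and the function names (`matches`, `search_all`, `search_any`) are made up for illustration and are not taken from any real engine. A query like "red and blue" could plausibly mean either behavior depending on what the user had in mind.

```python
import re

# Hypothetical toy corpus, for illustration only.
DOCS = {
    1: "Red shoes and blue socks",
    2: "Blue shoes",
    3: "Red socks",
}

def matches(term, text):
    """Case-insensitive whole-word match for a single term."""
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None

def search_all(terms, docs):
    """AND semantics: every term must appear in the document."""
    return sorted(
        doc_id for doc_id, text in docs.items()
        if all(matches(t, text) for t in terms)
    )

def search_any(terms, docs):
    """OR semantics: at least one term must appear in the document."""
    return sorted(
        doc_id for doc_id, text in docs.items()
        if any(matches(t, text) for t in terms)
    )

if __name__ == "__main__":
    print(search_all(["red", "blue"], DOCS))  # only documents with both words
    print(search_any(["red", "blue"], DOCS))  # documents with either word
```

The code can't tell you which of the two the user meant, and that is exactly the part a regex doesn't solve.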
I would love a companion to these pieces that go into details. For instance, how is search different from a database? Why can't search be added to a product?
I had a draft sitting around somewhere where I started doing just that, adding a paragraph or so for each. I got about 12 done before I moved on to something else.
For those two specific Qs you asked:
Search isn’t like a database because it isn’t ACID and shouldn’t be the system of record for data, even though lots of teams ignore this advice and use Elasticsearch or another engine as their record store. IMO content should be stored somewhere safer like a CMS, and added to the search engine for the search/discovery/recommendation use case.
Search can be plugged in to a product rather quickly, but the initial relevance is usually terrible - and it’s a beast that needs to be tamed and loved for a while before it can make users happy.
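A rough sketch of the "search index is derived data, not the record store" idea, under heavy assumptions: the `Catalog` class, its in-memory dict standing in for a database or CMS, and the whitespace tokenizer are all hypothetical and far simpler than a real engine. The point is only that the index can be thrown away and rebuilt from the records, never the other way round.

```python
from collections import defaultdict

class Catalog:
    """Toy model: `records` is the source of truth, `index` is disposable."""

    def __init__(self):
        self.records = {}              # stand-in for a real database or CMS
        self.index = defaultdict(set)  # derived inverted index: token -> doc ids

    def save(self, doc_id, text):
        # Write to the system of record first, then refresh the derived index.
        self.records[doc_id] = text
        self.reindex()

    def reindex(self):
        # Rebuild the whole index from the records (fine for a toy example;
        # a real system would update incrementally).
        self.index = defaultdict(set)
        for doc_id, text in self.records.items():
            for token in text.lower().split():
                self.index[token].add(doc_id)

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))

if __name__ == "__main__":
    catalog = Catalog()
    catalog.save(1, "hello world")
    catalog.save(2, "hello there")
    print(catalog.search("hello"))
    catalog.index.clear()   # simulate losing the search index
    catalog.reindex()       # nothing is lost, it is rebuilt from the records
    print(catalog.search("hello"))
```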
The relations between data are different, and those relations can be defined in very different ways, with exceptions and so on. They could be weighted relations, and tweaking them could kill the search quality (e.g. Google's ads pollution). There is no set-in-stone way to do it right. A DB's relations are much simpler and well defined.
On products, the article says it won't be easy, perform well, or give users a good impression every time. Again search quality issues.
> I would love a companion to these pieces that go into details.
They want you to give them a call. They do consultancy, and that's the reason it's not complete.
I'm developing a next-generation search engine and the article is actually pretty light on topics, it's like search engine research got frozen around 2010. Google can't take user feedback for the recommendation without getting SEO-gamed way further, so we need a new kind of search to make it irrelevant.
I wonder if there's a corresponding list of "falsehoods users believe about how search works"
When I use a search engine, I know I am often choosing my search terms based on suppositions about how the search engine works, but if I'm honest, I really have no idea. It usually devolves to trial and error until I find results that are close to what I wanted.
My roommate's hamster had a porn addiction in his teens. That hamster really learned how to use a search engine.
Some notes I found scribbled in his cage:
1. Never use the onsite search function. It's broken, undocumented, limited. Use google or ddg with 'site:...'
2. Learn all the search operators like intitle, inurl, etc.
3. Try different search engines. Sometimes one engine happens not to have indexed what you're looking for yet
4. Search for the text in UI elements of websites. E.g. if you're looking for a movie made in a particular year, go to IMDB and look at the part of a movie page that says the year, then search for that particular string like this: 'site:imdb.com "Made in: 1996"'. You can turn almost any recurring element of a website into a tag this way.
5. Most of the above tips work best if you have a specific site to search with 'site:...'. So, divide and conquer. Find the site(s) that will probably contain what you want and only then search for the thing.
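The tips above boil down to assembling a query string, which can be sketched like this. `site_query` is a made-up helper, not part of any search engine's API, and whether a given engine honors `site:` or quoted phrases the way you expect is an assumption you'd have to check per engine.

```python
def site_query(site, phrase=None, extra_terms=()):
    """Build a query restricted to one site, optionally with an exact phrase.

    Operator support (site:, quoted phrases) varies between search engines,
    so treat the output as a starting point rather than a guarantee.
    """
    parts = [f"site:{site}"]
    if phrase:
        parts.append(f'"{phrase}"')   # quoted to request an exact phrase
    parts.extend(extra_terms)
    return " ".join(parts)

if __name__ == "__main__":
    # The IMDB example from tip 4.
    print(site_query("imdb.com", "Made in: 1996"))
```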
> That hamster really learned how to use a search engine.
Unfortunately, the things that hamster learned are no longer relevant. Using + and - on keywords no longer works with Google - your luck on other search engines will vary. Sometimes adding quotes around the keyword / keyphrase functions as +, sometimes it does not. There is no reliable way to get - behaviour anymore. And even verbatim search sometimes includes synonyms.
And almighty help you if there is a product or popular media figure with the keyword you are searching. The search results will be flooded with links to some Marvel character that happens to share a name with the *nix daemon whose error messages you are searching.
I know the feeling. I have trouble getting results I need sometimes, while my colleague doesn't, because he types in full sentences and I try to home in on specific terms :(
You would think that really big companies would have search figured out on their own site. But Google usually works way better.
Site search gets tripped up on irrelevant false positives in some stack of fine print. Or even in a preview of a link to another page.
Or else it will totally ignore words and show partial matches high up, probably due to some terrible rank scheme trying hard to sell something.
Google's big issue for me is that it likes to show you those mirror sites that mirror reddit and github, sometimes in a way that makes it look like original content in the cache preview, but are actually boner pill ads if anyone but googlebot tries to view them.
We should start having GPT3 generate these lists or maybe a logic programming language like Prolog generate every combination of words with the relevant keywords.
Mojeek team member here. We'd take issue with "Queries for ‘C programming’ and ‘C++ programming’ will produce different results", but then we are a general web search engine and not specialists for site/commerce search. Having said that it's generally a very good list for web search too.
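For anyone wondering how 'C programming' and 'C++ programming' can end up producing the same results, here is one plausible mechanism, sketched with two toy tokenizers. This is an assumption for illustration only and says nothing about how Mojeek or any other engine actually tokenizes queries. Both function names are made up.

```python
import re

def naive_tokens(query):
    """Split on word characters only, so punctuation such as '++' is dropped."""
    return re.findall(r"\w+", query.lower())

def symbol_aware_tokens(query):
    """Keep trailing '+' and '#' attached to a word, so 'c++' and 'c#' survive."""
    return re.findall(r"\w+[+#]*", query.lower())

if __name__ == "__main__":
    for q in ("C programming", "C++ programming"):
        print(q, naive_tokens(q), symbol_aware_tokens(q))
```

With the naive tokenizer both queries collapse to the same token list, so an index built on it has no way to tell them apart.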
Faceted search is a technique that involves augmenting traditional search techniques with a faceted navigation system, allowing users to narrow down search results by applying multiple filters based on faceted classification of the items.
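A minimal sketch of that definition, assuming items are plain dicts and a facet is just a key in that dict. The catalog, `apply_filters`, and `facet_counts` are hypothetical names for illustration; real engines compute facet counts inside the index rather than by scanning a list.

```python
from collections import Counter

# Hypothetical product catalog, for illustration only.
ITEMS = [
    {"name": "Trail shoe", "brand": "Acme", "color": "red"},
    {"name": "Road shoe", "brand": "Acme", "color": "blue"},
    {"name": "Sandal", "brand": "Zen", "color": "red"},
]

def apply_filters(items, filters):
    """Keep only items whose facet values match every selected filter."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in filters.items())
    ]

def facet_counts(items, facet):
    """Count how many items fall under each value of a facet."""
    return Counter(item[facet] for item in items)

if __name__ == "__main__":
    red_items = apply_filters(ITEMS, {"color": "red"})
    print([item["name"] for item in red_items])
    # Counts shown next to the remaining filter options in a typical UI.
    print(facet_counts(red_items, "brand"))
```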
Well, a lot of these are not only believed by programmers. Search is a surprisingly difficult feature to get right and do well, and this is often underappreciated in planning and resourcing. It’s not something you can throw one or two people at for a week and move on.
Happy searchin’!