There's fairly aggressive bot activity when you run a search engine, which is why many of us are behind Cloudflare or similar bot-mitigation services.
Best guess is the bots are attempting to manipulate Google's or Bing's suggestion algorithms this way (since a lot of small search engines are basically just forwarding results from bigger engines). Unfortunately they can be aggressive to the point of almost bringing down your service. I saw up to 30k searches per hour before I cried uncle and got behind Cloudflare.
I do offer an API, and I suspect some others in the indie search space might as well (or might at least be willing to set one up). That way I can rate limit on a per-consumer basis without affecting everyone.
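For anyone curious what per-consumer rate limiting can look like, here is a minimal sketch using a token bucket per API key. It is not the code behind any particular engine's API; the class names, the `capacity` and `refill_per_sec` parameters, and the idea of keying on an API key string are all assumptions for illustration.

```python
import time


class TokenBucket:
    """One bucket per consumer: allows short bursts, caps the sustained rate."""

    def __init__(self, capacity, refill_per_sec, now):
        self.capacity = capacity              # maximum burst size
        self.refill_per_sec = refill_per_sec  # sustained requests per second
        self.tokens = capacity                # start with a full bucket
        self.updated = now

    def allow(self, now):
        # Refill in proportion to the time elapsed since the last call.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerKeyLimiter:
    """Hypothetical limiter: each API key gets its own independent bucket."""

    def __init__(self, capacity=5, refill_per_sec=1.0, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock  # injectable so the limiter can be tested
        self.buckets = {}

    def allow(self, api_key):
        now = self.clock()
        bucket = self.buckets.get(api_key)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_sec, now)
            self.buckets[api_key] = bucket
        return bucket.allow(now)
```

Because every key has its own bucket, one consumer hammering the API only exhausts their own tokens and everyone else is unaffected.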
Searx and SearxNG support a few of these, but instances get blocked and unblocked left and right.
eTools is a metasearch engine that uses commercial APIs for its search providers, so it does not get blocked. However, each search is a bit expensive for eTools, so users might get blocked by eTools instead.
+1 for andisearch also. Totally different approach. Have barely seen it talked about on here. The results are dope, however. I very much appreciate marginalia as well. I applaud the programmers making new and interesting tools in this space.
Credit to them for trying some new things on the UI front, but it looks like the organic results are from Bing (like most other alt/privacy search engines). It would be interesting to learn more about how/if they plan to build their own index, or set themselves apart.
IMO, Kagi and Brave search are the two best alternative general search engines right now.
I think Mojeek is a better alternative "general" engine than Brave because Brave's ranking algorithm was optimized against Google SERPs (back when it was called Cliqz), making many SERPs too similar to Google's.
Kagi uses a mix of Teclis and other engines (claims to use Bing and Google) but the ability to adjust ranking yourself is its wild card. Neeva is similar, combining its own index with some ability to influence ranking.
But personally, I'm trying to reduce my use of "one engine to rule them all" and instead use specific engines for specific tasks they're good at.
Mojeek is excellent, and because they use 100% their own index they have a much higher hill to climb.
When I say "Best" for a general search engine, my definition is that it would fulfill the needs of myself and my non-technical family members. Kagi and Brave Search both do that while being different enough to not be just another Bing clone. I use Mojeek often, think it is great, and having their own index is a tremendous asset, but it doesn't quite meet that full definition yet.
An additional thing of interest. The text of the results such as heading and summary on andisearch is not the same as the other search engines. I think it displays more of each web page's content from its index.
Interesting. With search 'java lambda function equivalent' I see results different to Bing, Google and DuckDuckGo. Greater difference still for "elon musk latest news". My guess, they use Bing where they lack their own index.
Curious as to how these engines accept new sites into their ranks. A big problem most of us have had is spammy results outranking everything else by building large, fake networks of sites that boost each other's rankings via interlinking. Many of the higher end networks are undetectable, as they have legitimate content and never link to more than 1 other internal site (among a mix of external sites, some affiliated with still other networks).
I use Personalized PageRank, giving disproportionate voting power to a bunch of people whose entire identity is their passionate dislike for the commercial web. To get a really high rank in my search engine, you need to convince those people to link to your site.
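For readers who haven't met Personalized PageRank, here is a rough sketch of the general technique, not necessarily how the engine above implements it. The random surfer teleports only to a trusted seed set, so sites that no trusted site links to (directly or transitively) end up with no rank. The function name, the `trusted` parameter, and the toy site names are assumptions.

```python
def personalized_pagerank(links, trusted, damping=0.85, iterations=50):
    """links: dict mapping a site to the list of sites it links to.
    trusted: non-empty set of seed sites that receive all teleport mass."""
    nodes = set(links) | set(trusted)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)

    # Teleport distribution: uniform over trusted sites, zero elsewhere.
    teleport = {n: (1.0 / len(trusted) if n in trusted else 0.0) for n in nodes}
    rank = dict(teleport)

    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for src in nodes:
            targets = links.get(src, [])
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Dangling site: hand its mass back to the trusted set.
                for n in nodes:
                    new_rank[n] += damping * rank[src] * teleport[n]
        rank = new_rank
    return rank


# Toy graph: a trusted hobbyist links to a small blog; two spam sites only
# link to each other, so no trusted vote ever reaches them.
example = personalized_pagerank(
    {"hobbyist": ["smallblog"], "spam_a": ["spam_b"], "spam_b": ["spam_a"]},
    trusted={"hobbyist"},
)
```

In the toy graph the interlinked spam pair stays at zero because nothing reachable from the trusted seed links to it, which is the property that makes this variant resistant to self-boosting link networks.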
One approach is to have human testers. When a low quality site gets high rank, you investigate in detail how that happened and downrank the linking sites.
It shouldn't be that hard to find the bad network if you're systematically investigating all the time. Google has people testing search results often.
The problem is that this is fairly expensive. But quite possibly not the largest cost a search engine would have.
Hey all - creator here. It looks like the next page of results does not work currently because of a wrong query param (it should be "q" instead of "topics"). Easy enough to change manually if you need it.
As a few of you noticed, narrow searches do not work very well because this is not a general web search engine and has a tiny index compared to Google. Use Teclis to discover more about a broader topic you are interested in and to discover writing from 'clean' websites on the web.
> As a few of you noticed, narrow searches do not work very well because this is not a general web search engine and has a tiny index compared to Google. Use Teclis to discover more about a broader topic you are interested in and to discover writing from 'clean' websites on the web.
Are you getting better results with vector search?
I've been looking at this problem with my search engine as well. I've recently side-loaded all of stackoverflow and stackexchange, and searching in that part of the index is still not great at finding narrow results like you can on bigger search engines, when that reasonably speaking should be possible.
I think, beyond the fact that my index is DIY and fairly crude, algorithms like BM25 are designed to identify topical keywords, and they do that rather well, but narrow searches go far beyond merely the topic and often involve words that aren't important to the document but are important to some particular context within it.
I may have some ideas to get around this, but they're fairly half baked. Experiments are needed.
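To make the BM25 point above concrete, here is a compact sketch of the standard Okapi BM25 scoring formula over whitespace-tokenized text. It is not the code from any engine in this thread; the function name and the defaults `k1=1.2`, `b=0.75` are commonly used values chosen here as assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` (a list of strings) against `query`."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a high IDF, common terms a low one.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "java lambda function syntax",
    "python lambda function",
    "cooking pasta recipes",
]
scores = bm25_scores("java lambda", docs)
```

The score depends only on term frequency, term rarity, and document length, which is why a word that matters to one passage but is incidental to the document as a whole carries little weight.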
Not OP but I am working on a search engine with vector ranking. Why do you say that vector search would help with narrow queries? In my experience, semantic search helps broaden the query to search for adjacent ideas without exact term matches.
> "Hybrid approaches that use vector search for broad matches and rerank using BM25"
Hybrid approaches, e.g. Learning To Rank, normally do it the other way around, given that the main benefit of hybrid is to mitigate the cost (time) of vector search: use a non-vector search (e.g. BM25) to get a broadly relevant set of results first (and quickly), and then use the much more computationally expensive vector search to rerank the smaller result set. There are various approaches to try to make vector search more viable across large corpora, e.g. Locality Sensitive Hashing and Approximate Nearest Neighbour Search, but if you've implemented one of those then I'm not sure there'd be any benefit in retaining a hybrid approach.
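A toy sketch of that two-stage ordering, under loud assumptions: the first stage here is plain term overlap standing in for BM25 (to keep the example short), the embeddings are hand-made two-dimensional vectors rather than output from a real model, and every function name is hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexical_candidates(query, docs, limit):
    """Cheap first stage: keep documents sharing at least one query term.
    A real system would use BM25 over an inverted index here."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scored.append((overlap, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]


def hybrid_search(query, query_vec, docs, doc_vecs, limit=100, top_k=10):
    """Expensive second stage: vector rerank over the small candidate set only."""
    candidates = lexical_candidates(query, docs, limit)
    reranked = sorted(
        candidates,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return reranked[:top_k]


docs = {
    "a": "rust borrow checker",
    "b": "rust removal from steel",
    "c": "gardening tips",
}
doc_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
results = hybrid_search("rust errors", [1.0, 0.1], docs, doc_vecs)
```

The vector comparison runs only on the documents that survived the lexical stage, so its cost scales with `limit` rather than with the size of the corpus. The flip side is that a document with a close vector but no shared terms ("c" here) never reaches the reranker.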
> Why do you say that vector search would help with narrow queries?
I was just asking whether he'd seen better results. I haven't experimented very much with it on my search engine. It's as crude as they get, and in part I want to see how far I can push old fashioned 1970s search algorithms :P
Vector search is good for broad searches. Narrow searching is a problem of crawling, not ranking, IMO. Teclis crawls a very particular and small portion of the web, which is the main reason it cannot find results for more specific searches.
I was a little surprised to see Fandom.com results come up in one of my test searches, given that they are notorious for being very far from "clean" (I counted 25+ uBO blocked when checking the page in Vivaldi, which is far above the threshold of 5 mentioned on your page). Might be worth looking at in more detail.
Also, Marginalia Search link on front page is broken.
Really great work! It is exciting that people are working on alternative indexes of content, especially ones that prioritize content written by individuals for smaller audiences. The uBlock heuristic is an interesting way to capture that.
Disclaimer: We’re a research group that is also working on a new kind of search engine. Our approach is a little different though. We think that information is now scattered across different semi-open silos, so the future of search will not look like a search bar and ten blue links to web pages.
I'm also very interested in new paradigms to explore the internet. I've built a sort of explorable graph of adjacent websites based on my search engine database, was on Show HN a while back:
If you click 'similar' under any site, you get a list of its neighbors.
I think it would be neat to extend the metaphor not to just websites, but ... I dunno, something more general, links, topics, what have you. Like a browsable web of connected things. Maybe like with a bookmarking or annotation system. I think it could be super neat. Still a bit of a hand-wavy idea, but I want to build it, or someone else to build it.
Yep, graph structure of topics enables users to wander around topic space. Good for less directed, more exploratory searching.
Another graph that is useful is the graph of people -> topic clusters. See https://twitterverse.net/ . Such a graph can help rank content from people deeply invested in a particular topic, and it's hard to fake because they would presumably have to trick all their peers about their expertise.
We'll have our own Show HN soon but it's great to see similar ideas bouncing around. Would love to connect over email to learn more about your thoughts.
> The way detection works is we count the number of uBO blocked requests on the page, and if too many (threshold is set to 5), we kick it out, leaving only "clean" pages in the index.
I'm genuinely surprised there were any pages left to crawl.
Unfortunately this also kicks out genuinely useful blogs and other pages that are otherwise helpful but happen to be using a platform or framework that makes a few block-worthy requests.
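A rough sketch of the counting heuristic quoted above, for anyone who wants to play with the threshold. The real thing relies on uBlock Origin's filter lists in a browser, which have a far richer syntax than this; the `BLOCKED_HOSTS` set, the function names, and the choice to reject only when the count exceeds the threshold (rather than equals it) are all assumptions.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for real filter lists: a small set of blocked hosts.
BLOCKED_HOSTS = {"ads.example", "tracker.example"}


def count_blocked(request_urls, blocked_hosts=BLOCKED_HOSTS):
    """Count requests whose host is a blocked host or a subdomain of one."""
    count = 0
    for url in request_urls:
        host = urlparse(url).hostname or ""
        if any(host == b or host.endswith("." + b) for b in blocked_hosts):
            count += 1
    return count


def is_clean(request_urls, threshold=5, blocked_hosts=BLOCKED_HOSTS):
    """Assumption: a page is kept while blocked requests do not exceed the threshold."""
    return count_blocked(request_urls, blocked_hosts) <= threshold


# A blog on a platform that makes a couple of tracker calls still passes,
# while a page firing many ad requests does not.
blog_requests = [
    "https://blog.example.org/post.html",
    "https://cdn.tracker.example/pixel.gif",
    "https://ads.example/banner.js",
]
ad_heavy_requests = [f"https://ads.example/slot{i}.js" for i in range(8)]
```

Raising or lowering `threshold` is the knob that trades index size against how "clean" the surviving pages are.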
I can't figure out if all of Wikipedia is in the removed set or just ranked too low to show up in results. In the browser, the site seems clean.
This is such a fantastic search engine. Obviously not perfect, but the search results are information rather than blogspam/ads/etc. Breath of fresh air.
Funnily enough I somehow ranked #1 for "ADHD" but I don't know what's particularly special about my landing page. Does your crawler prioritize/crawl HN by any chance?
This is a really cool concept. Reminds me of the old(er) days when the web was a bit quieter and there wasn't an entire apparatus designed to steal your attention and focus.
This seems like a really good way to do research as well: people offering information without the expectation of getting paid for it.
This is the first time I've tried Teclis and it was a very positive experience. Always happy to see anything new in this space (search engineering?) and this seems particularly aligned with my interests.
My queries didn't get (obviously) mangled behind the scenes! Thank you for treating me with respect. Having said that, Teclis doesn't seem to treat alternate spellings of '-ise' words (e.g. normalise/normalize) as equivalent -- this is one case of auto-correction that I do appreciate in other search engines.
I just noticed the semantic search mode tip. I haven't tried it yet, but I like that it's not the default way to interpret my query.
I found it easy to find "technical" results and even (relevant) websites that I've never seen or heard of within the first ~10 hits. I wonder about the link between "non-commercial" as Teclis defines it and authentic, non-abusive, or otherwise desirable search results.
Also good:
- I didn't need to turn on javascript.
- clear info on the front page (the info itself and the fact that it's right there)
- results are actual normal links
- result snippet is normal selectable text (not a giant link)
Love the idea. The first few things I searched had very few results, and when I got into more 'mainstream' topics, I was surprised to still see Quora et al in the results (I get a "7" flag on my uBlock icon when I visit a Quora page, so I'm not quite sure how that ties in with the '5' threshold mentioned on the homepage).
Does this also exclude Wikipedia? One of the first queries I usually try on search is literally "test", and I usually expect a Wikipedia article for testing (either as an assessment or a scientific test or a programmatic test) on page 1 or 2, but here there was none.
Projects on GitHub (if it found anything, it was shitty, unmaintained forks)
Current events like the war in Ukraine
Wikipedia articles
Terms found on websites I host or frequent which do not serve any form of advertisement (not indexed apparently, the hits were completely irrelevant with zero matching terms)
I just want the old Google search back. It's very frustrating when I submit a search term within quotes and I get back a bunch of pages that don't contain that term.
Beyond search results, a significant amount of Google's value is in its “apps”, or whatever they call the functional snippets like calculator, translator etc. built into the engine.
I wonder if it would make sense to have cross platform plug-ins, so that all of these interesting nascent search engine efforts could automatically benefit from new plug-ins and an ecosystem could start to develop.
It’s great to have an alternative, but obviously it’s such a huge effort that the efficiency of development will be important.
It is not meant to search a specific term like angular documentation, but to explore a topic like angular - and for that I see very interesting results.
https://teclis.com works just fine, site was submitted using its http link (maybe mods can change it).
I hate to say it but for "best laptop" I'd rather get a typical SEO affiliate result than a 4 year old article talking about why they think MacBooks from 10 years ago were the best laptop.
The fact is "best laptop" is what's called a commercial intent query, where people are looking to make a purchase. They want recent results and recent products, not informational articles.
Yeah, "best laptop" seems a strange search phrase to list as the first example for this engine. "best laptop 2022" does return a couple of results, both of them at least somewhat useful (Consumer Reports and PCWorld).
> The fact is "best laptop" is what's called a commercial intent query
However, there's a place for a search engine that doesn't see it as one - there've been quite a few times when I was trying to research a topic, when the search engine assumed it was a commercial intent query and made it almost impossible to get the historical view I needed from the search.
Someone needs to create a search engine specifically for products and services. One thing I end up doing is searching places like reddit.
"cool shirts for summer" and then search places like Reddit, fashion forums, etc. basically all areas where UGC is relatively authentic. And then toggle it for "paid for blogs", like strategist, wirecutter, rtings, etc.
First page results are interesting, paging to the second page gives me a:
"A query would help :-)"
One thing I noticed playing with Teclis is that it gives useful results for 'A vs B' queries. I don't know a single other search engine that still delivers remotely useful results for this type of query.
I know it's not meant for narrow results, but I still chuckled a bit when searching my own nick surfaced an entire serp with nothing but unrelated blog posts about being dyslexic. Wonder if it's just because "drusepth" sounds similar enough to "dyslexic"?
Review sites are littered with advertising likely preventing any results from being indexed. It also doesn't help that most reviews are now in the form of video.
Does anyone know of a decent search engine that searches and shows results of only vintage type sites - you know, the ones built with the old school HTML tables or frontpage kind of stuff? Often, these are the most valuable in content, with less promotional bullshit like a random popup asking to signup for a newsletter or perhaps some dubious GDPR notice with all personal data collection toggles set to "off".
Other ones worth checking out include:
- https://search.marginalia.nu/ (A non-commercial search engine)
- https://wiby.me/ (Tends to have those really weird and cool indie sites)
- https://searchmysite.net/ (An index of personal websites)
- https://indieweb-search.jamesg.blog/ (Search IndieWeb websites)
- https://millionshort.com/ (Ignore the first million results from Google)