Stract: Open-source, non-profit search engine (stract.com)
437 points by FLpxpyJ 11 months ago | 108 comments



Everyone here is complaining about the search results - but instead I think we should all take some time to appreciate that someone worked hard to create a search engine (including the scraper / crawler part) and made it open source (AGPL).

The results will be improved over time I guess, and for the few search queries I've done - I'm fairly happy with the results.

Kudos to the authors!


I'm presently on leave at a university in Croatia and my research group's name is Group for Applications and Services on Exascale Research Infrastructure, so the acronym is GASERI. [1]

The first result for gaseri on Stract is our presentation of the group's research work, and the second result is our landing page.

Can't complain.

Viva open source, viva AGPL search engine!

[1] https://group.miletic.net/


I tried an exotic keyword of mine that I use for testing search engines.

Worked well.


I searched for a few things that were of the class "you should have this and match DDG/Kagi/G/Y/B" and the top 2/3 were matching.

That's pretty good for a new-ish player.


I've been doing some extensive searching for a particular topic for months using primarily Google, and I just found a bunch of sites that I had not previously found just running one query on this.

I think that as with ChatGPT vs Bard, the result space is so huge, there are going to be many strength/weakness tradeoffs for any given query.


> I think we should all take some time to appreciate...

There are lots of things in the settings that I really like too: The Manage Optics (Copycats removal, Hacker News, Fediverse, more...), Site Rankings and Explore (similar sites).

I like the overall thinking behind this search engine. It feels like they are creating the kind of search engine I would spec out myself. It starts off on a strong foundation.

I do hope they crawl many of the older pages. Google, Bing (DDG), et al. seem to only index pages from the last 10-15 years or so.

Quibble: "Allow usage statistics" is on by default which leads to this:

"We primarily store the text you used for the search, and which results (if any) you clicked on."

If you regularly clear cookies or have an extension that does this: it's something to keep in mind.

While I intend to keep it _on_ to help the author(s) help me find stuff, I hope it isn't used to once again make the popular sites popular and, in the process, bury the unpopular older sites (that have nearly completely vanished).


It is super hard to match the 'big' players. But straight programming a proper index listing is hard.


>Everyone here is complaining about the search results

It worked to find my obscure website for my company that has only been around for 3 years. I'd say that's pretty damn good.


> type google > get anything but google

??


It's an interesting case. The kneejerk reaction is "it should return google.com". But really... why? If I wanted google.com, I'd add the .com. If I wanted to search for something I wouldn't search for google first.

I guess my top candidates would be: wikipedia page about google, google stock chart, recent news about google. google.com would never be a result I want to click. The current results are not amazing, but also do we care what the results for that one are? It's like someone telling you "cow" and expecting you'll know the context of what they're thinking of at the moment. Maybe a heading like "I have no idea what you're on about, here are some clickable ideas: google news, google stock price, ..." would be the best solution?


It should absolutely return the actual thing you search for if it exists, and as the first result.

It would require an extraordinary circumstance to justify anything else, like if a thing exists, but overwhelmingly the entire net is full of some other reference that is probably what everyone is looking for.

The overwhelming majority of people typing "google" into a search are not looking for the Wikipedia page about google.


I was amazed looking at people entering the first part of the domain into the search bar, then searching and clicking the first result. That was before chrome had the integrated google search in the address bar.

I had multiple people (albeit small sample size) who went to google.com, entered the domain they wanted (e.g. news.ycomb) then clicked the first suggestion which ran the actual search and then clicked on the first hit.


While if you wanted to do so you would probably just add the .com and skip the middle man, I know of quite a few people who use search to get to sites even though they know their URLs.

That isn’t to say I hate what this site is doing, I think it is quite neat, but I do think we have to consider that there are more ways used to get to Google than just entering Google.com


Those people should be educated how to create and then (re)use bookmarks/favorites. It's a better tool than searching the same domain every day or multiple times per day.


Wonderful project, congratulations! I love the speed, clean design, many options, multilingual results, overall very impressive!!

Some quibbles/points to consider:

* I can't find anything on the people/organisation behind it, and can only guess from the Terms that the team is based in DK.
* Search results are broad and interesting; maybe a bit more weighting for the joint occurrence of terms would be great.
* Developing a site weight over time might be interesting, maybe even with user votes. Currently minor and major sites appear all together, and e.g. a search for "Donald" gives me an interesting ranking order that puts neither the most famous Donalds nor the most reliable sites first (not problematic per se; my fault for entering an unclear search term).
* There are some interesting result patterns, with official sites often quite low. For instance, a search for "EU" with some term like subsidy (in any of the languages I speak) gives me random project websites but nothing from any of the official EU websites, and "Microsoft 365" (sorry...) gives me no MS website.
* Very minor but hopefully a very easy fix: at least on Firefox mobile there is no direct way to add the search to my search engines; I had to add it manually. For other engines I can long-press on the search field and then get the option.

Great work, keep it up! I will certainly start using this :-)


"maybe even with user votes. "

I'm so mind-blown that this does not exist yet. Free ranking feedback (live training of the algo!) + better search results for everyone. win-win


> Free ranking feedback

Free spam SEO ranking in practice. A spam site has 1000x the incentive to upvote its result than you have to downvote it. YaCy did a distributed index with filtering lists and you effectively had to keep a list of who you trust / your own filter.


Kinda solved with accounts, and/or subs, and monitoring/fraud detection? Or just turn it off for product related searches but keep it on for information related?


Have you seen any large site with user reviews? Amazon? Yelp? Any feedback on a page that matters will be gamed, because anything over what you spent to game it is pure profit. This is not a solved issue in any meaningful way.


Paid users only would probably push it in the right direction. It is unfortunate that the internet has moved into everything is advertising as a business model. I get the appeal but cognitively paying so I can keep my attention seems like a way better model.


To make Stract usable for me (slightly reduced vision), I had to apply the following custom CSS:

```css
html, body, div, td, th, p, h1, h2, h3, h4, h5, b, i, strong, li, button {
  font-family: ui-sans-serif, sans-serif !important;
  -webkit-font-smoothing: antialiased; /* vendor-prefixed property needs the leading dash */
  font-weight: 400;
  text-rendering: geometricPrecision;
}
```


> immensely customizable -- We aim to give you the ability to customize everything about the search. You can block sites, boost sites, prioritize links from specific sites and much, much more.

Great! Can I use more than one optic? The drop-down list seems to allow only 1.

> Oh, and if we ever become evil (maybe by changing our motto) please take our code and start a competitor.

The most important part is the index data, what would be the deal with that?


Sources: https://github.com/StractOrg/stract

Backend in Rust (axum web framework, rocksdb), frontend with Svelte.


VERY cool product. I have a quibble. I searched for "cool pokemon to use" and the top result was "How to use Paypal on Amazon" from "online-tech-tips.com". Understandable that the search results are not perfect - the second result was a perfect match for what I searched for - but anyway, clicking the "dials" icon gets me the following options:

"""Do you like results from online-tech-tips.com? (thumbs down, thumbs up, or banned emoji options) <a href='make-an-llm-do-something-stupid.com">Summarize result</a> """

IMO this feedback widget and (maybe) its backing API could use work. It's not that I like or don't like results from online-tech-tips.com; it's that they're a bad result for the specific context of this search.


The option to return from only sites popular on HN, blogroll, and the other "manage optics" settings are incredibly cool and useful, I could see myself using this just for that feature alone.

Exciting stuff.


I'm more pessimistic about how that would drive bad actors to HN, polluting the site.


I take it you aren't showing dead threads? If you look at newer submissions you'll see people voting bad stuff to death. HN is insanely decent at community driven self moderation. Not to knock the mods who put a lot of work into the site of course, but I assume the community's own self-moderation helps some.


"Polluting HN" is more than just the vitriol thats flagged - there are plenty of self-promoting, downright wrong, or comments clearly unrelated to the posted article but comment author gets to segue to their hobby-horse (off-topic discussions are annoyingly frequent, IMO). HN is better than most, but its not immune to being gamed. Once there is a financial incentive for it, it will become more common - see how Twitter turned out after offering monetary incentives for engagements.


I mean, is your post off-topic? It is a tangent. Tangents are cool. You may or may not like one, that is OK.


> I mean, is your post off-topic?

My comment directly answered parent comment on an issue pertaining to search engines. How is that a tangent or remotely off-topic? This meta-discussion, on the other hand...


I think HN could cope by weighting upvotes/downvotes/flags more heavily based on the age and reputation of the user taking that action. It would distribute moderation responsibility to users a bit more.
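A minimal sketch of that weighting idea, in Python. Everything here is hypothetical: the `Voter` fields, the one-year and 10,000-karma caps, and the 0.05 floor are made-up illustrative values, not anything HN is known to do.

```python
import math
from dataclasses import dataclass


@dataclass
class Voter:
    account_age_days: int  # hypothetical field
    karma: int             # hypothetical field


def vote_weight(voter: Voter) -> float:
    """Weight in [0.05, 1.0]: older, higher-karma accounts count more."""
    # Age contribution grows linearly and is capped after one year.
    age_factor = min(voter.account_age_days / 365.0, 1.0)
    # Karma contribution grows logarithmically and is capped at 10,000 karma.
    karma_factor = min(math.log10(max(voter.karma, 1)) / 4.0, 1.0)
    # Small floor so brand-new accounts still count a little.
    return max(0.05, age_factor * karma_factor)


def weighted_score(votes) -> float:
    """votes: iterable of (Voter, direction) pairs, direction is +1 or -1."""
    return sum(direction * vote_weight(voter) for voter, direction in votes)
```

Under these assumed numbers, ten downvotes from day-old accounts would only cancel half of one upvote from a long-standing user, which is the kind of asymmetry the comment is asking for. It does nothing against purchased or aged accounts.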


I just set up a YaCy jail on my truenas box at home. It's a distributed p2p system.

Haven't actually used it yet since I'm currently paying for kagi and it's good, and I only just set it up yesterday.

But this just struck me, I just said 2 things there and this post is yet another, between kagi, yacy, and now stract, not just 3 different names but 3 different types of solution to a problem, and all seemingly actually viable, that have popped up recently after decades of no one really feeling like they needed anything else.

I think something is changing.


clearly labelled, contextual ads based on your current search query and a subscription option without ads

Perfect! This is the way the god of the internet intended search engines to work.

But DuckDuckGo does the same and currently provides superior results based on a very brief test.

So good luck with that.


DuckDuckGo also allows you to switch off ads, for free, without any fuss or adblocker needed. Just go to the settings page.

Although if you aren't going to support DDG with ad revenue, I'd suggest supporting with a donation if you can afford it and value their service.


I really don't mind helping DDG take advertisers for all they are worth as long as it doesn't cost me my privacy or waste too much of my time.

And if they take something away from Google in the process --- that's just an extra bonus.

Turnabout is fair play don't you think? Google has worked very hard to take privacy away from users.


What would happen to DDG if Microsoft stopped letting them use Bing? What does MS get out of this relationship?


What does MS get out of this relationship?

My guess --- a portion of the ad revenue.


DDG is amazing. Too good to be viable, I usually think.


DDG sucked hard enough I'm now paying for kagi.


DDG rubbed me the wrong way when they decided to "filter misinformation" which is an arbitrary and biased thing. I don't side with Russia or anything, but I've seen this sort of thing get out of hand.


It's a green flag for me when services do this because it shows that they are taking an ethical approach to what they are providing. Too many companies throw ethics out the window and punt the responsibility to the end-user. Misinformation, like anything else, can be regulated responsibly. It is not the worst situation if the media is controlled to skew towards a certain viewpoint. It is the worst situation if the service doesn't give you a slap in the face when you routinely engage in misinformation. There is never any good outcome from misinformation, while it only erodes society's ability to cohesively react to situations in the healthiest possible way.


One of my favorite topics is treated as misinformation and completely scrubbed from google and Bing. The idea seems to be the assumption that one must believe all that one reads. Imagine a discussion where everyone agrees?

Stract produced only good results. Unusually good.


> The idea seems to be the assumption that one must believe all that one reads. Imagine a discussion where everyone agrees?

So I don't know what topic you're on about but the one you replied to I do. And in that context, these two statements unfortunately do not add up as the second one is a huge stretch from the first.

The misinformation in context (Russian propaganda, probably on rt.com or something) is designed to mislead. Not inform; mislead. This really is two steps further than a discussion between two equal discussion partners who disagree. A healthy discussion like that is based on arguments which are supported by premises. Russian propaganda is based on bullshit. It tries to spread as much bullshit as possible, then see what sticks.

Defaults should be reasonable for the general public, the average user. They should be harmless. The term used for that nowadays is SFW.

You don't want NSFW content by default; a search engine should by default remove that, but leave the option open to the user to sift through it. For example, you don't want naked ladies on your screen at work or at home (if your wife or kids or parents are watching it might turn awkward).


> but leave the option open to the user to sift through it.

I would be okay with this, similar with how on HN you can see dead threads.


>The misinformation in context (Russian propaganda, probably on rt.com or something) is designed to mislead. Not inform; mislead.

As a 3rd party, I can assure you I feel the same way towards the American propaganda that dominates search results, social media, and more. No, this isn't whataboutism. No, you can't point at minute differences and use them to pretend there's a huge difference between you and Russia. No, you can't tell me America is better because you don't go to jail when you impotently complain about your government. You are, and always have been, a bigger threat to the world than Russia ever was, a bigger hypocrite, a more violent monster, and a far more effective propagandist.

It's some real 1984 shit and it's sad citizens of America and its vassal states (all of Europe, Canada, Australia) don't see it.


America (and really any other non-authoritarian state) is better because it allows change. Doesn't matter where we are now. What matters is that we can change it and that the systems that exist reasonably protect that ability.


I must know


Wanted to say congrats on launching! I'm building a search engine myself, I can tell a lot of work went into this.

I think the biggest thing you overlooked are page titles. When you issue a query it's a bit hard to quickly scan and judge what a site is about because the page titles are missing.


How do you crawl the web? Do you follow links around? How do you reach a page that isn't linked from anywhere you've crawled?


I'm just using common crawl for now


I mean that's what web crawling is, right? By extension, you just can't reach a page unless you stumble upon a link to it _somewhere_. Google gives you an option to submit a link and schedule a crawl that way, so that's another option if it's not being linked to from anywhere.
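To make the reachability point concrete, here is a toy sketch of a link-following crawl over an in-memory graph (no network access). The `crawl` function, the graph and the `.example` hostnames are all hypothetical; a real crawler would also need politeness delays, robots.txt handling and so on.

```python
from collections import deque


def crawl(link_graph, seeds):
    """Breadth-first traversal: visit seeds, then everything reachable via links."""
    seen = set()
    frontier = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            frontier.append(seed)

    visited_order = []
    while frontier:
        url = frontier.popleft()
        visited_order.append(url)
        # "Fetching" a page here just means looking up its outgoing links.
        for target in link_graph.get(url, []):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return visited_order


# orphan.example links out, but nothing links *to* it.
toy_web = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": [],
    "orphan.example": ["a.example"],
}
```

Starting from `a.example` alone, `orphan.example` is never visited; it only shows up if it is added to the seed list, which is roughly what a "submit your URL" form does.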


Fast, feels clean and uncluttered to use and the search results are fairly high quality. I like the “optic” idea.

After reading the about page, I’m not sure what the developers are trying to achieve? Perhaps a sort of alternative-Universe Google search funded by search-context AdWords?


Really like the explore feature. It lets you put in a url and shows you similar sites. Very promising project. Love to see people actually thinking about what search would be rather than rehashing decades old ideas.


Seems surprisingly OK for coding related queries ('celery rate limit'). I'm curious about their scraping setup; building that out must be quite a big task.


Searching for "adventure game studio", neither the website that has the forums nor the GitHub repository is on the first page. Most results on the first page of search are really old things. Neither Wikipedia nor Repology, which has the package info, is anywhere in the results.


Thought of grabbing like a big chunk of the Wayback Machine and having THAT in the index? There’s always so much good stuff that gets nuked, and being able to search across it properly would be potentially very interesting.


Holy shit, I never thought of this before but this is an excellent idea for a search engine feature. It would work really well as one of Stract's optics!


You need engineers?


I really think the existence of this project is a Good Thing. Massive kudos to the people working on it. Previously I was always disappointed that our only options seemed to be open source meta search engines and closed source search engines, with most of the latter being corporate surveillance capitalist hellscapes, or anonymized portals to the same (Google and Bing, Duck Duck Go and Startpage), with only Kagi being an exception, although still closed source. It seemed like a really uniquely bad landscape, given that in most other areas of software there are at least some FLOSS alternatives to proprietary platforms that actually implement their core functionality, regardless of their relative quality. Stract finally changes that, which means several good things to me. First, you get actual, real accountability for them to stick to their stated privacy goals. Second, you get the ability for a wide variety of people to contribute to and influence the project, and/or learn from the project how to do this stuff themselves. And finally, most importantly, since it's a full reimplementation from indexing up, it's an opportunity to innovate on and experiment with the fundamentals, instead of just rearranging deck chairs on the Titanic like e.g. Startpage. That's really great :)


A few suggestions:

- search results seem to be somewhat case sensitive, which is a massive problem for me when for instance searching programming terms

- in general the matching algorithm seems way too strict, only matching against the exact thing you entered, which makes it very difficult to profitably search for things where you don't have exact specific terms in mind, like perhaps computer errors or genres of things. I think a lot of people's problem with how liberally Google interprets your search results is not that it interprets them liberally necessarily, but that it doesn't respect the other options it provides for trying to match things more strictly. As long as you provide something like Google's quote mechanism and actually respect it, I feel like it would be a lot better to match things more liberally by default. Maybe some amount of fuzzy searching, and matching by synonyms. Also you could probably just use a dictionary of synonyms to do that instead of whatever statistical model Google is probably using, in order to ensure more predictable results.

- as someone else somewhere in this thread mentioned, it seems like stuff like a, an, and the, are all matched against and weighted equally with other words. This, especially combined with the fact that it only matches words exactly, makes the search results feel way too brittle and unforgiving
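A rough sketch of what those three suggestions could look like on the query side: lowercase everything, drop stop words, expand terms through a plain synonym dictionary, and leave quoted phrases alone. This is not how Stract works; `expand_query`, `STOP_WORDS` and `SYNONYMS` are made-up names with tiny illustrative contents.

```python
import re

# Illustrative only: a real list would be much longer and language-specific.
STOP_WORDS = {"a", "an", "the", "are", "in", "of"}
# Illustrative hand-written synonym dictionary (no statistical model involved).
SYNONYMS = {
    "error": {"exception", "failure"},
    "film": {"movie"},
}


def expand_query(query):
    """Return a list of term groups.

    A document would match a group if it contains any term in that group.
    """
    groups = []
    # Each match is either a "quoted phrase" or a single whitespace-free token.
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            # Quoted phrases are respected: no stop-word removal, no synonyms.
            groups.append({phrase.lower()})
            continue
        term = word.lower()  # case-insensitive matching
        if term in STOP_WORDS:
            continue
        groups.append({term} | SYNONYMS.get(term, set()))
    return groups
```

The dictionary keeps the expansion predictable: you can read off exactly why a given document matched, which is the point made above about avoiding an opaque model.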


Tried searching for Dota (the video game), and the game’s website is buried by a bunch of SEO spam. It might not even have been crawled because it doesn’t appear on the first or second page.


Where does their crawl come from?


That looks quite promising! Thank you for crediting tantivy in the github README, that's well appreciated! Ping me if I can help with anything.


In swagger/open API, why is everything a post?

I tried the first endpoint get suggestions and tried searching for Gemini or Gemin hoping it would at least auto complete a word but the result set is empty.

https://stract.com/beta/api/docs/#/autosuggest/route


I just set it as my default search engine for a day. It's not quite there for my use cases. Can we help improve the search results?


Optics are a great idea, something we don’t see on other engines.

Fully open source -`ღ´-

Haven’t dug in to see what’s powering the search; I think DDG uses Bing


So is this truly its own search engine / crawler / etc... and not using anyone else's searches? I know ddg / kagi often use results from bing and other places, so just want to make sure.

also, how can I add this to my firefox search inside the address bar / search field?


> also, how can I add this to my firefox search inside the address bar / search field?

Navigate to https://stract.com/ then focus the url field: Firefox will display the new search engine at the bottom of the suggestions, on the "This time, search with:" line.


I didn't see that option on mobile, but I got it added. Click the search provider icon in the search bar, go to search settings, then manage search providers, then add new. Add it with this url: https://stract.com/search?q=%s
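For anyone scripting this rather than clicking through settings, here is a small sketch of what the browser does with that `%s` template: it URL-encodes the query and substitutes it in. The template is the one quoted above; the `search_url` helper is a hypothetical name.

```python
from urllib.parse import quote_plus

# Template as quoted in this thread; %s marks where the query goes.
SEARCH_TEMPLATE = "https://stract.com/search?q=%s"


def search_url(query, template=SEARCH_TEMPLATE):
    """Fill the %s placeholder with the URL-encoded query."""
    return template.replace("%s", quote_plus(query))
```

Spaces become `+` and characters such as `:` are percent-encoded, so operators like `site:` survive the round trip.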


That was hard to discover, thanks for your explanation.

I was looking in "Firefox Settings -> Search -> Search Shortcuts" for a way to add it. I guess the functionality is not used very often, but it would be nice to have a hint on how to add new Search Engines there.


thank you, wow that's buried deep. Been using firefox for forever, and don't think I've ever noticed / seen that button. Thank you.


Search still needs some improvement, I typed "gundam watch order reddit" and was expecting some reddit links, but none of the results are reddit links. Perhaps there's another way to limit search results to a particular site here?


If you're looking for the answer to this question the "relation graph" from AniDB is probably the best thing around: https://anidb.net/anime/715/relation/graph


Normal way is "site:reddit.com"


That is Google way, rather than "normal"


I don't use google, that's how it works on ddg and kagi


The site: operator seems to work in most search engines these days.


Stract's GitHub page says they support site: queries


The only current search engine that I can use in my native language, Catalan, is Google. I can't wait for a project like this one to get good at that.


Congrats!

I tried to search for a particular domain data but neither search nor the explore would have the domain listed. What's the process to get unlisted domains indexed?


awhh I can see DMOZ (https://en.wikipedia.org/wiki/DMOZ) is no longer! That used to be the seed for crawling the internet I believe, for search engines.


A static version (archive) of DMOZ is still available at http://www.odp.org/


Sad to say this for a promising idea, but the search results are objectively terrible. If it wants to succeed, it needs to nail the primary use case.


It's failing (completely wrong results) my goto query for testing search engines: "best sub 10 usd Linux single board computer" Try it out


Damn Pine64 has some fun stuff happening.

Also I noticed DuckDuckGo performed much better than Google with this benchmark.


It has the same problem I experienced on Google search: when I search for my projects it doesn't find them, even when I use exact wording; adding github, npm, and my username to the search query doesn't help...


Great stuff!

Just want to mention, when I search for “ExpressLRS use uart on older f4 fcs” it gives me about 15 results, but only the first two are unique. The other 13 are a literal copy of the first, both in content and in URL. Probably best to filter for uniqueness
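A minimal sketch of such a uniqueness filter, assuming each result is a dict with a `url` key (that shape, and the `normalize` / `dedupe` names, are assumptions for illustration, not Stract's actual data model):

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Canonical form used only for comparison: lowercase host, no fragment, no trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def dedupe(results):
    """Keep the first result for each normalized URL, preserving ranking order."""
    seen = set()
    unique = []
    for result in results:
        key = normalize(result["url"])
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique
```

This only catches literal URL duplicates like the ones described; near-duplicate content on different URLs would need something like content hashing.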


This is a very cool search engine. Still surprised that this was made, though. I thought there were a lot of other search engines already around that were open-source. Anyway, I'm interested in seeing what changes as time goes on.


Generates some pretty interesting results. No way to make it my default search engine?


On Firefox just add its search URL (https://stract.com/search?q=%s I think, it's somewhere above in this thread) to your list of search engines then set it as default. Idk about chrome


Thank you. I spent 5 minutes and could not find a way to add this URL in Firefox. I'm just saying, I like the search engine, I hope that they would like to have users. I would like to be one of those users but have no clue how to add them to my browser so until that's fixed they're basically a non-starter.


Can someone provide a bit of background how the crawling part works?


If you are interested in setting up your own non-profit org marketplace or know someone who does, I made an example one using free tools (https://donate.pcblues.com/) that costs me only about $10 USD per month to host the example because it is just a hosted linux VM and not Saas or software subscription based. I configured the VM myself and then "just" installed the software and configured it. It hasn't been down for ages. I only just remembered to check it. It does everything from merch and service websites to escrowed payment transactions, user reputation, etc.


It's just an offer to communicate, not a business.


Very impressive, and kudos to the developers and originators.

I just hope Stract doesn't go 'corporate' the way DDG did. :(


Fantastic!

This is what I've been waiting for for 10 years, since Google removed the feature: a search engine which realizes 99% of the time, I want to search Discussions, and gives me the option to only show those. (reddit, forums, mastodon etc). This cuts down the SEO crap by 99.99%.

That said, the results aren't great, hopefully it's something that improves as they index more pages. For example reddit doesn't seem to be indexed, why not? It's a goldmine of user content (even if the frontpage is 99% astroturfed US neolib propaganda).


So I entered two queries about my local country, but the results I expected didn't show up


>how many bits are in a byte

I checked 11 pages and none of the results were relevant.


I searched instead for "size byte bits", third result has the answer. It seems like the engine gives equal weight to all words in the search, so "are", "in" and "a" throw it off.
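One standard way to stop words like "are", "in" and "a" from dominating is inverse document frequency (IDF) weighting: terms that appear in most documents get a weight near zero. Below is a toy sketch on a four-document corpus; the corpus, the function names and the plain `log(N / df)` formula are illustrative assumptions, and I have no idea what Stract's ranker actually does.

```python
import math


def idf_weights(documents):
    """Map each term to log(N / df), where df is the number of documents containing it."""
    n = len(documents)
    doc_freq = {}
    for doc in documents:
        for term in set(doc.lower().split()):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n / df) for term, df in doc_freq.items()}


def score(query, doc, weights=None):
    """Sum the weights of query terms present in doc (every term counts 1.0 if weights is None)."""
    doc_terms = set(doc.lower().split())
    total = 0.0
    for term in set(query.lower().split()):
        if term in doc_terms:
            total += 1.0 if weights is None else weights.get(term, 0.0)
    return total


docs = [
    "a byte is eight bits",
    "how many cows are in a field",
    "how many people are in a city",
    "there are many birds in a tree",
]
query = "how many bits are in a byte"
```

With equal weights the cow document wins because it shares five filler words with the query; with IDF weights "a" counts for nothing and the byte document comes out on top.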


excellent! I'm tired of search engines that optimize for natural language queries because the inevitable trade-off is that they become useless at keyword/exact queries.


I searched for "calories in 450 gm of steak" and the top 3 results were:

1. Brexit as the start of the reversal of neoliberal globalization - softpanorama.org
2. Directory Search - Fulshear-Katy Area Chamber of Commerce - chamberorganizer.com
3. The 100 Best New Products of 2020 - gearpatrol.com

And none of the Page 1 results were related to my search query...


I would give stract.org a shot tho


Supports negative prompts! Will happily switch to this if I can figure out how to add it to firefox.


this is a neat thing, i like it, i'll add it to my list of search engines i use


The search bar should really be full width.

It can be very annoying to have your query not fit it while the window has plenty of room left.


I think a lot of people will now go and benchmark queries only to report back disappointed with results.

Trying to build a generalized search engine for the modern internet that comes close to Google/Bing would require a "tech megaproject" level of investment and commitment, most likely only to end up with the same optimizations and architecture as existing big search and a very similar level of experience.

I think it's a better direction to build a search based on a more limited amount of topic-based data and focus on a great match engine within it, then just aggregate the relevant ones together. That is also far more maintainable on the crawling side. I can use google/bing to find the Honda dealership or read keyboard reviews, or get the 50 most useful unix commands.

I also wonder if, with the rise of LLMs, while it still may not be feasible in such a large-scale production environment, those can serve as guides/agents to also improve the query itself and not the results of the query, for example a chat-like search where user answers shift the relevancy metrics for returned documents. This would fit perfectly for smaller but open source, customizable and thematic search.

That being said, I think it's great that projects such as this pop up more often. (Phind.com was also on my radar this year)


I searched for "horror movies" and the first result was a lemmy community that has literally "616 subscribers" "30 Posts" and "76 Comments" which is about as dead as you would expect from a lemmy instance.

I also searched for "league of legends", and it couldn't find its homepage.

I think its ranking algorithm may need improvement.

Edit: also, I'd rather not say this, but do we really need another DuckDuckGo? I don't think Google fails at its job because of financial incentives. I think it might fail at this job simply put because the web of 2024 isn't the web of 1990. For example, the lemmy result, it's a link aggregation about horror movie articles. The search engine could literally do the job of the link aggregator, as it has a SERP that aggregates links, and yet it's aggregating links to link aggregators. Why are the search engines doing this? Because it's 2024. I wish someone tried a new approach at this problem rather than just copying Google's design and saying "it's Google but not yucky".


[flagged]


The reason I'd rather not say it is exactly because I don't want to sound like I'm just coming in here and shitting on things.

I think there are no open source search engines because the instant you realize you have to periodically scrape the trillions of web pages in your index for updates you just give up because there is no way you can afford that without a solid business plan, which is hard to have when you are a search engine with no users, because you have no results, because you can't scrape trillions of webpages to build an index. Hence, I don't think it makes sense to try to make a general-purpose search engine, especially as Google has mastered that art and Google results look like that.



